Environ. Res. Commun. 5 (2023) 115014 https://doi.org/10.1088/2515-7620/acf81b

PAPER

How to estimate carbon footprint when training deep learning models? A guide and review

Lucía Bouza1, Aurélie Bugeau2,3 and Loïc Lannelongue4,5,6,7
1 Université Paris Cité, CNRS, MAP5 UMR 8145, 75006, Paris, France
2 Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI, Talence, France
3 IUF, France
4 Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
5 British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
6 Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, United Kingdom
7 Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
E-mail: aurelie.bugeau@u-bordeaux.fr
Keywords: AI carbon footprint, measuring electrical consumption of AI, environmental impacts of deep learning

Abstract
Machine learning and deep learning models have become essential in the recent fast development of artificial intelligence in many sectors of society. It is now widely acknowledged that the development of these models has an environmental cost that has been analyzed in many studies. Several online and software tools have been developed to track energy consumption while training machine learning models. In this paper, we propose a comprehensive introduction and comparison of these tools for AI practitioners wishing to start estimating the environmental impact of their work. We review the specific vocabulary and the technical requirements for each tool. We compare the energy consumption estimated by each tool on two deep neural networks for image processing and on different types of servers. From these experiments, we provide some advice for choosing the right tool and infrastructure.

1. Introduction

Deep learning has been widely used in every sector of society for a few years. A search of Scopus shows that it went from about 1,350 research papers in 2015 to more than 85,000 in 2022. Results obtained in every domain are impressive, and AI is a promising tool for tackling environmental challenges in particular (Rolnick et al 2019, Vinuesa et al 2020, Kar et al 2022). But it is also now widely documented that training and deploying deep learning projects has an impact on the environment (Strubell et al 2019, Gupta et al 2022, 2020, Ligozat et al 2022, Kaack et al 2021, Lannelongue and Inouye 2023, Bannour et al 2021, Thompson et al 2020, Dodge et al 2022, Henderson et al 2020a). These studies have assessed the energy consumption and the corresponding amount of greenhouse gas emissions (in CO2 equivalent, denoted CO2eq) of computer calculations when training a deep learning program, and showed that recent large language models can be responsible for hundreds of tonnes of CO2eq (Luccioni et al 2022), whereas, for context, a limit of 2 tCO2eq/person/year is what is needed to keep global warming under 1.5 °C (Arias et al 2021). Some studies have also compared existing estimation tools (Bannour et al 2021, Lannelongue and Inouye 2023, Jay et al 2023). Despite these many studies, when AI practitioners wish to start estimating their environmental impact, they may face several difficulties. Depending on their backgrounds, it might be difficult for them to get used to the hardware-related vocabulary, know how to use the estimation tools (locally or on servers), and determine which tool is best suited for their current use-case. This document aims to address these difficulties and ease the process of energy consumption measurement for AI practitioners.
OPEN ACCESS
Received 27 June 2023; revised 23 August 2023; accepted for publication 8 September 2023; published 21 November 2023.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence (http://creativecommons.org/licenses/by/4.0). Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. © 2023 The Author(s). Published by IOP Publishing Ltd.

It can be used as a guide to measure the energy consumption and associated greenhouse gas emissions when training deep learning algorithms. Although what will be explained can be applied to other types of algorithms and other infrastructures, we will focus on training deep-learning models in different types of infrastructures. In this context, this document makes the following contributions:
• We review existing tools for measuring or estimating the energy consumption of computations, and explain the specific notions that are not always known by AI practitioners. It goes further than previous surveys (Bannour et al 2021, Lannelongue and Inouye 2023, Jay et al 2023) in providing details about what is measured by each tool and on which infrastructure it can be used, the measurement process, how the usage factor is used, the default values, and the sources of information that are used. This information is crucial to correctly interpreting the data obtained.
• We test and compare these different approaches using wattmeters to assess their accuracy. We also quantify the energy consumption of the estimation tools themselves.
• We run a range of experiments to analyze the influence of key hyperparameters such as batch size, data load, checkpoints and epochs. These lead to a set of recommendations on how and when to use these tools depending on the infrastructure available to train the models. For instance, we show that it seems possible to measure only part of the training and extrapolate, to avoid the small extra consumption from energy measurement. We also show that batch size can influence energy consumption. The recommendations complete previous works that intended to make machine learning researchers better understand their carbon impact and take steps to mitigate it (Ligozat and Luccioni 2021, Dodge et al 2022).

The seven different tools that we study are: Green-Algorithms (Lannelongue et al 2020) (GA), CodeCarbon (Lottick et al 2019) (CC (P) for process, CC (M) for machine), Eco2AI (Budennyy et al 2022) (E2 (P) for process, E2 (M) for machine), CarbonTracker (Anthony et al 2020) (CT), Experiment-Impact-Tracker (Henderson et al 2020a) (EIT), MLCO2 (Lacoste et al 2019) and Cumulator (Trebaol et al 2020) (CMLTR). We use the following infrastructures, all located in France, for training models: LaBRI servers (institutional server), MAP5 servers (institutional server), the Grid5000 distributed cluster and personal computers. Mention will also be made of the Google Colab environment. The LaBRI servers, the personal computer and Grid5000 are equipped with wattmeters (WM), which provide real information on the energy consumption of the infrastructure over a given period.

We focus on two machine learning experiments, both for image processing. In the first one, a small neural network is trained for digit classification on the MNIST dataset (Deng 2012). This experiment is short, approximately 1 minute.
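One of the recommendations above is to measure only part of the training and extrapolate. This amounts to simple linear scaling; a minimal sketch of the idea (the function name and the assumption that every epoch consumes roughly the same energy are ours, not taken from any of the reviewed tools):

```python
def extrapolate_energy_wh(measured_wh: float, measured_epochs: int,
                          total_epochs: int) -> float:
    """Extrapolate the energy measured over a few epochs to a full
    training run, assuming epochs have roughly constant energy cost."""
    if measured_epochs <= 0 or total_epochs < measured_epochs:
        raise ValueError("need 0 < measured_epochs <= total_epochs")
    return measured_wh * (total_epochs / measured_epochs)

# e.g. 12 Wh measured over 3 epochs suggests about 120 Wh for 30 epochs,
# while only paying the tracker's overhead for the 3 measured epochs.
```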
In the second, a DnCNN network is trained for image denoising. The training is carried out with the ImageNet validation dataset (Deng et al 2009). This experiment is longer, approximately 2 h. Figures 1 and 2 summarize the energy consumption of the different tools on the five tested infrastructures. As we detail in this guide, the high variability comes from the different goals of the different tools: some estimate the power consumption of the entire machine while others focus on a particular process. The idle power consumption is also accounted for differently, alongside usage factors, CPUs versus GPUs, etc.

The document is organized as follows. Users already familiar with carbon footprint estimation may directly jump to section 5 for the results. Section 2 reviews previous publications in this field. Section 3 details the specificities of each tool: the energy consumption of each hardware component and their communications, power usage effectiveness and emission intensity. Section 4 details the types of infrastructures that are typically used to train AI models and which tools can be used for each. Section 5 presents the experimental setup and an analysis of the results. Discussions of the results and recommendations on when and how to estimate all environmental impacts end this guide in section 6. Finally, errors reported and found in the tools are listed in the appendix.

2. Related works

Only recently have estimation tools been made available and, consequently, few studies have compared and analyzed existing strategies for measuring the energy consumption of deep learning projects. The authors of (Bannour et al 2021) reviewed six tools (CarbonTracker, Experiment-Impact-Tracker, Green-Algorithms, MLCO2, energy usage and Cumulator) that are available to measure energy use and CO2eq emissions in the context of natural language processing.
They compared the tools according to publication details, technical criteria (availability, online, ease-of-use, documentation, etc), configuration criteria (specification of carbon intensity, PUE, install dependent, etc) and functional criteria (idle power and communication between hardware). The authors observed a two-fold variation in estimates between tools and concluded that further studies are needed to better understand these tools and estimate broader impacts. (Tool homepages: https://www.green-algorithms.org, https://codecarbon.io, https://github.com/sb-ai-lab/Eco2AI, https://github.com/lfwa/carbontracker/tree/master, https://github.com/Breakend/experiment-impact-tracker, https://mlco2.github.io/impact/, https://github.com/epfl-iglobalhealth/cumulator.)

In the same line of research, the authors of (Jay et al 2023) compared some tools on server nodes, not all specifically designed for deep learning and therefore not all integrating GPUs. They categorized tools into external and internal node sensors, power profiling software, energy measurement software packages and online energy calculators. They looked at publication year, environment criteria (hardware compatibility, virtualization, etc), functional criteria (hardware compatibility, software power model, sampling frequency, reporting and profiling), and user-friendliness. They tested each tool on the same server nodes and compared them with external power meters. The authors drew some recommendations from this study: to monitor power consumption in real time, it is better to use power profiling software, but such software does not measure GPU consumption; and the relationship between energy measurement software tools and power meters is not constant, so software tools are not perfectly accurate.

Finally, (Lannelongue and Inouye 2023) provided general guidelines about the strengths and weaknesses of different types of estimation tools, namely online calculators, embedded packages and server-side tools.
The criteria discussed are compatibility with any hardware, any programming language and any research field, some ease-of-use criteria, and scalability with the number of jobs and long periods of time.

Figure 1. Energy consumption in Wh of the different methods over the 5 different infrastructures for the first experiment. For the tools that do not provide detail for CPU/GPU/memory consumption, the total energy reported is plotted.

Figure 2. Energy consumption in kWh of the different methods over the 5 different infrastructures for the second experiment. For the tools that do not provide detail for CPU/GPU/memory consumption, the total energy reported is plotted.

The different tools discussed above focus on energy consumption during the training phase of AI models, which constitutes only part of the broader environmental impacts of AI (Gupta et al 2022, 2020, Ligozat et al 2022, Kaack et al 2021, Lannelongue and Inouye 2023). In this context, the authors of (Luccioni et al 2022) later included embodied impacts as well as emissions from static infrastructure and deployment when studying BLOOM, a large language model.

3. Estimating greenhouse gas emissions

This section explains how tools measure or estimate energy consumption and CO2eq emissions, from Python libraries integrated into the code (referred to as software tools), to web forms and physical watt-measurement devices connected to the infrastructure used. Some of these tools also have a server-side version, to be used in HPC clusters and thus collect information more easily to estimate energy consumption. Online tools and server-side tools can be used without modifying the code, and are independent of the programming language used.
Python libraries can only be used in Python code but enable measuring the consumption of different parts of the programs. Watt-measurement enables measuring the consumption of the whole node but is not always available and cannot isolate a particular process. Each tool has its own way of estimating the consumption of each component. A summary of the characteristics is shown in table 7.

The most power-consuming devices on a personal machine or a server are the GPUs (if present), the CPUs, and memory. There are other resources, such as storage or the network, that are generally not considered in software measurements, since they do not generate a significant load over the duration of an AI task. Indeed, in regular use, storage is typically solicited far less than memory and is mainly used as a more permanent record of the data, independently of the task (Lannelongue et al 2020).

When the machine is in a data center, the energy usage of all the equipment necessary to power, cool and maintain the data center should be measured, as it may account for an important amount of energy consumption. This is done using the efficiency coefficient of the data center, called power usage effectiveness (PUE).

3.1. Energy consumption of each component

In this section, we will see the different strategies used by the tools to estimate the energy consumed by the different resources and to estimate the consumption of the processes. Green-Algorithms and CodeCarbon are the only Python tools that report the estimate of consumed energy or emissions discriminated by component: memory, CPU and GPU. A concept transversal to all resources is the usage factor. The usage factor of a resource refers to the percentage of use that can be assigned to the process being measured. For example, if the CPU power is estimated at 2 W, but the CPU usage factor of the process was 50%, then the consumption of a one-hour process is assumed to be 1 Wh.
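The usage-factor arithmetic in this example is simply energy = power × usage factor × time; a minimal sketch (the function name is ours):

```python
def process_energy_wh(power_w: float, usage_factor: float, hours: float) -> float:
    """Energy attributed to a process: device power (W) scaled by the
    fraction of the device the process actually used, times duration (h)."""
    if not 0.0 <= usage_factor <= 1.0:
        raise ValueError("usage factor must be in [0, 1]")
    return power_w * usage_factor * hours

# The example from the text: a 2 W CPU at 50% usage for one hour gives 1 Wh.
```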
If the usage factor is unknown, then 100% of the use of the resource is assigned to the process, when in fact there may be other processes also using that resource. During the measured period, some tools query sensors or perform calculations to estimate power consumption. Note that a lower measurement frequency means fewer measurements, which may lead to more approximate results. By default, CodeCarbon performs these measurements every 15 s. Eco2AI, CarbonTracker and Experiment-Impact-Tracker take measurements every 10 s. Cumulator does not query sensors or take intermediate measurements to estimate energy consumption.

3.1.1. Energy consumed by CPU

There are two methods used in the tools to estimate the energy consumed by CPUs: using the CPU thermal design power (TDP) provided by the manufacturer, or using software integrated tools (RAPL files or Power Gadget). Appendix A provides explanations of these two methodologies. Note that software integrated tools may require privileged permissions, as summarized in section 4.1. We review in table 1 how CPU power consumption is measured in AI measurement tools.

3.1.2. Energy consumed by GPU

As with CPUs, energy consumption for GPUs is computed either from TDPs provided by manufacturers or from internal tools. The latter is done with the pynvml library, which only works for Nvidia GPUs. We review in table 2 how GPU power consumption is measured in AI measurement tools.

3.1.3. Energy consumed by memory

According to (Hodak et al 2019), GPUs are responsible for around 70% of power consumption, the CPU for 15%, and RAM for 10%. Some tools like Green-Algorithms consider that the power consumption of RAM depends strongly on the available memory, independently of the memory consumed (Karyakin and Salem 2017, Guo et al 2022), while other tools like Eco2AI consider that it depends on the memory allocated by the process (Maevsky et al 2017). We review in table 3 how memory power consumption is measured in AI measurement tools.

3.1.4.
Energy consumed by communications

In ICT (Information and Communication Technology), communications refer to the exchange of information or data between two or more nodes. Nodes can be any devices connected to a network, including computers, routers, servers, and even mobile devices. Machine learning algorithms typically involve the exchange of data between nodes at various stages, such as during data generation, during training (parameter updates across different nodes in the network), or while the model is in production. The only tool that estimates the cost of communications is Cumulator. Each time the model sends a data file to another node of the network, Cumulator records the size of the file which is communicated. The cost of communication relies on the '1 byte model' of the Shift Project (The Shift Project 2019). The value from 2017 is 6.894 × 10−11 kWh/B.

3.2. PUE

Power usage effectiveness is the efficiency coefficient of the data center. If the PUE is not given, we recommend considering the 2022 average value of 1.55 (Uptime Institute 2022). For personal computers, PUE = 1 as there are no other large devices consuming power. We review in table 4 the PUE used by each tool. All except Cumulator report the total energy consumed, including PUE. To calculate this value for Cumulator, we can divide the

Table 1. Estimation of energy consumption for CPUs.

Green-Algorithms
Energy: uses the model of CPU provided by the user to pull the corresponding TDP from a database, or the user can input the TDP manually. If the TDP is unknown, GA uses an average of 12 W per core, but the paper does not explain this value. In this model, a core's power usage is assumed to be equal to the TDP divided by the number of cores (if a chip has 2 cores and a TDP of 50 W, then the TDP per core is 25 W).
Usage factor: uses usage factors if known, and assumes 100% usage if not.

CodeCarbon
Energy: uses RAPL files or Power Gadget to report CPU energy consumption (only for Intel CPUs with root access). The consumption reported by RAPL files or Power Gadget represents the consumption of the whole machine, and not only the process. If CodeCarbon cannot find the software to track the CPUs, then the tool uses the model of CPU to look up the corresponding TDP in a list. If the model is unknown, it uses a TDP of 85 W. The authors do not specify where this value is taken from.
Usage factor: not computed when using RAPL files or Power Gadget. When the TDP is used, CodeCarbon assumes that the average usage factor is 50%, but this value is not explained and seems arbitrary.

Eco2AI
Energy: uses the model of the CPU to look up the corresponding TDP in a list. If the TDP is unknown, it uses an average of 100 W (Maevsky et al 2017).
Usage factor: uses the os and psutil Python modules to determine the usage factor if the tracking mode 'current' is set (default).

CarbonTracker
Energy: uses RAPL files to report CPU energy consumption (only for Intel CPUs with root access). Without access to the RAPL files, the tool will not measure the CPU. CarbonTracker will work only if it can measure at least one component (CPU or Nvidia GPU).
Usage factor: not computed. The power consumption values of the RAPL files are global to the whole machine.

Experiment-Impact-Tracker (EIT)
Energy: uses RAPL files to report CPU energy consumption (only for Intel CPUs with root access and the Linux operating system).
Usage factor: uses the psutil Python module to determine the usage factor.

MLCO2
Does not measure CPU utilization.

Cumulator
Energy: it is not possible to measure GPU and CPU components at the same time, but Cumulator measures CPU utilization by default. It uses the model of CPU to look up the corresponding TDP in a list. If the TDP is unknown, it uses an average of 250 W. This value is that of the Nvidia GeForce GTX Titan X, which is the GPU model in the IC cluster of the EPFL Machine Learning and Optimization Laboratory (MLO). It considers just one CPU.
Usage factor: does not use a usage factor.
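Several of the TDP-based fallbacks in table 1 reduce to the same calculation; a minimal sketch of a Green-Algorithms-style per-core estimate (the function name and the example values are ours; the 12 W/core default is GA's, as reported above):

```python
def cpu_energy_wh(n_cores: int, tdp_per_core_w: float,
                  usage_factor: float, hours: float) -> float:
    """TDP-based CPU energy estimate: cores x per-core TDP x usage x time."""
    return n_cores * tdp_per_core_w * usage_factor * hours

# e.g. 8 cores at GA's default 12 W per core, at 80% usage, for 2 hours:
# 8 * 12 * 0.8 * 2 = 153.6 Wh
```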
Table 2. Estimation of energy consumption for GPUs.

Green-Algorithms
Energy: uses the model of GPU to look up the corresponding TDP in a list. The user can load the TDP of the GPU if the model is not listed. If the TDP is unknown, it uses an average of 200 W, but the paper does not explain the reason for choosing this value.
Usage factor: the GPU usage factor is considered if known by the user. If not, GA considers 100% usage.

CodeCarbon
Energy: uses the pynvml library (only for Nvidia GPUs). CodeCarbon does not measure the consumption of non-Nvidia GPUs.
Usage factor: not computed. The consumption reported by pynvml represents the consumption of the whole machine, and not only the process.

Eco2AI
Energy: uses the pynvml library (only for Nvidia GPUs). Eco2AI does not measure the consumption of non-Nvidia GPUs.
Usage factor: not computed. The consumption reported by pynvml represents the consumption of the whole machine, and not only the process.

CarbonTracker
Energy: uses the pynvml library (only for Nvidia GPUs). CarbonTracker does not measure the consumption of non-Nvidia GPUs.
Usage factor: not computed. The consumption reported by pynvml represents the consumption of the whole machine, and not only the process.

EIT
Energy: uses the nvidia-smi command line tool (only for Nvidia GPUs). EIT does not measure the consumption of non-Nvidia GPUs.
Usage factor: uses Popen to open a thread, execute the command nvidia-smi -q -x, get the output as XML, and parse it to get the usage factor of the GPU.

MLCO2
Energy: uses the model of GPU to look up the corresponding TDP in a list. It is not possible to load the TDP of the GPU if the model is not listed; in this case, it is necessary to make a pull request to add the value. It is not possible to choose the quantity of GPUs.
Usage factor: does not use a usage factor. The GPU is considered at maximum load, and this load is assumed to correspond to the measured process.

Cumulator
Energy: uses the model of GPU to look up the corresponding TDP in a list. If the TDP is unknown, it uses an average of 250 W.
It considers just one GPU.
Usage factor: does not use a usage factor. The GPU is considered at maximum load, and this load is assumed to correspond to the measured process.

Table 3. Estimation of energy consumption for memory.

Green-Algorithms
Energy consumption by memory is 0.3725 W/GB of memory available (if all the server memory is available, it will account for all the server memory; on an HPC cluster, it will account only for the amount of memory requested, regardless of how much the process consumes). The value 0.3725 was obtained experimentally.a

CodeCarbon
Energy consumption by memory is 0.375 W/GB of memory used.b If the tracking mode is 'process', the memory used by the process is measured via psutil.

Eco2AI
Energy consumption of memory is 0.375 W/GB of memory used (Maevsky et al 2017). Memory used by the process is measured via psutil.

CarbonTracker
Uses RAPL files to report memory energy consumption. It measures the total energy of the memory available, not only that used by the process. Without access to the RAPL files, the tool will not measure memory energy consumption.

EIT
Uses RAPL files or Power Gadget to report memory energy consumption. Memory used by the process is measured via psutil, considering memory used exclusively by the process and the shared memory between processes (weighted by the number of processes). Without access to the RAPL files or Power Gadget, the tool cannot be used.

MLCO2
Does not measure memory.

Cumulator
Does not measure memory.

a Source: www.tomshardware.com (https://www.tomshardware.com/reviews/intel-core-i7-5960x-haswell-e-cpu,3918-13.html).
b Source: Crucial (https://www.crucial.com/support/articles-faq-memory/how-much-power-does-memory-use).

reported value of greenhouse gas emissions (GHG) by the emission intensity (EI) of the servers' location: Energy = GHG/EI. Note that for the purpose of comparing reported energy consumption between tools, PUE is not taken into account, since each tool uses a different value.

3.3.
Carbon emission and emission intensity

The origin of the energy used is key when determining greenhouse gas emissions from electricity production. To carry out the calculation, the average emission intensity (or carbon intensity) of the country or region where the calculations were made is used. Countries report these values, which can then be used by the tools to calculate emissions. It is important to mention that most of the tools do not yet take carbon intensity information in real time. Only CarbonTracker (for the UK and Denmark) and Experiment-Impact-Tracker (for California) do so. In most cases, average values from previous years are used. Some variables, such as the time of day of execution, or the distribution of energy sources at a given moment, are not represented, but can have an important influence on the emissions, as shown in table 5. Machine learning users could look at the current and planned energy consumption of most countries before running their experiments, e.g. on Electricity Maps. In some cases, if users are running on clouds that have different geographic locations, they could choose where to run the algorithms to emit fewer GHGs. For example, table 5 presents some values at different locations for two different days. While it can be wise to carefully choose data center locations, developers must keep in mind that transferring large datasets from one location to another also has environmental impacts (section 3.1.4). Therefore, depending on the training time, it might be better to remain on the same server when training on the same large dataset. We present in table 6 how each tool handles carbon intensity.

3.4. Measuring whole equipment consumption with wattmeters

Wattmeters are physical instruments that are used to measure the active electrical energy of a certain circuit.
By plugging them into the physical infrastructure, we can get the exact total consumption of the machine. With wattmeters, it is not possible to determine how much energy each component of the machine consumes, nor to discriminate consumption by process. It is also important to note that wattmeters have measurement frequencies. Different wattmeters may have different measurement frequencies and therefore different accuracies depending on the duration of processes.

3.5. Errors reported and found in the tools

Some tools had to be modified to be used, as they had bugs not yet fixed by the authors. The modifications we had to make can be found in appendix B.

Table 4. PUE values used in the different tools.

Green-Algorithms: configurable. The default value is 1.67 (2019) (Lawrence 2019).
CodeCarbon: not taken into consideration, except for cloud providers.
Eco2AI: configurable. The default value is 1.
CarbonTracker: configurable. Although the paper indicates that the 2020 PUE (1.58) is used, the 2022 PUE (1.55) is used in the code (Uptime Institute 2022).
EIT: configurable. The default value is 1.58 (2020) (Lawrence 2019).
MLCO2: not taken into consideration.
Cumulator: not taken into consideration.

Table 5. Daily average carbon intensity (gCO2eq/kWh) for two different days. Data taken from Electricity Maps.

                        March 5th 2023    March 29th 2023
France                  64                137
North Sweden            16                14
South Africa            684               702
South Carolina - USA    432               786

3.6. Summary of the characteristics of existing tools

In addition to the tables presented in Bannour et al (2021) and Jay et al (2023), we summarize in table 7 what is configurable and what the default values are for each component, and add details on the usage factor.

4. Infrastructure

Depending on the infrastructure, users will have access to different resources, which restricts the list of tools that can be used.
The most commonly used infrastructures for machine learning are physical or virtual servers, virtualized environments in the cloud, supercomputers or personal computers. Table 8 summarizes the tools' requirements and hardware compatibility.

4.1. Access to information and resources

We explain below how each type of infrastructure handles access to hardware information.

4.1.1. Virtual environments

Some tools require knowing the available CPU model to make a better estimation. In virtual environments, the information in the /proc/cpuinfo file (or equivalent tools for Windows or macOS) may not be correct, and may represent some characteristics of the CPU emulated by the virtualizer. Unfortunately, from the virtual environment, there is no way for users to know exactly which real CPU is being used for the execution.

4.1.2. RAPL files

Some tools require read access to the RAPL files. Access to these files is restricted by default to the root user. An administrator must be asked to grant read permission to those files. Also, these files are available only if the machine has Intel CPUs and runs Linux as an operating system. A similar situation is experienced with Power Gadget: it is exclusive to Intel CPUs, and the tool needs to be installed.

Table 6. Emission intensity used in the different tools.

Green-Algorithms: most emission intensity data come from Carbon Footprint, but the tool also uses other sources like Electricity Maps. Information is collected in the CI_aggregated.csv file. The default value is 475 gCO2eq/kWh (world average in 2018).

CodeCarbon: for the United States and Canada, CodeCarbon uses regional data on emissions per unit of power consumed. For other countries, the tool uses the energy mix of the country, i.e. intensity data for each energy source (carbon, solar, wind, etc), to calculate the intensity of the country. The average energy mix for each country is taken from Global Petrol Prices. The information is collected in the files under the data folder.
The sources of each data file are specified in the files. The default value is 475 gCO2eq/kWh (world average in 2018).

Eco2AI: for all countries, the emission intensity calculation was made using the intensity data of each energy source (carbon, solar, wind, etc) and the energy mix of each country. Neither the values used for the calculations nor their sources are explained, and only the final emission intensity for each country is published in carbon_index.csv. The default value is 436.5 gCO2eq/kWh (Ember 2022).

CarbonTracker: supports the fetching of carbon intensity in real time through external APIs. It is currently limited to Denmark and Great Britain. For Denmark they use data from Energi Data Service and for Great Britain they use the Carbon Intensity API. For other countries, it uses fixed values available in the carbon-intensities.csv file. The sources are not published. The default value is 475 gCO2eq/kWh (2019).

EIT: supports the fetching of carbon intensity in real time through external APIs. It is currently limited to California, using the API of the California ISO. For other countries, it uses fixed values available in the co2eq_parameters.json file. The sources are published and are mostly from Electricity Maps. The default value is 301 gCO2eq/kWh (annual mean carbon intensity of all Electricity Maps zones).

MLCO2: publishes its sources and contains the information for the cloud providers in the impact.csv file. For private infrastructure, it is necessary to provide the emission intensity value, which must be obtained by the user's own means.

Cumulator: the emission intensity data are from Electricity Maps. Information is collected in the country_dataset_adjusted.csv file. The default value is 447 gCO2eq/kWh (average carbon intensity in the EU in 2018, Moro and Lonza 2018).
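Combining the default intensities above with the PUE from section 3.2, the conversion performed by most tools, and its inversion as used here for Cumulator, can be sketched as follows (the function names and example values are ours):

```python
def emissions_gco2eq(energy_kwh: float, pue: float,
                     carbon_intensity: float) -> float:
    """GHG emissions: energy (kWh) scaled by the data-center overhead (PUE),
    times the local carbon intensity (gCO2eq/kWh)."""
    return energy_kwh * pue * carbon_intensity

def energy_from_emissions(ghg_gco2eq: float, carbon_intensity: float) -> float:
    """Invert the conversion, as done for Cumulator: Energy = GHG / EI."""
    return ghg_gco2eq / carbon_intensity

# e.g. 10 kWh in a data center with PUE 1.55, at the 2018 world average
# intensity of 475 gCO2eq/kWh: 10 * 1.55 * 475 = 7362.5 gCO2eq (~7.4 kg).
```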
Data sources referenced in table 6: https://www.carbonfootprint.com, https://www.electricitymap.org, https://github.com/GreenAlgorithms/green-algorithms-tool/blob/master/data/latest/CI_aggregated.csv, https://www.globalpetrolprices.com, https://github.com/mlco2/codecarbon/tree/master/codecarbon/data, https://github.com/sb-ai-lab/Eco2AI/blob/main/eco2ai/data/carbon_index.csv, https://energidataservice.dk/, https://carbonintensity.org.uk/, https://github.com/lfwa/carbontracker/blob/master/carbontracker/data/carbon-intensities.csv, http://caiso.com, https://github.com/Breakend/experiment-impact-tracker/blob/master/experiment_impact_tracker/emissions/data/co2eq_parameters.json, https://github.com/mlco2/impact/blob/master/data/impact.csv, https://github.com/epfl-iglobalhealth/cumulator/blob/master/src/cumulator/countries_data/country_dataset_adjusted.csv

Table 7. Summary of the characteristics of the energy and CO2eq measurement tools. Wattmeters are not included in the table.

| | Green-Algorithms | CodeCarbon | Eco2AI | CarbonTracker | EIT | MLCO2 | Cumulator |
|---|---|---|---|---|---|---|---|
| **General information** | | | | | | | |
| 1. Type of tool | Online calculator and server-side tool | Embedded package | Embedded package | Embedded package | Embedded package | Online calculator | Embedded package |
| 2. Embodied emissions | no | no | no | no | no | no | no |
| 3. Static (idle) emissions w/o runs | no | no | no | no | no | no | no |
| 4. Process/machine estimation | process | both | both | machine | process | machine | machine |
| 5. Measurement frequency (sec) | — | 15 | 10 | 10 | 10 | — | — |
| **Energy consumption: CPU** | | | | | | | |
| 1. Measured | yes | yes | yes | yes | yes | no | yes (if chosen) |
| 2. Use model of CPU | yes | yes (if no tracking tool) | yes | no | no | — | yes |
| 3. Use RAPL files or Power Gadget | no | yes | no | yes (RAPL files) | yes | — | no |
| 4. Default TDP (W) | 12 (normalized by core) | 85 | 100 | — | — | — | 250 |
| 5. Usage factor considered | yes | 50% (if default TDP used) | yes | no | yes | — | no |
| 6. Tool for usage factor | — | — | psutil | — | psutil | — | — |
| **Energy consumption: GPU** | | | | | | | |
| 1. Measured | yes | yes | yes | yes | yes | yes | yes (if chosen) |
| 2. Use model of GPU | yes | no | no | no | no | yes | yes |
| 3. Default TDP (W) | 200 | no | no | no | no | no | 250 |
| 4. Tool to get power | — | pynvml | pynvml | pynvml | nvidia-smi | — | — |
| 5. Usage factor considered | yes | no | no | no | yes | no | no |
| 6. Tool for usage factor | — | — | — | — | nvidia-smi | — | — |
| 7. Only Nvidia GPUs | no | yes | yes | yes | yes | no | no |
| **Energy consumption: memory** | | | | | | | |
| 1. Measured | yes | yes | yes | yes | yes | no | no |
| 2. Source of information | — | system | system | RAPL files | RAPL files | — | — |
| 3. Usage factor considered | no | yes (if tracking mode) | yes | no | yes | — | — |
| 4. Tool for usage factor | — | psutil | psutil | — | psutil | — | — |
| 5. Formula | 0.3725 W/GB | 0.375 W/GB | 0.375 W/GB | — | — | — | — |
| **Emission intensity** | | | | | | | |
| 1. Default E.I. value (gCO2eq/kWh) | 475 | 475 | 436.5 | 475 | 301 | — | 447 |
| 2. Real time | no | no | no | yes (just UK and Denmark) | yes (just California) | no | no |
| **PUE** | | | | | | | |
| 1. PUE considered | yes | yes (just cloud) | yes | yes | yes | no | no |
| 2. PUE configurable | yes | no | yes | no | yes | — | — |
| 3. Default PUE value | 1.67 | — | 1 | 1.58 | 1.58 | — | — |
| **Errors** | | | | | | | |
| 1. Need code modification | — | — | — | yes (with Python 3.10) | yes | — | yes |

4.1.3. Usage factor
Unfortunately, there is no command-line tool that reports the total time of a script (wall time), its CPU time and its GPU time, which are needed to calculate the CPU and GPU usage factors required by Green-Algorithms. However, workload managers such as SLURM commonly log this information. One option is to take empirical, specific measurements of GPU use during the execution of the algorithm using the nvidia-smi tool, and extrapolate that GPU utilization value to the entire execution. It is important to note that this utilization percentage corresponds to the total utilization of the machine, not just that of the process: other processes could be running on the available GPUs.
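The sampling procedure described above can be sketched as follows. The nvidia-smi query flags are standard, but the sample count and interval are arbitrary choices, and, as noted, the reported utilization is machine-wide rather than per-process.

```python
import statistics
import subprocess
import time

def parse_utilization(lines):
    """Average the per-GPU utilization percentages printed by nvidia-smi."""
    return statistics.mean(int(line) for line in lines if line.strip())

def sample_gpu_utilization(n_samples: int = 10, interval_s: float = 1.0) -> float:
    """Sample machine-wide GPU utilization (%) while a training job runs.

    Caveat: this is the utilization of the whole machine, not of one process.
    """
    samples = []
    for _ in range(n_samples):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout  # one line per GPU, e.g. "14\n3\n"
        samples.append(parse_utilization(out.splitlines()))
        time.sleep(interval_s)
    return statistics.mean(samples)
```

Averaging such samples over the run gives the GPU usage factor that can be fed to Green-Algorithms.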
To our knowledge, there is also no tool that measures GPU time for non-Nvidia GPUs.

4.1.4. Wattmeter
Finally, using a wattmeter requires having one and, in the case of institutional infrastructure, consulting a systems administrator to make the physical connection. It is important to note that a wattmeter measures the consumption of the entire node, so ideally no other processes should be running on the node; if there are, this must be taken into account when analyzing the value returned by the device.

4.2. Description of the infrastructures used for experimentation
In this guide we have run tests on resources from two French laboratories (Labri and MAP5), on Grid5000 and on personal computers, and we will also mention Google Colab. Table 9 details the hardware specifications of the infrastructure used for the experiments.

4.2.1. Laboratory servers
We have tested the different measuring tools at Labri (computer science laboratory of Bordeaux) and MAP5 (laboratory of applied mathematics at Paris 5 University). Labri has physical servers with NVIDIA GPUs, Intel CPUs and a Linux operating system. There, we had the possibility to experiment with a wattmeter. Access to the RAPL files is restricted to root, so the scripts need to be executed by an administrator in order to use Experiment-Impact-Tracker and CarbonTracker.
MAP5 has physical servers with NVIDIA GPUs, Intel CPUs and a Linux operating system. Access to the RAPL files is available, so we can test all the tools, but we do not have a wattmeter.

4.2.2. Supercomputers
We experimented with one supercomputer: Grid5000, a large-scale and flexible testbed for experiment-driven research in all areas of computer science, with a focus on parallel and distributed computing, including cloud, HPC, big data and AI. The Grid5000 cluster allows numerous configurations and is very well documented. The cluster has servers with NVIDIA GPUs, Intel CPUs, a Linux operating system and access to RAPL files.
Access to wattmeter measurements on selected nodes is possible, so all the tools can be used. However, when requesting only a portion of a node, the wattmeter value, which covers the entire node, may not be very useful, as other jobs can be running on the same server. Also, note that without booking the whole node it is not possible to get user privileges, so EIT cannot be used, CarbonTracker will not measure the CPU, and CodeCarbon will use the TDP to calculate CPU consumption.

Table 8. Requirements to run the tools.

| | Green-Algorithms | CodeCarbon | Eco2AI | CarbonTracker | EIT | MLCO2 | Cumulator |
|---|---|---|---|---|---|---|---|
| **Requirements** | | | | | | | |
| 1. Operating system | — | — | — | Linux (if non-NVIDIA GPU) | — | — | — |
| 2. Access to RAPL files | no | no | no | yes (if non-NVIDIA GPU) | yes | no | no |
| 3. Power Gadget | — | no | no | — | yes | — | no |
| **Compatibility** | | | | | | | |
| 1. Non-Intel CPUs | yes | yes | yes | no | no | does not measure CPU | yes |
| 2. Non-Nvidia GPUs | yes | no | no | no | no | yes | yes |

4.2.3. Personal computers
On these machines, we could install the necessary tools and enable the required permissions. CarbonTracker can be used if at least one of two conditions is met: having Intel CPUs or NVIDIA GPUs; if neither is met, the tool cannot be used. It will measure the power consumption of the CPUs and memory only if the CPUs are Intel, and the power consumption of the GPUs only if they are Nvidia.
With non-NVIDIA GPUs, only Green-Algorithms, MLCO2 (if the GPU is on its list), Cumulator and CarbonTracker (if the CPUs are Intel) can be used. With non-Intel CPUs, Experiment-Impact-Tracker cannot be used, and with CPUs only, neither can MLCO2. This explains the N/A values reported in the results tables.

4.2.4. Colab
Google Colab is a widely used resource, with data centers located around the world, but unfortunately the data center cannot be selected when the environment is created.
The execution location can be checked with the command curl ipinfo.io, and this information can then be used to determine the data center being used8. When running a notebook, a virtual environment is generated in which some commands are not available; users are not administrators, do not have access to RAPL files and do not know the real resources being used. This limits the tools that can be used. Experiment-Impact-Tracker cannot be used. Green-Algorithms, CodeCarbon, Eco2AI and Cumulator can be used, assuming an average consumption; this assumption can lead to reporting carbon emission values that deviate from the real ones. CarbonTracker can be used, but only with a GPU runtime, and it will not measure the energy consumption of the CPU or memory.

5. Experiments and results analysis
We now compare the different tools and their use on different infrastructures for image processing and analysis. Section 5.1 details the experimental settings. In section 5.2 we present the results: in section 5.2.1 we explain the high variability between the different tools, then their differences with wattmeter measurements (section 5.2.2) and the impact of the infrastructure (section 5.3). Later, focusing more on the second experiment, we analyze the influence of the data load (section 5.4), of the batch size (section 5.5), of saving the checkpoints (section 5.6) and of the energy consumption of the tools themselves (section 5.8). Finally, we comment on additional idle consumption (section 5.9).

Table 9. Hardware specifications of the infrastructure used for the experiments.

| | Gemini-1 (Grid5000) | Rosenblatt (MAP5) | Server (Labri) | Personal computer | Colab |
|---|---|---|---|---|---|
| Operating system | Linux | Linux | Linux | Linux | Linux |
| **CPU** | | | | | |
| 1. Quantity | 2 | 2 | 1 | 1 | 1 |
| 2. Model | Intel Xeon E5-2698 v4 | Intel Xeon E5-2609 v4 | Intel Core i9-7940X CPU @ 3.10 GHz | AMD Ryzen 5 2600 Six-Core Processor (VE) | Intel Xeon CPU @ 2.20 GHz |
| 3. TDP | 135 W | 85 W | 165 W | 65 W | Unknown |
| **GPU** | | | | | |
| 1. Quantity | 8 | 2 | 3 | 1 | 1 |
| 2. Model | NVIDIA Tesla V100-SXM2-32GB | NVIDIA TITAN Xp | NVIDIA TITAN Xp | NVIDIA TITAN V | NVIDIA Tesla T4 |
| 3. TDP | 250 W | 250 W | 250 W | 250 W | 70 W |
| **Memory** | | | | | |
| 1. Quantity | 512 GB | 62 GB | 126 GB | 32 GB | 12 GB |
| **Wattmeters** | | | | | |
| 1. Available | yes | no | yes | yes | no |
| 2. Frequency | second | — | minute | minute | — |

8 https://cloud.google.com/about/locations?hl=es

The theoretical analysis of the tools and results provides a better understanding of the differences in measurement between the tools, which Bannour et al (2021) indicated was needed.
In order to also transparently acknowledge the impact of our work, we conducted an analysis using wattmeters when available and CodeCarbon when not (machine tracking) to determine the total energy consumed throughout all our experiments. The results revealed a cumulative consumption of approximately 14.5 kWh. This value includes all the runs that led to the paper. It does not include PUE.

5.1. Experiment settings
We carried out two experiments, with different characteristics, on different infrastructures.
First, we trained a manually written digit classifier on the MNIST dataset. The MNIST dataset is a collection of images of handwritten digits. Its training set has 60,000 examples, with a size of 50 MB. The classifier is implemented with a fully connected, two-layer network (an inner layer of 32 neurons and an output layer of 10 neurons), trained over 5 epochs, and normally takes less than a minute on the different infrastructures. This experiment runs on a single GPU. Appendix C provides the architecture diagram.
Second, we trained an image denoiser on the ImageNet validation dataset. The ImageNet dataset is a collection of images depicting diverse objects and scenes. Its validation set has 50,000 examples, with a size of 6 GB. The denoiser is implemented with a DnCNN network (Ryu et al 2019), trained over 80 epochs, and takes approximately two hours to run. Appendix C provides the architecture diagram. This experiment runs in parallel on all available GPUs.
In order to measure the impact of other configurations, small variations of this experiment were also performed.
The experiments were implemented with PyTorch. Since each experiment has a different configuration regarding the use of the GPUs, the choice of framework is key to enabling the use of all available GPUs; PyTorch enabled multi-GPU training. This is also the case with TensorFlow, but it would have required additional configuration beyond the default installation to use the available GPUs.
The experiments were carried out on the infrastructures detailed in section 4. We also ran the experiments on Gemini-1 requesting only a quarter of the resources (two GPUs, 128 GB of memory and 10 of the 40 available cores). Depending on the available resources, certain tools could be used only on some infrastructures. We now discuss the main observations from our results.

5.2. Results
This section presents and analyzes the results obtained for the two experiments on the different infrastructures. Table 10 presents the energy consumption for the first experiment, the training of a manually written digit classifier. Table 11 presents the consumption for the second experiment, the training of an image denoiser.
The reported values correspond to individual runs and are not averaged. However, multiple runs of the experiments were performed on the different infrastructures to validate the consistency of these numbers. Experiment 1 was executed 3 times on Grid5000 and 2 times on MAP5, Labri and Colab. Experiment 2 was executed twice on Grid5000 and Labri.
As said before, Cumulator does not report the energy consumed. The values presented in the tables were not reported by Cumulator but calculated by us from the carbon footprints.

5.2.1. Variability between the different tools
From tables 10 and 11, we observe a large difference between the energy consumption and carbon emissions reported by the different tools.
For instance, on the Gemini-1 node of Grid5000, MLCO2 reports a consumption more than 400% higher than Eco2AI.

Machine versus process. Some tools estimate the consumption of the entire machine, and are comparable with wattmeters; others estimate the consumption of the process, trying to isolate it from other processes that may be running on the machine. CodeCarbon and CarbonTracker have similar strategies for GPU and CPU consumption estimation, focusing on full-machine estimation; they differ in their method for estimating memory consumption. CodeCarbon's strategy is more accurate, since it reaches a value closer to that of the wattmeter. Eco2AI and EIT focus more on isolating the consumption of the measured process; in both experiments, these tools show a lower consumption estimate than CodeCarbon and CarbonTracker. The Green-Algorithms approach also attempts to isolate the consumption of the process.

Table 10. Results for the training of a digit classifier (experiment 1). All consumption values are in Wh. Carbon emissions are in gCO2e. For CodeCarbon and Eco2AI, (P) refers to the process tracking mode and (M) to the machine tracking mode.

| | Green-Algorithms | CodeCarbon (P) | CodeCarbon (M) | Eco2AI (P) | Eco2AI (M) | CarbonTracker | EIT | MLCO2 | Cumulator | Wattmeter |
|---|---|---|---|---|---|---|---|---|---|---|
| **Gemini-1, whole node (57 s)** | | | | | | | | | | |
| Tot. energy reported | 5.990 | 8.800 | 12.50 | 7.200 | 7.100 | 13.30 | 2.570 | 38.00 | 4.771 | — |
| Tot. energy w/o PUE | 3.590 | 8.80 | 12.50 | 7.200 | 7.100 | 8.580 | 1.630 | 38.00 | 4.771 | 13.00 |
| Energy for CPU | 0.007 | 1.500 | 1.500 | — | — | — | — | — | — | — |
| Energy for GPU | 0.395 | 7.200 | 7.200 | — | — | — | — | — | — | — |
| Energy for memory | 3.16 | 0.0184 | 3.700 | — | — | — | — | — | — | — |
| Carbon emissions | 0.307 | 0.480 | 0.690 | 0.490 | 0.480 | 0.777 | 0.140 | 2.53 | 0.563 | — |
| **Gemini-1, 2 GPUs (56 s)** | | | | | | | | | | |
| Tot. energy reported | 1.689 | 1.630 | 4.570 | 1.640 | 1.620 | 2.350 | N/A | 9.333 | 3.729 | — |
| Tot. energy w/o PUE | 1.008 | 1.630 | 4.570 | 1.640 | 1.620 | 1.516 | N/A | 9.333 | 3.729 | N/A |
| Energy for CPU | 0.130 | 0.000 | 0.000 | — | — | — | — | — | — | — |
| Energy for GPU | 0.139 | 1.620 | 1.620 | — | — | — | — | — | — | — |
| Energy for memory | 0.739 | 0.013 | 2.950 | — | — | — | — | — | — | — |
| Carbon emissions | 0.086 | 0.090 | 0.250 | 0.110 | 0.110 | 0.140 | N/A | 0.622 | 0.440 | — |
| **Rosenblatt (1 min 36 s)** | | | | | | | | | | |
| Tot. energy reported | 1.030 | 3.190 | 3.800 | 2.000 | 2.100 | 4.56 | 3.860 | 13.30 | 6.711 | — |
| Tot. energy w/o PUE | 0.617 | 3.190 | 3.800 | 2.000 | 2.100 | 2.940 | 2.440 | 13.30 | 6.711 | N/A |
| Energy for CPU | 0.148 | 1.200 | 1.200 | — | — | — | — | — | — | — |
| Energy for GPU | 0.086 | 1.900 | 1.900 | — | — | — | — | — | — | — |
| Energy for memory | 0.389 | 0.0276 | 0.600 | — | — | — | — | — | — | — |
| Carbon emissions | 0.0527 | 0.170 | 0.200 | 0.138 | 0.140 | 0.266 | 0.210 | 0.533 | 0.792 | — |
| **Labri (45 s)** | | | | | | | | | | |
| Tot. energy reported | 1.94 | 1.689 | 2.287 | 1.1459 | 1.126 | 2.219 | 1.91 | 9.375 | 2.093 | — |
| Tot. energy w/o PUE | 1.16 | 1.689 | 2.287 | 1.1459 | 1.126 | 1.432 | 1.209 | 9.375 | 2.093 | 2.241 |
| Energy for CPU | 0.255 | 0.565 | 0.565 | — | — | — | — | — | — | — |
| Energy for GPU | 0.128 | 1.111 | 1.097 | — | — | — | — | — | — | — |
| Energy for memory | 0.777 | 0.013 | 0.626 | — | — | — | — | — | — | — |
| Carbon emissions | 0.099 | 0.093 | 0.126 | 0.074 | 0.076 | 0.13 | 0.107 | 0.375 | 0.247 | — |
| **Personal computer (57 s)** | | | | | | | | | | |
| Tot. energy reported | 0.356 | 1.000 | 1.190 | 0.733 | 0.728 | 1.415 | N/A | 4.167 | 3.949 | — |
| Tot. energy w/o PUE | 0.356 | 1.000 | 1.190 | 0.733 | 0.728 | 0.913 | N/A | 4.167 | 3.949 | 1.404 |
| Energy for CPU | 0.032 | 0.330 | 0.330 | — | — | — | — | — | — | — |
| Energy for GPU | 0.125 | 0.660 | 0.660 | — | — | — | — | — | — | — |
| Energy for memory | 0.199 | 0.015 | 0.195 | — | — | — | — | — | — | — |
| Carbon emissions | 0.018 | 0.056 | 0.065 | 0.049 | 0.049 | 0.083 | N/A | 0.167 | 0.466 | — |
| **Colab - Oregon (1 min 6 s)** | | | | | | | | | | |
| Tot. energy reported | 0.381 | 1.500 | 1.600 | 3.000 | 3.000 | 0.805 | N/A | 1.280 | 5.15 | — |
| Tot. energy w/o PUE | 0.343 | 1.500 | 1.600 | 3.000 | 3.000 | 0.519 | N/A | 1.280 | 5.15 | N/A |
| Energy for CPU | 0.219 | 0.900 | 0.900 | — | — | — | — | — | — | — |
| Energy for GPU | 0.041 | 0.600 | 0.600 | — | — | — | — | — | — | — |
| Energy for memory | 0.0913 | 0.0206 | 0.100 | — | — | — | — | — | — | — |
| Carbon emissions | 0.024 | 0.200 | 0.200 | 0.600 | 0.600 | 0.290 | N/A | 0.367 | 1.03 | — |

Table 11. Results for the training of an image denoiser (experiment 2). All consumption values are in kWh. Carbon emissions are in gCO2e. The consumption indicated for Colab is extrapolated: one epoch was executed, the consumptions were obtained, and the values were extrapolated.

| | Green-Algorithms | CodeCarbon (P) | CodeCarbon (M) | Eco2AI (P) | Eco2AI (M) | CarbonTracker | EIT | MLCO2 | Cumulator | Wattmeter |
|---|---|---|---|---|---|---|---|---|---|---|
| **Gemini-1, whole node (2 h)** | | | | | | | | | | |
| Tot. energy reported | 1.92 | 1.39 | 1.69 | 1.07 | 1.10 | 2.09 | 2.09 | 4.80 | 0.5 | — |
| Tot. energy w/o PUE | 1.15 | 1.39 | 1.69 | 1.07 | 1.10 | 1.35 | 1.32 | 4.80 | 0.5 | 2.10 |
| Energy for CPU | 0.09 | 0.22 | 0.22 | — | — | — | — | — | — | — |
| Energy for GPU | 0.69 | 1.14 | 1.09 | — | — | — | — | — | — | — |
| Energy for memory | 0.37 | 0.03 | 0.37 | — | — | — | — | — | — | — |
| Carbon emissions | 100 | 80 | 90 | 70 | 80 | 120 | 120 | 280 | 60 | — |
| **Gemini-1, 2 GPUs (1 h 17 min)** | | | | | | | | | | |
| Tot. energy reported | 0.76 | 0.36 | 0.61 | 0.35 | 0.37 | 0.59 | N/A | 0.77 | 0.45 | — |
| Tot. energy w/o PUE | 0.47 | 0.36 | 0.61 | 0.35 | 0.37 | 0.38 | N/A | 0.77 | 0.45 | N/A |
| Energy for CPU | 0.05 | 0 | 0.00 | — | — | — | — | — | — | — |
| Energy for GPU | 0.36 | 0.359 | 0.37 | — | — | — | — | — | — | — |
| Energy for memory | 0.06 | 0.008 | 0.24 | — | — | — | — | — | — | — |
| Carbon emissions | 40 | 20 | 34 | 24 | 25 | 34 | N/A | 51 | 38 | — |
| **Rosenblatt (3 h 16 min)** | | | | | | | | | | |
| Tot. energy reported | 1.77 | 1.07 | 1.12 | 0.89 | 0.99 | 1.71 | 1.75 | 1.63 | 0.84 | — |
| Tot. energy w/o PUE | 1.06 | 1.07 | 1.12 | 0.89 | 0.99 | 1.10 | 1.11 | 1.63 | 0.84 | N/A |
| Energy for CPU | 0.10 | 0.17 | 0.17 | — | — | — | — | — | — | — |
| Energy for GPU | 0.88 | 0.89 | 0.87 | — | — | — | — | — | — | — |
| Energy for memory | 0.08 | 0.02 | 0.08 | — | — | — | — | — | — | — |
| Carbon emissions | 90 | 60 | 60 | 60 | 70 | 100 | 100 | 90 | 100 | — |
| **Labri (1 h 13 min)** | | | | | | | | | | |
| Tot. energy reported | 0.80 | 0.76 | 0.79 | 0.69 | 0.72 | 1.16 | 1.17 | 0.9 | 0.3 | — |
| Tot. energy w/o PUE | 0.48 | 0.76 | 0.79 | 0.69 | 0.72 | 0.75 | 0.74 | 0.9 | 0.3 | 0.83 |
| Energy for CPU | 0.15 | 0.097 | 0.097 | — | — | — | — | — | — | — |
| Energy for GPU | 0.27 | 0.66 | 0.64 | — | — | — | — | — | — | — |
| Energy for memory | 0.06 | 0.03 | 0.056 | — | — | — | — | — | — | — |
| Carbon emissions | 41 | 42 | 44 | 47 | 48 | 68 | 65 | 36 | 24 | — |
| **Personal computer (1 h 49 min)** | | | | | | | | | | |
| Tot. energy reported | 0.37 | 0.34 | 0.35 | 0.25 | 0.27 | 0.52 | N/A | 0.45 | 0.46 | — |
| Tot. energy w/o PUE | 0.37 | 0.34 | 0.35 | 0.25 | 0.27 | 0.34 | N/A | 0.45 | 0.46 | 0.40 |
| Energy for CPU | 0.001 | 0.09 | 0.09 | — | — | — | — | — | — | — |
| Energy for GPU | 0.35 | 0.24 | 0.24 | — | — | — | — | — | — | — |
| Energy for memory | 0.02 | 0.01 | 0.02 | — | — | — | — | — | — | — |
| Carbon emissions | 19 | 19 | 19 | 17 | 18 | 30 | N/A | 18 | 54 | — |
| **Colab - Oregon (17 h, est.)** | | | | | | | | | | |
| Tot. energy reported | 1.22 | 1.49 | 1.56 | 1.03 | 1.82 | 0.96 | N/A | 1.19 | 0.36 | — |
| Tot. energy w/o PUE | 1.10 | 1.49 | 1.56 | 1.03 | 1.82 | 0.62 | N/A | 1.19 | 0.36 | N/A |
| Energy for CPU | 0.07 | 0.73 | 0.73 | — | — | — | — | — | — | — |
| Energy for GPU | 0.95 | 0.75 | 0.75 | — | — | — | — | — | — | — |
| Energy for memory | 0.08 | 0.02 | 0.08 | — | — | — | — | — | — | — |
| Carbon emissions | 199 | 206 | 216 | 184 | 328 | 369 | N/A | 100 | 72 | — |

Multiple GPUs. Cumulator only measures CPUs or GPUs, according to what is specified when creating the tracker. In both cases it considers a single unit of the hardware it is measuring, without checking how many CPUs or GPUs exist on the machine. MLCO2 also has a simplified view, only measuring the consumption of one GPU. The values reported in the tables were obtained by multiplying the returned value by the number of available GPUs. The reported values for one GPU for Cumulator and MLCO2 are very similar because they follow the same strategy. In the case of the personal computer or Colab, which have a single GPU, these two tools can be considered, but CPU consumption is then not measured. In addition, these tools only multiply the time consumed by the TDP, so they neither verify actual consumption nor compute usage factors.
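The gap between the simple time-times-TDP estimate and a usage-factor-weighted one can be illustrated with a sketch of both formulas. The per-component power model follows the Green-Algorithms approach described in this paper (including its 0.3725 W/GB memory constant); the figures in the example are illustrative only, not values from the tables.

```python
def naive_tdp_estimate_wh(runtime_h: float, n_gpus: int, gpu_tdp_w: float) -> float:
    """MLCO2/Cumulator-style estimate: time x TDP, with 100% usage assumed."""
    return runtime_h * n_gpus * gpu_tdp_w

def usage_weighted_estimate_wh(runtime_h, n_cores, core_tdp_w, cpu_usage,
                               n_gpus, gpu_tdp_w, gpu_usage,
                               memory_gb, pue=1.0, mem_w_per_gb=0.3725):
    """Green-Algorithms-style estimate weighting each component by its usage factor."""
    power_w = (n_cores * core_tdp_w * cpu_usage      # CPU cores at measured load
               + n_gpus * gpu_tdp_w * gpu_usage      # GPUs at measured load
               + memory_gb * mem_w_per_gb)           # memory power model
    return runtime_h * power_w * pue

# Illustrative comparison: 8 GPUs at 250 W for 2 h, but low average usage factors
naive = naive_tdp_estimate_wh(2, 8, 250)                  # assumes 100% GPU load
weighted = usage_weighted_estimate_wh(2, 40, 6.75, 0.16,
                                      8, 250, 0.14, 512)  # far lower estimate
```

With low usage factors, the naive estimate is several times the weighted one, which matches the overestimation observed for MLCO2 on Grid5000.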
Their results can only be useful when there is a single unit of the measured component (CPU or GPU for Cumulator, only GPU for MLCO2) and its usage factor is close to 100%.

Usage factor. The web calculator of Green-Algorithms and the server tool GA4HPC set the default usage factor to 100% CPU and GPU load if these data are not provided. This will overestimate power consumption in most cases. To be considered by Green-Algorithms, usage factors must be calculated by the user. The CPU usage factor can be calculated from the CPU time and the process time, but there is no easy way to get the GPU usage factor. Empirical values can be obtained from measurements with nvidia-smi while the algorithm is running, assuming the usage factor stays constant throughout the run. In this case, all the utilization percentage reported by nvidia-smi is assigned to the process, although other processes could be using the GPUs. In our study, since for both experiments the only process running on the GPUs was the one measured, we took one sample per epoch of the nvidia-smi output during code execution, and averaged the utilization percentages of all GPUs across all samples. Results are shown in table 12. We observe a low usage factor, especially on servers. As shown in table 11, MLCO2 seems to largely overestimate consumption on Grid5000, because it does not take the usage factor of the GPUs into account: the average usage factor is 14%, but MLCO2 assumes 100% for all 8 GPUs.
EIT queries and calculates usage factors during execution. Eco2AI only does this for the CPU, as it directly queries the consumed energy for the GPU. CodeCarbon and CarbonTracker directly query the consumed energy for both GPU and CPU, without using the usage factor.

5.2.2. Comparison between software tools and wattmeter
Wattmeters were present on the Labri server, the personal computer and Gemini-1. Table 13 summarizes the comparison.
For experiment 1, the wattmeters on the personal computer and the Labri server only made one measurement during the entire experiment, so the reported value may not be exact.
In the first experiment, the machine consumption reported by CodeCarbon is almost exactly the same as that reported by the wattmeter. For the second experiment the value is not as precise, but it is still more than 80% of the wattmeter value on all infrastructures. CodeCarbon is thus the measuring tool that comes closest to the wattmeters, followed by CarbonTracker, with more variability between infrastructures. Eco2AI and EIT report values well below the wattmeter: since these tools try to isolate the consumption of the process rather than measure the total consumption of the machine, their energy reports are not directly comparable with the wattmeter value.

Table 12. Usage factor of CPU and GPU in the infrastructures used. These computed values are used by Green-Algorithms.

| | CPU (exp. 1) | GPU (exp. 1) | CPU (exp. 2) | GPU (exp. 2) |
|---|---|---|---|---|
| Gemini-1 (Grid5000) | 5% | 0.3% | 16% | 14% |
| Gemini-1, 2 GPUs (Grid5000) | 12% | 1.5% | 58% | 46% |
| Server (Labri) | 9% | 1% | 73% | 35% |
| Rosenblatt (MAP5) | 16% | 1% | 39% | 54% |
| Personal computer | 22% | 3% | 4% | 77% |

Table 13. Comparison between software tools and wattmeter on Grid5000 (without considering PUE), the personal computer and the Labri server. Values represent the percentage of energy reported by each tool w.r.t. the value reported by the wattmeter.

| | CodeCarbon (M) | Eco2AI (M) | CarbonTracker | EIT |
|---|---|---|---|---|
| Exp. 1, Grid5000 | 96% | 55% | 66% | 13% |
| Exp. 2, Grid5000 | 80% | 60% | 64% | 63% |
| Exp. 1, personal comp. | 85% | 52% | 65% | N/A |
| Exp. 2, personal comp. | 88% | 68% | 85% | N/A |
| Exp. 1, Labri | 102% | 50% | 64% | 54% |
| Exp. 2, Labri | 95% | 87% | 90% | 89% |

5.3. Influence of infrastructures
We ran the same experiments on different infrastructures. For both experiments, power consumption is higher on larger infrastructures (e.g. Gemini-1).
As an example, the denoiser training experiment took 2 hours on Gemini-1 (Grid5000), while on Rosenblatt (MAP5) it took 3 hours and 16 minutes. The CPU usage factor was lower on Grid5000: 16% versus 39% on MAP5. The estimated GPU usage factor was also lower on Grid5000: 14.3% versus 54% on MAP5. The consumption reported on Gemini-1 by CodeCarbon (machine tracking) is 1.69 kWh, while on Rosenblatt it was 1.12 kWh; Rosenblatt's hardware is considerably smaller than Gemini-1's (see table 9).
It can also be seen that in experiment 2, on Labri, on the personal computer and on Gemini-1 when booking only 2 GPUs, the execution time was shorter than when executing on the entire Gemini-1 node. The longer execution on the whole node is most likely due to the parallelization strategy (using nn.DataParallel), which runs the training on all GPUs without requiring their full computing power. This is a good reason to use, as much as possible, hardware whose size is adapted to the experiment, so that resources are used as fully as possible, even if the experiments take more time. The Gemini-1 node has 8 GPUs, which neither of our experiments can exploit.

5.4. Data load
In the denoiser training experiment, we separately quantified the energy consumption of loading the data (the 6 GB ImageNet validation split) versus training the model, and found that only 0.5% of the energy was used to load the data. This is partly because the data was already on the server: the impact of downloading and storing the data is not measured.

5.5. Batch size
To study the impact of batch size during training, we used CodeCarbon during experiment 2 (denoiser) on the Gemini-1 node for 10 epochs.
Using three batch sizes (32, 64 and 128), we show that there is a tradeoff between energy used and runtime (table 14). While larger batch sizes led to faster runtimes, the largest energy usage was measured for the smallest batch size (32), closely followed by the largest one (128). In this situation, an intermediate batch size of 64 looks like a better compromise, combining a runtime not far off the shortest one and minimizing energy usage. When we decrease the batch size further, the experiment takes longer, and the idle consumption of the resources starts to weigh on the total consumption of the experiment. Comparing the GPU consumption with batch sizes 32 and 128, the batch-32 run consumes less on the GPU while taking almost 3 times longer; compared with the batch-64 run, however, its total consumption is higher, probably because it takes almost 10 minutes more and incurs the static consumption of the resources during that time. In conclusion, to reach a minimum energy consumption, a balance is required between the length of the experiment and the greater consumption of the GPU and memory.

5.6. Checkpoints
We found that checkpointing had no impact on energy consumption or runtime (table 15). We tested this on experiment 2 on Gemini-1, using CodeCarbon and a wattmeter. In the first scenario, the network parameters were saved every epoch (ten epochs in total); in the second, they were saved only once.

Table 14. Results of experiment 2 with different batch sizes. All consumption values are in Wh.

| | Batch size 32 | Batch size 64 | Batch size 128 |
|---|---|---|---|
| Total energy (CodeCarbon) | 252 | 184 | 246 |
| CPU (CodeCarbon) | 41 | 29 | 20 |
| GPU (CodeCarbon) | 205 | 152 | 224 |
| Memory (CodeCarbon) | 6 | 3 | 2.3 |
| Total energy (wattmeter) | 391 | 280.3 | 320 |
| Time spent (min) | 25:54 | 16:29 | 10:30 |
5.7. Variability of consumption through epochs
It is interesting to determine whether the energy consumption of a training phase can be extrapolated from the values observed on only a few epochs. To determine this, the denoiser training experiment was executed for different numbers of epochs on Gemini-1, and the time and energy consumption were measured. The results in figure 3 show that epoch duration and consumption are constant. It should therefore be possible to extrapolate the energy consumption of large experiments from runs of just a few epochs. The same conclusion was reached by Anthony et al (2020) using CarbonTracker.

5.8. Is measuring really eco-friendly?
To compare the extra energy consumption of the tools themselves, we ran two processes of experiment 2 in parallel, one with all seven trackers and one without any, and report the energy consumption provided by the wattmeter. We found that the code with trackers was almost 10% slower and ended 11 minutes later than the one without. The energy consumption during this extra time was 0.19 kWh, while it was 2.58 kWh for the time when both processes were running in parallel (+7.4%).
Another experiment tested each tracker one at a time. As in the previous test, we ran two processes in parallel, one with a given tracker and one without any, and measured energy with the wattmeter. Table 16 shows the results with 10 epochs. The additional energy is around 1% of the total consumption for all the tools, except for Eco2AI, where it reaches 3.5%, a value that is not negligible. We think that Eco2AI's larger consumption compared to the other tools comes from not using the RAPL files to obtain the memory and CPU consumption, but instead querying the operating system and then doing the calculations itself. Although other tools also proceed this way, none does so to calculate the energy of both resources.
From both experiments, we can conclude that measuring the processes has an impact, but a small one.
The first experiment, carried out with all the trackers, has a longer execution time, probably due to delays while accessing resources. It might be a good idea to use online tools such as Green-Algorithms, in order not to add load to the algorithm while still being able to measure the impact.

Figure 3. Duration and energy consumption after different numbers of epochs of experiment 2. All consumption values are in Wh.

Table 15. Results of the experiment with different checkpoint frequencies. Both runs last 10 epochs. In the left column, only one checkpoint is saved at the end of these epochs; in the right column, one checkpoint is saved per epoch. All consumption values are in Wh.

| | One checkpoint | Ten checkpoints |
|---|---|---|
| Total energy reported (CodeCarbon) | 161 | 160 |
| Energy for CPU (CodeCarbon) | 24 | 24 |
| Energy for GPU (CodeCarbon) | 134 | 133 |
| Energy for memory (CodeCarbon) | 3 | 3 |
| Total energy reported (wattmeter) | 206 | 206 |
| Time spent (min) | 14:10 | 13:47 |

5.9. Static and deployment consumption
All the tools discussed in this guide are limited to quantifying energy consumption while training a deep learning model. But infrastructures also use energy when nodes are idle or when the final solution is deployed. The authors of Luccioni et al (2022) studied static infrastructure emissions and deployment emissions when training BLOOM, a large language model, and found them to be substantial.
We measured the energy consumption of idle resources on Gemini-1 over the same period of time it takes to run experiment 1. In an idle situation, no process is run beyond those required by the operating system. We performed the same procedure with experiment 2 (executed for 10 epochs). The results are shown in table 17.
The idle power draw is around 745 W. The consumption of idle resources is high compared with the consumption reported during training: 84.4% for experiment 1 and 72.9% for experiment 2. Note that in both experiments the resources are not fully used; table 12 gives the percentage of CPU and GPU utilization during execution. This result is interesting: most of the consumption occurs simply by having the hardware available for use. We must therefore be very careful when leaving hardware on for availability, since the availability and immediacy of resources is very expensive in energy terms. When using hardware that we cannot power off when idle, such as the cloud or shared computers, we must remember that being able to reserve a given resource at any time comes with an additional consumption.

6. Discussions
This section summarizes our observations and anticipates questions that AI practitioners may have when starting to measure the energy consumption of their code.

6.1. When to measure impacts?
Contrary to tracking tools, online ones like Green-Algorithms make it possible to estimate consumption both after training, as concluded in Bannour et al (2021), and before training. Although the latter is less precise, it allows the environmental impacts of a project to be anticipated. If software tools are used and more than one run is performed, we recommend measuring only some of the runs. Given that the energy consumption of a training phase can be extrapolated from the values observed over only a few epochs, one can measure the consumption of the first epochs and then estimate the consumption of the whole training; the consumption attributable to the measurement itself will then be slightly lower.

Table 16. Results of running experiment 2 twice in parallel on Gemini-1: one process using trackers, the other without.
                                             CodeCarbon(P)   Eco2AI(P)   CarbonTracker   EIT     Cumulator
Run time w/ tracker (min)                    15:09           15:33       16:35           16:29   15:02
Run time w/o tracker (min)                   15:05           14:57       16:24           16:35   14:49
Extra time with tracker (min)                0:04            0:36        0:11            −0:06   0:13
Energy cons. when 2 processes running (Wh)   335.5           334         358             358.5   331.6
Energy cons. during extra time (Wh)          3.1             12.2        4.29            0       5.4
Percentage of overload (%)                   0.92            3.5         1.2             0       1.6

Table 17. Static (idle) and dynamic energy consumption measured with the wattmeter.

                           Time (mm:ss)   Energy consumption (Wh)
Experiment 1               00:53          12.96
Idle                       00:53          10.95
Experiment 2 (10 epochs)   16:29          280.3
Idle                       16:29          204.4

6.2. Which tools to use
Estimating power consumption with software tools adds a small load, so it might be a good idea to use online tools like Green-Algorithms. Green-Algorithms is the most versatile tool, as it can be used on different infrastructures and with different brands of CPUs and GPUs. However, online tools require manual intervention to gather the information and may be less precise. A first step to remedy this is the tool GA4HPC, which obtains the resource reservation data of a job on clusters that use SLURM as workload manager. MLCO2 is also an online tool but is much more limited: it only accounts for GPU consumption, and the value returned must be correctly weighted by the number of GPUs and the actual execution time of the algorithm.

If software tools are preferred, we found CodeCarbon to be the best of those studied for estimating the total consumption of the machine. Its reported consumption is more accurate when the RAPL files are accessible, but a strength of the tool is that it can be used without access to them. Conversely, to isolate the consumption of a single process with software tools, Eco2AI and EIT are the ones that attempt to do so. Eco2AI does not require access to the RAPL files and is actively maintained and updated.
By contrast, EIT requires access to the RAPL files, and the code must be modified to use the tool.

6.3. Which infrastructure to use
Since the idle consumption of resources is a large share of the total, we recommend keeping available only the resources needed, so as to achieve a high usage factor and a minimum idle consumption, even if the execution time becomes longer. With supercomputers, we recommend requesting only the necessary resources and, where adequate, sharing the infrastructure with other users' processes. If possible, turn off personal computers or servers as soon as computation is done. When using cloud infrastructure, choose, as far as possible, data centers that have the lowest PUE and that are located in areas where electricity has low greenhouse gas emissions. We also recommend scheduling training at low-emission hours; carbon-aware schedulers such as CATS, grid-intensity-go or carbon-aware-scheduler can help with this.

6.4. Other impacts
In this paper we have focused only on the energy consumption, and associated greenhouse gas emissions, of training AI models. This is only a small part of the total energy consumption over the complete life cycle of an AI service. For the training phase, an AI practitioner generally trains the model several times, and complete training emissions should account for all runs. In Green-Algorithms, multiple runs associated with retraining can be modelled with the 'pragmatic scaling factor' parameter. As mentioned in previous studies (Bannour et al 2021, Luccioni et al 2022, Wu et al 2021), energy consumption is underestimated, since all the tools only measure consumption during training and not during deployment. Studies (Wu et al 2021, Luccioni et al 2022) have measured the consumption of deployment phases, which can be much higher than that of training. Here again, choosing appropriate resources to achieve a high usage factor seems essential.
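Putting the ingredients above together, the footprint of a full training campaign is the per-run energy scaled by the number of runs (the 'pragmatic scaling factor' idea), the PUE of the data centre, and the carbon intensity of the local grid. A minimal sketch; the numerical values are illustrative, not measurements from this paper:

```python
def campaign_footprint_gco2e(energy_per_run_kwh, n_runs, pue, intensity_gco2e_per_kwh):
    """Footprint of a training campaign: per-run energy, scaled by the
    number of runs and the data-centre PUE, then converted with the
    grid carbon intensity (gCO2e per kWh)."""
    return energy_per_run_kwh * n_runs * pue * intensity_gco2e_per_kwh

# hypothetical: 5 kWh per run, 20 runs, PUE 1.5, grid at 300 gCO2e/kWh
print(campaign_footprint_gco2e(5.0, 20, 1.5, 300))  # 45000.0 gCO2e, i.e. 45 kgCO2e
```

The same formula makes the levers of section 6.3 explicit: lowering the PUE or choosing a low-carbon grid reduces the result multiplicatively.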
Many other environmental impacts (resource depletion, ecotoxicity, etc) linked to the life cycle of equipment (manufacturing, transport, distribution, use, end of life) are not discussed here and should be investigated. Even for the carbon footprint alone, computing embodied emissions is a challenge, since not all data are made public by manufacturers. Under several assumptions, the authors of Luccioni et al (2022) estimate embodied emissions to equal half those of training. Dataset creation, transfer and storage are also very important aspects of AI. Malmodin and Lundén (2016) estimate 0.023 kWh per GB for transferring data on the IP core network. For storage, estimates vary. Following a Seagate measurement,9 Lannelongue and Inouye (2023) consider the carbon footprint of storing 1 terabyte of data to be of the order of 10 kgCO2e per year, while another study (Gröger et al 2021) mentions 52 Wh for storing one gigabyte for one year. For more on energy management techniques for database systems, we refer the reader to the systematic review of Guo et al (2022).

6.5. Predicting impacts
Systematically estimating the carbon footprint of AI projects can raise awareness, encourage the development of energy-efficient software and limit the waste of resources (Lannelongue and Inouye 2023). Importantly, these impacts should be anticipated before the start of a project. The authors of Lefèvre et al (2023) propose a list of criteria for assessing the environmental impacts of projects involving artificial intelligence (AI) methods.

9 https://seagate.com/gb/en/global-citizenship/product-sustainability/
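The data-transfer and storage estimates quoted in section 6.4 translate into a simple back-of-envelope calculation. A sketch under those published estimates (the dataset size and grid intensity below are hypothetical, and both per-unit figures carry large uncertainties):

```python
TRANSFER_KWH_PER_GB = 0.023        # Malmodin and Lundén (2016), IP core network
STORAGE_KGCO2E_PER_TB_YEAR = 10.0  # order of magnitude, Lannelongue and Inouye (2023)

def dataset_footprint_kgco2e(size_gb, years_stored, grid_gco2e_per_kwh):
    """Rough transfer-plus-storage footprint of a dataset, in kgCO2e."""
    transfer = size_gb * TRANSFER_KWH_PER_GB * grid_gco2e_per_kwh / 1000.0
    storage = (size_gb / 1000.0) * STORAGE_KGCO2E_PER_TB_YEAR * years_stored
    return transfer + storage

# hypothetical: 500 GB transferred once, stored for 2 years, grid at 300 gCO2e/kWh
print(round(dataset_footprint_kgco2e(500, 2, 300), 2))  # 13.45 kgCO2e
```

Such order-of-magnitude figures are exactly what anticipating a project's impacts before it starts requires.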
In addition to measuring while training or deploying an AI model, AI users should try to anticipate as much as possible the impacts their computations are likely to have, as well as the behavioral, economic, or societal changes that might be induced by the project. In the same line, Wilson and van der Velden (2022) review ethics, explainability, responsibility and accountability concepts in AI, and propose a model for sustainable AI in the public sector.

7. Conclusion
In this paper we have presented and analyzed seven existing tools for estimating energy consumption when training a deep learning model. We have explained the specificities of each tool and detailed the notions that may not be well known by AI practitioners. From our study, we have drawn analyses and recommendations in the previous sections. Note that our two experiments involved training regular CNNs for image processing and analysis; we believe the main results would hold for other types of architecture, as carbon footprint estimators have shown the same behavior for other applications and workloads in Jay et al (2023), Bannour et al (2021) and Dodge et al (2022). We have highlighted the advantages and limits of online tools, and shown that the choice of software tool depends on the infrastructure and on whether one wants to measure the whole node or the process only. We have also shown that measuring with software tools has a small impact that can become non-negligible for large experiments. We observed that consumption is constant through epochs, so measuring only a few epochs and extrapolating can be sufficient. We have confirmed the importance of training models on infrastructure that is scaled to the need, not booking a whole node when unnecessary. Finally, all these tools measure only the dynamic energy consumption of computing, and further studies are required to include static consumption and broader environmental impacts.
Acknowledgments
This study has been carried out with financial support from the French Research Agency through the PostProdLEAP project (ANR-19-CE23-0027-01). Loïc Lannelongue was supported by core funding from the British Heart Foundation (RG/18/13/33946); the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014; NIHR203312)[*]; the Cambridge British Heart Foundation Centre of Research Excellence (RE/18/1/34212); and the BHF Chair Award (CH/12/2/29428). *The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. The authors thank Michael Clément and Boris Mansencal for running experiments at LaBRI and on a personal computer. We also thank Mathilde Jay, Denis Trystram, Laurent Lefèvre and Anne-Laure Ligozat for fruitful discussions.

Data availability statement
No new data were created or analysed in this study.

Appendix A. Methodologies to estimate energy consumption of CPUs and GPUs
This appendix describes the two methods used to estimate the energy consumption of CPUs and GPUs.

Knowing the model of the CPU or GPU, the first method multiplies the TDP provided by the manufacturer by the duration of training to obtain the energy used in kWh. TDP (Thermal Design Power) is a specification that indicates the maximum amount of power a processor (CPU or GPU) can dissipate when operating at maximum performance; it refers to the power consumption under the maximum theoretical load. In general, CPUs with a higher number of cores have a higher TDP because they require more power to operate at maximum performance. However, the relationship between TDP and the number of cores is not always straightforward: some CPUs may have a higher TDP even though they have fewer cores, because they are designed to operate at a higher clock speed or have a less efficient architecture.
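The first (TDP-based) method reduces to a multiplication; a minimal sketch follows, with an optional usage factor to scale the theoretical maximum down to the observed load (the 250 W GPU and 10 h runtime are hypothetical values):

```python
def tdp_energy_kwh(tdp_watts, runtime_hours, usage_factor=1.0):
    """TDP-based estimate: power (W) x usage factor x runtime (h) -> kWh.
    With usage_factor=1 this is an upper bound (maximum theoretical load)."""
    return tdp_watts * usage_factor * runtime_hours / 1000.0

# hypothetical: a 250 W GPU assumed fully used for 10 hours
print(tdp_energy_kwh(250, 10))  # 2.5 kWh
```

Because TDP describes the maximum dissipation, estimates without a measured usage factor should be read as upper bounds rather than measurements.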
The second method uses the Intel RAPL (Running Average Power Limit) system management interface integrated in Intel CPUs, or the Power Gadget tool. RAPL allows software to monitor and control the power usage of the processor and its components, such as the CPU cores, memory controllers and integrated GPUs. The Linux powercap driver can expose the RAPL hardware energy counters through a set of files accessible via the Linux file system. These files make it possible to read the current power usage of the processor and its components, as well as to set power limits to control power usage. Drivers are being developed to expose the RAPL interface on Windows; a recent implementation is the windows-rapl-driver10 from the Scaphandre project (Petit 2021).

10 https://github.com/hubblo-org/windows-rapl-driver

Power Gadget is a standalone software application developed by Intel that provides real-time monitoring of the power usage of Intel processors. It does not rely on the RAPL files, but rather uses its own proprietary methods to access and analyze power consumption data. Power Gadget presents power consumption data in a user-friendly graphical interface that displays the real-time power usage of the processor, CPU cores, memory controller, and other components. This tool can be used on Windows and macOS.

Appendix B. Bug fixes for some software tools
Some tools must be modified before use, as they have bugs that have not been fixed by the authors. Here are the changes to make for each one.

B.1. Experiment-impact-tracker
• The PyPI package is not the latest version and does not correspond to the documentation (issue).
• getiterator in file/gpu/nvidia.py must be changed to iter.
• For long runs, the INFO log level is too heavy; change it to the ERROR level.
• If you have other experiment-impact-tracker logs in the same folder or its subfolders, correct the data_interface.py file so that results are only read from the chosen logs folder.

B.2. Cumulator
Correct the imports in base.py (the structure defined in this file does not correspond to the structure of the package; issue).

B.3. CarbonTracker
Correct the use of the decode function, deprecated in Python 3.10, in file carbontracker/components/gpu/nvidia.py.

Appendix C. Neural network architectures of experiments
The neural network architecture of Experiment 1 is a fully connected network with a single hidden layer of 32 neurons and an output layer of 10 neurons; figure C1 shows the architecture. The architecture of Experiment 2 is the DnCNN network presented in Ryu et al (2019); figure C2 shows the architecture proposed in the original paper, which is the one we used in the experiment.

Figure C1. Experiment 1 network architecture.

ORCID iDs
Aurélie Bugeau https://orcid.org/0000-0002-4858-4944

References
Anthony L F W, Kanding B and Selvan R 2020 Carbontracker: tracking and predicting the carbon footprint of training deep learning models arXiv:2007.03051
Arias P et al 2021 Climate change 2021: the physical science basis.
Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; technical summary IPCC
Bannour N, Ghannay S, Névéol A and Ligozat A-L 2021 Evaluating the carbon footprint of NLP methods: a survey and analysis of existing tools Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing 11–21
Budennyy S et al 2022 Eco2AI: carbon emissions tracking of machine learning models as the first step towards sustainable AI Doklady Mathematics (Moscow: Pleiades Publishing) 106 S118–S128
Deng L 2012 The MNIST database of handwritten digit images for machine learning research IEEE Signal Process. Mag. 29 141–2
Deng J, Dong W, Socher R, Li L-J, Li K and Fei-Fei L 2009 ImageNet: a large-scale hierarchical image database 2009 IEEE Conference on Computer Vision and Pattern Recognition (IEEE) 248–55
Dodge J et al 2022 Measuring the carbon intensity of AI in cloud instances ACM Conference on Fairness, Accountability, and Transparency 1877–94
Ember 2022 Global electricity review 2022 https://ember-climate.org/insights/research/global-electricity-review-2022/
Gröger J, Liu R, Stobbe L, Druschke J and Richter N 2021 Green cloud computing: life cycle based data collection on environmental impacts of cloud computing https://umweltbundesamt.de/sites/default/files/medien/5750/publikationen/2021-06-17_texte_94-2021_green-cloud-computing.pdf
Guo B, Yu J, Yang D, Leng H and Liao B 2022 Energy-efficient database systems: a systematic survey ACM Computing Surveys 55 1–53
Gupta U et al 2021 Chasing carbon: the elusive environmental footprint of computing IEEE International Symposium on High-Performance Computer Architecture 42 854–67
Gupta A, Lanteigne C and Kingsley S 2020 SECure: a social and environmental certificate for AI systems arXiv:2006.06217
Henderson P, Hu J, Romoff J, Brunskill E, Jurafsky D and Pineau J 2020a Towards the systematic reporting of the energy and carbon footprints of machine learning Journal of Machine Learning Research 21 10039–81
Hodak M, Gorkovenko M and Dholakia A 2019 Towards power efficiency in deep learning on data center hardware 2019 IEEE International Conference on Big Data (Big Data) 1814–20
Jay M, Ostapenco V, Lefèvre L, Trystram D, Orgerie A-C and Fichel B 2023 An experimental comparison of software-based power meters: focus on CPU and GPU IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (https://doi.org/10.1109/CCGrid57682.2023.00020)
Kaack L H, Donti P L, Strubell E, Kamiya G, Creutzig F and Rolnick D 2022 Aligning artificial intelligence with climate change mitigation Nature Climate Change 12 518–27
Kar A K, Choudhary S K and Singh V K 2022 How can artificial intelligence impact sustainability: a systematic literature review Journal of Cleaner Production 134120
Karyakin A and Salem K 2017 A survey of main-memory energy efficiency techniques Proceedings of the 13th International Workshop on Data Management on New Hardware (DaMoN) (ACM) 1–9
Lacoste A, Luccioni S, Schmidt V and Dandres T 2019 Quantifying the carbon emissions of machine learning arXiv:1910.09700
Lannelongue L and Inouye M 2023 Carbon footprint estimation for computational research Nature Reviews Methods Primers 3
Lannelongue L, Grealey J and Inouye M 2021 Green algorithms: quantifying the carbon emissions of computation Advanced Science 8 2100707
Lawrence A 2019 Is PUE actually going up? Uptime Institute Blog
Lefèvre L et al 2023 Environmental assessment of projects involving AI methods hal-03922093
Ligozat A-L and Luccioni S 2021 A practical guide to quantifying carbon emissions for machine learning researchers and practitioners Research Report, MILA; LISN
Ligozat A-L, Lefevre J, Bugeau A and Combaz J 2022 Unraveling the hidden environmental impacts of AI solutions for environment: life cycle assessment of AI solutions Sustainability 14 5172
Lottick K, Susai S, Friedler S A and Wilson J P 2019 Energy usage reports: environmental awareness as part of algorithmic accountability arXiv:1911.08354
Luccioni S, Viguier S and Ligozat A-L 2022 Estimating the
carbon footprint of BLOOM, a 176B parameter language model arXiv:2211.02001
Maevsky D A, Maevskaya E J and Stetsuyk E D 2017 Evaluating the RAM energy consumption at the stage of software development Green IT Engineering: Concepts, Models, Complex Systems Architectures (Springer) pp 101–21

Figure C2. DnCNN network architecture. Image taken from Ryu et al (2019).
Malmodin J and Lundén D 2016 The energy and carbon footprint of the ICT and E&M sector in Sweden 1990–2015 and beyond ICT for Sustainability (Atlantis Press) pp 209–18
Moro A and Lonza L 2018 Electricity carbon intensity in European member states: impacts on GHG emissions of electric vehicles Transportation Research Part D: Transport and Environment 64 5–14
Petit B 2021 Scaphandre version v0.3 https://hubblo-org.github.io/scaphandre-documentation/references/sensor-powercap_rapl.html
Rolnick D et al 2022 Tackling climate change with machine learning ACM Computing Surveys 55 1–96
Ryu E K, Liu J, Wang S, Chen X, Wang Z and Yin W 2019 Plug-and-play methods provably converge with properly trained denoisers International Conference on Machine Learning
Strubell E, Ganesh A and McCallum A 2019 Energy and policy considerations for deep learning in NLP arXiv:1906.02243
Thompson N C, Greenewald K, Lee K and Manso G F 2020 The computational limits of deep learning arXiv:2007.05558
The Shift Project 2019 Lean ICT: towards digital sobriety The Shift Project
Trebaol M J T, Hartley M-A and Ghadikolaei H S 2020 A tool to quantify and report the carbon footprint of machine learning computations and communication in academia and healthcare Infoscience EPFL: record 278189
Uptime Institute 2022 2022 data center industry survey Uptime Institute
Vinuesa R et al 2020 The role of artificial intelligence in achieving the sustainable development goals Nature Communications 11 233
Wu C-J et al 2022 Sustainable AI: environmental implications, challenges and opportunities Machine Learning and Systems 795–813 arXiv:2111.00364
Wilson C and van der Velden M 2022 Sustainable AI: an integrated model to guide public sector decision-making Technology in Society 68 101926