Environ. Res. Commun. 5 (2023) 115014 https://doi.org/10.1088/2515-7620/acf81b

PAPER

How to estimate carbon footprint when training deep learning models? A guide and review

Lucía Bouza1, Aurélie Bugeau2,3 and Loïc Lannelongue4,5,6,7
1 Université Paris Cité, CNRS, MAP5 UMR 8145, 75006, Paris, France
2 Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI, Talence, France
3 IUF, France
4 Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
5 British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
6 Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, United Kingdom
7 Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
E-mail: aurelie.bugeau@u-bordeaux.fr
Keywords: AI carbon footprint, measuring electrical consumption of AI, environmental impacts of deep learning

Abstract
Machine learning and deep learning models have become essential in the recent fast development of artificial intelligence in many sectors of society. It is now widely acknowledged that the development of these models has an environmental cost that has been analyzed in many studies. Several online and software tools have been developed to track energy consumption while training machine learning models. In this paper, we propose a comprehensive introduction and comparison of these tools for AI practitioners wishing to start estimating the environmental impact of their work. We review the specific vocabulary and the technical requirements for each tool. We compare the energy consumption estimated by each tool on two deep neural networks for image processing and on different types of servers. From these experiments, we provide some advice for choosing the right tool and infrastructure.

1. Introduction

Deep learning has been widely used in every sector of society for a few years. A search of Scopus shows that it went from about 1,350 research papers in 2015 to more than 85,000 in 2022. Results obtained in every domain are impressive, and AI is a promising tool for tackling environmental challenges in particular (Rolnick et al 2019, Vinuesa et al 2020, Kar et al 2022). But it is also now widely documented that training and deploying deep learning projects has an impact on the environment (Strubell et al 2019, Gupta et al 2022, 2020, Ligozat et al 2022, Kaack et al 2021, Lannelongue and Inouye 2023, Bannour et al 2021, Thompson et al 2020, Dodge et al 2022, Henderson et al 2020a). These studies have assessed the energy consumption and the corresponding amount of greenhouse gas emissions (in CO2 equivalent, denoted CO2eq) of computer calculations when training a deep learning program, and showed that recent large language models can be responsible for hundreds of tonnes of CO2eq (Luccioni et al 2022), whereas, for context, a limit of 2 tCO2eq/person/year is what is needed to keep global warming under 1.5 °C (Arias et al 2021). Some studies have also compared existing estimation tools (Bannour et al 2021, Lannelongue and Inouye 2023, Jay et al 2023). Despite these many studies, when AI practitioners wish to start estimating their environmental impact, they may face several difficulties. Depending on their backgrounds, it might be difficult for them to get used to the hardware-related vocabulary, know how to use the estimation tools (locally or on servers), and determine which tool is best suited for their current use-case. This document aims to address these difficulties and ease the process of energy consumption measurement for AI practitioners.
OPEN ACCESS
Received 27 June 2023; revised 23 August 2023; accepted for publication 8 September 2023; published 21 November 2023.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence (http://creativecommons.org/licenses/by/4.0). Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. © 2023 The Author(s). Published by IOP Publishing Ltd.

It can be used as a guide to measure the energy consumption and associated greenhouse gas emissions when training deep learning algorithms. Although what will be explained can be applied to other types of algorithms and other infrastructures, we will focus on training deep-learning models in different types of infrastructures. In this context, this document makes the following contributions:
• We review existing tools for measuring or estimating the energy consumption of computations, and explain the specific notions that are not always known by AI practitioners. It goes further than previous surveys (Bannour et al 2021, Lannelongue and Inouye 2023, Jay et al 2023) in providing details about what is measured by each tool and on which infrastructure it can be used, the measurement process, how the usage factor is used, the default values, and the sources of information that are used. This information is crucial to correctly interpreting the data obtained.
• We test and compare these different approaches using wattmeters to assess their accuracy. We also quantify the energy consumption of the estimation tools themselves.
• We run a range of experiments to analyze the influence of key hyperparameters such as batch size, data load, checkpoints and epochs. These lead to a set of recommendations on how and when to use these tools depending on the infrastructure available to train the models. For instance, we show that it seems possible to measure only part of the training and extrapolate, to avoid the small extra consumption from energy measurement. We also show that batch size can influence energy consumption. The recommendations complete previous works that intended to make machine learning researchers better understand their carbon impact and take steps to mitigate it (Ligozat and Luccioni 2021, Dodge et al 2022).

The seven different tools that we study are: Green-Algorithms (Lannelongue et al 2020) (GA), CodeCarbon (Lottick et al 2019) (CC (P) for process, CC (M) for machine), Eco2AI (Budennyy et al 2022) (E2 (P) for process, E2 (M) for machine), CarbonTracker (Anthony et al 2020) (CT), Experiment-Impact-Tracker (Henderson et al 2020a) (EIT), MLCO2 (Lacoste et al 2019) and Cumulator (Trebaol et al 2020) (CMLTR). We use the following infrastructures, all located in France, for training models: LaBRI servers (institutional server), MAP5 servers (institutional server), the Grid5000 distributed cluster and personal computers. Mention will also be made of the Google Colab environment. The LaBRI servers, the personal computer and Grid5000 are equipped with wattmeters (WM), which provide real information on the energy consumption of the infrastructure over a given period.

We focus on two machine learning experiments, both for image processing. In the first one, a small neural network is trained for digit classification on the MNIST dataset (Deng 2012). This experiment is short, approximately 1 minute.
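One of the recommendations above is to measure only part of the training and extrapolate. This amounts to simple linear scaling; a minimal sketch of the idea (the function name and the assumption that every epoch consumes roughly the same energy are ours, not taken from any of the reviewed tools):

```python
def extrapolate_energy_wh(measured_wh: float, measured_epochs: int,
                          total_epochs: int) -> float:
    """Extrapolate the energy measured over a few epochs to a full
    training run, assuming epochs have roughly constant energy cost."""
    if measured_epochs <= 0 or total_epochs < measured_epochs:
        raise ValueError("need 0 < measured_epochs <= total_epochs")
    return measured_wh * (total_epochs / measured_epochs)

# e.g. 12 Wh measured over 3 epochs suggests about 120 Wh for 30 epochs,
# while only paying the tracker's overhead for the 3 measured epochs.
```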
In the second, a DnCNN network is trained for image denoising. The training is carried out with the ImageNet validation dataset (Deng et al 2009). This experiment is longer, approximately 2 h. Figures 1 and 2 summarize the energy consumption of the different tools on the five tested infrastructures. As we detail in this guide, the high variability comes from the different goals of the different tools: some estimate the power consumption of the entire machine while others focus on a particular process. The idle power consumption is also accounted for differently, alongside usage factors, CPUs versus GPUs, etc.

The document is organized as follows. Users already familiar with carbon footprint estimation may directly jump to section 5 for the results. Section 2 reviews previous publications in this field. Section 3 details the specificities of each tool: the energy consumption of each hardware component and their communications, power usage effectiveness and emission intensity. Section 4 details the types of infrastructures that are typically used to train AI models and which tools can be used for each. Section 5 presents the experimental setup and an analysis of the results. Discussions of the results and recommendations on when and how to estimate all environmental impacts end this guide in section 6. Finally, errors reported and found in the tools are listed in the appendix.

2. Related works

Only recently have estimation tools been made available and, consequently, few studies have compared and analyzed existing strategies for measuring the energy consumption of deep learning projects. The authors of (Bannour et al 2021) reviewed six tools (CarbonTracker, Experiment-Impact-Tracker, Green-Algorithms, MLCO2, energy usage and Cumulator) that are available to measure energy use and CO2eq emissions in the context of natural language processing.
They compared the tools according to publication details, technical criteria (availability, online, ease-of-use, documentation, etc), configuration criteria (specification of carbon intensity, PUE, install dependent, etc) and functional criteria (idle power and communication between hardware). The authors observed a two-fold variation in estimates between tools and concluded that further studies are needed to better understand these tools and estimate broader impacts. (Tool homepages: https://www.green-algorithms.org, https://codecarbon.io, https://github.com/sb-ai-lab/Eco2AI, https://github.com/lfwa/carbontracker/tree/master, https://github.com/Breakend/experiment-impact-tracker, https://mlco2.github.io/impact/, https://github.com/epfl-iglobalhealth/cumulator.)

In the same line of research, the authors of (Jay et al 2023) compared some tools on server nodes, not all specifically designed for deep learning and therefore not all integrating GPUs. They categorized tools into external and internal node sensors, power profiling software, energy measurement software packages and online energy calculators. They looked at publication year, environment criteria (hardware compatibility, virtualization, etc), functional criteria (hardware compatibility, software power model, sampling frequency, reporting and profiling), and user-friendliness. They tested each tool on the same server nodes and compared them with external power meters. The authors drew some recommendations from this study: to monitor power consumption in real time, it is better to use power profiling software, but such software does not measure GPU consumption; and the relationship between energy measurement software tools and power meters is not constant, so software tools are not perfectly accurate.

Finally, (Lannelongue and Inouye 2023) provided general guidelines about the strengths and weaknesses of different types of estimation tools, namely online calculators, embedded packages and server-side tools.
The criteria discussed are compatibility with any hardware, any programming language and any research field, some ease-of-use criteria, and scalability with the number of jobs and long periods of time.

Figure 1. Energy consumption in Wh of the different methods over the 5 different infrastructures for the first experiment. For the tools that do not provide detail for CPU/GPU/memory consumption, the total energy reported is plotted.

Figure 2. Energy consumption in kWh of the different methods over the 5 different infrastructures for the second experiment. For the tools that do not provide detail for CPU/GPU/memory consumption, the total energy reported is plotted.

The different tools discussed above focus on energy consumption during the training phase of AI models, which constitutes only part of the broader environmental impacts of AI (Gupta et al 2022, 2020, Ligozat et al 2022, Kaack et al 2021, Lannelongue and Inouye 2023). In this context, the authors of (Luccioni et al 2022) later included embodied impacts as well as emissions from static infrastructure and deployment when studying BLOOM, a large language model.

3. Estimating greenhouse gas emissions

This section explains how tools measure or estimate energy consumption and CO2eq emissions, from Python libraries integrated into the code (referred to as software tools), to web forms and physical watt-measurement devices connected to the infrastructure used. Some of these tools also have a server-side version, to be used in HPC clusters and thus collect information more easily to estimate energy consumption. Online tools and server-side tools can be used without modifying the code, and are independent of the programming language used.
Python libraries can only be used in Python code but enable measuring the consumption of different parts of the programs. Watt-measurement enables measuring the consumption of the whole node but is not always available and cannot isolate a particular process. Each tool has its own way of estimating the consumption of each component. A summary of the characteristics is shown in table 7.

The most power-consuming devices on a personal machine or a server are the GPUs (if present), the CPUs, and memory. There are other resources, such as storage or the network, that are generally not considered in software measurements, since they do not generate a significant load over the duration of an AI task. Indeed, in regular use, storage is typically solicited far less than memory and is mainly used as a more permanent record of the data, independently of the task (Lannelongue et al 2020).

When the machine is in a data center, the energy usage of all the equipment necessary to power, cool and maintain the data center should be measured, as it may account for an important amount of energy consumption. This is done using the efficiency coefficient of the data center, called power usage effectiveness (PUE).

3.1. Energy consumption of each component

In this section, we will see the different strategies used by the tools to estimate the energy consumed by the different resources and to estimate the consumption of the processes. Green-Algorithms and CodeCarbon are the only Python tools that report the estimate of consumed energy or emissions discriminated by component: memory, CPU and GPU. A concept transversal to all resources is the usage factor. The usage factor of a resource refers to the percentage of use that can be assigned to the process being measured. For example, if the CPU power is estimated at 2 W, but the CPU usage factor of the process was 50%, then the consumption of a one-hour process is assumed to be 1 Wh.
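The usage-factor arithmetic in this example is simply energy = power × usage factor × time; a minimal sketch (the function name is ours):

```python
def process_energy_wh(power_w: float, usage_factor: float, hours: float) -> float:
    """Energy attributed to a process: device power (W) scaled by the
    fraction of the device the process actually used, times duration (h)."""
    if not 0.0 <= usage_factor <= 1.0:
        raise ValueError("usage factor must be in [0, 1]")
    return power_w * usage_factor * hours

# The example from the text: a 2 W CPU at 50% usage for one hour gives 1 Wh.
```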
If the usage factor is unknown, then 100% of the use of the resource is assigned to the process, when in fact there may be other processes also using that resource. During the measured period, some tools query sensors or perform calculations to estimate power consumption. Note that a lower measurement frequency means fewer measurements, which may lead to more approximate results. By default, CodeCarbon performs these measurements every 15 s. Eco2AI, CarbonTracker and Experiment-Impact-Tracker take measurements every 10 s. Cumulator does not query sensors or take intermediate measurements to estimate energy consumption.

3.1.1. Energy consumed by CPU

There are two methods used in the tools to estimate the energy consumed by CPUs: using the CPU thermal design power (TDP) provided by the manufacturer, or using software integrated tools (RAPL files or Power Gadget). Appendix A provides explanations of these two methodologies. Note that software integrated tools may require privileged permissions, as summarized in section 4.1. We review in table 1 how CPU power consumption is measured in AI measurement tools.

3.1.2. Energy consumed by GPU

As with CPUs, energy consumption for GPUs is computed either from TDPs provided by manufacturers or from internal tools. The latter is done with the pynvml library, which only works for Nvidia GPUs. We review in table 2 how GPU power consumption is measured in AI measurement tools.

3.1.3. Energy consumed by memory

According to (Hodak et al 2019), GPUs are responsible for around 70% of power consumption, the CPU for 15%, and RAM for 10%. Some tools like Green-Algorithms consider that the power consumption of RAM depends strongly on the available memory, independently of the memory consumed (Karyakin and Salem 2017, Guo et al 2022), while other tools like Eco2AI consider that it depends on the memory allocated by the process (Maevsky et al 2017). We review in table 3 how memory power consumption is measured in AI measurement tools.

3.1.4.
Energy consumed by communications

In ICT (Information and Communication Technology), communications refer to the exchange of information or data between two or more nodes. Nodes can be any devices connected to a network, including computers, routers, servers, and even mobile devices. Machine learning algorithms typically involve the exchange of data between nodes at various stages, such as during data generation, during training (parameter updates across different nodes in the network), or while the model is in production. The only tool that estimates the cost of communications is Cumulator. Each time the model sends a data file to another node of the network, Cumulator records the size of the file which is communicated. The cost of communication relies on the '1 byte model' of the Shift Project (The Shift Project 2019). The value from 2017 is 6.894 × 10−11 kWh/B.

3.2. PUE

Power usage effectiveness is the efficiency coefficient of the data center. If the PUE is not given, we recommend considering the 2022 average value of 1.55 (Uptime Institute 2022). For personal computers, PUE = 1 as there are no other large devices consuming power. We review in table 4 the PUE used by each tool. All except Cumulator report the total energy consumed, including PUE. To calculate this value for Cumulator, we can divide the

Table 1. Estimation of energy consumption for CPUs.

Green-Algorithms
Energy: uses the model of CPU provided by the user to pull the corresponding TDP from a database, or the user can input the TDP manually. If the TDP is unknown, GA uses an average of 12 W per core, but the paper does not explain this value. In this model, a core's power usage is assumed to be equal to the TDP divided by the number of cores (if a chip has 2 cores and a TDP of 50 W, then the TDP per core is 25 W).
Usage factor: uses usage factors if known, and assumes 100% usage if not.

CodeCarbon
Energy: uses RAPL files or Power Gadget to report CPU energy consumption (only for Intel CPUs with root access). The consumption reported by RAPL files or Power Gadget represents the consumption of the whole machine, and not only the process. If CodeCarbon cannot find the software to track the CPUs, then the tool uses the model of CPU to look up the corresponding TDP in a list. If the model is unknown, it uses a TDP of 85 W. The authors do not specify where this value is taken from.
Usage factor: not computed when using RAPL files or Power Gadget. When the TDP is used, CodeCarbon assumes that the average usage factor is 50%, but this value is not explained and seems arbitrary.

Eco2AI
Energy: uses the model of the CPU to look up the corresponding TDP in a list. If the TDP is unknown, it uses an average of 100 W (Maevsky et al 2017).
Usage factor: uses the os and psutil Python modules to determine the usage factor if the tracking mode 'current' is set (default).

CarbonTracker
Energy: uses RAPL files to report CPU energy consumption (only for Intel CPUs with root access). Without access to the RAPL files, the tool will not measure the CPU. CarbonTracker will work only if it can measure at least one component (CPU or Nvidia GPU).
Usage factor: not computed. The power consumption values of the RAPL files are global to the whole machine.

Experiment-Impact-Tracker (EIT)
Energy: uses RAPL files to report CPU energy consumption (only for Intel CPUs with root access and the Linux operating system).
Usage factor: uses the psutil Python module to determine the usage factor.

MLCO2
Does not measure CPU utilization.

Cumulator
Energy: it is not possible to measure GPU and CPU components at the same time, but Cumulator measures CPU utilization by default. It uses the model of CPU to look up the corresponding TDP in a list. If the TDP is unknown, it uses an average of 250 W. This value is that of the Nvidia GeForce GTX Titan X, which is the GPU model in the IC cluster of the EPFL Machine Learning and Optimization Laboratory (MLO). It considers just one CPU.
Usage factor: does not use a usage factor.
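Several of the TDP-based fallbacks in table 1 reduce to the same calculation; a minimal sketch of a Green-Algorithms-style per-core estimate (the function name and the example values are ours; the 12 W/core default is GA's, as reported above):

```python
def cpu_energy_wh(n_cores: int, tdp_per_core_w: float,
                  usage_factor: float, hours: float) -> float:
    """TDP-based CPU energy estimate: cores x per-core TDP x usage x time."""
    return n_cores * tdp_per_core_w * usage_factor * hours

# e.g. 8 cores at GA's default 12 W per core, at 80% usage, for 2 hours:
# 8 * 12 * 0.8 * 2 = 153.6 Wh
```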
Table 2. Estimation of energy consumption for GPUs.

Green-Algorithms
Energy: uses the model of GPU to look up the corresponding TDP in a list. The user can load the TDP of the GPU if the model is not listed. If the TDP is unknown, it uses an average of 200 W, but the paper does not explain the reason for choosing this value.
Usage factor: the GPU usage factor is considered if known by the user. If not, GA considers 100% usage.

CodeCarbon
Energy: uses the pynvml library (only for Nvidia GPUs). CodeCarbon does not measure the consumption of non-Nvidia GPUs.
Usage factor: not computed. The consumption reported by pynvml represents the consumption of the whole machine, and not only the process.

Eco2AI
Energy: uses the pynvml library (only for Nvidia GPUs). Eco2AI does not measure the consumption of non-Nvidia GPUs.
Usage factor: not computed. The consumption reported by pynvml represents the consumption of the whole machine, and not only the process.

CarbonTracker
Energy: uses the pynvml library (only for Nvidia GPUs). CarbonTracker does not measure the consumption of non-Nvidia GPUs.
Usage factor: not computed. The consumption reported by pynvml represents the consumption of the whole machine, and not only the process.

EIT
Energy: uses the nvidia-smi command line tool (only for Nvidia GPUs). EIT does not measure the consumption of non-Nvidia GPUs.
Usage factor: uses Popen to open a thread, execute the command nvidia-smi -q -x, get the output as XML, and parse it to get the usage factor of the GPU.

MLCO2
Energy: uses the model of GPU to look up the corresponding TDP in a list. It is not possible to load the TDP of the GPU if the model is not listed; in this case, it is necessary to make a pull request to add the value. It is not possible to choose the quantity of GPUs.
Usage factor: does not use a usage factor. The GPU is considered at maximum load, and this load is assumed to correspond to the measured process.

Cumulator
Energy: uses the model of GPU to look up the corresponding TDP in a list. If the TDP is unknown, it uses an average of 250 W.
It considers just one GPU.
Usage factor: does not use a usage factor. The GPU is considered at maximum load, and this load is assumed to correspond to the measured process.

Table 3. Estimation of energy consumption for memory.

Green-Algorithms
Energy consumption by memory is 0.3725 W/GB of memory available (if all the server memory is available, it will account for all the server memory; on an HPC cluster, it will account only for the amount of memory requested, regardless of how much the process consumes). The value 0.3725 was obtained experimentally.a

CodeCarbon
Energy consumption by memory is 0.375 W/GB of memory used.b If the tracking mode is 'process', the memory used by the process is measured via psutil.

Eco2AI
Energy consumption of memory is 0.375 W/GB of memory used (Maevsky et al 2017). Memory used by the process is measured via psutil.

CarbonTracker
Uses RAPL files to report memory energy consumption. It measures the total energy of the memory available, not only that used by the process. Without access to the RAPL files, the tool will not measure memory energy consumption.

EIT
Uses RAPL files or Power Gadget to report memory energy consumption. Memory used by the process is measured via psutil, considering memory used exclusively by the process and the shared memory between processes (weighted by the number of processes). Without access to the RAPL files or Power Gadget, the tool cannot be used.

MLCO2
Does not measure memory.

Cumulator
Does not measure memory.

a Source: www.tomshardware.com (https://www.tomshardware.com/reviews/intel-core-i7-5960x-haswell-e-cpu,3918-13.html).
b Source: Crucial (https://www.crucial.com/support/articles-faq-memory/how-much-power-does-memory-use).

reported value of greenhouse gas emissions (GHG) by the emission intensity (EI) of the servers' location: Energy = GHG/EI. Note that for the purpose of comparing reported energy consumption between tools, PUE is not taken into account, since each tool uses a different value.

3.3.
Carbon emission and emission intensity

The origin of the energy used is key when determining greenhouse gas emissions from electricity production. To carry out the calculation, the average emission intensity (or carbon intensity) of the country or region where the calculations were made is used. Countries report these values, which can then be used by the tools to calculate emissions. It is important to mention that most of the tools do not yet take carbon intensity information in real time. Only CarbonTracker (for the UK and Denmark) and Experiment-Impact-Tracker (for California) do so. In most cases, average values from previous years are used. Some variables, such as the time of day of execution, or the distribution of energy sources at a given moment, are not represented, but can have an important influence on the emissions, as shown in table 5. Machine learning users could look at the current and planned energy consumption of most countries before running their experiments, e.g. on Electricity Maps. In some cases, if users are running on clouds that have different geographic locations, they could choose where to run the algorithms to emit fewer GHGs. For example, table 5 presents some values at different locations for two different days. While it can be wise to carefully choose data center locations, developers must keep in mind that transferring large datasets from one location to another also has environmental impacts (section 3.1.4). Therefore, depending on the training time, it might be better to remain on the same server when training on the same large dataset. We present in table 6 how each tool handles carbon intensity.

3.4. Measuring whole equipment consumption with wattmeters

Wattmeters are physical instruments that are used to measure the active electrical energy of a certain circuit.
By plugging them into the physical infrastructure, we can get the exact total consumption of the machine. With wattmeters, it is not possible to determine how much energy each component of the machine consumes, nor to discriminate consumption by process. It is also important to note that wattmeters have measurement frequencies. Different wattmeters may have different measurement frequencies and therefore different accuracies depending on the duration of processes.

3.5. Errors reported and found in the tools

Some tools had to be modified to be used, as they had bugs not yet fixed by the authors. The modifications we had to make can be found in appendix B.

Table 4. PUE values used in the different tools.

Green-Algorithms: configurable. The default value is 1.67 (2019) (Lawrence 2019).
CodeCarbon: not taken into consideration, except for cloud providers.
Eco2AI: configurable. The default value is 1.
CarbonTracker: configurable. Although the paper indicates that the 2020 PUE (1.58) is used, the 2022 PUE (1.55) is used in the code (Uptime Institute 2022).
EIT: configurable. The default value is 1.58 (2020) (Lawrence 2019).
MLCO2: not taken into consideration.
Cumulator: not taken into consideration.

Table 5. Daily average carbon intensity (gCO2eq/kWh) for two different days. Data taken from Electricity Maps.

                        March 5th 2023    March 29th 2023
France                  64                137
North Sweden            16                14
South Africa            684               702
South Carolina - USA    432               786

3.6. Summary of the characteristics of existing tools

In addition to the tables presented in Bannour et al (2021) and Jay et al (2023), we summarize in table 7 what is configurable and what the default values are for each component, and add details on the usage factor.

4. Infrastructure

Depending on the infrastructure, users will have access to different resources, which restricts the list of tools that can be used.
The most commonly used infrastructures for machine learning are physical or virtual servers, virtualized environments in the cloud, supercomputers or personal computers. Table 8 summarizes the tools' requirements and hardware compatibility.

4.1. Access to information and resources

We explain below how each type of infrastructure handles access to hardware information.

4.1.1. Virtual environments

Some tools require knowing the available CPU model to make a better estimation. In virtual environments, the information in the /proc/cpuinfo file (or equivalent tools for Windows or macOS) may not be correct, and may represent some characteristics of the CPU emulated by the virtualizer. Unfortunately, from the virtual environment, there is no way for users to know exactly which real CPU is being used for the execution.

4.1.2. RAPL files

Some tools require read access to the RAPL files. Access to these files is restricted by default to the root user. An administrator must be asked to grant read permission to those files. Also, these files are available only if the machine has Intel CPUs and runs Linux as an operating system. A similar situation is experienced with Power Gadget: it is exclusive to Intel CPUs, and the tool needs to be installed.

Table 6. Emission intensity used in the different tools.

Green-Algorithms: most emission intensity data come from Carbon Footprint, but the tool also uses other sources like Electricity Maps. Information is collected in the CI_aggregated.csv file. The default value is 475 gCO2eq/kWh (world average in 2018).

CodeCarbon: for the United States and Canada, CodeCarbon uses regional data on emissions per unit of power consumed. For other countries, the tool uses the energy mix of the country, i.e. intensity data for each energy source (carbon, solar, wind, etc), to calculate the intensity of the country. The average energy mix for each country is taken from Global Petrol Prices. The information is collected in the files under the data folder.
The sources of each data file are specified in the files. The default value is 475 gCO2eq/kWh (world average in 2018).

Eco2AI: for all countries, the emission intensity calculation was made using the intensity data of each energy source (carbon, solar, wind, etc) and the energy mix of each country. Neither the values used for the calculations nor their sources are explained, and only the final emission intensity for each country is published in carbon_index.csv. The default value is 436.5 gCO2eq/kWh (Ember 2022).

CarbonTracker: supports the fetching of carbon intensity in real time through external APIs. It is currently limited to Denmark and Great Britain. For Denmark they use data from Energi Data Service and for Great Britain they use the Carbon Intensity API. For other countries, it uses fixed values available in the carbon-intensities.csv file. The sources are not published. The default value is 475 gCO2eq/kWh (2019).

EIT: supports the fetching of carbon intensity in real time through external APIs. It is currently limited to California, using the API of the California ISO. For other countries, it uses fixed values available in the co2eq_parameters.json file. The sources are published and are mostly from Electricity Maps. The default value is 301 gCO2eq/kWh (annual mean carbon intensity of all Electricity Maps zones).

MLCO2: publishes its sources and contains the information for the cloud providers in the impact.csv file. For private infrastructure, it is necessary to provide the emission intensity value, which must be obtained by the user's own means.

Cumulator: the emission intensity data are from Electricity Maps. Information is collected in the country_dataset_adjusted.csv file. The default value is 447 gCO2eq/kWh (average carbon intensity in the EU in 2018, Moro and Lonza 2018).
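Combining the default intensities above with the PUE from section 3.2, the conversion performed by most tools, and its inversion as used here for Cumulator, can be sketched as follows (the function names and example values are ours):

```python
def emissions_gco2eq(energy_kwh: float, pue: float,
                     carbon_intensity: float) -> float:
    """GHG emissions: energy (kWh) scaled by the data-center overhead (PUE),
    times the local carbon intensity (gCO2eq/kWh)."""
    return energy_kwh * pue * carbon_intensity

def energy_from_emissions(ghg_gco2eq: float, carbon_intensity: float) -> float:
    """Invert the conversion, as done for Cumulator: Energy = GHG / EI."""
    return ghg_gco2eq / carbon_intensity

# e.g. 10 kWh in a data center with PUE 1.55, at the 2018 world average
# intensity of 475 gCO2eq/kWh: 10 * 1.55 * 475 = 7362.5 gCO2eq (~7.4 kg).
```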
Data sources referenced in table 6: https://www.carbonfootprint.com, https://www.electricitymap.org, https://github.com/GreenAlgorithms/green-algorithms-tool/blob/master/data/latest/CI_aggregated.csv, https://www.globalpetrolprices.com, https://github.com/mlco2/codecarbon/tree/master/codecarbon/data, https://github.com/sb-ai-lab/Eco2AI/blob/main/eco2ai/data/carbon_index.csv, https://energidataservice.dk/, https://carbonintensity.org.uk/, https://github.com/lfwa/carbontracker/blob/master/carbontracker/data/carbon-intensities.csv, http://caiso.com, https://github.com/Breakend/experiment-impact-tracker/blob/master/experiment_impact_tracker/emissions/data/co2eq_parameters.json, https://github.com/mlco2/impact/blob/master/data/impact.csv, https://github.com/epfl-iglobalhealth/cumulator/blob/master/src/cumulator/countries_data/country_dataset_adjusted.csv

Table 7. Summary of the characteristics of the energy and CO2eq measurement tools. Wattmeters are not included in the table.

| | Green-Algorithms | CodeCarbon | Eco2AI | CarbonTracker | EIT | MLCO2 | Cumulator |
|---|---|---|---|---|---|---|---|
| **General information** | | | | | | | |
| 1. Type of tool | Online calculator and server-side tool | Embedded package | Embedded package | Embedded package | Embedded package | Online calculator | Embedded package |
| 2. Embodied emissions | no | no | no | no | no | no | no |
| 3. Static (idle) emissions w/o runs | no | no | no | no | no | no | no |
| 4. Process/machine estimation | process | both | both | machine | process | machine | machine |
| 5. Measurement frequency (sec) | — | 15 | 10 | 10 | 10 | — | — |
| **Energy consumption: CPU** | | | | | | | |
| 1. Measured | yes | yes | yes | yes | yes | no | yes (if chosen) |
| 2. Use model of CPU | yes | yes (if no tracking tool) | yes | no | no | — | yes |
| 3. Use RAPL files or Power Gadget | no | yes | no | yes (RAPL files) | yes | — | no |
| 4. Default TDP (W) | 12 (normalized by core) | 85 | 100 | — | — | — | 250 |
| 5. Usage factor considered | yes | 50% (if default TDP used) | yes | no | yes | — | no |
| 6. Tool for usage factor | — | — | psutil | — | psutil | — | — |
| **Energy consumption: GPU** | | | | | | | |
| 1. Measured | yes | yes | yes | yes | yes | yes | yes (if chosen) |
| 2. Use model of GPU | yes | no | no | no | no | yes | yes |
| 3. Default TDP (W) | 200 | no | no | no | no | no | 250 |
| 4. Tool to get power | — | pynvml | pynvml | pynvml | nvidia-smi | — | — |
| 5. Usage factor considered | yes | no | no | no | yes | no | no |
| 6. Tool for usage factor | — | — | — | — | nvidia-smi | — | — |
| 7. Only Nvidia GPUs | no | yes | yes | yes | yes | no | no |
| **Energy consumption: memory** | | | | | | | |
| 1. Measured | yes | yes | yes | yes | yes | no | no |
| 2. Source of information | — | system | system | RAPL files | RAPL files | — | — |
| 3. Usage factor considered | no | yes (if tracking mode) | yes | no | yes | — | — |
| 4. Tool for usage factor | — | psutil | psutil | — | psutil | — | — |
| 5. Formula | 0.3725 W/GB | 0.375 W/GB | 0.375 W/GB | — | — | — | — |
| **Emission intensity** | | | | | | | |
| 1. Default E.I. value (gCO2eq/kWh) | 475 | 475 | 436.5 | 475 | 301 | — | 447 |
| 2. Real time | no | no | no | yes (just UK and Denmark) | yes (just California) | no | no |
| **PUE** | | | | | | | |
| 1. PUE considered | yes | yes (just cloud) | yes | yes | yes | no | no |
| 2. PUE configurable | yes | no | yes | no | yes | — | — |
| 3. Default PUE value | 1.67 | — | 1 | 1.58 | 1.58 | — | — |
| **Errors** | | | | | | | |
| 1. Need code modification | — | — | — | yes (with Python 3.10) | yes | — | yes |

4.1.3. Usage factor
Unfortunately, there is no command-line tool that reports the total time of a script (wall time), its CPU time and its GPU time, which are needed to calculate the CPU and GPU usage factors required by Green-Algorithms. However, workload managers such as SLURM commonly log this information. One option is to take empirical, specific measurements of GPU use during the execution of the algorithm using the nvidia-smi tool, and extrapolate that GPU utilization value to the entire execution. It is important to note that this utilization percentage corresponds to the total utilization of the machine, not just that of the process: other processes could be running on the available GPUs.
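The sampling procedure described above can be sketched as follows. The nvidia-smi query flags are standard, but the sample count and interval are arbitrary choices, and, as noted, the reported utilization is machine-wide rather than per-process.

```python
import statistics
import subprocess
import time

def parse_utilization(lines):
    """Average the per-GPU utilization percentages printed by nvidia-smi."""
    return statistics.mean(int(line) for line in lines if line.strip())

def sample_gpu_utilization(n_samples: int = 10, interval_s: float = 1.0) -> float:
    """Sample machine-wide GPU utilization (%) while a training job runs.

    Caveat: this is the utilization of the whole machine, not of one process.
    """
    samples = []
    for _ in range(n_samples):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout  # one line per GPU, e.g. "14\n3\n"
        samples.append(parse_utilization(out.splitlines()))
        time.sleep(interval_s)
    return statistics.mean(samples)
```

Averaging such samples over the run gives the GPU usage factor that can be fed to Green-Algorithms.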
To our knowledge, there is also no tool that measures GPU time for non-Nvidia GPUs.

4.1.4. Wattmeter
Finally, using a wattmeter requires having one and, in the case of institutional infrastructure, consulting a systems administrator to make the physical connection. It is important to note that a wattmeter measures the consumption of the entire node, so ideally no other processes should be running on the node; if there are, this must be taken into account when analyzing the value returned by the device.

4.2. Description of the infrastructures used for experimentation
In this guide we have run tests on resources from two French laboratories (Labri and MAP5), on Grid5000 and on personal computers, and we will also mention Google Colab. Table 9 details the hardware specifications of the infrastructure used for the experiments.

4.2.1. Laboratory servers
We have tested the different measuring tools at Labri (computer science laboratory of Bordeaux) and MAP5 (laboratory of applied mathematics at Paris 5 University). Labri has physical servers with NVIDIA GPUs, Intel CPUs and a Linux operating system. There, we had the possibility to experiment with a wattmeter. Access to the RAPL files is restricted to root, so the scripts need to be executed by an administrator in order to use Experiment-Impact-Tracker and CarbonTracker.
MAP5 has physical servers with NVIDIA GPUs, Intel CPUs and a Linux operating system. Access to the RAPL files is available, so we can test all the tools, but we do not have a wattmeter.

4.2.2. Supercomputers
We experimented with one supercomputer: Grid5000, a large-scale and flexible testbed for experiment-driven research in all areas of computer science, with a focus on parallel and distributed computing, including cloud, HPC, big data and AI. The Grid5000 cluster allows numerous configurations and is very well documented. The cluster has servers with NVIDIA GPUs, Intel CPUs, a Linux operating system and access to RAPL files.
Access to wattmeter measurements on selected nodes is possible, so all the tools can be used. However, when requesting only a portion of a node, the wattmeter value, which covers the entire node, may not be very useful, as other jobs can be running on the same server. Also, note that without booking the whole node it is not possible to get user privileges, so EIT cannot be used, CarbonTracker will not measure the CPU, and CodeCarbon will use the TDP to calculate CPU consumption.

Table 8. Requirements to run the tools.

| | Green-Algorithms | CodeCarbon | Eco2AI | CarbonTracker | EIT | MLCO2 | Cumulator |
|---|---|---|---|---|---|---|---|
| **Requirements** | | | | | | | |
| 1. Operating system | — | — | — | Linux (if non-NVIDIA GPU) | — | — | — |
| 2. Access to RAPL files | no | no | no | yes (if non-NVIDIA GPU) | yes | no | no |
| 3. Power Gadget | — | no | no | — | yes | — | no |
| **Compatibility** | | | | | | | |
| 1. Non-Intel CPUs | yes | yes | yes | no | no | does not measure CPU | yes |
| 2. Non-Nvidia GPUs | yes | no | no | no | no | yes | yes |

4.2.3. Personal computers
On these machines, we could install the necessary tools and enable the required permissions. CarbonTracker can be used if at least one of two conditions is met: having Intel CPUs or NVIDIA GPUs; if neither is met, the tool cannot be used. It will measure the power consumption of the CPUs and memory only if the CPUs are Intel, and the power consumption of the GPUs only if they are Nvidia.
With non-NVIDIA GPUs, only Green-Algorithms, MLCO2 (if the GPU is on its list), Cumulator and CarbonTracker (if the CPUs are Intel) can be used. With non-Intel CPUs, Experiment-Impact-Tracker cannot be used, and with CPUs only, neither can MLCO2. This explains the N/A values reported in the results tables.

4.2.4. Colab
Google Colab is a widely used resource, with data centers located around the world, but unfortunately the data center cannot be selected when the environment is created.
The execution location can be checked with the command curl ipinfo.io, and this information can then be used to determine the data center being used8. When running a notebook, a virtual environment is generated in which some commands are not available; users are not administrators, do not have access to RAPL files and do not know the real resources being used. This limits the tools that can be used. Experiment-Impact-Tracker cannot be used. Green-Algorithms, CodeCarbon, Eco2AI and Cumulator can be used, assuming an average consumption; this assumption can lead to reporting carbon emission values that deviate from the real ones. CarbonTracker can be used, but only with a GPU runtime, and it will not measure the energy consumption of the CPU or memory.

5. Experiments and results analysis
We now compare the different tools and their use on different infrastructures for image processing and analysis. Section 5.1 details the experimental settings. In section 5.2 we present the results: in section 5.2.1 we explain the high variability between the different tools, then their differences with wattmeter measurements (section 5.2.2) and the impact of the infrastructure (section 5.3). Later, focusing more on the second experiment, we analyze the influence of the data load (section 5.4), of the batch size (section 5.5), of saving the checkpoints (section 5.6) and of the energy consumption of the tools themselves (section 5.8). Finally, we comment on additional idle consumption (section 5.9).

Table 9. Hardware specifications of the infrastructure used for the experiments.

| | Gemini-1 (Grid5000) | Rosenblatt (MAP5) | Server (Labri) | Personal computer | Colab |
|---|---|---|---|---|---|
| Operating system | Linux | Linux | Linux | Linux | Linux |
| **CPU** | | | | | |
| 1. Quantity | 2 | 2 | 1 | 1 | 1 |
| 2. Model | Intel Xeon E5-2698 v4 | Intel Xeon E5-2609 v4 | Intel Core i9-7940X CPU @ 3.10 GHz | AMD Ryzen 5 2600 Six-Core Processor (VE) | Intel Xeon CPU @ 2.20 GHz |
| 3. TDP | 135 W | 85 W | 165 W | 65 W | Unknown |
| **GPU** | | | | | |
| 1. Quantity | 8 | 2 | 3 | 1 | 1 |
| 2. Model | NVIDIA Tesla V100-SXM2-32GB | NVIDIA TITAN Xp | NVIDIA TITAN Xp | NVIDIA TITAN V | NVIDIA Tesla T4 |
| 3. TDP | 250 W | 250 W | 250 W | 250 W | 70 W |
| **Memory** | | | | | |
| 1. Quantity | 512 GB | 62 GB | 126 GB | 32 GB | 12 GB |
| **Wattmeters** | | | | | |
| 1. Available | yes | no | yes | yes | no |
| 2. Frequency | second | — | minute | minute | — |

8 https://cloud.google.com/about/locations?hl=es

The theoretical analysis of the tools and results provides a better understanding of the differences in measurement between the tools, which Bannour et al (2021) indicated was needed.
In order to also transparently acknowledge the impact of our work, we conducted an analysis using wattmeters when available and CodeCarbon when not (machine tracking) to determine the total energy consumed throughout all our experiments. The results revealed a cumulative consumption of approximately 14.5 kWh. This value includes all the runs that led to the paper. It does not include PUE.

5.1. Experiment settings
We carried out two experiments, with different characteristics, on different infrastructures.
First, we trained a manually written digit classifier on the MNIST dataset. The MNIST dataset is a collection of images of handwritten digits. Its training set has 60,000 examples, with a size of 50 MB. The classifier is implemented with a fully connected, two-layer network (an inner layer of 32 neurons and an output layer of 10 neurons), trained over 5 epochs, and normally takes less than a minute on the different infrastructures. This experiment runs on a single GPU. Appendix C provides the architecture diagram.
Second, we trained an image denoiser on the ImageNet validation dataset. The ImageNet dataset is a collection of images depicting diverse objects and scenes. Its validation set has 50,000 examples, with a size of 6 GB. The denoiser is implemented with a DnCNN network (Ryu et al 2019), trained over 80 epochs, and takes approximately two hours to run. Appendix C provides the architecture diagram. This experiment runs in parallel on all available GPUs.
In order to measure the impact of other configurations, small variations of this experiment were also performed.
The experiments were implemented with PyTorch. Since each experiment has a different configuration regarding the use of the GPUs, the choice of framework is key to enabling the use of all available GPUs; PyTorch enabled multi-GPU training. This is also the case with TensorFlow, but it would have required additional configuration beyond the default installation to use the available GPUs.
The experiments were carried out on the infrastructures detailed in section 4. We also ran the experiments on Gemini-1 requesting only a quarter of the resources (two GPUs, 128 GB of memory and 10 of the 40 available cores). Depending on the available resources, certain tools could be used only on some infrastructures. We now discuss the main observations from our results.

5.2. Results
This section presents and analyzes the results obtained for the two experiments on the different infrastructures. Table 10 presents the energy consumption for the first experiment, the training of a manually written digit classifier. Table 11 presents the consumption for the second experiment, the training of an image denoiser.
The reported values correspond to individual runs and are not averaged. However, multiple runs of the experiments were performed on the different infrastructures to validate the consistency of these numbers. Experiment 1 was executed 3 times on Grid5000 and 2 times on MAP5, Labri and Colab. Experiment 2 was executed twice on Grid5000 and Labri.
As said before, Cumulator does not report the energy consumed. The values presented in the tables were not reported by Cumulator but calculated by us from the carbon footprints.

5.2.1. Variability between the different tools
From tables 10 and 11, we observe a large difference between the energy consumption and carbon emissions reported by the different tools.
For instance, on the Gemini-1 node of Grid5000, MLCO2 reports a consumption more than 400% higher than Eco2AI.

Machine versus process. Some tools estimate the consumption of the entire machine, and are comparable with wattmeters; others estimate the consumption of the process, trying to isolate it from other processes that may be running on the machine. CodeCarbon and CarbonTracker have similar strategies for GPU and CPU consumption estimation, focusing on full-machine estimation; they differ in their method for estimating memory consumption. CodeCarbon's strategy is more accurate, since it reaches a value closer to that of the wattmeter. Eco2AI and EIT focus more on isolating the consumption of the measured process; in both experiments, these tools show a lower consumption estimate than CodeCarbon and CarbonTracker. The Green-Algorithms approach also attempts to isolate the consumption of the process.

Table 10. Results for the training of a digit classifier (experiment 1). All consumption values are in Wh. Carbon emissions are in gCO2e. For CodeCarbon and Eco2AI, (P) refers to the process tracking mode and (M) to the machine tracking mode.

| | Green-Algorithms | CodeCarbon (P) | CodeCarbon (M) | Eco2AI (P) | Eco2AI (M) | CarbonTracker | EIT | MLCO2 | Cumulator | Wattmeter |
|---|---|---|---|---|---|---|---|---|---|---|
| **Gemini-1, whole node (57 s)** | | | | | | | | | | |
| Tot. energy reported | 5.990 | 8.800 | 12.50 | 7.200 | 7.100 | 13.30 | 2.570 | 38.00 | 4.771 | — |
| Tot. energy w/o PUE | 3.590 | 8.80 | 12.50 | 7.200 | 7.100 | 8.580 | 1.630 | 38.00 | 4.771 | 13.00 |
| Energy for CPU | 0.007 | 1.500 | 1.500 | — | — | — | — | — | — | — |
| Energy for GPU | 0.395 | 7.200 | 7.200 | — | — | — | — | — | — | — |
| Energy for memory | 3.16 | 0.0184 | 3.700 | — | — | — | — | — | — | — |
| Carbon emissions | 0.307 | 0.480 | 0.690 | 0.490 | 0.480 | 0.777 | 0.140 | 2.53 | 0.563 | — |
| **Gemini-1, 2 GPUs (56 s)** | | | | | | | | | | |
| Tot. energy reported | 1.689 | 1.630 | 4.570 | 1.640 | 1.620 | 2.350 | N/A | 9.333 | 3.729 | — |
| Tot. energy w/o PUE | 1.008 | 1.630 | 4.570 | 1.640 | 1.620 | 1.516 | N/A | 9.333 | 3.729 | N/A |
| Energy for CPU | 0.130 | 0.000 | 0.000 | — | — | — | — | — | — | — |
| Energy for GPU | 0.139 | 1.620 | 1.620 | — | — | — | — | — | — | — |
| Energy for memory | 0.739 | 0.013 | 2.950 | — | — | — | — | — | — | — |
| Carbon emissions | 0.086 | 0.090 | 0.250 | 0.110 | 0.110 | 0.140 | N/A | 0.622 | 0.440 | — |
| **Rosenblatt (1 min 36 s)** | | | | | | | | | | |
| Tot. energy reported | 1.030 | 3.190 | 3.800 | 2.000 | 2.100 | 4.56 | 3.860 | 13.30 | 6.711 | — |
| Tot. energy w/o PUE | 0.617 | 3.190 | 3.800 | 2.000 | 2.100 | 2.940 | 2.440 | 13.30 | 6.711 | N/A |
| Energy for CPU | 0.148 | 1.200 | 1.200 | — | — | — | — | — | — | — |
| Energy for GPU | 0.086 | 1.900 | 1.900 | — | — | — | — | — | — | — |
| Energy for memory | 0.389 | 0.0276 | 0.600 | — | — | — | — | — | — | — |
| Carbon emissions | 0.0527 | 0.170 | 0.200 | 0.138 | 0.140 | 0.266 | 0.210 | 0.533 | 0.792 | — |
| **Labri (45 s)** | | | | | | | | | | |
| Tot. energy reported | 1.94 | 1.689 | 2.287 | 1.1459 | 1.126 | 2.219 | 1.91 | 9.375 | 2.093 | — |
| Tot. energy w/o PUE | 1.16 | 1.689 | 2.287 | 1.1459 | 1.126 | 1.432 | 1.209 | 9.375 | 2.093 | 2.241 |
| Energy for CPU | 0.255 | 0.565 | 0.565 | — | — | — | — | — | — | — |
| Energy for GPU | 0.128 | 1.111 | 1.097 | — | — | — | — | — | — | — |
| Energy for memory | 0.777 | 0.013 | 0.626 | — | — | — | — | — | — | — |
| Carbon emissions | 0.099 | 0.093 | 0.126 | 0.074 | 0.076 | 0.13 | 0.107 | 0.375 | 0.247 | — |
| **Personal computer (57 s)** | | | | | | | | | | |
| Tot. energy reported | 0.356 | 1.000 | 1.190 | 0.733 | 0.728 | 1.415 | N/A | 4.167 | 3.949 | — |
| Tot. energy w/o PUE | 0.356 | 1.000 | 1.190 | 0.733 | 0.728 | 0.913 | N/A | 4.167 | 3.949 | 1.404 |
| Energy for CPU | 0.032 | 0.330 | 0.330 | — | — | — | — | — | — | — |
| Energy for GPU | 0.125 | 0.660 | 0.660 | — | — | — | — | — | — | — |
| Energy for memory | 0.199 | 0.015 | 0.195 | — | — | — | — | — | — | — |
| Carbon emissions | 0.018 | 0.056 | 0.065 | 0.049 | 0.049 | 0.083 | N/A | 0.167 | 0.466 | — |
| **Colab - Oregon (1 min 6 s)** | | | | | | | | | | |
| Tot. energy reported | 0.381 | 1.500 | 1.600 | 3.000 | 3.000 | 0.805 | N/A | 1.280 | 5.15 | — |
| Tot. energy w/o PUE | 0.343 | 1.500 | 1.600 | 3.000 | 3.000 | 0.519 | N/A | 1.280 | 5.15 | N/A |
| Energy for CPU | 0.219 | 0.900 | 0.900 | — | — | — | — | — | — | — |
| Energy for GPU | 0.041 | 0.600 | 0.600 | — | — | — | — | — | — | — |
| Energy for memory | 0.0913 | 0.0206 | 0.100 | — | — | — | — | — | — | — |
| Carbon emissions | 0.024 | 0.200 | 0.200 | 0.600 | 0.600 | 0.290 | N/A | 0.367 | 1.03 | — |

Table 11. Results for the training of an image denoiser (experiment 2). All consumption values are in kWh. Carbon emissions are in gCO2e. The consumption indicated for Colab is extrapolated: one epoch was executed, the consumptions were obtained, and the values were extrapolated.

| | Green-Algorithms | CodeCarbon (P) | CodeCarbon (M) | Eco2AI (P) | Eco2AI (M) | CarbonTracker | EIT | MLCO2 | Cumulator | Wattmeter |
|---|---|---|---|---|---|---|---|---|---|---|
| **Gemini-1, whole node (2 h)** | | | | | | | | | | |
| Tot. energy reported | 1.92 | 1.39 | 1.69 | 1.07 | 1.10 | 2.09 | 2.09 | 4.80 | 0.5 | — |
| Tot. energy w/o PUE | 1.15 | 1.39 | 1.69 | 1.07 | 1.10 | 1.35 | 1.32 | 4.80 | 0.5 | 2.10 |
| Energy for CPU | 0.09 | 0.22 | 0.22 | — | — | — | — | — | — | — |
| Energy for GPU | 0.69 | 1.14 | 1.09 | — | — | — | — | — | — | — |
| Energy for memory | 0.37 | 0.03 | 0.37 | — | — | — | — | — | — | — |
| Carbon emissions | 100 | 80 | 90 | 70 | 80 | 120 | 120 | 280 | 60 | — |
| **Gemini-1, 2 GPUs (1 h 17 min)** | | | | | | | | | | |
| Tot. energy reported | 0.76 | 0.36 | 0.61 | 0.35 | 0.37 | 0.59 | N/A | 0.77 | 0.45 | — |
| Tot. energy w/o PUE | 0.47 | 0.36 | 0.61 | 0.35 | 0.37 | 0.38 | N/A | 0.77 | 0.45 | N/A |
| Energy for CPU | 0.05 | 0 | 0.00 | — | — | — | — | — | — | — |
| Energy for GPU | 0.36 | 0.359 | 0.37 | — | — | — | — | — | — | — |
| Energy for memory | 0.06 | 0.008 | 0.24 | — | — | — | — | — | — | — |
| Carbon emissions | 40 | 20 | 34 | 24 | 25 | 34 | N/A | 51 | 38 | — |
| **Rosenblatt (3 h 16 min)** | | | | | | | | | | |
| Tot. energy reported | 1.77 | 1.07 | 1.12 | 0.89 | 0.99 | 1.71 | 1.75 | 1.63 | 0.84 | — |
| Tot. energy w/o PUE | 1.06 | 1.07 | 1.12 | 0.89 | 0.99 | 1.10 | 1.11 | 1.63 | 0.84 | N/A |
| Energy for CPU | 0.10 | 0.17 | 0.17 | — | — | — | — | — | — | — |
| Energy for GPU | 0.88 | 0.89 | 0.87 | — | — | — | — | — | — | — |
| Energy for memory | 0.08 | 0.02 | 0.08 | — | — | — | — | — | — | — |
| Carbon emissions | 90 | 60 | 60 | 60 | 70 | 100 | 100 | 90 | 100 | — |
| **Labri (1 h 13 min)** | | | | | | | | | | |
| Tot. energy reported | 0.80 | 0.76 | 0.79 | 0.69 | 0.72 | 1.16 | 1.17 | 0.9 | 0.3 | — |
| Tot. energy w/o PUE | 0.48 | 0.76 | 0.79 | 0.69 | 0.72 | 0.75 | 0.74 | 0.9 | 0.3 | 0.83 |
| Energy for CPU | 0.15 | 0.097 | 0.097 | — | — | — | — | — | — | — |
| Energy for GPU | 0.27 | 0.66 | 0.64 | — | — | — | — | — | — | — |
| Energy for memory | 0.06 | 0.03 | 0.056 | — | — | — | — | — | — | — |
| Carbon emissions | 41 | 42 | 44 | 47 | 48 | 68 | 65 | 36 | 24 | — |
| **Personal computer (1 h 49 min)** | | | | | | | | | | |
| Tot. energy reported | 0.37 | 0.34 | 0.35 | 0.25 | 0.27 | 0.52 | N/A | 0.45 | 0.46 | — |
| Tot. energy w/o PUE | 0.37 | 0.34 | 0.35 | 0.25 | 0.27 | 0.34 | N/A | 0.45 | 0.46 | 0.40 |
| Energy for CPU | 0.001 | 0.09 | 0.09 | — | — | — | — | — | — | — |
| Energy for GPU | 0.35 | 0.24 | 0.24 | — | — | — | — | — | — | — |
| Energy for memory | 0.02 | 0.01 | 0.02 | — | — | — | — | — | — | — |
| Carbon emissions | 19 | 19 | 19 | 17 | 18 | 30 | N/A | 18 | 54 | — |
| **Colab - Oregon (17 h, est.)** | | | | | | | | | | |
| Tot. energy reported | 1.22 | 1.49 | 1.56 | 1.03 | 1.82 | 0.96 | N/A | 1.19 | 0.36 | — |
| Tot. energy w/o PUE | 1.10 | 1.49 | 1.56 | 1.03 | 1.82 | 0.62 | N/A | 1.19 | 0.36 | N/A |
| Energy for CPU | 0.07 | 0.73 | 0.73 | — | — | — | — | — | — | — |
| Energy for GPU | 0.95 | 0.75 | 0.75 | — | — | — | — | — | — | — |
| Energy for memory | 0.08 | 0.02 | 0.08 | — | — | — | — | — | — | — |
| Carbon emissions | 199 | 206 | 216 | 184 | 328 | 369 | N/A | 100 | 72 | — |

Multiple GPUs. Cumulator only measures CPUs or GPUs, according to what is specified when creating the tracker. In both cases it considers a single unit of the hardware it is measuring, without checking how many CPUs or GPUs exist on the machine. MLCO2 also has a simplified view, only measuring the consumption of one GPU. The values reported in the tables were obtained by multiplying the returned value by the number of available GPUs. The reported values for one GPU for Cumulator and MLCO2 are very similar because they follow the same strategy. In the case of the personal computer or Colab, which have a single GPU, these two tools can be considered, but CPU consumption is then not measured. In addition, these tools only multiply the time consumed by the TDP, so they neither verify actual consumption nor compute usage factors.
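The gap between the simple time-times-TDP estimate and a usage-factor-weighted one can be illustrated with a sketch of both formulas. The per-component power model follows the Green-Algorithms approach described in this paper (including its 0.3725 W/GB memory constant); the figures in the example are illustrative only, not values from the tables.

```python
def naive_tdp_estimate_wh(runtime_h: float, n_gpus: int, gpu_tdp_w: float) -> float:
    """MLCO2/Cumulator-style estimate: time x TDP, with 100% usage assumed."""
    return runtime_h * n_gpus * gpu_tdp_w

def usage_weighted_estimate_wh(runtime_h, n_cores, core_tdp_w, cpu_usage,
                               n_gpus, gpu_tdp_w, gpu_usage,
                               memory_gb, pue=1.0, mem_w_per_gb=0.3725):
    """Green-Algorithms-style estimate weighting each component by its usage factor."""
    power_w = (n_cores * core_tdp_w * cpu_usage      # CPU cores at measured load
               + n_gpus * gpu_tdp_w * gpu_usage      # GPUs at measured load
               + memory_gb * mem_w_per_gb)           # memory power model
    return runtime_h * power_w * pue

# Illustrative comparison: 8 GPUs at 250 W for 2 h, but low average usage factors
naive = naive_tdp_estimate_wh(2, 8, 250)                  # assumes 100% GPU load
weighted = usage_weighted_estimate_wh(2, 40, 6.75, 0.16,
                                      8, 250, 0.14, 512)  # far lower estimate
```

With low usage factors, the naive estimate is several times the weighted one, which matches the overestimation observed for MLCO2 on Grid5000.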
Their results can only be useful when there is a single unit of the measured component (CPU or GPU for Cumulator, only GPU for MLCO2) and its usage factor is close to 100%.

Usage factor. The web calculator of Green-Algorithms and the server tool GA4HPC set the default usage factor to 100% CPU and GPU load if these data are not provided. This will overestimate power consumption in most cases. To be considered by Green-Algorithms, usage factors must be calculated by the user. The CPU usage factor can be calculated from the CPU time and the process time, but there is no easy way to get the GPU usage factor. Empirical values can be obtained from measurements with nvidia-smi while the algorithm is running, assuming the usage factor stays constant throughout the run. In this case, all the utilization percentage reported by nvidia-smi is assigned to the process, although other processes could be using the GPUs. In our study, since for both experiments the only process running on the GPUs was the one measured, we took one sample per epoch of the nvidia-smi output during code execution, and averaged the utilization percentages of all GPUs across all samples. Results are shown in table 12. We observe a low usage factor, especially on servers. As shown in table 11, MLCO2 seems to largely overestimate consumption on Grid5000, because it does not take the usage factor of the GPUs into account: the average usage factor is 14%, but MLCO2 assumes 100% for all 8 GPUs.
EIT queries and calculates usage factors during execution. Eco2AI only does this for the CPU, as it directly queries the consumed energy for the GPU. CodeCarbon and CarbonTracker directly query the consumed energy for both GPU and CPU, without using the usage factor.

5.2.2. Comparison between software tools and wattmeter
Wattmeters were present on the Labri server, the personal computer and Gemini-1. Table 13 summarizes the comparison.
For experiment 1, the wattmeters on the personal computer and the Labri server only made one measurement during the entire experiment, so the reported value may not be exact.
In the first experiment, the machine consumption reported by CodeCarbon is almost exactly the same as that reported by the wattmeter. For the second experiment the value is not as precise, but it is still more than 80% of the wattmeter value on all infrastructures. CodeCarbon is thus the measuring tool that comes closest to the wattmeters, followed by CarbonTracker, with more variability between infrastructures. Eco2AI and EIT report values well below the wattmeter: since these tools try to isolate the consumption of the process rather than measure the total consumption of the machine, their energy reports are not directly comparable with the wattmeter value.

Table 12. Usage factor of CPU and GPU in the infrastructures used. These computed values are used by Green-Algorithms.

| | CPU (exp. 1) | GPU (exp. 1) | CPU (exp. 2) | GPU (exp. 2) |
|---|---|---|---|---|
| Gemini-1 (Grid5000) | 5% | 0.3% | 16% | 14% |
| Gemini-1, 2 GPUs (Grid5000) | 12% | 1.5% | 58% | 46% |
| Server (Labri) | 9% | 1% | 73% | 35% |
| Rosenblatt (MAP5) | 16% | 1% | 39% | 54% |
| Personal computer | 22% | 3% | 4% | 77% |

Table 13. Comparison between software tools and wattmeter on Grid5000 (without considering PUE), the personal computer and the Labri server. Values represent the percentage of energy reported by each tool w.r.t. the value reported by the wattmeter.

| | CodeCarbon (M) | Eco2AI (M) | CarbonTracker | EIT |
|---|---|---|---|---|
| Exp. 1, Grid5000 | 96% | 55% | 66% | 13% |
| Exp. 2, Grid5000 | 80% | 60% | 64% | 63% |
| Exp. 1, personal comp. | 85% | 52% | 65% | N/A |
| Exp. 2, personal comp. | 88% | 68% | 85% | N/A |
| Exp. 1, Labri | 102% | 50% | 64% | 54% |
| Exp. 2, Labri | 95% | 87% | 90% | 89% |

5.3. Influence of infrastructures
We ran the same experiments on different infrastructures. For both experiments, power consumption is higher on larger infrastructures (e.g. Gemini-1).
As an example, the denoiser training experiment took 2 hours on Gemini-1 (Grid5000), while on Rosenblatt (MAP5) it took 3 hours and 16 minutes. The CPU usage factor was lower on Grid5000: 16% versus 39% on MAP5. The estimated GPU usage factor was also lower on Grid5000: 14.3% versus 54% on MAP5. The consumption reported on Gemini-1 by CodeCarbon (machine tracking) is 1.69 kWh, while on Rosenblatt it was 1.12 kWh; Rosenblatt's hardware is considerably smaller than Gemini-1's (see table 9).
It can also be seen that in experiment 2, on Labri, on the personal computer and on Gemini-1 when booking only 2 GPUs, the execution time was shorter than when executing on the entire Gemini-1 node. The longer execution on the whole node is most likely due to the parallelization strategy (using nn.DataParallel), which runs the training on all GPUs without requiring their full computing power. This is a good reason to use, as much as possible, hardware whose size is adapted to the experiment, so that resources are used as fully as possible, even if the experiments take more time. The Gemini-1 node has 8 GPUs, which neither of our experiments can exploit.

5.4. Data load
In the denoiser training experiment, we separately quantified the energy consumption of loading the data (the 6 GB ImageNet validation split) versus training the model, and found that only 0.5% of the energy was used to load the data. This is partly because the data was already on the server: the impact of downloading and storing the data is not measured.

5.5. Batch size
To study the impact of batch size during training, we used CodeCarbon during experiment 2 (denoiser) on the Gemini-1 node for 10 epochs.
Using three batch sizes (32, 64 and 128), we show that there is a tradeoff between energy used and runtime (table 14). While larger batch sizes led to faster runtimes, the largest energy usage was measured for the smallest batch size (32), closely followed by the largest one (128). In this situation, an intermediate batch size of 64 looks like a better compromise, combining a runtime not far off the shortest one and minimizing energy usage. When we decrease the batch size further, the experiment takes longer, and the idle consumption of the resources starts to weigh on the total consumption of the experiment. Comparing the GPU consumption with batch sizes 32 and 128, the batch-32 run consumes less on the GPU while taking almost 3 times longer; compared with the batch-64 run, however, its total consumption is higher, probably because it takes almost 10 minutes more and incurs the static consumption of the resources during that time. In conclusion, to reach a minimum energy consumption, a balance is required between the length of the experiment and the greater consumption of the GPU and memory.

5.6. Checkpoints
We found that checkpointing had no impact on energy consumption or runtime (table 15). We tested this on experiment 2 on Gemini-1, using CodeCarbon and a wattmeter. In the first scenario, the network parameters were saved every epoch (ten epochs in total); in the second, they were saved only once.

Table 14. Results of experiment 2 with different batch sizes. All consumption values are in Wh.

| | Batch size 32 | Batch size 64 | Batch size 128 |
|---|---|---|---|
| Total energy (CodeCarbon) | 252 | 184 | 246 |
| CPU (CodeCarbon) | 41 | 29 | 20 |
| GPU (CodeCarbon) | 205 | 152 | 224 |
| Memory (CodeCarbon) | 6 | 3 | 2.3 |
| Total energy (wattmeter) | 391 | 280.3 | 320 |
| Time spent (min) | 25:54 | 16:29 | 10:30 |
5.7. Variability of consumption through epochs
It is interesting to determine whether the energy consumption of a training phase can be extrapolated from the values observed on only a few epochs. To determine this, the denoiser training experiment was executed for different numbers of epochs on Gemini-1, and the time and energy consumption were measured. The results in figure 3 show that epoch duration and consumption are constant. It should therefore be possible to extrapolate the energy consumption of large experiments from runs of just a few epochs. The same conclusion was reached by Anthony et al (2020) using CarbonTracker.

5.8. Is measuring really eco-friendly?
To compare the extra energy consumption of the tools themselves, we ran two processes of experiment 2 in parallel, one with all seven trackers and one without any, and report the energy consumption provided by the wattmeter. We found that the code with trackers was almost 10% slower and ended 11 minutes later than the one without. The energy consumption during this extra time was 0.19 kWh, while it was 2.58 kWh for the time when both processes were running in parallel (+7.4%).
Another experiment tested each tracker one at a time. As in the previous test, we ran two processes in parallel, one with a given tracker and one without any, and measured energy with the wattmeter. Table 16 shows the results with 10 epochs. The additional energy is around 1% of the total consumption for all the tools, except for Eco2AI, where it reaches 3.5%, a value that is not negligible. We think that Eco2AI's larger consumption compared to the other tools comes from not using the RAPL files to obtain the memory and CPU consumption, but instead querying the operating system and then doing the calculations itself. Although other tools also proceed this way, none does so to calculate the energy of both resources.
From both experiments, we can conclude that measuring the processes has an impact, but a small one.
The first experiment, carried out with all the trackers, has a longer execution time, probably due to delays while accessing resources. It might be a good idea to use online tools such as Green-Algorithms, in order not to add load to the algorithm while still being able to measure the impact.

Figure 3. Duration and energy consumption after different numbers of epochs of experiment 2. All consumption values are in Wh.

Table 15. Results of the experiment with different checkpoint frequencies. Both runs last 10 epochs. In the left column, only one checkpoint is saved at the end of these epochs; in the right column, one checkpoint is saved per epoch. All consumption values are in Wh.

| | One checkpoint | Ten checkpoints |
|---|---|---|
| Total energy reported (CodeCarbon) | 161 | 160 |
| Energy for CPU (CodeCarbon) | 24 | 24 |
| Energy for GPU (CodeCarbon) | 134 | 133 |
| Energy for memory (CodeCarbon) | 3 | 3 |
| Total energy reported (wattmeter) | 206 | 206 |
| Time spent (min) | 14:10 | 13:47 |

5.9. Static and deployment consumption
All the tools discussed in this guide are limited to quantifying energy consumption while training a deep learning model. But infrastructures also use energy when nodes are idle or when the final solution is deployed. The authors of Luccioni et al (2022) studied static infrastructure emissions and deployment emissions when training BLOOM, a large language model, and found them to be substantial.
We measured the energy consumption of idle resources on Gemini-1 over the same period of time it takes to run experiment 1. In an idle situation, no process is run beyond those required by the operating system. We performed the same procedure with experiment 2 (executed for 10 epochs). The results are shown in table 17.
The idle power draw is around 745 W. The consumption of idle resources is high compared with the consumption reported during training: 84.4% for experiment 1 and 72.9% for experiment 2. Note that in both experiments the resources are not fully used; table 12 gives the percentage of CPU and GPU utilization during execution. This result is interesting: most of the consumption occurs simply by having the hardware available for use. We must therefore be very careful when leaving hardware on for availability, since the availability and immediacy of resources is very expensive in energy terms. When using hardware that we cannot power off when idle, such as the cloud or shared computers, we must remember that being able to reserve a given resource at any time comes with an additional consumption.

6. Discussions
This section summarizes our observations and anticipates questions that AI practitioners may have when starting to measure the energy consumption of their code.

6.1. When to measure impacts?
Contrary to tracking tools, online ones like Green-Algorithms make it possible to estimate consumption both after training, as concluded in Bannour et al (2021), and before training. Although the latter is less precise, it allows the environmental impacts of a project to be anticipated. If software tools are used and more than one run is performed, we recommend measuring only some of the runs. Given that the energy consumption of a training phase can be extrapolated from the values observed over only a few epochs, one can measure the consumption of the first epochs and then estimate the consumption of the whole training; the consumption attributable to the measurement itself will then be slightly lower.

Table 16. Results of running experiment 2 twice in parallel on Gemini-1: one process using trackers, the other without.
                                             CodeCarbon(P)   Eco2AI(P)   CarbonTracker   EIT     Cumulator
Run time w/ tracker (min)                    15:09           15:33       16:35           16:29   15:02
Run time w/o tracker (min)                   15:05           14:57       16:24           16:35   14:49
Extra time with tracker (min)                0:04            0:36        0:11            −0:06   0:13
Energy cons. when 2 processes running (Wh)   335.5           334         358             358.5   331.6
Energy cons. during extra time (Wh)          3.1             12.2        4.29            0       5.4
Percentage of overload (%)                   0.92            3.5         1.2             0       1.6

Table 17. Static (idle) and dynamic energy consumption measured with the wattmeter.

                           Time (mm:ss)   Energy consumption (Wh)
Experiment 1               00:53          12.96
Idle                       00:53          10.95
Experiment 2 (10 epochs)   16:29          280.3
Idle                       16:29          204.4

6.2. Which tools to use
Estimating power consumption with software tools adds a small load, so it might be a good idea to use online tools like Green-Algorithms. Green-Algorithms is the most versatile tool, as it can be used on different infrastructures and with different brands of CPUs and GPUs. However, online tools require manual intervention to gather the information and may be less precise. A first step to remedy this is the tool GA4HPC, which obtains the resource reservation data of a job on clusters that use SLURM as workload manager. MLCO2 is also an online tool but is much more limited: it only accounts for GPU consumption, and the value returned must be correctly weighted by the number of GPUs and the actual execution time of the algorithm.

If software tools are preferred, we found CodeCarbon to be the best of those studied for estimating the total consumption of the machine. Its reported consumption is more accurate when the RAPL files are accessible, but a strength of the tool is that it can be used without access to them. Conversely, to isolate the consumption of a single process with software tools, Eco2AI and EIT are the ones that attempt to do so. Eco2AI does not require access to the RAPL files and is actively maintained and updated.
By contrast, EIT requires access to the RAPL files, and the code must be modified to use the tool.

6.3. Which infrastructure to use
Since the idle consumption of resources is a large share of the total, we recommend keeping available only the resources needed, so as to achieve a high usage factor and a minimum idle consumption, even if the execution time becomes longer. With supercomputers, we recommend requesting only the necessary resources and, where adequate, sharing the infrastructure with other users' processes. If possible, turn off personal computers or servers as soon as computation is done. When using cloud infrastructure, choose, as far as possible, data centers that have the lowest PUE and that are located in areas where electricity has low greenhouse gas emissions. We also recommend scheduling training at low-emission hours; carbon-aware schedulers such as CATS, grid-intensity-go or carbon-aware-scheduler can help with this.

6.4. Other impacts
In this paper we have focused only on the energy consumption, and associated greenhouse gas emissions, of training AI models. This is only a small part of the total energy consumption over the complete life cycle of an AI service. For the training phase, an AI practitioner generally trains the model several times, and complete training emissions should account for all runs. In Green-Algorithms, multiple runs associated with retraining can be modelled with the 'pragmatic scaling factor' parameter. As mentioned in previous studies (Bannour et al 2021, Luccioni et al 2022, Wu et al 2021), energy consumption is underestimated, since all the tools only measure consumption during training and not during deployment. Studies (Wu et al 2021, Luccioni et al 2022) have measured the consumption of deployment phases, which can be much higher than that of training. Here again, choosing appropriate resources to achieve a high usage factor seems essential.
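Putting the ingredients above together, the footprint of a full training campaign is the per-run energy scaled by the number of runs (the 'pragmatic scaling factor' idea), the PUE of the data centre, and the carbon intensity of the local grid. A minimal sketch; the numerical values are illustrative, not measurements from this paper:

```python
def campaign_footprint_gco2e(energy_per_run_kwh, n_runs, pue, intensity_gco2e_per_kwh):
    """Footprint of a training campaign: per-run energy, scaled by the
    number of runs and the data-centre PUE, then converted with the
    grid carbon intensity (gCO2e per kWh)."""
    return energy_per_run_kwh * n_runs * pue * intensity_gco2e_per_kwh

# hypothetical: 5 kWh per run, 20 runs, PUE 1.5, grid at 300 gCO2e/kWh
print(campaign_footprint_gco2e(5.0, 20, 1.5, 300))  # 45000.0 gCO2e, i.e. 45 kgCO2e
```

The same formula makes the levers of section 6.3 explicit: lowering the PUE or choosing a low-carbon grid reduces the result multiplicatively.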
Many other environmental impacts (resource depletion, ecotoxicity, etc) linked to the life cycle of equipment (manufacturing, transport, distribution, use, end of life) are not discussed here and should be investigated. Even for the carbon footprint alone, computing embodied emissions is a challenge, since not all data are made public by manufacturers. Under several assumptions, the authors of Luccioni et al (2022) estimate embodied emissions to equal half those of training. Dataset creation, transfer and storage are also very important aspects of AI. Malmodin and Lundén (2016) estimate 0.023 kWh per GB for transferring data on the IP core network. For storage, estimates vary. Following a Seagate measurement,9 Lannelongue and Inouye (2023) consider the carbon footprint of storing 1 terabyte of data to be of the order of 10 kgCO2e per year, while another study (Gröger et al 2021) mentions 52 Wh for storing one gigabyte for one year. For more on energy management techniques for database systems, we refer the reader to the systematic review of Guo et al (2022).

6.5. Predicting impacts
Systematically estimating the carbon footprint of AI projects can raise awareness, encourage the development of energy-efficient software and limit the waste of resources (Lannelongue and Inouye 2023). Importantly, these impacts should be anticipated before the start of a project. The authors of Lefèvre et al (2023) propose a list of criteria for assessing the environmental impacts of projects involving artificial intelligence (AI) methods.

9 https://seagate.com/gb/en/global-citizenship/product-sustainability/
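The data-transfer and storage estimates quoted in section 6.4 translate into a simple back-of-envelope calculation. A sketch under those published estimates (the dataset size and grid intensity below are hypothetical, and both per-unit figures carry large uncertainties):

```python
TRANSFER_KWH_PER_GB = 0.023        # Malmodin and Lundén (2016), IP core network
STORAGE_KGCO2E_PER_TB_YEAR = 10.0  # order of magnitude, Lannelongue and Inouye (2023)

def dataset_footprint_kgco2e(size_gb, years_stored, grid_gco2e_per_kwh):
    """Rough transfer-plus-storage footprint of a dataset, in kgCO2e."""
    transfer = size_gb * TRANSFER_KWH_PER_GB * grid_gco2e_per_kwh / 1000.0
    storage = (size_gb / 1000.0) * STORAGE_KGCO2E_PER_TB_YEAR * years_stored
    return transfer + storage

# hypothetical: 500 GB transferred once, stored for 2 years, grid at 300 gCO2e/kWh
print(round(dataset_footprint_kgco2e(500, 2, 300), 2))  # 13.45 kgCO2e
```

Such order-of-magnitude figures are exactly what anticipating a project's impacts before it starts requires.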
In addition to measuring while training or deploying an AI model, AI users should try to anticipate as much as possible the impacts their computations are likely to have, as well as the behavioral, economic, or societal changes that might be induced by the project. In the same line, Wilson and van der Velden (2022) review ethics, explainability, responsibility and accountability concepts in AI, and propose a model for sustainable AI in the public sector.

7. Conclusion
In this paper we have presented and analyzed seven existing tools for estimating energy consumption when training a deep learning model. We have explained the specificities of each tool and detailed the notions that may not be well known by AI practitioners. From our study, we have drawn analyses and recommendations in the previous sections. Note that our two experiments involved training regular CNNs for image processing and analysis; we believe the main results would hold for other types of architecture, as carbon footprint estimators have shown the same behavior for other applications and workloads in Jay et al (2023), Bannour et al (2021) and Dodge et al (2022). We have highlighted the advantages and limits of online tools, and shown that the choice of software tool depends on the infrastructure and on whether one wants to measure the whole node or the process only. We have also shown that measuring with software tools has a small impact that can become non-negligible for large experiments. We observed that consumption is constant through epochs, so measuring only a few epochs and extrapolating can be sufficient. We have confirmed the importance of training models on infrastructure that is scaled to the need, not booking a whole node when unnecessary. Finally, all these tools measure only the dynamic energy consumption of computing, and further studies are required to include static consumption and broader environmental impacts.
Acknowledgments
This study has been carried out with financial support from the French Research Agency through the PostProdLEAP project (ANR-19-CE23-0027-01). Loïc Lannelongue was supported by core funding from the British Heart Foundation (RG/18/13/33946); the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014; NIHR203312)[*]; the Cambridge British Heart Foundation Centre of Research Excellence (RE/18/1/34212); and the BHF Chair Award (CH/12/2/29428). *The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. The authors thank Michael Clément and Boris Mansencal for running experiments at LaBRI and on a personal computer. We also thank Mathilde Jay, Denis Trystram, Laurent Lefèvre and Anne-Laure Ligozat for fruitful discussions.

Data availability statement
No new data were created or analysed in this study.

Appendix A. Methodologies to estimate energy consumption of CPUs and GPUs
This appendix describes the two methods used to estimate the energy consumption of CPUs and GPUs.

Knowing the model of the CPU or GPU, the first method multiplies the TDP provided by the manufacturer by the duration of training to obtain the energy used in kWh. TDP (Thermal Design Power) is a specification that indicates the maximum amount of power a processor (CPU or GPU) can dissipate when operating at maximum performance; it refers to the power consumption under the maximum theoretical load. In general, CPUs with a higher number of cores have a higher TDP because they require more power to operate at maximum performance. However, the relationship between TDP and the number of cores is not always straightforward: some CPUs may have a higher TDP even though they have fewer cores, because they are designed to operate at a higher clock speed or have a less efficient architecture.
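The first (TDP-based) method reduces to a multiplication; a minimal sketch follows, with an optional usage factor to scale the theoretical maximum down to the observed load (the 250 W GPU and 10 h runtime are hypothetical values):

```python
def tdp_energy_kwh(tdp_watts, runtime_hours, usage_factor=1.0):
    """TDP-based estimate: power (W) x usage factor x runtime (h) -> kWh.
    With usage_factor=1 this is an upper bound (maximum theoretical load)."""
    return tdp_watts * usage_factor * runtime_hours / 1000.0

# hypothetical: a 250 W GPU assumed fully used for 10 hours
print(tdp_energy_kwh(250, 10))  # 2.5 kWh
```

Because TDP describes the maximum dissipation, estimates without a measured usage factor should be read as upper bounds rather than measurements.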
The second method uses the Intel RAPL (Running Average Power Limit) system management interface integrated in Intel CPUs, or the Power Gadget tool. RAPL allows software to monitor and control the power usage of the processor and its components, such as the CPU cores, memory controllers and integrated GPUs. The Linux powercap driver can expose the RAPL hardware energy counters through a set of files accessible via the Linux file system. These files make it possible to read the current power usage of the processor and its components, as well as to set power limits to control power usage. Drivers are being developed to expose the RAPL interface on Windows; a recent implementation is the windows-rapl-driver10 from the Scaphandre project (Petit 2021).

10 https://github.com/hubblo-org/windows-rapl-driver

Power Gadget is a standalone software application developed by Intel that provides real-time monitoring of the power usage of Intel processors. It does not rely on the RAPL files, but rather uses its own proprietary methods to access and analyze power consumption data. Power Gadget presents power consumption data in a user-friendly graphical interface that displays the real-time power usage of the processor, CPU cores, memory controller, and other components. This tool can be used on Windows and macOS.

Appendix B. Bug fixes for some software tools
Some tools must be modified before use, as they have bugs that have not been fixed by the authors. Here are the changes to make for each one.

B.1. Experiment-impact-tracker
• The PyPI package is not the latest version and does not correspond to the documentation (issue).
• getiterator in file/gpu/nvidia.py must be changed to iter.
• For long runs, the INFO log level is too heavy; change it to the ERROR level.
• If you have other experiment-impact-tracker logs in the same folder or its subfolders, correct the data_interface.py file so that results are only read from the chosen logs folder.

B.2. Cumulator
Correct the imports in base.py (the structure defined in this file does not correspond to the structure of the package; issue).

B.3. CarbonTracker
Correct the use of the decode function, deprecated in Python 3.10, in file carbontracker/components/gpu/nvidia.py.

Appendix C. Neural network architectures of experiments
The neural network architecture of Experiment 1 is a fully connected network with a single hidden layer of 32 neurons and an output layer of 10 neurons; figure C1 shows the architecture. The architecture of Experiment 2 is the DnCNN network presented in Ryu et al (2019); figure C2 shows the architecture proposed in the original paper, which is the one we used in the experiment.

Figure C1. Experiment 1 network architecture.

ORCID iDs
Aurélie Bugeau https://orcid.org/0000-0002-4858-4944

References
Anthony L F W, Kanding B and Selvan R 2020 Carbontracker: tracking and predicting the carbon footprint of training deep learning models arXiv:2007.03051
Arias P et al 2021 Climate change 2021: the physical science basis.
Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; technical summary IPCC
Bannour N, Ghannay S, Névéol A and Ligozat A-L 2021 Evaluating the carbon footprint of NLP methods: a survey and analysis of existing tools Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing 11–21
Budennyy S et al 2022 Eco2AI: carbon emissions tracking of machine learning models as the first step towards sustainable AI Doklady Mathematics (Moscow: Pleiades Publishing) 106 S118–S128
Deng L 2012 The MNIST database of handwritten digit images for machine learning research IEEE Signal Process. Mag. 29 141–2
Deng J, Dong W, Socher R, Li L-J, Li K and Fei-Fei L 2009 ImageNet: a large-scale hierarchical image database 2009 IEEE Conference on Computer Vision and Pattern Recognition (IEEE) 248–55
Dodge J et al 2022 Measuring the carbon intensity of AI in cloud instances ACM Conference on Fairness, Accountability, and Transparency 1877–94
Ember 2022 Global electricity review 2022 https://ember-climate.org/insights/research/global-electricity-review-2022/
Gröger J, Liu R, Stobbe L, Druschke J and Richter N 2021 Green cloud computing: life cycle based data collection on environmental impacts of cloud computing https://umweltbundesamt.de/sites/default/files/medien/5750/publikationen/2021-06-17_texte_94-2021_green-cloud-computing.pdf
Guo B, Yu J, Yang D, Leng H and Liao B 2022 Energy-efficient database systems: a systematic survey ACM Computing Surveys 55 1–53
Gupta U et al 2021 Chasing carbon: the elusive environmental footprint of computing IEEE International Symposium on High-Performance Computer Architecture 42 854–67
Gupta A, Lanteigne C and Kingsley S 2020 SECure: a social and environmental certificate for AI systems arXiv:2006.06217
Henderson P, Hu J, Romoff J, Brunskill E, Jurafsky D and Pineau J 2020a Towards the systematic reporting of the energy and carbon footprints of machine learning Journal of Machine Learning Research 21 10039–81
Hodak M, Gorkovenko M and Dholakia A 2019 Towards power efficiency in deep learning on data center hardware 2019 IEEE International Conference on Big Data (Big Data) 1814–20
Jay M, Ostapenco V, Lefèvre L, Trystram D, Orgerie A-C and Fichel B 2023 An experimental comparison of software-based power meters: focus on CPU and GPU IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (https://doi.org/10.1109/CCGrid57682.2023.00020)
Kaack L H, Donti P L, Strubell E, Kamiya G, Creutzig F and Rolnick D 2022 Aligning artificial intelligence with climate change mitigation Nature Climate Change 12 518–27
Kar A K, Choudhary S K and Singh V K 2022 How can artificial intelligence impact sustainability: a systematic literature review Journal of Cleaner Production 134120
Karyakin A and Salem K 2017 A survey of main-memory energy efficiency techniques Proceedings of the 13th International Workshop on Data Management on New Hardware (DaMoN) (ACM) 1–9
Lacoste A, Luccioni S, Schmidt V and Dandres T 2019 Quantifying the carbon emissions of machine learning arXiv:1910.09700
Lannelongue L and Inouye M 2023 Carbon footprint estimation for computational research Nature Reviews Methods Primers 3
Lannelongue L, Grealey J and Inouye M 2021 Green algorithms: quantifying the carbon emissions of computation Advanced Science 8 2100707
Lawrence A 2019 Is PUE actually going up? Uptime Institute Blog
Lefèvre L et al 2023 Environmental assessment of projects involving AI methods hal-03922093
Ligozat A-L and Luccioni S 2021 A practical guide to quantifying carbon emissions for machine learning researchers and practitioners Research Report, MILA; LISN
Ligozat A-L, Lefevre J, Bugeau A and Combaz J 2022 Unraveling the hidden environmental impacts of AI solutions for environment: life cycle assessment of AI solutions Sustainability 14 5172
Lottick K, Susai S, Friedler S A and Wilson J P 2019 Energy usage reports: environmental awareness as part of algorithmic accountability arXiv:1911.08354
Luccioni S, Viguier S and Ligozat A-L 2022 Estimating the
carbon footprint of BLOOM, a 176B parameter language model arXiv:2211.02001
Maevsky D A, Maevskaya E J and Stetsuyk E D 2017 Evaluating the RAM energy consumption at the stage of software development Green IT Engineering: Concepts, Models, Complex Systems Architectures (Springer) pp 101–21

Figure C2. DnCNN network architecture. Image taken from Ryu et al (2019).
Malmodin J and Lundén D 2016 The energy and carbon footprint of the ICT and E&M sector in Sweden 1990–2015 and beyond ICT for Sustainability (Atlantis Press) pp 209–18
Moro A and Lonza L 2018 Electricity carbon intensity in European member states: impacts on GHG emissions of electric vehicles Transportation Research Part D: Transport and Environment 64 5–14
Petit B 2021 Scaphandre version v0.3 https://hubblo-org.github.io/scaphandre-documentation/references/sensor-powercap_rapl.html
Rolnick D et al 2022 Tackling climate change with machine learning ACM Computing Surveys 55 1–96
Ryu E K, Liu J, Wang S, Chen X, Wang Z and Yin W 2019 Plug-and-play methods provably converge with properly trained denoisers International Conference on Machine Learning
Strubell E, Ganesh A and McCallum A 2019 Energy and policy considerations for deep learning in NLP arXiv:1906.02243
Thompson N C, Greenewald K, Lee K and Manso G F 2020 The computational limits of deep learning arXiv:2007.05558
The Shift Project 2019 Lean ICT: towards digital sobriety The Shift Project
Trebaol M J T, Hartley M-A and Ghadikolaei H S 2020 A tool to quantify and report the carbon footprint of machine learning computations and communication in academia and healthcare Infoscience EPFL: record 278189
Uptime Institute 2022 2022 data center industry survey Uptime Institute
Vinuesa R et al 2020 The role of artificial intelligence in achieving the sustainable development goals Nature Communications 11 233
Wu C-J et al 2022 Sustainable AI: environmental implications, challenges and opportunities Machine Learning and Systems 795–813 arXiv:2111.00364
Wilson C and van der Velden M 2022 Sustainable AI: an integrated model to guide public sector decision-making Technology in Society 68 101926