This is the Danish Data Science Community's guide to resources on sustainable data science. In this context, sustainable data science refers to the sustainability of data science as a field from a climate and environmental perspective, i.e. how to make the field itself more sustainable, for instance by reducing the energy consumption of machine learning model training. Thus, this is not a guide to resources on data science applied to sustainability use cases. By "data science" we mean a broad field encompassing machine learning, statistics, operations research etc., but also data engineering and other related disciplines.
Feel free to create an issue or make a PR if you want to contribute.
Tools for measuring energy consumption and/or CO2e from compute
- carbontracker: A Python package that measures the energy consumption and estimates the carbon footprint of code, as described in Anthony et al. (2020).
- codecarbon: A Python package that estimates the amount of carbon dioxide (CO2) produced by the cloud or personal computing resources used to execute code. It shows developers how they can reduce emissions by optimizing their code or by hosting their cloud infrastructure in geographical regions that use renewable energy sources. It is similar to carbontracker but, unlike carbontracker (last updated in 2021), it is actively maintained and developed.
- ML CO2 Impact: A web application where you choose your hardware, runtime and cloud provider to estimate the carbon impact of your model training. The calculator gives you two numbers: the raw carbon emissions produced and the approximate offset carbon emissions. The latter depends on the grid used by the cloud provider.
- AWS Customer Carbon Footprint Tool: Track, measure, review, and forecast the carbon emissions generated from your AWS usage.
- GCP Carbon Footprint: Measure, report, and reduce your cloud carbon emissions.
- Azure Emissions Impact Dashboard: Estimate your carbon emissions related to using Microsoft cloud services, including Azure and Microsoft 365.
- ThoughtWorks Cloud Carbon Footprint: Free and open source cloud carbon emissions measurement and analysis tool from ThoughtWorks.
- Green Algorithms calculator: A web-based calculator similar to ML CO2 Impact.
- Scaphandre: "The goal of the project is to permit to any company or individual to measure the power consumption of its tech services and get this data in a convenient form, sending it through any monitoring or data analysis toolchain"
- Carbon Aware SDK: "The Carbon Aware SDK helps you build the carbon aware software solutions with the intelligence to use the greenest energy sources."
- Kepler: Kepler Exporter exposes energy consumption statistics from an application running in a Kubernetes cluster.
- JoularJX: "JoularJX is a Java-based agent for power monitoring at the source code level with support for modern Java versions and multi-OS to monitor power consumption of hardware and software."
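Many of the estimation tools above, and in particular the web calculators (ML CO2 Impact, Green Algorithms), rest on the same basic arithmetic: the power drawn by the hardware multiplied by the runtime, scaled by the data centre's power usage effectiveness (PUE), gives energy; energy multiplied by the carbon intensity of the local grid gives CO2e. The Python sketch below roughly follows the form used in the Green Algorithms paper. It is a simplified illustration, not the implementation of any listed tool: `ComputeJob`, `energy_kwh` and `carbon_g` are hypothetical names, every number in the usage example is an illustrative assumption rather than a measured value, and embodied emissions from manufacturing hardware are ignored.

```python
from dataclasses import dataclass


@dataclass
class ComputeJob:
    """Hypothetical description of a compute job; every field is an input you supply."""
    runtime_h: float         # wall-clock runtime in hours
    n_cores: int             # number of processing units (CPU cores or GPUs)
    power_per_core_w: float  # power draw per processing unit in watts
    core_usage: float        # average utilisation of those units, in [0, 1]
    memory_gb: float         # memory allocated in GB
    power_per_gb_w: float    # power draw per GB of memory in watts


def energy_kwh(job: ComputeJob, pue: float) -> float:
    """Energy in kWh: runtime x (processor power + memory power) x PUE."""
    power_w = (job.n_cores * job.power_per_core_w * job.core_usage
               + job.memory_gb * job.power_per_gb_w)
    return job.runtime_h * power_w * pue / 1000.0  # watt-hours -> kWh


def carbon_g(energy: float, carbon_intensity_g_per_kwh: float) -> float:
    """Emissions in grams CO2e: energy x carbon intensity of the grid."""
    return energy * carbon_intensity_g_per_kwh


if __name__ == "__main__":
    # Illustrative inputs only; replace with your own hardware specs and measurements.
    job = ComputeJob(runtime_h=10, n_cores=8, power_per_core_w=12.0,
                     core_usage=1.0, memory_gb=32, power_per_gb_w=0.4)
    e = energy_kwh(job, pue=1.5)                         # about 1.632 kWh
    co2 = carbon_g(e, carbon_intensity_g_per_kwh=200.0)  # about 326.4 g CO2e
    print(f"{e:.3f} kWh, {co2:.1f} g CO2e")
```

Because the estimate scales linearly with grid carbon intensity, running the same job in a region whose grid has half the carbon intensity halves the estimated emissions, which is the reasoning behind codecarbon's suggestion to host workloads in regions that use renewable energy sources.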
Academic and industry papers about the environmental / climate impact of data science, machine learning and related topics.
- Green AI
- Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning
- Carbon Emissions and Large Neural Network Training
- Hyperparameter Power Impact in Transformer Language Model Training
- The carbon impact of artificial intelligence
- Towards Power Efficiency in Deep Learning on Data Center Hardware
- Quantifying the Carbon Emissions of Machine Learning
- Power Consumption Variation over Activation Functions
- Impacts of software and its engineering on the carbon footprint of ICT
- Energy Efficiency across Programming Languages
- Energy and Policy Considerations for Deep Learning in NLP
- Green Algorithms: Quantifying the carbon footprint of computation
- Measuring the Environmental Impacts of Artificial Intelligence Compute and Applications
A list of scientific journals covering sustainable data science and related topics.
Talks, presentations, debates etc. about sustainable data science and the environmental / climate impact of machine learning.
Podcast shows and episodes about sustainable data science, the environmental impact of machine learning and related topics.
- Sustainable data science / GreenAI podcasts - a Spotify playlist with podcast episodes on the topic
- Environment Variables - A show by the Green Software Foundation: "Each episode we discuss the latest news regarding how to reduce the emissions of software and how the industry is dealing with its own environmental impact"
News, blogs and other written material (excluding published papers) about the environmental / climate impact of data science, machine learning and software engineering.
- Accenture, GitHub, Microsoft and ThoughtWorks Launch the Green Software Foundation with the Linux Foundation to Put Sustainability at the Core of Software Engineering
- Sustainable Software Engineering (SSE) and the role and responsibilities of a Sustainable Software Engineer
- The current state of affairs and a roadmap for effective carbon-accounting tooling in AI
- Carbon proxies: Measuring the greenness of your application
- How to measure the energy consumption of your apps
- A.I.’s carbon footprint is big, but easy to reduce, Google researchers say
- A Practical Guide to Quantifying Carbon Emissions for Machine Learning researchers and practitioners
- How Green Is Your Software?
- Most of computing’s carbon emissions are coming from manufacturing and infrastructure
- Your Software’s Carbon Footprint
- Evaluating the carbon footprint of a software platform hosted in the cloud
- The 10 most energy efficient programming languages
- 8 great podcast episodes on the climate impact of machine learning
- Beyond Single-Dimensional Metrics for Digital Sustainability
- GPS-UP: A Better Metric for Comparing Software Energy Efficiency
- Towards a Fossil-Free Internet: The Fog of Enactment - Priority areas for research in digital sustainability
- Sustain - Sustainable AI in practice
- How to estimate and reduce the carbon footprint of machine learning models