Methodology · 2026

Where does AI fit in sustainability work — and where doesn't it?

The past year has brought a flood of studies on AI exposure — for tasks, for jobs, for whole occupations. There's still no shared methodology. The approaches diverge sharply: in their logic, their granularity, and how well they survive contact with real work.

We worked through them and built our own approach — aimed squarely at the workflows that actually show up in sustainability work: double materiality assessments, GHG inventories, CSRD reporting, supply-chain due diligence.


Section 1 · What's out there

People come at this question from different angles.

Eloundou et al. (OpenAI/NBER, 2023) break occupations down into small tasks. "Tax Preparer", for instance, includes things like "compute taxes owed by following tax code." Then they ask, task by task: could AI do this at the same quality, at least twice as fast? Experts answer some, GPT-4 answers others. They aggregate up to the occupation.
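
The roll-up from task to occupation can be sketched in a few lines of Python. This is a simplified illustration, not the paper's exact rubric: the `Task` class, the boolean `exposed` label, the unweighted share, and the sample labels are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    description: str
    exposed: bool  # hypothetical label: same quality, at least twice as fast


def occupation_exposure(tasks: list[Task]) -> float:
    """Share of an occupation's tasks labelled as exposed (unweighted)."""
    if not tasks:
        raise ValueError("an occupation needs at least one task")
    return sum(task.exposed for task in tasks) / len(tasks)


# Illustrative labels only, not values from the study.
tax_preparer = [
    Task("compute taxes owed by following tax code", True),
    Task("interview clients about their finances", False),
    Task("check forms for arithmetic errors", True),
    Task("advise clients on unusual filings", False),
]

print(occupation_exposure(tax_preparer))  # 0.5
```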

Webb (Stanford, 2019/2020) goes the other way around. He reads recent AI patents — "automated visual inspection of welds," that sort of thing — and lays them over job descriptions. If a patented task shows up in a job, that's a signal: someone has already put money into making it automatable.

Felten, Raj, Seamans (NBER, 2018–2024) start with the capabilities — image recognition, translation, language modeling — and match them against task requirements from O*NET, the US database of 800-plus occupations. The more capabilities that map onto a task, the more AI-exposed it is.

McKinsey (MGI, 2017 / GenAI update 2023) combines both moves. They break 800 occupations into roughly 2,000 work activities and rate them against 18 capabilities across five groups: sensory, logical, language, social-emotional, physical. Each capability gets a score from 0 (not needed) to 3 (top-quartile human performance). An activity is "technically automatable" when current tech clears the bar on everything it needs. In the 2023 update, McKinsey estimates that up to 70 % of work hours are technically automatable.
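
The "clears the bar on everything it needs" rule is an all-or-nothing check, which a short Python sketch makes concrete. The capability names, the levels in `CURRENT_TECH`, and the two sample activities are hypothetical; only the 0 to 3 scale and the rule itself come from the description above.

```python
# Hypothetical capability names and levels on the 0-3 scale described above.
CURRENT_TECH = {
    "natural_language_understanding": 2,
    "logical_reasoning": 2,
    "social_emotional_sensing": 1,
    "fine_motor_skills": 1,
}


def technically_automatable(required: dict[str, int], tech: dict[str, int]) -> bool:
    """True when technology meets or exceeds every required capability level.

    A capability required at level 0 is not needed and never blocks automation.
    """
    return all(tech.get(name, 0) >= level for name, level in required.items())


# Illustrative activities with assumed requirement levels.
draft_summary = {"natural_language_understanding": 2, "logical_reasoning": 1}
negotiate_contract = {"natural_language_understanding": 2, "social_emotional_sensing": 3}

print(technically_automatable(draft_summary, CURRENT_TECH))       # True
print(technically_automatable(negotiate_contract, CURRENT_TECH))  # False
```

A single capability below the required level is enough to block the whole activity, which is why this style of rubric is sensitive to how finely the capabilities are defined.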

Looking at the task itself

A different school skips the occupation and the capability and asks about the task: which properties make a task automation-friendly?

Brynjolfsson, Mitchell, and Rock at MIT were early here with their "Suitability for Machine Learning" rubric. Eight criteria — clearly defined inputs and outputs, training data on hand, and so on. One idea stuck: the more structured a task, the more amenable it is to AI.

BCG and Bain took it further. BCG lists five criteria, among them "no significant physical presence" and "rule-based, traceable result." Bain adds six "Agentic Automation Feasibility Factors," including output verifiability (can you cheaply check the result?) and integration and orchestration (how many systems does the AI need to talk to?). Bain's contribution: don't just ask whether the model can do the task in theory; ask whether it can handle it inside the company's actual tool stack.

This is the line we extend for sustainability work. It's the right starting point because it generalises: the same questions work across very different workflows, which lets us compare them on the same terms.

Section 2 · A wider evaluation base

BCG's five and Bain's six are a strong core. For sustainability work we add more dimensions on equal footing, drawing on a deeper review across 41 institutions — academia, AI labs, standard-setters, consultancies, law firms, international bodies.

That gave us 17 task properties, each traceable back to a clear research source.

A few of the additions:

Atypicality. Has the model seen this kind of material before? Work in an established sector with mature reporting patterns scores low — the training data is full of close relatives. Work under fresh regulation — the Revised ESRS draft from May 2026, the CSDDD rollout — scores high. The material is too new to be in training. Sources: Cambridge ADeLe, MIT Brynjolfsson SML, METR Time-Horizon studies, NBER, OECD AI Capability Indicators.

Cognitive load. How many reasoning steps to a solid answer? Assigning an emission factor is usually one. A full risk-and-opportunity assessment across several value-chain stages is many, nested. Today's frontier models tend to lose reliability after a few chained steps. Sources: Cambridge ADeLe, METR Time-Horizon, MIT SML, OECD AI Capability Indicators, Stanford HAI.
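
One way to see why chained steps hurt: if each step succeeded independently with the same probability, the chance of getting the whole chain right would shrink geometrically. The sketch below is a toy model under exactly that assumption; the 95 % per-step figure is illustrative, not a measurement of any model.

```python
def chain_reliability(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


# Assumed per-step reliability of 0.95, for illustration only.
for steps in (1, 5, 10, 20):
    print(steps, round(chain_reliability(0.95, steps), 2))
```

Under these assumptions a single-step task keeps its 0.95, while a 20-step chain drops to roughly 0.36. Real reasoning steps are not independent, so this is an intuition pump rather than a forecast.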

Knowledge depth. How much specialist knowledge does it take? Topic screening only needs ESRS basics — public guidance covers it. Setting thresholds also needs sector experience, an assurance lens, and audit standards — much harder to rebuild from public sources alone. Sources: Cambridge ADeLe, OECD AI Capability Indicators, MIT, Stanford HAI.

17 TASK dimensions in total.

Putting it together

Like Eloundou et al., we break each workflow — a double materiality assessment, for example — into individual tasks, at a granularity that's fine enough to expose real differences but coarse enough to stay readable.

Then we score each task against the dimensions. That gives us a concrete read on automation potential, and an aggregate picture per workflow.
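
A minimal sketch of what scoring a task against the dimensions could look like. It uses only three of the 17 TASK dimensions, and the 1 to 5 scale (5 = strongest obstacle), the equal weighting, the function names, and the sample scores are assumptions for illustration; the actual scale and weights are not specified here.

```python
from statistics import mean

# Three of the 17 TASK dimensions named in this article. The 1-5 scale
# (5 = strongest obstacle to automation) and equal weighting are assumptions.
DIMENSIONS = ("atypicality", "cognitive_load", "knowledge_depth")


def task_potential(scores: dict[str, int]) -> float:
    """Map obstacle scores to a 0-1 potential, where 1.0 means no obstacles."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    average_obstacle = mean(scores[d] for d in DIMENSIONS)
    return (5 - average_obstacle) / 4


def main_obstacle(scores: dict[str, int]) -> str:
    """Name the dimension that gets in the way most."""
    return max(DIMENSIONS, key=lambda d: scores[d])


# Hypothetical scores for two sub-tasks of a double materiality assessment.
workflow = {
    "topic screening": {"atypicality": 2, "cognitive_load": 1, "knowledge_depth": 1},
    "setting thresholds": {"atypicality": 4, "cognitive_load": 4, "knowledge_depth": 5},
}

for name, scores in workflow.items():
    print(f"{name}: potential {task_potential(scores):.2f}, "
          f"main obstacle {main_obstacle(scores)}")
```

Keeping the per-dimension scores next to the summary value is the design choice that matters: the summary says how promising a task is, the profile says why.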

What comes out isn't a single percentage. It's a clearer answer to the questions that actually matter: which steps are worth automating? Which task properties help? Which get in the way? That lets you design workflows where humans and machines play to each other's strengths, and avoid the trap of pilots that never reach production.


Section 3 · What it takes to realise the potential

Even a task that's automatable on paper needs three more things to work in practice:

  1. the right capabilities in today's models,
  2. the right setup inside the organisation,
  3. the right governance.

What the models can do

If the model can't do what the task needs, results get unreliable fast. Model capabilities move quickly, so we re-score this layer every quarter — what sits on the edge today often runs robustly a year out.

Nine criteria for model capability. The clearest examples:

What the organisation brings

The setup inside the company matters just as much. Is the data digital? How strong is the domain expertise on the team? What tools are in place? How well do data, systems, and accountability hang together?

Six dimensions here.

Bain and the Cambridge Bennett Institute land on the same point: technically automatable doesn't mean practically usable. Coyle's UK study shows that only a small share of companies actually convert AI into measurable productivity — and the bottleneck is rarely the task, it's organisational friction.

Our default is a typical mid-sized CSRD-bound company. For any real assessment, you adjust that default to the specifics of the company in question.

Already here, the assessment beats a single number

By this point we know three things: whether a task is automatable in principle, whether today's models can carry it, and whether the company is set up to actually use them.

That surfaces concrete signals: which gaps does the company need to close to realise the potential? Where do today's models still cap out? Which tasks are technically attractive but practically still hard?

A single percentage would flatten all of that.

Governance — what often decides in practice

One more layer often decides whether AI can actually be deployed: the controls around it.

Nine more dimensions here, covering among others the EU AI Act, ESRS audit obligations, GDPR, IAASB requirements, and professional skepticism.

This stage doesn't usually return a hard stop. More often it spells out which controls are required where: a four-eyes review, an audit trail, a documented source base, a Fundamental Rights Impact Assessment.


Section 4 · The methodology in five stages

We work in five stages: one setup stage (Stage 0) followed by four assessment stages (Stages 1 to 4).

Stage 0 · Setup
Operationalisation per sub-task. Setup workshop. Break the workflow into clean sub-tasks at a workable granularity.
Stage 1 · Task
Task assessment (17 TASK dimensions). Is the task automatable in principle?
Stage 2 · Capab
Model capability (9 CAPAB dimensions). Can today's models carry it?
Stage 3 · Deploy
Organisational readiness (6 DEPLOY dimensions). Are the data, tools, and skills in place?
Stage 4 · Gov
Governance (9 GOV dimensions). Which controls does it need?
Output · Recommendation per sub-task + range in context + list of required controls
The rule: a task is AI-suitable only when all four assessment stages clear it — and only when Stage 0 was done properly.
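
The rule reads as a simple conjunction, sketched below in Python. Reducing each stage to a pass/fail flag is an assumption made for the example (the stages themselves score many dimensions), and the class name, field names, and sample sub-tasks are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SubTaskAssessment:
    name: str
    setup_done: bool      # Stage 0: operationalisation completed properly
    task_ok: bool         # Stage 1: automatable in principle
    capability_ok: bool   # Stage 2: today's models can carry it
    deployment_ok: bool   # Stage 3: data, tools, and skills are in place
    governance_ok: bool   # Stage 4: required controls can be met
    required_controls: list[str] = field(default_factory=list)

    def ai_suitable(self) -> bool:
        """All four assessment stages clear, and Stage 0 was done properly."""
        stages = (self.task_ok, self.capability_ok, self.deployment_ok, self.governance_ok)
        return self.setup_done and all(stages)


# Hypothetical assessments for illustration.
screening = SubTaskAssessment(
    "topic screening", True, True, True, True, True,
    required_controls=["four-eyes review", "audit trail"],
)
thresholds = SubTaskAssessment("setting thresholds", True, True, False, True, True)
unscoped = SubTaskAssessment("unscoped task", False, True, True, True, True)

for item in (screening, thresholds, unscoped):
    print(item.name, item.ai_suitable(), item.required_controls)
```

Note that a sub-task can be suitable and still carry a list of required controls, which matches the output line above: recommendation plus controls, not a bare yes or no.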

The 41 institutions behind the methodology

Full source list further down.


The result: a workflow that actually runs

What you get out is more than a score. It's a usable answer to one of the harder questions in deploying AI:

How do I build a workflow that holds up in production?

The methodology says where automation pays off, where it doesn't, what has to be in place to make it work, and which controls it needs to clear.

Worked example · DMA
Double Materiality Assessment — AI potential assessed
Eight sub-tasks, four assessment perspectives, one recommendation per step. Aggregate: medium-to-high · 50–65 % time savings.
Worked example · GHG inventory
GHG Inventory (Scope 1+2+3) — AI potential assessed
Eight sub-tasks along the GHG Protocol. Aggregate: very high · 60–85 % time savings · Collaborator band. Two tasks reach Expert — a first for this methodology.

Sources — 41 institutions in six groups

The dimensions were synthesised from existing literature. A dimension makes it into the canonical set only if several institutions name it, or if it carries enough explanatory power on its own to be a strong signal for AI potential.
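
The inclusion rule is a two-way "or", sketched here in Python. The function name, the parameter names, and the threshold of two institutions for "several" are assumptions; the text does not state a number.

```python
def is_canonical(named_by: int, strong_standalone_signal: bool,
                 min_institutions: int = 2) -> bool:
    """A dimension qualifies if several institutions name it, or if it is a
    strong signal for AI potential on its own.

    min_institutions is a hypothetical threshold for "several".
    """
    if named_by < 0:
        raise ValueError("named_by cannot be negative")
    return named_by >= min_institutions or strong_standalone_signal


print(is_canonical(named_by=5, strong_standalone_signal=False))  # True
print(is_canonical(named_by=1, strong_standalone_signal=True))   # True
print(is_canonical(named_by=1, strong_standalone_signal=False))  # False
```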

Group | n | Top institutions
Academia | 7 | MIT Brynjolfsson (SML), NBER Eloundou, Harvard HBS (Jagged Frontier), Cambridge ADeLe, Stanford HAI (HELM/FMTI), Oxford Frey & Osborne, Brookings
AI labs | 7 | Anthropic (Economic Index + Agent Evals + RSP), OpenAI (Model Spec, GDPval), Google DeepMind (Levels of AGI, FSF), METR (Time-Horizon, Messiness), Microsoft Research (Tomlinson), ARC Evals, Epoch AI
Standard-setting | 6 | NIST (AI RMF MAP, GenAI Profile), ISO (42001, 23894, 25059), COSO, IAASB (ISSA 5000), IFAC, IIA
Consultancies | 11 | Bain (Feasibility Factors), BCG (Reshape, Jagged Frontier), McKinsey (MGI 18×4), Deloitte, EY (AAA, 9 RAI), KPMG (10 Pillars), PwC, Accenture, Oliver Wyman, Strategy&, Kearney
Law | 5 | Linklaters (LinksAI Benchmark), Clifford Chance, Allen & Overy, Freshfields, Latham & Watkins
International | 5 | OECD (AI Capability Indicators), WEF (Jobs of Tomorrow, AI Governance Alliance), ILO (WP140), RAND (Bioweapons Uplift), UC Berkeley (ABC, BASALT, CHAI)

Tier ranking by methodological rigour:


Source register — primary sources per institution

The list below names the primary sources that fed into the canonical dimensions, grouped by the six institutional categories and alphabetical within each. One to three key documents per institution are linked. The full source collection per institution lives in the internal methodology register. Secondary literature, press coverage, and commentary are deliberately left out.

Academia · 7 institutions

Brookings (Center for Technology Innovation, Metro)

  • Muro, Whiton, Maxim (2019) — "What Jobs Are Affected by AI?" — Brookings
  • Kinder, de Souza Briggs, Muro, Liu (2023) — "Generative AI, the American Worker, and the Future of Work" — Brookings

Cambridge (CFI, CSER, Judge Business School, ai@cam)

  • Hernandez-Orallo et al. (2026) — "General Scales Unlock AI Evaluation with Explanatory and Predictive Power" — Nature
  • Burden, Voudouris, Tesic, Hernandez-Orallo — "Measurement Layout Framework" — CSER
  • Coyle et al. (2024) — "Determinants of Firms' Decision to Adopt AI" — SSRN

Harvard (HBS, D^3, Berkman Klein, LISH)

  • Dell'Acqua et al. (2026) — "Navigating the Jagged Technological Frontier" — Organization Science
  • Randazzo, Lifshitz-Assaf et al. (2024) — "Cyborgs, Centaurs and Self-Automators" — SSRN

MIT (MIT FutureTech, Sloan, IDE, MIT-IBM Watson AI Lab)

  • Brynjolfsson, Mitchell & Rock (2018) — "What Can Machines Learn, and What Does It Mean for Occupations?" — AEA Papers & Proceedings
  • Svanberg, Li, Fleming, Goehring & Thompson (2024) — "Beyond AI Exposure: Which Tasks Are Cost-Effective to Automate?" — MIT FutureTech
  • Acemoglu (2024) — "The Simple Macroeconomics of AI" — MIT Economics

NBER (Eloundou/Manning/Mishkin/Rock, Felten/Raj/Seamans, Acemoglu/Restrepo)

  • Eloundou, Manning, Mishkin, Rock (2023) — "GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models" — arXiv
  • Felten, Raj, Seamans (2021) — "Occupational, Industry, and Geographic Exposure to AI" — SSRN / Strategic Management Journal
  • Brynjolfsson, Li, Raymond (2023) — "Generative AI at Work" — NBER w31161

Oxford (Oxford Martin Programme on Technology and Employment, GovAI)

  • Frey & Osborne (2013/2017) — "The Future of Employment: How Susceptible Are Jobs to Computerisation?" — Oxford Martin
  • Wood, Graham, Lehdonvirta, Hjorth (2019) — "Good Gig, Bad Gig" — Work, Employment and Society

Stanford HAI (CRFM, Digital Economy Lab, HAI Policy)

  • Liang et al. (2022) — "Holistic Evaluation of Language Models (HELM)" — arXiv
  • Bommasani et al. — "Foundation Model Transparency Index (FMTI)" — CRFM Stanford
  • Stanford HAI (2026) — "AI Index Report 2026" — HAI

AI labs · 7 institutions

Anthropic (Economic Index, Responsible Scaling Policy, Agent Evals)

  • Anthropic (2026) — "Anthropic Economic Index — January 2026 Report" — Anthropic
  • Handa et al. (2025) — "Which Economic Tasks are Performed with AI?" — Anthropic PDF
  • Anthropic (2026) — "Responsible Scaling Policy v3.0" — Anthropic

ARC Evals (Alignment Research Center Evaluations Team, now METR)

  • Kinniment et al. (2023) — "Evaluating Language-Model Agents on Realistic Autonomous Tasks" — arXiv
  • ARC Evals (2023) — "Responsible Scaling Policies" — evals.alignment.org

Epoch AI (Benchmarks, Forecasting, Gradient Updates)

  • Glazer et al. (2024) — "FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI" — arXiv
  • Epoch AI (2025) — "GATE: An Integrated Assessment Model for AI Automation" — arXiv
  • Epoch AI — "Most AI Value Will Come From Broad Automation, Not From R&D" — Epoch AI

Google DeepMind (Frontier Safety Framework, Levels of AGI)

  • Morris et al. (2023) — "Levels of AGI: Operationalizing Progress on the Path to AGI" — arXiv
  • Google DeepMind (2026) — "Frontier Safety Framework v3.1" — DeepMind PDF
  • Weidinger et al. (2023) — "Sociotechnical Safety Evaluation of Generative AI Systems" — arXiv

METR (Model Evaluation & Threat Research)

  • METR (2025) — "Measuring AI Ability to Complete Long Tasks" — arXiv
  • METR (2025) — "HCAST: Human-Calibrated Autonomy Software Tasks" — arXiv
  • METR — "Autonomy Evaluation Resources" — METR

Microsoft Research (Working with AI, New Future of Work, RAI Standard)

  • Tomlinson et al. (2025) — "Working with AI: Measuring the Occupational Implications of Generative AI" — arXiv
  • Microsoft Research (2025) — "New Future of Work Report 2025" — Microsoft Research
  • Microsoft (2022) — "Responsible AI Impact Assessment Template" — Microsoft Blog PDF

OpenAI (Model Spec, GDPval, Preparedness Framework)

  • Eloundou et al. (2023) — "GPTs are GPTs" — arXiv
  • OpenAI (2025) — "GDPval: Measuring AI on Real-World Economically Valuable Tasks" — arXiv
  • OpenAI (2025) — "Model Spec (2025-12-18)" — model-spec.openai.com

Standard-setting · 6 institutions

COSO (Committee of Sponsoring Organizations of the Treadway Commission)

  • COSO (2026) — "Achieving Effective Internal Control Over Generative AI" — COSO
  • COSO / Deloitte (2021) — "Realize the Full Potential of AI: Applying the COSO ERM Framework" — Deloitte

IAASB (International Auditing and Assurance Standards Board)

  • IAASB (2024) — "International Standard on Sustainability Assurance 5000 (ISSA 5000)" — IAASB
  • IAASB (2024) — "Technology Position Statement — 8 Guiding Actions" — IAASB
  • IAASB (2025) — "Technology Catalog of Issues v2" — IFAC PDF

IFAC (International Federation of Accountants)

  • IFAC / IAASB (2025) — "ISSA 5000 Implementation Guide" — IFAC PDF
  • IFAC — "Artificial Intelligence & Accounting (Knowledge Gateway)" — IFAC

IIA (Institute of Internal Auditors)

  • IIA (2024) — "AI Auditing Framework (September 2024 Update)"
  • IIA (2024) — "Global Internal Audit Standards 2024" — IIA

ISO (International Organization for Standardization, JTC1/SC42)

  • ISO/IEC 42001:2023 — "Information Technology — AI Management System" — ISO
  • ISO/IEC 23894:2023 — "Information Technology — AI — Guidance on Risk Management" — ISO
  • ISO/IEC 25059:2023 — "Quality Model for AI Systems" — ITeh Sample PDF

NIST (National Institute of Standards and Technology, AISI/CAISI)

  • NIST (2023) — "AI Risk Management Framework 1.0" — NIST PDF
  • NIST (2024) — "AI 600-1: Generative AI Profile" — NIST PDF
  • NIST (2021) — "NISTIR 8312: Four Principles of Explainable AI" — NIST PDF

Consultancies · 11 institutions

Accenture (Technology Vision, Responsible AI, Wharton-Accenture)

  • Accenture (2023) — "Work, Workforce, Workers: Reinvented in the Age of Generative AI" — Accenture
  • Accenture (2025) — "Technology Vision 2025" — Accenture PDF
  • Accenture — "Responsible AI: From Compliance to Confidence" — Accenture PDF

Bain & Company (Technology Report, Feasibility Factors, Agentic AI)

  • Bain (2025) — "The $100 Billion SaaS Opportunity Hiding in Cross-System Labor" (6 Feasibility Factors) — Bain
  • Bain (2025) — "Will Agentic AI Disrupt SaaS? Technology Report 2025" — Bain
  • Bain (2025) — "State of the Art of Agentic AI Transformation" — Bain

BCG (Reshape, Jagged Frontier, AI at Work)

  • BCG (2026) — "AI Will Reshape More Jobs Than It Replaces" — BCG
  • Dell'Acqua et al. (2023) — "Navigating the Jagged Technological Frontier" (BCG × HBS) — SSRN
  • BCG (2025) — "AI at Work 2025: Momentum Builds, but Gaps Remain" — BCG

Deloitte (MGI Generative AI for Work Tasks, Trustworthy AI)

  • Deloitte Insights — "Generative AI for Government Work Tasks" (1–10 Index) — Deloitte
  • Deloitte — "Trustworthy AI Governance in Practice" — Deloitte
  • Deloitte (2026) — "State of AI in the Enterprise 2026" — Deloitte

EY (AAA Framework, Responsible AI Principles, Confidence Index)

  • EY (2024) — "Responsible AI Principles" — EY PDF
  • EY — "Redesigning Work Around Human Skills in the Age of AI (AAA Framework)" — EY
  • EY — "EY.ai Confidence Index" — EY

Kearney (AI Catalyst, GenAI Roles, Procurement)

  • Kearney — "Putting Generative AI to Work" — Kearney
  • Kearney — "Are You AI Ready?"
  • Kearney — "AI Catalyst" — Kearney

KPMG (Trusted AI, 10 Pillars, Risk & Controls)

  • KPMG — "Trusted AI Framework" — KPMG Global
  • KPMG Australia (2025) — "Deploying Trustworthy AI: An Illustrative Risk and Controls Guide" — KPMG PDF
  • KPMG — "AI Governance Principles for Boards" — KPMG

McKinsey (MGI 18×4 Capabilities, Superagency, Agentic AI)

  • McKinsey Global Institute (2017) — "A Future That Works: Automation, Employment, and Productivity" (18 Capabilities × 0–3 rubric) — MGI PDF
  • McKinsey Global Institute (2023) — "The Economic Potential of Generative AI" — McKinsey
  • McKinsey (2025) — "Seizing the Agentic AI Advantage" — McKinsey PDF

Oliver Wyman (Discovery vs Trust Tasks, AI Agents Banking)

  • Oliver Wyman (2025) — "4 Phases to Smarter AI Integration" (Discovery vs Trust Tasks) — Oliver Wyman
  • Oliver Wyman (2023) — "Navigating the AI Revolution" — Oliver Wyman
  • Oliver Wyman (2026) — "AI Agents in Banking: Reshaping Roles, Skills and Leadership" — Oliver Wyman

PwC (AI Jobs Barometer, Responsible AI, Sizing the Prize)

  • PwC (2025) — "Global AI Jobs Barometer 2025" — PwC PDF
  • PwC (2025) — "AI Jobs Barometer — Methodology Appendix" — PwC PDF
  • PwC — "Sizing the Prize" — PwC PDF

Strategy& (Automating for Growth, Capabilities-Driven Strategy)

  • Strategy& — "Automating for Growth" — Strategy&
  • Strategy& — "Small Automation, Big Benefits" — Strategy&
  • Strategy& — "Capabilities-Driven Strategy" — Strategy&

Law · 5 institutions

Allen & Overy (A&O Shearman, Harvey, ContractMatrix)

  • A&O Shearman — "AI Classifier" — A&O Shearman
  • A&O Shearman — "ContractMatrix Analyze: AI that Understands Your Commercial Positions" — A&O Shearman
  • A&O Shearman — "Zooming in on AI 8: Balancing Innovation and Compliance" — A&O Shearman

Clifford Chance (AI Principles, EU AI Act Hub, LUCY)

Freshfields (EU AI Act Coverage, Board-Level Imperative, Anthropic Partnership)

  • Freshfields — "Artificial Intelligence Act" — Freshfields
  • Freshfields (2026) — "AI Now a Board-Level Imperative for Public Companies and Investors" — Freshfields
  • Freshfields (2026) — "Data Law Trends 2026" — Freshfields PDF

Latham & Watkins (WEF AI Toolkit, EU AI Act Deployer Obligations)

  • Latham & Watkins / WEF (2020) — "Empowering AI Leadership — Oversight Toolkit (Board Version)" — WEF PDF
  • Latham & Watkins — "EU AI Act: Obligations for Deployers of High-Risk AI Systems" — Latham
  • Latham & Watkins — "AI and ESG: How Companies Are Thinking About AI Board Governance" — Latham

Linklaters (LinksAI Benchmark, AI Governance & Quality Assurance)

International · 5 institutions

ILO (International Labour Organization, NASK Collaboration)

  • Gmyrek, Berg, Bescond (2023) — "Generative AI and Jobs: A Global Analysis of Potential Effects on Job Quantity and Quality" (WP96) — ILO PDF
  • Gmyrek et al. / ILO × NASK (2025) — "Generative AI and Jobs: Refined Global Index" (WP140) — ILO PDF
  • Gmyrek (2025) — "Task-Score Browser (ISCO-08 Dataset)" — GitHub Pages

OECD (AI Capability Indicators, AI and the Future of Skills)

  • OECD (2025) — "Introducing the OECD AI Capability Indicators" — OECD
  • OECD — "AI Capability Indicators — Interactive Tool" — OECD
  • Lassebie & Quintini (2022) — "What Skills and Abilities Can Automation Technologies Replicate and What Does It Mean for Workers?" (OECD WP No. 282) — OECD PDF

RAND (CAST, AI-Biosecurity, Capabilities-Based Planning)

  • Mouton, Lucas, Guest (2023/2024) — "The Operational Risks of AI in Large-Scale Biological Attacks" — RAND
  • RAND Europe / CLTR (2025) — "Global Risk Index for AI-enabled Biological Tools" — CLTR PDF
  • RAND (2026) — "Tipping the Cyber Balance: How AI Benchmarks Could Make a Difference" — RAND

UC Berkeley (BAIR, CHAI, Haas, Kang Lab)

  • Zhu et al. (2025) — "Establishing Best Practices for Building Rigorous Agentic Benchmarks (ABC)" — arXiv
  • BAIR (2021) — "BASALT: A Benchmark for Learning from Human Feedback" — BAIR Blog

WEF (Future of Jobs, Jobs of Tomorrow, AI Governance Alliance)

  • WEF / Accenture (2023) — "Jobs of Tomorrow: Large Language Models and Jobs" — WEF PDF
  • WEF (2025) — "Future of Jobs Report 2025" — WEF PDF
  • WEF (2024) — "AI Governance Alliance Briefing Paper Series" — WEF

Full assessment matrix and dimension definitions are available in client engagements.

Footer note: the methodology keeps evolving. CAPAB dimensions get recalibrated quarterly against new model generations. TASK, DEPLOY, and GOV dimensions move as needed — most recently around the Revised ESRS, the EU AI Act, and new evaluation work on frontier models.

Get your workflow assessed

A methodology is only as good as its use. For a real read on your workflows — task by task, with clean operationalisation — let's talk.
