We worked through them and built our own approach — aimed squarely at the workflows that actually show up in sustainability work: double materiality assessments, GHG inventories, CSRD reporting, supply-chain due diligence.
Section 1: What's out there
People come at this question from different angles.
Eloundou et al. (OpenAI/NBER, 2023) break occupations down into small tasks. "Tax Preparer", for instance, includes tasks like "compute taxes owed by following tax code." Then they ask, task by task: could AI cut the time needed by at least half at the same quality? Human annotators and GPT-4 both rate the tasks, and the ratings are aggregated up to the occupation.
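The aggregation step can be sketched in a few lines. This is a minimal sketch. It assumes each task already carries a binary exposure label and that occupation-level exposure is simply the share of exposed tasks. The published study uses a finer labelling scheme and its own weighting. The `TaskLabel` type and all labels below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskLabel:
    occupation: str
    task: str
    exposed: bool  # True if AI could halve the time needed at equal quality


def occupation_exposure(labels: list[TaskLabel]) -> dict[str, float]:
    """Share of each occupation's tasks labelled as exposed (unweighted)."""
    totals: dict[str, int] = {}
    exposed: dict[str, int] = {}
    for label in labels:
        totals[label.occupation] = totals.get(label.occupation, 0) + 1
        if label.exposed:
            exposed[label.occupation] = exposed.get(label.occupation, 0) + 1
    return {occ: exposed.get(occ, 0) / n for occ, n in totals.items()}


# Hypothetical labels for illustration only.
labels = [
    TaskLabel("Tax Preparer", "Compute taxes owed by following tax code", True),
    TaskLabel("Tax Preparer", "Interview clients about income and deductions", False),
    TaskLabel("Tax Preparer", "Prepare and file tax returns", True),
    TaskLabel("Tax Preparer", "Review financial records", True),
]
print(occupation_exposure(labels))  # {'Tax Preparer': 0.75}
```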
Webb (Stanford, 2019/2020) goes the other way around. He reads recent AI patents — "automated visual inspection of welds," that sort of thing — and lays them over job descriptions. If a patented task shows up in a job, that's a signal: someone has already put money into making it automatable.
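A toy version of the patent overlap follows. It assumes patent titles and job task descriptions have already been reduced to (verb, object) pairs, which is broadly how Webb matches them. The overlap share used here is a simplification, not his published measure, and all pairs are hypothetical.

```python
def patent_overlap(job_pairs: set[tuple[str, str]], patent_pairs: set[tuple[str, str]]) -> float:
    """Share of a job's (verb, object) pairs that also occur in AI patent titles."""
    if not job_pairs:
        return 0.0
    return len(job_pairs & patent_pairs) / len(job_pairs)


# Hypothetical pairs extracted from AI patent titles and from a welder's task list.
patent_pairs = {("inspect", "weld"), ("detect", "defect"), ("classify", "image")}
welder_pairs = {("inspect", "weld"), ("detect", "defect"), ("operate", "torch"), ("read", "blueprint")}

print(patent_overlap(welder_pairs, patent_pairs))  # 0.5
```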
Felten, Raj, Seamans (NBER, 2018–2024) start with AI capabilities such as image recognition, translation, and language modeling, and map them onto the abilities that O*NET, the US database of 800-plus occupations, lists as job requirements. The more capabilities that map onto an occupation's required abilities, the more AI-exposed it is.
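The matching logic can be sketched as an importance-weighted count: for each ability an occupation needs, count how many AI capabilities relate to it. This is a simplified sketch that assumes a hand-made capability-to-ability map. The published index uses crowd-sourced relatedness scores and O*NET prevalence and importance weights. `CAPABILITY_MAP` and the example weights are hypothetical.

```python
# Hypothetical map from AI capabilities to the abilities they relate to.
CAPABILITY_MAP: dict[str, set[str]] = {
    "image recognition": {"near vision", "perceptual speed"},
    "translation": {"written comprehension", "written expression"},
    "language modeling": {"written comprehension", "written expression", "deductive reasoning"},
}


def ai_exposure(required_abilities: dict[str, float]) -> float:
    """Importance-weighted mean number of AI capabilities that map onto each required ability."""
    total_importance = sum(required_abilities.values())
    if total_importance == 0:
        return 0.0
    weighted = 0.0
    for ability, importance in required_abilities.items():
        # Count the capabilities whose related-ability set contains this ability.
        n_capabilities = sum(ability in related for related in CAPABILITY_MAP.values())
        weighted += n_capabilities * importance
    return weighted / total_importance


# Hypothetical importance weights for two occupations.
print(ai_exposure({"written expression": 3.0, "manual dexterity": 1.0}))  # 1.5
print(ai_exposure({"manual dexterity": 5.0}))  # 0.0
```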
McKinsey (MGI, 2017 / GenAI update 2023) combines task decomposition with capability matching. It breaks 800 occupations into roughly 2,000 work activities and rates them against 18 capabilities in five groups: sensory, logical, language, social-emotional, physical. Each capability gets a score from 0 (not needed) to 3 (top-quartile human performance). An activity is "technically automatable" when current technology meets the required level on every capability it needs. In the 2023 update, McKinsey estimates that current technology could automate work activities accounting for 60 to 70 percent of employees' time.
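The "technically automatable" rule reduces to an all-capabilities check, and an hours-based headline figure reduces to a weighted share. This is a minimal sketch. It assumes integer levels from 0 (not needed) to 3 (top-quartile human performance), as described above. The capability names, levels, and hours are hypothetical and are not McKinsey's data.

```python
# Levels: 0 = not needed ... 3 = top-quartile human performance.
Levels = dict[str, int]


def technically_automatable(required: Levels, technology: Levels) -> bool:
    """True when current technology meets the required level on every capability the activity needs."""
    # Capabilities at level 0 are not needed, so they are skipped.
    return all(technology.get(cap, 0) >= level for cap, level in required.items() if level > 0)


def automatable_hour_share(activities: list[tuple[Levels, float]], technology: Levels) -> float:
    """Share of work hours spent on activities that clear the bar on every capability."""
    total = sum(hours for _, hours in activities)
    automatable = sum(hours for req, hours in activities if technically_automatable(req, technology))
    return automatable / total if total else 0.0


# Hypothetical current-technology levels and two activities with weekly hours.
technology = {"natural language understanding": 2, "retrieving information": 3, "social and emotional sensing": 1}
process_invoices = {"natural language understanding": 2, "retrieving information": 2}
negotiate_terms = {"natural language understanding": 2, "social and emotional sensing": 3}

print(technically_automatable(process_invoices, technology))  # True
print(technically_automatable(negotiate_terms, technology))  # False
print(automatable_hour_share([(process_invoices, 30.0), (negotiate_terms, 10.0)], technology))  # 0.75
```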
Looking at the task itself
A different school skips the occupation and the capability and asks about the task: which properties make a task automation-friendly?
Brynjolfsson, Mitchell, and Rock at MIT were early here with their "Suitability for Machine Learning" rubric. Eight criteria — clearly defined inputs and outputs, training data on hand, and so on. One idea stuck: the more structured a task, the more amenable it is to AI.
BCG and Bain took it further. BCG lists five criteria, among them "no significant physical presence" and "rule-based, traceable result." Bain adds six "Agentic Automation Feasibility Factors," including output verifiability (can you cheaply check the result?) and integration and orchestration (how many systems does the AI need to talk to?). Bain's contribution: don't just ask whether the model can do the task in theory; ask whether it can handle it inside the company's actual tool stack.
Section 2: A wider evaluation base
BCG's five and Bain's six are a strong core. For sustainability work we add more dimensions on equal footing, drawing on a deeper review across 41 institutions — academia, AI labs, standard-setters, consultancies, law firms, international bodies.
That gave us 17 task properties, each traceable back to a clear research source.
A few of the additions:
Atypicality. Has the model seen this kind of material before? Work in an established sector with mature reporting patterns scores low — the training data is full of close relatives. Work under fresh regulation — the Revised ESRS draft from May 2026, the CSDDD rollout — scores high. The material is too new to be in training. Sources: Cambridge ADeLe, MIT Brynjolfsson SML, METR Time-Horizon studies, NBER, OECD AI Capability Indicators.
Cognitive load. How many reasoning steps to a solid answer? Assigning an emission factor is usually one. A full risk-and-opportunity assessment across several value-chain stages is many, nested. Today's frontier models tend to lose reliability after a few chained steps. Sources: Cambridge ADeLe, METR Time-Horizon, MIT SML, OECD AI Capability Indicators, Stanford HAI.
Knowledge depth. How much specialist knowledge does it take? Topic screening only needs ESRS basics — public guidance covers it. Setting thresholds also needs sector experience, an assurance lens, and audit standards — much harder to rebuild from public sources alone. Sources: Cambridge ADeLe, OECD AI Capability Indicators, MIT, Stanford HAI.
17 TASK dimensions in total.
Putting it together
Like Eloundou et al., we break each workflow — a double materiality assessment, for example — into individual tasks, at a granularity that's fine enough to expose real differences but coarse enough to stay readable.
Then we score each task against the dimensions. That gives a concrete read on each task's automation potential and an aggregate picture per workflow.
What comes out isn't a single percentage. It's a clearer answer to the questions that actually matter: which steps are worth automating? Which task properties help? Which get in the way? That lets you design workflows where humans and machines actually play to each other — and avoid the trap of pilots that never reach production.
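A minimal sketch of the scoring and aggregation step follows. It assumes a hypothetical 1 to 5 scale on which every dimension is oriented so that higher means more favourable to automation (so low atypicality scores high). It also assumes hypothetical cut-offs for "helps" and "hinders". The dimension subset, scores, and thresholds are illustrative. They are not the methodology's actual calibration.

```python
from statistics import mean

# Hypothetical 1-5 scale, oriented so that 5 = strongly favours automation.
HELPS_AT, HINDERS_AT = 4, 2


def assess_task(scores: dict[str, int]) -> dict:
    """Summarise one task: mean score plus the dimensions that help or get in the way."""
    return {
        "mean": mean(scores.values()),
        "helps": sorted(d for d, s in scores.items() if s >= HELPS_AT),
        "hinders": sorted(d for d, s in scores.items() if s <= HINDERS_AT),
    }


def assess_workflow(tasks: dict[str, dict[str, int]]) -> tuple[dict[str, dict], float]:
    """Per-task summaries plus an unweighted workflow-level mean."""
    per_task = {name: assess_task(scores) for name, scores in tasks.items()}
    return per_task, mean(t["mean"] for t in per_task.values())


# Two hypothetical steps of a double materiality assessment.
dma = {
    "topic screening": {"atypicality": 4, "cognitive_load": 4, "knowledge_depth": 5, "output_verifiability": 4},
    "threshold setting": {"atypicality": 3, "cognitive_load": 2, "knowledge_depth": 1, "output_verifiability": 2},
}
per_task, workflow_mean = assess_workflow(dma)
print(per_task["threshold setting"]["hinders"])  # ['cognitive_load', 'knowledge_depth', 'output_verifiability']
print(workflow_mean)  # 3.125
```

Keeping the "helps" and "hinders" lists next to the mean is what lets the output answer which properties get in the way, rather than collapsing everything into one figure.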
Section 3: What it takes to realise the potential
Even a task that's automatable on paper needs three more things to work in practice:
- the right capabilities in today's models,
- the right setup inside the organisation,
- the right governance.
What the models can do
If the model can't do what the task needs, results get unreliable fast. Model capabilities move quickly, so we re-score this layer every quarter — what sits on the edge today often runs robustly a year out.
Nine criteria for model capability. The clearest examples:
- Metacognition — does the model know what it doesn't know? Today's frontier models are weak here. For anything that has to survive an audit, this is usually the binding constraint.
- Context handling — how much can the model read at once, hold accurately, and reference cleanly at the end? Critical for long documents like a CSRD report.
- Quantitative reasoning — can it calculate reliably and read numbers correctly? Make-or-break for GHG work.
What the organisation brings
The setup inside the company matters just as much. Is the data digital? How strong is the domain expertise on the team? What tools are in place? How well do data, systems, and accountability hang together?
Six dimensions here.
Bain and the Cambridge Bennett Institute land on the same point: technically automatable doesn't mean practically usable. Coyle's UK study shows that only a small share of companies actually convert AI into measurable productivity — and the bottleneck is rarely the task, it's organisational friction.
Our default is a typical mid-sized CSRD-bound company. For any real assessment, you adjust that default to the specifics of the company in question.
Even at this stage, the assessment beats a single number
By this point we know three things: whether a task is automatable in principle, whether today's models can carry it, and whether the company is set up to actually use them.
That surfaces concrete signals: which gaps does the company need to close to realise the potential? Where do today's models still cap out? Which tasks are technically attractive but practically still hard?
A single percentage would flatten all of that.
Governance — what often decides in practice
One more layer often decides whether AI can actually be deployed: the controls around it.
- Is it ethically defensible?
- Will the result hold up to an audit?
- Is there reputational risk?
- Are data protection, traceability, and accountability clear?
Nine more dimensions here, drawing on the EU AI Act, ESRS audit obligations, GDPR, IAASB requirements, and professional skepticism.
This stage doesn't usually return a hard stop. More often it spells out which controls are required where: a four-eyes review, an audit trail, a documented source base, a Fundamental Rights Impact Assessment.
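Because this stage mostly returns required controls rather than a verdict, it can be sketched as a lookup from governance findings to controls. The finding names and the mapping below are hypothetical illustrations, not legal guidance. In particular, whether a Fundamental Rights Impact Assessment is required depends on the EU AI Act's own criteria.

```python
# Hypothetical mapping from governance findings to the controls they trigger.
CONTROL_RULES: dict[str, list[str]] = {
    "audit_relevant": ["four-eyes review", "audit trail"],
    "uses_external_sources": ["documented source base"],
    "fria_triggered": ["Fundamental Rights Impact Assessment"],
}


def required_controls(findings: set[str]) -> list[str]:
    """Collect the controls triggered by a task's governance findings, without duplicates."""
    controls: list[str] = []
    for finding in sorted(findings):  # sorted for a deterministic order
        for control in CONTROL_RULES.get(finding, []):
            if control not in controls:
                controls.append(control)
    return controls


print(required_controls({"audit_relevant", "uses_external_sources"}))
# ['four-eyes review', 'audit trail', 'documented source base']
```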
Section 4: The methodology in five stages
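We work in five stages: Stage 0, the operationalisation, followed by four assessment stages covering task properties, model capability, organisational setup, and governance. A task is AI-suitable only when all four assessment stages clear it, and only when Stage 0 was done properly.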
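The gating rule can be written as a conjunction that also reports what blocks a task. This is a minimal sketch. The stage names, the `StageResult` type, and the notes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StageResult:
    name: str
    cleared: bool
    notes: list[str] = field(default_factory=list)


def ai_suitable(stage0_done: bool, stages: list[StageResult]) -> tuple[bool, list[str]]:
    """A task is AI-suitable only if Stage 0 was done properly and every assessment stage clears it."""
    blockers = [] if stage0_done else ["Stage 0: operationalisation incomplete"]
    for stage in stages:
        if not stage.cleared:
            blockers.append(f"{stage.name}: {'; '.join(stage.notes) or 'not cleared'}")
    return not blockers, blockers


stages = [
    StageResult("task properties", True),
    StageResult("model capability", False, ["quantitative reasoning unreliable"]),
    StageResult("organisational setup", True),
    StageResult("governance", True, ["four-eyes review required"]),
]
print(ai_suitable(True, stages))
# (False, ['model capability: quantitative reasoning unreliable'])
```

Returning the blockers alongside the verdict mirrors the point made earlier: the useful output is which gap to close, not just a yes or no.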
The 41 institutions behind the methodology
- Academia (7): incl. MIT Brynjolfsson, NBER Eloundou
- AI labs (7): incl. Anthropic, METR
- Standard-setting (6): incl. NIST, IAASB
- Consultancies (11): incl. Bain, BCG
- Law (5): incl. Linklaters, Clifford Chance
- International (5): incl. OECD, ILO
Full source list further down.
The result: a workflow that actually runs
What you get out is more than a score. It's a usable answer to one of the harder questions in deploying AI:
How do I build a workflow that holds up in production?
The methodology says where automation pays off, where it doesn't, what has to be in place to make it work, and which controls it needs to clear.
Sources — 41 institutions in six groups
The dimensions were synthesised from existing literature. A dimension makes it into the canonical set only if several institutions name it, or if it carries enough explanatory power on its own to be a strong signal for AI potential.
| Group | n | Top institutions |
|---|---|---|
| Academia | 7 | MIT Brynjolfsson (SML), NBER Eloundou, Harvard HBS (Jagged Frontier), Cambridge ADeLe, Stanford HAI (HELM/FMTI), Oxford Frey & Osborne, Brookings |
| AI labs | 7 | Anthropic (Economic Index + Agent Evals + RSP), OpenAI (Model Spec, GDPval), Google DeepMind (Levels of AGI, FSF), METR (Time-Horizon, Messiness), Microsoft Research (Tomlinson), ARC Evals, Epoch AI |
| Standard-setting | 6 | NIST (AI RMF MAP, GenAI Profile), ISO (42001, 23894, 25059), COSO, IAASB (ISSA 5000), IFAC, IIA |
| Consultancies | 11 | Bain (Feasibility Factors), BCG (Reshape, Jagged Frontier), McKinsey (MGI 18×4), Deloitte, EY (AAA, 9 RAI), KPMG (10 Pillars), PwC, Accenture, Oliver Wyman, Strategy&, Kearney |
| Law | 5 | Linklaters (LinksAI Benchmark), Clifford Chance, Allen & Overy, Freshfields, Latham & Watkins |
| International | 5 | OECD (AI Capability Indicators), WEF (Jobs of Tomorrow, AI Governance Alliance), ILO (WP140), RAND (Bioweapons Uplift), UC Berkeley (ABC, BASALT, CHAI) |
Tier ranking by methodological rigour:
- Tier 1: seven institutions — METR, Cambridge, OECD, MIT, NBER, Harvard, Stanford HAI.
- Tier 2: ten institutions — Anthropic, Bain, BCG, DeepMind, ILO, Linklaters, McKinsey, Microsoft Research, OpenAI, WEF.
- Tier 3: 24 institutions — primarily governance and practice frameworks.
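The inclusion rule for canonical dimensions amounts to a threshold plus an override. This sketch assumes "several" means at least three institutions, which is a hypothetical cut-off. It treats the explanatory-power judgment as an input flag. The three named dimensions use the sources cited for them in Section 2, while `dimension_x` and `dimension_y` are placeholders.

```python
MIN_INSTITUTIONS = 3  # hypothetical reading of "several"


def canonical_dimensions(mentions: dict[str, set[str]], strong_signal: set[str]) -> list[str]:
    """Keep a dimension if enough institutions name it or it is flagged as a strong standalone signal."""
    return sorted(
        dim for dim, institutions in mentions.items()
        if len(institutions) >= MIN_INSTITUTIONS or dim in strong_signal
    )


mentions = {
    "atypicality": {"Cambridge", "MIT", "METR", "NBER", "OECD"},
    "cognitive load": {"Cambridge", "METR", "MIT", "OECD", "Stanford HAI"},
    "knowledge depth": {"Cambridge", "OECD", "MIT", "Stanford HAI"},
    "dimension_x": {"Institution A"},  # placeholder: single source, flagged as strong signal
    "dimension_y": {"Institution A", "Institution B"},  # placeholder: too few sources
}
print(canonical_dimensions(mentions, strong_signal={"dimension_x"}))
# ['atypicality', 'cognitive load', 'dimension_x', 'knowledge depth']
```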
Source register — primary sources per institution
The list below names the primary sources that fed into the canonical dimensions, grouped by the six institutional categories and sorted alphabetically within each. One to three key documents are listed per institution; the full source collection per institution lives in the internal methodology register. Secondary literature, press coverage, and commentary are deliberately left out.
Academia · 7 institutions
Brookings
- Muro, Whiton, Maxim (2019) — "What Jobs Are Affected by AI?" — Brookings
- Kinder, de Souza Briggs, Muro, Liu (2023) — "Generative AI, the American Worker, and the Future of Work" — Brookings
Cambridge
- Hernandez-Orallo et al. (2026) — "General Scales Unlock AI Evaluation with Explanatory and Predictive Power" — Nature
- Burden, Voudouris, Tesic, Hernandez-Orallo — "Measurement Layout Framework" — CSER
- Coyle et al. (2024) — "Determinants of Firms' Decision to Adopt AI" — SSRN
Harvard
- Dell'Acqua et al. (2026) — "Navigating the Jagged Technological Frontier" — Organization Science
- Randazzo, Lifshitz-Assaf et al. (2024) — "Cyborgs, Centaurs and Self-Automators" — SSRN
MIT
- Brynjolfsson, Mitchell & Rock (2018) — "What Can Machines Learn, and What Does It Mean for Occupations?" — AEA Papers & Proceedings
- Svanberg, Li, Fleming, Goehring & Thompson (2024) — "Beyond AI Exposure: Which Tasks Are Cost-Effective to Automate?" — MIT FutureTech
- Acemoglu (2024) — "The Simple Macroeconomics of AI" — MIT Economics
NBER
- Eloundou, Manning, Mishkin, Rock (2023) — "GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models" — arXiv
- Felten, Raj, Seamans (2021) — "Occupational, Industry, and Geographic Exposure to AI" — SSRN / Strategic Management Journal
- Brynjolfsson, Li, Raymond (2023) — "Generative AI at Work" — NBER w31161
Oxford
- Frey & Osborne (2013/2017) — "The Future of Employment: How Susceptible Are Jobs to Computerisation?" — Oxford Martin
- Wood, Graham, Lehdonvirta, Hjorth (2019) — "Good Gig, Bad Gig" — Work, Employment and Society
Stanford HAI
- Liang et al. (2022) — "Holistic Evaluation of Language Models (HELM)" — arXiv
- Bommasani et al. — "Foundation Model Transparency Index (FMTI)" — CRFM Stanford
- Stanford HAI (2026) — "AI Index Report 2026" — HAI
AI labs · 7 institutions
Anthropic
- Anthropic (2026) — "Anthropic Economic Index — January 2026 Report" — Anthropic
- Handa et al. (2025) — "Which Economic Tasks are Performed with AI?" — Anthropic PDF
- Anthropic (2026) — "Responsible Scaling Policy v3.0" — Anthropic
ARC Evals
- Kinniment et al. (2023) — "Evaluating Language-Model Agents on Realistic Autonomous Tasks" — arXiv
- ARC Evals (2023) — "Responsible Scaling Policies" — evals.alignment.org
Epoch AI
- Glazer et al. (2024) — "FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI" — arXiv
- Epoch AI — "GATE: General AI Capability Evaluation" — arXiv
- Epoch AI — "Most AI Value Will Come From Broad Automation, Not From R&D" — Epoch AI
Google DeepMind
- Morris et al. (2023) — "Levels of AGI: Operationalizing Progress on the Path to AGI" — arXiv
- Google DeepMind (2026) — "Frontier Safety Framework v3.1" — DeepMind PDF
- Weidinger et al. (2023) — "Sociotechnical Safety Evaluation of Generative AI Systems" — arXiv
METR
- METR (2025) — "Measuring AI Ability to Complete Long Tasks" — arXiv
- METR (2025) — "HCAST: Human-Calibrated Autonomy Software Tasks" — arXiv
- METR — "Autonomy Evaluation Resources" — METR
Microsoft Research
- Tomlinson et al. (2025) — "Working with AI: Measuring the Occupational Implications of Generative AI" — arXiv
- Microsoft Research (2025) — "New Future of Work Report 2025" — Microsoft Research
- Microsoft (2022) — "Responsible AI Impact Assessment Template" — Microsoft Blog PDF
OpenAI
- Eloundou et al. (2023) — "GPTs are GPTs" — arXiv
- OpenAI (2025) — "GDPval: Measuring AI on Real-World Economically Valuable Tasks" — arXiv
- OpenAI (2025) — "Model Spec (2025-12-18)" — model-spec.openai.com
Standard-setting · 6 institutions
COSO
- COSO (2026) — "Achieving Effective Internal Control Over Generative AI" — COSO
- COSO / Deloitte (2021) — "Realize the Full Potential of AI: Applying the COSO ERM Framework" — Deloitte
IAASB
- IAASB (2024) — "International Standard on Sustainability Assurance 5000 (ISSA 5000)" — IAASB
- IAASB (2024) — "Technology Position Statement — 8 Guiding Actions" — IAASB
- IAASB (2025) — "Technology Catalog of Issues v2" — IFAC PDF
IFAC
- IFAC / IAASB (2025) — "ISSA 5000 Implementation Guide" — IFAC PDF
- IFAC — "Artificial Intelligence & Accounting (Knowledge Gateway)" — IFAC
IIA
- IIA (2024) — "AI Auditing Framework (September 2024 Update)"
- IIA (2024) — "Global Internal Audit Standards 2024" — IIA
ISO
- ISO/IEC 42001:2023 — "Information Technology — AI Management System" — ISO
- ISO/IEC 23894:2023 — "Information Technology — AI — Guidance on Risk Management" — ISO
- ISO/IEC 25059:2023 — "Quality Model for AI Systems" — ITeh Sample PDF
NIST
Consultancies · 11 institutions
Accenture
- Accenture (2023) — "Work, Workforce, Workers: Reinvented in the Age of Generative AI" — Accenture
- Accenture (2025) — "Technology Vision 2025" — Accenture PDF
- Accenture — "Responsible AI: From Compliance to Confidence" — Accenture PDF
Bain & Company
- Bain (2025) — "The $100 Billion SaaS Opportunity Hiding in Cross-System Labor" (6 Feasibility Factors) — Bain
- Bain (2025) — "Will Agentic AI Disrupt SaaS? Technology Report 2025" — Bain
- Bain (2025) — "State of the Art of Agentic AI Transformation" — Bain
BCG
- BCG (2026) — "AI Will Reshape More Jobs Than It Replaces" — BCG
- Dell'Acqua et al. (2023) — "Navigating the Jagged Technological Frontier" (BCG × HBS) — SSRN
- BCG (2025) — "AI at Work 2025: Momentum Builds, but Gaps Remain" — BCG
Deloitte
- Deloitte Insights — "Generative AI for Government Work Tasks" (1–10 Index) — Deloitte
- Deloitte — "Trustworthy AI Governance in Practice" — Deloitte
- Deloitte (2026) — "State of AI in the Enterprise 2026" — Deloitte
EY
- EY (2024) — "Responsible AI Principles" — EY PDF
- EY — "Redesigning Work Around Human Skills in the Age of AI (AAA Framework)" — EY
- EY — "EY.ai Confidence Index" — EY
Kearney
- Kearney — "Putting Generative AI to Work" — Kearney
- Kearney — "Are You AI Ready?"
- Kearney — "AI Catalyst" — Kearney
KPMG
- KPMG — "Trusted AI Framework" — KPMG Global
- KPMG Australia (2025) — "Deploying Trustworthy AI: An Illustrative Risk and Controls Guide" — KPMG PDF
- KPMG — "AI Governance Principles for Boards" — KPMG
McKinsey
- McKinsey Global Institute (2017) — "A Future That Works: Automation, Employment, and Productivity" (18 Capabilities × 0–3 rubric) — MGI PDF
- McKinsey Global Institute (2023) — "The Economic Potential of Generative AI" — McKinsey
- McKinsey (2025) — "Seizing the Agentic AI Advantage" — McKinsey PDF
Oliver Wyman
- Oliver Wyman (2025) — "4 Phases to Smarter AI Integration" (Discovery vs Trust Tasks) — Oliver Wyman
- Oliver Wyman (2023) — "Navigating the AI Revolution" — Oliver Wyman
- Oliver Wyman (2026) — "AI Agents in Banking: Reshaping Roles, Skills and Leadership" — Oliver Wyman
PwC
- PwC (2025) — "Global AI Jobs Barometer 2025" — PwC PDF
- PwC (2025) — "AI Jobs Barometer — Methodology Appendix" — PwC PDF
- PwC — "Sizing the Prize" — PwC PDF
Strategy&
Law · 5 institutions
Allen & Overy
- A&O Shearman — "AI Classifier" — A&O Shearman
- A&O Shearman — "ContractMatrix Analyze: AI that Understands Your Commercial Positions" — A&O Shearman
- A&O Shearman — "Zooming in on AI 8: Balancing Innovation and Compliance" — A&O Shearman
Clifford Chance
- Clifford Chance — "AI Principles" — Clifford Chance
- Clifford Chance — "The EU AI Act: Overview of Key Rules and Requirements" — Clifford Chance PDF
- Clifford Chance (2025) — "The EU Introduces New Rules on AI Liability" — Clifford Chance PDF
Freshfields
- Freshfields — "Artificial Intelligence Act" — Freshfields
- Freshfields (2026) — "AI Now a Board-Level Imperative for Public Companies and Investors" — Freshfields
- Freshfields (2026) — "Data Law Trends 2026" — Freshfields PDF
Latham & Watkins
- Latham & Watkins / WEF (2020) — "Empowering AI Leadership — Oversight Toolkit (Board Version)" — WEF PDF
- Latham & Watkins — "EU AI Act: Obligations for Deployers of High-Risk AI Systems" — Latham
- Latham & Watkins — "AI and ESG: How Companies Are Thinking About AI Board Governance" — Latham
Linklaters
- Linklaters (2025) — "LinksAI English Law Benchmark v2" — Linklaters DigiLinks
- Linklaters (2023) — "LinksAI English Law Benchmark v1" — Linklaters DigiLinks
- Linklaters (2025) — "AI Governance and Quality Assurance: Lessons from Linklaters and the Audit Sector" — Linklaters DigiLinks
International · 5 institutions
ILO
- Gmyrek, Berg, Bescond (2023) — "Generative AI and Jobs: A Global Analysis of Potential Effects on Job Quantity and Quality" (WP96) — ILO PDF
- Gmyrek et al. / ILO × NASK (2025) — "Generative AI and Jobs: Refined Global Index" (WP140) — ILO PDF
- Gmyrek (2025) — "Task-Score Browser (ISCO-08 Dataset)" — GitHub Pages
OECD
- OECD (2025) — "Introducing the OECD AI Capability Indicators" — OECD
- OECD — "AI Capability Indicators — Interactive Tool" — OECD
- Lassebie & Quintini (2022) — "What Skills and Abilities Can Automation Technologies Replicate and What Does It Mean for Workers?" (OECD WP No. 282) — OECD PDF
RAND
- Mouton, Lucas, Guest (2023/2024) — "The Operational Risks of AI in Large-Scale Biological Attacks" — RAND
- RAND Europe / CLTR (2025) — "Global Risk Index for AI-enabled Biological Tools" — CLTR PDF
- RAND (2026) — "Tipping the Cyber Balance: How AI Benchmarks Could Make a Difference" — RAND
UC Berkeley
- Zhu et al. (2025) — "Establishing Best Practices for Building Rigorous Agentic Benchmarks (ABC)" — arXiv
- BAIR (2021) — "BASALT: A Benchmark for Learning from Human Feedback" — BAIR Blog
WEF
Full assessment matrix and dimension definitions are available in client engagements.
Note: the methodology keeps evolving. The model-capability (CAPAB) dimensions are recalibrated quarterly against new model generations. The TASK, DEPLOY, and GOV dimensions are updated as needed, most recently for the Revised ESRS, the EU AI Act, and new evaluation work on frontier models.