Double Materiality Assessment — AI potential assessed · Worked example

The double materiality assessment (DMA) is the hardest piece of the CSRD workflow and, at the same time, the biggest opportunity for AI support. This assessment breaks it down into eight sub-tasks and runs each one through the full methodology (Stages 0–4). The point is to land on a differentiated recommendation per task, not a single blanket figure for the whole DMA.

Per-sub-task assessment

T1 · Context analysis · business model, value chain, regulatory framing
AI as Consultant · 50–70 % time saved

T2 · High-level topic screening · pre-relevance check across the ESRS topic list
AI as Collaborator · 65–80 % time saved

T3 · IRO identification · impacts, risks, opportunities per topic field
AI as Consultant · 50–70 % time saved

T4 · Threshold setting · calibrate materiality thresholds per IRO type
AI as Tool · 30–50 % time saved

T5 · Initial scoring · score IROs along scale × scope × remediability
AI as Consultant · 50–70 % time saved

T6 · SME review · subject-matter expert validation of the initial scoring
AI as Tool · 30–50 % time saved

T7 · Stakeholder validation · external stakeholder consultation
Human only · 0 % time saved

T8 · Board presentation · top-level conclusions & decision memo
Human only · 0 % time saved

Aggregate across all eight tasks: medium-to-high · 50–65 % time saved (weighted by typical effort across the workflow).
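To make the weighting step concrete, here is a minimal sketch of an effort-weighted aggregate. The per-task ranges are the ones listed above. The effort shares in `EFFORT` are placeholders chosen for illustration, not the weights behind the published range, so the result is not expected to reproduce it and will shift with whatever shares a real project shows.

```python
# Time-saved ranges per sub-task in percent, taken from the assessment above.
SAVINGS = {
    "context analysis": (50, 70),
    "topic screening": (65, 80),
    "IRO identification": (50, 70),
    "threshold setting": (30, 50),
    "initial scoring": (50, 70),
    "SME review": (30, 50),
    "stakeholder validation": (0, 0),
    "board presentation": (0, 0),
}

# HYPOTHETICAL effort shares per sub-task (fractions of total DMA effort).
# These are illustrative assumptions; replace them with your own project data.
EFFORT = {
    "context analysis": 0.10,
    "topic screening": 0.15,
    "IRO identification": 0.20,
    "threshold setting": 0.05,
    "initial scoring": 0.15,
    "SME review": 0.15,
    "stakeholder validation": 0.15,
    "board presentation": 0.05,
}


def weighted_range(savings, effort):
    """Return the effort-weighted (low, high) time-saved range in percent."""
    if set(savings) != set(effort):
        raise ValueError("savings and effort must cover the same sub-tasks")
    if abs(sum(effort.values()) - 1.0) > 1e-9:
        raise ValueError("effort shares must sum to 1.0")
    low = sum(effort[task] * lo for task, (lo, _) in savings.items())
    high = sum(effort[task] * hi for task, (_, hi) in savings.items())
    return low, high


low, high = weighted_range(SAVINGS, EFFORT)
print(f"Weighted aggregate: {low:.1f}-{high:.1f} % time saved")
```

The two human-only tasks pull the aggregate down in proportion to their effort share, which is why the weighting matters more than any single per-task figure.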

Stage 0 — operationalisation per sub-task

Before the assessment runs, every sub-task gets a concrete setup: which inputs feed it, which rubric structures the work, which tools come in, what shape the output takes, what gets reviewed and how.

For context analysis, for example: the inputs are the current annual report, the org chart, a list of sites, and the sector profile. An LLM works them against the EFRAG IG 1 checklist. The output is a structured context note — the foundation for the topic screening that follows.

Each of the eight sub-tasks gets the same treatment before scoring. Skip it — go in with intuitive AI use and no defined setup — and the savings drop sharply while the error risk climbs.
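One way to keep the Stage 0 setup explicit is to record it as data and check it for gaps before any scoring starts. The sketch below is an assumption about how that could look, not part of the methodology itself: the class `SubTaskSetup` and its `missing` helper are hypothetical names. The values for context analysis come from the example above; the review step is not specified there, so it is left empty on purpose.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SubTaskSetup:
    """Stage 0 setup for one sub-task: inputs, rubric, tools, output, review."""
    name: str
    inputs: tuple[str, ...]
    rubric: str
    tools: tuple[str, ...]
    output: str
    review: str

    def missing(self) -> list[str]:
        """Return the names of all setup fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


context_analysis = SubTaskSetup(
    name="Context analysis",
    inputs=("current annual report", "org chart", "list of sites", "sector profile"),
    rubric="EFRAG IG 1 checklist",
    tools=("LLM",),
    output="structured context note",
    review="",  # not specified in the text; left empty to show the gap check
)

print(context_analysis.missing())  # ['review']
```

A sub-task whose `missing()` list is not empty would not yet count as operationalised in the sense used here.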

Model capabilities — Stage 2

Scored against today's frontier models (Claude / GPT / Gemini, 2026).

CAPAB coverage · ~75 %

Strong:

Text and structure (D20b modality: text-heavy work; D10b context window: 200k+ tokens is plenty).
Pulling research together across messy documents via RAG (D04a clears).
Categorising and pattern-matching within topic fields (D07 rule logic is strong).

Weak:

Calibration / metacognition (D08). Models only partly recognise where their knowledge ends. The OECD AI Capability Indicators (ACI) put frontier models at L2 of 5, below median. For the DMA that means a model claim like "this IRO is clearly material" can't be taken at face value.
Revised ESRS 2026 — out of distribution (D26a / D09). The May 6, 2026 draft sits outside training data. Paragraph references have to come from RAG, not from the model's memory.
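The rule that paragraph references must come from retrieval rather than model memory can be enforced mechanically. The sketch below flags every citation in a model answer that does not appear in the retrieved passages. It assumes citations follow a pattern like "ESRS 1 para. 43"; that format, the function names, and the paragraph numbers in the usage example are illustrative placeholders, not real references.

```python
import re

# ASSUMPTION: citations look like "ESRS 1 para. 43" or "ESRS E1 para. 12".
# Adapt the pattern to the citation style your retrieval corpus actually uses.
CITATION = re.compile(r"ESRS\s+([A-Z]?\d+)\s+para\.\s*(\d+)")


def cited(text: str) -> set[tuple[str, str]]:
    """Extract (standard, paragraph) pairs from a piece of text."""
    return set(CITATION.findall(text))


def ungrounded_citations(answer: str, retrieved_passages: list[str]) -> list[str]:
    """Return citations in the answer that no retrieved passage contains."""
    grounded: set[tuple[str, str]] = set()
    for passage in retrieved_passages:
        grounded |= cited(passage)
    missing = cited(answer) - grounded
    return [f"ESRS {std} para. {num}" for std, num in sorted(missing)]


# Dummy values for illustration only; the paragraph numbers are made up.
passages = ["ESRS 1 para. 43: <text returned by the retriever>"]
answer = "The topic is material, see ESRS 1 para. 43 and ESRS 1 para. 999."
print(ungrounded_citations(answer, passages))  # ['ESRS 1 para. 999']
```

Any flagged citation goes back to a human reviewer instead of into the report. The check only confirms that a reference was retrieved, not that the model interpreted it correctly.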

Deployment readiness — Stage 3

Default (mid-sized company with a compact sustainability function) · ~65 % readiness

The default reflects a typical setup: documents are digital, an ERP/DMS is in place, the sustainability team is 1–3 people, and IT has a clear EU-cloud policy.

What the client needs to bring:

D04b — inputs available digitally. Annual report, site lists, peer DMAs in text form (PDFs fine, scanned images not).
D11 — user skill. Sustainability team has basic RAG experience, or is willing to learn.
D12 — tool integration. LLM API access (Anthropic / OpenAI / Google) in an EU cloud region; no deep ERP integration needed for the DMA itself.
D26b — monitoring. A documented review cadence for AI output before anything lands in the report.
D02b — blast radius. Limited assurance under ISSA 5000 from CSRD year 1, reasonable assurance later. Needs an auditable trail (see Stage 4).

For more mature setups — an in-house LLM-ops team, established ground-truth pipelines, a dedicated assurance function — readiness rises to 85–90 %, and the recommendations rise with it.

Governance — Stage 4

Governance sets the controls that AI use brings with it — independently of whether the task is technically automatable.

What matters most for the DMA:

Non-delegable accountability. The materiality decision and the board presentation stay human. Reporting accountability can't be handed to AI. See IAASB professional skepticism, ISSA 5000.
Traceable audit trail. ESRS 1 requires a documented, reperformable DMA methodology. Every AI-supported step has to trace back to source, input, and reviewer decision. See ESRS 1 §3+§6, ISSA 5000.
Fundamental rights. When the DMA touches stakeholders with special protection needs — own workforce, affected communities, vulnerable groups in the value chain — the FRIA obligation under the EU AI Act kicks in. Stakeholder validation (T7) stays human-only.

Less critical here:

Misuse and dual-use risk is low — the DMA is not safety-critical.
Fairness and bias only matter indirectly, since materiality logic doesn't make decisions about individuals.

Recommended controls:

Four-eyes review on every materiality decision and threshold — a second qualified person signs off.
Method documentation covering model version, inputs, prompts, and reviewer decision per step — the basis for reperformance by the assurance provider.
DPA + EU cloud setup for any personal data from stakeholders (GDPR Art. 28).
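The first two controls can be combined in a single per-step record. The sketch below is one possible shape under stated assumptions: `StepRecord`, `sha256_of`, and `four_eyes_ok` are hypothetical names, and the model version, prompt, and people in the usage example are placeholders. Hashing the inputs is an added design choice, not something the controls above prescribe; it lets a later reperformance confirm that the same files were used.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRecord:
    """Method documentation for one AI-supported step."""
    step: str
    model_version: str
    prompt: str
    input_sha256: tuple[str, ...]  # one hash per input document
    author: str
    reviewer: str
    reviewer_decision: str  # e.g. "approved", "rejected", "changed"


def sha256_of(data: bytes) -> str:
    """Fingerprint an input document so it can be matched later."""
    return hashlib.sha256(data).hexdigest()


def four_eyes_ok(record: StepRecord) -> bool:
    """True only if a second, different person approved the step."""
    return (
        record.reviewer != ""
        and record.reviewer != record.author
        and record.reviewer_decision == "approved"
    )


# Placeholder values for illustration only.
record = StepRecord(
    step="Threshold setting",
    model_version="example-model-version",
    prompt="<prompt text as sent to the model>",
    input_sha256=(sha256_of(b"<bytes of the input document>"),),
    author="analyst_a",
    reviewer="analyst_b",
    reviewer_decision="approved",
)
print(four_eyes_ok(record))  # True
```

A record where author and reviewer are the same person, or where the decision is anything other than an approval, fails the check and would block the step from reaching the report.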

Where to go from here

Three takeaways from the DMA as a workflow.

The lever sits in the AI-friendly middle. T2 (topic screening) and T3 (IRO identification) usually account for 30–40 % of total effort, and they're exactly where the 65–80 % and 50–70 % savings live. A lean pipeline for those two tasks pays off more than any attempt to "automate" T7 or T8.

Stage 0 is the condition, not the consequence. The ranges above don't hold for casual chat use. They assume inputs, rubric, tools, and review are defined per sub-task — and that setup work is what a first diagnostic engagement covers.

Stage 4 is binding, not optional. T7 and T8 stay human-only by design. That isn't a weakness of the methodology — it's a property of the task. Negotiation, mediation, and non-delegable accountability are activities of a different kind.

Caveat: the time-saved figures all assume professional operationalisation per sub-task. Without a clean Stage 0 (inputs, rubric, tools, output, review), they drop sharply — typically to 0–30 %, with the error risk climbing.
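The first takeaway can be bounded with simple arithmetic. The text gives the combined effort share of T2 and T3 but not the split between them, so the sketch below brackets their contribution to total time saved: the lower bound applies the lowest rate of the two tasks to the smallest share, the upper bound the highest rate to the largest share. The variable names are illustrative; the figures are the ones stated above.

```python
# Figures from the text, in percent.
EFFORT_SHARE_T2_T3 = (30, 40)  # combined share of total DMA effort
T2_SAVINGS = (65, 80)          # topic screening
T3_SAVINGS = (50, 70)          # IRO identification

# The split between T2 and T3 is not given, so use the extremes as bounds.
lowest_rate = min(T2_SAVINGS[0], T3_SAVINGS[0])
highest_rate = max(T2_SAVINGS[1], T3_SAVINGS[1])

lower = EFFORT_SHARE_T2_T3[0] * lowest_rate / 100
upper = EFFORT_SHARE_T2_T3[1] * highest_rate / 100

print(f"T2 + T3 contribute {lower:.0f}-{upper:.0f} percentage points of total time saved")
# prints: T2 + T3 contribute 15-32 percentage points of total time saved
```

T7 and T8 contribute zero by design, whatever their effort share, which is the arithmetic behind focusing the pipeline on the middle of the workflow.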

The methodology in detail

The assessment framework used here (5 stages, 42 dimensions, 41 institutions) is documented on the methodology page — including Stage 0 operationalisation, the full source matrix, and the recommendation logic on the autonomy scale.
