P06 — AI-Legibility
SCD-P06 · Principle 6 of 10
“If an AI agent cannot cite you, you do not exist.”
Digital Layer
Definition
Climate data and impact claims must be structured so that AI agents can discover, parse, cite, and cross-reference them without human intermediation. Minimum requirements: schema.org/Dataset markup, expressed as JSON-LD, on all public data pages; persistent canonical URLs; and inclusion in at least one major open index (Global Forest Watch, Climate TRACE, EDGAR, or Copernicus).
Rationale
The AI+ESG verification market is growing at a 28% compound annual growth rate (CAGR), and its tools already process over 100,000 ESG sources daily using NLP models (WEF, 2024). Investors, regulators, and procurement teams use AI agents for automated due diligence without notifying the organisations being assessed. An organisation that is invisible to those agents is invisible to the decision-makers they serve. There is a second risk: an AI agent that lacks grounded sovereign data can hallucinate plausible-sounding figures, so sovereign data is a truth anchor as well as a visibility tool.
Implementation Steps
- Add schema.org/Dataset JSON-LD to every public data page (see JSON-LD reference below).
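- Make datasets discoverable in Google Dataset Search, which indexes crawled pages that carry schema.org/Dataset markup rather than accepting direct submissions; list every data page in the sitemap so it is crawled.
- Register with at least one global open data index or registry (for example a CKAN-based portal, DataCite, or the GFW API).
- Maintain a public llms.txt file at the site root (a proposed convention, comparable in spirit to robots.txt) that guides AI agents to authoritative sources.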
- Run quarterly AI scans: query Climate TRACE, GFW, and EDGAR to confirm your data is indexed.
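
The llms.txt step above can be sketched as a small generator. This is a minimal illustration that assumes the layout described in the public llms.txt proposal (a top-level title, a blockquote summary, then sections of Markdown links). The `DatasetLink` class, the `build_llms_txt` function, and the "Datasets" section name are hypothetical names chosen for this example, not part of any specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetLink:
    """One authoritative source to list in llms.txt (hypothetical structure)."""
    title: str
    url: str
    note: str


def build_llms_txt(site_name: str, summary: str, links: list[DatasetLink]) -> str:
    """Render an llms.txt body: title, blockquote summary, one link section."""
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Datasets", ""]
    for link in links:
        # One Markdown link per dataset, followed by a short description.
        lines.append(f"- [{link.title}]({link.url}): {link.note}")
    return "\n".join(lines) + "\n"


example = build_llms_txt(
    "CleantechHUB",
    "Open climate datasets with schema.org/Dataset markup.",
    [
        DatasetLink(
            "Colombia Deforestation Alerts 2024",
            "https://data.cleantechhub.net/datasets/colombia-deforestation-2024",
            "GLAD-L primary forest loss alerts for Colombia, 2024.",
        )
    ],
)
print(example)
```

Generating the file from the same dataset records that feed the JSON-LD markup keeps the two in step, so an agent following llms.txt lands on pages that are already machine-readable.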
JSON-LD Reference Example
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Colombia Deforestation Alerts 2024",
  "description": "GLAD-L primary forest loss alerts for Colombia, 2024.",
  "url": "https://data.cleantechhub.net/datasets/colombia-deforestation-2024",
  "identifier": "https://doi.org/10.XXXX/cth-col-def-2024",
  "creator": { "@type": "Organization", "name": "CleantechHUB" },
  "datePublished": "2025-01-15",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "spatialCoverage": { "@type": "Place", "name": "Colombia" },
  "temporalCoverage": "2024-01-01/2024-12-31",
  "keywords": ["deforestation", "Colombia", "GLAD-L", "primary forest", "sovereign data"],
  "distribution": [{
    "@type": "DataDownload",
    "encodingFormat": "application/json",
    "contentUrl": "https://data.cleantechhub.net/api/v1/datasets/colombia-deforestation-2024"
  }]
}
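
A record like the reference example only helps once it is embedded in the page, so the following sketch shows one way to check a record and wrap it in a script element. The `EXPECTED_KEYS` tuple is an assumption that simply mirrors the fields in the reference example; it is not an official schema.org or Google requirement list. `missing_keys` and `to_script_tag` are hypothetical helper names.

```python
import json

# Assumption: this key set mirrors the reference example above; it is not an
# official list of required properties.
EXPECTED_KEYS = (
    "@context", "@type", "name", "description",
    "url", "identifier", "license", "distribution",
)


def missing_keys(dataset: dict) -> list[str]:
    """Return the expected keys that are absent or empty in the record."""
    return [key for key in EXPECTED_KEYS if not dataset.get(key)]


def to_script_tag(dataset: dict) -> str:
    """Serialise the record as a JSON-LD script element for the page head."""
    payload = json.dumps(dataset, indent=2, ensure_ascii=False)
    # Escape "</" so a string value cannot close the script element early;
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Colombia Deforestation Alerts 2024",
    "url": "https://data.cleantechhub.net/datasets/colombia-deforestation-2024",
}
print(missing_keys(record))
print(to_script_tag(record))
```

Running a check like `missing_keys` in the publishing pipeline catches incomplete records before they reach a public page, which is cheaper than discovering the gap during a later scan.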
Compliance Checklist
| Status | Criterion | What it means |
|---|---|---|
| ☐ | schema.org/Dataset markup live | JSON-LD is present on all public data pages and passes the Google Rich Results Test. |
| ☐ | Listed in open index | Dataset appears in at least one: GFW API, Climate TRACE, EDGAR, DataCite. |
| ☐ | llms.txt published | A public llms.txt file at cleantechhub.net/llms.txt guides AI agents. |
| ☐ | Quarterly AI scan | Last scan date recorded; data confirmed indexed in at least one major platform. |
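
The first checklist row can be partly automated. The sketch below uses only the Python standard library to look for a JSON-LD block of type Dataset in a page's HTML. `JsonLdExtractor` and `has_dataset_markup` are hypothetical names, and the check is deliberately narrow: it assumes `@type` is a plain string and does not unpack `@graph` containers, so it complements rather than replaces the Google Rich Results Test.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # Malformed block: treat it as absent.
            # A block may hold one object or a list of objects.
            items = parsed if isinstance(parsed, list) else [parsed]
            self.blocks.extend(item for item in items if isinstance(item, dict))


def has_dataset_markup(html: str) -> bool:
    """True if the page carries at least one JSON-LD object typed Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return any(block.get("@type") == "Dataset" for block in parser.blocks)


page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Dataset", "name": "Example"}'
    "</script></head><body></body></html>"
)
print(has_dataset_markup(page))
print(has_dataset_markup("<html><body>No markup here</body></html>"))
```

Fetching each public data page and passing its HTML to `has_dataset_markup` gives a quick pass or fail per page that can be recorded alongside the quarterly scan date.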
Regulatory References
- EU CSRD: Art. 29d of the Accounting Directive (2013/34/EU), as inserted by the CSRD (single electronic reporting format with machine-readable XBRL tagging)
- TCFD Recommendations: Pillar 4 (Metrics and Targets, digital disclosure)
- IICSR AI+ESG Market Report 2025
Recommended Tools and Platforms
- Google Rich Results Test
- schema.org validator
- llms.txt specification
- DataCite
Keywords
AI legibility, schema.org, JSON-LD, ESG AI, llms.txt, due diligence, NLP