Marin T. Kael

Research · Methodology Lab

How does the machine read an author?

Phase 1 has been running since T+0: active pre-launch interventions Q0–Q5 in parallel with instrument validation. Eight measurement surfaces across knowledge graphs, classical search indices, AI answer areas, and answer engines are sampled daily; reliability, drift behaviour, and coverage per identity cluster are measured. Phase 2 starts on 22 September 2026 with the book launch as the central intervention. Phase 3, from Q3 / 2027, carries long-term controlled experiments on the validated apparatus.

Methods are pre-registered, source code and raw data are open, failure logs count equally with findings. Addressed to authors with visibility questions — and to practitioners of search and answer engine optimisation (SEO, AEO, GEO) who seek a validated measurement basis rather than anecdote.

Phase-1 success documented · 20 May 2026 (T+9) · v2.8

Construct-Validity audit via 28 live web datapoints across three Claude tiers surfaced three pre-reg-relevant findings — title collision with Mokka Müller's "Das vierte Feld" (1999, Econ), brand collision "Marin" with Maritime Research Institute Netherlands (1932), self-built A1-firewall skill asymmetry. Pipeline hardened to v2.8 Primary-Channel split: only web-search-augmented LLMs deliver load-bearing Marin discoverability data; cutoff LLMs are treated as echo-bias control group. Details in Challenges and Methodology Note 01 v2.8.

[Figure: schematic drift profile. Hit-rate H (0 to 1.0) against days since model update (T−7 to T+28). Baseline H₀ = 0.42; model update at T+0; drift +0.36; re-equilibrium after ≈ 14 d.]
Method schema · Schematic drift profile: hit-rate H of an AI answer engine on a canonical author statement, measured before and after a model update. Phase 1 maps these drift profiles per measurement surface, so that subsequent effect claims are not confused with instrument drift.
Programme lead: Marin T. Kael
Active since: 2026-05-11
Cadence: Quarterly
Licence: CC BY 4.0 · MIT · CC0
Lines of inquiry · Active 3-phase design

Active programme across three temporally overlapping phases.

Marin T. Kael’s research programme operates in three temporally overlapping phases. Phase 1 (May → Sep 2026) interleaves seven active pre-registrations Q0–Q6 with parallel instrument validation. Phase 2 (Sep 2026 → Q3 2027) measures post-launch effects after the book launch on 22 September. Phase 3 (from Q3 2027) runs long-term controlled experiments on the validated apparatus.

Line 01

Citation Inventory

What does each measurement instrument show today? Eight measurement surfaces are sampled daily and the visibility of the author identity per identity cluster — person, work, genre, world-mechanic — is documented as a coverage matrix. In Phase 1 without effect interpretation: first count what the instruments actually depict.

Eight measurement surfaces observed (Wikidata, Google Knowledge Graph, Bing Webmaster AI indexing, Goodreads, Hardcover, Reddit, Google Search Console, Google AI Overviews) plus a small, low-cadence language-model probe (Gemini, Claude) × N = 12 pre-registered query sets. Snapshot cadence 24 h for API endpoints, weekly for browser snapshots. Primary quantity: hit-rate H = correct citations ÷ queries (descriptive, not inference-oriented in Phase 1).
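The primary quantity can be written down in a few lines. The following is a minimal sketch, assuming each query in a set has already been annotated as a correct citation or not; the function name and the example flags are hypothetical and not taken from the programme's tooling.

```python
def hit_rate(correct_flags):
    """Return H = correct citations / queries for one query set."""
    flags = list(correct_flags)
    if not flags:
        raise ValueError("hit-rate is undefined for an empty query set")
    return sum(1 for flag in flags if flag) / len(flags)


# Hypothetical annotations for twelve queries (True = correct citation).
example_flags = [True, False, True, True, False, False,
                 True, False, False, True, False, False]

print(f"H = {hit_rate(example_flags):.2f}")
```

In Phase 1 this value is reported descriptively per surface and query set; no inference is attached to it.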

Line 02

Measurement-Instrument Validation

An instrument is reliable when repeated measurement of the same reality yields the same values. Phase 1 tests every measurement surface for test-retest reliability, intra-set consistency, and model drift — and determines which surfaces are even suitable for later effect measurements.

Test-retest correlation r over 24-h replication probes (threshold r ≥ 0.9 for API sources, ≥ 0.7 for language-model probes). Cronbach’s α intra-query-set ≥ 0.7 as a precondition. CUSUM charts on hit-rate over a 90-day window with alarm threshold h = 5. Model-version logs separately recorded; drift events are reported with drift profiles per measurement surface.
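Two of these checks can be sketched with the standard library alone. The sketch below is illustrative and rests on assumptions: Cronbach's α treats each query of a set as an item scored once per replication run, and the CUSUM is a standard two-sided tabular chart on standardised values with allowance k = 0.5. Only the alarm threshold h = 5 and the α ≥ 0.7 precondition come from the text; function names, the in-control target, and sigma are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per query (item), one score per replication run."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score per run = sum over all items in that run.
    totals = [sum(run) for run in zip(*item_scores)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    item_var_sum = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_var_sum / total_var)


def first_cusum_alarm(values, target, sigma, k=0.5, h=5.0):
    """Return the index of the first CUSUM alarm, or None if there is none."""
    upper = lower = 0.0
    for i, x in enumerate(values):
        z = (x - target) / sigma          # standardised deviation from target
        upper = max(0.0, upper + z - k)   # accumulates upward shifts
        lower = max(0.0, lower - z - k)   # accumulates downward shifts
        if upper > h or lower > h:
            return i
    return None


# Hypothetical daily hit-rates: stable at 0.42, then a jump after a model update.
hit_rates = [0.42] * 5 + [0.78] * 5
print(first_cusum_alarm(hit_rates, target=0.42, sigma=0.05))
```

Returning only the first alarm keeps the sketch short; a production chart would typically log every alarm together with the model-version record for that day.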

Line 03

Codebook Iteration

What even counts as a "correct citation"? Which answer is a hallucination, which is correct-but-incomplete? Phase 1 publicly versions the annotation schema, documents edge cases, and gathers public feedback — as a precondition for later effect claims to rest on an unambiguous measurement schema.

Codebook v0.x → v1.0 as a published milestone before Phase 2 activation. Each new schema version comes with example annotations, edge-case discussion, and a diff to the predecessor version. Inter-rater agreement (Cohen’s κ) is collected from Q4 / 2026 with external annotators; threshold κ ≥ 0.7 as a condition for codebook version release.
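For readers unfamiliar with the agreement statistic, here is a minimal sketch of Cohen's κ for two annotators labelling the same answers. The function name and the example labels are hypothetical; only the release threshold κ ≥ 0.7 comes from the text.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)


# Hypothetical labels on the "Correct" dimension for ten answers.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}, release condition met: {kappa >= 0.7}")
```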

Line 04

Open Materials

Everything the programme produces is open: methodology notes, pre-registrations, measurement source code, raw data, quarterly reports, failure logs, codebook versions. Methodology reviewers with a Python environment and internet access can trace any evaluation — that is the only audit mechanism a single-case study has.

Methodology notes and quarterly reports under CC BY 4.0, source code on GitHub under MIT, raw data under CC0 (where platform terms permit). From Q3 / 2026 every publication with a Zenodo DOI. One replication archive per quarter with frozen version pins, environment.yml, user-agent strings, and endpoint snapshots.

Example finding · Phase 1

What a validation finding looks like.

Phase 1 measures not effects, but measurement properties. Test-retest reliability r tells us whether a surface yields the same values on repeated probing of the same reality. The threshold r ≥ 0.9 applies to API sources, r ≥ 0.7 to language-model-based browser snapshots. The figure below is a schematic preview of the Q3 / 2026 validation report.

[Figure 3: forest plot. Test-retest reliability r (24-h replication) on an axis from 0 to 1.0, with threshold lines at r = 0.7 and r = 0.9. Surfaces shown: Wikidata · SPARQL; Reddit · public JSON; Google Search Console; Google Knowledge Graph; Goodreads / Hardcover; Bing Webmaster AI; Gemini (browser probe); Claude (browser probe).]
Figure 3 · Q3-2026 preview (hypothetical) Forest plot of test-retest reliability per measurement surface over 24-h replication probes. Classification: r < 0.7 insufficient; 0.7 ≤ r < 0.9 acceptable; r ≥ 0.9 high. Filled wax markers: r ≥ 0.9; filled ink markers in the acceptable band; open markers: confidence interval touches the 0.7 threshold — surface not yet validated. Hypothetical data for methodology preview; real values will appear in the Q3 / 2026 validation report. → Live measurements in the research dashboard
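The classification used in Figure 3 can be sketched as follows. This assumes that test-retest reliability is computed as a Pearson correlation between two replications of the same probes taken 24 h apart; the text does not name the estimator, and the function names and example values are hypothetical.

```python
from math import sqrt


def pearson_r(first_run, second_run):
    """Pearson correlation between two replications of the same probes."""
    n = len(first_run)
    if n != len(second_run) or n < 2:
        raise ValueError("need two equally long series with at least two values")
    mean_x = sum(first_run) / n
    mean_y = sum(second_run) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(first_run, second_run))
    sxx = sum((x - mean_x) ** 2 for x in first_run)
    syy = sum((y - mean_y) ** 2 for y in second_run)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def classify_reliability(r):
    """Bands from the Figure 3 caption."""
    if r >= 0.9:
        return "high"
    if r >= 0.7:
        return "acceptable"
    return "insufficient"


# Hypothetical hit-rates per query set on day t and day t + 1.
day_1 = [0.40, 0.55, 0.30, 0.70, 0.50]
day_2 = [0.42, 0.50, 0.35, 0.68, 0.52]

r = pearson_r(day_1, day_2)
print(f"r = {r:.2f} -> {classify_reliability(r)}")
```

A point estimate alone does not validate a surface: per the caption, a surface whose confidence interval touches the 0.7 threshold stays unvalidated.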
[Figure 4: decay curves. Remaining effect (0 % to peak, with the 50 % level marked) against days since action (T+0 to T+40). Wikidata edit · τ½ ≈ 21 d; ORCID profile update · τ½ ≈ 18 d; IndexNow bulk push · τ½ ≈ 12 d; Newsletter dispatch · τ½ ≈ 7 d; Reddit post · τ½ ≈ 4 d; Manuscript indexing · small, but persistent.]
Figure 4 · Phase-2 preview Expected decay profiles of six author action classes, as they would be reported from Q3 / 2027 on a validated measurement apparatus. Half-life τ½ measures the days after which an action effect has fallen to 50 % of its peak. Phase 1 (current) builds the apparatus that will yield these profiles reliably in Phase 2 — without confusion with instrument drift.
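To make the half-life definition concrete, the sketch below assumes a simple exponential decay measured from the peak. The text does not state the functional form, so this is an illustrative assumption. The half-lives are the schematic preview values from Figure 4, not measured results, and the function name is hypothetical.

```python
# Schematic half-lives in days, copied from the Figure 4 preview (not measured data).
HALF_LIVES_DAYS = {
    "Wikidata edit": 21,
    "ORCID profile update": 18,
    "IndexNow bulk push": 12,
    "Newsletter dispatch": 7,
    "Reddit post": 4,
}


def remaining_effect(days_since_peak, half_life_days):
    """Fraction of the peak effect left, assuming exponential decay."""
    if half_life_days <= 0:
        raise ValueError("half-life must be positive")
    if days_since_peak < 0:
        raise ValueError("days since peak must be non-negative")
    return 0.5 ** (days_since_peak / half_life_days)


for action, tau in HALF_LIVES_DAYS.items():
    share = remaining_effect(14, tau)
    print(f"{action}: {share:.0%} of peak left after 14 d")
```

Manuscript indexing is left out because the figure gives it no half-life, only "small, but persistent".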
Publications

Published and in preparation.

Methodology notes, pre-registrations, quarterly reports, and the occasional field note. Every publication receives a stable URL and a citation scheme; from quarterly report Q3 / 2026 also a Zenodo DOI.

Project Journal · Living

Project challenges and solutions

Open engineering journal: every pipeline obstacle, methodological drift, and reach-related brake documented with symptom, root cause, solution, and methodological implication. Eight pipeline challenges, two reach findings, five success factors. Extended on every new finding.

Methodology Note · 01 DOI 10.5281/zenodo.20170615

Baseline Measurement: Author Identity in the Citation Behaviour of Language Models (Active Pre-Launch Design)

v2.0 revises the phase model: the programme does not operate in 'first validation, then action', but in three temporally overlapping phases with continuously pre-registered interventions. Seven active pre-registrations (Q0–Q6) on eleven measurement surfaces — including the Cross-LLM Trust Graph, Common-Crawl Snapshot Probe, and machine-readable identity surfaces.

Pre-Registration · Q0 DOI 10.5281/zenodo.20125967

Pre-Launch Instrument Validation · Active Pre-Launch Window 2026-05 → 2026-09

Locks in, before data collection, the full Phase-1 measurement plan: six instrument hypotheses H-Q0-INST-01 through 06 (test-retest reliability, multi-snapshot aggregation, CUSUM drift sensitivity, Cronbach’s α, Wikidata anchor stability, inter-surface agreement), sampling plan, stop criteria, contingency plan, and reproducibility specification. CC BY 4.0.

Codebook · v0.1 DOI 10.5281/zenodo.20125976

Annotation Schema for AI Citation Behaviour

Operationalises, before every measurement probe, what is to count as a 'correct citation': four binary dimensions (Hit, Correct, Hallucination, Completeness), anti-pattern catalogue, four documented edge-case classes, version plan v0.1 → v1.0 with inter-rater threshold Cohen’s κ ≥ 0.7. CC BY 4.0.
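One way to picture the four binary dimensions is as a small record per annotated answer. This is a hedged sketch only: the field names, the class, and the helper are hypothetical and do not reproduce the published codebook or its rules for how the dimensions interact.

```python
from dataclasses import dataclass

DIMENSIONS = ("hit", "correct", "hallucination", "complete")


@dataclass(frozen=True)
class Annotation:
    """One annotated answer; each field is one binary codebook dimension."""
    hit: bool            # the author identity is mentioned at all
    correct: bool        # the statement matches the reference document
    hallucination: bool  # the answer contains invented claims
    complete: bool       # nothing required by the reference is missing


def dimension_rates(annotations):
    """Share of answers flagged True on each dimension."""
    if not annotations:
        raise ValueError("need at least one annotation")
    n = len(annotations)
    return {name: sum(getattr(a, name) for a in annotations) / n
            for name in DIMENSIONS}


# Hypothetical annotations for four answers.
sample = [
    Annotation(hit=True, correct=True, hallucination=False, complete=True),
    Annotation(hit=True, correct=True, hallucination=False, complete=False),
    Annotation(hit=True, correct=False, hallucination=True, complete=False),
    Annotation(hit=False, correct=False, hallucination=False, complete=False),
]

print(dimension_rates(sample))
```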

Software Release · v0.3 DOI 10.5281/zenodo.20262669

marin-research-tools — Phase-1 Tooling

Source-code release of the Phase-1 instrument-validation tooling: style_lint.py (style-sheet linter for outbound), source_attribution_parser.py (Cross-LLM Trust Graph), Pre-Registrations Q0–Q6, operator-policy documentation. MIT licence. GitHub tag v0.3. Predecessors v0.1 (10.5281/zenodo.20126017) and v0.2 (10.5281/zenodo.20189714) remain permanently citable; v0.3 supersedes v0.2 by design refactor (see CHANGELOG).

Dataset · T+0 DOI 10.5281/zenodo.20126038

Wikidata Identity Snapshot · Zero Point

Wikidata items Q139720807 (author) and Q139720798 (book) on the reference date 11 May 2026 as a full EntityData export and SPARQL property listing. Serves as ground-truth anchor for H-Q0-INST-05 (coverage stability > 0.85 over the active pre-launch window). CC0 1.0.

Activity Report · Q3 / 2026

Active Pre-Launch Phase — First 90 Days Q0–Q5 + Parallel Measurement Apparatus

Reliability per measurement surface (test-retest correlation r over 24-h replications), drift observations (CUSUM charts on hit-rate), coverage quotas per identity cluster, and first codebook observations. A preview layout with hypothetical data is already available; the real publication follows on 15 October 2026 with raw data and replication archive.

Validation Report · Q4 / 2026 Forthcoming

Inter-Rater Agreement & Codebook v0.2

Second validation report: inter-rater agreement (Cohen’s κ) with external annotators, codebook version difference from v0.1 to v0.2, ongoing drift statistics per measurement surface, first statements on inter-surface agreement (which surfaces are redundant, which orthogonal?).

Validation Report · Q1 / 2027 Forthcoming

Consolidation of the Measurement Apparatus

Third validation report: combined reliability, drift, and inter-rater findings across three quarters. Decision on codebook v1.0 as a precondition for Phase-2 activation. Surfaces not meeting the validation thresholds will be excluded from Phase 2 or replaced by alternatives.

Transition Note Forthcoming

Phase-1 Closure · Phase-2 Pre-Registrations

Methodology note with the Phase-1 validation conclusion and the first pre-registered hypotheses for Phase-3 long-term controlled experiments from Q3 / 2027 on the validated apparatus.

Open Materials

Replication and auditability.

Style-sheet (canonical truth)
The reference document against which every survey is compared is kept under version control and, after each quarterly publication, is also published in excerpts. Current state on request; from Q3 / 2026 publicly available.
Measurement pipeline (source code)
Survey and linter scripts are public on GitHub: github.com/marintkael/marin-research-tools (Python, MIT licence). With each quarterly report there is also a replication archive with frozen version pins.
Pre-registrations
Every survey is published, before data collection, with hypothesis, measurement operationalisation, and stop criterion. Later changes are made transparent.
Failure log
Every material blocked by the linter is logged with reason and date. Aggregated excerpts are part of the quarterly reports. The policy boundary of the pipeline remains externally auditable.

About the programme

Marin T. Kael is an author and, in parallel, runs an openly documented field laboratory on the question of how language-model-based search systems and AI answer engines take in, understand, and cite a literary author identity. Phase 1 (active, May → Sep 2026): active pre-launch interventions Q0–Q5 + instrument validation in parallel. Phase 2 (from Sep 2026): post-launch effect detection after the book launch on 22 September. Phase 3 (from Q3 / 2027): long-term controlled experiments on the validated apparatus.

Addressed to authors who want to understand their visibility in AI search, and to practitioners of search and answer engine optimisation (SEO / AEO / GEO) who seek a reproducible measurement basis rather than anecdotal claims. The work is self-funded and not affiliated with an academic institution.

A detailed description — mission, methodology, ethics, publication plan — is at /en/research/programme.

Contact

Enquiries on methodology, replication, or auditing: research@marin-t-kael.de. Source code and replication archives on GitHub.