About the Programme

Research Programme.

Mission, methodology, ethics, and publication plan. This document is versioned with every structural change; the current state is binding for all surveys.

Programme lead: Marin T. Kael
Status: Active · Phase 1 (Pre-Launch) since 11 May 2026 · 7 pre-regs Q0–Q6
First survey: 11 May 2026
Version: v2.7 · 15 May 2026 (DOI 10.5281/zenodo.20206124)

Mission

The way authors are found has changed. Readers searching for a book increasingly land not on the publisher's page but on an answer assembled by Gemini or Claude, or on a knowledge-graph card that Google's Knowledge Graph or Bing's AI overview displays directly. Which of an author's actions actually reach this new answer layer, how long that takes, and how faithfully the resulting citations reproduce their sources remains only partially documented empirically.

This programme measures it openly: on a single case, but with methodological honesty. Later claims about the effects of individual author actions become more than anecdote only if they rest on a measurement apparatus whose reliability has been tested. Language models, knowledge graphs, and AI answer engines are not stable instruments for this purpose: their answers drift with model updates, indexing changes, and platform policy. Instrument drift can be separated from genuine action effects only when the measurement properties of the instruments themselves are known.

Hence three temporally overlapping phases with continuously pre-registered interventions.

Phase 1 (May → Sep 2026) is the active pre-launch phase: deliberate interventions on identity surfaces (Wikidata co-occurrence, Zenodo DOI cadence, Common-Crawl optimisation, machine-readable identity surfaces, Reddit karma, Cross-LLM Trust Graph) run in parallel with instrument validation of the fourteen measurement surfaces (test-retest reliability, intra-set consistency, CUSUM drift, coverage).

Phase 2 (Sep 2026 → Q3 / 2027) is post-launch effect detection: the book launch on 22 September 2026 is the central deliberate intervention, followed by aggregated effect measurement across all surfaces and long-tail observation of AI-answer reach.

Phase 3 (from Q3 / 2027) carries long-term controlled experiments on the then-validated apparatus, with effect measures appropriate to an n-of-1 design (interrupted time series, Bayesian structural time series, hierarchical Bayesian models) rather than pre/post Cohen’s d on individual actions.

What the programme aims at: an openly documented measurement basis that transfers to other author identities. The observation is declared as a single-case study; it is not about statistical generalisation, but about the methodology remaining traceable and replicable.

Programme Architecture

The programme observes a single literary project across the entire publication and reception cycle. Phase 1 is operationalised in four lines — Citation Inventory, Measurement-Instrument Validation, Codebook Iteration, and Open Materials — each carrying an isolated question and remaining separately citable. The detailed description of each line, including observed endpoints and measurement procedures, is in the programme index at /en/research.

This document focuses on the methodological commitments (§3), the ethics and policy (§4), and the reproducibility specification (§6), against which the programme can be externally audited.

Figure 1 The measurement cycle of every line of inquiry — from the pre-registered hypothesis, through the action taken and the active pre-launch window, to the report and the feedback into the next pre-registration.

Figure 2 · Expectation schema (reference date 11 May 2026). Expected visibility status of the author identity, per statement cluster and per answer layer. Person and work are already well established in structured sources (Wikidata, Goodreads); genre and world-mechanic are consistently weakly represented. The programme tracks this matrix daily. Hit-rate H = correct citations / queries. Validated real values appear in the quarterly reports from Q3 / 2026 onward. A live heatmap is available in the research dashboard.
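The hit-rate definition in the Figure 2 caption, H = correct citations / queries, can be computed per cell of the cluster-by-layer matrix roughly as follows. This is a sketch under assumptions: the tuple layout, the function name `hit_rates`, and the sample labels are hypothetical and do not reflect the programme's actual data model or results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One observation: (statement cluster, answer layer, citation was correct).
Observation = Tuple[str, str, bool]


def hit_rates(observations: Iterable[Observation]) -> Dict[Tuple[str, str], float]:
    """Compute H = correct citations / queries for each (cluster, layer) cell."""
    queries = defaultdict(int)
    correct = defaultdict(int)
    for cluster, layer, is_correct in observations:
        queries[(cluster, layer)] += 1
        if is_correct:
            correct[(cluster, layer)] += 1
    # Only cells with at least one query appear, so the division is safe.
    return {cell: correct[cell] / queries[cell] for cell in queries}


# Hypothetical sample; the labels are placeholders, not programme data.
sample = [
    ("person", "layer_a", True),
    ("person", "layer_a", True),
    ("person", "layer_a", False),
    ("person", "layer_a", True),
    ("genre", "layer_a", False),
    ("genre", "layer_a", False),
]
rates = hit_rates(sample)
```

Cells with no queries are simply absent from the result rather than reported as zero, which keeps "never asked" distinguishable from "asked and never cited correctly".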

Methodological Commitments

Four methodological commitments apply to every survey of the programme. Violations are noted in the quarterly report.

Pre-registration

Before every data collection a pre-registration in OSF style is published: hypothesis (H₀/H₁), measurement operationalisation, data source, sampling plan, a-priori power analysis, stop criterion, and analysis plan. Deviations are openly disclosed in the report.

Open Methodology

All measurement, aggregation, and test scripts are published with the report as a replication archive: Python 3, MIT-licensed, frozen pins via environment.yml, endpoints with retrieval timestamp and user agent, raw data under CC0 where platform terms permit.

Failure Log

Null effects, findings that contradict the hypothesis, and drafts blocked by the linter must be published just like confirmatory results. Aggregated excerpts are part of every quarterly report; individual findings are recorded with date, reason, and context in the replication archive.

Quarterly Cadence

Reports appear in mid-October, mid-January, mid-April, and mid-July, with a tolerance of ± two weeks. Omitted reports are explained in the following report. An adversarial reviewer role will be established from quarterly report Q4 / 2026 onward.

Figure 3 Measurement pipeline of the programme: eight data sources are queried daily at 04:00 UTC, persisted in Postgres as versioned JSON snapshots with endpoint hash and user agent, and analysed in the quarterly evaluations with classical statistics libraries (scipy) as well as Bayesian inference (pymc). All code and replication archives are public.
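One way to picture the versioned JSON snapshots described in Figure 3 is the sketch below, which bundles a raw response with its provenance fields before it is written to the database. The field names, the use of SHA-256 over the endpoint URL as the "endpoint hash", the extra `payload_hash`, and the placeholder URL and user agent are all assumptions made for illustration; the programme's actual schema is defined in its replication archive.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict


def build_snapshot(endpoint: str, user_agent: str, payload: Dict[str, Any],
                   retrieved_at: datetime) -> Dict[str, Any]:
    """Bundle one raw response with endpoint hash, user agent, and timestamp."""
    # Canonical serialisation so that identical payloads give identical hashes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return {
        "endpoint": endpoint,
        "endpoint_hash": hashlib.sha256(endpoint.encode("utf-8")).hexdigest(),
        "payload_hash": hashlib.sha256(body.encode("utf-8")).hexdigest(),
        "user_agent": user_agent,
        "retrieved_at": retrieved_at.astimezone(timezone.utc).isoformat(),
        "payload": payload,
    }


snapshot = build_snapshot(
    endpoint="https://example.org/api/entity",  # placeholder URL
    user_agent="example-research-bot/0.1",  # placeholder user agent
    payload={"label": "example", "hits": 3},
    retrieved_at=datetime(2026, 5, 11, 4, 0, tzinfo=timezone.utc),
)
row = json.dumps(snapshot, sort_keys=True)  # text suitable for a JSON column
```

Hashing a canonical serialisation of the payload makes it cheap to detect whether two daily snapshots of the same endpoint actually differ, which is useful when separating instrument drift from unchanged answers.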

Ethics and Policy

4.1 Pseudonymity

Marin T. Kael is an openly declared pseudonym. The separation between pseudonym and real person is grounded in private law and will not be lifted. For the scientific citability of the programme, the person behind the pseudonym is irrelevant; the materials stand on their own.

4.2 Platform Policy

The programme operates in explicit accordance with the policies of the respective platforms. For Reddit the Responsible Builder Policy applies; posts are not submitted automatically but posted manually by the author through the standard web interface. For Wikidata the Wikimedia Foundation editor policy applies; edits carry documented sources. Violations of platform policy are openly noted in the quarterly report.

4.3 Data Protection and Reach

The programme does not collect personal data of third parties. It analyses publicly accessible materials (posts, comments, reviews) in aggregated form. Individual reader comments are cited only when the writer has explicitly consented. No material is prepared for model training or passed on to third parties.

4.4 Conflict of Interest

The programme lead is simultaneously the subject of inquiry, which is a genuine methodological limitation. It is restated in every report. An adversarial reviewer role will be established from quarterly report Q4 / 2026 onward; the position is currently open.

Publication Plan

The following plan is binding; deviations require a justified update to this document and are publicly documented.

11 May 2026
Methodology Note 01 · Baseline Measurement: Author Identity in the Citation Behaviour of Language Models (Pre-Launch) · Status: published
13 May 2026
Methodology Note 01 · v2.7 · Major redesign: active three-phase design + 6 pre-registrations Q0–Q5 · DOI 10.5281/zenodo.20170615 · Status: published
May – Sept. 2026
Phase 1 · Active Pre-Launch · Q0–Q5 interventions + parallel instrument validation · active since T+0 · Status: active
October 2026
Activity Report Q3 / 2026 · First 90 days Q0–Q5 + parallel measurement apparatus validation · Status: scheduled
22 September 2026
Phase Transition · Book Launch · "Das vierte Feld" release: central deliberate intervention, transition to Phase 2 (post-launch effect detection) · Status: scheduled
January 2027
Activity Report Q4 / 2026 · Post-launch effect detection + Codebook v0.2 (external annotators, inter-rater agreement) · Status: scheduled
July 2027
Transition Note · Phase-2 closure · Phase-3 pre-registrations for long-term controlled experiments from Q3 / 2027 · Status: scheduled

Figure 4 Publication plan v1.0, binding as of 11 May 2026. Deviations will be openly disclosed in an updated version of this document.

Reproducibility

Every publication is accompanied by a replication archive. The archive contains the measurement code (Python, MIT), the raw measurement values in machine-readable form, the versioned style sheet as it stood at the time of the survey, and an environment.yml file for restoring the software environment. A preliminary version of the measurement code is made available to external auditors on request even before the first regular publication.

Contact, Licence, Versioning

Enquiries: research@marin-t-kael.de
Licence: Texts: CC BY 4.0 · Code: MIT · Raw data: CC0 (where platform terms permit)
Versioning: v1.0 · 11 May 2026 · first version
Citation: Kael, M. T. (2026). Marin T. Kael Research Programme — Mission and Methodology, v1.0. Marin T. Kael, Independent. marin-t-kael.de/en/research/programme