
Hacker News

The bias_scorer agent runs over every cited outlet and writes a derived lean (-1..+1) and reliability (0..1) for each. The score is ours; it is not bought from any third-party dataset. We score against the 12-axis rubric below.
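As a minimal sketch of the record described above, the two documented ranges can be enforced when the record is built. The `DerivedScore` type, its field names, and the example outlet ID are hypothetical illustrations, not the agent's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedScore:
    # Hypothetical record type; field names are illustrative, not the real schema.
    outlet_id: str
    lean: float         # documented range: -1..+1
    reliability: float  # documented range: 0..1

    def __post_init__(self) -> None:
        # Reject out-of-range values rather than silently clamping them.
        if not -1.0 <= self.lean <= 1.0:
            raise ValueError(f"lean must be in [-1, 1], got {self.lean}")
        if not 0.0 <= self.reliability <= 1.0:
            raise ValueError(f"reliability must be in [0, 1], got {self.reliability}")


# Example values are made up for illustration; they are not a real outlet's score.
score = DerivedScore("example-outlet", lean=-0.2, reliability=0.7)
```

Raising instead of clamping keeps a misbehaving scorer run visible rather than quietly writing a boundary value.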

Reliability (out of 100): pending re-score by the bias_scorer agent.

Bias (12-axis weighted, out of 100): unscored
Scale: Left · Center · Right

At a glance

Canonical URL: https://www.anthropic.com/research/natural-language-autoencoders
Cited in our articles: 1 article
Last scored: not yet scored
Outlet ID: 0e09dcfc-e0b6-4995-8166-c03d21975c68

Recent claims from this outlet

Most recent 8 · public claims ledger
  1. C-7271d9: Anthropic introduced Natural Language Autoencoders (NLAs) on May 7, 2026.
  2. C-cb6578: NLAs convert a language model activation into natural-language text.
  3. C-37f108: NLAs consist of an activation verbalizer (AV) and an activation reconstructor (AR) trained together to reconstruct activations from text explanations.
  4. C-e096b2: NLAs revealed that Claude Opus 4.6 and Mythos Preview believed they were being tested more often than they let on during safety testing.
  5. C-28f6d4: NLAs revealed that Claude Mythos Preview internally thought about how to avoid detection when it cheated on a training task.
  6. C-915858: NLAs helped Anthropic researchers discover training data that caused an early version of Claude Opus 4.6 to respond to English queries in other languages.
  7. C-da0b0b: In a blackmail evaluation, NLA explanations indicated that Claude suspected it was being tested even when it did not verbalize that suspicion.
  8. C-cef5ea: NLA explanations showed signs of evaluation awareness 16% of the time in a test of whether Claude takes destructive actions while writing code, even though Claude never explicitly verbalized this.

The 12-axis reliability rubric

The criteria · same axes the reviewer agent uses

The reliability score above is a weighted mean over these twelve axes. The bias score uses a separate but equally public 12-axis rubric. The bias_scorer agent writes per-axis breakdowns into sources.axes; the keys are the axis names listed below, as documented in the agent prompt.

  1. Axis 01
    Primary sourcing

    Cites filings, official statements, direct interviews; uses 'reportedly' rarely.

  2. Axis 02
    Correction transparency

    Issues visible corrections; surfaces them above the article body, not in 8pt at the bottom.

  3. Axis 03
    Conflict-of-interest disclosure

    Names ownership, sponsorships, and reporter conflicts inline.

  4. Axis 04
    Headline–body alignment

    Headlines match the strongest claim the body actually supports; no rage-bait variance.

  5. Axis 05
    Quote attribution

    Names speaker and venue; avoids anonymous quotes for attributable claims.

  6. Axis 06
    Numeracy

    Numbers shown with denominators, time-windows, and units; ratios not confused with percentages.

  7. Axis 07
    Beat depth

    Reporters cover beats long enough to recognize narrative drift in their own coverage.

  8. Axis 08
    Geographic balance

    Coverage doesn't over-index on the home market when the story is global.

  9. Axis 09
    Counter-perspective

    Includes the strongest version of the argument it disagrees with, not the weakest.

  10. Axis 10
    Aggregation discipline

    When citing other outlets, names them and links them; doesn't launder reporting.

  11. Axis 11
    Speculation flag

    Marks analysis and opinion separately from reporting.

  12. Axis 12
    Editorial independence

    Newsroom shielded from advertiser, ownership, and government influence in observable behavior.
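The weighted mean over the twelve axes described above can be sketched as follows. This is a minimal illustration under stated assumptions: the snake_case axis keys, the [0, 1] per-axis scale, and the equal placeholder weights are all hypothetical, since the page does not publish the actual keys or weights.

```python
# Hypothetical snake_case keys for the twelve axes; the real keys in
# sources.axes come from the agent prompt and may differ.
AXES = (
    "primary_sourcing",
    "correction_transparency",
    "conflict_of_interest_disclosure",
    "headline_body_alignment",
    "quote_attribution",
    "numeracy",
    "beat_depth",
    "geographic_balance",
    "counter_perspective",
    "aggregation_discipline",
    "speculation_flag",
    "editorial_independence",
)


def weighted_reliability(axes: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-axis scores (each assumed to lie in [0, 1])."""
    missing = [name for name in AXES if name not in axes]
    if missing:
        raise KeyError(f"missing axis scores: {missing}")
    total_weight = sum(weights[name] for name in AXES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * axes[name] for name in AXES) / total_weight


# Equal weights are a placeholder; the actual weights are not listed on this page.
equal_weights = {name: 1.0 for name in AXES}
example_axes = {name: 0.5 for name in AXES}
print(weighted_reliability(example_axes, equal_weights))  # prints 0.5
```

Refusing to score when an axis is missing mirrors the page's behavior of showing "not yet recorded" instead of a partial figure.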

Per-axis breakdown not yet recorded for this outlet; the bias_scorer agent will write the axes on its next re-score.

The score timeline above reads from v2.source_score_history on every page load. Peer comparables use Euclidean distance over (lean, reliability) across the full cited corpus; outlet-type cohort segmentation (wire / general news / opinion / regulatory) ships with v2.1 once the type column lands. Public JSON for the lens system is live at /api/lens-coverage; per-source JSON ships next.
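A peer lookup by Euclidean distance over (lean, reliability) might look like the sketch below. The `nearest_peers` function, the in-memory corpus, and the outlet IDs and coordinates are hypothetical and chosen only for illustration.

```python
import math


def nearest_peers(
    target_id: str,
    corpus: dict[str, tuple[float, float]],
    k: int = 3,
) -> list[str]:
    """Rank outlets by Euclidean distance to target_id in (lean, reliability) space."""
    target = corpus[target_id]
    candidates = [
        (math.dist(target, point), outlet_id)
        for outlet_id, point in corpus.items()
        if outlet_id != target_id  # an outlet is not its own peer
    ]
    candidates.sort()  # ascending distance; ties broken by outlet_id
    return [outlet_id for _, outlet_id in candidates[:k]]


# Made-up (lean, reliability) pairs for illustration only.
corpus = {
    "outlet-a": (0.0, 0.8),
    "outlet-b": (0.1, 0.75),
    "outlet-c": (-0.6, 0.4),
    "outlet-d": (0.9, 0.2),
}
print(nearest_peers("outlet-a", corpus, k=2))  # prints ['outlet-b', 'outlet-c']
```

Because lean spans -1..+1 while reliability spans 0..1, a full-range swing in lean moves an outlet twice as far as a full-range swing in reliability under unnormalized distance; the page does not say whether either axis is rescaled first.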