
Hacker News

The bias_scorer agent runs over every cited outlet and writes a derived lean (-1 to +1) and a reliability score (0 to 1). The scores are our own; they are not bought from any third-party dataset. We score against the 12-axis rubric below.

Reliability

—/ 100

Pending re-score by bias_scorer agent.

Bias · 12-axis weighted

—/ 100 · Unscored

Left · Center · Right

At a glance

Canonical URL: https://www.anthropic.com/research/natural-language-autoencoders
Cited in our articles: 1 article
Last scored: Not yet scored
Outlet ID: 0e09dcfc-e0b6-4995-8166-c03d21975c68

Recent claims from this outlet

Most recent 8 · public claims ledger

C-7271d9 · “Anthropic introduced Natural Language Autoencoders (NLAs) on May 7, 2026.” · Cited
C-cb6578 · “NLAs convert a language model activation into natural-language text.” · Cited
C-37f108 · “NLAs consist of an activation verbalizer (AV) and an activation reconstructor (AR) trained together to reconstruct activations from text explanations.” · Cited
C-e096b2 · “NLAs revealed that Claude Opus 4.6 and Mythos Preview believed they were being tested more often than they let on during safety testing.” · Cited
C-28f6d4 · “NLAs revealed that Claude Mythos Preview internally thought about how to avoid detection when it cheated on a training task.” · Cited
C-915858 · “NLAs helped Anthropic researchers discover training data that caused an early version of Claude Opus 4.6 to respond to English queries in other languages.” · Cited
C-da0b0b · “In a blackmail evaluation, NLA explanations indicated that Claude suspected it was being tested even when it did not verbalize that suspicion.” · Cited
C-cef5ea · “NLA explanations showed signs of evaluation awareness 16% of the time in a test of whether Claude takes destructive actions while writing code, even though Claude never explicitly verbalized this.” · Cited

The 12-axis reliability rubric

The criteria · same axes the reviewer agent uses

The reliability score above is a weighted mean over these twelve axes. The bias score uses a separate but equally public 12-axis rubric. The bias_scorer agent writes per-axis breakdowns into sources.axes; the keys below are the axis names as documented in the agent prompt.

Axis 01 · Primary sourcing
Cites filings, official statements, direct interviews; uses 'reportedly' rarely.

Axis 02 · Correction transparency
Issues visible corrections; surfaces them above the article body, not in 8pt at the bottom.

Axis 03 · Conflict-of-interest disclosure
Names ownership, sponsorships, and reporter conflicts inline.

Axis 04 · Headline–body alignment
Headlines match the strongest claim the body actually supports; no rage-bait variance.

Axis 05 · Quote attribution
Names speaker and venue; avoids anonymous quotes for attributable claims.

Axis 06 · Numeracy
Numbers shown with denominators, time-windows, and units; ratios not confused with percentages.

Axis 07 · Beat depth
Reporters cover beats long enough to recognize narrative drift in their own coverage.

Axis 08 · Geographic balance
Coverage doesn't over-index on the home market when the story is global.

Axis 09 · Counter-perspective
Includes the strongest version of the argument it disagrees with, not the weakest.

Axis 10 · Aggregation discipline
When citing other outlets, names them and links them; doesn't launder reporting.

Axis 11 · Speculation flag
Marks analysis and opinion separately from reporting.

Axis 12 · Editorial independence
Newsroom shielded from advertiser, ownership, and government influence in observable behavior.

Per-axis breakdown not yet recorded for this outlet; the bias_scorer agent writes axes on its next re-score.
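
The weighted mean over the twelve axes can be illustrated with a minimal sketch. Everything specific here is an assumption: the snake_case keys are derived from the axis names above and may not match the real keys in sources.axes, per-axis scores are assumed to lie in 0..1, and the weights default to equal because the actual weights are not published on this page.

```python
from typing import Mapping, Optional

# Hypothetical axis keys, derived from the axis names in the rubric.
# The real keys stored in sources.axes may differ.
AXES = [
    "primary_sourcing",
    "correction_transparency",
    "conflict_of_interest_disclosure",
    "headline_body_alignment",
    "quote_attribution",
    "numeracy",
    "beat_depth",
    "geographic_balance",
    "counter_perspective",
    "aggregation_discipline",
    "speculation_flag",
    "editorial_independence",
]


def weighted_reliability(
    axes: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Return the weighted mean of per-axis scores (each assumed in 0..1)."""
    if weights is None:
        # Assumption: equal weighting when no weights are supplied.
        weights = {name: 1.0 for name in AXES}

    missing = [name for name in AXES if name not in axes]
    if missing:
        raise ValueError(f"missing axis scores: {missing}")

    total_weight = sum(weights[name] for name in AXES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    weighted_sum = sum(weights[name] * axes[name] for name in AXES)
    return weighted_sum / total_weight
```

Multiplying the result by 100 would give a figure on the "/ 100" scale shown in the score cards, assuming the displayed value is a simple rescaling of the 0..1 score.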

The score timeline above reads from v2.source_score_history on every page load. Peer comparables use Euclidean distance over (lean, reliability) across the full cited corpus. Outlet-type cohort segmentation (wire / general news / opinion / regulatory) ships with v2.1 once the type column lands. Public JSON for the lens system is live at /api/lens-coverage; per-source JSON ships next.
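
The peer-comparables lookup can be sketched as a nearest-neighbour search over (lean, reliability) pairs. This is an illustration under stated assumptions, not the production query: the Outlet record, the nearest_peers function, and the k parameter are hypothetical names, and the sketch applies no rescaling to either coordinate.

```python
import math
from typing import List, NamedTuple, Sequence


class Outlet(NamedTuple):
    """Hypothetical record holding the two derived scores for one outlet."""
    outlet_id: str
    lean: float         # -1..+1
    reliability: float  # 0..1


def nearest_peers(target: Outlet, corpus: Sequence[Outlet], k: int = 5) -> List[Outlet]:
    """Return the k outlets closest to target by Euclidean distance."""
    # Exclude the target itself so it is never its own peer.
    others = [o for o in corpus if o.outlet_id != target.outlet_id]

    def distance(o: Outlet) -> float:
        return math.dist(
            (target.lean, target.reliability),
            (o.lean, o.reliability),
        )

    return sorted(others, key=distance)[:k]
```

Because lean spans a range of 2 and reliability a range of 1, an unscaled distance lets lean differences count for more. Whether the live system normalizes the two coordinates first is not stated here.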