From raw feeds to
structured clarity

Four interconnected pipelines. Six specialized AI models. Twenty-seven stages of processing. Zero editorial bias. Here is how ClearSignal turns the noise of modern media into signal.

Ingest
5 stages
Cluster
9 stages
Score
9 stages
Analyze
4 stages
Source-Blind Framing · Coverage Gap Detection · Multi-Model Architecture
01

Ingest

Every cycle, ClearSignal pulls from over a hundred sources spanning the full political spectrum, from Fox News to NPR, Wall Street Journal to The Guardian, Reuters to Daily Wire. Articles are normalized, deduplicated against both URL and semantic similarity, transformed into high-dimensional vector embeddings, and stored for downstream analysis. Nothing gets through twice. Nothing gets missed.

1
Fetch
RSS feeds and news APIs
2
Normalize
URL resolution and cleanup
3
Dedup
URL and semantic similarity
4
Embed
384-dimension vectors
5
Store
Supabase pgvector
Source Spectrum Tagging
Every feed is tagged with its editorial lean using AllSides and Ad Fontes Media ratings: left, left-center, center, right-center, right, and wire service. This is not cosmetic. It powers coverage gap detection downstream. When ClearSignal says a story is “undercovered on the right,” that is a measured claim backed by tagged source data.
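The lean tags make gap claims computable. A minimal sketch of how "undercovered on the right" could be derived from tagged source counts — the lean labels come from the buckets above, but `coverage_gaps`, its `min_share` threshold, and the article structure are illustrative assumptions, not ClearSignal's actual implementation:

```python
from collections import Counter

# Lean buckets from the AllSides / Ad Fontes tagging described above.
LEANS = ["left", "left-center", "center", "right-center", "right", "wire"]

def coverage_gaps(articles, min_share=0.1):
    """Flag leans whose share of a story's coverage falls below a threshold.

    `articles` is a list of dicts with a "lean" key; the 10% threshold
    is illustrative, not a real ClearSignal parameter.
    """
    counts = Counter(a["lean"] for a in articles)
    total = sum(counts.values())
    gaps = []
    for lean in LEANS:
        share = counts.get(lean, 0) / total if total else 0.0
        if share < min_share:
            gaps.append(lean)
    return gaps

# A story covered heavily on the right, barely on the left:
story = [{"lean": "right"}] * 15 + [{"lean": "left"}] * 2
print(coverage_gaps(story))  # ['left-center', 'center', 'right-center', 'wire']
```

The point is that the output is a measured claim: each flagged lean corresponds to tagged feeds that demonstrably did not run the story.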
Semantic Deduplication
URL matching catches the obvious duplicates. But when a wire service publishes a story and dozens of outlets run slightly rewritten versions, URL matching fails. ClearSignal uses cosine similarity on title embeddings to catch syndicated rewrites, ensuring each story is counted once regardless of how many outlets picked it up.
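The syndication check reduces to a similarity threshold on title embeddings. A minimal sketch, assuming embeddings are L2-normalized so cosine similarity is a dot product; the 0.9 threshold and helper names are illustrative:

```python
import numpy as np

def normalize(v):
    """L2-normalize a vector so cosine similarity becomes a dot product."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def is_duplicate(title_vec, seen_vecs, threshold=0.9):
    """Return True if title_vec is near any already-stored title embedding.

    threshold=0.9 is an illustrative cutoff, not ClearSignal's value.
    """
    return any(float(np.dot(title_vec, v)) >= threshold for v in seen_vecs)

a = normalize([1.0, 0.2, 0.0])    # wire-service original headline
b = normalize([1.0, 0.25, 0.02])  # a lightly rewritten syndicated version
c = normalize([0.0, 1.0, 0.0])    # an unrelated headline
print(is_duplicate(b, [a]))  # True
print(is_duplicate(c, [a]))  # False
```

In production the vectors would be the 384-dimension title embeddings from the Embed stage, and the nearest-neighbor lookup would run against pgvector rather than a Python loop.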
02

Cluster

Raw articles become stories. ClearSignal groups semantically related articles using density-based clustering on their vector embeddings, with no predefined topic count and no forced grouping. Each cluster receives an AI-generated neutral headline, named entities are extracted, and a knowledge graph connects people, organizations, and legislation across stories.

1
Assign
Fast-path matching
2
Discover
HDBSCAN clustering
3
Split
Oversized clusters
4
Label
AI topic naming
5
Rename
Headline polish
6
Entities
NER extraction
7
Graph
PageRank scoring
8
Merge
Converged stories
9
Validate
Currency and relevance
Why HDBSCAN
Most clustering algorithms require a predefined number of clusters. That is impossible with news. HDBSCAN finds natural density boundaries in the embedding space and handles noise gracefully. Articles that do not belong to any story become outliers, not forced matches.
Entity Knowledge Graph
People, organizations, legislation, and court cases are extracted from every topic and connected by co-occurrence. PageRank-style importance scoring means a senator appearing across 12 stories alongside the Supreme Court carries more weight than a name appearing in 10 unrelated local stories.
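The co-occurrence weighting can be sketched with NetworkX's PageRank over a weighted entity graph. The entity names and edge weights below are hypothetical; only the co-occurrence-graph-plus-PageRank structure comes from the text:

```python
import networkx as nx

# Hypothetical (entity_a, entity_b, number of shared stories) edges.
cooccurrences = [
    ("Sen. Example", "Supreme Court", 12),
    ("Sen. Example", "S.1234", 5),
    ("Supreme Court", "S.1234", 4),
    ("Local Figure", "Local Org", 1),
]

G = nx.Graph()
for a, b, w in cooccurrences:
    G.add_edge(a, b, weight=w)

# Importance flows along heavily-weighted co-occurrence edges, so the
# senator tied to the Supreme Court outranks isolated local names.
scores = nx.pagerank(G, weight="weight")
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

This is why repeated co-occurrence with central entities beats raw mention counts: PageRank rewards being connected to other important nodes, not just appearing often.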
03

Score

Every story is measured across nine dimensions. Not just “is this important?” but how much attention is it getting, who is covering it, what is the emotional temperature across the spectrum, and crucially: what are people missing? These scores drive the homepage ranking, trending indicators, and coverage gap alerts.

Each Story
9 Scores
AI
Impact
Real-world significance
C
Coverage
Source breadth and diversity
C
Attention
Volume of articles
NLP
Sentiment
Emotional temperature by lean
C
Timeline
Story lifecycle stage
C
Gaps
Who is not covering this
F
Heat
Front-page prominence
A
Insights
Category-level patterns
S
Ranking
Final sort order
Sentiment by Lean
Sentiment analysis runs per article, then results are grouped by outlet political lean. This surfaces editorial divergence quantitatively: when left-leaning outlets cover a story with alarm while right-leaning outlets dismiss it, that pattern is captured as measured framing data, not opinion.
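The grouping step itself is simple aggregation. A stdlib-only sketch, assuming per-article VADER compound scores (range -1 to 1) have already been computed upstream; the specific scores and leans are made up for illustration:

```python
from collections import defaultdict
from statistics import mean

# Per-article VADER compound scores, already computed upstream
# (illustrative values), paired with each outlet's lean tag.
articles = [
    {"lean": "left",  "compound": -0.62},
    {"lean": "left",  "compound": -0.48},
    {"lean": "right", "compound":  0.12},
    {"lean": "right", "compound":  0.20},
    {"lean": "wire",  "compound":  0.02},
]

by_lean = defaultdict(list)
for a in articles:
    by_lean[a["lean"]].append(a["compound"])

# Mean sentiment per lean: a negative-left / positive-right split like
# this is exactly the editorial divergence the text describes.
divergence = {lean: round(mean(s), 2) for lean, s in by_lean.items()}
print(divergence)  # {'left': -0.55, 'right': 0.16, 'wire': 0.02}
```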
Deterministic Ranking
The final homepage ranking uses a weighted formula with no AI in the loop. Impact, source breadth, publishing velocity, and editorial divergence are combined with time-decay factors. Rankings are reproducible, auditable, and fast.
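A deterministic weighted-sum-with-decay ranker can be sketched in a few lines. The weights, half-life, and field names below are illustrative assumptions, not ClearSignal's actual coefficients; only the structure (weighted combination of the named factors times a time-decay term) comes from the text:

```python
import time

# Illustrative weights over the factors named in the text.
WEIGHTS = {"impact": 0.4, "breadth": 0.25, "velocity": 0.2, "divergence": 0.15}
HALF_LIFE_HOURS = 12.0  # illustrative decay half-life

def rank_score(story, now=None):
    """Weighted sum of normalized 0..1 scores, scaled by exponential decay.

    Pure arithmetic, no model calls: the same inputs always produce the
    same ranking, which is what makes it reproducible and auditable.
    """
    now = now if now is not None else time.time()
    age_hours = (now - story["published_ts"]) / 3600.0
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    base = sum(WEIGHTS[k] * story[k] for k in WEIGHTS)
    return base * decay

now = time.time()
fresh = {"impact": 0.9, "breadth": 0.8, "velocity": 0.7,
         "divergence": 0.5, "published_ts": now}
stale = dict(fresh, published_ts=now - 24 * 3600)  # same story, a day old
print(rank_score(fresh, now) > rank_score(stale, now))  # True
```

With a 12-hour half-life, a day-old story keeps only a quarter of its base score, so freshness competes directly with impact in the sort order.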
04

Analyze

The final and most sophisticated stage. An AI editorial triage selects which stories warrant deep analysis. Full article bodies are scraped. Per-article editorial framing is classified across seven categories. Then a dedicated analysis model generates a complete structured breakdown: neutral, editorial-quality prose with source-specific framing contrasts, coverage comparisons, and verified claims.

1
Select
Editorial triage
2
Scrape
Full article body
3
Frame
7 framing categories
4
Analyze
Structured generation
Analysis Output
headline
Neutral, AP-style headline
lead
Opening paragraph with key facts
body
Flowing analysis with woven framing contrasts
framing_cards
Source-by-source comparisons
bottom_line
Single key takeaway
source_coverage
Per-outlet framing, inclusions, omissions
Seven Editorial Frames
Every article is classified before the final analysis begins. This gives the analysis model structured framing data to work with, concrete categorization rather than asking it to detect bias from raw text.
Accountability · Alarmist · Contextual · Dismissive · Empathetic · Investigative · Neutral / Wire

Six models, each chosen for the job

ClearSignal does not route everything through one model and hope for the best. Fast models handle volume. Reasoning models handle editorial decisions. The most capable model is reserved for the final analysis, where nuance and quality matter most.

Speed Tier
  • text-embedding-3-small
    Embeddings
    384-dim vectors at scale
  • VADER
    Sentiment analysis
    Per-article tone, grouped by lean
Reasoning Tier
  • Claude Haiku
    Labeling, scoring, framing
    Hundreds of calls per cycle
  • GPT-4o-mini
    Validation and selection
    Structured editorial reasoning
Generation Tier
  • Claude Sonnet
    Full analysis generation
    Long-form editorial quality
01
Source-Blind Framing
Analyses never say "the liberal take" or "the conservative view." Framing contrasts are attributed to specific outlets. CNN emphasized X while Fox focused on Y. Readers see what each source actually reported. No shorthand. No labels.
02
Coverage Gap Detection
Every story is checked for who is not covering it. If fifteen right-leaning outlets cover a congressional hearing but only two left-leaning outlets mention it, that silence is surfaced as data. What is missing is often as revealing as what is there.
03
Separation of Concerns
No single model makes all the decisions. No single formula drives all the scores. Fast models handle volume. Powerful models handle nuance. Deterministic formulas handle ranking. This prevents any single point of failure or bias amplification.