From raw feeds to
structured clarity

Four interconnected pipelines. Six specialized AI models. Twenty-seven stages of processing. Zero editorial bias. Here is how ClearSignal turns the noise of modern media into signal.

Ingest
5 stages
Cluster
9 stages
Score
9 stages
Analyze
4 stages
Source-Blind Framing · Coverage Gap Detection · Multi-Model Architecture
01

Ingest

Every cycle, ClearSignal pulls from over a hundred sources spanning the full political spectrum, from Fox News to NPR, Wall Street Journal to The Guardian, Reuters to Daily Wire. Articles are normalized, deduplicated against both URL and semantic similarity, transformed into high-dimensional vector embeddings, and stored for downstream analysis. Nothing gets through twice. Nothing gets missed.

1
Fetch
RSS feeds and news APIs
2
Normalize
URL resolution and cleanup
3
Dedup
URL and semantic similarity
4
Embed
384-dimension vectors
5
Store
Supabase pgvector
Source Spectrum Tagging
Every feed is tagged with its editorial lean using AllSides and Ad Fontes Media ratings: left, left-center, center, right-center, right, and wire service. This is not cosmetic. It powers coverage gap detection downstream. When ClearSignal says a story is “undercovered on the right,” that is a measured claim backed by tagged source data.
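The lean tags make gap claims computable. A minimal sketch of how "undercovered on the right" could be derived from tagged source counts — the lean labels come from the buckets above, but `coverage_gaps`, its `min_share` threshold, and the article structure are illustrative assumptions, not ClearSignal's actual implementation:

```python
from collections import Counter

# Lean buckets from the AllSides / Ad Fontes tagging described above.
LEANS = ["left", "left-center", "center", "right-center", "right", "wire"]

def coverage_gaps(articles, min_share=0.1):
    """Flag leans whose share of a story's coverage falls below a threshold.

    `articles` is a list of dicts with a "lean" key; the 10% threshold
    is illustrative, not a real ClearSignal parameter.
    """
    counts = Counter(a["lean"] for a in articles)
    total = sum(counts.values())
    gaps = []
    for lean in LEANS:
        share = counts.get(lean, 0) / total if total else 0.0
        if share < min_share:
            gaps.append(lean)
    return gaps

# A story covered heavily on the right, barely on the left:
story = [{"lean": "right"}] * 15 + [{"lean": "left"}] * 2
print(coverage_gaps(story))  # ['left-center', 'center', 'right-center', 'wire']
```

The point is that the output is a measured claim: each flagged lean corresponds to tagged feeds that demonstrably did not run the story.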
Semantic Deduplication
URL matching catches the obvious duplicates. But when a wire service publishes a story and dozens of outlets run slightly rewritten versions, URL matching fails. ClearSignal uses cosine similarity on title embeddings to catch syndicated rewrites, ensuring each story is counted once regardless of how many outlets picked it up.
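The syndication check reduces to a similarity threshold on title embeddings. A minimal sketch, assuming embeddings are L2-normalized so cosine similarity is a dot product; the 0.9 threshold and helper names are illustrative:

```python
import numpy as np

def normalize(v):
    """L2-normalize a vector so cosine similarity becomes a dot product."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def is_duplicate(title_vec, seen_vecs, threshold=0.9):
    """Return True if title_vec is near any already-stored title embedding.

    threshold=0.9 is an illustrative cutoff, not ClearSignal's value.
    """
    return any(float(np.dot(title_vec, v)) >= threshold for v in seen_vecs)

a = normalize([1.0, 0.2, 0.0])    # wire-service original headline
b = normalize([1.0, 0.25, 0.02])  # a lightly rewritten syndicated version
c = normalize([0.0, 1.0, 0.0])    # an unrelated headline
print(is_duplicate(b, [a]))  # True
print(is_duplicate(c, [a]))  # False
```

In production the vectors would be the 384-dimension title embeddings from the Embed stage, and the nearest-neighbor lookup would run against pgvector rather than a Python loop.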
02

Cluster

Raw articles become stories. ClearSignal groups semantically related articles using density-based clustering on their vector embeddings, with no predefined topic count and no forced grouping. Each cluster receives an AI-generated neutral headline, named entities are extracted, and a knowledge graph connects people, organizations, and legislation across stories.

1
Assign
Fast-path matching
2
Discover
HDBSCAN clustering
3
Split
Oversized clusters
4
Label
AI topic naming
5
Rename
Headline polish
6
Entities
NER extraction
7
Graph
PageRank scoring
8
Merge
Converged stories
9
Validate
Currency and relevance
Why HDBSCAN
Most clustering algorithms require a predefined number of clusters. That is impossible with news. HDBSCAN finds natural density boundaries in the embedding space and handles noise gracefully. Articles that do not belong to any story become outliers, not forced matches.
Entity Knowledge Graph
People, organizations, legislation, and court cases are extracted from every topic and connected by co-occurrence. PageRank-style importance scoring means a senator appearing across 12 stories alongside the Supreme Court carries more weight than a name appearing in 10 unrelated local stories.
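The co-occurrence weighting can be sketched with NetworkX's PageRank over a weighted entity graph. The entity names and edge weights below are hypothetical; only the co-occurrence-graph-plus-PageRank structure comes from the text:

```python
import networkx as nx

# Hypothetical (entity_a, entity_b, number of shared stories) edges.
cooccurrences = [
    ("Sen. Example", "Supreme Court", 12),
    ("Sen. Example", "S.1234", 5),
    ("Supreme Court", "S.1234", 4),
    ("Local Figure", "Local Org", 1),
]

G = nx.Graph()
for a, b, w in cooccurrences:
    G.add_edge(a, b, weight=w)

# Importance flows along heavily-weighted co-occurrence edges, so the
# senator tied to the Supreme Court outranks isolated local names.
scores = nx.pagerank(G, weight="weight")
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

This is why repeated co-occurrence with central entities beats raw mention counts: PageRank rewards being connected to other important nodes, not just appearing often.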
03

Score

Every story is measured across nine dimensions. Not just “is this important?” but how much attention is it getting, who is covering it, what is the emotional temperature across the spectrum, and crucially: what are people missing? These scores drive the homepage ranking, trending indicators, and coverage gap alerts.

Each Story
9 Scores
AI
Impact
Real-world significance
C
Coverage
Source breadth and diversity
C
Attention
Volume of articles
NLP
Sentiment
Emotional temperature by lean
C
Timeline
Story lifecycle stage
C
Gaps
Who is not covering this
F
Heat
Front-page prominence
A
Insights
Category-level patterns
S
Ranking
Final sort order
Sentiment by Lean
Sentiment analysis runs per article, then results are grouped by outlet political lean. This surfaces editorial divergence quantitatively: when left-leaning outlets cover a story with alarm while right-leaning outlets dismiss it, that pattern is captured as measured framing data, not opinion.
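The grouping step itself is simple aggregation. A stdlib-only sketch, assuming per-article VADER compound scores (range -1 to 1) have already been computed upstream; the specific scores and leans are made up for illustration:

```python
from collections import defaultdict
from statistics import mean

# Per-article VADER compound scores, already computed upstream
# (illustrative values), paired with each outlet's lean tag.
articles = [
    {"lean": "left",  "compound": -0.62},
    {"lean": "left",  "compound": -0.48},
    {"lean": "right", "compound":  0.12},
    {"lean": "right", "compound":  0.20},
    {"lean": "wire",  "compound":  0.02},
]

by_lean = defaultdict(list)
for a in articles:
    by_lean[a["lean"]].append(a["compound"])

# Mean sentiment per lean: a negative-left / positive-right split like
# this is exactly the editorial divergence the text describes.
divergence = {lean: round(mean(s), 2) for lean, s in by_lean.items()}
print(divergence)  # {'left': -0.55, 'right': 0.16, 'wire': 0.02}
```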
Deterministic Ranking
The final homepage ranking uses a weighted formula with no AI in the loop. Impact, source breadth, publishing velocity, and editorial divergence are combined with time-decay factors. Rankings are reproducible, auditable, and fast.
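A deterministic weighted-sum-with-decay ranker can be sketched in a few lines. The weights, half-life, and field names below are illustrative assumptions, not ClearSignal's actual coefficients; only the structure (weighted combination of the named factors times a time-decay term) comes from the text:

```python
import time

# Illustrative weights over the factors named in the text.
WEIGHTS = {"impact": 0.4, "breadth": 0.25, "velocity": 0.2, "divergence": 0.15}
HALF_LIFE_HOURS = 12.0  # illustrative decay half-life

def rank_score(story, now=None):
    """Weighted sum of normalized 0..1 scores, scaled by exponential decay.

    Pure arithmetic, no model calls: the same inputs always produce the
    same ranking, which is what makes it reproducible and auditable.
    """
    now = now if now is not None else time.time()
    age_hours = (now - story["published_ts"]) / 3600.0
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    base = sum(WEIGHTS[k] * story[k] for k in WEIGHTS)
    return base * decay

now = time.time()
fresh = {"impact": 0.9, "breadth": 0.8, "velocity": 0.7,
         "divergence": 0.5, "published_ts": now}
stale = dict(fresh, published_ts=now - 24 * 3600)  # same story, a day old
print(rank_score(fresh, now) > rank_score(stale, now))  # True
```

With a 12-hour half-life, a day-old story keeps only a quarter of its base score, so freshness competes directly with impact in the sort order.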
04

Analyze

The final and most sophisticated stage. An AI editorial triage selects which stories warrant deep analysis. Full article bodies are scraped. Per-article editorial framing is classified across seven categories. Then a dedicated analysis model generates a complete structured breakdown: neutral, editorial-quality prose with source-specific framing contrasts, coverage comparisons, and verified claims.

1
Select
Editorial triage
2
Scrape
Full article body
3
Frame
7 framing categories
4
Analyze
Structured generation
Analysis Output
headline
Neutral, AP-style headline
lead
Opening paragraph with key facts
body
Flowing analysis with woven framing contrasts
framing_cards
Source-by-source comparisons
bottom_line
Single key takeaway
source_coverage
Per-outlet framing, inclusions, omissions
Seven Editorial Frames
Every article is classified before the final analysis begins. This gives the analysis model structured framing data to work with, concrete categorization rather than asking it to detect bias from raw text.
Accountability · Alarmist · Contextual · Dismissive · Empathetic · Investigative · Neutral / Wire

Six models, each chosen for the job

ClearSignal does not route everything through one model and hope for the best. Fast models handle volume. Reasoning models handle editorial decisions. The most capable model is reserved for the final analysis, where nuance and quality matter most.

Speed Tier
  • text-embedding-3-small
    Embeddings
    384-dim vectors at scale
  • VADER
    Sentiment analysis
    Per-article tone, grouped by lean
Reasoning Tier
  • Claude Haiku
    Labeling, scoring, framing
    Hundreds of calls per cycle
  • GPT-4o-mini
    Validation and selection
    Structured editorial reasoning
Generation Tier
  • Claude Sonnet
    Full analysis generation
    Long-form editorial quality
01
Source-Blind Framing
Analyses never say "the liberal take" or "the conservative view." Framing contrasts are attributed to specific outlets. CNN emphasized X while Fox focused on Y. Readers see what each source actually reported. No shorthand. No labels.
02
Coverage Gap Detection
Every story is checked for who is not covering it. If fifteen right-leaning outlets cover a congressional hearing but only two left-leaning outlets mention it, that silence is surfaced as data. What is missing is often as revealing as what is there.
03
Separation of Concerns
No single model makes all the decisions. No single formula drives all the scores. Fast models handle volume. Powerful models handle nuance. Deterministic formulas handle ranking. This prevents any single point of failure or bias amplification.