How Search Engines Rank Websites: Uncovering Google’s Ranking Factors

Q: What did the 2024 Google API leak reveal that changed SEO practice?

Three findings from the 2024 Content Warehouse API leak changed SEO practice: siteAuthority is a real, persistent composite metric confirming that domain-level authority is measurable and buildable; NavBoost is a live click-signal system confirming that user satisfaction is a direct ranking input, not a soft quality consideration; and the Mustang token limit confirms that the most important content must appear early in a document, as Google's infrastructure does not process every word before making ranking decisions (Mike King, iPullRank, 2024).

Q: How does Google decide which pages to include in AI Overviews?

Google AI Overviews use a retrieval-augmented generation architecture that retrieves from the same search index as organic results — making organic ranking the prerequisite for AI Overview citation eligibility. Among ranking pages, AI Overview selection additionally favours self-contained direct answers at section openings, FAQPage schema markup, content published or updated within the past 13 weeks, and named authorship with verifiable credentials (Lily Ray, Amsive, Tech SEO Connect, December 2025).

by S I Moz

August 11, 2025

📅 Last Updated: 5 May 2026 🗓 Originally Published: 1 August 2025

Most people operating websites believe Google ranking works like a scoring system — that if you tick enough boxes, your page rises.

That mental model is wrong in a specific and consequential way.

Google does not run a single algorithm that evaluates pages against a checklist. It runs an interconnected system of layered models, each handling a different dimension of quality, each operating at a different stage of the retrieval process. A page can pass every technical audit, publish excellent content, and still underperform because a different system — one that evaluates user engagement or site-level authority — is suppressing it. Understanding why a page ranks or does not rank requires knowing which layer is responsible.

How Google ranks websites refers to the multi-stage process by which Google crawls, indexes, and then applies hundreds of interconnected ranking signals to determine which pages appear in search results for a given query, in what position, and whether they are selected for AI-generated features including AI Overviews. In 2026, Google operates over 200 confirmed ranking signals, with content quality carrying approximately 23% estimated weight, and user behaviour signals confirmed by the 2024 API leak as direct inputs into the NavBoost ranking system (Source: Vastcope, 2026; Mike King, iPullRank, 2024).

The unique angle in this pillar is architectural: most guides list individual ranking factors without explaining how those factors relate to each other. The Google Ranking Architecture Framework introduced here organises the confirmed signals into four interdependent layers, showing practitioners where to diagnose problems and which layer to fix first.

This pillar covers the confirmed ranking mechanisms — including signals revealed by the 2024 API leak — the E-E-A-T framework, technical eligibility signals, user behaviour inputs, and how organic ranking relates to AI Overview selection. The cluster posts in this series go deeper on each component as they go live.

Post Summary

Google ranks websites through a multi-stage system — not a single algorithm — that evaluates over 200 confirmed signals across crawling, indexing, and ranking. The 2024 Google API leak, confirmed as authentic by multiple former Google employees, identified siteAuthority as a real, composite ranking metric; NavBoost as a click-signal system that directly boosts or demotes rankings; and OriginalContentScore as a content originality measure. Content quality carries approximately 23% estimated weight in the ranking system, making it the highest-weight individual signal. After Google’s March 2026 core update, Information Gain — the degree to which a page adds knowledge beyond what competing pages already contain — became the dominant content quality evaluator. Pages with original data or proprietary frameworks gained 15–25% visibility, while templated content dropped 30–50%. The Google Ranking Architecture Framework introduced in this pillar organises these signals into four sequential layers: Eligibility, Quality, Relevance, and Experience — each a prerequisite for the next. Cluster posts in this series go deeper on each dimension as they go live.

Google Does Not Use One Algorithm — It Uses a System of Systems

The single most useful reframe for any SEO practitioner is this: Google is not an algorithm, it is an infrastructure.

Every time a user submits a query, the response is not the output of one calculation. It is the product of multiple systems operating in sequence — spam detection, relevance scoring, quality evaluation, page experience assessment — each filtering and reordering the candidate set before the final SERP is assembled.

The Three Stages: Crawling, Indexing, and Ranking

Before any ranking signal applies, a page must first be discovered and understood.

Crawling is the process by which Googlebot visits URLs and downloads page content. A page that blocks crawling via robots.txt or fails to return a 200 HTTP status code cannot be indexed. A page that Google cannot crawl is invisible to every ranking system that follows.

Indexing is the process by which Google analyses the crawled content and stores it in the search index. A page can be crawled but not indexed — for example, if it carries a noindex tag, returns thin content that fails quality thresholds, or is identified as a near-duplicate of another indexed page.

Ranking is what most practitioners mean when they say “SEO.” It is the stage at which Google applies its interconnected scoring systems to the indexed candidate set and determines which pages appear for a given query, in what order.

Pro Tip: Open Google Search Console and navigate to the URL Inspection tool. Test the URLs of your most important pages. If the status returns “URL is not on Google” despite the page being published, the problem is at the crawling or indexing stage — no amount of on-page optimisation will fix a page Google cannot index.
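As an illustration of the crawling and indexing checks described above, the following Python sketch tests one already-fetched page against three conditions: a 200 status code, robots.txt permission, and the absence of a noindex meta tag. The function name `eligibility_report`, its parameters, and the example URL are assumptions made for illustration. The sketch does not fetch anything itself and does not inspect the X-Robots-Tag HTTP header, so treat it as a rough pre-check rather than a substitute for the URL Inspection tool.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.append(attr.get("content", "").lower())


def eligibility_report(url, status_code, robots_txt, html, user_agent="Googlebot"):
    """Return pass/fail flags for one page (hypothetical helper, not a Google API)."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())  # parse rules from text, no network call

    meta = RobotsMetaParser()
    meta.feed(html)
    has_noindex = any("noindex" in directive for directive in meta.directives)

    return {
        "status_200": status_code == 200,
        "crawl_allowed": robots.can_fetch(user_agent, url),
        "no_noindex": not has_noindex,
    }


if __name__ == "__main__":
    report = eligibility_report(
        url="https://example.com/private/page",
        status_code=200,
        robots_txt="User-agent: *\nDisallow: /private/\n",
        html='<html><head><meta name="robots" content="noindex, follow"></head></html>',
    )
    for check, passed in report.items():
        print(f"{check}: {'pass' if passed else 'FAIL'}")
```

Any flag that comes back false points to a crawling or indexing problem that needs fixing before on-page work is worth the effort.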

What the 2024 API Leak Confirmed About the Architecture

On March 27, 2024, internal documentation from Google’s Content Warehouse API was accidentally made public on GitHub, remaining accessible for six weeks before removal on May 7, 2024 (Source: Mike King, iPullRank, 2024).

The leak was confirmed as authentic by multiple former Google employees and analysed in depth by SEO practitioners Rand Fishkin and Michael King.

The documents revealed 14,014 attributes across 2,596 modules used in Google’s ranking infrastructure (Source: Mike King, iPullRank, 2024).

Three findings from the leak are directly relevant to how practitioners should think about ranking.

First, siteAuthority is a real, calculated, persistent metric stored in Google’s CompressedQualitySignals module — directly contradicting years of public statements by Google representatives denying the existence of a domain-level authority score (Source: Rand Fishkin, SparkToro; Mike King, iPullRank, 2024).

Second, NavBoost is a confirmed ranking system that uses click signals — specifically goodClicks, badClicks, and lastLongestClicks — to boost or demote pages based on how users actually behave after clicking a search result (Source: Mike King, iPullRank, 2024).

Third, the maximum token limit in Google’s Mustang serving system means that the most important content must appear early in a document — the system does not read every word on a long page before making ranking decisions (Source: Mike King, iPullRank, 2024).
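The material cited here does not state the exact Mustang limit, so the following Python sketch uses a hypothetical `word_budget` of 150 words purely as a placeholder. It shows one simple way to check whether a key phrase appears in the opening portion of a page's text. The function names and the budget are assumptions, and plain word counting is only a rough stand-in for whatever tokenisation Google applies.

```python
import re


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[\w'-]+", text.lower())


def appears_early(text, phrase, word_budget=150):
    """True if `phrase` occurs within the first `word_budget` words of `text`.

    word_budget is a hypothetical placeholder, not a documented Google value.
    """
    head = words(text)[:word_budget]
    target = words(phrase)
    size = len(target)
    if size == 0:
        return False
    return any(head[i:i + size] == target for i in range(len(head) - size + 1))


if __name__ == "__main__":
    page = "Welcome to our blog. " * 60 + "NavBoost uses click signals to rank pages."
    print(appears_early(page, "click signals"))                    # phrase sits past the budget
    print(appears_early(page, "click signals", word_budget=500))   # phrase sits inside the budget
```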

The Google Ranking Architecture Framework: How to Think About Ranking Signals

Practitioners who approach ranking as a list of individual factors frequently misdiagnose their own problems.

A site with excellent content but poor technical health fails at a different layer than a site with sound technical infrastructure but no authority signals. Treating both with the same fix (publish more content, build more links) wastes resources and delays recovery.

The Google Ranking Architecture Framework organises confirmed ranking signals into four layers. Each layer is a prerequisite for the next. A page cannot benefit from Layer 3 interventions if it is failing at Layer 1.

Layer 1: Eligibility Signals

Eligibility signals determine whether a page can enter the ranking pool at all.

A page that fails at this layer is invisible to every quality, relevance, and experience system above it.

Core eligibility signals include: returning a 200 HTTP status code, being allowed by robots.txt, carrying no noindex meta tag, being included in the XML sitemap, loading within Core Web Vitals thresholds, and serving over HTTPS.

Layer 2: Quality Signals

Quality signals determine how Google evaluates the intrinsic value and trustworthiness of the page and the site.

The confirmed quality signals from the API leak include: siteAuthority (composite site-level authority), OriginalContentScore (content originality measure), pandaDemotion (site-wide quality penalty for thin or low-quality content), siteFocusScore (topical specialisation of the site), and siteRadius (deviation of a page’s topic from the site’s core theme) (Source: Hobo Web, 2026).

Practitioners often confuse quality signals with relevance signals, which creates a misdiagnosis error. A page can be highly relevant to a query but suppressed by a site-level quality signal.

Layer 3: Relevance Signals

Relevance signals determine how well the page’s content matches the specific query and its underlying intent.

These include: keyword and entity matching, semantic alignment with the query’s topic, heading structure, internal link anchor text, structured data signals, and topical authority breadth within the subject area.

Google’s systems moved from keyword matching to meaning matching via RankBrain (introduced 2015) and BERT (2019), both of which are still active ranking components in 2026 alongside the more recent MUM model (Source: ClickRank, March 2026).

Layer 4: Experience Signals

Experience signals measure how users actually interact with the page after clicking through from the SERP.

These are the signals that NavBoost processes: click-through rate from the SERP, dwell time, scroll depth, lastLongestClicks (the final, longest click before a user stops searching), and pogo-sticking (returning to the SERP quickly after clicking a result).

The architecture of these four layers explains a phenomenon that confuses many practitioners: why a technically excellent page with strong backlinks sometimes underperforms a less-optimised competitor. The competitor may be winning at Layer 4 — users stay longer, scroll deeper, do not return to the SERP — and NavBoost is rewarding that demonstrated user satisfaction.

The table below summarises the four layers, their primary signals, and the diagnosis question each layer answers.

Layer	Name	Primary Signals	Diagnosis Question	Primary Fix
1	Eligibility	HTTP status, noindex, CWV, HTTPS, sitemap	Can Google crawl and index this page?	Technical SEO audit
2	Quality	siteAuthority, OriginalContentScore, pandaDemotion, siteFocusScore	Does Google trust this page and site enough to rank it?	E-E-A-T, content quality, site hygiene
3	Relevance	Keyword match, semantic alignment, entity coverage, topical authority	Does this page match the query better than alternatives?	On-page SEO, topic depth, intent alignment
4	Experience	goodClicks, lastLongestClicks, dwell time, pogo-sticking	Do users actually find this page useful after clicking?	UX, content depth, page speed, readability
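The prerequisite rule in the table can be sketched as a small diagnostic routine: walk the layers in order and stop at the first one that fails, because fixes above a failing layer cannot take effect. The Python below is a minimal sketch of that idea. The layer names and fixes come from the table, while the function name `first_failing_layer` and the pass/fail inputs are assumptions; how you decide whether a layer passes is left to your own audit.

```python
# Layer names and primary fixes, in the order given by the framework table.
LAYERS = [
    ("Eligibility", "Technical SEO audit"),
    ("Quality", "E-E-A-T, content quality, site hygiene"),
    ("Relevance", "On-page SEO, topic depth, intent alignment"),
    ("Experience", "UX, content depth, page speed, readability"),
]


def first_failing_layer(passed):
    """Return (layer_number, name, primary_fix) for the lowest failing layer.

    `passed` maps a layer name to True/False from your own audit (hypothetical
    input). A missing layer is treated as failing. Returns None if all pass.
    """
    for number, (name, fix) in enumerate(LAYERS, start=1):
        if not passed.get(name, False):
            return number, name, fix
    return None


if __name__ == "__main__":
    audit = {"Eligibility": True, "Quality": False, "Relevance": False, "Experience": True}
    print(first_failing_layer(audit))
```

In this example both Quality and Relevance fail, but the routine reports Quality only, which reflects the framework's advice to fix the lowest failing layer first.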

Content Quality: The Highest-Weight Ranking Signal

Content quality carries approximately 23% estimated weight in Google’s ranking system — making it the single highest-weight individual signal category (Source: Vastcope, 2026).

That figure clarifies a practical priority: no other single optimisation area — not backlinks, not technical SEO, not structured data — contributes as much to ranking outcome as the quality of the content itself.

The complication is that “quality” in Google’s framework is not a subjective assessment. It is the output of multiple measurable signals.

What Google’s Helpful Content System Actually Evaluates

Google’s Helpful Content System (HCS) became part of the core ranking algorithm in March 2024, moving from a standalone classifier to an embedded component of the main ranking system (Source: Google, March 2024).

The HCS evaluates content along a primary dimension: was this content created to help people, or was it created to rank in search engines?

The signals Google uses to make that distinction — confirmed through its Quality Rater Guidelines and subsequent guidance — include: whether the page demonstrates first-hand experience with the topic; whether the author’s credentials are visible and verifiable; whether the content satisfies the full scope of the user’s query or leaves gaps that send them back to search; and whether the page exists as part of a topically coherent site or as an isolated piece optimised for one query.

Tracking content audits across sites in the aiseojournal.net network has confirmed a consistent pattern: pages that lose visibility in HCU-adjacent updates almost always share one of two characteristics — thin coverage of the topic’s sub-questions, or no visible author with stated credentials. The content may be technically correct and well-written, but the authorial trust signal is absent.

Information Gain: The Dominant Signal After March 2026

Google’s March 2026 core update, completed April 8, 2026, marked a structural shift in how content quality is evaluated. Semrush Sensor peaked at 8.7/10 during the rollout, exceeding the August 2024 update as the most volatile in recent history (Source: Digital Applied, April 2026).

Information Gain — the degree to which a page adds knowledge not contained in documents the user has previously encountered — moved from one quality signal among many to the dominant content quality evaluator (Source: Digital Applied, April 2026).

The measured outcomes from that update were stark: pages with proprietary data or first-hand case studies gained 15–25% visibility; templated or rewritten content dropped 30–50%; generic AI content farms lost 60–80% of their visibility (Source: Digital Applied, April 2026).

The Information Gain signal was patented by Google in 2020 but required years of infrastructure development to operationalise at scale. The March 2026 update applied it across essentially all English-language queries for the first time.

The practical implication for content strategy is direct: a page that contains only information already present in the top-ranking results for a query provides no information gain and is increasingly likely to be outranked by a page that contains something those results do not.

Pro Tip: Before writing or rewriting a page, open the top five organic results for the target query and document what they all share — the common claims, the standard frameworks, the identical examples. Then make a list of what none of them contain. The content you create must address at least one item from that second list to achieve meaningful information gain.
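The two lists in that exercise map naturally onto set operations. The Python sketch below assumes you have already reduced each competing page, and your own draft, to a hand-written set of short claim labels; the labels, the function name `gap_analysis`, and the sample data are all hypothetical. It is a note-taking aid for the manual review described above, not a measure of how Google scores Information Gain.

```python
def gap_analysis(competitor_claims, draft_claims):
    """Compare a draft's claims with those of the top-ranking pages.

    competitor_claims: list of sets, one set of claim labels per competing page.
    draft_claims: set of claim labels covered by your own page.
    Returns (shared_by_all, novel_in_draft).
    """
    pages = [set(claims) for claims in competitor_claims]
    if not pages:
        return set(), set(draft_claims)
    shared_by_all = set.intersection(*pages)      # what every competitor says
    covered_anywhere = set.union(*pages)          # what at least one competitor says
    novel_in_draft = set(draft_claims) - covered_anywhere
    return shared_by_all, novel_in_draft


if __name__ == "__main__":
    top_results = [
        {"200+ signals", "backlinks matter", "core web vitals"},
        {"200+ signals", "backlinks matter", "e-e-a-t"},
        {"200+ signals", "backlinks matter", "mobile-first"},
    ]
    draft = {"200+ signals", "backlinks matter", "four-layer framework"}
    shared, novel = gap_analysis(top_results, draft)
    print("Shared by all:", sorted(shared))
    print("Only in draft:", sorted(novel))
```

An empty second set suggests the draft repeats what already ranks and adds nothing new.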

How Google Measures Authority in 2026

Authority in SEO has been debated for two decades, primarily because Google’s representatives consistently denied the existence of a domain-level authority metric while third-party tools (Moz’s Domain Authority, Ahrefs’ Domain Rating) built proxy measures that practitioners used by necessity.

The 2024 API leak ended that debate.

siteAuthority: What the API Leak Confirmed

The leaked documents confirmed a metric called siteAuthority, stored in the CompressedQualitySignals module and used as a primary input into Google’s site-wide quality scoring system, internally referred to as Q* (Source: Mike King, iPullRank, 2024; Hobo Web, 2026).

siteAuthority is not equivalent to third-party domain authority scores — the leaked documentation does not specify its exact calculation — but it is confirmed as a real, persistent, composite metric that factors in link-based authority, user interaction signals, and topical focus (Source: Hobo Web, 2026).

The API leak also confirmed siteFocusScore and siteRadius — metrics that measure how focused a site is on specific topics and how far a given page’s content deviates from the site’s core subject area (Source: Hobo Web, 2026). These metrics provide algorithmic backing for the long-held SEO principle that niche authority outperforms broad, unfocused content coverage.
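One way to build intuition for a deviation metric of this kind is to represent each page as a topic vector and measure its distance from the site's average. The Python sketch below does that with cosine distance on toy two-dimensional vectors. Everything in it is an assumption for illustration: the vectors, the URLs, the function names, and the choice of cosine distance. The sources cited here do not specify how Google computes siteFocusScore or siteRadius.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, larger means further apart."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length vector has no direction")
    return 1.0 - dot / norm


def page_deviation(page_vectors):
    """Distance of each page's topic vector from the site-wide centroid."""
    center = centroid(list(page_vectors.values()))
    return {url: cosine_distance(vec, center) for url, vec in page_vectors.items()}


if __name__ == "__main__":
    # Toy topic vectors: first axis "SEO", second axis "cooking" (hypothetical).
    site = {
        "/ranking-factors": [1.0, 0.0],
        "/link-building": [0.9, 0.1],
        "/banana-bread": [0.0, 1.0],
    }
    for url, distance in sorted(page_deviation(site).items(), key=lambda item: item[1]):
        print(f"{url}: {distance:.3f}")
```

In this toy example the off-topic page sits furthest from the centroid, which is the kind of pattern a topical focus audit looks for.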

Backlinks, Brand Mentions, and the NavBoost System

Backlinks remain a confirmed and significant ranking signal in 2026, but the signals that determine link value have grown more nuanced than volume alone.

The API leak confirmed that Google categorises links into quality tiers based on click data and that links from newer pages are weighted differently than those from older content (Source: Mike King, iPullRank, 2024). The relevance of the linking site’s topic to the linked page also affects link weight — a link from a topically related publication carries more ranking influence than one from an unrelated domain.

The documented evolution in Google’s ranking of authority signals is this: editorial backlink quality remains one of the two strongest authority signals, but brand mentions — references to a brand or author that are not hyperlinked — now contribute to the siteAuthority composite score (Source: Hobo Web, 2026).

NavBoost, the confirmed click-signal system, functions as the authority signal of user behaviour: pages that consistently generate goodClicks and lastLongestClicks receive ranking boosts; pages that generate badClicks (short sessions, returns to SERP) receive demotions (Source: Mike King, iPullRank, 2024).

Pro Tip: In Google Search Console, sort your queries by impressions and filter to show those with CTR below 2%. For any query where your page appears frequently but earns few clicks, the title tag and meta description are failing to compete. Rewrite them to match the specific intent behind that query — the CTR improvement translates directly into NavBoost-eligible positive click signals.
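If you export the query report rather than filtering in the interface, the same triage can be sketched in a few lines of Python. The field names (`query`, `impressions`, `clicks`), the sample rows, and the `min_impressions` floor are assumptions for illustration; only the 2% CTR threshold comes from the tip above.

```python
def low_ctr_queries(rows, ctr_threshold=0.02, min_impressions=100):
    """Return high-impression queries whose CTR falls below the threshold.

    rows: iterable of dicts with 'query', 'impressions', and 'clicks' keys
    (hypothetical export format). min_impressions is an illustrative floor
    that filters out queries with too little data to judge.
    """
    flagged = []
    for row in rows:
        impressions = row["impressions"]
        if impressions == 0 or impressions < min_impressions:
            continue
        ctr = row["clicks"] / impressions
        if ctr < ctr_threshold:
            flagged.append({**row, "ctr": ctr})
    # Highest-impression queries first: the largest potential click gains.
    return sorted(flagged, key=lambda row: row["impressions"], reverse=True)


if __name__ == "__main__":
    export = [
        {"query": "google ranking factors", "impressions": 9000, "clicks": 90},
        {"query": "what is navboost", "impressions": 8000, "clicks": 400},
        {"query": "siteauthority leak", "impressions": 5000, "clicks": 50},
        {"query": "rare long tail", "impressions": 10, "clicks": 0},
    ]
    for row in low_ctr_queries(export):
        print(f"{row['query']}: {row['ctr']:.1%} CTR on {row['impressions']} impressions")
```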

Technical Ranking Signals: The Eligibility Layer

Technical SEO signals are Layer 1 in the Google Ranking Architecture Framework — the gateway that determines which pages enter the ranking pool at all.

Practitioners sometimes deprioritise technical SEO in favour of content production, which is a predictable sequencing error. A technically unsound page cannot benefit from any content or authority investment applied above it.

The following table shows the primary technical signals, their confirmed status, and the practical impact of failure in each area.

Technical Signal	Status	Impact of Failure	Audit Tool
HTTP 200 status	Confirmed required	Page cannot be indexed	GSC URL Inspection
No noindex tag	Confirmed required	Page excluded from index	GSC Coverage report
XML sitemap inclusion	Confirmed beneficial	Slower discovery of new pages	GSC Sitemaps report
HTTPS	Confirmed ranking signal	Trust penalty; user abandonment	Browser security check
Largest Contentful Paint (LCP)	CWV metric — confirmed	Page experience signal penalty	PageSpeed Insights
Interaction to Next Paint (INP)	CWV metric — confirmed	Page experience signal penalty	PageSpeed Insights
Cumulative Layout Shift (CLS)	CWV metric — confirmed	Page experience signal penalty	PageSpeed Insights
Mobile usability	Confirmed (mobile-first indexing)	Content missing from the mobile version is not indexed or ranked	GSC Mobile Usability
Structured data (schema markup)	Confirmed as parsing aid	Reduced AI retrieval eligibility	Rich Results Test
JavaScript rendering of content	API leak — crawler limitation	AI crawlers cannot read JS-rendered content	Manual HTML check

Core Web Vitals and Page Experience

Google’s Core Web Vitals — Largest Contentful Paint (LCP), Interaction to Next Paint (INP), and Cumulative Layout Shift (CLS) — are the measurable benchmarks for page experience.

LCP measures how quickly the main content of a page loads. Google’s threshold for “good” LCP is under 2.5 seconds. INP replaced First Input Delay (FID) as a Core Web Vitals metric in March 2024, measuring overall responsiveness to user interaction throughout the page lifecycle. CLS measures visual stability — how much the page layout shifts during loading, which affects whether users can click or read without elements moving unexpectedly.

Google’s PageSpeed Insights tool (pagespeed.web.dev) provides a free, page-level report for each of these metrics, with identified causes and fix recommendations. Running this report on your highest-traffic pages before any content optimisation is the most time-efficient technical audit a beginner can perform.
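To make the thresholds concrete, the Python sketch below rates measured values for the three metrics. Only the 2.5-second LCP boundary is stated in this article; the remaining boundaries (LCP above 4.0 s is poor, INP good at 200 ms and poor above 500 ms, CLS good at 0.1 and poor above 0.25) are the values Google publishes in its Core Web Vitals documentation. The function name, dictionary keys, and sample measurements are assumptions.

```python
# (good_upper_bound, poor_lower_bound) per metric, from Google's published
# Core Web Vitals guidance. Values at or below the first bound rate as good.
THRESHOLDS = {
    "lcp_seconds": (2.5, 4.0),
    "inp_ms": (200, 500),
    "cls": (0.1, 0.25),
}


def rate_metric(metric, value):
    """Classify one measurement as 'good', 'needs improvement', or 'poor'."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"


def rate_page(measurements):
    """Rate every supplied metric for a page (hypothetical input dict)."""
    return {metric: rate_metric(metric, value) for metric, value in measurements.items()}


if __name__ == "__main__":
    sample = {"lcp_seconds": 3.1, "inp_ms": 180, "cls": 0.31}
    for metric, rating in rate_page(sample).items():
        print(f"{metric}: {rating}")
```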

Mobile-First Indexing and HTTPS

Google has operated on mobile-first indexing since 2019, meaning the mobile version of a page is the primary version that Google crawls, indexes, and ranks (Source: Google, 2019).

A page whose desktop version contains more content or different structured data than its mobile version will be indexed and ranked based on what the mobile version contains.

HTTPS has been a confirmed ranking signal since 2014 (Source: Google, 2014). A non-HTTPS site in 2026 triggers browser security warnings that increase user bounce rates, which generates the badClicks pattern that NavBoost uses as a demotion signal. The technical signal and the behavioural signal compound each other.

User Behaviour as a Ranking Signal: What NavBoost Confirms

The confirmation of NavBoost in the 2024 API leak resolved one of the longest-running debates in SEO: whether Google uses click data as a direct ranking input.

Google’s representatives had repeatedly stated that click data was not used for ranking because it was too easily manipulated. The API leak documents — corroborated by testimony from Google’s Pandu Nayak in the DOJ antitrust trial — confirmed that NavBoost is a live, active ranking system that uses clickstream data to boost and demote pages in real time (Source: Mike King, iPullRank, 2024).

GoodClicks, BadClicks, and LastLongestClicks

The three primary click signals confirmed in the NavBoost system are goodClicks, badClicks, and lastLongestClicks (Source: Mike King, iPullRank, 2024).

goodClicks are clicks where the user stays on the destination page for a meaningful duration — demonstrating that the page satisfied the search intent.

badClicks are clicks followed by a rapid return to the SERP — demonstrating that the page did not satisfy what the user was looking for.

lastLongestClicks record the click that is both the final one in a search session and the one with the longest dwell time. This is the strongest positive signal, indicating the user found what they needed and stopped searching.

The aggregate pattern of these signals for a given page, across many users and queries, feeds into NavBoost’s boost or demotion calculation.
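The three definitions above can be turned into a toy tally to show how one search session might be labelled. The Python sketch below is illustrative only. The dwell-time cut-offs (`GOOD_DWELL_SECONDS`, `BAD_DWELL_SECONDS`), the function name, and the session format are hypothetical, since the leaked documentation as described here gives the signal names but not the thresholds or the calculation.

```python
from collections import Counter

# Hypothetical cut-offs chosen for illustration; not documented Google values.
GOOD_DWELL_SECONDS = 30
BAD_DWELL_SECONDS = 10


def tally_session(clicks):
    """Label the clicks in one search session.

    clicks: ordered list of (url, dwell_seconds) tuples for a single session.
    Returns a Counter keyed by (url, signal_name).
    """
    tally = Counter()
    for url, dwell in clicks:
        if dwell >= GOOD_DWELL_SECONDS:
            tally[(url, "goodClicks")] += 1
        elif dwell < BAD_DWELL_SECONDS:
            tally[(url, "badClicks")] += 1
    if clicks:
        last_url, last_dwell = clicks[-1]
        # The final click counts only if it is also the longest in the session.
        if last_dwell == max(dwell for _, dwell in clicks):
            tally[(last_url, "lastLongestClicks")] += 1
    return tally


if __name__ == "__main__":
    session = [("a.example/page", 4), ("b.example/guide", 95)]
    for (url, signal), count in tally_session(session).items():
        print(f"{url} {signal}: {count}")
```

Summing these tallies across many sessions gives the kind of per-page aggregate the paragraph above describes.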

Pogo-Sticking and What It Signals to Google

Pogo-sticking refers to the pattern where a user clicks a search result, returns to the SERP within seconds, and clicks a different result.

The API leak confirmed that Google tracks this behaviour directly. A page that consistently generates pogo-sticking behaviour — regardless of how strong its other ranking signals are — accumulates badClicks in NavBoost and receives ranking demotions proportional to the volume of that negative signal.

The fix is structural: if users are returning to the SERP within seconds of clicking your page, the problem is one of two things — the page does not match the user’s intent (a relevance problem), or the page does not immediately signal that it contains what the user is looking for (a content structure problem).

The first problem requires revisiting keyword and intent alignment. The second requires the answer-first structure rule: the most relevant content for the user’s query must be visible above the fold, without requiring any scrolling.

E-E-A-T: The Quality Framework Governing Every Ranking Decision

E-E-A-T — Experience, Expertise, Authoritativeness, and Trustworthiness — is Google’s framework for evaluating whether a page is produced by a credible source.

It is not a direct ranking factor in the sense that there is no single “E-E-A-T score” assigned to pages. It is a lens through which Google’s systems evaluate the underlying signals — authorship, backlinks, brand mentions, content accuracy, site history — that together constitute ranking authority (Source: Google, Quality Rater Guidelines, current).

E-E-A-T functions primarily as a Layer 2 quality framework, governing how Google’s systems interpret the signals that feed into siteAuthority, OriginalContentScore, and related API-confirmed metrics.

How Each Signal Maps to Measurable Ranking Inputs

The following table maps each E-E-A-T dimension to the confirmed ranking signals it influences.

E-E-A-T Dimension	What It Represents	Confirmed Signal It Influences	How to Demonstrate It
Experience	First-hand, direct engagement with the topic	OriginalContentScore; HCS classification	Specific practitioner observations, named case studies, personal testing
Expertise	Analytical depth and domain knowledge	siteFocusScore; author entity recognition	Author bio with verifiable credentials; consistent topical coverage
Authoritativeness	Recognition by the broader industry	siteAuthority; backlink quality; brand mentions	Editorial links; citations in industry publications; brand search volume
Trustworthiness	Factual accuracy, transparency, site security	pandaDemotion avoidance; HTTPS; no spam signals	Named authors; cited sources; HTTPS; factual accuracy; editorial corrections policy

The most frequently underdeveloped dimension — across sites analysed in ongoing content audits — is Experience.

Most content demonstrates expertise (it is accurate and detailed) and trustworthiness (it cites sources and operates over HTTPS) but fails to include the signals that demonstrate first-hand engagement with the topic. That gap is directly exploitable by competitors who include specific practitioner observations, named testing results, or documented case study outcomes.

How Google Ranks for AI Overviews Differently from Organic Search

The relationship between organic ranking and AI Overview selection is close but not identical, and the distinction matters for practitioners managing both dimensions.

Organic ranking is the prerequisite for AI Overview citation eligibility. A page that does not rank in the organic results for the sub-queries generated during Google’s query fan-out process cannot be cited in an AI Overview (Source: Lily Ray, Amsive, Tech SEO Connect, December 2025). The mechanisms are connected at the retrieval layer.

The difference appears at the selection stage. Google’s AI Overview system selects content for inclusion based on a different set of preferences than the organic ranking algorithm alone.

AI Overview citation favours: content with direct, self-contained answers at the opening of each section; FAQ blocks with structured answers; schema markup (particularly FAQPage and Article types); content freshness — 50% of cited content is less than 13 weeks old; and named authorship with verifiable credentials (Source: Lily Ray, Tech SEO Connect, December 2025).
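Two of those preferences lend themselves to a short example: FAQPage markup and a freshness check. The Python sketch below builds FAQPage JSON-LD from question and answer pairs and tests whether a page was updated within a given number of weeks. The `FAQPage`, `Question`, `Answer`, `mainEntity`, and `acceptedAnswer` names are schema.org vocabulary; the function names and sample text are assumptions. The 13-week default mirrors the statistic quoted above and should be read as a rule of thumb, not a documented cutoff.

```python
import json
from datetime import date, timedelta


def faq_jsonld(pairs):
    """Build schema.org FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


def updated_within_weeks(last_updated, today=None, weeks=13):
    """True if last_updated falls within the past `weeks` weeks (rule of thumb)."""
    today = today or date.today()
    return timedelta(0) <= today - last_updated <= timedelta(weeks=weeks)


if __name__ == "__main__":
    markup = faq_jsonld([
        ("Does Google use click data?", "The 2024 API leak describes NavBoost as a click-signal system."),
    ])
    print(markup)
    print(updated_within_weeks(date(2026, 3, 1), today=date(2026, 5, 5)))
```

The JSON string is intended to be placed inside a `<script type="application/ld+json">` element on the page, and the visible FAQ text should match the markup.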

The implication is that a page can rank at position three organically and still earn more AI Overview citations than a page at position one, if its content structure is better adapted to the retrieval preferences of Google’s AI systems.

The Technical SEO Hub covers the schema implementation and structured data practices that support both organic ranking and AI Overview eligibility in depth.

The Ranking Factors That No Longer Work the Way People Think

The SEO landscape in 2026 contains a significant number of tactics that were genuinely effective in earlier algorithm iterations and have since been either deprecated, reweighted, or actively penalised.

Practitioners who learned SEO between 2010 and 2020 carry some of these assumptions without realising they are outdated.

The following table shows the most consequential outdated assumptions and what the confirmed evidence says in 2026.

Outdated Assumption	Current Reality	Source
“Keyword density is a ranking signal”	Google moved to semantic and intent matching via RankBrain (2015) and BERT (2019). Keyword frequency is a weak signal at best; keyword stuffing triggers a confirmed algorithmic penalty	Google, 2015–2019; ClickRank, 2026
“More content always ranks better”	OriginalContentScore measures originality, not volume. The March 2026 update demonstrated that original short content can outperform longer templated content	iPullRank, 2024; Digital Applied, 2026
“Domain age is a direct ranking factor”	The API leak confirmed `hostAge` is used to “sandbox fresh spam” — it is a spam filter for new sites, not a direct quality signal for established ones	iPullRank, 2024
“Social shares affect ranking”	Google has confirmed multiple times that social signals are not direct ranking factors. They correlate with ranking improvements because popular content tends to earn links	Google, confirmed
“Meta keywords help ranking”	Meta keywords have been ignored by Google since 2009. They have no ranking value	Google, 2009
“Exact match domains give ranking advantage”	EMD advantage was removed by Google’s EMD update in 2012. An exact match domain name is a weak signal at best and irrelevant at worst	Google, 2012
“Backlink volume determines authority”	siteAuthority is a composite score — contextual relevance, topical alignment, and click data all contribute alongside link count	iPullRank, 2024; Hobo Web, 2026
“Google does not use click data”	NavBoost is a confirmed, live ranking system using goodClicks, badClicks, and lastLongestClicks as direct ranking inputs	iPullRank, 2024

How AI SEO Journal Covers the Cluster Topics Under This Pillar

This pillar establishes the architecture and the confirmed mechanisms. The cluster posts in this series go deeper on each dimension as they go live.

On-Page SEO and Content Structure covers the practical implementation of intent matching, heading hierarchy, semantic keyword distribution, and the answer-first paragraph rule that supports both organic ranking and AI Overview citation. This is the Layer 3 relevance signal set in practice.

Technical SEO: Crawlability, Indexation, and Core Web Vitals goes deep on the eligibility layer — covering crawl budget optimisation, structured data implementation, JavaScript rendering issues, and the specific PageSpeed Insights fixes that move LCP, INP, and CLS from failing to passing thresholds.

E-E-A-T and AI Content Guidelines covers how to build named author authority, demonstrate first-hand experience within content, and navigate the Helpful Content System during an HCU reassessment window — including the content types that face the highest scrutiny.

Link Building and Brand Authority covers the mechanics of siteAuthority growth in the post-API leak context — editorial link acquisition, brand mention building, digital PR strategy, and the specific link characteristics that the leaked documents confirm carry the most weight.

SEO Analytics and Ranking Measurement covers how to use Google Search Console, Semrush, and Ahrefs to diagnose which layer of the Google Ranking Architecture Framework is responsible for a given ranking problem — and how to build a measurement framework that tracks recovery at each layer.

Information Gain and Original Research covers how to score existing content against the five-dimension Information Gain rubric, identify which pages are at highest risk from the March 2026 core update, and restructure content to include original data, named frameworks, and practitioner-specific observations that templated content cannot replicate.

Google Algorithm Updates: Understanding Core, Spam, and AI Changes covers the history of major Google updates from Panda and Penguin through the March 2026 core update, explaining what each update changed, which sites were affected, and what the patterns across updates reveal about Google’s directional priorities.

Keyword Research in the Age of AI covers how traditional keyword research intersects with AI sub-query generation, how to build topic clusters that satisfy both primary keywords and the 95% of machine-generated sub-queries that carry no monthly search volume, and how to prioritise content production for maximum topical authority.

Frequently Asked Questions About How Google Ranks Websites

What is the most important Google ranking factor in 2026? Content quality carries approximately 23% estimated weight in Google’s ranking system — making it the single highest-weight signal category (Source: Vastcope, 2026). After the March 2026 core update, the specific content quality signal that carries dominant weight is Information Gain — the degree to which a page adds knowledge not present in competing pages for the same query. Pages with original data or proprietary frameworks gained 15–25% visibility in that update; templated content dropped 30–50% (Source: Digital Applied, April 2026).

Does Google use domain authority as a ranking factor? Google uses a metric called siteAuthority, confirmed by the 2024 Content Warehouse API leak as a real, persistent, composite score stored in the CompressedQualitySignals module (Source: Mike King, iPullRank, 2024). It is not identical to third-party domain authority scores, but it is a site-level authority metric that feeds into the Q* ranking system. The leaked documents also confirmed siteFocusScore — a measure of how topically specialised a site is — which provides algorithmic support for niche authority as a ranking strategy.

How many ranking factors does Google use? The 2024 API leak exposed 14,014 attributes across 2,596 modules in Google’s Content Warehouse API (Source: Mike King, iPullRank, 2024). Not all attributes are direct ranking factors, since some are classification inputs, spam signals, or metadata, but the figure gives a sense of the system’s complexity. Most SEO practitioners work with a prioritised subset of approximately 20–30 confirmed high-impact signals rather than attempting to optimise for every identified attribute.

Do user behaviour signals affect Google rankings? Yes — confirmed by the 2024 API leak. NavBoost is a live Google ranking system that uses goodClicks, badClicks, and lastLongestClicks as direct ranking inputs (Source: Mike King, iPullRank, 2024). A page that consistently sends users back to the SERP quickly (pogo-sticking) accumulates negative click signals that NavBoost uses to demote its ranking. A page that consistently generates long dwell times and final clicks (the user’s last search before stopping) accumulates positive signals that NavBoost uses to boost its ranking.

How does E-E-A-T affect rankings? E-E-A-T — Experience, Expertise, Authoritativeness, and Trustworthiness — is not a single direct ranking signal but a framework that influences multiple confirmed ranking inputs (Source: Google, Quality Rater Guidelines). Experience signals feed into OriginalContentScore and Helpful Content System classification. Expertise signals feed into siteFocusScore and author entity recognition. Authoritativeness feeds into siteAuthority through backlinks and brand mentions. Trustworthiness is reflected in pandaDemotion avoidance, HTTPS status, and spam signal absence. Collectively, strong E-E-A-T is the mechanism by which sites earn higher siteAuthority scores over time.

How long does it take to see ranking improvements after SEO changes? Most ranking changes take 2–6 months to fully materialise in organic results, with core update cycles accelerating or delaying visibility changes (Source: Vastcope, 2026). Technical changes — fixing crawl errors, implementing schema markup, improving Core Web Vitals — can show effects within days to weeks as Googlebot re-crawls and re-indexes updated pages. Content quality improvements and authority building operate on longer timescales: 3–6 months for content changes and 6–12 months for meaningful siteAuthority growth through link building.

What did the 2024 Google API leak reveal that changed SEO practice? The three most consequential revelations from the 2024 Content Warehouse API leak were: (1) siteAuthority is a real, persistent, composite metric — confirming that site-level authority is measurable and that building a topically focused, authoritative site is a direct ranking strategy, not an indirect one; (2) NavBoost is a live system using click data as a direct ranking input — confirming that user satisfaction and content-to-intent match are not soft quality considerations but measurable signals with real ranking consequences; and (3) the maximum token limit in Google’s Mustang system confirms that the most important content must appear early in a document — the infrastructure does not process every word before making ranking decisions (Source: Mike King, iPullRank, 2024).

How does Google decide which pages to include in AI Overviews? Google AI Overviews use a retrieval-augmented generation (RAG) architecture that retrieves from the same search index as organic results, meaning organic ranking is the prerequisite for AI Overview citation eligibility. A page not ranking for the relevant sub-queries cannot be cited. Among ranking pages, AI Overview selection additionally favours: self-contained direct answers at section openings, FAQPage schema markup, content published or updated within the past 13 weeks, and named authorship with verifiable credentials (Source: Lily Ray, Amsive, Tech SEO Connect, December 2025).
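As an illustration of the FAQPage schema markup mentioned in the answer above, here is a minimal Python sketch that builds a schema.org FAQPage JSON-LD object from question and answer pairs. The function name build_faq_jsonld and the sample text are illustrative assumptions. The type names (FAQPage, Question, Answer) and the mainEntity and acceptedAnswer properties come from the schema.org vocabulary. The sketch only shows how the markup is shaped; it does not demonstrate any effect on AI Overview selection.

```python
import json

def build_faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

# Hypothetical sample content taken from this FAQ.
faq = build_faq_jsonld([
    ("How many ranking factors does Google use?",
     "The 2024 API leak exposed 14,014 attributes across 2,596 modules."),
])

# The serialised output is what would go inside a
# <script type="application/ld+json"> element on the page.
faq_json = json.dumps(faq, indent=2)
print(faq_json)
```

Each visible question on the page would get one entry in mainEntity, with the answer text matching what the reader actually sees.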

Ranking as Infrastructure: What Sustainable Visibility Looks Like

The sites that hold stable rankings through algorithm updates share a structural characteristic that is not about any individual tactic.

They treat ranking as an infrastructure problem, not a content production problem.

A site with strong infrastructure has passed Layer 1 technical requirements, which means every new page it publishes enters the ranking pool. It has built Layer 2 quality signals over time, which means siteAuthority acts as a floor rather than a ceiling for new content. It consistently produces content that adds genuine information gain — the dominant signal since March 2026 — which means it compounds authority with each publication rather than fighting for incremental improvements.

The Google Ranking Architecture Framework introduced in this pillar gives practitioners a diagnostic tool, not a checklist. When a page underperforms, the question is not “which ranking factor am I missing?” — it is “which layer is failing?” Fixing the right layer first is what separates efficient SEO from expensive, misdirected effort.
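The "which layer is failing?" question can be sketched as a simple ordered check. This is a minimal illustration, assuming the practitioner has already audited each layer and recorded a pass or fail result. The function name first_failing_layer and the audit dictionary are hypothetical, not part of any tool.

```python
# Layer order follows the framework in this pillar. The pass/fail values are
# hypothetical audit results supplied by the practitioner.
LAYERS = ["Eligibility", "Quality", "Relevance", "Experience"]

def first_failing_layer(audit):
    """Return the earliest layer marked as failing, or None if all pass.

    A layer missing from the audit is treated as failing, because an
    unaudited layer cannot be assumed healthy.
    """
    for layer in LAYERS:
        if not audit.get(layer, False):
            return layer
    return None

audit = {"Eligibility": True, "Quality": False, "Relevance": True, "Experience": False}
print(first_failing_layer(audit))  # Quality
```

Because the layers are sequential, the sketch reports only the earliest failure: in the example, Experience also fails, but Quality is the one to fix first.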

Understanding how Google ranks websites — across all four layers, with the specific confirmed signals from the 2024 API leak as the evidence base — changes the nature of the work. The practitioners gaining visibility in 2026 are not gaming individual signals. They are building the kind of infrastructure that all four layers reward simultaneously.

The cluster posts in this series go deeper on each layer as they go live. For the technical implementation that underpins Layer 1 eligibility, the Technical SEO Hub covers the full audit process, schema implementation, and Core Web Vitals remediation.

References

King, Michael / iPullRank. “Secrets from the Algorithm: Google Search’s Internal Engineering Documentation Has Leaked.” iPullRank, May 2024. https://ipullrank.com/google-algo-leak Supports: All 2024 API leak findings including siteAuthority, NavBoost, OriginalContentScore, token limits, click signal definitions, and architecture confirmation.
Anderson, Shaun / Hobo Web. “The Google Content Warehouse API Leak of 2024.” Hobo Web, January 2026 (updated April 2026). https://www.hobo-web.co.uk/the-google-content-warehouse-leak-2024/ Supports: siteAuthority as CompressedQualitySignals module entry; siteFocusScore; siteRadius; Q* system; pandaDemotion; dual-speed authority architecture.
Digital Applied. “Information Gain: Google’s #1 Ranking Signal in 2026.” Digital Applied, April 2026. https://www.digitalapplied.com/blog/information-gain-google-ranking-signal-april-2026 Supports: March 2026 core update as operationalisation of Information Gain; Semrush Sensor 8.7/10 peak; 15–25% gain for original content; 30–50% drop for templated content; 60–80% drop for AI content farms.
ALM Corp. “Google Search Ranking Volatility Continues Into March 2026.” ALM Corp, March 2026. https://almcorp.com/blog/google-search-ranking-volatility-march-2026/ Supports: Semrush Sensor readings up to 9.5; February 2026 Discover Core Update; March 2026 sustained volatility; topical authority as increasingly central signal.
Vastcope. “Google’s 200 Ranking Factors (2026): Complete SEO Guide.” Vastcope, 2026. https://vastcope.com/blog/google-top-200-ranking-factors Supports: Content quality approximately 23% estimated weight; 14,000+ attributes confirmed in API; E-E-A-T as quality framework.
ClickRank. “Google SEO Ranking Factors 2026: The Ultimate Guide.” ClickRank, March 2026. https://www.clickrank.ai/seo-ranking-factors/ Supports: RankBrain and BERT as active ranking components; layered system architecture description; E-E-A-T practical implementation.
Ray, Lily / Amsive. “Tech SEO Connect 2025: Summary & Latest Tech SEO Trends.” lilyray.nyc, December 2025. https://lilyray.nyc/tech-seo-connect-2025-summary-takeaways/ Supports: Organic ranking as prerequisite for AI Overview citation; content structure preferences for AI retrieval; 50% of cited content under 13 weeks old; AI crawler JavaScript limitations.
Google. “Quality Rater Guidelines.” Google, current edition. https://static.googleusercontent.com/media/guidelines.raterhub.com/en//searchqualityevaluatorguidelines.pdf Supports: E-E-A-T framework definition; YMYL category treatment; E-E-A-T as quality lens rather than direct scoring metric.
Growfusely. “Confirmed Ranking Factors Based on Google Search API Leak.” Growfusely, February 2025. https://growfusely.com/blog/google-api-leak/ Supports: NavBoost click signal confirmation; goodClicks, badClicks metric definitions; siteAuthority as trust and authority composite.
Search Engine Basics. “Google Ranking Factors Explained: The Complete 2026 Guide.” Search Engine Basics, April 2026. https://searchenginebasics.net/google-ranking-factors-explained/ Supports: 2024 API leak confirmation of siteAuthority and Chrome data use; content quality as primary confirmed factor; backlink quality and diversity signals.

Interactive Visual Guide · May 2026

How Search Engines Rank Websites

Confirmed ranking signals, API leak findings, the Google Ranking Architecture Framework — all from verified 2024–2026 sources.

Sources: Mike King / iPullRank · Hobo Web / Shaun Anderson · Digital Applied · ALM Corp · Vastcope · Lily Ray / Amsive · Google QRG

The Numbers

2026 Ranking at a Glance

Every figure is from a named, dated primary source. No anonymous aggregators.

Figure	What it measures	Source
23%	Estimated weight of content quality in Google's ranking system	Vastcope, 2026
14,014	Ranking attributes exposed in the 2024 Google Content Warehouse API leak	Mike King, iPullRank, May 2024
+25%	Visibility gain for pages with original data after the March 2026 core update	Digital Applied, April 2026
−50%	Visibility loss for templated / rewritten content in the March 2026 core update	Digital Applied, April 2026
8.7	Semrush Sensor peak during the March 2026 core update, the highest since Aug 2024	ALM Corp, March 2026
2–6 months	Typical time for SEO improvements to fully materialise in rankings	Vastcope, 2026

Chart: Content quality weight (Vastcope, 2026)

Chart: AI Overview citation overlap with top-10 organic results, late 2024 (Ahrefs / ALM Corp, 2025)

Chart: Overlap collapsed to 17–38% by Feb 2026 after the Gemini 3 upgrade (ALM Corp, March 2026)

Chart: Position-1 pages that earn AI Overview citations in 2026 (Growth Memo, April 2026)

Data notes: 23% content quality weight is an estimated figure from industry analysis, not an official Google disclosure. API leak figures (14,014 attributes) confirmed by Rand Fishkin / SparkToro and Michael King / iPullRank. March 2026 core update volatility data from ALM Corp and Digital Applied, April 2026.

The Model

Google Ranking Architecture Framework

Four sequential layers — each a prerequisite for the next. Diagnosing ranking problems starts by identifying which layer is failing.

Layer 1: Eligibility

Can Google find, crawl, and index this page?

The gateway layer. A page failing here is invisible to every quality and authority system above it. No content investment matters until this layer passes cleanly.

HTTP 200 status · No noindex tag · XML sitemap · HTTPS · Core Web Vitals · Mobile-first indexing

Fix with: Google Search Console URL Inspection + PageSpeed Insights
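Three of the eligibility checks listed above (HTTP 200 status, HTTPS, and the absence of a noindex directive) can be sketched offline against a response that has already been fetched. This is a minimal illustration using only the Python standard library. The function name eligibility_issues and its parameters are assumptions, and the sketch does not cover XML sitemaps, Core Web Vitals, or mobile-first indexing, which need the tools named above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class RobotsMetaParser(HTMLParser):
    """Collect the content attribute of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.append(attr.get("content", "").lower())

def eligibility_issues(url, status_code, headers, html):
    """Return a list of Layer 1 problems found in an already-fetched response."""
    issues = []
    if status_code != 200:
        issues.append(f"HTTP status is {status_code}, not 200")
    if urlparse(url).scheme != "https":
        issues.append("URL is not served over HTTPS")

    parser = RobotsMetaParser()
    parser.feed(html)
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    header_value = lowered.get("x-robots-tag", "").lower()
    if any("noindex" in d for d in parser.directives) or "noindex" in header_value:
        issues.append("noindex directive present")
    return issues

# Hypothetical page: plain HTTP with a noindex meta tag.
sample_html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(eligibility_issues("http://example.com/page", 200, {}, sample_html))
```

An empty list means none of the three checks failed; it does not mean the page is indexed, which only Google Search Console URL Inspection can confirm.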

Layer 2: Quality

Does Google trust this page and site enough to rank it?

The 2024 API leak confirmed siteAuthority, OriginalContentScore, and pandaDemotion as real, measurable quality signals at the site level.

siteAuthority (API confirmed) · OriginalContentScore · pandaDemotion avoidance · siteFocusScore · E-E-A-T signals

Fix with: Content audit, E-E-A-T improvements, site hygiene, topical focus

Layer 3: Relevance

Does this page match the query better than alternatives?

Google's systems moved from keyword matching to meaning matching via RankBrain (2015) and BERT (2019), both still active in 2026 alongside MUM. Semantic alignment and intent match now dominate at this layer.

Search intent match · Entity coverage · Semantic alignment · Topical authority · Heading structure · Structured data

Fix with: On-page SEO, topic depth, intent realignment, semantic keyword coverage

Layer 4: Experience

Do users actually find this page useful after clicking?

NavBoost — confirmed by the 2024 API leak and Pandu Nayak's DOJ antitrust testimony — uses goodClicks, badClicks, and lastLongestClicks as direct ranking inputs.

goodClicks (NavBoost) · lastLongestClicks · Dwell time · Pogo-sticking avoidance · Scroll depth · CTR from SERP

Fix with: Content structure, UX, page speed, intent-first writing

Framework: Google Ranking Architecture Framework, introduced by AI SEO Journal. Signal names from the 2024 Content Warehouse API leak, analysed by Mike King (iPullRank) and Shaun Anderson (Hobo Web). NavBoost confirmed in Pandu Nayak DOJ antitrust trial testimony, 2023.
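To make the Experience layer's click signals more concrete, here is a toy Python sketch that tallies one search session into good, bad, and last-longest counts per URL. It is an assumption-heavy illustration: the leak names the attributes but does not publish their definitions or thresholds, so the 10-second cut-off, the Click record, and the tally_session function are all hypothetical and do not describe how NavBoost actually works.

```python
from dataclasses import dataclass

# Assumed cut-off in seconds; Google's real thresholds are not public.
SHORT_DWELL_SECONDS = 10

@dataclass
class Click:
    url: str
    dwell_seconds: float
    returned_to_serp: bool

def is_quick_return(click):
    """Toy definition of pogo-sticking: a fast bounce back to the results page."""
    return click.returned_to_serp and click.dwell_seconds < SHORT_DWELL_SECONDS

def tally_session(clicks):
    """Tally toy good / bad / last_longest counts per URL for one session."""
    tallies = {}
    for click in clicks:
        counts = tallies.setdefault(click.url, {"good": 0, "bad": 0, "last_longest": 0})
        if is_quick_return(click):
            counts["bad"] += 1
        else:
            counts["good"] += 1
    # Toy reading of lastLongestClicks: credit the session's final click
    # when the user did not bounce straight back from it.
    if clicks and not is_quick_return(clicks[-1]):
        tallies[clicks[-1].url]["last_longest"] += 1
    return tallies

# Hypothetical session: a quick bounce, then a long final visit.
session = [
    Click("a.example", dwell_seconds=4, returned_to_serp=True),
    Click("b.example", dwell_seconds=180, returned_to_serp=False),
]
print(tally_session(session))
```

In this example the first result collects a bad click and the second collects both a good click and the session-ending credit, which mirrors the pogo-sticking versus final-click contrast described above.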

2024 API Leak

What Google's Leaked Documents Confirmed

14,014 attributes across 2,596 modules — accidentally made public March 27 to May 7, 2024. Confirmed authentic by former Google employees.

siteAuthority

Site-Level Authority Is Real

Stored in the CompressedQualitySignals module and used as a primary input into the Q* ranking system. Directly contradicts years of Google's public statements denying a domain-level authority metric exists.

NavBoost

Click Data Is a Direct Ranking Input

NavBoost uses goodClicks, badClicks, and lastLongestClicks to boost or demote pages in real time. Pages that generate pogo-sticking accumulate badClicks and receive algorithmic demotions.

siteFocusScore

Topical Specialisation Is Algorithmically Measured

siteFocusScore quantifies topical specialisation. siteRadius measures how far a page deviates from the site's core theme. Niche authority is not a theory — it is a confirmed metric in Google's scoring infrastructure.

OriginalContentScore

Originality Is Scored Per Document

Content is scored for originality independently of length. Short original content can outperform long templated content. The March 2026 core update operationalised this signal at scale as Information Gain.

pandaDemotion

Site-Wide Quality Penalties Are Real and Persistent

Stored in CompressedQualitySignals alongside siteAuthority. Thin or low-quality content anywhere on a site can suppress the ranking performance of strong pages. Content hygiene across the full domain matters.

Mustang token limit

Google Does Not Read Every Word on Long Pages

The Mustang serving system has a maximum token limit per document. The most important content must appear early. For AI retrieval, the first 2,000 words are the primary retrieval window.

Source: Mike King, iPullRank. "Secrets from the Algorithm: Google Search's Internal Engineering Documentation Has Leaked." May 2024. ipullrank.com/google-algo-leak · Shaun Anderson, Hobo Web, January 2026 (updated April 2026). hobo-web.co.uk
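The practical consequence of the Mustang token limit, that key content should appear early, can be checked with a rough sketch like the one below. It assumes whitespace-separated words as a stand-in for tokens and takes the limit as a parameter, because neither Google's tokeniser nor the actual limit is public. The function name appears_within_limit is hypothetical.

```python
def appears_within_limit(text, phrase, token_limit):
    """True if `phrase` occurs entirely within the first `token_limit` words.

    Words are split on whitespace, which is only a rough stand-in for
    whatever tokenisation Google uses. Punctuation attached to a word
    will prevent a match, so clean the text first if that matters.
    """
    window = text.lower().split()[:token_limit]
    target = phrase.lower().split()
    n = len(target)
    if n == 0:
        return False
    return any(window[i:i + n] == target for i in range(len(window) - n + 1))

# Hypothetical page: 50 filler words, then the answer.
page_text = "intro " * 50 + "the key answer appears here"
print(appears_within_limit(page_text, "key answer", token_limit=40))  # False
print(appears_within_limit(page_text, "key answer", token_limit=60))  # True
```

Running the check with a page's main answer and a conservative limit gives a quick signal on whether the answer sits too far down the document.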

History

Google's Ranking Evolution — Key Milestones

From PageRank to Information Gain as the dominant quality signal. Confirmed dates and sources only.

1998

PageRank introduced — the original authority signal

Larry Page and Sergey Brin's Stanford paper establishes link-based authority as Google's core differentiation. PageRank variants remain confirmed in the 2024 API leak, including rawPageRank and pagerank2.

2011

Panda update — thin content penalised at site level

Google's first major content quality system. The pandaDemotion signal confirmed in the 2024 API leak is the direct technical descendant of this system, now embedded in CompressedQualitySignals.

2015

RankBrain launches — AI enters ranking for the first time

Google's first machine learning ranking component. Moved ranking from keyword matching to meaning matching. Still an active component in 2026 alongside BERT and MUM.

Confirmed active 2026

2019

BERT — language understanding at the query and document level

BERT (Bidirectional Encoder Representations from Transformers) gave Google the ability to interpret the full semantic context of queries and documents simultaneously. It affected about 1 in 10 searches at launch.

May 2024

Google Content Warehouse API leak — 14,014 ranking attributes exposed

Internal documentation accidentally made public March 27–May 7, 2024. Confirmed siteAuthority, NavBoost, OriginalContentScore, and pandaDemotion as real ranking mechanisms. Authenticated by former Google employees and Pandu Nayak's DOJ testimony.

14,014 attributes confirmed

Aug 2024

August 2024 core update — previous volatility record holder

Semrush Sensor reached high volatility levels. Established as the benchmark for subsequent update severity comparisons. The March 2026 core update exceeded it.

Mar–Apr 2026

March 2026 core update — Information Gain becomes dominant signal

Completed April 8, 2026. Semrush Sensor peaked at 8.7/10 — exceeding August 2024. Information Gain moved from one quality signal among many to the dominant content quality evaluator. Pages with original data gained 15–25%; templated content lost 30–50%; AI content farms lost 60–80%.

Semrush 8.7/10 · Information Gain dominant

May 2026

Current state — AI Overviews in 48% of queries

Organic ranking remains the prerequisite for AI Overview citation. The March 2026 core update established original research and first-hand experience as the durable differentiators. Content farms and generic rewritten content face sustained suppression.

Current

Sources: Google (official announcements for all core updates) · Mike King / iPullRank (API leak, May 2024) · ALM Corp (March 2026 volatility data) · Digital Applied (Information Gain, April 2026) · Semrush Sensor (volatility readings).

Content Quality

Information Gain — March 2026 Update Impact

The March 2026 core update operationalised Information Gain at scale. The outcomes split content types into clear winners and losers.

Visibility Change by Content Type — March 2026 Core Update

Digital Applied, April 2026 · Information Gain: Google's #1 Ranking Signal in 2026

Content type	Visibility change
Original data / case studies	+25%
Named framework content	+19%
Expert-attributed analysis	+15%
Templated / rewritten content	−30–50%
Generic AI content farms	−60–80%
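The idea behind Information Gain, adding knowledge that competing pages lack, can be approximated very crudely as the share of a page's distinct terms that appear in no competing page. The sketch below is a toy proxy under that assumption. It is not Google's scoring method, and the function names terms and novelty_ratio are hypothetical.

```python
import re

def terms(text):
    """Lowercased set of words and numbers; a crude stand-in for content analysis."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def novelty_ratio(page, competitors):
    """Share of the page's distinct terms that are absent from every competitor."""
    page_terms = terms(page)
    if not page_terms:
        return 0.0
    seen = set().union(*(terms(c) for c in competitors))
    return len(page_terms - seen) / len(page_terms)

# Hypothetical snippets: one with first-hand data, two generic competitors.
page = "Our survey of 120 sites found a median LCP of 2.1 seconds"
competitors = [
    "LCP should be under 2.5 seconds",
    "Sites with fast LCP rank better",
]
print(round(novelty_ratio(page, competitors), 2))
```

A term-overlap ratio rewards any unusual wording, including irrelevant wording, so it is at best a first-pass screen for pages that say nothing their competitors do not.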

Estimated Ranking Weight by Signal Category

Vastcope, 2026 · Content quality weight ~23% is an estimated figure from industry analysis, not an official Google disclosure

Signal category	Estimated weight
Content quality	~23%
Backlinks / authority	~18%
User behaviour (NavBoost)	~15%
Technical / page experience	~12%
Relevance / semantic signals	~13%

Important: The percentage weight estimates are from Vastcope's 2026 industry analysis — they are not disclosed by Google. Google uses hundreds of interconnected signals and does not publish official weights. These figures represent the SEO community's best current estimates based on correlation research, patent analysis, and the 2024 API leak findings.
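A short arithmetic check helps when reading the estimated weights above: the five listed categories sum to 81%, so 19% of the total is not attributed to any category shown here. The sketch below reproduces that sum and also rescales each weight to its share of the listed portion. The variable names are illustrative, and the rescaled shares are a reading aid, not figures from Vastcope.

```python
# Estimated category weights as listed above (Vastcope, 2026), in percent.
weights = {
    "Content quality": 23,
    "Backlinks / authority": 18,
    "User behaviour (NavBoost)": 15,
    "Technical / page experience": 12,
    "Relevance / semantic signals": 13,
}

listed_total = sum(weights.values())
unattributed = 100 - listed_total

# Share of each category within the listed portion only.
relative = {name: round(100 * w / listed_total, 1) for name, w in weights.items()}

print(listed_total, unattributed)  # 81 19
print(relative["Content quality"])  # 28.4
```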

Myths vs Facts

Ranking Factors That No Longer Work as People Think

Outdated assumptions from the 2010–2020 era that continue to misdirect SEO effort in 2026.

Outdated Assumption	Current Reality (2026)	Status	Source
Keyword density is a ranking signal	Google uses semantic and intent matching via RankBrain (2015) and BERT (2019). Keyword stuffing triggers a confirmed algorithmic penalty.	Deprecated	Google, 2015–2019
More content always ranks better	OriginalContentScore measures originality, not volume. Original short content can outperform longer templated content.	Overturned	iPullRank, 2024; Digital Applied, 2026
Google does not use domain authority	siteAuthority is a confirmed, persistent, composite metric in Google's CompressedQualitySignals module.	Disproven by leak	iPullRank, 2024; Hobo Web, 2026
Google does not use click data for ranking	NavBoost is a live system using goodClicks, badClicks, and lastLongestClicks as direct ranking inputs. Confirmed in DOJ antitrust testimony.	Disproven by leak	iPullRank, 2024
Meta keywords help ranking	Google has ignored meta keywords since 2009. Zero ranking value.	Deprecated 2009	Google, 2009
Backlink volume determines authority	siteAuthority is composite — contextual relevance, topical alignment, click data, and brand mentions all contribute alongside link count.	Evolved	iPullRank, 2024; Hobo Web, 2026
Social shares directly affect rankings	Google has confirmed social signals are not direct ranking factors. Correlation exists because popular content earns links.	Not a direct factor	Google, confirmed
Domain age improves rankings	hostAge is used to sandbox fresh spam — it is a spam filter for new sites, not a quality signal for established ones.	Misunderstood	iPullRank, 2024

Legend: Deprecated / Disproven = no longer a ranking factor or confirmed incorrect · Evolved / Misunderstood = still relevant but works differently than assumed. All status designations based on named primary sources listed.

Quality Framework

E-E-A-T in 2026

Not a single ranking factor — a lens through which Google interprets multiple confirmed signals including siteAuthority, OriginalContentScore, and pandaDemotion avoidance.

🧪

Experience

The author has directly done the thing they are writing about. First-hand, lived engagement with the topic — not secondary research. Feeds into OriginalContentScore and Helpful Content System classification.

🎓

Expertise

Analytical depth that goes beyond aggregation. The ability to interpret data, explain trade-offs, and provide practitioner-level guidance. Signals into siteFocusScore and author entity recognition in Google's knowledge graph.

🏛️

Authoritativeness

Recognition by the broader industry through editorial backlinks, brand mentions, and citations in authoritative publications. The primary input into the siteAuthority composite score confirmed in the 2024 API leak.

🔒

Trustworthiness

Named author with verifiable identity, cited sources, no false claims, HTTPS, and no spam signals. Reflected in pandaDemotion avoidance. The foundational dimension — a page cannot score well on A or E without a T baseline.

Source: Google Quality Rater Guidelines (current edition). E-E-A-T signal mapping to API leak metrics based on analysis by Hobo Web / Shaun Anderson (January 2026, updated April 2026) and ClickRank (March 2026).

Common Questions

Frequently Asked Questions

Direct answers — each containing at least one specific number from a named source.

What is the most important Google ranking factor in 2026? Content quality carries approximately 23% estimated weight, making it the single highest-weight signal category (Vastcope, 2026). After the March 2026 core update, the dominant content quality signal is Information Gain: the degree to which a page adds knowledge not already present in competing pages. Pages with original data gained 15–25% visibility; templated content lost 30–50% (Digital Applied, April 2026).

Does Google use domain authority as a ranking factor? Yes, as confirmed by the 2024 Content Warehouse API leak. Google uses a metric called siteAuthority, stored in the CompressedQualitySignals module and used as a primary input into the Q* ranking system. It is a composite score drawing on link-based authority, user interaction signals, and topical focus. It is not identical to third-party domain authority scores, but it is a real, persistent, site-level authority metric (Mike King, iPullRank, 2024).

Do user behaviour signals affect Google rankings? Yes, as confirmed by the 2024 API leak. NavBoost is a live Google ranking system using goodClicks, badClicks, and lastLongestClicks as direct ranking inputs. This was corroborated by Pandu Nayak's testimony in the DOJ antitrust trial (2023). A page consistently generating pogo-sticking accumulates badClicks and receives NavBoost demotions regardless of how strong its other ranking signals are.

How many ranking factors does Google use? The 2024 API leak exposed 14,014 attributes across 2,596 modules in Google's Content Warehouse API (Mike King, iPullRank, 2024). Not all are direct ranking factors, since some are classification inputs or spam signals. Most practitioners prioritise approximately 20–30 high-impact confirmed signals rather than attempting to optimise every attribute. The architecture is a layered system of systems, not a checklist.

What is Information Gain? Information Gain measures how much new knowledge a page adds beyond what is already present in competing pages for the same query. Patented by Google in 2020, it was operationalised at scale in the March 2026 core update, the most volatile update since August 2024, with Semrush Sensor peaking at 8.7/10. It is now the dominant content quality evaluator, rewarding original data, first-hand evidence, named frameworks, and expert attribution (Digital Applied, April 2026).

How long does it take to see ranking improvements after SEO changes? Most ranking changes take 2–6 months to fully materialise in organic results (Vastcope, 2026). Technical changes, such as fixing crawl errors or improving Core Web Vitals, can show effects within days to weeks. Content quality improvements operate on a 3–6 month timescale. Meaningful siteAuthority growth through editorial link building typically takes 6–12 months. Core update cycles can accelerate or delay these timelines significantly.

FAQ sources: Mike King / iPullRank (May 2024) · Digital Applied (April 2026) · ALM Corp (March 2026) · Vastcope (2026) · Pandu Nayak / DOJ antitrust trial testimony (2023).