How to Evaluate Content Quality on a Website You're About to Buy
A practical framework for auditing E-E-A-T signals, spotting AI-generated filler, catching duplicate content, and estimating how much of the archive will need to be rewritten.
A seller tells you the site has "300 in-depth articles". You click into five of them at random and find 1,500-word posts that say nothing — vague intros, generic middle sections, a conclusion that repeats the intro. The traffic numbers look real, but how much of the archive is actually worth keeping?
Content quality is where acquisitions either quietly succeed or quietly fail. Traffic and revenue can be verified in a dashboard. Content quality requires reading, and most buyers skip it because reading is slow. This article gives you a framework to evaluate a full content archive in about 60–90 minutes, without having to read every post.
Why Content Quality Matters More Than the Article Count
After the Helpful Content Update and the rolling core updates that followed, Google's ranking systems have become much more aggressive about demoting sites with low information density, thin AI output, or weak topical authority. The article count on a site is a vanity metric — what matters is how many of those articles are defensible in the current search environment.
When you buy a site with a bloated, low-quality archive, you inherit three problems:
- A ceiling on traffic — low-quality pages weigh down the whole domain's perceived quality
- Rewrite cost — thin or AI-generated content often needs full rewrites, not edits
- A ticking clock — the next core update can wipe out anything that's currently ranking on thin content
The goal of this audit is not to check every article. It's to estimate the percentage of the archive that is worth keeping, worth rewriting, or worth deleting — and price the acquisition accordingly.
Step 1: Build a Representative Sample
You cannot read 300 articles. You can read 20, if you pick them well.
Ask the seller for (or pull from their sitemap):
- The top 10 articles by organic traffic
- The 5 most recently published articles
- 5 articles chosen at random from the full archive
The top-traffic sample tells you what the site is actually winning on and whether the quality is high enough to defend. The recent sample tells you what the site's current publishing standard is. The random sample tells you what the bulk of the archive looks like — which is usually very different from the cherry-picked examples a seller will share.
Read every one of those 20 posts. It takes about 45 minutes if you move efficiently.
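If the seller drags their feet on the URL list, you can pull candidates yourself. A minimal sketch, assuming the site exposes a standard sitemap.xml — the URL is a placeholder, and requests is the only third-party dependency:

```python
import random
import xml.etree.ElementTree as ET

import requests

SITEMAP_URL = "https://example.com/sitemap.xml"  # placeholder: the target site's sitemap
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(url: str) -> list[str]:
    """Return all <loc> URLs, following a sitemap-index file one level down if present."""
    root = ET.fromstring(requests.get(url, timeout=10).content)
    if root.tag.endswith("sitemapindex"):
        urls = []
        for loc in root.findall(".//sm:loc", NS):
            urls.extend(sitemap_urls(loc.text.strip()))
        return urls
    return [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]

all_urls = sitemap_urls(SITEMAP_URL)
random_sample = random.sample(all_urls, k=min(5, len(all_urls)))
print(f"{len(all_urls)} URLs in sitemap; random sample:")
print("\n".join(random_sample))
```

The random sample is the only list the sitemap can give you directly — the top-traffic list still has to come from the seller's analytics, and recency from `<lastmod>` entries if the sitemap includes them.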
Step 2: Evaluate E-E-A-T in Practice
E-E-A-T — Experience, Expertise, Authoritativeness, Trustworthiness — is abstract when Google describes it, but concrete when you're reading an article. Run each sampled post through this checklist:
Experience signals:
- First-person observations that could only come from having done the thing ("I tested this for 3 months and noticed...")
- Original photos, screenshots, or videos that aren't stock
- Specific numbers, failures, or side-details that don't show up in competitor articles
Expertise signals:
- Author byline with a real name, photo, and credentials relevant to the topic
- Author bio page that links to their other work, LinkedIn, or portfolio
- Terminology used correctly, including edge cases a casual writer would get wrong
Authoritativeness signals:
- The site is cited or linked by other reputable sites in the niche (check this in Ahrefs)
- The author appears as a guest or source on other sites, podcasts, or publications
Trustworthiness signals:
- Clear "About" page with a real business or individual behind the site
- Transparent affiliate disclosures, not hidden or legally minimal
- Editorial policy, fact-checking policy, or review process visible
If most of your sampled articles fail on Experience and Expertise — generic advice, no author, no original data — the archive will struggle against any serious competitor post-core-update.
Step 3: Spot AI-Generated Thin Content
AI-assisted content isn't a red flag in itself. AI content that was published without meaningful human editing is. The telltale patterns:
- Structural uniformity — every article has the same H2 pattern (What is X / Why X Matters / Benefits of X / How to X / Conclusion), regardless of whether that structure fits the topic; this is the one pattern you can check mechanically, as sketched after this list
- Vague hedging — "it's important to note", "there are many factors to consider", "this varies depending on the situation", used to fill space without saying anything
- No concrete examples — advice stays at the level of principle without dropping into a real scenario, number, or named tool
- Conclusions that restate the introduction — no synthesis, no opinion, no recommendation the writer is willing to stand behind
- Perfect grammar, zero voice — no sentence sounds like a specific person wrote it
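Here is that structural-uniformity check as a rough script, assuming the sampled URLs are plain HTML pages — the URLs are placeholders for your Step 1 sample, and requests plus BeautifulSoup are assumed installed:

```python
from collections import Counter

import requests
from bs4 import BeautifulSoup

# Placeholders: swap in the 20 URLs from Step 1
sample_urls = [
    "https://example.com/post-1",
    "https://example.com/post-2",
]

def h2_skeleton(url: str) -> tuple[str, ...]:
    """First word of each H2 — enough to expose a What/Why/Benefits/How/Conclusion template."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    heads = [h2.get_text(" ", strip=True).lower() for h2 in soup.find_all("h2")]
    return tuple(h.split()[0] for h in heads if h.split())

skeletons = Counter(h2_skeleton(u) for u in sample_urls)
top_skeleton, count = skeletons.most_common(1)[0]
if count > len(sample_urls) // 2:
    print(f"{count} of {len(sample_urls)} articles share one H2 skeleton: {top_skeleton}")
```

Collapsing each H2 to its first word is crude, but that crudeness is the point: it makes "Benefits of Kettlebells" and "Benefits of Standing Desks" collide, which is exactly how a template reveals itself across unrelated topics.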
You can also run selected paragraphs through GPTZero or Originality.ai for a numerical score. Treat these as weak signals — false positives are common, especially on edited AI output — but a consistently high AI score across your sample reinforces what you already saw while reading.
Rule of thumb: if you finish reading a 2,000-word article and can't summarize what you learned in one sentence, it's thin, regardless of whether AI wrote it.
Step 4: Check for Duplicate and Plagiarized Content
Run a plagiarism check on at least 5–10 articles from your sample. This catches two distinct problems: content copied from other sites (legal and SEO risk) and content duplicated internally across multiple posts on the same site (cannibalization).
Tools:
- Copyscape — the industry standard; paid but cheap per check (around $0.05 per article)
- Duplichecker — free, less thorough, good for a first pass
- Quetext — free tier with a reasonable daily limit
Paste in several paragraphs from the middle of each article (not the intro, which often gets rewritten). Look for:
- Exact or near-exact matches on other sites published earlier than the article you're checking — this is plagiarism
- Matches that originate on this domain but on a different URL — this is internal duplication and cannibalization
- Matches on content-farm sites or thin listicle sites — not necessarily illegal, but a sign the text was spun rather than written
If plagiarism turns up in your small sample, assume it is present throughout the archive. Rewriting hundreds of articles is not a project you want to inherit without a serious price adjustment.
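Internal duplication, at least, you can screen for locally before spending on Copyscape credits. A sketch using word 5-gram "shingle" overlap between every pair of sampled articles — the 0.2 flag threshold is an assumption to tune on your own sample, not a standard:

```python
from itertools import combinations

def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Overlapping word n-grams; shared shingles indicate copied passages."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

# Placeholder: {url: extracted body text} for the Step 1 sample
articles = {
    "post-a": "full extracted body text here",
    "post-b": "full extracted body text here",
}

sets = {url: shingles(text) for url, text in articles.items()}
for (u1, s1), (u2, s2) in combinations(sets.items(), 2):
    sim = jaccard(s1, s2)
    if sim > 0.2:  # arbitrary flag threshold — tune it
        print(f"{u1} <-> {u2}: {sim:.0%} shingle overlap — likely duplicated passages")
```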
Step 5: Audit Internal Linking and Topical Structure
Open the site and click into a random article. From that article, how many clicks does it take you to reach:
- Another article on the same subtopic?
- A relevant category or pillar page?
- A cornerstone piece the site is clearly trying to rank?
A well-structured content site reads like a hub-and-spoke network: pillar articles link down to detail articles, detail articles link across to siblings and up to the pillar. The internal linking is curated — every link is placed intentionally.
A poorly structured site reads like an archive: articles exist in isolation, internal links are missing or randomly placed, and there are no clear pillars. This is what a site run by a high-volume AI-content operation looks like, and it indicates that even the articles that rank are ranking despite the structure, not because of it.
Use Screaming Frog (free up to 500 URLs) to crawl the site and visualize the internal link graph. If you see a small number of pages pulling in most of the internal links and a long tail of orphan pages with zero internal inlinks, the archive has structural debt.
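If you'd rather have a number than eyeball a crawl visualization, counting internal inlinks across the sitemap's URL list takes a short script. A sketch under the same assumptions as the earlier ones (placeholder URLs, requests and BeautifulSoup installed):

```python
from collections import Counter
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

SITE = "https://example.com"  # placeholder: the target domain
# Placeholder: full URL list from the Step 1 sitemap pull
pages = ["https://example.com/post-1", "https://example.com/post-2"]

inlinks = Counter()
for page in pages:
    soup = BeautifulSoup(requests.get(page, timeout=10).text, "html.parser")
    for a in soup.find_all("a", href=True):
        # Resolve relative links and drop fragments before counting
        target = urljoin(page, a["href"]).split("#")[0]
        if urlparse(target).netloc == urlparse(SITE).netloc and target != page:
            inlinks[target] += 1

orphans = [p for p in pages if inlinks[p] == 0]
print(f"{len(orphans)} of {len(pages)} pages have zero internal inlinks")
```

Trailing-slash and redirect mismatches will produce some false orphans, so treat the output as a shortlist to verify, not a verdict.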
Step 6: Estimate the Rewrite Burden
At the end of the audit, classify each of your 20 sampled articles into one of three buckets:
- Keep as-is — strong E-E-A-T, no thin content issues, passes plagiarism
- Needs editing — solid foundation but would benefit from adding experience, examples, or updates
- Rewrite or delete — thin, AI-generated without editing, plagiarized, or topically off
Project those percentages onto the full archive. If 40% of your sample is "rewrite or delete" and the site has 300 articles, you're looking at 120 articles to rewrite or remove. At a realistic rate of $80–150 per solid rewrite, that is $9,600–$18,000 of post-acquisition content work — and that cost has to come off the price you're willing to pay.
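The projection itself is a few lines of arithmetic; here it is as a sketch with the figures from the example above plugged in:

```python
archive_size = 300            # total articles on the site
rewrite_share = 0.40          # fraction of your sample tagged "rewrite or delete"
cost_per_rewrite = (80, 150)  # realistic USD range per solid rewrite

n_rewrites = round(archive_size * rewrite_share)
low, high = (n_rewrites * c for c in cost_per_rewrite)
print(f"{n_rewrites} articles to rewrite or remove: ${low:,}-${high:,} of content work")
# 120 articles to rewrite or remove: $9,600-$18,000 of content work
```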
What to Do With What You Find
| Finding | Action |
|---|---|
| 80%+ of sample is high quality with real E-E-A-T signals | ✅ Proceed; archive is an asset |
| 50–80% keep, remainder needs editing | ⚠️ Proceed; factor editing cost into the offer |
| Significant thin or AI-generated content without editing | ⚠️ Discount price by projected rewrite cost; negotiate hard |
| Plagiarism found in sample | ⛔ Treat as archive-wide until proven otherwise; walk away or deeply discount |
| No real author, no E-E-A-T signals, generic structure throughout | ⛔ The archive is a liability, not an asset — the domain and traffic are what you're actually paying for |
| Internal linking is broken or absent | ⚠️ Fixable, but plan for a structural overhaul in month one |
Quick Reference Checklist
Before moving forward on the content side:
- Pulled top 10 traffic articles, 5 most recent, 5 random — read them all
- Evaluated E-E-A-T signals on each (author, experience, expertise, trust)
- Scanned for AI-generated thin content patterns (uniform structure, vague hedging, no examples)
- Ran Copyscape or equivalent on 5–10 articles — no plagiarism or internal duplication
- Mapped internal linking structure (hub-and-spoke vs. archive silos)
- Classified sample into keep / edit / rewrite buckets; projected to full archive
- Estimated rewrite cost and factored it into the offer
This audit takes 60–90 minutes and is the single best defense against buying a site whose "300 articles" are really 60 good articles and 240 liabilities.