
How Many Rewrites Does It Take to Pass AI Detection?

We analyzed real platform data to find out how many iterations it takes to cross the 80-point human score threshold. The answer: 2–3 versions for most texts.


Metric37 Team

AI Humanization Research

Original research from the Metric37 humanization platform, analyzing real user data to uncover AI detection and rewriting patterns.

We analyzed over 500 documents and 1,200 scored versions on Metric37 to answer a question every AI content creator asks: how many rewrites does it actually take to pass AI detection? The short answer — most texts need 2–3 iterations to cross the 80-point human score threshold. But the full picture is more interesting.

The Iteration Curve

Average human score by version number, across all documents with scored versions:

| Version | Avg. Human Score | Sample Size |
|---------|------------------|-------------|
| 1       | 65.2             | 520         |
| 2       | 78.4             | 385         |
| 3       | 84.1             | 180         |
| 4       | 87.3             | 72          |
| 5+      | 89.0             | 43          |

Version 1 is the raw AI-humanized output. Each subsequent version is either an AI rewrite (“Try again”) or a manual edit followed by re-scoring. The biggest jump happens between version 1 and version 2.
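The aggregation behind this table is straightforward. Here is a minimal sketch, assuming each scored version is a `(version_number, human_score)` pair (the record shape and function name are hypothetical, and the records below are toy data, not the real dataset):

```python
from collections import defaultdict
from statistics import mean

def score_by_version(records):
    """Group (version_number, human_score) records and average scores per version."""
    buckets = defaultdict(list)
    for version, score in records:
        buckets[version].append(score)
    return {v: round(mean(scores), 1) for v, scores in sorted(buckets.items())}

# Toy records for illustration only
records = [(1, 64.0), (1, 66.4), (2, 78.0), (2, 78.8), (3, 84.1)]
print(score_by_version(records))  # {1: 65.2, 2: 78.4, 3: 84.1}
```

The same grouping, run over every document's version history, produces the averages and sample sizes shown above.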

Where Diminishing Returns Kick In

The score improvement between consecutive versions tells us when to stop iterating:

| Transition | Avg. Score Gain | Sample Size |
|------------|-----------------|-------------|
| v1 → v2    | +13.2           | 385         |
| v2 → v3    | +5.7            | 180         |
| v3 → v4    | +3.2            | 72          |
| v4 → v5    | +1.7            | 43          |

The v1-to-v2 transition delivers the most improvement — over 13 points on average. After version 3, gains drop below 4 points per iteration. For most use cases, 2–3 versions hit the sweet spot between quality and effort. This pattern aligns with what we showed in our step-by-step walkthrough from 62 to 91, but now confirmed across the full dataset.
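This diminishing-returns pattern suggests a simple stopping rule: keep iterating until you either cross the threshold or the last iteration gained too little to justify another pass. A minimal sketch, with the 80-point target and 4-point cutoff taken from the numbers above (the function name and defaults are ours, not a platform API):

```python
def should_stop(score_history, target=80.0, min_gain=4.0):
    """Stop iterating once the target is reached or gains fall below min_gain.

    score_history is the list of human scores for versions 1..n, in order.
    """
    if score_history[-1] >= target:
        return True  # already reads as human
    if len(score_history) >= 2 and score_history[-1] - score_history[-2] < min_gain:
        return True  # last rewrite barely moved the needle
    return False

print(should_stop([65.2, 78.4]))        # False: below 80, but the gain was +13.2
print(should_stop([65.2, 78.4, 84.1]))  # True: crossed the 80-point threshold
print(should_stop([60.0, 70.0, 72.0]))  # True: last gain was only +2.0
```

With the average trajectory from the table, this rule stops after version 3, which matches the 2–3 iteration sweet spot.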

Does Tone Matter?

Users can select a tone for each rewrite (professional, conversational, academic, etc.). Here is how tone affects scores:

| Tone           | Avg. Human Score | Sample Size |
|----------------|------------------|-------------|
| Professional   | 79.3             | 210         |
| Conversational | 75.1             | 145         |
| Academic       | 73.8             | 95          |

Professional tone tends to score highest, likely because it introduces more varied vocabulary and sentence structures that read as distinctly human. Conversational tone follows closely. The differences are modest — tone selection matters less than iteration count.

Manual Edits vs. AI Rewrites

Every version on Metric37 is tagged by how it was created: initial AI humanization, an AI rewrite (“Try again”), or a manual edit saved by the user. Here is how each approach scores:

| Edit Source      | Avg. Human Score | Sample Size |
|------------------|------------------|-------------|
| Manual edit      | 82.5             | 95          |
| AI rewrite       | 76.2             | 310         |
| Initial humanize | 65.2             | 520         |

Manual edits produce the highest scores on average. This makes sense — when a human adds their own phrasing, experience, or stylistic choices, the text becomes genuinely harder to classify as AI-generated. The combination of AI rewriting followed by manual polish is the most effective workflow.

The Word Count Factor

Does text length affect how well humanization works?

| Length              | Avg. Human Score | Sample Size |
|---------------------|------------------|-------------|
| Short (<200 words)  | 70.2             | 130         |
| Medium (200–500)    | 76.8             | 260         |
| Long (500–1,000)    | 78.5             | 95          |
| Very long (1,000+)  | 74.1             | 35          |

Medium-to-long texts (200–1,000 words) tend to score highest. Shorter texts give the LLM less room to introduce natural variation. Very long texts may suffer from consistency issues where AI patterns re-emerge over extended passages.
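For readers who want to replicate this bucketing on their own data, here is a sketch of the length classifier we used. The table's boundaries overlap at 200, 500, and 1,000 words; we treat the upper edge of each bucket as inclusive, which is our convention rather than anything the table specifies:

```python
def length_bucket(word_count):
    """Map a word count to the length buckets used in the table above.

    Upper bucket edges (500, 1000) are treated as inclusive by assumption.
    """
    if word_count < 200:
        return "short"
    if word_count <= 500:
        return "medium"
    if word_count <= 1000:
        return "long"
    return "very_long"

print(length_bucket(150))   # short
print(length_bucket(350))   # medium
print(length_bucket(1200))  # very_long
```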

Why First Drafts Fail

Of all first-pass humanizations on Metric37, 70% score below 80. That means the majority of single-shot rewrites would still be flagged by AI detectors. This is not a flaw in the humanization — it is the nature of the problem. A single pass can fix the most obvious AI patterns (filler phrases, uniform sentence length), but subtler signals like predictable word choice and paragraph structure require iteration.
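The 70% figure is just the share of first-pass scores falling below the 80-point threshold. A minimal sketch of that calculation, using toy scores rather than the real dataset (function name is ours):

```python
def first_pass_fail_rate(first_pass_scores, threshold=80.0):
    """Share of first-pass humanizations scoring below the detection threshold."""
    below = sum(1 for s in first_pass_scores if s < threshold)
    return below / len(first_pass_scores)

scores = [62.0, 71.5, 79.9, 80.0, 85.2]  # toy data, not real scores
print(first_pass_fail_rate(scores))  # 0.6
```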

We have written about why one-shot humanization fails in detail. This data confirms it at scale: iteration is not optional.

Methodology

This analysis covers over 500 documents and 1,200 scored versions created on Metric37 between January and March 2026. Human scores are generated by a Gemini Flash evaluation model on a 0–100 scale, where 80+ indicates text that reads as human-written. We excluded versions with null scores (eval failures). No personally identifiable information was used — all data is aggregated.
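The null-score exclusion mentioned above amounts to a simple filter over version records. A sketch, assuming each version is a dict with a `human_score` field (the field name is our assumption, not the platform's schema):

```python
def usable(versions):
    """Keep only versions with a successful eval (non-null human_score)."""
    return [v for v in versions if v.get("human_score") is not None]

versions = [
    {"id": 1, "human_score": 65.2},
    {"id": 2, "human_score": None},  # eval failure, excluded
    {"id": 3, "human_score": 78.4},
]
print(len(usable(versions)))  # 2
```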

Limitations: tone data is available only for versions where users explicitly selected a tone. All scores come from a single evaluation model and are not cross-validated against external AI detectors.

Try It Yourself

The data shows that 2–3 iterations is the sweet spot for most texts. Sign up for Metric37 (free tier: 5,000 words/month) and see your own scores improve with each version. Or test a sample with our free AI detector first.


Frequently Asked Questions

How many rewrites does it take to pass AI detection?
Based on our analysis of real platform data, most texts need 2–3 iterations to cross the 80-point human score threshold. The biggest improvement happens between version 1 and version 2.
Does tone affect AI detection scores?
Yes, but modestly. Professional tone tends to score highest because it introduces more varied vocabulary. However, iteration count matters more than tone selection.
Are manual edits more effective than AI rewrites?
Yes. Manual edits produce the highest average human scores because human phrasing and stylistic choices are genuinely harder to classify as AI-generated. The best workflow combines AI rewriting with manual polish.
What percentage of first-pass AI humanizations fail detection?
Approximately 70% of first-pass humanizations score below 80, meaning they would still be flagged by AI detectors. This is why iteration is essential.

