We analyzed over 500 documents and 1,200 scored versions on Metric37 to answer a question every AI content creator asks: how many rewrites does it actually take to pass AI detection? The short answer — most texts need 2–3 iterations to cross the 80-point human score threshold. But the full picture is more interesting.
The Iteration Curve
Average human score by version number, across all documents with scored versions:
| Version | Avg. Human Score | Sample Size |
|---|---|---|
| 1 | 65.2 | 520 |
| 2 | 78.4 | 385 |
| 3 | 84.1 | 180 |
| 4 | 87.3 | 72 |
| 5+ | 89.0 | 43 |
Version 1 is the raw AI-humanized output. Each subsequent version is either an AI rewrite (“Try again”) or a manual edit followed by re-scoring. The biggest jump happens between version 1 and version 2.
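The aggregation behind the table is simple: group scored versions by version number and average the human score. A minimal sketch using Python's standard library, with made-up rows rather than the real dataset (`avg_score_by_version` is our illustrative name, not a Metric37 API):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (version_number, human_score) rows -- not the actual dataset.
rows = [(1, 60.0), (1, 70.0), (2, 75.0), (2, 82.0), (3, 84.0)]

def avg_score_by_version(rows):
    """Group scored versions by version number and average the human score."""
    groups = defaultdict(list)
    for version, score in rows:
        groups[version].append(score)
    return {v: round(mean(scores), 1) for v, scores in sorted(groups.items())}

print(avg_score_by_version(rows))  # {1: 65.0, 2: 78.5, 3: 84.0}
```

The same grouping, applied to all 1,200 scored versions, produces the table above.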
Where Diminishing Returns Kick In
The score improvement between consecutive versions tells us when to stop iterating:
| Transition | Avg. Score Gain | Sample Size |
|---|---|---|
| v1 → v2 | +13.2 | 385 |
| v2 → v3 | +5.7 | 180 |
| v3 → v4 | +3.2 | 72 |
| v4 → v5+ | +1.7 | 43 |
The v1-to-v2 transition delivers the most improvement — over 13 points on average. After version 3, gains drop below 4 points per iteration. For most use cases, 2–3 versions hit the sweet spot between quality and effort. This pattern aligns with what we showed in our step-by-step walkthrough from 62 to 91, but now confirmed across the full dataset.
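The stopping rule described here can be made explicit: compute the gain between consecutive version averages and stop once it drops below a threshold (4 points, matching the text). A sketch under those assumptions; `versions_worth_running` is a name we made up for illustration, not a platform feature:

```python
def marginal_gains(avg_scores):
    """Score gain between consecutive versions, e.g. v1 -> v2."""
    return {f"v{v} -> v{v + 1}": round(avg_scores[v + 1] - avg_scores[v], 1)
            for v in sorted(avg_scores) if v + 1 in avg_scores}

def versions_worth_running(avg_scores, min_gain=4.0):
    """Stop iterating once the marginal gain falls below min_gain points."""
    n = 1
    for v in sorted(avg_scores)[:-1]:
        if avg_scores[v + 1] - avg_scores[v] < min_gain:
            break
        n = v + 1
    return n

# Version averages from the iteration-curve table (5+ treated as version 5).
avg = {1: 65.2, 2: 78.4, 3: 84.1, 4: 87.3, 5: 89.0}
print(marginal_gains(avg))        # gains of +13.2, +5.7, +3.2, +1.7
print(versions_worth_running(avg))  # 3: the v3 -> v4 gain (+3.2) is below 4 points
```

With a 4-point threshold the rule lands on 3 versions, consistent with the sweet spot the data suggests.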
Does Tone Matter?
Users can select a tone for each rewrite (professional, conversational, academic, etc.). Here is how tone affects scores:
| Tone | Avg. Human Score | Sample Size |
|---|---|---|
| Professional | 79.3 | 210 |
| Conversational | 75.1 | 145 |
| Academic | 73.8 | 95 |
Professional tone tends to score highest, likely because it introduces more varied vocabulary and sentence structures that read as distinctly human. Conversational tone follows closely. The differences are modest — tone selection matters less than iteration count.
Manual Edits vs. AI Rewrites
Every version on Metric37 is tagged by how it was created: initial AI humanization, an AI rewrite (“Try again”), or a manual edit saved by the user. Here is how each approach scores:
| Edit Source | Avg. Human Score | Sample Size |
|---|---|---|
| Manual edit | 82.5 | 95 |
| AI rewrite | 76.2 | 310 |
| Initial humanize | 65.2 | 520 |
Manual edits produce the highest scores on average. This makes sense — when a human adds their own phrasing, experience, or stylistic choices, the text becomes genuinely harder to classify as AI-generated. The combination of AI rewriting followed by manual polish is the most effective workflow.
The Word Count Factor
Does text length affect how well humanization works?
| Length | Avg. Human Score | Sample Size |
|---|---|---|
| Short (<200 words) | 70.2 | 130 |
| Medium (200–500) | 76.8 | 260 |
| Long (500–1,000) | 78.5 | 95 |
| Very long (1,000+) | 74.1 | 35 |
Medium-to-long texts (200–1,000 words) tend to score highest. Shorter texts give the LLM less room to introduce natural variation. Very long texts may suffer from consistency issues where AI patterns re-emerge over extended passages.
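The length buckets above can be expressed as a simple mapping from word count to bucket label. Note the boundary handling (whether exactly 200 or 500 words counts as the lower or upper bucket) is our assumption; the article's bucket edges overlap and don't pin it down:

```python
def length_bucket(word_count):
    """Map a word count to the buckets used in the table.
    Boundary choices (200 -> Medium, 500 -> Medium, 1000 -> Long) are assumptions."""
    if word_count < 200:
        return "Short (<200 words)"
    if word_count <= 500:
        return "Medium (200-500)"
    if word_count <= 1000:
        return "Long (500-1,000)"
    return "Very long (1,000+)"

print(length_bucket(150))  # Short (<200 words)
print(length_bucket(750))  # Long (500-1,000)
```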
Why First Drafts Fail
Of all first-pass humanizations on Metric37, 70% score below 80. That means the majority of single-shot rewrites would still be flagged by AI detectors. This is not a flaw in the humanization — it is the nature of the problem. A single pass can fix the most obvious AI patterns (filler phrases, uniform sentence length), but subtler signals like predictable word choice and paragraph structure require iteration.
We have written about why one-shot humanization fails in detail. This data confirms it at scale: iteration is not optional.
Methodology
This analysis covers over 500 documents and 1,200 scored versions created on Metric37 between January and March 2026. Human scores are generated by a Gemini Flash evaluation model on a 0–100 scale, where 80+ indicates text that reads as human-written. We excluded versions with null scores (eval failures). No personally identifiable information was used — all data is aggregated.
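The exclusion rule is simple to state in code. Field names here are illustrative, not Metric37's actual schema: drop versions whose score is null (an eval failure), then aggregate only the scored ones:

```python
# Hypothetical version records; None marks an eval failure and is excluded.
versions = [
    {"doc_id": "a", "human_score": 65.0},
    {"doc_id": "a", "human_score": None},   # eval failure -> dropped
    {"doc_id": "b", "human_score": 82.5},
]

scored = [v for v in versions if v["human_score"] is not None]
pass_rate = sum(v["human_score"] >= 80 for v in scored) / len(scored)
print(len(scored), pass_rate)  # 2 0.5
```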
Limitations: tone data is available only for versions where users explicitly selected a tone. All scores come from a single evaluation model and are not cross-validated against external AI detectors.
Try It Yourself
The data shows that 2–3 iterations is the sweet spot for most texts. Sign up for Metric37 (free tier: 5,000 words/month) and see your own scores improve with each version. Or test a sample with our free AI detector first.
Curious how your text scores?
Check any text for free with our AI detector — no signup required.
Try the free AI detector
Frequently Asked Questions
- How many rewrites does it take to pass AI detection?
- Based on our analysis of real platform data, most texts need 2–3 iterations to cross the 80-point human score threshold. The biggest improvement happens between version 1 and version 2.
- Does tone affect AI detection scores?
- Yes, but modestly. Professional tone tends to score highest because it introduces more varied vocabulary. However, iteration count matters more than tone selection.
- Are manual edits more effective than AI rewrites?
- Yes. Manual edits produce the highest average human scores because human phrasing and stylistic choices are genuinely harder to classify as AI-generated. The best workflow combines AI rewriting with manual polish.
- What percentage of first-pass AI humanizations fail detection?
- Approximately 70% of first-pass humanizations score below 80, meaning they would still be flagged by AI detectors. This is why iteration is essential.
Keep reading
Why One-Shot AI Humanization Fails (And What Works Instead)
Single-pass humanizers miss the mark. Learn why iterative refinement with scoring feedback produces dramatically better results.
8 min read

From 62 to 91: Watch a Real Text Get Refined in 5 Steps
A step-by-step walkthrough showing how iterative humanization transforms a flagged AI paragraph into undetectable prose.
7 min read

How AI Detection Actually Works (Technical Explainer)
Perplexity scoring, burstiness analysis, and classifier models — a plain-English breakdown of how detectors spot AI text.
9 min read

Ready to refine your AI drafts?
Paste your AI draft and get prose that sounds like you wrote it. 5,000 words free.
Start free