How Many Rewrites Does It Take to Pass AI Detection?

We analyzed over 500 documents and 1,200 scored versions on Metric37 to answer a question every AI content creator asks: how many rewrites does it actually take to pass AI detection? The short answer — most texts need 2–3 iterations to cross the 80-point human score threshold. But the full picture is more interesting.

The Iteration Curve

Average human score by version number, across all documents with scored versions:

Version	Avg. Human Score	Sample Size
1	65.2	520
2	78.4	385
3	84.1	180
4	87.3	72
5+	89.0	43

Version 1 is the raw AI-humanized output. Each later version is either an AI rewrite (“Try again”) or a manual edit, and every version is re-scored. Sample sizes shrink at each step because fewer documents are iterated that far.
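
A per-version table like this is a plain group-by over scored versions. The sketch below assumes a hypothetical record layout (`doc_id`, `version`, `human_score`), not Metric37's actual schema, and pools versions 5 and above into a single `5+` row as the table does.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScoredVersion:
    # Hypothetical record layout; not Metric37's actual schema.
    doc_id: str
    version: int
    human_score: float


def iteration_curve(records: list[ScoredVersion]) -> dict[str, tuple[float, int]]:
    """Average human score and sample size per version, with versions 5+ pooled."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for r in records:
        label = "5+" if r.version >= 5 else str(r.version)
        buckets[label].append(r.human_score)
    # Order rows as 1, 2, 3, 4, 5+.
    order = sorted(buckets, key=lambda k: int(k.rstrip("+")))
    return {
        k: (round(sum(buckets[k]) / len(buckets[k]), 1), len(buckets[k]))
        for k in order
    }


# Illustrative toy records, not platform data.
toy = [
    ScoredVersion("a", 1, 60.0), ScoredVersion("a", 2, 76.0),
    ScoredVersion("b", 1, 70.0), ScoredVersion("b", 2, 81.0),
    ScoredVersion("b", 3, 86.0),
]
print(iteration_curve(toy))
```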

Where Diminishing Returns Kick In

The score improvement between consecutive versions tells us when to stop iterating:

Transition	Avg. Score Gain	Sample Size
v1 → v2	+13.2	385
v2 → v3	+5.7	180
v3 → v4	+3.2	72
v4 → v5	+1.7	43

The v1-to-v2 transition delivers the most improvement: over 13 points on average. After version 3, gains drop below 4 points per iteration. For most use cases, 2–3 versions hit the sweet spot between quality and effort. This matches the pattern in our step-by-step walkthrough from 62 to 91, now confirmed across the full dataset.
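
The gain table is simply the difference between consecutive rows of the iteration curve, and a stopping rule follows from it. In this sketch the 4-point cutoff mirrors the observation above; it is an illustrative choice, not a Metric37 setting, and the 5+ row is treated as version 5.

```python
# Averages from the iteration-curve table (the 5+ row treated as version 5).
AVG_BY_VERSION = [65.2, 78.4, 84.1, 87.3, 89.0]


def consecutive_gains(avgs: list[float]) -> list[float]:
    """Average score gain for each transition v1->v2, v2->v3, ..."""
    return [round(b - a, 1) for a, b in zip(avgs, avgs[1:])]


def stop_version(avgs: list[float], min_gain: float = 4.0) -> int:
    """Last version worth producing: stop once the next average gain falls below min_gain."""
    for i, gain in enumerate(consecutive_gains(avgs)):
        if gain < min_gain:
            return i + 1  # versions are 1-indexed
    return len(avgs)


print(consecutive_gains(AVG_BY_VERSION))  # [13.2, 5.7, 3.2, 1.7]
print(stop_version(AVG_BY_VERSION))       # 3
```

With a 4-point cutoff the rule lands on version 3, consistent with the 2–3 version sweet spot; lowering `min_gain` simply pushes the stopping point later.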

Does Tone Matter?

Users can select a tone for each rewrite (professional, conversational, academic, etc.). Here is how tone affects scores:

Tone	Avg. Human Score	Sample Size
Professional	79.3	210
Conversational	75.1	145
Academic	73.8	95

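Professional tone scores highest (79.3), likely because it introduces more varied vocabulary and sentence structures that read as human. Conversational (75.1) and academic (73.8) trail by roughly 4 and 5.5 points. These differences are modest: the full spread across tones is smaller than the 13.2-point average gain from a single v1-to-v2 rewrite, so tone selection matters less than iteration count.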
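
The claim that tone matters less than iteration can be checked with the published averages alone. This is a rough comparison, since the tone averages and the transition gain are computed over different subsets of versions.

```python
# Published averages from the tone table and the v1 -> v2 transition.
TONE_AVG = {"professional": 79.3, "conversational": 75.1, "academic": 73.8}
V1_TO_V2_GAIN = 13.2

# Gap between the best- and worst-scoring tone.
tone_spread = round(max(TONE_AVG.values()) - min(TONE_AVG.values()), 1)
print(tone_spread)                  # 5.5
print(tone_spread < V1_TO_V2_GAIN)  # True
```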

Manual Edits vs. AI Rewrites

Every version on Metric37 is tagged by how it was created: initial AI humanization, an AI rewrite (“Try again”), or a manual edit saved by the user. Here is how each approach scores:

Edit Source	Avg. Human Score	Sample Size
Manual edit	82.5	95
AI rewrite	76.2	310
Initial humanize	65.2	520

Manual edits produce the highest scores on average (82.5, versus 76.2 for AI rewrites). A likely explanation is that when a human adds their own phrasing, experience, or stylistic choices, the text becomes genuinely harder to classify as AI-generated. The most effective workflow is AI rewriting followed by manual polish.
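
The workflow described here can be sketched as a loop: AI rewrites until the score clears 80 or a version budget runs out, then a flag for manual polish. `score` and `ai_rewrite` are hypothetical callables standing in for an evaluation call and a “Try again” rewrite; this is not Metric37's API. The default budget of three versions follows the 2–3 iteration finding.

```python
from typing import Callable

THRESHOLD = 80  # 80+ reads as human-written, per the methodology section


def refine(
    text: str,
    score: Callable[[str], float],
    ai_rewrite: Callable[[str], str],
    max_versions: int = 3,
) -> tuple[str, float, bool]:
    """Rewrite with AI until the score clears THRESHOLD or the version budget runs out.

    Returns (text, final_score, needs_manual_polish). Version 1 is the input
    draft; each AI rewrite produces the next version and is re-scored.
    """
    current = score(text)
    version = 1
    while current < THRESHOLD and version < max_versions:
        text = ai_rewrite(text)
        current = score(text)
        version += 1
    return text, current, current < THRESHOLD


# Toy usage with fake callables (illustrative only, no real detector involved).
fake_scores = iter([65, 77, 84])
print(refine("draft", score=lambda t: next(fake_scores), ai_rewrite=lambda t: t + " (revised)"))
```

Even when the flag comes back `False`, the data favors a final manual pass; the flag only marks drafts still under the threshold after the AI budget is spent.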

The Word Count Factor

Does text length affect how well humanization works?

Length	Avg. Human Score	Sample Size
Short (<200 words)	70.2	130
Medium (200–500)	76.8	260
Long (500–1,000)	78.5	95
Very long (1,000+)	74.1	35

Medium-to-long texts (200–1,000 words) tend to score highest (76.8 and 78.5). Short texts likely give the humanizing model less room to introduce natural variation, while very long texts may suffer from consistency issues, with AI patterns re-emerging over extended passages.
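
To see which row of the table a draft falls into, a whitespace word count is enough for a rough check. The table's ranges share their endpoints, so this sketch assumes exactly 500 words counts as medium and exactly 1,000 as long; the post does not state how words were counted.

```python
# Published averages from the word-count table, keyed by the buckets below.
PUBLISHED_AVG = {"short": 70.2, "medium": 76.8, "long": 78.5, "very_long": 74.1}


def length_bucket(text: str) -> str:
    """Assign a draft to a word-count bucket (boundary handling is an assumption)."""
    n = len(text.split())  # whitespace-delimited word count
    if n < 200:
        return "short"
    if n <= 500:   # assumption: exactly 500 words counts as medium
        return "medium"
    if n <= 1000:  # assumption: exactly 1,000 words counts as long
        return "long"
    return "very_long"


draft = "word " * 350
print(length_bucket(draft), PUBLISHED_AVG[length_bucket(draft)])
```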

Why First Drafts Fail

Of all first-pass humanizations on Metric37, 70% score below 80, so the majority of single-shot rewrites fall short of the human-reading threshold and would likely still be flagged by AI detectors. This is not a flaw in the humanization; it is the nature of the problem. A single pass can fix the most obvious AI patterns (filler phrases, uniform sentence length), but subtler signals such as predictable word choice and paragraph structure require iteration.
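
The 70% figure is the share of version-1 scores that land under 80. A minimal sketch of that calculation, run here on made-up scores rather than platform data:

```python
def share_below(scores: list[float], threshold: float = 80.0) -> float:
    """Fraction of scores strictly below the human-reading threshold."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(s < threshold for s in scores) / len(scores)


# Made-up first-pass scores for illustration only.
print(share_below([62, 85, 74, 91, 70]))
```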

We have written in detail about why one-shot humanization fails. This data confirms it at scale: iteration is not optional.

Methodology

This analysis covers over 500 documents and 1,200 scored versions created on Metric37 between January and March 2026. Human scores are generated by a Gemini Flash evaluation model on a 0–100 scale, where 80+ indicates text that reads as human-written. We excluded versions with null scores (eval failures). No personally identifiable information was used — all data is aggregated.
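
The two data rules stated here (exclude null scores, treat 80+ as human-reading) are easy to express directly. This sketch assumes scores arrive as floats with `None` marking an evaluation failure; the actual data format is not described in the post, and the 0–100 range check is an added safeguard rather than a documented step.

```python
from typing import Optional

HUMAN_THRESHOLD = 80.0  # 80+ reads as human-written


def clean_scores(raw: list[Optional[float]]) -> list[float]:
    """Drop null scores (evaluation failures) and reject values off the 0-100 scale."""
    cleaned: list[float] = []
    for s in raw:
        if s is None:
            continue  # eval failure: excluded from the analysis
        if not 0 <= s <= 100:
            raise ValueError(f"score outside 0-100: {s}")
        cleaned.append(s)
    return cleaned


def reads_as_human(score: float) -> bool:
    return score >= HUMAN_THRESHOLD


scores = clean_scores([72.0, None, 88.5])
print(scores, [reads_as_human(s) for s in scores])
```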

Limitations: tone data is available only for versions where users explicitly selected a tone. All scores come from a single evaluation model and are not cross-validated against external AI detectors.

Try It Yourself

The data shows that 2–3 iterations is the sweet spot for most texts. Sign up for Metric37 (free tier: 1,500 words on signup) and see your own scores improve with each version. Or test a sample with our free AI detector first.

Frequently Asked Questions

How many rewrites does it take to pass AI detection?: Based on our analysis of real platform data, most texts need 2–3 iterations to cross the 80-point human score threshold. The biggest improvement happens between version 1 and version 2.
Does tone affect AI detection scores?: Yes, but modestly. Professional tone tends to score highest, likely because it introduces more varied vocabulary. However, iteration count matters more than tone selection.
Are manual edits more effective than AI rewrites?: Yes. Manual edits produce the highest average human scores, likely because human phrasing and stylistic choices are genuinely harder to classify as AI-generated. The best workflow combines AI rewriting with manual polish.
What percentage of first-pass AI humanizations fail detection?: Approximately 70% of first-pass humanizations score below 80, meaning they would likely still be flagged by AI detectors. This is why iteration is essential.

Keep reading

Why One-Shot AI Humanization Fails (And What Works Instead)

From 62 to 91: Watch a Real Text Get Refined in 5 Steps

How AI Detection Actually Works (Technical Explainer)