Education · 9 min read

How AI Detection Actually Works (Technical Explainer)

Perplexity scoring, burstiness analysis, and classifier models — a plain-English breakdown of how detectors spot AI text.


Metric37 Team

AI Writing Research

Writing about how AI text works, why it sounds the way it does, and what you can do about it.

AI detectors work by measuring two statistical patterns in text: perplexity (how predictable each word is) and burstiness (how much that predictability varies across sentences). AI-generated text has consistently low perplexity and low burstiness because language models tend to favor high-probability tokens at each step. Human writing is messier — it spikes and dips unpredictably. Commercial tools like GPTZero, Originality.ai, and Turnitin combine these metrics with trained classifier models to flag AI text at 85-95% accuracy on unedited output, though false positive rates range from 1% to over 15%.

Below is a plain-English breakdown of each detection method, its strengths, and its significant limitations.

Perplexity: How Surprised Is the Model?

Perplexity is the foundational metric in AI detection. It measures how "surprised" a language model is by a piece of text. Technically, it is the exponential of the average negative log-likelihood of each token given the preceding context. In simpler terms: if a model can easily predict every next word in a passage, that passage has low perplexity.
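That definition is easy to make concrete. The sketch below is illustrative, not any detector's real code; it assumes you already have the natural-log probability a reference model assigned to each actual token given its preceding context (for instance, from an API that returns token logprobs).

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# A model that finds every token highly predictable -> low perplexity.
predictable = [math.log(0.9)] * 20

# A model frequently "surprised" by the word choices -> high perplexity.
surprising = [math.log(0.9), math.log(0.05),
              math.log(0.6), math.log(0.02)] * 5

print(perplexity(predictable))  # close to 1: almost no surprise
print(perplexity(surprising))   # several times higher
```

Note that perplexity is always relative to the reference model doing the scoring — the same passage can look predictable to one model and surprising to another, which is one reason different detectors disagree.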

AI-generated text tends to have low perplexity because it was produced by a model that, by construction, favors high-probability words at each step. Human writing has higher perplexity because humans make surprising word choices — unusual metaphors, sentence fragments, domain-specific jargon, colloquialisms, and deliberate rule-breaking.

Early detectors like GPTZero — which has processed over 100 million documents since launch — were built primarily on perplexity scoring. The logic is straightforward: run the text through a reference model, measure the average perplexity, and flag anything below a threshold as likely AI-generated. This works reasonably well on long passages of unedited AI output (200+ words). It breaks down quickly on shorter texts, edited texts, and text from certain domains.

Burstiness: The Rhythm Test

Burstiness measures the variation in perplexity across a text. Human writing is "bursty" — some sentences are highly predictable (simple factual statements, common phrases) while others are surprising (creative descriptions, unexpected transitions, personal anecdotes). The perplexity jumps around.

AI text has low burstiness. The perplexity stays relatively constant from sentence to sentence because the model maintains a consistent level of "safe" word selection throughout. It does not have boring sentences and interesting sentences. It has uniformly adequate sentences.

Combining perplexity and burstiness gives detectors a two-dimensional signal. Low perplexity plus low burstiness is a strong indicator of AI origin. High perplexity plus high burstiness suggests human authorship. Mixed signals — low perplexity but high burstiness, or vice versa — fall into a gray zone where detectors are unreliable.
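Here is a toy version of that two-dimensional rule. The thresholds are invented for illustration (real detectors calibrate them empirically and use more sophisticated spread measures), but the shape of the decision is the same: average perplexity on one axis, its variation on the other.

```python
import statistics

def burstiness(sentence_perplexities):
    """Sketch burstiness as the standard deviation of per-sentence perplexity."""
    return statistics.pstdev(sentence_perplexities)

def classify(sentence_perplexities, ppl_threshold=20.0, burst_threshold=5.0):
    """Toy decision rule; both thresholds are illustrative assumptions."""
    mean_ppl = statistics.fmean(sentence_perplexities)
    burst = burstiness(sentence_perplexities)
    if mean_ppl < ppl_threshold and burst < burst_threshold:
        return "likely AI"
    if mean_ppl >= ppl_threshold and burst >= burst_threshold:
        return "likely human"
    return "uncertain"  # mixed signals: the gray zone

ai_like = [12.0, 13.5, 12.8, 13.1, 12.4]    # flat and low
human_like = [9.0, 41.0, 18.0, 77.0, 25.0]  # spiky and higher on average

print(classify(ai_like))
print(classify(human_like))
```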

Statistical Watermarking

Some AI providers embed invisible statistical watermarks in their output. The idea is simple: when generating text, the model slightly biases its token selection toward a specific pattern that is imperceptible to readers but detectable by the provider's verification tool.

For example, a watermarking scheme might partition the vocabulary into "green" and "red" lists at each position and nudge the model to prefer green-list tokens. A detector checks whether the text contains a statistically improbable number of green-list tokens. If so, it was likely generated by that specific model.
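A minimal sketch of that check, under simplified assumptions: the green list at each position is derived by hashing the previous token, and the detector computes a z-score for how far the green-token count exceeds what chance predicts. Published schemes differ in the seeding and scoring details, but the statistical test is the same in spirit.

```python
import hashlib
import math

GREEN_FRACTION = 0.5  # fraction of the vocabulary on the "green" list

def is_green(prev_token: str, token: str) -> bool:
    """Deterministically assign tokens to the green list, seeded by the
    previous token (a simplified stand-in for real seeding schemes)."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] / 255.0 < GREEN_FRACTION

def watermark_z_score(tokens):
    """Under the null (no watermark), each token is green with probability
    GREEN_FRACTION; a large positive z-score suggests watermarking."""
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std

# A watermarking generator would nudge sampling toward green tokens;
# here we exaggerate by always picking one, so the z-score is large.
vocab = [f"w{i}" for i in range(200)]
wm_tokens = ["start"]
for _ in range(60):
    wm_tokens.append(next(t for t in vocab if is_green(wm_tokens[-1], t)))

print(round(watermark_z_score(wm_tokens), 2))  # well above chance
```

This also makes the fragility concrete: paraphrasing replaces tokens without regard to the green list, pulling the count back toward chance and destroying the signal.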

Watermarking is the most reliable detection method when it works, but it has significant limitations. It only detects text from models that implement it. It breaks when text is paraphrased, translated, or even moderately edited. And it depends on cooperation from AI providers, which not all of them are willing to give.

Classifier Models

The most widely used commercial detectors — GPTZero, Originality.ai, Turnitin’s AI detection, Copyleaks — use trained classifier models. These are neural networks (typically fine-tuned transformers, often based on RoBERTa or DeBERTa architectures) trained on millions of samples of human-written and AI-generated text to learn the difference.

The classifier approach has an advantage over pure statistical methods: it can learn subtle patterns that are hard to capture in a single metric. It can pick up on things like the distribution of rare words, the ratio of content words to function words, paragraph-level structure, and stylistic consistency.
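A few of those hand-describable signals can be computed directly. This sketch is only illustrative: a real classifier such as a fine-tuned RoBERTa consumes raw tokens and learns its own features end to end, and the function-word list here is a tiny stand-in.

```python
import re
from collections import Counter

# A small sample of English function words (illustrative, not exhaustive).
FUNCTION_WORDS = {
    "the", "a", "an", "of", "to", "in", "and", "or", "but", "is",
    "are", "was", "were", "that", "this", "it", "for", "on", "with",
}

def stylistic_features(text: str) -> dict:
    """Compute a few surface features of the kind classifiers pick up on."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    n = len(words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return {
        "function_word_ratio": sum(counts[w] for w in FUNCTION_WORDS) / n,
        # "Rare" here = words used exactly once (hapax legomena).
        "hapax_ratio": sum(1 for c in counts.values() if c == 1) / n,
        "mean_sentence_length": sum(lengths) / len(lengths),
    }

features = stylistic_features(
    "The cat sat on the mat. It was a very unusual cat!"
)
print(features)
```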

But classifiers also inherit all the biases of their training data. They tend to flag certain kinds of writing as AI even when a human wrote it — particularly formal academic writing, prose by non-native English speakers, and technical documentation. This leads to the false positive problem.

The False Positive Problem

False positives are the Achilles' heel of AI detection. Every major detector has been documented flagging human-written text as AI-generated. The cases are not edge cases:

  • The US Constitution has been flagged as AI-generated by multiple detectors.
  • Non-native English speakers are disproportionately flagged because their writing patterns (simpler vocabulary, more regular grammar) resemble AI output.
  • Formal academic writing, with its structured argumentation and careful hedging, triggers the same signals as RLHF-trained model output.
  • Technical writing about well-documented topics tends to converge on standard phrasing regardless of whether a human or AI wrote it.

The false positive rate varies by detector and by text type, but independent studies have found rates ranging from 1% to over 15% depending on the context. For any individual piece of text, a detector's confidence score should be treated as a probability estimate, not a verdict.
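A quick Bayes' rule calculation shows why. With illustrative numbers (not measurements from any study): if 10% of submitted documents are AI-written, the detector catches 90% of those, and it falsely flags 5% of human text, then a flag means only about a two-in-three chance the document is actually AI-generated.

```python
def p_ai_given_flag(prior_ai, tpr, fpr):
    """Bayes' rule: probability a flagged document is actually AI-written.

    prior_ai: assumed fraction of submissions that are AI-generated
    tpr: true positive rate (detector sensitivity)
    fpr: false positive rate
    """
    p_flag = tpr * prior_ai + fpr * (1 - prior_ai)
    return tpr * prior_ai / p_flag

# Illustrative numbers only: 10% AI prevalence, 90% sensitivity, 5% FPR.
print(round(p_ai_given_flag(0.10, 0.90, 0.05), 2))  # about 0.67
```

The lower the base rate of AI text in the pool being screened, the worse this gets — which is exactly why a confidence score cannot be read as a verdict.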

Why Detectors Keep Losing the Arms Race

AI detection is fundamentally an adversarial problem. As detectors improve, the tools used to evade them improve too. But there is a deeper issue: as AI models themselves improve, their output becomes harder to distinguish from human text. Better models produce more varied, more natural-sounding prose with higher perplexity and burstiness.

This means detection accuracy is trending downward over time. GPT-3.5 was detectable at 95%+ accuracy; GPT-4 dropped to 85-90%; and the latest models (GPT-5-mini, Claude 4, Gemini 2.5) produce text that some detectors catch at only 70-80% accuracy. Each generation of models narrows the statistical gap between AI and human writing.

Why Better Writing Beats Gaming Detectors

There are two approaches to reducing AI detection scores. The first is to game specific detectors — introducing targeted noise, misspellings, or Unicode tricks that exploit weaknesses in particular detection algorithms. This works temporarily and breaks when the detector updates.

The second approach is to actually improve the writing. Text with genuine variation in sentence length, vocabulary, and structure; text with a clear voice and perspective; text with specific examples and natural imperfections — this text scores lower on detectors not because it exploits a bug, but because it genuinely resembles human writing. It has high perplexity and high burstiness because it was written (or rewritten) with the same unpredictability that characterizes human prose.

This is the approach Metric37 takes. Rather than adding noise to fool a specific detector, Metric37's rewriting engine produces text that is statistically closer to human writing across all the metrics detectors measure. A quality check verifies this before returning the result. The output does not just evade detectors — it reads better, which is the whole point.

Detection by AI Tool

Different AI models have different detection profiles. Want to know how detection works on specific tools? See our detailed guides for ChatGPT, Gemini, Claude, Copilot, and Perplexity.

What This Means for You

If you are worried about AI detection, the most durable strategy is not to find a tool that games today's detectors. It is to produce text that is genuinely good enough that the question of provenance becomes irrelevant. Detectors will keep changing. Good writing is good writing regardless of how it was produced.

Curious how your text scores?

Check any text for free with our AI detector — no signup required.

Try the free AI detector

Frequently Asked Questions

How do AI detectors work?
AI detectors analyze statistical patterns in text — primarily perplexity (how predictable each word is) and burstiness (how much that predictability varies). AI text has low perplexity and low burstiness compared to human writing.
Are AI detectors accurate?
AI detectors have significant limitations. Independent studies show false positive rates from 1% to over 15%. They disproportionately flag non-native English speakers and formal academic writing.
Can AI detectors be fooled?
Yes, but gaming specific detectors is a temporary fix. Detectors update regularly. The more durable approach is producing genuinely better writing that has natural variation, voice, and unpredictability.
What is perplexity in AI detection?
Perplexity measures how 'surprised' a language model is by a piece of text. AI-generated text has low perplexity because the model chose the most probable words. Human writing has higher perplexity due to surprising word choices.


Ready to humanize your AI drafts?

Paste your AI draft and get prose that sounds like you wrote it. 1,500 words free.

Start Free