We tested the top detection tools for AI-generated content. Here’s what they are good and bad at, plus what to expect when using them.
In this article, I’ll share 16 of these detectors and show you how they score some paragraphs of my original, unpublished writing versus a few paragraphs on the same topic from ChatGPT.
I’ll also walk through the tasks AI detectors are (and aren’t) a good fit for and how marketers, editors, and SEOs should think about them.
What detection tools are good (and bad) at
As I’ve detailed in other articles, generative AI and ChatGPT content poses several issues:
- AI-generated information can be factually incorrect, dangerous, outdated, or misleading.
- AI writing outputs can be subpar.
- While there’s no explicit penalty for AI content, Google may not always trust or value it the same way as human-created content.
- AI content may be able to “fool” editors or businesses who think they’re paying for human-created content.
- AI content may leverage creative work from humans and repurpose it without attribution.
It’s important to note that the current AI detectors do not solve all of these problems.
For the most part, these tools do not fact-check AI content, improve or audit content quality, or provide citations for information pulled from other sources.
That said, AI detectors can help in these areas:
- Plagiarism: Many of these tools have plagiarism detection built in, so there’s some check for whether the AI content was largely pulled from another source.
- Penalty prevention: If you’re concerned about AI content being devalued in search results, these tools can give you a sense of how easily detectable the AI content is. (Of course, Google likely uses its own, different tools and checks.)
- Auditing AI usage: If you have a specific policy or way to compensate writers for original versus AI-generated content, these tools can give you a rough sense of whether a writer uses AI to generate content. (Note that they can also return false negatives and positives.)
- Understanding search results: Some of these tools offer Chrome extensions, which can help you understand whether competitors and other websites use AI content or not.
How AI detection software works
Each tool is different and has its own approach to the problem. But for the most part, ChatGPT detection tools grade content based on how predictable its word choices are.
In other words, whether content is scored as AI or human depends largely on whether the detection software judges the writing to follow the patterns an AI model would likely produce.
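The two core concepts behind this process are:
- Burstiness: How much sentence length and structure vary across a piece. Human writing tends to mix long and short sentences, while AI output often has a more predictable length and tempo.
- Perplexity: How unpredictable the word choices are to a language model. Text built from the most likely next words scores low, which detectors tend to read as a sign of AI.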
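As a rough sketch of how these two signals can be quantified: perplexity is standardly defined from the probability a language model p assigns to each word w_i given the words before it, so text made of highly predictable words gets a low score. There is no single standard formula for burstiness; the second line uses the coefficient of variation of sentence lengths ℓ_j across S sentences purely as an assumed, illustrative proxy. Individual detectors use their own, mostly unpublished, formulations.

```latex
\mathrm{PPL}(w_1, \dots, w_N) = \exp\!\left(-\frac{1}{N} \sum_{i=1}^{N} \log p(w_i \mid w_1, \dots, w_{i-1})\right)

B = \frac{\sigma_\ell}{\mu_\ell}, \qquad
\mu_\ell = \frac{1}{S}\sum_{j=1}^{S} \ell_j, \qquad
\sigma_\ell = \sqrt{\frac{1}{S}\sum_{j=1}^{S} (\ell_j - \mu_\ell)^2}
```

Under this framing, evenly paced, predictable text (low PPL, low B) is what detectors tend to flag as AI-generated.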
For example, in an essay about the founding of America, it’s highly unlikely that generative AI would include a random, unevenly written anecdote about the first time the writer saw a penguin, so a passage like that would likely look like human writing to a detection tool.
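To make the penguin example concrete, here is a toy Python sketch. It assumes burstiness is measured as the coefficient of variation of sentence lengths and computes perplexity under a simple add-one-smoothed unigram model built from a reference passage. Real detectors score text with large neural language models, so the function names (`burstiness`, `unigram_perplexity`) and sample passages are hypothetical illustrations, not how any specific tool works.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic words (digits are ignored)."""
    return re.findall(r"[a-z']+", text.lower())


def sentence_lengths(text: str) -> list[int]:
    """Split after ., !, or ? and count whitespace-separated words per sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation (std / mean) of sentence lengths.

    0.0 means every sentence has the same word count; larger values
    mean sentence lengths vary more from one sentence to the next.
    """
    lengths = sentence_lengths(text)
    if not lengths:
        return 0.0
    mean = sum(lengths) / len(lengths)
    variance = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    return math.sqrt(variance) / mean


def unigram_perplexity(text: str, reference: str) -> float:
    """Perplexity of `text` under an add-one-smoothed unigram model of `reference`.

    This toy model ignores word order; real detectors condition each word
    on the words before it using a large neural language model.
    """
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no words to score")
    counts = Counter(tokenize(reference))
    vocab_size = len(set(counts) | set(tokens))
    total = sum(counts.values())
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens)
    return math.exp(-log_prob / len(tokens))


# Hypothetical sample passages: four evenly paced sentences, then the same
# passage with an uneven, off-topic anecdote appended.
uniform_text = (
    "The colonists sought independence from Britain. "
    "They objected to taxation without representation. "
    "Leaders met in Philadelphia in 1776. "
    "They signed the Declaration of Independence."
)
anecdote_text = uniform_text + (
    " I once saw a penguin at the zoo as a kid, and honestly I still think"
    " about how it waddled straight into a wall. Odd."
)

on_topic = "They signed the Declaration of Independence."
penguin = "I once saw a penguin at the zoo."

print(f"burstiness, uniform:  {burstiness(uniform_text):.2f}")
print(f"burstiness, anecdote: {burstiness(anecdote_text):.2f}")
print(f"perplexity, on-topic: {unigram_perplexity(on_topic, uniform_text):.1f}")
print(f"perplexity, penguin:  {unigram_perplexity(penguin, uniform_text):.1f}")
```

On these samples, the uniform passage scores a burstiness of 0.00 (every sentence is six words) while the anecdote version scores about 0.90, and the penguin sentence's perplexity (about 42.7) is well above the on-topic sentence's (about 17.1). Both signals move in the "human" direction, which is the intuition behind the example above.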
Just as ChatGPT detectors popped up to catch generative AI writing, tools are already being developed to get around the detectors. (And, of course, the detector makers are likely already working on ways to catch the bypassers, and so on.)
Tools like Undetectable or Quillbot rewrite your content, which can make it harder for some AI detectors to flag.
Additionally, people have found prompts that get ChatGPT and other AI writing tools to produce content that scores “more human” on the human-to-AI scale: the prompt defines burstiness and perplexity and tells ChatGPT to write with more of each.
Does detection accuracy matter to you?
An important question to answer before you dive too far into these tools is:
How much do you care about detecting whether content is written by AI? And why?
If you’re using ChatGPT for rewriting title tags or generating email copy, maybe it doesn’t matter at all if that content “passes” AI writing checks.
Additionally, if a writer uses AI to generate copy and the copy is great, the score may not matter at all.
These detection tools will also likely remain locked in a “detection arms race” with the un-detection tools and prompts mentioned above.
The best AI writing detectors compared
If you’re still looking for an AI/ChatGPT content detector, we’ll go through each tool and how it “scored” when evaluating three samples: human-generated copy, AI copy, and AI copy that used this prompt to try to “beat detection.”
Note: Running detection on a few paragraphs of content isn’t a thorough test of these tools’ capabilities. Still, it should give you a rough sense of how they score different content and a glimpse of the range of outcomes you can expect from these kinds of tools.
(You can view the actual samples I input to the tools here: the “human” sample written by me, the “AI” sample written by ChatGPT via GPT-4, and the updated copy on the same topic.)
In the table below, you can see how each tool scored the copy I wrote from scratch, the copy I took directly from ChatGPT with no prompt modification, and that same copy tweaked with the “perplexity and burstiness” prompt: