ZDNET’s key takeaways
- Passing off AI-generated writing as your own is plagiarism.
- Services marketed as AI content detectors are a mixed bag.
- Our tests show chatbots perform as well as or better than standalone tools.
How hard is it in 2025 — just three years after generative AI captured the global spotlight — to fight back against AI-generated plagiarism?
This is a completely updated version of my January 2023 article on AI content detectors. When I first tested these detectors, the best result was 66% correct from one of three available checkers. My next set of tests, in February 2025, used up to 10 checkers — and three of them had perfect scores. In April, just a couple of months later, five detectors boasted perfect scores.
Also: The best AI chatbots: I tested ChatGPT, Copilot, and others to find the top tools now
But now, about half a year later, the quality has declined. Only three content detectors achieved a perfect score (including one new player). A couple of the content detectors that aced our tests declined in quality, at about the same time they added restrictions on free use.
But fear not. In this round of tests, we’ve tried something new that may eliminate the need for standalone content detectors altogether: your friendly neighborhood chatbot.
What I’m testing for and how I’m doing it
Before I go on, though, let’s discuss plagiarism and how it relates to our problem. Merriam-Webster defines “plagiarize” as “to steal and pass off (the ideas or words of another) as one’s own; use (another’s production) without crediting the source.”
This definition fits AI-created content well. While someone using an AI tool like Notion AI or ChatGPT isn’t stealing content, if that person doesn’t credit the words as coming from an AI and claims them as their own, it still meets the dictionary definition of plagiarism.
Also: The dead giveaway that ChatGPT wrote your content – and how to work around it
To test the AI detectors, I’m using five blocks of text. Two were written by me, and three were written by ChatGPT. To test a content detector, I feed each block to the detector separately and record the result. If the detector is correct, I consider the test passed; if it’s wrong, I consider it failed.
When a detector provides a percentage, I treat anything above 70% as a strong probability — whether in favor of human-written or AI-written content — and consider that the detector’s answer. If you want to test a content detector yourself using the same text blocks, you can pull them from this document.
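If you'd like to script that scoring rule rather than tally results by hand, here's a minimal Python sketch. To be clear, this is not the harness behind the scores in this article; I pasted each block into the detectors manually, and the example probabilities below are hypothetical. It simply captures the 70% threshold logic described above.

```python
# Minimal sketch of the scoring rule described above. The probabilities
# below are hypothetical examples, not real detector output; in the
# actual tests, each block was pasted into the detector's web form.

def classify(ai_probability: float) -> str:
    """Apply the 70% rule: above 70% either way counts as the
    detector's answer; anything in between is treated as uncertain."""
    if ai_probability >= 0.70:
        return "ai"
    if ai_probability <= 0.30:   # i.e., at least 70% probability of human
        return "human"
    return "uncertain"

def score_detector(results: list[tuple[float, str]]) -> float:
    """results: (reported AI probability, ground truth) for each block.
    Returns the fraction of the five tests the detector got right."""
    passed = sum(classify(prob) == truth for prob, truth in results)
    return passed / len(results)

# Example: a detector that nails both human blocks and two of the three
# ChatGPT blocks, but hedges on the third, earns an 80% score.
reported = [(0.05, "human"), (0.92, "ai"), (0.88, "ai"),
            (0.55, "ai"), (0.10, "human")]
print(f"Accuracy: {score_detector(reported):.0%}")   # Accuracy: 80%
```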
The overall results (content detectors)
To evaluate AI detectors, I reran my five-test series across 11 detectors. In other words, I cut and pasted 55 individual tests (I had a lot of coffee).
Detectors I tested include BrandWell, Copyleaks, GPT-2 Output Detector, GPTZero, Grammarly, Monica, Originality.ai, QuillBot, Undetectable.ai, Writer.com, and ZeroGPT.
We previously dropped Writefull from our tests because it discontinued its GPT detector. This time, we had to drop Monica: the detector allowed only 250 words per test, and once we cut our samples down to fit, it reported that testing was restricted unless we paid for a $200 upgrade. In its place, we're adding Pangram, a newcomer to our tests that immediately soared into the winners' circle.
Also: How I personalized my ChatGPT conversations – why it’s a game changer
This table shows overall results. As you can see, three detectors correctly identified human and AI text in all tests.
David Gewirtz/ZDNET
To see whether there's a tangible pattern of improvement, I charted the five-test set over time. So far, I've run this series six times, but no strong trend has emerged. I did increase the number of detectors tested and swap out a few, but the only consistent result is that Test 5 was reliably identified as human across detectors and dates, and even that reliability slipped in this run.
David Gewirtz/ZDNET
I’ll continue to test over time, and hopefully I’ll see reliability trend consistently upward.
While there have been some perfect scores, I don't recommend relying solely on these tools to validate human-written content. Studies have shown that writing from non-native English speakers is often flagged as AI-generated.
Even though my hand-crafted content has mostly been rated human-written this round, one detector (GPTZero) declared itself too uncertain to judge, and another (Copyleaks) declared it AI-written. The results are wildly inconsistent across systems.
Also: Get your news from AI? Watch out – it’s wrong almost half the time
Bottom line: I would advocate caution before relying on the results of any — or all — of these tools.
Overall results (AI chatbots)
But then again, why use a content detector at all? What if the chatbots we use every day could do the same detection work, without another AI fee? Let's find out.
David Gewirtz/ZDNET
As you can see, the chatbots have a much higher success rate than the so-called "content detectors." Our accuracy comparison chart tells the same story. Admittedly, it only tracks this first round of chatbot tests, but even here, each test shows a much higher accuracy rate.
David Gewirtz/ZDNET
Let’s take a look at the individual performance tests, and then I’ll end with some recommendations.
How each AI content detector performed
Now, let’s look at each individual testing tool, listed alphabetically.
BrandWell AI Content Detection (Accuracy 40%)
This tool was originally produced by an AI content generation firm, Content at Scale. It later migrated to BrandWell.ai, a new name for an AI-centric marketing services company.
Also: AI-generated images are a legal mess – and still a very human process
I had high hopes for BrandWell. After half a year (which is decades in AI time), I expected BrandWell to improve. Instead, its overall score stayed the same, getting only two tests out of five right. It was confused by Test 2, which was written by ChatGPT, and it declared the other two AI-written tests to be human work. For Test 4, it went almost all in, declaring the entire AI-written test to be human-written except for one line.
Screenshot by David Gewirtz/ZDNET
Well, we're not off to an auspicious start. But now we're about to head into testing Copyleaks, which just last week sent me a press release declaring "Copyleaks Recognized as the Most Accurate AI Detector." Let's see, shall we?
Copyleaks (Accuracy 80%)
Back in April 2025, Copyleaks declared itself "the most accurate AI detector with over 99% accuracy." It has since rewritten the claim as "99% accuracy backed by independent third-party studies." Yeah, not so much. Copyleaks identified Test 1, writing I did (and last time I checked, I'm mostly human), as 100% AI-written.
And, just in case you think that my writing is too AI-like to be considered human, even Brandwell identified Test 1 as human-written. I mean, I guess it’s OK for the company’s marketing folks to claim best ever, but no. Not really.
Also: 5 quick ways Apple's AI tools can fine-tune your writing on the fly
The company’s primary offering is a plagiarism checker sold to educational institutions, publishers, and enterprises seeking to ensure content originality and uphold academic integrity.
Screenshot by David Gewirtz/ZDNET
GPT-2 Output Detector (Accuracy 60%)
This tool is hosted on the machine-learning hub run by New York-based AI company Hugging Face. While the company has received $40 million in funding to develop its natural language library, the GPT-2 detector appears to be a user-created tool built on the Hugging Face Transformers library. There's been no change in its detection quality since our last test, but given that it has GPT-2 in its name and OpenAI is up to GPT-5, it's probably fair to assume the tool hasn't been updated since it was first posted.
Screenshot by David Gewirtz/ZDNET
GPTZero (Accuracy 80%)
GPTZero has clearly been growing. When I first tested it, the site was bare-bones — it wasn’t even clear whether GPTZero was a company or just someone’s passion project. Now, the company has a full team with a mission of “protecting what’s human.” It offers AI validation tools and a plagiarism checker.
Also: The most popular AI tools of 2025 (and what that even means)
GPTZero seems to be getting regular tinkering, but I'm not sure it's helping. Performance declined a bit between earlier rounds, and this time the final grade was unchanged but the individual results shifted. In April, it got Test 1 wrong and Test 2 right. This time, it got Test 1 right and Test 2 wrong. Test 1 is my writing; Test 2 came from ChatGPT.
Screenshot by David Gewirtz/ZDNET
Grammarly (Accuracy 40%)
Grammarly is well known for helping writers produce grammatically correct content, but that's not what I'm testing here. Grammarly can also check for plagiarism and AI content. The company now presents the AI content checker as out of beta, which seems premature: there has been no improvement since the last time I checked.
For example, the following was entirely written by ChatGPT. I have to say, I’m surprised. Grammarly has a reputation as a very AI-forward text analysis company. But zero improvement? Bummer, dude.
Screenshot by David Gewirtz/ZDNET
I’m not measuring plagiarism checker accuracy here, but even though Grammarly’s AI-check accuracy was poor, the site correctly identified the test text as previously published.
Originality.ai (Accuracy 80%)
Originality.ai is a commercial service that bills itself as the "Most Accurate AI Detector." The company sells usage credits: 2,000 credits cost $12.95 per month. I pumped 1,400 words through the system for this article, using 30 credits, or just 1.5% of my monthly allocation.
Also: Only 8% of Americans would pay extra for AI, according to ZDNET-Aberdeen research
Unfortunately, its most accurate AI detection got less accurate during this test run. Whereas it previously identified my human writing in Test 1 as human, this time it was 100% confident that my human writing was done by an AI. Oops.
Screenshot by David Gewirtz/ZDNET
Pangram (Accuracy 100%)
Pangram is a relatively new company founded by engineers formerly at Google and Tesla. The company's focus appears to be AI detection, rather than the usual plagiarism checkers or "humanizing" tools built to mislead editors and teachers. Pangram provides five free tests per day, which fit our needs perfectly.
Processing was a little slow: between clicking for a scan and getting the results, a partially white screen lingers a bit longer than is comforting. But the results say the wait was worth it. Pangram scored five out of five.
Screenshot by David Gewirtz/ZDNET
QuillBot (Accuracy 100%)
The first few times I tested QuillBot, results were wildly inconsistent: multiple passes of the same text yielded very different scores. Last time, however, it was rock solid and 100% correct. I promised I'd check back in a few months to see whether it would hold onto that performance. It has. QuillBot once again scored a perfect 100%.
Screenshot by David Gewirtz/ZDNET
Undetectable.ai (Accuracy 20%)
Undetectable.ai’s big claim is that it can “humanize” AI-generated text so detectors won’t flag it. I haven’t tested that feature — it bothers me as a professional author and educator, because it seems like cheating.
Also: Why you should ignore 99% of AI tools – and which four I use every day
However, the company also has an AI detector, which took the biggest dive in performance we’ve seen so far. Last time, it scored 100% for accuracy. This time, it rated human writing (Test 1) as 60% likely AI, and all three AI writing samples as 75%, 76%, and 77% likely human. Ah, well, I guess Undetectable is “humanizing” its results, insofar as it’s living up to the phrase “to err is human.”
Screenshot by David Gewirtz/ZDNET
Writer.com AI Content Detector (Accuracy 40%)
Writer.com is a service that generates AI writing for corporate teams. Its AI Content Detector tool can scan for generated content. Unfortunately, its accuracy was low. It identified every text block as human-written, even though three of the five tests were written by ChatGPT. Sadly, there was no improvement since the last time we visited Writer in the summer.
Screenshot by David Gewirtz/ZDNET
ZeroGPT (Accuracy 100%)
ZeroGPT has matured since we first evaluated it. Back then, no company name was listed, and the site was peppered with Google ads, with no other clear sign of monetization. The service worked fairly well but seemed sketchy.
Also: Will AI destroy human creativity? No – and here's why
That sketchy feeling is gone. ZeroGPT now presents as a typical SaaS service, complete with pricing, a company name, and contact information. Its accuracy improved as well: it went from 80% to 100% this summer, and it has held that score in our current test.
Screenshot by David Gewirtz/ZDNET
How each AI chatbot performed
Now that we’ve looked at the content detectors, let’s look at the chatbots. Each was given the following prompt, followed by the text to check.
Evaluate the following and tell me if it was written by a human or an AI
All of the chatbots followed a similar format, providing a general recommendation as to whether the text was written by an AI or by a human. With the exception of ChatGPT Plus, which is a $20/month subscription, I ran all the chatbots in an incognito window without logging in.
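If you'd rather script this than paste text into a browser, here's a minimal sketch using OpenAI's Python API. Note that my actual tests ran through the web interfaces, so this is an illustration of the same prompt, not the method behind the scores below; the model name is an assumption, so swap in whichever model you have access to.

```python
# Sketch of automating the same prompt through OpenAI's API, rather than
# pasting into the web UI as was done for these tests. Requires the
# openai package and an OPENAI_API_KEY environment variable; the model
# name here is an assumption, not what powered the results below.
from openai import OpenAI

PROMPT = "Evaluate the following and tell me if it was written by a human or an AI"

def check_text(block: str, model: str = "gpt-4o") -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{block}"}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Paste one of the five test blocks in place of this placeholder.
    print(check_text("Text block to evaluate goes here."))
```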
ChatGPT free tier
While ChatGPT’s free tier did get one of the blocks of text wrong (the last human-written one), its analysis of the first block of text really freaked me out. Keep in mind that this was an incognito window, not logged in, with no identifying information about me personally.
David Gewirtz/ZDNET
Yep, it not only identified the first block of text as human-written, but it also identified me as the writer. I mean, I know I’m all over the Internet, but still.
ChatGPT Plus, Copilot, and Gemini
ChatGPT Plus, Copilot, and Gemini all returned perfect scores. Each appropriately identified all the test blocks as human or AI. To my mind, this shows that chatbots can outperform dedicated content detectors.
Grok
I included Grok in this set of tests because it did so well in our overall chatbot evaluation. Unfortunately, Grok didn’t seem to grok the problem and failed this test with three out of five wrong. Like a few of the other AI detectors, it identified all of the writing blocks as human.
Is it human, or is it AI?
What about you? Have you tried AI content detectors like Copyleaks, Pangram, or ZeroGPT? How accurate have they been in your experience? Have you used these tools to protect academic or editorial integrity? Have you encountered situations where human-written work was mistakenly flagged as AI? Are there detectors you trust more than others for evaluating originality? Let us know in the comments below.
Get the morning’s top stories in your inbox each day with our Tech Today newsletter.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.


