KJZZ is a service of Rio Salado College,
and Maricopa Community Colleges

Copyright © 2026 KJZZ/Rio Salado College/MCCCD

Voice cloning scams are on the rise. This ASU researcher is making a tool to detect real humans

Sam Dingman/KJZZ
Isabella Lenz (left) and Visar Berisha.

SAM DINGMAN: A while back, Bella Lenz’s grandparents got a phone call. The voice on the line claimed to be Bella’s cousin. The cousin was in serious trouble — they were traveling abroad, and had somehow ended up in prison. They needed money right away, or else something even worse was going to happen.

Bella’s grandparents didn’t realize it, but that voice on the line was not, in fact, the cousin. It was a digital clone, built by a scammer.

BELLA LENZ: And they convinced my grandparents, over a series of several months, to ultimately wire them over $50,000. And during that time period my grandparents were absolutely distraught.

DINGMAN: Thanks to the rapid expansion of generative AI, scams like this are increasingly common. In 2025, it’s easier than ever to create a convincing facsimile of anyone’s voice, and make it say whatever you want. I know this, because Bella, a postdoctoral researcher at ASU, and her boss, professor Visar Berisha, recently did it with my voice.

CLONE SAM DINGMAN: ASU is a comprehensive public research university, measured not by whom it excludes, but by whom it includes, and how they succeed, advancing research and discovery of public value.

DINGMAN: Not bad, right? The tone is a little off, and the pace is a little stiff — but it doesn’t sound as fake as I thought it would. To make it — which, to be clear, they did with my permission — professor Berisha pulled a few seconds of audio from one of my KJZZ stories. Then, he fed it into one of the many widely-available software platforms that come up when you search “AI voice cloning software.”

VISAR BERISHA: We used an off-the-shelf algorithm, so this is not something that, you know, we have access to because we’re experts in the field. This is something that everyone has access to and could easily do. And the whole process took less than ten minutes.

DINGMAN: Phone scammers, of course, aren’t the only ones using AI to make convincing voice recordings. This past summer, a band called Velvet Sundown raised eyebrows when they released several full-length albums in the space of two months. They racked up millions of streams before it was revealed that their songs were created using AI.

Earlier this year, the Hollywood Reporter profiled a company called Inception Point which claims to be creating 3,000 podcast episodes a week using generative AI — like Declutter Delights: Your Step-By-Step Guide to Simplify Life.

[CLIP FROM DECLUTTER DELIGHTS: Hey there, clutter conquerors and declutter divas! Welcome to the show that’s here to help you declutter your life, and have a blast while doing it. Today we’ve got a clutter-busting adventure ahead, and we’re going to guide you through the process step by step. Because let’s face it: decluttering can be wilder than a monkey in a shoe store …]

BERISHA: Unfortunately this is one of the realities when you have the ability to create synthetic content at scale, and anyone has access to it. Then, I think, what happens over time, is the ratio of AI-generated content to human-generated content online flips.

DINGMAN: That’s Visar Berisha again, the professor who cloned my voice. And not to get too meta here, but as I was editing this story, I ducked the narration of the Declutter Delights podcast under Visar’s voice, to illustrate what is, for now, the obvious difference between robot speech and human speech. But Visar says the reality gap is closing fast, and it won’t be long before more people are duped by AI voice clones. It will be almost impossible to trust your ears, leading to even more fake music, fake podcasts, more expensive scams — or worse. But even before we get there, he says, AI is already contributing to what he sees as an ever-increasing sense of exhaustion.

BERISHA: If you’re constantly focusing on what the voice sounds like, then you miss out on the content. Or perhaps you no longer want to communicate, because it’s cognitively really taxing to constantly think about whether you’re speaking with a human or not. 

DINGMAN: Visar’s been working with Bella on a piece of technology that might provide a solution. They call it OriginStory, and when I visited their lab recently, they showed me how it works. Bella sat down in front of a laptop. In front of it, mounted on a stand, was a small red circuit board, about two inches long and three inches wide.

LENZ: What we have in front of me right now is a high-frequency radar sensor. What it’s doing is it’s sending very low energy pulses of electromagnetic energy out towards your body, and then it’s measuring how long it takes those pulses to hit your body and then come back. 
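
[Editor's illustration: the round-trip timing Lenz describes is the standard time-of-flight idea. A pulse travels out and back at the speed of light, so the distance to the body is half the round-trip path. The short Python sketch below shows only that arithmetic. The function name and the example timing are assumptions made for illustration, not details of the OriginStory hardware.]

```python
# Speed of light in vacuum, in metres per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_seconds: float) -> float:
    """Return the one-way distance in metres for a measured round-trip time.

    Hypothetical helper: the name and interface are illustrative only.
    """
    if round_trip_seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse goes out and comes back, so halve the total path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


if __name__ == "__main__":
    # An assumed 4-nanosecond round trip corresponds to roughly 0.6 metres.
    print(round(range_from_round_trip(4e-9), 3))
```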

Sam Dingman/KJZZ
The radar sensor connects to a piece of software running on the laptop.

DINGMAN: The radar sensor was connected to a piece of software running on the laptop. The idea, Visar explained, was to correlate the physiological signals the human body sends out when it speaks with the sound of the human voice. Bella invited me to sit down in front of the sensor, and start speaking. When I did, lines of code appeared on the laptop screen, followed by two words: human detected. Then, I stood up and got out of the way, and we played the clone of my voice into the radar sensor.

CLONE SAM DINGMAN: ASU is a comprehensive public research university, measured not by whom it excludes, but by whom it includes, and how they succeed, advancing research and discovery of public value.

DINGMAN: I’m looking at the live readout, it says, “No human detected.” 

BERISHA: Yep, so that works as intended. ... So the idea is that the only way that it will authenticate the audio as human generated is if it detects those parallel physiological signals, the biosignals we mentioned earlier. You actually have to have a live human presence in front of the device in order for the biosignals to be matched relative to the actual audio, to the speech.
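
[Editor's illustration: one simple way to picture the matching Berisha describes is to check whether the loudness of the audio rises and falls together with the motion the radar picks up. The Python sketch below uses a plain Pearson correlation and a fixed threshold. The function names, the sample values, and the 0.5 threshold are all assumptions made for illustration. The transcript does not say how OriginStory actually compares the two signals.]

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 samples")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A flat signal (for example, no body motion at all) carries no match.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def human_detected(audio_envelope: Sequence[float],
                   radar_motion: Sequence[float],
                   threshold: float = 0.5) -> bool:
    """Hypothetical check: True when audio and radar motion move together.

    The threshold is an assumed value, not one taken from the researchers.
    """
    return pearson(audio_envelope, radar_motion) >= threshold


if __name__ == "__main__":
    audio = [0.1, 0.8, 0.3, 0.9, 0.2, 0.7]         # made-up loudness samples
    live_speaker = [0.2, 0.9, 0.4, 1.0, 0.3, 0.8]  # motion tracks the audio
    playback = [0.5] * 6                           # no one in front of the sensor
    print(human_detected(audio, live_speaker))
    print(human_detected(audio, playback))
```

In this toy version, audio played with nobody in front of the sensor produces a flat radar trace, so the correlation is zero and the check fails, which mirrors the "No human detected" readout in the demonstration.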

DINGMAN: Visar and Bella are still in the early stages of testing this system. But eventually, they envision something like this radar sensor being built into any device that has a microphone. If a scammer tried to rip off someone’s grandparents by playing them a fake recording of a family member’s voice, the software would trigger an alert on the grandparents’ phone screen: no human detected. And if you were driving around listening to a live stream of KJZZ, and we had something like OriginStory hooked up to the mic I’m talking into right now, a notification would pop up on your car’s media console: human detected.

BERISHA: You know, it opens up sort of new avenues. You could imagine some future version of the internet where it’s human-only content. Or virtual meetings that do not allow AI bots, but only humans to be present. So there’s lots of different ways that the technology could be used downstream.

DINGMAN: A future version of the internet where only humans are allowed. It already sounds a little old-fashioned.

KJZZ's The Show transcripts are created on deadline. This text is edited for length and clarity, and may not be in its final form. The authoritative record of KJZZ's programming is the audio record.

Sam Dingman is a reporter and host for KJZZ’s The Show. Prior to KJZZ, Dingman was the creator and host of the acclaimed podcast Family Ghosts.