VoCo, The Photoshop for Voice

Adobe VoCo: a technological breakthrough for voice-over, or a scandal waiting to happen?

At the Adobe MAX 2016 conference, held last November, Adobe introduced several technologies that may later be incorporated into its Creative Cloud line of products. One of them, Project VoCo, is stirring up a lot of controversy.

Introducing VoCo, which has been dubbed the Photoshop of speech, Adobe’s Zeyu Jin said, “We have already revolutionised photo editing. Now it’s time for us to do the audio stuff.”

He explained that VoCo lets you manipulate a prerecorded voice by editing and adding text through the platform. In his demonstration, he showed how quickly and easily he could change a man’s recording from “and I kissed my dogs and my wife” to “and I kissed Jordan three times”.

This shows how powerful the application is: it can generate words that are not in the original recording while replicating the speaker’s voice, including its general tone and timbre.

Co-host Jordan Peele remarked, albeit jokingly, “If this gets to the wrong hands…” Jin was quick to counter that the team is doing a lot of research to prevent forgery, including a watermarking system, so that users and listeners can tell which recordings are real and which are fake, and whether VoCo was used to make them. Peele, though, added, “You can get into big trouble for something like this.”

The system requires about 20 minutes of sample speech, from which it builds a kind of “voice print” that makes the interpretation and replication possible. That makes it well suited to voice-over projects such as audiobooks and podcasts.

Powerful and amazing as it is, this voice manipulation technology raises some serious security and ethical concerns.

Dr. Eddy Borges Rey, a lecturer in media and technology at the University of Stirling, said in an interview with the BBC, “Inadvertently, in its quest to create software to manipulate digital media, Adobe has [already] drastically changed the way we engage with evidential material such as photographs.”

“In the same way that Adobe’s Photoshop has faced legal backlash after the continued misuse of the application by advertisers, VoCo, if released commercially, will follow its predecessor with similar consequences,” he added.

At a time when the proliferation of fake news has already damaged the credibility of legitimate media, would a technology like VoCo make the situation even worse? Would it change whether we accept what we hear as evidence, and whether we can still trust it?

The creators are pitching VoCo mainly at the voice-over industry. Their blog describes VoCo as giving the user the “option to edit or insert a few words without the hassle of recreating the recording environment or bringing the voiceover artist in for another session.” But what will stop people with enterprising minds from misusing the application and crossing ethical boundaries?

If released, VoCo would certainly do wonders for the voice-over industry, particularly for producers, while voice-over actors would undoubtedly end up with the short end of the stick.

There is no news yet of a ship date, or even of whether VoCo will eventually be released. But what do you think: would its limited benefits (and beneficiaries) be worth the possible repercussions?

How would it affect the voice-over industry, particularly voice-over actors? Weigh in on VoCo by commenting below.

Rana King

Rana King has presented marketing, sales, and writing seminars around the globe. She is experienced in business-to-business copywriting and technical writing, and is also an accomplished voice actor with regular clients worldwide.

  • Paul Boucher

    I already have clients who like to edit their own audio (some much more proficiently than others) and then send me back the edited files so my archives “stay up to date”. The last thing I need is a client who can circumvent the revision process/service with technology like this without notifying me.

    At that point, they would be still using MY voice (synthesized or otherwise) to accomplish a goal that they normally would have had to pay for. The technology will evolve to allow longer segments of speech to be synthesized from an original voice. Think tags for car dealers or lotteries. Forget about getting paid to do those.

    It also strikes me that VO contracts (particularly for things like video games where many revisions are short snippets) will now have to have some sort of up-front “price load” that compensates VO actors for eventual “voice synthesis revisions”.

    Maybe we should’ve seen this one coming, but I certainly didn’t.

    The point made in the blog post about the toll taken by evidential visual “truths” altered in Photoshop, to the great detriment of many in ways ranging from negative body image to falsified evidence in court, definitely applies to this technology.

    I sincerely don’t believe that’s alarmist. The presenter even jokes about changing what you said at “a wedding”. Imagine how much simpler some things would be if you had a video recording and could simply change the audio to “I don’t” rather than “I do”. 😉