Hiring Through Computer Voice Profiling, a Moral Achievement?

English (North American), Male, Adult, Corporate, Production Skills – these are some of the requirements clients set when they search for the right voiceover talent for a campaign or project. Major voiceover marketplace sites use an algorithm that filters their talent pool based on how each talent has filled out their profile, then sends audition invitations to those who fit the client’s requirements.
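For readers curious what that kind of filtering looks like, here is a minimal sketch in Python. It is not any marketplace's actual algorithm; the field names (`language`, `gender`, `age`, `styles`, `skills`) and the sample talents are made up for illustration.

```python
# Hypothetical profile filter: every field name and record here is an
# assumption for illustration, not a real marketplace schema.

def matches(profile: dict, requirements: dict) -> bool:
    """Return True when the profile satisfies every client requirement."""
    for key, wanted in requirements.items():
        have = profile.get(key)
        if isinstance(have, (set, list, tuple)):
            # Multi-valued fields (e.g. styles): the wanted value must be listed.
            if wanted not in have:
                return False
        elif have != wanted:
            return False
    return True


def talents_to_invite(talents: list, requirements: dict) -> list:
    """Names of the talents who would receive an audition invitation."""
    return [t["name"] for t in talents if matches(t, requirements)]


talents = [
    {"name": "Alex", "language": "English (North American)", "gender": "Male",
     "age": "Adult", "styles": {"Corporate", "Narration"},
     "skills": {"Production Skills"}},
    {"name": "Sam", "language": "English (British)", "gender": "Male",
     "age": "Adult", "styles": {"Corporate"}, "skills": set()},
]

requirements = {
    "language": "English (North American)",
    "gender": "Male",
    "age": "Adult",
    "styles": "Corporate",
    "skills": "Production Skills",
}

invited = talents_to_invite(talents, requirements)
```

In this toy setup only Alex ticks every box, so only Alex gets the invitation. Note that the filter knows nothing about how anyone actually sounds; it only reads what was typed into the profile.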

But what if we take it one step further?

Whose voice is engaging? Who is trustworthy? Who can convince the audience to purchase? Who is the voice of reason, the voice of authority? Who can the audience relate to? Depending on their project needs, these are some of the questions that go through the minds of casting directors, producers, and clients while they listen to voiceover submissions.

While they judge voiceover actors on skill, talent and delivery, the process resembles a voice beauty contest with a set of criteria that the talent needs to tick off. These criteria are based on years of studies and consumer feedback on what appeals to the target audience.

But what if we can eliminate this process and save hours of listening to recordings?

Humans are hardwired to judge and to form perceptions the moment they interact. Whether by seeing, touching or listening, our brains start to form impressions. In the first few seconds we create perceptions that lead to feelings and then to snap decisions. What if these perceptions and feelings could be bottled up or, more accurately, computerized?

A company says they have done just that.

Jobaline has taken years of scientific studies and focus group results on the human voice and fed them into algorithms. The program categorizes and interprets the emotions evoked when listening to a person speak, and validates the results with real human listeners. Jobaline CEO Luis Salazar said in an interview with NPR, “We’re not analyzing how the speaker feels. That’s irrelevant.” What they are homing in on is the “emotion that that voice is going to generate on the listener.”

Is this not what casting directors, producers, and clients are looking for – who can evoke the right emotions that they wish their audience would feel?

Regardless of intonation, expressed emotion, speech rate, and other qualities that shape a listener’s perception, there is always an underlying quality in a person’s voice: a unique vocal fingerprint. Perception, though, varies from person to person. What some perceive as high-pitched and excitable may be heard by others as an expression of happiness. Jobaline’s approach was to identify interactions among an array of features, from pitch to energy accumulated over time. With this the company aims to predict which voice suits a particular job from the right combination of vocal features. So far, its formula can determine whether a voice is engaging, calming, and/or trustworthy.
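Jobaline has not published its formula, so what follows is only a toy sketch of what "interactions between features" could mean. It assumes two features (average pitch and energy summed over time), a simple logistic score, and invented weights; none of these come from the company.

```python
import math

# Toy illustration only: the features, weights and logistic form are all
# assumptions, not Jobaline's actual (unpublished) model.

def accumulated_energy(frame_energies):
    """Running total of per-frame energy, one value per frame."""
    total = 0.0
    running = []
    for energy in frame_energies:
        total += energy
        running.append(total)
    return running


def engaging_score(mean_pitch_hz, frame_energies, weights):
    """Score between 0 and 1 built from pitch, energy and their interaction."""
    energy = accumulated_energy(frame_energies)[-1] if frame_energies else 0.0
    pitch = mean_pitch_hz / 100.0  # rescale so both features are of similar size
    z = (weights["bias"]
         + weights["pitch"] * pitch
         + weights["energy"] * energy
         # Interaction term: the effect of pitch depends on the energy level.
         + weights["pitch_x_energy"] * pitch * energy)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights; a real system would have to learn them from
# listener ratings.
weights = {"bias": -1.0, "pitch": 0.4, "energy": 0.6, "pitch_x_energy": 0.5}
score = engaging_score(200.0, [0.3, 0.2], weights)
```

The interaction term is the part worth noticing: the same pitch can push the score up or down depending on how much energy accompanies it, which is one way a formula could treat "high-pitched and excitable" differently from "high-pitched and happy". The weights still come from human listeners, so whatever those listeners prefer is what the math learns.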

The company hails this as a moral achievement. “That’s the beauty of math,” Salazar says. “It’s blind.” This automation is said not only to cut costs but also to eliminate bias. It is unaware of differences in race, gender, sexual orientation or age and, for the sake of argument, of the years of experience spent honing your skill or craft.

The problem is, this “blind audition” is still riddled with unfairness and prejudice. Like any form of profiling, voice profiling is scary. We give an impersonal machine the first say on who gets to the next stage. Voice profiling at this level is dangerous, as we pass on man’s prejudices to an unfeeling binary program that only a few can comprehend.

Can this technology be used by voiceover platforms to help clients screen talents? Isn’t this the essence of screening – the years of experience of casting directors, producers, and clients, backed up with customer data, made efficient by binary code?

Imagine reading the script and employing years of training and experience to deliver the copy perfectly, only to have your recording fed through the system and passed over for the project because a combination of ones and zeroes says your voice is lacking.

Are we going to let robots take over humanity?

Rana King

Rana King has presented marketing, sales, and writing seminars around the globe. She is experienced in business-to-business copywriting and technical writing, and is an accomplished voice actor with regular clients worldwide.

  • Steven Lowell

    Technology is in place to provide nothing more than consistency of product.

    The discussion of morals only comes up because anytime a process lacks consistency, people claim the process is somehow unfair, and tech folks build a solution to the problem… which sometimes seems unfair, for it then removes the human decision-making power. Besides, they also sell their tech by calling it something very human. It’s not. It’s more of a display of how imperfect people are… supercharged.

    It’s funny because for years I have studied how people who fear “robots” were often the very same people who demanded them. I saw this often working for corporate companies.

    I know how algorithms work, and their nasty secret is that they start off as blank slates, almost like children. They are then taught what to do next by human behavior, and the algorithm attempts to solve problems based on what people claimed was unfair.

    So what’s really broken? THAT is a moral discussion.

    What you end up with? The scenario: Be careful what you wish for because you may just get it.

    Sadly, the love affair with tech comes from the human inability to accept human decision making. And so no one remembers that the algorithms were built on feedback from people and taught how to behave by people.

  • New technology offers lots of ways to help us narrow down our decision-making processes. The fact is, we have so many choices today, in so many different areas, that being able to narrow them down saves us some time by weeding out choices that only serve to muddy the waters.
    I could see how this technology could take a voice print and call one voice “aggressive” and another “calm” based on lots of factors. But there are certainly a myriad of calm sounds that are all unique. A deep male voice can sound calm just as easily as a light female voice. So who ultimately makes the call? If technology can narrow the selection down to a certain number of choices, then a human will have an easier time making the final choice… or maybe a harder one, since presumably all the choices will fit the criteria even better. But what or who is being selectively removed? That’s what would worry me. What “out of the box”, cool choice did I lose in using this technology?

  • Malk Williams

    @Emma & @Brent, I can’t help but think that you sound a bit complacent. I agree that computers cannot assess, judge and respond to voice the way humans do, and are unlikely to be able to in any foreseeable scenario. That’s not to say they can’t process and analyse voices according to a programmed algorithm, matched to ‘ideal’ voice patterns.

    That wouldn’t satisfy you… it wouldn’t satisfy me either, but it wouldn’t have to. It only has to be good enough to convince business executives that it’s good enough to make financial sense.

    I don’t know if it’ll happen; I hope it won’t… but I wouldn’t rule it out based on some misplaced belief in the innate and unquestionable superiority of humans to machines. After all, we used to say exactly the same things about chess.

    This is a thought-provoking article. The key point for me is “Voice profiling at this level is dangerous, as we pass on man’s prejudices to an unfeeling binary program”. Yes, to me it sounds as plausible as it does repulsive.

  • Emma Jamieson

    Voice is so unique that only the ears of man can truly appreciate its beauty.

  • brentwalker

    When Jobaline finally gives up, maybe they’ll find another wall to pound their head against.

    Human speech is so packed with subtlety and nuance that no computer could possibly process the emotion a voice is going to evoke. It’s a fool’s errand.

    I’ve had voiceover people ask me for years, “When do you think computers will end up reading this copy?” My answer is, and always will be, “Never.” Binary code has no emotion. Nuance can’t be programmed. Computers can’t smirk.