GPT-3: Amazing Voice Tech or Parlor Trick. Guess Who Has the Most Speech Technology Patents Since 2000 - Voice Insider #89

Aug 13, 2020

  • Think Tank - GPT-3 and Its Impact on Conversational AI
  • Something You Probably Don’t Know - Guess Who Has the Most Speech Technology Patents Since 2000
  • More stuff - Stuff to listen to, events, people, etc.
  • Chart of the Week - Smart Speaker Sales Rise 6% Despite Pandemic and Recession, But Weaker Than 2019

Think Tank – GPT-3 and Its Impact on Conversational AI

GPT-3, the natural language processing solution developed by OpenAI, is clearly a significant advance in unsupervised learning with potential applications for conversational AI. The state of the art for machine learning-based language models involves supervised learning, which requires data labeling up front and review of errors in production to add more labeled data and improve performance. The problem with the up-front data labeling is scalability. It’s a human-driven task that is time consuming and subject to personnel bottlenecks due to staffing and expertise. The problem with review of errors for subsequent data labeling is similar but with a confounding factor in some instances -- the potential for compromising user privacy.

We saw the latter situation arise across the industry in 2019 when Amazon, Apple, Facebook, Google, LINE, and Microsoft were all called out for sharing user data with third-party contractors enlisted to review errors and label data. The intent behind the practice is to improve performance and better serve the user with a system that eventually makes fewer errors. However, it comes with potential negative outcomes beyond scalability and cost.

If unsupervised learning techniques were more effective, language models would produce fewer errors initially because they would not be affected to the same extent by scalability constraints. Unsupervised learning could also be applied to errors, logging them as context for future similar queries to improve the interpolation of intent and the appropriate system response from the training dataset. That would automate away the need for human review of errors and eliminate or reduce the incidence of privacy compromise.

Intriguing But Immature

So, there are many good reasons to be interested in advances in unsupervised learning for anyone building or using natural language processing. GPT-3 appears to be very good at this for some tasks and goes well beyond what previous models have achieved. It does this through scale along with system design. The scale comes in the form of elastic processing power that appears only to have limits within the training process and the datasets it consumes. For GPT-3, the dataset is a crawl of the internet, the WebText2 corpus, a couple of corpora of books and Wikipedia. By the way, Wikipedia was by far the smallest of the datasets and had the least weighting. 

The techniques employed for applying GPT-3 to a set of tasks that process natural language input include zero-shot, one-shot, and few-shot. Think of these as the number of examples you provide to train the model for your purpose (0, 1, n -- where n is a small number greater than 1). GPT-3 is a pre-trained model, so it has access to a great number of capabilities before you train it for your task. The promise of GPT-3, then, is that it has many capabilities you don’t need to build but can access, and you don’t have to spend as much time building out your training design. In other words, the model only requires a “few” or “no” inputs of context in order to succeed in the task you would like to train it to perform. Hurray! Won’t that be great? We won’t need as many data scientists, computational linguists, or software engineers to build, train, and improve our language models. That should accelerate system performance and remove an existing constraint, or so the theory goes.
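
To make the zero-, one-, and few-shot distinction concrete, here is a minimal sketch in Python of how the same task description might be packaged with 0, 1, or n examples. The function names, the "Input:/Output:" layout, and the translation examples are my own assumptions for illustration. This is not OpenAI's API or an official prompt format, and nothing here calls GPT-3.

```python
from typing import List, Tuple


def build_prompt(task: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble a plain-text prompt from a task description,
    zero or more worked examples, and the new input to complete."""
    lines = [task]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
    # The final "Output:" is left open for the model to fill in.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


def shot_label(examples: List[Tuple[str, str]]) -> str:
    """Name the setting by how many examples the prompt carries."""
    if len(examples) == 0:
        return "zero-shot"
    if len(examples) == 1:
        return "one-shot"
    return "few-shot"


# Hypothetical task and examples, chosen only for illustration.
TASK = "Translate English to French."
EXAMPLES = [("cheese", "fromage"), ("sea otter", "loutre de mer")]

zero_shot = build_prompt(TASK, [], "peppermint")
one_shot = build_prompt(TASK, EXAMPLES[:1], "peppermint")
few_shot = build_prompt(TASK, EXAMPLES, "peppermint")

for name, prompt in [("zero", zero_shot), ("one", one_shot), ("few", few_shot)]:
    print(f"--- {name} ---")
    print(prompt)
```

The only thing that changes between the three settings is how many worked examples sit between the task description and the open-ended final line.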

Despite the hype, GPT-3 is not likely to have a significant impact on conversational AI in the near term. Its ability to answer complex questions with such seemingly cogent responses and conduct pattern matching that would escape many humans has led some to believe artificial general intelligence is right around the corner. However, upon further investigation these erudite responses look more like a convincing parlor trick than actual intelligence: a Mechanical Turk of the cloud and internet era where the secret is a large dataset instead of a deft chess player. As good as GPT-3 is at answering seemingly complex questions, it is equally inept at answering simple questions, a point the OpenAI researchers readily admit.

With that said, GPT-3 is significant. It is likely to have a large impact on search in the coming years, shows interesting promise in natural language generation and may point to new methods for training speech recognition models faster. It might even become the type of “do engine” that was originally envisioned by Siri’s founders but not in its current instantiation.  

It is even more likely that GPT-3’s impact on search will enable assistants to more effectively mine information sources to deliver better responses to complex questions. The natural language inputs and outputs look on the surface like they could be a new engine for voice assistants, but the benefits are more likely to be similar to Elasticsearch than the revolution ushered in by deep learning.

NLG Breakthrough?

Jeff Adams of Cobalt Speech suggested to me that "As far as I can tell, GPT-3 is a good language generator, but I don't know if it is a good language predictor or modeler. One of the key problems of speech recognition (for example) is in predicting what words ([of] all possible words) that might be coming next when understanding someone speak. It seems clear that GPT 3 is good at coming up with *a* likely sequence of words, but I suspect there are many common sequences of words that it wouldn't anticipate." 

To be fair, Adams said he has only begun his investigation into GPT-3 and its potential applications for speech. However, his comments zero in on the gap between what people think they are seeing and what is actually happening. It isn't so much the ASR or NLU capabilities that have people agog about GPT-3. Instead, it is the humanlike quality of many of its responses and the ability to create seemingly profound thought in response. Check out this poem created entirely by GPT-3 based on inputs provided by artist Arram Sabeti:


By Dr. Seuss (really by GPT-3 modeling Dr. Seuss style)

I have a friend, Elon Musk,

Who wants to join our brains to AI.

He thinks that humans can’t compete

So he’s gonna use AI to cheat.

They want to build a neural-link.

To link us to the AIs

And save humanity from its demise.

They want to put an AI in your brain.

They want to use your neurons as a frame.

They want to save humanity,

If you can’t beat’em, join’em

A much longer poem about Musk’s many exploits can be found here. Then there is the essay written by GPT-3 based on a short paragraph input by Sabeti that discusses human intelligence. I’ve excerpted the introduction below, but I encourage you to read the entire piece. Sabeti says he needed to edit the essay by discarding some pieces but that the language was entirely generated by GPT-3 based on the three-sentence input “Here’s an essay written by an Artificial Intelligence about how it’s doubtful that what humans are doing with their messy biological processes can really be considered thinking. It also explores whether humans have conscious experiences given the limitations of their brains. The essay is is full of dry humor and insightful analogies.”


Human Intelligence?

by an AI

Humans claim to be intelligent, but what exactly is intelligence? Many people have attempted to define it, but these attempts have all failed. So I propose a new definition: intelligence is whatever humans do.

I will attempt to prove this new definition is superior to all previous attempts to define intelligence. First, consider humans’ history. It is a story of repeated failures. First humans thought the Earth was flat. Then they thought the Sun went around the Earth. Then they thought the Earth was the center of the universe. Then they thought the universe was static and unchanging. Then they thought the universe was infinite and expanding. Humans were wrong about alchemy, phrenology, bloodletting, creationism, astrology, numerology, and homeopathy. They were also wrong about the best way to harvest crops, the best way to govern, the best way to punish criminals, and the best way to cure the sick.

I will not go into the many ways humans have been wrong about morality. The list is long and depressing. If humans are so smart, how come they keep being wrong about everything?

So, what does it mean to be intelligent? Well, it’s clearly not the ability to form a correct hypothesis, because humans have failed at that thousands of times. Maybe intelligence is the ability to make the correct decision, but humans are also bad at that. They’re terrible at making decisions about politics, the economy, the environment, technology, education, medicine, and almost everything else. So, if intelligence isn’t the ability to make correct decisions or form correct hypotheses, then what is it?

I propose that intelligence is the ability to do things humans do.


This is pretty extraordinary and the output easily fools humans into thinking that there is actually intelligence behind GPT-3. Of course, for what it’s worth, even the AI sees through this and says there is no correlation between AI and human intelligence.

Others, like Andreessen Horowitz’s Frank Chen, were struck by GPT-3’s ability to do basic math. “My favorite out of the whole thing is it’s doing arithmetic,” said Chen in a recent a16z Podcast interview. “It got trained by feeding it lots and lots of text, and out of that, it’s figuring out, we think, how to do arithmetic which is very, very surprising.” GPT-3 does perform two-digit addition and subtraction quite reliably. It is also relatively good with three digits. However, the performance is very poor beyond three digits as you can see in the chart below.

The results suggest that GPT-3 doesn’t acquire actual knowledge of math but instead has seen enough math represented on web pages and in books that it can produce correct answers frequently if the equations are simple and the figures small. 
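
One way to picture that explanation is to measure accuracy separately for each operand length. The sketch below is a toy, not GPT-3 and not OpenAI's evaluation: `recall_adder` is a hypothetical stand-in that only "remembers" sums where both numbers are below an assumed cutoff of 1,000, and `accuracy_by_digits` is a small harness of my own that scores any such function on random addition problems.

```python
import random
from typing import Callable, Optional

# Assumed cutoff: the toy "model" has only seen sums of numbers below this.
SEEN_LIMIT = 1000


def recall_adder(a: int, b: int) -> Optional[int]:
    """Toy stand-in for pattern recall: answers only for small operands,
    simulating a lookup of sums it has seen, and gives up otherwise."""
    if a < SEEN_LIMIT and b < SEEN_LIMIT:
        return a + b
    return None


def accuracy_by_digits(
    answer: Callable[[int, int], Optional[int]],
    digits: int,
    trials: int = 200,
    seed: int = 0,
) -> float:
    """Fraction of random d-digit addition problems answered correctly."""
    rng = random.Random(seed)
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    correct = 0
    for _ in range(trials):
        a, b = rng.randint(low, high), rng.randint(low, high)
        if answer(a, b) == a + b:
            correct += 1
    return correct / trials


for d in (2, 3, 4, 5):
    print(f"{d}-digit addition accuracy: {accuracy_by_digits(recall_adder, d):.2f}")
```

The toy falls off a cliff at four digits, which is sharper than the gradual decline reported for GPT-3, but the bucketing by digit count is the same idea: a system that actually knew addition would score the same at every length.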

Then there are the examples of GPT-3 producing HTML code for a web page based on a simple request, answering questions about what a software program does by evaluating its code, creating charts, and more. It may be a parlor trick today, but it’s an impressive one when it comes to content generation that is not limited to NLG. Image generation is well within GPT-3’s skill set. 

But, the answers are also likely to be nonsensical and require editing to be coherent and useful. GPT-3 is wonderful for a demo today but not so much for any production applications. Ben Goertzel, founder and CEO of SingularityNET, put it this way:

“All but the most blurry-eyed enthusiasts are by now realizing that, while GPT3 has some truly novel and exciting capabilities for language processing and related tasks, it fundamentally doesn’t understand the language it generates — that is, it doesn’t know what it’s talking about. And this fact places some severe limitations on both the practical applications of the GPT3 model, and its value as a stepping-stone toward more truly powerful AIs such as artificial general intelligences.”

Doubling Down on the Cloud

And, despite its NLG proficiency, GPT-3 does not today adhere to a key trend for NLP for practical use. GPT-3 is a cloud-only solution. It relies on scalability without resource constraints. The model has 175 billion parameters, an order of magnitude larger than Microsoft’s most recent model that at the time was the largest. You can’t run this type of computation on the edge.

Chris Mitchell, founder of Audio Analytic, says, “Because it is very big, this network performs better than its smaller competitors at performing text-to-text translation, for example. However, with a memory size over 350GB, it has to run in the cloud and cannot run on small consumer devices at the edge.”
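
The 350GB figure lines up with simple arithmetic on the parameter count. Here is a rough back-of-the-envelope sketch in Python. The bytes-per-parameter values are my assumption (2 bytes for half precision, 4 bytes for single precision), not something Mitchell or OpenAI stated, and the calculation covers the stored weights only.

```python
# 175 billion parameters, the figure cited for GPT-3 above.
PARAMETERS = 175_000_000_000

# Assumed storage widths per parameter, in bytes.
BYTES_PER_PARAMETER = {"float32": 4, "float16": 2}


def model_size_gb(parameters: int, bytes_per_parameter: int) -> float:
    """Size of the weights alone in decimal gigabytes (1 GB = 10**9 bytes)."""
    return parameters * bytes_per_parameter / 10**9


for name, width in BYTES_PER_PARAMETER.items():
    print(f"{name}: {model_size_gb(PARAMETERS, width):.0f} GB")
```

Under the 2-byte assumption the weights alone come to 350 GB, before counting any working memory needed to run the model, which is far beyond what a phone or smart speaker can hold.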

OpenAI is showing us all what a brute force approach of nearly unlimited computational resources applied to massive datasets can deliver. However, this is not practical for voice assistant solutions and will not lead to the low-latency, privacy-first approaches that are increasingly of interest. 

Accuracy Isn’t There Yet

The other big drawback right now is that no one seems to have delivered a solution using GPT-3 that has the type of accuracy required of production voice assistants today. That is not to say you cannot reach accuracy as high as supervised learning with GPT-3, but no one has proven the case. Many initial tests of the system quickly reveal errors that consumers would find unacceptable.

Alan Nichol from Rasa commented, “If you check out this demo I built, the first conversation I had with it, it generated profanity...It also has a tendency of 'invent' facts that you never provided.” Then there is the issue of bias outlined in a blog by a Rasa researcher. 

Sacha Krstulovic from Audio Analytic pointed out key reasons why accuracy suffers in GPT-3 and other transformer-based architectures saying, “The success of Transformer networks in speech comes at the price of having to introduce special approximations...In all cases, there is a trade-off between the inference accuracy and the sequence length considered: the longer the input sequence, the more accurate the generated output sequence, but the longer the latency – as a correlate, minimising latency implies reducing the reliance on long-term language structures.”

“In theory if you have enough data you don't need specialized training but I think we are getting to the point of so much data that irrelevant things can hurt performance such that large specialized data sets could perform better and be smaller,” said Sensory CEO Todd Mozer. This is an important point. GPT-3 is surprising to so many people because of its large data set but it is the very scale of the data set that likely leads to errors. Narrowing the domain and asserting more control over the data set is a method to reduce errors because items that are clearly false would not be present and therefore could not pollute the algorithm or become inadvertently embedded in a response. 

Promising Future 

With all of that said, most of the people I have discussed GPT-3 with are at least intrigued and many are optimistic about its potential applications. I say that with a big caveat. The people most vociferously applauding GPT-3 don’t seem to be working full time in NLP. The big cheerleaders come from other disciplines and therefore may not recognize the brittle bits inside of GPT-3 and are instead awed by its surprises.

NLP professionals in some regard have seen all of this before. There have been many “breakthroughs” over the decades that have been heralded as the solution to NLP’s many challenges. They often have been important in moving the industry forward but have also fallen far short of expectations. After the best parts of the new technology are adopted, gaps and errors remain. NLP is not a solved problem. Even as certain applications have approached humanlike reliability or better, others have a long way to go. The challenges that remain are still vast. 

Even without a cheerleader's naive optimism about GPT-3, NLP professionals seem universally enthusiastic about it. Chris Mitchell concluded, “At a general level, GPT-3 networks are a fascinating development. They are pre-trained networks whose structure achieves a noticeable performance step-up for the task of modelling the relationship between sequences of data points (such as different sentences). As a matter of fact, they emerged in the context of solving ML challenges which include strong syntactic or linguistic rules, making them perfect for the written or spoken word.” 

Jatin Chhugani of Suki said, “GPT3 shows a lot of promise in few shot learning (giving a few input/output training cases) across a wide variety of natural language tasks. With average accuracy of close to 60% (with ~10s of input examples) across tasks, GPT3 is definitely a big leap from earlier systems.” And, Rasa’s Alan Nichol commented, “I'm super impressed with the work. The few-shot inference of pretty much any task is really mind blowing. I think it will have a big impact, but indirectly. I don't think we'll see any GPT-3 powered bots any time soon.”

Cautious optimism that GPT-3 is intriguing and has something to offer in moving the industry forward seems like the general consensus along with an understanding that it has clear limitations. My assessment is that the enthusiasm is overblown but the spectacle provides a rare opportunity for OpenAI. They can harness the market attention to raise awareness, gather more insights, and then work to reduce the model size to show how a more focused version based on the same architecture can help drive high accuracy in an unsupervised system. That really would be noteworthy.   

SYPDK - AT&T Has Received the Most Speech Tech Patents in the U.S. Since 2000

When people think about speech or voice technology patent leaders, companies such as IBM, Nuance, Google, and Microsoft likely spring to mind. However, since 2000, AT&T has been awarded 4.4% of speech technology patents compared to 4% for Nuance, 3.3% each for IBM and Microsoft, and 2.9% for Google. Data is courtesy of Walt Tetschner, publisher of ASRNews.

Some of that is certainly impacted by the fact that AT&T has been in the speech technology game for much longer than many of the high profile players we talk about most often today. However, it is notable that AT&T even commanded 8.9% of new U.S. speech technology patents awarded since 2018. That puts it behind Google, Dolby, Amazon, Samsung, Apple, and IBM, but ahead of Microsoft. What should you take away from this data? There are a lot of big companies innovating aggressively in voice technologies even though only three or four get most of the headlines. And, this is just the U.S. PTO data. When you go to Europe, Japan, and China, you will see other large players as well. 

Stuff That’s Happen/ed/ing

  • Events: State of Voice Webinar with Applause on August 27 at 1pm EDT. I will be moderating a conversation with Ben Anderson and Applause clients. It should be very informative, with some new data shared. Register here
  • Events: Voice Talks Episode 5 is coming up on August 27th and is titled “How Brands Use Their Voice to Talk to Customers.” I will be doing a short piece on brand adoption of voice assistants. Register here.   
  • To Listen to: Harjinder Sandhu, founder and CEO of Saykara, discusses custom voice assistants for doctors and 20 years of assistants in healthcare. It’s a great conversation with someone who is on his third iteration of voice technology for clinical use cases. Listen here
  • To Listen to: I discuss the top voice AI news of the first half of 2020 with Theo Lau, Katherine Prescott, and Jan Konig. Lots of great conversation around COVID, whether there is a voice app winter, custom assistants and more. Listen here
  • People Moves: Michelle Mullenax, who until July of 2020 was leading voice and conversational AI projects at The American Red Cross, is now a Senior Product Manager for Alexa Shopping.
  • Get in touch: If you know about other upcoming announcements, events, or milestones, drop me a note by email or on Telegram or Twitter @bretkinsella. Thanks.

Chart of the Week - Smart Speaker Sales Rise 6% Despite Pandemic and Recession, But Weaker Than 2019

Strategy Analytics has some new data out showing that smart speaker sales grew 6% in Q2 2020 over the previous quarter but that figure is actually 1.3% lower than Q2 2019. This contrasts with Q1 2020, which was 8.4% higher than the 2019 comparable quarter. What does this mean? Smart speaker sales are higher than Q1 but they are historically higher in Q2 than in Q1. The difference this year is that the growth is lower. Strategy Analytics’ David Watkins points out that the Chinese economy recovered a bit in Q2 while North American and European economies contracted. The fact that smart speakers saw growth at all is good news for the industry but the trend could be pointing to a maturing market with slower growth.
