While voice assistants have become a great convenience, they are not without privacy concerns.
It seems that once every week or two, one of ours “hears” its wake phrase incorrectly. Sometimes it happens while the TV or a movie is playing; other times it happens when no one is talking at all.
False Trigger Study
Good news, though: it looks like there’s a study underway to find out what phrases trigger these devices accidentally. The study is a joint effort by researchers from Ruhr-Universität Bochum (RUB) and the Bochum Max Planck Institute (MPI) for Cyber Security and Privacy.
The full results of the study aren’t out yet, but there are some preliminary findings. On their GitHub page you can see the three major devices getting tripped up by a variety of false trigger words, and you can also read about their methodology and some of the most common phrases that misfired on each device.
Their preliminary findings are not great, especially if you’re paranoid about these voice assistants over-listening:
Our setup was able to identify more than 1,000 sequences that incorrectly trigger smart speakers. For example, we found that depending on the pronunciation, «Alexa» reacts to the words “unacceptable” and “election,” while «Google» often triggers to “OK, cool.” «Siri» can be fooled by “a city,” «Cortana» by “Montana,” «Computer» by “Peter,” «Amazon» by “and the zone,” and «Echo» by “tobacco.”
Always (Over)Listening
Another interesting fact I picked up from this study: all of these devices use a two-step process before full activation.
In the first step, each device has onboard algorithms designed to detect possible wake words. This is the part of the process that is “always-listening” (but not recording). It’s important to know that these algorithms are more forgiving – they will trigger on lots of sound-alike words. From a usability perspective, this makes sense – a device that triggers too often is better than a device that doesn’t trigger when asked.
When the device thinks it has heard a wake word, it forwards a short audio clip to cloud servers for more detailed analysis. If the cloud servers confirm that a wake word was said, the voice assistant then sends additional audio to the cloud. This second clip (hopefully) contains the user’s voice commands. If the device triggered accidentally, though, these two clips could contain almost anything.
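To make that flow a bit more concrete, here’s a rough Python sketch of how a two-stage check like this might behave. The string-similarity scoring, the thresholds, and the example phrases are all stand-ins I made up for illustration; real devices run trained acoustic models on audio, not text comparisons.

```python
# Toy sketch of the two-stage wake-word flow described above.
# The scoring function and thresholds are invented for illustration only.
import difflib

WAKE_WORD = "alexa"

# Stage 1: the "always-listening" on-device check. Deliberately forgiving,
# so sound-alike words can slip through.
ON_DEVICE_THRESHOLD = 0.5

# Stage 2: the cloud-side verification. Stricter, but still not perfect.
CLOUD_THRESHOLD = 0.75


def similarity(heard: str, wake_word: str) -> float:
    """Crude stand-in for an acoustic model: plain string similarity."""
    return difflib.SequenceMatcher(None, heard.lower(), wake_word).ratio()


def handle_audio(heard_word: str, following_speech: str) -> None:
    # Step 1: the local detector decides whether to wake up at all.
    if similarity(heard_word, WAKE_WORD) < ON_DEVICE_THRESHOLD:
        return  # nothing ever leaves the device

    # The device thinks it heard the wake word: a short clip goes to the cloud.
    print(f"step 1 passed: uploading clip containing {heard_word!r}")

    # Step 2: the cloud re-checks the clip with a stricter model.
    if similarity(heard_word, WAKE_WORD) < CLOUD_THRESHOLD:
        print("step 2 failed: cloud rejected the clip, session ends")
        return

    # Confirmed (or a false positive that fooled both stages):
    # the follow-up audio is sent as the "command".
    print(f"step 2 passed: uploading follow-up audio: {following_speech!r}")


handle_audio("alexa", "what's the weather?")        # intended use: both stages pass
handle_audio("a lexus", "private conversation")     # fools the forgiving local check; cloud rejects it
handle_audio("tobacco", "never leaves the device")  # local check doesn't fire at all
```

The middle case is the interesting one: the clip still left the device even though the cloud ultimately rejected it, which is exactly the kind of leakage the researchers are measuring.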
The researchers’ analysis shows that while this two-step process does reduce the number of false positives, non-wake words still make it past both checks fairly often. That means a lot of unintentional audio can be sent out without the person in the room ever being aware of it.
No Solution, But Awareness
One thing I’ve done with all of my voice assistants is set them to play an audible “beep” when they hear their wake word. This doesn’t make them any less likely to activate unnecessarily, but it does make you aware of when they are recording.
Good news if you’re the paranoid type – you’re not wrong, someone really is listening!