The challenges we came across building a voice assistant automation tool on Google's Dialogflow.

Warehouse Inventory Management Using Voice: Skuvault Agent

May 2020

Some universities offer a course called "Human Computer Interaction." I think with the new technology made available by AI/ML advancements, they will have to begin adding voice & chat as a serious means of interacting with computers.

Creating a workflow program using this new tech is the challenge we took on for Sunbelt Power Equipment. They're an internet retailer with their own warehousing, and they use the warehouse management software Skuvault.

In short, it's a central repository of all your SKUs and their quantities and locations in your warehouses. It's the primary software pickers interact with as they prepare orders for customers.

To improve worker efficiency, Mike, the owner, had the idea of creating voice commands for the Google Assistant devices he had in the warehouses. It'd be a better use for them than just listening to music (their raison d'être before SkuvaultAgent).

Mike had already done the work of designing the commands to be used, so we can't take the credit there. But we did come across some challenges in implementing them.

The software we went with was Dialogflow, which was its own company before being acquired by Google. Dialogflow aims to be a meta platform. Although we didn't test this, they say that once you build an agent on it, you can plug it into many other systems (e.g. chatbots on Facebook, WhatsApp, or your own site, and voicebots through Alexa, Google Assistant, etc.).

The two main features the agent supports are moving products, and auditing products.

At first I thought there wouldn't be much to creating a chatbot for these two tasks. If you're familiar with the command line interface for your computer, you're aware that you can run a program and pass it data through parameters (called arguments by programmers.)

notepad.exe /A myfile.txt

For example, running this on Windows passes a file parameter to Notepad, which then opens the specified file.

Based on research I had done into building a voicebot when I first bought an Alexa device, I assumed we could create the voice commands for SkuvaultAgent in essentially the same "command and parameters" way.

This straightforward and simple way would have been possible had we started this project a year earlier. But Dialogflow had removed the ability to create "template commands" like this from their program.

Instead, they now only supported a machine learning based model. There are advantages to this, as it allows for more natural conversation. Failing to adhere to a defined template won't necessarily cause the bot to reply in confusion just because you used a different word in a different order.

But this also meant learning some new ways of doing things. If you're building a bot, keep these tips in mind.

Machine learning means lots of repetition is needed

At first, still in my "template" frame of mind, I tried creating the commands with one or two "intent phrases." Intent phrases (Dialogflow calls them training phrases) are the examples you enter into Dialogflow of what the user will say.

Move 100 units of ABC123 from A1 to A2

You can highlight "entities" in your phrases and these will be passed as the parameters to your program.

Dialogflow Highlighting Entities

Think of the overall intent phrases as the program name, and then the entities as the parameters.
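To make the analogy concrete, here is a minimal sketch in Python of what a matched phrase might look like to your program. The intent name `move_product` and the parameter names (`quantity`, `sku`, `source`, `destination`) are made up for illustration, and the request is trimmed down to the fields of Dialogflow's webhook request that matter here.

```python
# A trimmed-down webhook request for "Move 100 units of ABC123 from A1 to A2".
# The intent and parameter names are hypothetical, not the ones SkuvaultAgent uses.
request = {
    "queryResult": {
        "queryText": "Move 100 units of ABC123 from A1 to A2",
        "intent": {"displayName": "move_product"},
        "parameters": {
            "quantity": 100,
            "sku": "ABC123",
            "source": "A1",
            "destination": "A2",
        },
    }
}


def move_product(quantity, sku, source, destination):
    """Stand-in for the real work (talking to the warehouse software)."""
    return f"Moving {quantity} x {sku} from {source} to {destination}"


# The intent name picks the "program"; the entities become its arguments.
HANDLERS = {"move_product": move_product}


def dispatch(req):
    result = req["queryResult"]
    handler = HANDLERS[result["intent"]["displayName"]]
    return handler(**result["parameters"])


print(dispatch(request))
```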

It turned out that using only a couple of training phrases absolutely would not work.

Dialogflow aims to do two basic things:

  1. Recognize the correct "program" (intent) to run
  2. Recognize the parameters to pass to the program

To do either of these things close to correctly, you will have to feed it many repetitive examples. Don't be afraid to copy and paste the same example and just change the parameters (e.g. 95 units instead of 100, A2 to A4 instead of A1 to A2). Google has 4 pages of phrases for the "yes" intent alone.
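If copy pasting by hand gets tedious, the same repetition can be scripted. This is only a sketch of the idea, not how we built our phrase list: the sample quantities, SKUs, and locations are invented, and you would still need to get the output into Dialogflow and label the entities.

```python
import itertools
import random

# Hypothetical sample values; in practice these would come from your own catalog.
QUANTITIES = [5, 20, 95, 100]
SKUS = ["ABC123", "XYZ789"]
LOCATIONS = ["A1", "A2", "A4", "B7"]

TEMPLATE = "Move {qty} units of {sku} from {src} to {dst}"


def generate_phrases(limit=20, seed=0):
    """Return up to `limit` phrases that share a template but differ in values."""
    combos = [
        (qty, sku, src, dst)
        for qty, sku, src, dst in itertools.product(
            QUANTITIES, SKUS, LOCATIONS, LOCATIONS
        )
        if src != dst  # moving a product to the same spot is not a useful example
    ]
    random.Random(seed).shuffle(combos)  # fixed seed so runs are repeatable
    return [
        TEMPLATE.format(qty=qty, sku=sku, src=src, dst=dst)
        for qty, sku, src, dst in combos[:limit]
    ]


for phrase in generate_phrases(limit=5):
    print(phrase)
```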

To understand why this is helpful, I recommend watching this series on machine learning.

An issue we came across was the AI having trouble correctly assigning the source ("from") and destination ("to") entities. Both parameters have the same entity type (a location), so the only thing that can tell them apart is the surrounding words.

First of all, after doing some testing, we decided to ditch "from" and "to" as the context markers. "To" sounds too similar to "2," and location IDs can definitely end in numbers. To put the kibosh on errors of this type, we resolved to use "source" and "destination" as the location markers. In practice, though, we still sometimes forget this and revert to from / to.

To fix the source and destination mix-ups, we had to give Dialogflow a lot of examples so it would learn that these words mark a location as the source or the destination. The variations we added included:

  • Phrases specifying only the source
  • Phrases specifying only the destination
  • Mixing up the order (destination first, source second)
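The variations above can be sketched as labeled training phrases. The `parts` layout below mirrors, as far as we understand it, the shape of training phrases in Dialogflow's intent API; the entity type names (`@sku`, `@location`) and the aliases are assumptions made for this example.

```python
# Entity type names ("@sku", "@location") and aliases are hypothetical.
def text(s):
    return {"text": s}


def entity(s, entity_type, alias):
    return {"text": s, "entityType": entity_type, "alias": alias, "userDefined": True}


def variations(qty, sku, src, dst):
    """Build labeled phrases covering the variations above, plus the usual order."""
    head = [
        text("move "),
        entity(str(qty), "@sys.number", "quantity"),
        text(" units of "),
        entity(sku, "@sku", "sku"),
    ]
    source = [text(" source "), entity(src, "@location", "source")]
    destination = [text(" destination "), entity(dst, "@location", "destination")]
    return [
        head + source,                # source only
        head + destination,           # destination only
        head + source + destination,  # usual order
        head + destination + source,  # destination first, source second
    ]


def as_training_phrase(parts):
    return {"type": "EXAMPLE", "parts": parts}


def plain_text(parts):
    return "".join(part["text"] for part in parts)


phrases = [as_training_phrase(p) for p in variations(100, "ABC123", "A1", "A2")]
for phrase in phrases:
    print(plain_text(phrase["parts"]))
```

Because every location is labeled with its alias explicitly, each generated phrase carries exactly one source and one destination at most, which is the labeling we kept having to correct by hand.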

All throughout creating these phrases, we had to relabel entities to their correct parameters. Dialogflow has a troubling habit of thinking there should be two destinations instead of one source and one destination. And it shamelessly enables the "Is List" option on the entity config every time you add a new example. If you're doing a similar thing, make sure that's unchecked.

Dialogflow Is List

Another trick we're using to improve accuracy is custom entity types. We sync all the product IDs and location IDs into Dialogflow, so when a user says one of them, the algorithm has a much easier time understanding it. It also means our program doesn't have to deal with converting input like "A1 zero 2."
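As a rough sketch of what that sync could produce: each entry in a custom entity type has a canonical value and a list of synonyms that resolve to it. The location IDs here are invented, and adding a spelled-out synonym for each ID is an assumption for illustration, not something we have measured the benefit of. Uploading the entries to Dialogflow is left out.

```python
# Hypothetical location IDs; in practice they would be pulled from the
# warehouse software.
LOCATION_IDS = ["A1", "A102", "B7"]

DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spoken_form(location_id):
    """Spell out every digit, leaving letters as they are."""
    return " ".join(DIGIT_WORDS.get(ch, ch) for ch in location_id)


def to_entity(location_id):
    """One entity entry: a canonical value plus the synonyms that resolve to it."""
    return {
        "value": location_id,
        "synonyms": [location_id, spoken_form(location_id)],
    }


entities = [to_entity(location_id) for location_id in LOCATION_IDS]
for entry in entities:
    print(entry)
```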

You can GOTO a different intent

We had to design a flow (a REPL, in programmer speak) for the product auditing functionality. The user is presented with a location, can "enter" the location, and is then presented with products, which they can choose to edit or not. Basically, it can be an endless conversation if the person is up for it.

Dialogflow Intent Tree

Luckily, Dialogflow supports "going to" a different intent. One thing to note is that this is done through the "Event" field on intents. You can't actually go to an intent, but rather you define an event on an intent and then tell Dialogflow to trigger the event.
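From a webhook, triggering an event comes down to returning a follow-up event in the response instead of a reply. Here is a minimal sketch of that response body in Python; the event name `audit_next_location` and its parameter are hypothetical, and the name would have to match whatever you typed into the target intent's Event field.

```python
import json


def goto_event(event_name, parameters=None, language_code="en-US"):
    """Build a webhook response that asks Dialogflow to trigger an event."""
    event = {"name": event_name, "languageCode": language_code}
    if parameters:
        event["parameters"] = parameters
    return {"followupEventInput": event}


# "audit_next_location" is a made-up event name for this example.
response = goto_event("audit_next_location", {"location": "A2"})
print(json.dumps(response, indent=2))
```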

Make sure you're using contexts correctly

If you look at the audit intent tree above, you'll see all the nested sub intents.

Some things you should know about accessing contexts from sub intents:

  • All contexts are lowercase in the API. The Dialogflow UI will show you uppercase characters, but only lowercase is used through the API.
  • The context duration (its lifespan) can be modified by clicking the number next to it. The unit of measurement is one response. In a REPL setting like ours, there's no harm in setting this to an arbitrarily high value, as the context is overwritten on triggering an event (the GOTO statement).
  • You can set your own data that will last the life of the conversation (outside of any contexts.) It is your responsibility to manage this as needed.
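The lowercase point is easy to trip over, so here is a small sketch of a lookup helper that sidesteps it. It assumes the webhook request lists contexts under `queryResult.outputContexts` with full resource names ending in the context name; the project, session, and context names below are placeholders.

```python
def find_context(request, name):
    """Return the named output context from a webhook request, or None."""
    wanted = name.lower()  # the API only uses lowercase context names
    for context in request["queryResult"].get("outputContexts", []):
        # Full names end in ".../contexts/<context name>".
        short_name = context["name"].rsplit("/", 1)[-1]
        if short_name == wanted:
            return context
    return None


# Trimmed-down request; project, session, and context names are placeholders.
request = {
    "queryResult": {
        "outputContexts": [
            {
                "name": "projects/my-project/agent/sessions/abc/contexts/audit-followup",
                "lifespanCount": 5,
                "parameters": {"location": "A1"},
            }
        ]
    }
}

# Look the context up by the name as the UI might display it.
context = find_context(request, "Audit-Followup")
print(context["parameters"])
```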

Resources

Part of my research into Dialogflow led me to Aravind from miningbusinessdata.com (he also has a YouTube channel here). He is a Dialogflow consultant focusing on how to design a bot correctly, and some of his articles gave us pointers on the nuances of Dialogflow.

Of course, Google has their own content on Dialogflow. A good intro is this playlist on YouTube.

We used the Actions on Google SDK to build SkuvaultAgent. It really makes things easy as you can use intent names as your routes. And it abstracts away all the HTTP IO that's needed to communicate with Dialogflow, letting you use simple commands. You just plug it into Express and you're good to go.