NLP: Talking with Machines
The field of Natural Language Processing (NLP) dates back to the 1950s, when prominent computer scientists such as Alan Turing predicted that, one day, an interaction between human and computer would be so realistic that it would be indistinguishable from a ‘normal’ conversation. Today, services like Google Assistant and Siri exist to bridge that gap. More importantly, powerful NLP tools allow even the smallest development teams to create ground-breaking applications and services.
To best solve Vectorform’s clients’ business challenges, we experiment with the latest technologies and applications. In doing so, we reviewed a variety of NLP services and will take you through the step-by-step process of creating an implementation yourself.
First up is IBM’s Watson, part of their Bluemix online suite of cloud services. This suite includes speech to text, text to speech, and a powerful NL classifier, which takes training data and allows anyone to discern topics from text. In addition, IBM offers Alchemy Text, which takes text input such as a sentence and returns the primary “subject” of the sentence. For example, given the input “Where are you going on vacation?”, it will conclude you are talking about “vacations”. The real power of these services is the ability to understand any sentence structure, like “I had a great time in San Diego last March. What’s your choice of vacation this year?” Understanding language is the key to NLP, and saves countless hours of building a rigid system. IBM’s cloud services are certainly fast and easy to set up, but each generally performs only one function, and chaining several services together can drive up the expense, as developers are billed for each network call.
Microsoft Cognitive Services / LUIS
At the 2016 Build Conference, Microsoft unveiled the rebranding of their machine learning and NLP services, previously called Project Oxford. Despite their beta status, Microsoft Cognitive Services offers the most accessible, affordable, and compelling implementation of NLP we have come across, in the form of the “Language Understanding Intelligent Service,” or LUIS for short.
How to Create a Sample in LUIS
After signing in with a Microsoft account at luis.ai and providing some locale information, add a new app using the “+” button. Fill in a short survey and you’re presented with the following screen:
Pause here and think about use-cases for your app/service. Let’s say you want to create a virtual travel agent – someone who finds flights for you based on any question you ask. Follow these steps to do so:
- In the left-hand column, see “intents” and “entities.”
- Intents are essentially the main “context” of a sentence – i.e. “What is this sentence about?” Entities are the “objects/subjects” of the sentence; if I told the travel agent to book a flight to London, “London” would be the entity. This is a simplified overview and there is a lot more to the service for advanced implementations, but let’s take a look at it in action.
- Add a new intent with the corresponding “+” button. Call it “BookFlight” and think of what you would say to a real travel agent, e.g. “I want to book a return flight to San Francisco leaving on December 1st.”
- Add the above to the “example” field and click “add.” You will see this:
- From here, hitting ‘Submit’ will log this kind of phrase (referred to as an “utterance”) as one related to booking a flight. Go ahead and do that.
Now, let’s create a few entities to get the ball rolling. For a travel agent, we might want to know the destination and date. For more sophistication, we could expand on this by checking for return flights, number of stops, a return date and so on. We’ll keep this simple.
- Create a new entity called “Destination,” then add a “Pre-built Entity” in the section below, choosing “datetime” at the bottom of the list. Pre-built entities have been created and trained already for specific cases, like dates (and all the permutations of specifying a date).
Adding a new phrase, we can begin to train LUIS to understand our input. Think of another phrase we might ask – keep it simple. “I’d like a trip to London leaving next Monday.” Add it as a new utterance and you’ll see this:
We are nearly there: LUIS is already 97% sure we are talking about booking a flight, based on the words in the sentence. It also understands that we stated a date, but doesn’t yet understand a destination.
- Click on the word “London” and choose “Destination.” Now that we have identified the destination and London is highlighted, click submit.
- Repeat this process with a few different variations and destinations. Follow up by clicking “train” in the bottom left. This is where things get interesting.
- Click the “publish” button -> “publish web service.” Type another example query, and click the URL below in the query box. You will see something like the following:
In just a few minutes, we created the building blocks of a sophisticated NLP system. Note that LUIS offers a pre-built “geography” entity, which makes our “Destination” entity redundant. Bear in mind, however, that we still need a return flight date, type of flight, class, and more; these can be captured by creating more entities.
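As a rough picture of what the labelling steps above produce, the sketch below models a labelled utterance as plain data: a phrase, the intent it belongs to, and the character spans marked as entities. The class and field names (`LabelledUtterance`, `EntityLabel`) are hypothetical and chosen for illustration only; this is not LUIS’s own storage or export format.

```python
from dataclasses import dataclass, field


@dataclass
class EntityLabel:
    """A labelled span inside an utterance (character offsets, end exclusive)."""
    entity_type: str
    start: int
    end: int


@dataclass
class LabelledUtterance:
    """One training phrase, the intent it belongs to, and its labelled entities."""
    text: str
    intent: str
    entities: list = field(default_factory=list)

    def label(self, entity_type: str, phrase: str) -> None:
        """Label the first occurrence of `phrase` as `entity_type`."""
        start = self.text.index(phrase)  # raises ValueError if the phrase is absent
        self.entities.append(EntityLabel(entity_type, start, start + len(phrase)))

    def labelled_text(self, entity_type: str) -> list:
        """Return the text of every span labelled with `entity_type`."""
        return [self.text[e.start:e.end]
                for e in self.entities if e.entity_type == entity_type]


# The second utterance from the walkthrough, labelled the way we did it by hand.
utterance = LabelledUtterance("I'd like a trip to London leaving next Monday.", "BookFlight")
utterance.label("Destination", "London")
utterance.label("datetime", "next Monday")
```

Each click on a word in the LUIS interface corresponds, in this picture, to one `label` call; training then generalises from a handful of such examples.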
The combination of intents and entities creates what we might call an “understanding.” Creating a simple web service which sends text to LUIS and comes back with the machine’s “understanding” (the sum of its response) is trivial; the URL we clicked on is an endpoint we can hit. Armed with this knowledge and with further training/tweaking/building, we can build conversational applications and services which start to feel more like people. Behind the scenes, your application can carry out searches on the user’s behalf and even ask for more information.
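A minimal sketch of such a client follows, using only the Python standard library. Several things here are assumptions rather than documented behaviour: the placeholder endpoint URL, the `q` query parameter that carries the user’s text, and the response shape (an `intents` list with `intent` and `score`, and an `entities` list with `entity` and `type`). The sample response is hand-written for illustration, not real LUIS output; substitute the URL shown on your own publish screen and check the JSON it actually returns.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen

# Hypothetical placeholder; use the endpoint URL from your own publish dialog.
ENDPOINT = "https://example.invalid/luis?id=APP_ID&subscription-key=KEY&q="


def build_query_url(endpoint: str, text: str) -> str:
    """Set the (assumed) `q` parameter of the endpoint URL to the user's text."""
    parts = urlsplit(endpoint)
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "q"]
    params.append(("q", text))
    return urlunsplit(parts._replace(query=urlencode(params)))


def parse_understanding(raw: str) -> dict:
    """Reduce a JSON response to the top intent, its score, and entities by type."""
    data = json.loads(raw)
    top = max(data.get("intents", []),
              key=lambda i: i.get("score", 0.0), default=None)
    entities = {}
    for item in data.get("entities", []):
        entities.setdefault(item["type"], []).append(item["entity"])
    return {
        "intent": top["intent"] if top else None,
        "score": top.get("score", 0.0) if top else 0.0,
        "entities": entities,
    }


def fetch_understanding(endpoint: str, text: str) -> dict:
    """Send the text to the endpoint and parse the reply (performs a network call)."""
    with urlopen(build_query_url(endpoint, text), timeout=10) as response:
        return parse_understanding(response.read().decode("utf-8"))


# Hand-written sample in the assumed response shape.
SAMPLE_RESPONSE = json.dumps({
    "query": "I'd like a trip to London leaving next Monday",
    "intents": [{"intent": "BookFlight", "score": 0.97},
                {"intent": "None", "score": 0.02}],
    "entities": [{"entity": "london", "type": "Destination"},
                 {"entity": "next monday", "type": "datetime"}],
})
understanding = parse_understanding(SAMPLE_RESPONSE)
```

Keeping URL construction and response parsing separate from the network call makes both easy to test without touching the service.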
If the understanding returned by LUIS is incomplete – for example, if we have an intent but no detected entities – we can ask follow-up questions to get more information. Let’s say the user simply says “I want to book a flight.” Detecting that the user wants to fly somewhere, but finding the “entities” array is empty, the app could respond “where to?”, remembering the user’s intent. Thus, when the user responds “Paris,” the application has all the information it needs without the user needing to clumsily repeat themselves. There are sure to be other “conversational” tricks. Perhaps creating a master “switchboard app” in LUIS, which calls different, specifically trained sub-apps, would maximize sophistication.
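The follow-up behaviour described above can be sketched as a small piece of conversation state. Everything in this example is an assumption made for illustration: the `Conversation` class, the `REQUIRED_ENTITIES` and `PROMPTS` tables, and the simplified “understanding” dictionary (an `intent` name plus `entities` grouped by type). It also assumes that a bare reply such as “Paris” may come back with no recognised entity, so the raw text is treated as the answer to the question just asked.

```python
# Which entities each intent needs before the app can act (illustrative only).
REQUIRED_ENTITIES = {"BookFlight": ["Destination"]}
PROMPTS = {"Destination": "Where to?"}


class Conversation:
    """Remembers the user's intent across turns and asks for missing entities."""

    def __init__(self):
        self.intent = None
        self.slots = {}
        self.pending = None  # the entity we most recently asked the user for

    def handle(self, text: str, understanding: dict) -> str:
        entities = understanding.get("entities", {})
        if self.intent is None:
            intent = understanding.get("intent")
            if intent not in REQUIRED_ENTITIES:
                return "Sorry, I didn't understand that."
            self.intent = intent  # remembered for later turns

        # Record any required entities recognised in this turn.
        for name in REQUIRED_ENTITIES[self.intent]:
            if entities.get(name):
                self.slots[name] = entities[name][0]

        # A bare reply such as "Paris" answers the question we just asked.
        if self.pending and self.pending not in self.slots:
            self.slots[self.pending] = text.strip()
        self.pending = None

        # Ask for the first entity that is still missing, if any.
        for name in REQUIRED_ENTITIES[self.intent]:
            if name not in self.slots:
                self.pending = name
                return PROMPTS[name]

        details = ", ".join(f"{k}={v}" for k, v in self.slots.items())
        return f"{self.intent}: {details}"


agent = Conversation()
first_reply = agent.handle("I want to book a flight.",
                           {"intent": "BookFlight", "entities": {}})
second_reply = agent.handle("Paris", {"intent": None, "entities": {}})
```

Here `first_reply` is the follow-up question and `second_reply` is the completed request, so the user never has to restate that they want a flight.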
So far in our example we’ve been living in a text-only world. To take it a step further, we can make use of Speech-to-Text (STT), which has become far more accurate in recent years. Combine it with the ability of LUIS to generally understand the user even if the input wasn’t quite right, and we have a great, innovative input method to play with.
This article is not focused on STT, but native options exist for both Android (Google’s voice recognition) and iOS (Siri API in iOS 10) to make this fairly trivial. LUIS also has its own STT service, but native solutions work just as well and are less complex to implement.
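One way to picture the voice flow is as two pluggable stages: a speech recogniser that turns audio into text, and a language-understanding step that turns text into an intent. The sketch below wires the two together. The function name `handle_voice_turn` and both stand-in stages are hypothetical; in a real app the first would wrap the platform’s speech recogniser and the second would call the published LUIS endpoint.

```python
from typing import Callable


def handle_voice_turn(audio: bytes,
                      transcribe: Callable[[bytes], str],
                      understand: Callable[[str], dict]) -> dict:
    """Run one spoken turn through speech-to-text, then language understanding."""
    text = transcribe(audio).strip()
    if not text:
        # Nothing was recognised, so there is nothing to send for understanding.
        return {"text": "", "intent": None, "entities": {}}
    result = understand(text)
    return {"text": text,
            "intent": result.get("intent"),
            "entities": result.get("entities", {})}


def fake_transcribe(audio: bytes) -> str:
    """Stand-in recogniser that returns a slightly misheard sentence."""
    return "book a flite to london"


def fake_understand(text: str) -> dict:
    """Stand-in for the NLP service: a crude keyword match, for illustration only."""
    intent = "BookFlight" if "book" in text.lower() else None
    return {"intent": intent, "entities": {}}


turn = handle_voice_turn(b"\x00\x01", fake_transcribe, fake_understand)
```

Passing the two stages in as functions keeps the flow independent of any particular speech or NLP provider, so a native recogniser can be swapped in without touching the rest of the app.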
Brave New World
Breaking out of text and using only our voices in combination with LUIS opens up a brave new world of possibilities. Take Virtual Reality (VR), for example: an emerging technology that has exploded in popularity in recent years, and one we love to experiment with at Vectorform. With a dedicated VR lab and access to the HTC Vive and Oculus Rift, our developers have already begun to dream up engaging, immersive VR experiences, such as our work with DTE – an interactive virtual training environment for DTE’s field technicians.
Taking this a step further, we can combine NLP with VR to add another layer of immersion and interactivity to virtual experiences. Returning to our original example, we can now place our virtual travel agent in VR and have the user enjoy a face-to-face conversation with a machine. Text-to-Speech (TTS) is equally sophisticated and allows our travel agent to be the machine’s mouthpiece. Although we cannot provide details at this time, the development of immersive, voice-driven VR is well under way, right here in our office.
It only takes moments to imagine countless new applications for this combination of technologies. Virtual doctors, assistants, service personnel, technicians and help-desks are just a few. Save a trip into town for a face-to-face consultation with your financial adviser and have the consultation in the comfort of your own home instead. Enjoy personalized therapy and triage before ever having to make an appointment. Wear a headset and book an entire vacation with your virtual travel agent.
As the line between humanity and artificial intelligence continues to blur, Alan Turing’s prediction may be closer to reality than you think.