Building chatbots — exploring requirements and options for a Real Estate use case
Chatbots are conversational agents that automate customer engagement over voice or messaging. Chat platforms are automatic response systems built on natural language processing and powered by artificial intelligence (AI) modules, for example on product search websites such as Amazon and Kayak. Today we see AI assistants such as Google Assistant, Siri, and Alexa, which are primarily bots aimed at helping people with day-to-day tasks such as shopping, booking an appointment, or setting event reminders.
In this post we outline the techniques we explored to design a chatbot for the real estate domain. The building blocks of a chatbot include:
- Intent Classification System
- Chat State Tracking System (to add memory to bot) — Frame Based and End-to-End dialog systems
- Natural Language Generation System (to craft a response)
A dialogue between a user and the real estate bot might look like this:
User: Hello
**Bot: Hello, how may I help you today?**
User: I am looking to buy a house.
**Bot: Sure, I can help you with that. Tell me what your requirements are.**
User: I am looking to buy a home in Charlotte.
**Bot: Do you have a price range in mind?**
User: My budget is $200K.
**Bot: And in what area are you looking to buy?**
User: I am looking for a house in the South Charlotte area.
**Bot: I couldn't find any house in that price range that matches your requirements. Are you flexible on the price range?**
User: I can go a little higher, maybe $300K.
**Bot: I found these 3 listings.**
User: Can you narrow down the results to those with parking available?
**Bot: These 2 listings have parking.**
User: Can you check the same house requirements but in the North Davidson area?
**Bot: Sure, I found these 6 listings.**
User: I didn't like any. Let's go back to South Charlotte, and I can increase my budget.
**Bot: Sure, I found these 8 listings.**
User: Can you show me only the ones that have a fireplace?
**Bot: These 3 listings have a fireplace.**
User: I like the first one.
**Bot: This is the MLS# for the house. You can use this number to locate the house on our website. Would you like to book an appointment with an agent to see the house?**
User: I have saved the MLS#. I am done. Thanks.
1. Intent Classification
Intent classification identifies the objective or purpose behind the input text. It is the core module around which chatbots are built.
The quality of a chatbot depends heavily on how well its intent classification model is trained. Listed below are some intent classification services and libraries:
- wit.ai — Now integrated with Facebook Messenger
- LUIS (Microsoft)
- api.ai (Google, now Dialogflow)
- Rasa NLU
- spaCy (open source), commonly used by early-stage startups
- Custom models for advanced interactions and use cases, built with deep learning libraries such as TensorFlow or PyTorch
While any one of the above produces good results, accuracy typically ranges from 70% to 90% for natural conversations. To improve accuracy, a range of techniques should be employed, including combining multiple packages, spell checking, a higher-quality training dataset, data cleansing, and manual label overrides.
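To make the idea of an intent classifier concrete, here is a minimal sketch that uses only the Python standard library. It is not how any of the services above work internally; it simply compares a bag-of-words vector of the user's text against one vector per intent. The training utterances, the `classify` function, and the `threshold` value are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical training utterances, a few per intent.
TRAINING_DATA = {
    "buy": ["i am looking to buy a house", "i want to purchase a home",
            "show me houses for sale"],
    "sell": ["i want to sell my house", "list my home on the market",
             "how much can i sell my property for"],
    "rent": ["i am looking to rent an apartment", "show me homes for lease",
             "i need a rental near downtown"],
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (t.strip(".,!?") for t in text.lower().split())
    return [t for t in tokens if t]


def build_centroids(data):
    """One word-count vector per intent, summed over its utterances."""
    return {intent: Counter(tok for u in utterances for tok in tokenize(u))
            for intent, utterances in data.items()}


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify(text, centroids, threshold=0.2):
    """Return (intent, score); fall back to 'unknown' below the threshold."""
    query = Counter(tokenize(text))
    scores = {intent: cosine(query, c) for intent, c in centroids.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return "unknown", scores[best]
    return best, scores[best]


centroids = build_centroids(TRAINING_DATA)
print(classify("I am looking to buy home in Charlotte.", centroids)[0])
print(classify("I want to sell my house", centroids)[0])
```

The `unknown` fallback is where techniques such as spell checking or a second classifier could be plugged in before the bot asks the user to rephrase.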
2. Chat State Tracking
Chat state tracking refers to saving the various aspects of a chat, such as the queries put by the user, all of the bot's responses, and the search results processed by the bot platform, so that they can guide future dialogue turns and the workflow.
For the real estate bot, when the user presents a query such as “buying a house with various house facilities”, the bot uses the query attributes to search the database. The user may then change one or more attributes while keeping the others the same, e.g. “similar home in a different city/state”. For such scenarios, the data structure should properly store all previous conversation turns so that the bot can access them whenever the user refers back to them.
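One way to picture this is a state object that logs every turn and carries the previous search attributes forward when the user changes only some of them. The following is a minimal sketch under that assumption; `ChatState`, `record_turn`, the attribute names, and the listing ids are hypothetical, and a real bot would extract the attributes with its intent and entity models.

```python
class ChatState:
    """Keeps every turn of the chat so later turns can refer back to earlier ones."""

    def __init__(self):
        self.turns = []  # one dict per user turn, in chronological order

    def current_attributes(self):
        """Copy of the most recent search attributes (empty before the first turn)."""
        return dict(self.turns[-1]["attributes"]) if self.turns else {}

    def record_turn(self, user_query, changed_attributes, results, bot_response):
        """Merge the changed attributes into the previous ones and log the turn."""
        attributes = self.current_attributes()
        attributes.update(changed_attributes)  # unchanged attributes carry over
        self.turns.append({
            "user_query": user_query,
            "attributes": attributes,
            "results": results,
            "bot_response": bot_response,
        })
        return attributes


state = ChatState()
state.record_turn("I want to buy a 3 bedroom house in Charlotte",
                  {"intent": "buy", "city": "Charlotte", "bedrooms": 3},
                  results=["listing-1", "listing-2"],
                  bot_response="I found 2 listings.")

# "Similar home in a different city": only the city changes.
attrs = state.record_turn("Show me a similar home in Raleigh",
                          {"city": "Raleigh"},
                          results=["listing-7"],
                          bot_response="I found 1 listing.")
print(attrs)
```

Because each turn stores its own copy of the attributes, the earlier Charlotte search stays intact and can be revisited later.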
2.1. Frame selection and linking
The project used the frame structure proposed by Maluuba in their goal-oriented dialogue dataset, the “Frames Dataset”. Every time the user posts a query, the query is converted into a search call and a new frame is created. The frame contains the original user query, all entities detected, the intent, the search call, and the results, along with a frame id and an active-frame status. When the user alters the search criteria, a new frame is created.
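The frame described above maps naturally onto a small record type. The sketch below is one possible layout, not the schema of the Frames Dataset itself; the field names, the `FrameTracker` class, and the `fake_search` stand-in for a listings database are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Frame:
    frame_id: int
    user_query: str
    intent: str
    entities: Dict[str, Any]
    search_call: Dict[str, Any]
    results: List[str] = field(default_factory=list)
    active: bool = True


class FrameTracker:
    def __init__(self, search_fn):
        self.search_fn = search_fn  # hypothetical callable: search_call -> results
        self.frames: List[Frame] = []

    def new_frame(self, user_query, intent, entities):
        """Create a frame for a new or altered query and mark it as the active one."""
        for frame in self.frames:
            frame.active = False
        search_call = {"intent": intent, **entities}
        frame = Frame(
            frame_id=len(self.frames) + 1,
            user_query=user_query,
            intent=intent,
            entities=dict(entities),
            search_call=search_call,
            results=self.search_fn(search_call),
        )
        self.frames.append(frame)
        return frame


def fake_search(search_call):
    """Stand-in for a real listings database query."""
    return [f"listing in {search_call.get('area', 'any area')}"]


tracker = FrameTracker(fake_search)
f1 = tracker.new_frame("house in South Charlotte", "buy",
                       {"area": "South Charlotte"})
f2 = tracker.new_frame("same requirements in North Davidson", "buy",
                       {"area": "North Davidson"})
print(f1.active, f2.active, f2.frame_id)
```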
All user queries can be categorized by intent. For example, the real estate bot has three main intents: buying, selling, and renting/leasing. Based on this, the frames are arranged in a tree structure, with the root node being the start of the conversation and each branch being one of the intents. Each frame is also numbered according to the order in which the user enters the queries.
During a conversation, the user often changes the topic or refers back to older frames or to part of an earlier query. Every time the user changes the topic or intent, a new frame is created and the branch is switched according to the new intent. A problem arises when the user refers to an old frame and the bot needs to locate the appropriate one. One example is “exit and switch”.
In this case, the user doesn’t like any of the rental options and wants to go back to buying houses. The bot can locate the conversation the user is referring to by finding the latest frame in the history that has the buying intent. A harder case is “mix intents and attributes”.
Here the user is looking for more buying options (intent) while changing the location and budget (attributes). The bot needs to be equipped to identify a request with a change in attributes and to find the correct frame for the user's query. Solutions for such scenarios are an open problem under active research, and this limits the usefulness of the frame concept.
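The simpler “exit and switch” lookup described above, finding the latest frame with a given intent, can be sketched with one branch per intent. This is an illustrative layout only; `FrameTree`, its method names, and the sample queries are hypothetical, and it does not attempt the harder “mix intents and attributes” case.

```python
from collections import defaultdict, namedtuple

Frame = namedtuple("Frame", ["frame_id", "intent", "query"])


class FrameTree:
    """Root = start of the conversation; one branch per intent."""

    def __init__(self):
        self.branches = defaultdict(list)  # intent -> frames in creation order
        self.next_id = 1

    def add(self, intent, query):
        """Append a new frame to its intent branch, numbered by query order."""
        frame = Frame(self.next_id, intent, query)
        self.branches[intent].append(frame)
        self.next_id += 1
        return frame

    def latest(self, intent):
        """Most recent frame on the given intent branch, or None if there is none."""
        frames = self.branches.get(intent)
        return frames[-1] if frames else None


tree = FrameTree()
tree.add("buy", "house in South Charlotte under $300K")
tree.add("buy", "same, with parking")
tree.add("rent", "apartment in South Charlotte")

# The user dislikes the rentals and wants to go back to buying.
resumed = tree.latest("buy")
print(resumed.frame_id, resumed.query)
```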
2.2. End to End Dialog Systems
The end-to-end approach eliminates all of the above modules and replaces them with a single model. One proposed model comes from the paper “Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation”. This architecture has two recurrent neural networks: an encoder and a decoder. The encoder network reads the input and learns a vector representation, while the decoder network takes that vector representation and generates a response. To quote the paper:
“One RNN encodes a sequence of symbols into a fixed length vector representation, and the other decodes the representation into another sequence of symbols.”
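The data flow in that quote can be illustrated with a toy forward pass in plain Python. This is a sketch, not the paper's model: it uses a vanilla tanh RNN instead of the paper's gated hidden unit, feeds the encoded vector to the decoder only as its initial state, and uses random, untrained weights, so the generated tokens are meaningless. The vocabulary and all names are assumptions.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained weights are reproducible

VOCAB = ["<eos>", "hello", "i", "want", "a", "house", "sure"]
HIDDEN = 4  # size of the fixed-length vector representation


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def rnn_step(w_in, w_hid, x, h):
    """One vanilla RNN step: h' = tanh(W_in x + W_hid h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_hid, h))]


# Randomly initialized (untrained) parameters for the two networks.
enc_in, enc_hid = rand_matrix(HIDDEN, len(VOCAB)), rand_matrix(HIDDEN, HIDDEN)
dec_in, dec_hid = rand_matrix(HIDDEN, len(VOCAB)), rand_matrix(HIDDEN, HIDDEN)
dec_out = rand_matrix(len(VOCAB), HIDDEN)


def encode(tokens):
    """Encoder: read the tokens one by one and return the final hidden state."""
    h = [0.0] * HIDDEN
    for tok in tokens:
        h = rnn_step(enc_in, enc_hid, one_hot(VOCAB.index(tok), len(VOCAB)), h)
    return h


def decode(context, max_len=5):
    """Decoder: start from the encoded vector and greedily emit tokens."""
    h = context
    prev = VOCAB.index("<eos>")  # reuse <eos> as the start symbol
    output = []
    for _ in range(max_len):
        h = rnn_step(dec_in, dec_hid, one_hot(prev, len(VOCAB)), h)
        scores = matvec(dec_out, h)
        prev = max(range(len(VOCAB)), key=scores.__getitem__)
        if VOCAB[prev] == "<eos>":
            break
        output.append(VOCAB[prev])
    return output


context = encode(["i", "want", "a", "house"])
reply = decode(context)
print(len(context), reply)
```

Whatever the input length, `encode` returns a vector of the same size, which is the "fixed length vector representation" the quote refers to. Training would adjust the weight matrices so that the decoded sequence becomes a sensible reply.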
Another model architecture, proposed by Google, is “A Neural Conversational Model”. This is a seq2seq model built on a recurrent neural network that takes a sequence as input and outputs a sequence. Training an end-to-end dialogue system requires dialogue data, especially for domain-specific models. Because the model learns everything from the data, we need many dialogues covering almost every dialogue case in the specific domain.
3. Natural Language Generation
Natural language generation can be done with an NLG library such as SimpleNLG. The library takes a subject, an object, and a verb as input and constructs a sentence. It can also combine multiple clauses into a coherent statement. However, we still need to write templates that specify the subject, object, and verb for each response; these are then used to generate individual sentences and to combine them into more complex ones.
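SimpleNLG itself is a Java library, so the snippet below does not use its API. It is a hand-rolled Python sketch of the same idea: build a clause from a subject, verb, and object, then join several clauses into one sentence. The function names and the example clauses are hypothetical.

```python
def realise_clause(subject, verb, obj):
    """Build a bare clause from a subject, a verb, and an object."""
    return f"{subject} {verb} {obj}"


def realise_sentence(clauses, conjunction="and"):
    """Join one or more (subject, verb, object) clauses into a single sentence."""
    parts = [realise_clause(*clause) for clause in clauses]
    if len(parts) == 1:
        text = parts[0]
    else:
        text = ", ".join(parts[:-1]) + f" {conjunction} " + parts[-1]
    return text[0].upper() + text[1:] + "."


print(realise_sentence([("I", "found", "3 listings")]))
print(realise_sentence([("I", "found", "3 listings"),
                        ("2 of them", "have", "parking")]))
```

A full realiser also handles verb agreement, tense, and punctuation, which is the main reason to prefer a library over string formatting once responses become more varied.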
Building a generalized chatbot is still an open research problem. It is possible to build a good domain-specific chatbot, but it is hard to port to other domains without good domain data.