botsplash
Posted on November 21, 2017

Named Entity Recognition Analysis

Named Entity Recognition (NER) is a technique in natural language processing for identifying the entities in an input text. This article discusses the NER techniques we examined at botsplash for chatbot creation. NER tags the words of an input sequence with entity labels such as person, place, organization, or date reference. The set of entities varies with the requirements and the application. In our application, the chatbot uses NER to help process user requests.

For example, consider the following user query:

"Hi I am looking for a house in Mecklenburg County with garage & pool"

Entities to extract

Address: Mecklenburg County  
House Feature: Garage, Pool

The extracted entities become the parameters of a query against the data store or database. The result returned by the database can then be used to create the chatbot's response.
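To make the flow from extracted entities to a chatbot response concrete, here is a minimal sketch in Python. It is an illustration only: the in-memory LISTINGS data, the entity keys, and the function names find_listings and build_reply are hypothetical, and a real system would query an actual database.

```python
# Hypothetical in-memory stand-in for the real estate data store.
LISTINGS = [
    {"id": 1, "county": "Mecklenburg County", "features": {"garage", "pool"}},
    {"id": 2, "county": "Mecklenburg County", "features": {"garage"}},
    {"id": 3, "county": "Wake County", "features": {"garage", "pool"}},
]


def find_listings(entities, listings=LISTINGS):
    """Use the extracted entities as query parameters (assumed entity keys)."""
    address = entities.get("address")
    wanted = {feature.lower() for feature in entities.get("house_feature", [])}
    return [
        listing
        for listing in listings
        if (address is None or listing["county"] == address)
        and wanted <= listing["features"]  # every requested feature is present
    ]


def build_reply(matches):
    """Turn the query result into a chatbot response."""
    if not matches:
        return "Sorry, I could not find a matching house."
    return f"I found {len(matches)} matching house(s)."


# Entities extracted from the example query above.
entities = {"address": "Mecklenburg County", "house_feature": ["Garage", "Pool"]}
print(build_reply(find_listings(entities)))
```

Keeping the entity extraction separate from the query step means the NER model can be swapped without touching the data access code.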

Techniques

A simple NER can be built from a collection of regular expressions for simple entities such as currency or amounts. Complex entities are harder. The name of a place can also be the name of a person, so only context can tell them apart, and a date or time reference can be written in many ways. For these we need a machine learning model that learns from examples rather than a fixed set of rules. Several NLP packages include an NER implementation.
Some of the libraries with built-in NER are listed below:
1. Stanford CoreNLP
2. spaCy
3. spaCy 2.0 (alpha version)
4. OpenNLP
5. JULIE Lab Tools (biomedical, clinical, ecological, and economic entities)
6. Balie (also called YooName)
7. Postech Biomedical NER System (biomedical entities only)
8. ABNER (biomedical entities only)
NLTK also provides an interface to Stanford CoreNLP NER. Deep neural network models (LSTM) are also used for NER and have proved effective. The spaCy 2.0 alpha uses a neural network model.
For our application, we explored only Stanford CoreNLP, spaCy, and an LSTM model.
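The regular-expression approach for simple entities can be sketched as follows. This is a minimal illustration, not the rules used in our application: the two patterns (a dollar amount and a US zip code) and the labels MONEY and ZIP are assumptions chosen for the example.

```python
import re

# Hypothetical patterns for two simple entity types.
PATTERNS = {
    "MONEY": re.compile(r"\$\d[\d,]*(?:\.\d+)?"),  # e.g. $350,000 or $12.50
    "ZIP": re.compile(r"\b\d{5}(?:-\d{4})?\b"),    # e.g. 28078 or 28078-1234
}


def regex_ner(text):
    """Return (start, end, label, matched_text) tuples sorted by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), label, match.group()))
    return sorted(found)


for start, end, label, value in regex_ner("My budget is $350,000 near 28078"):
    print(label, value)
```

Rules like these work while the surface form is regular. They break down for entities such as place names, where the same string can carry different labels depending on context.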

Stanford CoreNLP

The Stanford CoreNLP package provides NER and is written in Java. It can be run from the command line by passing an input file, or from Python by launching that command as a subprocess (for example with the subprocess module or os.system). The package ships with pre-trained NER models that can be used immediately. The pre-trained 3-class model recognizes Person, Organization, and Location; a 7-class model adds Date, Time, Money, and Percent. There is also a web demo for trying out NER and the other package functionality. The package supports retraining on custom entities, including multi-word entities such as city names (e.g. New York) or street addresses (e.g. 100 N Tryon Street). The training data needs to be in IO (inside, outside) or IOB (inside, outside, beginning) encoding. IO cannot distinguish adjacent entities of the same category, but IOB can, because it marks the first word of an entity with the prefix ‘B-’ and the following words with ‘I-’.

IO encoding

I am looking for property in New York city  
O O  O       O   O        O  LOC LOC  O

IOB encoding

I am looking for property in New     York    city  
O O  O       O   O        O  B-LOC   I-LOC   O
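The difference between the two encodings shows up when two entities of the same category sit next to each other. The sketch below is a minimal illustration in Python. The helper names encode and decode and the (start, end, label) token-span format are assumptions made for this example, not part of any library.

```python
def encode(tokens, spans, scheme="IOB"):
    """Tag tokens from (start, end, label) token spans; end is exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        for i in range(start, end):
            if scheme == "IO":
                tags[i] = label
            else:
                tags[i] = ("B-" if i == start else "I-") + label
    return tags


def decode(tags):
    """Group tags back into (start, end, label) spans for either scheme."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        prefix, _, name = tag.rpartition("-")
        if tag == "O":
            name = None
        # Close the open span when the label changes or a new "B-" begins.
        if label is not None and (name != label or prefix == "B"):
            spans.append((start, i, label))
            label = None
        if name is not None and label is None:
            start, label = i, name
    return spans


tokens = ["homes", "in", "New", "York", "Charlotte"]
spans = [(2, 4, "LOC"), (4, 5, "LOC")]  # two adjacent locations

print(decode(encode(tokens, spans, "IO")))   # the two entities merge into one
print(decode(encode(tokens, spans, "IOB")))  # both entities are recovered
```

With IO the round trip loses the boundary between "New York" and "Charlotte", while IOB keeps it.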

spaCy

spaCy is a Python NLP package that provides NER, tokenization, sentence segmentation, sentiment analysis, coreference resolution, dependency parsing, and POS tagging. It also comes with a pre-trained model that recognizes entity types such as product, language, and event, and it supports retraining. For training, the data needs to be a list of tuples, where each tuple contains a sentence and a list of the entities in it with their locations in the sentence. spaCy 2.0 is currently in alpha. It keeps most of the original package's functionality, but the model implementation has changed: most models are now neural networks built with Thinc, and they support GPUs.
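The training data shape described above can be sketched in plain Python. This is a sketch under assumptions: the helper make_example and the labels ADDRESS and HOUSE_FEATURE are hypothetical, entity locations are expressed as character offsets, and the exact tuple layout expected by spaCy differs between versions, so check the documentation for the version in use.

```python
def make_example(sentence, entities):
    """Build one (sentence, [(start, end, label), ...]) training tuple.

    `entities` is a list of (entity_text, label) pairs; offsets are
    character positions of the first occurrence in the sentence.
    """
    spans = []
    for text, label in entities:
        start = sentence.find(text)
        if start == -1:
            raise ValueError(f"{text!r} not found in sentence")
        spans.append((start, start + len(text), label))
    return (sentence, spans)


TRAIN_DATA = [
    make_example(
        "Hi I am looking for a house in Mecklenburg County with garage & pool",
        [
            ("Mecklenburg County", "ADDRESS"),
            ("garage", "HOUSE_FEATURE"),
            ("pool", "HOUSE_FEATURE"),
        ],
    ),
]

for sentence, spans in TRAIN_DATA:
    for start, end, label in spans:
        print(label, sentence[start:end])
```

Computing the offsets from the entity text, rather than typing them by hand, avoids off-by-one errors in the annotations.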

Deep LSTM network

We built a many-to-many recurrent neural network model for NER using LSTM cells.

RNN architecture (source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

The network was written in TensorFlow and trained on real estate data to recognize the parts of an address (street, city, state, and zip code). The input layer combines word and character embeddings, with GloVe vectors for the words. Each input word is converted to its word vector and concatenated with its character vector, and the result is fed into the LSTM network. The logits from the LSTM layer go through a softmax layer that outputs a probability for each tag. The model was trained for 10 epochs. The final model reached a test accuracy of 98.90 and an F1 score of 98.71. It recognized street addresses, even made-up but syntactically valid ones, with high accuracy.
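The input and output ends of this pipeline can be illustrated without the LSTM itself. The sketch below is not our TensorFlow model. It only shows, in plain Python, how a word vector and a character vector are concatenated into one input and how per-token logits become tag probabilities through softmax. The TAGS list and the function names are hypothetical, and the logits are made-up numbers standing in for the LSTM output.

```python
import math

# Hypothetical tag set based on the address parts named above.
TAGS = ["O", "STREET", "CITY", "STATE", "ZIP"]


def build_input(word_vector, char_vector):
    """Concatenate the word embedding with the character-level vector."""
    return list(word_vector) + list(char_vector)


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_tags(logits_per_token):
    """Many-to-many output: one (tag, probability) pair per input token."""
    result = []
    for logits in logits_per_token:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        result.append((TAGS[best], probs[best]))
    return result


# Made-up logits for the three tokens "Tryon", "Street", "please".
fake_logits = [
    [0.1, 4.0, 0.3, 0.0, 0.0],
    [0.2, 3.5, 0.1, 0.0, 0.0],
    [5.0, 0.1, 0.1, 0.0, 0.0],
]
print([tag for tag, _ in decode_tags(fake_logits)])
```

Taking the highest-probability tag independently for each token is the simplest decoding choice and is an assumption of this sketch.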

Conclusion

The deep neural network model gave us good accuracy, but the biggest hurdle to good NER is the availability of data. The other libraries also performed well on the entities they were trained for, but retraining their models runs into the same dataset problem. Collecting data can be difficult and depends heavily on the domain. Public datasets exist, such as the CoNLL-2002 shared task data, the CoNLL-2003 Language-Independent Named Entity Recognition data, and CLEF eHealth 2016. For a domain-specific model, in our case real estate, we either had to collect data from people's conversations with real estate agents or find data that closely resembles them.


Tagged: Engineering, Natural Language Processing, Machine Learning, FinTech Modeling, Business Domain