Entity Recognition in Stanford NLP using Python

Question

I am using Stanford Core NLP using Python.I have taken the code from here.
Following is the code :

from stanfordcorenlp import StanfordCoreNLP
import logging
import json

class StanfordNLP:
def __init__(self, host='http://localhost', port=9000):
    self.nlp = StanfordCoreNLP(host, port=port,
                               timeout=30000 , quiet=True, logging_level=logging.DEBUG)
    self.props = {
        'annotators': 'tokenize,ssplit,pos,lemma,ner,parse,depparse,dcoref,relation,sentiment',
        'pipelineLanguage': 'en',
        'outputFormat': 'json'
    }

def word_tokenize(self, sentence):
    return self.nlp.word_tokenize(sentence)

def pos(self, sentence):
    return self.nlp.pos_tag(sentence)

def ner(self, sentence):
    return self.nlp.ner(sentence)

def parse(self, sentence):
    return self.nlp.parse(sentence)

def dependency_parse(self, sentence):
    return self.nlp.dependency_parse(sentence)

def annotate(self, sentence):
    return json.loads(self.nlp.annotate(sentence, properties=self.props))

@staticmethod
def tokens_to_dict(_tokens):
    tokens = defaultdict(dict)
    for token in _tokens:
        tokens[int(token['index'])] = {
            'word': token['word'],
            'lemma': token['lemma'],
            'pos': token['pos'],
            'ner': token['ner']
        }
    return tokens

if __name__ == '__main__':
sNLP = StanfordNLP()
text = r'China on Wednesday issued a $50-billion list of U.S. goods  including soybeans and small aircraft for possible tariff hikes in an escalating technology dispute with Washington that companies worry could set back the global economic recovery.The country's tax agency gave no date for the 25 percent increase...'
ANNOTATE =  sNLP.annotate(text)
POS = sNLP.pos(text)
TOKENS = sNLP.word_tokenize(text)
NER = sNLP.ner(text)
PARSE = sNLP.parse(text)
DEP_PARSE = sNLP.dependency_parse(text)

I am only interested in Entity Recognition which is being saved in the variable NER. The command NER is giving the following result

The same thing if I run on Stanford Website, the output for NER is

There are 2 problems with my Python Code:

1. '$' and '50-billion' should be combined and named a single entity. 
Similarly, I want '25' and 'percent' as a single entity as it is showing in the online stanford output.
2.  In my output, 'Washington' is shown as State and 'China' is shown as Country. I want both of them to be shown as 'Loc' as in the stanford website output. The possible solution to this problem lies in the documentation .

But I don't know which model am I using and how to change the model.

anindya · Answer

The Stanford CoreNLP released by the NLP research group at Stanford University. It offers Java-based modules for the solution of a range of basic NLP tasks like

POS tagging (parts of speech tagging)
NER (Name Entity Recognition)
Dependency Parsing, Sentiment Analysis etc.

Before doing all above task you should first setup stanford CoreNlp.

If you want to have clear picture about stanford coreNlp starting from setup core nlp for python, NER , POS to sentiment, you can have a look at below link.

Advanced Natural Language Processing with Stanford CoreNLP

nag · Answer

From the following links, I understood that we can use a specific classifier by doing.

Load the specific classifier:

java -mx600m -cp "*;lib*" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz

Set ner model to the classifier in the code:

{ "ner.model", "english.all.3class.distsim.crf.ser.gz" }

Stanford, Stack Overflow

Entity Recognition in Stanford NLP using Python

2 Answers

Add your own answers!

Ask a Question