TransWikia.com

Entity Recognition in Stanford NLP using Python

Data Science Asked by idpd15 on May 20, 2021

I am using Stanford Core NLP using Python.I have taken the code from here.
Following is the code :

from stanfordcorenlp import StanfordCoreNLP
import logging
import json


class StanfordNLP:
def __init__(self, host='http://localhost', port=9000):
    self.nlp = StanfordCoreNLP(host, port=port,
                               timeout=30000 , quiet=True, logging_level=logging.DEBUG)
    self.props = {
        'annotators': 'tokenize,ssplit,pos,lemma,ner,parse,depparse,dcoref,relation,sentiment',
        'pipelineLanguage': 'en',
        'outputFormat': 'json'
    }

def word_tokenize(self, sentence):
    return self.nlp.word_tokenize(sentence)

def pos(self, sentence):
    return self.nlp.pos_tag(sentence)

def ner(self, sentence):
    return self.nlp.ner(sentence)

def parse(self, sentence):
    return self.nlp.parse(sentence)

def dependency_parse(self, sentence):
    return self.nlp.dependency_parse(sentence)

def annotate(self, sentence):
    return json.loads(self.nlp.annotate(sentence, properties=self.props))

@staticmethod
def tokens_to_dict(_tokens):
    tokens = defaultdict(dict)
    for token in _tokens:
        tokens[int(token['index'])] = {
            'word': token['word'],
            'lemma': token['lemma'],
            'pos': token['pos'],
            'ner': token['ner']
        }
    return tokens

if __name__ == '__main__':
sNLP = StanfordNLP()
text = r'China on Wednesday issued a $50-billion list of U.S. goods  including soybeans and small aircraft for possible tariff hikes in an escalating technology dispute with Washington that companies worry could set back the global economic recovery.The country's tax agency gave no date for the 25 percent increase...'
ANNOTATE =  sNLP.annotate(text)
POS = sNLP.pos(text)
TOKENS = sNLP.word_tokenize(text)
NER = sNLP.ner(text)
PARSE = sNLP.parse(text)
DEP_PARSE = sNLP.dependency_parse(text)    

I am only interested in Entity Recognition which is being saved in the variable NER. The command NER is giving the following result

NER

The same thing if I run on Stanford Website, the output for NER is
NER Stanford

There are 2 problems with my Python Code:

1. ‘$’ and ’50-billion’ should be combined and named a single entity.
Similarly, I want ’25’ and ‘percent’ as a single entity as it is showing in the online stanford output.
2. In my output, ‘Washington’ is shown as State and ‘China’ is shown as Country. I want both of them to be shown as ‘Loc’ as in the stanford website output. The possible solution to this problem lies in the documentation .
documentaion

But I don’t know which model am I using and how to change the model.

2 Answers

The Stanford CoreNLP released by the NLP research group at Stanford University. It offers Java-based modules for the solution of a range of basic NLP tasks like

  • POS tagging (parts of speech tagging)
  • NER (Name Entity Recognition)
  • Dependency Parsing, Sentiment Analysis etc.

Before doing all above task you should first setup stanford CoreNlp.

If you want to have clear picture about stanford coreNlp starting from setup core nlp for python, NER , POS to sentiment, you can have a look at below link.

Advanced Natural Language Processing with Stanford CoreNLP

Answered by anindya on May 20, 2021

From the following links, I understood that we can use a specific classifier by doing.

  1. Load the specific classifier:

    java -mx600m -cp "*;lib*" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz
    
  2. Set ner model to the classifier in the code:

    { "ner.model", "english.all.3class.distsim.crf.ser.gz" }
    

Stanford, Stack Overflow

Answered by nag on May 20, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP