Data Science Asked by idpd15 on May 20, 2021
I am using Stanford Core NLP using Python.I have taken the code from here.
Following is the code :
from stanfordcorenlp import StanfordCoreNLP
import logging
import json
class StanfordNLP:
def __init__(self, host='http://localhost', port=9000):
self.nlp = StanfordCoreNLP(host, port=port,
timeout=30000 , quiet=True, logging_level=logging.DEBUG)
self.props = {
'annotators': 'tokenize,ssplit,pos,lemma,ner,parse,depparse,dcoref,relation,sentiment',
'pipelineLanguage': 'en',
'outputFormat': 'json'
}
def word_tokenize(self, sentence):
return self.nlp.word_tokenize(sentence)
def pos(self, sentence):
return self.nlp.pos_tag(sentence)
def ner(self, sentence):
return self.nlp.ner(sentence)
def parse(self, sentence):
return self.nlp.parse(sentence)
def dependency_parse(self, sentence):
return self.nlp.dependency_parse(sentence)
def annotate(self, sentence):
return json.loads(self.nlp.annotate(sentence, properties=self.props))
@staticmethod
def tokens_to_dict(_tokens):
tokens = defaultdict(dict)
for token in _tokens:
tokens[int(token['index'])] = {
'word': token['word'],
'lemma': token['lemma'],
'pos': token['pos'],
'ner': token['ner']
}
return tokens
if __name__ == '__main__':
sNLP = StanfordNLP()
text = r'China on Wednesday issued a $50-billion list of U.S. goods including soybeans and small aircraft for possible tariff hikes in an escalating technology dispute with Washington that companies worry could set back the global economic recovery.The country's tax agency gave no date for the 25 percent increase...'
ANNOTATE = sNLP.annotate(text)
POS = sNLP.pos(text)
TOKENS = sNLP.word_tokenize(text)
NER = sNLP.ner(text)
PARSE = sNLP.parse(text)
DEP_PARSE = sNLP.dependency_parse(text)
I am only interested in Entity Recognition which is being saved in the variable NER. The command NER is giving the following result
The same thing if I run on Stanford Website, the output for NER is
There are 2 problems with my Python Code:
1. ‘$’ and ’50-billion’ should be combined and named a single entity.
Similarly, I want ’25’ and ‘percent’ as a single entity as it is showing in the online stanford output.
2. In my output, ‘Washington’ is shown as State and ‘China’ is shown as Country. I want both of them to be shown as ‘Loc’ as in the stanford website output. The possible solution to this problem lies in the documentation .
But I don’t know which model am I using and how to change the model.
The Stanford CoreNLP released by the NLP research group at Stanford University. It offers Java-based modules for the solution of a range of basic NLP tasks like
Before doing all above task you should first setup stanford CoreNlp.
If you want to have clear picture about stanford coreNlp starting from setup core nlp for python, NER , POS to sentiment, you can have a look at below link.
Answered by anindya on May 20, 2021
From the following links, I understood that we can use a specific classifier by doing.
Load the specific classifier:
java -mx600m -cp "*;lib*" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz
Set ner
model to the classifier in the code:
{ "ner.model", "english.all.3class.distsim.crf.ser.gz" }
Answered by nag on May 20, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP