TransWikia.com

Spacy v2.0.1 custom NER: How to improve training of existing model

Data Science Asked on December 31, 2020

I implemented a custom NER model with the training data below, and on the first run it gives good predictions for Name and PrdName. The code is below.

import random
from pathlib import Path

import spacy

TRAIN_DATA = [
            ('My Name is Rajesh', {'entities': [(11, 17, 'Name')]}),
            ('My Name is Bakul', {'entities': [(11, 16, 'Name')]}),
            ('My Name is Pritam', {'entities': [(11, 17, 'Name')]}),
            ('My Name is Rakesh', {'entities': [(11, 17, 'Name')]}),
            ('My Name is Jayeeta', {'entities': [(11, 18, 'Name')]}),
            ('this is the price of bag', {'entities': [(21, 24, 'PrdName')]}),
            ('what is the price of ball?', {'entities': [(21, 25, 'PrdName')]}),
            ('what is the price of jegging?', {'entities': [(21, 28, 'PrdName')]}),
            ('what is the price of t-shirt?', {'entities': [(21, 28, 'PrdName')]}),
              ]

iterations = 20
try:
    model = 'live_ner_model'
    nlp = spacy.load(model)  # load existing spacy model
except OSError:  # raised by spacy.load when the model cannot be found
    model = None
    print("Exception")
    nlp = spacy.blank('en')  # create blank Language class
    print("Created blank 'en' model")

if 'ner' not in nlp.pipe_names:
    ner = nlp.create_pipe('ner')
    nlp.add_pipe(ner)
    print("Create NER")
else:
    ner = nlp.get_pipe('ner')
    print("Existing NER")

# Add new entity labels to entity recognizer
for _, annotations in TRAIN_DATA:
    for ent in annotations.get('entities'):
        ner.add_label(ent[2])

# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
with nlp.disable_pipes(*other_pipes):  # only train NER
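    # begin_training() re-initializes the weights, which would wipe what an
    # existing model has learned, so only call it for a blank model.
    if model is None:
        optimizer = nlp.begin_training()
    else:
        optimizer = nlp.entity.create_optimizer()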
    for itn in range(iterations):
        print("Starting iteration " + str(itn))
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            nlp.update(
                [text],  # batch of texts
                [annotations],  # batch of annotations
                drop=0.2,  # dropout - make it harder to memorise data
                sgd=optimizer,  # callable to update weights
                losses=losses)
        print(losses)

# Save model
output_dir = 'live_ner_model'
if output_dir is not None:
    output_dir = Path(output_dir)
    if not output_dir.exists():
        output_dir.mkdir()
    nlp.meta['name'] = model  # rename model
    nlp.to_disk(output_dir)
    print("Saved model to", output_dir)

# Test the saved model
output_dir = 'live_ner_model'
print("Loading from", output_dir)

nlp2 = spacy.load('live_ner_model')
test_text = """
   what is the price of cup. My Name is Rahim
"""
doc2 = nlp2(test_text)
for ent in doc2.ents:
    print(ent.label_, ent.text)

But when I then train the existing model on new data that contains only PrdName entities (or any other new entity, but no Name examples), the Name predictions go wrong. I think this happens because the updated training data excludes the Name entity.

So is there any way to continue training without affecting what the model has already learned? Can someone share an idea, and if possible some sample code?

Environment: Anaconda, spacy=v2.0.1, python=3.7

One Answer

The model depends entirely on the training data: if you train with some data which has only PrdName as label, the model knows only this label and can predict only this label. You need to provide as much training data as possible, containing all the possible labels.
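As a minimal sketch of that advice, assuming the data format from the question: keep the examples from the first round and mix them with the new ones before each further training run, so every label stays represented. The names OLD_DATA, NEW_DATA, mix_training_data and labels_in are hypothetical helpers for illustration, not part of spaCy.

```python
import random

# Hypothetical split of the question's data: examples used in the first
# training round, and examples added later with a different label.
OLD_DATA = [
    ('My Name is Rajesh', {'entities': [(11, 17, 'Name')]}),
    ('My Name is Bakul', {'entities': [(11, 16, 'Name')]}),
]
NEW_DATA = [
    ('what is the price of ball?', {'entities': [(21, 25, 'PrdName')]}),
    ('what is the price of jegging?', {'entities': [(21, 28, 'PrdName')]}),
]


def mix_training_data(old_data, new_data, seed=0):
    """Return old and new examples shuffled together into one list."""
    combined = list(old_data) + list(new_data)
    random.Random(seed).shuffle(combined)
    return combined


def labels_in(data):
    """Collect the set of entity labels that appear in the annotations."""
    return {label
            for _, annotations in data
            for _, _, label in annotations['entities']}


TRAIN_DATA = mix_training_data(OLD_DATA, NEW_DATA)
print(sorted(labels_in(TRAIN_DATA)))  # ['Name', 'PrdName']
```

The combined TRAIN_DATA can then be passed to the same training loop as in the question; training on NEW_DATA alone would show the model only PrdName.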

For the record, NER models are usually trained with thousands of sentences in order to account for the diversity of contexts in which a named entity can appear.
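To see how far a data set is from that scale, it can help to count the annotated entities per label before training. The following is only a sketch under the assumption that the data uses the question's (text, {'entities': [...]}) format; count_labels and SAMPLE_DATA are hypothetical names.

```python
from collections import Counter

# Hypothetical sample in the same format as the question's TRAIN_DATA.
SAMPLE_DATA = [
    ('My Name is Rajesh', {'entities': [(11, 17, 'Name')]}),
    ('My Name is Pritam', {'entities': [(11, 17, 'Name')]}),
    ('this is the price of bag', {'entities': [(21, 24, 'PrdName')]}),
]


def count_labels(data):
    """Count how many annotated entities each label has in the data."""
    counts = Counter()
    for _, annotations in data:
        for _, _, label in annotations.get('entities', []):
            counts[label] += 1
    return counts


label_counts = count_labels(SAMPLE_DATA)
for label, n in sorted(label_counts.items()):
    print(label, n)
```

A label with very few examples, or none at all in a later training round, is a likely candidate for poor predictions.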

Answered by Erwan on December 31, 2020
