Data Science Asked on January 12, 2021
I have searched a lot but was not able to find a solution to my problem…
I am training a NER model that should detect two types of words: Instructions and Conditions. This is not the standard NER use case, as it does not look for specific types of entities (e.g. Google == Corporation) but depends much more on sentence structure.
For example:
If the car crashes, the airbag should go off.
When training the model, I want to provide for each sentence not only my annotations, but also the dependency tree of the sentence calculated by the ‘en_core_web_sm’ model. I want my model to not only train based on the given words but also train based on the sentence structure.
My training data currently looks like this, but I want to expand it by also adding the dependency tree of each sentence generated using the ‘en_core_web_sm’ model:
train_data = [
    ("If the car crashes, the airbag should activate", {"entities": [(11, 18, "CON"), (38, 46, "INS")]}),
    ...
]
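spaCy treats entity offsets as end-exclusive character indices, so it is worth sanity-checking each annotated span against the raw text; note that "crashes" in the example sentence spans (11, 18), not (11, 17):

```python
text = "If the car crashes, the airbag should activate"
entities = [(11, 18, "CON"), (38, 46, "INS")]  # (start, end, label), end-exclusive

for start, end, label in entities:
    # Print the exact substring each annotation covers
    print(f"{label}: '{text[start:end]}'")
# CON should cover 'crashes' and INS should cover 'activate'
```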
This is my current training loop, using the update function from spaCy, but I am open to trying a different tool:
import random
import datetime as dt

from spacy.util import minibatch, compounding, decaying

dropout = decaying(0.6, 0.2, 1e-4)

nlp = create_blank_nlp(TRAIN_DATA)
optimizer = nlp.begin_training()

for i in range(80):
    losses = {}
    random.shuffle(TRAIN_DATA)
    batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
    for batch in batches:
        texts, annotations = zip(*batch)
        nlp.update(
            texts,               # batch of texts
            annotations,         # batch of annotations
            drop=next(dropout),  # decaying dropout rate
            sgd=optimizer,       # use the optimizer from begin_training()
            losses=losses,
        )
    print(f"Losses at iteration {i} - {dt.datetime.now()} {losses}")
I am curious if and how this might be possible. It feels like a waste not to use the pretrained model (mind you, the pretrained NER model from spaCy probably will not help me, only the dependency part).
Open to any advice, thank you.