Data Science Asked on June 14, 2021
I have several documents, each of which may or may not contain some number of references. I would like to build a model that can detect how many references, if any, appear in a text.
For training, I’ve been thinking of generating a bunch of random text and inserting references in a variety of styles for different articles. Generating this dataset is fairly straightforward.
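To make the synthetic-data idea concrete, here is a minimal sketch of such a generator. Everything in it is an illustrative assumption: the word list, the surnames, the two citation styles, and the helper names (`fake_sentence`, `fake_reference`, `make_document`) are all made up. It mixes filler sentences with references and records the character span of each reference so that labels can be derived later.

```python
import random

# Hypothetical vocabularies, for illustration only.
WORDS = ["model", "data", "results", "method", "analysis", "network", "training", "paper"]
SURNAMES = ["Smith", "Garcia", "Chen", "Kumar", "Nguyen"]


def fake_sentence(rng, n_words=8):
    """Build a filler sentence from random words."""
    words = [rng.choice(WORDS) for _ in range(n_words)]
    return " ".join(words).capitalize() + "."


def fake_reference(rng):
    """Build one reference in a randomly chosen (made-up) citation style."""
    author = rng.choice(SURNAMES)
    initial = rng.choice("ABCDEFGH")
    year = rng.randint(1990, 2020)
    title = " ".join(rng.choice(WORDS) for _ in range(4)).capitalize()
    styles = [
        # Author-year style
        f"{author}, {initial}. ({year}). {title}. Journal of Examples, 12(3), 45-67.",
        # Numbered style
        f"[1] {initial}. {author}, \"{title},\" J. Examples, vol. 12, pp. 45-67, {year}.",
    ]
    return rng.choice(styles)


def make_document(rng, max_refs=3):
    """Return (text, spans); spans holds (start, end) character offsets of each reference."""
    n_refs = rng.randint(0, max_refs)
    pieces = [("text", fake_sentence(rng)) for _ in range(rng.randint(2, 5))]
    pieces += [("ref", fake_reference(rng)) for _ in range(n_refs)]
    rng.shuffle(pieces)

    text, spans = "", []
    for kind, piece in pieces:
        if text:
            text += " "  # separator is not part of any reference span
        start = len(text)
        text += piece
        if kind == "ref":
            spans.append((start, len(text)))
    return text, spans


rng = random.Random(0)  # fixed seed for reproducibility
text, spans = make_document(rng)
print(text)
print(spans)
```

The number of references in a document is simply `len(spans)`, so the same generator can supply either a count target or boundary targets.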
I am not sure how to craft the data for a CNN. Word2vec does not seem like a good idea, since punctuation is part of what makes references different from regular text. I could just use tf-idf vectors, but then I’m not sure what to use as my Y. Should I use the boundaries (start and end index positions) of each reference? What loss function do I use for a vector-valued Y? Most guides show how to handle numeric, binary, and multiclass targets. Any advice or resources are much appreciated.
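One possible way to frame the vector-valued Y, offered as a sketch and not as the established answer: instead of regressing start and end indices, give every character position a 0/1 label (inside a reference or not). The target then has the same length as the input, and the loss is ordinary binary cross-entropy averaged over positions. The helper names below (`spans_to_labels`, `count_runs`, `binary_cross_entropy`) are hypothetical, and the loss is written in plain Python only to show the arithmetic; a deep learning framework would provide its own version.

```python
import math


def spans_to_labels(text, spans):
    """Turn (start, end) character spans into a per-character 0/1 label vector."""
    labels = [0] * len(text)
    for start, end in spans:
        for i in range(start, end):
            labels[i] = 1
    return labels


def count_runs(labels):
    """Count contiguous runs of 1s, i.e. the number of detected references."""
    count, prev = 0, 0
    for y in labels:
        if y == 1 and prev == 0:
            count += 1
        prev = y
    return count


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean per-position binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


text = "See Smith (2001). Done."
spans = [(4, 16)]                      # covers "Smith (2001)"
y = spans_to_labels(text, spans)
print(count_runs(y))                   # number of reference runs in the labels
print(binary_cross_entropy([0.5] * len(y), y))  # loss of an uninformed predictor
```

Two caveats on this design: with 0/1 labels, two references that touch with no gap between them merge into a single run, which a begin/inside/outside tagging scheme would avoid; and working on characters (or punctuation-preserving tokens) keeps the punctuation cues that word2vec-style preprocessing tends to discard.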