TransWikia.com

Deep learning: detecting reference boundaries in text (or the number of references in a text)

Data Science, asked on June 14, 2021

I have several documents, each of which contains some number X of references (possibly zero). I would like to build a model that can detect how many references, if any, a text contains.

For training, I've been thinking of generating a large amount of random text and inserting references to different articles, formatted in a variety of citation styles. Generating this dataset is fairly straightforward.
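A minimal sketch of the generator described above, assuming references are dropped in between filler sentences. The word lists and the three citation templates (loosely APA-, IEEE-, and Vancouver-like) are hypothetical placeholders, not real citation-style implementations. The useful part is that recording each reference's character offsets while building the text gives exact labels for free, whichever target representation is chosen later.

```python
import random

# Hypothetical toy vocabularies; a real generator would draw from real corpora.
WORDS = ["the", "model", "data", "results", "show", "that", "we", "propose",
         "method", "analysis", "of", "in", "and", "is", "a", "new"]
AUTHORS = ["Smith, J.", "Garcia, M.", "Chen, L.", "Okafor, N."]
TITLES = ["Deep learning for text", "A study of citations", "Parsing documents"]
VENUES = ["Journal of Examples", "Proc. of the Toy Conference"]

# Hypothetical templates imitating a few common citation styles.
STYLES = [
    "{author} ({year}). {title}. {venue}, {vol}({issue}), {p1}-{p2}.",          # APA-like
    '[{n}] {author}, "{title}," {venue}, vol. {vol}, pp. {p1}-{p2}, {year}.',  # IEEE-like
    "{author} {title}. {venue} {year};{vol}:{p1}-{p2}.",                       # Vancouver-like
]


def make_reference(rng, n):
    """Render one fake reference in a randomly chosen style (extra format keys are ignored)."""
    p1 = rng.randint(1, 500)
    return rng.choice(STYLES).format(
        author=rng.choice(AUTHORS), title=rng.choice(TITLES),
        venue=rng.choice(VENUES), year=rng.randint(1980, 2021),
        vol=rng.randint(1, 60), issue=rng.randint(1, 12),
        p1=p1, p2=p1 + rng.randint(1, 30), n=n)


def make_document(num_refs, rng=None, filler=(5, 20)):
    """Return (text, spans), where spans holds the (start, end) character offsets of each reference."""
    rng = rng if rng is not None else random.Random()
    pieces, spans, pos = [], [], 0

    def add(piece):
        # Append a piece of text and return its (start, end) offsets in the final string.
        nonlocal pos
        pieces.append(piece)
        start = pos
        pos += len(piece)
        return start, pos

    for i in range(num_refs + 1):
        # A filler sentence before, between, and after the references.
        add(" ".join(rng.choice(WORDS) for _ in range(rng.randint(*filler))) + ". ")
        if i < num_refs:
            spans.append(add(make_reference(rng, i + 1)))
            add(" ")
    return "".join(pieces), spans
```

For example, `text, spans = make_document(3, random.Random(0))` yields a document with three references, and `[text[s:e] for s, e in spans]` returns exactly those reference strings. Documents with zero references (`make_document(0)`) are worth including so the model also learns the "no references" case.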

I am not sure how to prepare the data for a CNN. Word2vec does not seem like a good idea, since punctuation is part of what distinguishes references from regular text. I could use tf-idf vectors, but then I am not sure what to use as my Y. Should Y be the boundary (the start and end index positions) of each reference? What loss function should I use for a vector-valued Y? Most guides only show how to handle numeric (regression), binary, and multiclass targets. Any advice or resources are much appreciated.
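One common way to frame the boundary question (an assumption here, not the only option) is sequence labeling: rather than a Y vector of (start, end) positions, whose length changes with the number of references, give every character a tag, B (begins a reference), I (inside a reference), or O (outside). A model such as a character-level CNN then outputs a softmax over the three tags at each position, the loss is ordinary categorical cross-entropy applied per position and averaged, and the reference count is simply the number of spans decoded from the predicted tags. Character-level input also keeps punctuation, which addresses the word2vec concern. The vocabulary, tag ids, and function names below are hypothetical, and the network itself is not shown.

```python
import math
import string

# Fixed character vocabulary (assumes mostly ASCII text): 0 = padding, last id = unknown.
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(string.printable)}
UNK_ID = len(CHAR_TO_ID) + 1

# Per-character BIO tags: Outside, Begin of a reference, Inside a reference.
O, B, I = 0, 1, 2


def encode_chars(text):
    """Map every character, punctuation included, to an integer id for a char-level model."""
    return [CHAR_TO_ID.get(ch, UNK_ID) for ch in text]


def spans_to_tags(length, spans):
    """Turn (start, end) reference spans into one BIO tag per character (the Y sequence)."""
    tags = [O] * length
    for start, end in spans:
        tags[start] = B
        for k in range(start + 1, end):
            tags[k] = I
    return tags


def tags_to_spans(tags):
    """Decode (start, end) spans from a tag sequence; len(result) is the reference count."""
    spans, start = [], None
    for k, t in enumerate(tags):
        if t == B or (t == I and start is None):  # tolerate an I with no preceding B
            if start is not None:                 # a B right after a reference closes it
                spans.append((start, k))
            start = k
        elif t == O and start is not None:
            spans.append((start, k))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def sequence_cross_entropy(probs, tags):
    """Mean per-position categorical cross-entropy.

    probs: for each character, a list of 3 class probabilities (e.g. softmax outputs).
    tags:  the gold tag for each character.
    """
    eps = 1e-12  # avoid log(0)
    return -sum(math.log(max(p[t], eps)) for p, t in zip(probs, tags)) / len(tags)
```

The separate B tag matters when two references sit next to each other (as in a bibliography): with only "in/out" labels they would merge into one span and be counted once. If predicted tags are noisy, a CRF layer on top of the per-position outputs is a common refinement, though that is beyond this sketch.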
