TransWikia.com

How to extract contents by topic from a document?

Data Science Asked by SRJ577 on March 11, 2021

I am trying to extract information from resumes. I used pdfminer for the text extraction, but I need to extract the contents of a resume according to its section titles.

For example:
My educational details appear under the title EDUCATIONAL BACKGROUND, so I need to extract the content topic by topic, one section at a time.

Is it possible to extract like that?

What will be the process behind that?

Is it possible to approach this as a segmentation problem?
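As a rough illustration of the segmentation idea, here is a minimal sketch that splits already-extracted plain text (for example, the output of pdfminer) into sections keyed by heading. It assumes each heading sits on its own line and matches a known list; the `SECTION_HEADINGS` list, the `split_by_headings` function, and the sample text are all hypothetical and not taken from any library.

```python
# Hypothetical heading list; extend it to match the resumes you actually see.
SECTION_HEADINGS = [
    "EDUCATIONAL BACKGROUND",
    "WORK EXPERIENCE",
    "SKILLS",
    "PROJECTS",
]


def split_by_headings(text, headings=SECTION_HEADINGS):
    """Return {heading: section text} for lines that match a known heading."""
    wanted = {h.upper() for h in headings}
    sections = {}
    current = None
    for line in text.splitlines():
        # Normalise the line: trim whitespace and a trailing colon, ignore case.
        key = line.strip().rstrip(":").strip().upper()
        if key in wanted:
            # A heading line starts a new section.
            current = key
            sections[current] = []
        elif current is not None:
            # Any other line belongs to the most recent heading.
            sections[current].append(line)
        # Lines before the first heading (name, contact info) are skipped.
    return {name: "\n".join(body).strip() for name, body in sections.items()}


# Made-up sample standing in for text extracted from a PDF.
sample = """John Doe
EDUCATIONAL BACKGROUND
B.Sc. in Computer Science, 2015
Skills:
Python, SQL
"""

sections = split_by_headings(sample)
print(sections["EDUCATIONAL BACKGROUND"])
```

Exact matching against a fixed list is the simplest option and will miss headings that are worded differently or merged into other lines by the PDF extraction; fuzzy matching or layout cues such as font size would be the next things to try.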

2 Answers

pyresparser is useful for extracting information from resumes. I believe this should work in your case.

You can find more details here: https://pypi.org/project/pyresparser/

Let me know if it works!

Answered by Kalyan Prasad on March 11, 2021

Here is a list of tools you can look into:

  1. https://tika.apache.org/
  2. https://jsoup.org/
  3. https://poi.apache.org/

This article is a neat read that details the steps. The author was doing something similar to what you are trying to do.

https://towardsdatascience.com/how-to-build-a-resume-parsing-tool-ae19c062e377

Answered by Keneni on March 11, 2021
