Data Science Asked on August 26, 2020
I have an HTML string and want to find out whether a word I supply is relevant to that string.
Relevancy could be measured based on frequency in the text.
An example to illustrate my problem:
this is an awesome bike store
bikes can be purchased online.
the bikes we own rock.
check out our bike store now
Now I want to test a few other words:
bike repairs
dog poo
bike repairs should be marked as relevant, whereas dog poo should not be marked as relevant.
Thanks for your ideas!
I guess it’s something Google does to figure out what keywords are relevant to a website. I am basically trying to reproduce their on-page rankings.
That's an outline of the Information Retrieval process
Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze is a very good book to get started in IR.
Or just use Apache Solr to get everything you need out of the box (or Apache Lucene, which Solr is built on, to build your own application).
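To give a rough feel for what the IR process involves before reaching for Solr or Lucene, here is a minimal sketch in plain Python: tokenize, normalize, build an inverted index, then score a query by summed term frequency. It is not how Solr or Lucene work internally. The function names (`normalize`, `tokenize`, `build_index`, `search`) are made up for illustration, and the "strip a trailing s" rule is only a crude stand-in for a real stemmer.

```python
import re
from collections import Counter, defaultdict


def normalize(token):
    # Crude stand-in for a real stemmer (assumption): lowercase the token
    # and strip one trailing "s" from words longer than three letters.
    token = token.lower()
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def tokenize(text):
    # Split on anything that is not an ASCII letter, then normalize.
    return [normalize(t) for t in re.findall(r"[A-Za-z]+", text)]


def build_index(docs):
    # Inverted index: term -> Counter mapping doc_id -> term frequency.
    index = defaultdict(Counter)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index


def search(index, query):
    # Score each document by the summed frequency of the query terms.
    scores = Counter()
    for term in tokenize(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return scores.most_common()


# The four example lines from the question, one "document" per line.
docs = [
    "this is an awesome bike store",
    "bikes can be purchased online.",
    "the bikes we own rock.",
    "check out our bike store now",
]
index = build_index(docs)

print(search(index, "bike repairs"))  # every line mentions bike/bikes
print(search(index, "dog poo"))       # no line matches
```

A real system would add stop-word removal, proper stemming and a weighting scheme such as TF-IDF on top of this, which is what the book and the tools above cover.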
Answered by Alexey Grigorev on August 26, 2020
I remember playing with Elasticsearch a long time ago (the website is very different now from what I remember). Its documentation has some material on dealing with human language.
Be warned that Elasticsearch is like a big bazooka for your problem. If your problem is very simple, you may want to build something from scratch. There are some docs on the web about it.
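If you do go from scratch, a very simple version could look like the sketch below, using only the Python standard library: pull the text out of the HTML, count normalized words, and call a phrase relevant if any of its words makes up a large enough share of the page. The names `TextExtractor`, `relevance` and `is_relevant`, the 0.05 threshold, and the HTML tags around the example sentences are all assumptions made for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    # Collects the text between tags. Note that this also keeps the
    # contents of <script> and <style>, which a real version should skip.
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def stem(word):
    # Crude plural stripping (assumption), not a real stemmer.
    word = word.lower()
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def words(text):
    return [stem(w) for w in re.findall(r"[A-Za-z]+", text)]


def relevance(html, phrase):
    # Relative frequency of the most frequent phrase word on the page.
    counts = Counter(words(html_to_text(html)))
    total = sum(counts.values())
    terms = words(phrase)
    if total == 0 or not terms:
        return 0.0
    return max(counts[t] for t in terms) / total


def is_relevant(html, phrase, threshold=0.05):
    # The threshold is an arbitrary illustrative value; tune it on real pages.
    return relevance(html, phrase) >= threshold


page = (
    "<html><body>"
    "<h1>this is an awesome bike store</h1>"
    "<p>bikes can be purchased online.</p>"
    "<p>the bikes we own rock.</p>"
    "<p>check out our bike store now</p>"
    "</body></html>"
)

print(is_relevant(page, "bike repairs"))
print(is_relevant(page, "dog poo"))
```

Taking the maximum over the phrase's words means "bike repairs" counts as relevant because of "bike" alone. Whether that is what you want depends on your use case.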
Answered by eri0o on August 26, 2020