Data Science Asked by rb173 on February 2, 2021
For example, a parameter such as input voltage may appear under alias names like V_INPUT, VIN, etc.
I want the software to recognize all of these alias names as the same parameter. Is there a package or method that can do this?
NLTK only seems to handle dictionary words.
If you know there are only specific variants, you can obviously make a look-up table yourself (i.e. a Python dictionary).
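As a rough sketch of the look-up table idea: the canonical names and the extra aliases below (`V_IN`, `VOUT`, and so on) are made up for illustration, and only `V_INPUT` and `VIN` come from the question. Normalizing the names before the look-up lets case and separator differences collapse onto the same key.

```python
import re

# Hypothetical alias table: canonical name -> known spellings.
# Only V_INPUT and VIN are from the question; the rest are illustrative.
ALIASES = {
    "input_voltage": ["V_INPUT", "VIN", "V_IN"],
    "output_voltage": ["V_OUTPUT", "VOUT", "V_OUT"],
}


def normalize(name):
    """Lower-case the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Invert the table once so each normalized alias maps to its canonical name.
LOOKUP = {
    normalize(alias): canonical
    for canonical, aliases in ALIASES.items()
    for alias in aliases
}


def canonical_name(name):
    """Return the canonical parameter name, or None if the alias is unknown."""
    return LOOKUP.get(normalize(name))
```

With this table, `canonical_name("v-input")` and `canonical_name("Vin")` both resolve to `"input_voltage"`, while an unknown name returns `None` so the caller can decide what to do with it.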
Otherwise you could try using a fuzzy matching library, like fuzzywuzzy.
This will give you a "closeness" score for your search term, based on your list of parameters (measurements). Here is an example of how you could use it:
In [1]: from fuzzywuzzy import process
In [2]: measurements = ["Voltage", "Current", "Resistance", "Power"]
In [3]: variants = ["VOLT", "voltage_in", "resistnce", "pwr", "amps"]  # notice typos etc.
In [4]: for variant in variants:
   ...:     results = process.extract(variant, measurements, limit=2)
   ...:     print(f"{variant:<11} -> {results}")  # See which two were found to be closest
   ...:     best = results[0]  # Take the best match by score (first in the list)
   ...:     if best[1] < 70:  # Set a threshold at 70%
   ...:         print(f"Rejected best match for '{variant}': {best}")
VOLT        -> [('Voltage', 90), ('Current', 22)]
voltage_in  -> [('Voltage', 82), ('Resistance', 30)]
resistnce   -> [('Resistance', 95), ('Current', 38)]
pwr         -> [('Power', 75), ('Current', 30)]
amps        -> [('Voltage', 26), ('Resistance', 22)]
Rejected best match for 'amps': ('Voltage', 26)
So most worked out pretty well, including the typo example.
This does not do any kind of semantic search, though, so amps does not get related to Current in any way.
To go the way of semantic encodings, you might want to look into "word embeddings", which try to capture the actual meaning of words rather than their spelling. To start, you could look into Word2Vec or GloVe embeddings. Perhaps there is even a tool or library that already offers this capability.
These approaches will not inherently deal with things like typos, so for best results, you could even combine the two approaches.
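Here is a minimal sketch of what combining the two could look like. To keep it dependency-free it makes two substitutions, so treat it as an illustration rather than the real thing: a small hand-written synonym table (hypothetical entries) stands in for the embedding-based semantic step, and the standard library's `difflib.get_close_matches` stands in for fuzzywuzzy, so the similarity scores are on a 0 to 1 scale and will not match the numbers shown earlier.

```python
import difflib

MEASUREMENTS = ["Voltage", "Current", "Resistance", "Power"]

# Hypothetical synonyms that string similarity alone cannot find.
# In a real system this step could be replaced by an embedding look-up.
SYNONYMS = {"amps": "Current", "ohms": "Resistance", "watts": "Power"}


def resolve(term, cutoff=0.6):
    """Map a term to a measurement name, or return None if nothing fits."""
    key = term.lower()

    # Step 1: "semantic" match via the synonym table.
    if key in SYNONYMS:
        return SYNONYMS[key]

    # Step 2: fuzzy string match to catch typos and abbreviations.
    by_lower = {m.lower(): m for m in MEASUREMENTS}
    matches = difflib.get_close_matches(key, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None
```

With this, `resolve("amps")` is handled by the synonym step, a typo like `resolve("resistnce")` falls through to the fuzzy step, and a term that matches neither returns `None`. The `cutoff` plays the same role as the 70% threshold above and would need tuning on real data.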
Answered by n1k31t4 on February 2, 2021
Yes, there are a couple. My favorite is PyDictionary.
If you're using pip, make sure it is up to date and run this command in a terminal:
pip install PyDictionary
Hope this helped
Answered by Dummy Scripts on February 2, 2021