Super User Asked by brandonb on February 7, 2021
I have a table with the names of classes; however, these names have not been typed consistently. This has resulted in multiple variations of the same class name (See Variations table below). For example, "Learning to Sew", "Learning Sewing", and "Learn to Sew." I also have a table of the standard names for these classes (See Standard Names below). I want to match the variations with the standard name, so the standard name appears in a column next to the variation name (See Result table). To further complicate things, one class in particular had its name changed, so any search will need to look for both versions and return the current version (See Experiencing Biology below). I have been able to use COUNTIF to count the occurrences of each standard name across the variations using search terms that have the key words of the names combined with wildcards and multiple criteria when needed. I have not been able to figure out how to perform the match however. Is there a formula in Excel that can accomplish this?
Note:
Variations
Class
Learning to Sew
Learning Sewing
Learn to Sew
Experiencing Biology
Cycles of Biology
Standard Names
Class Search Term 1 Search Term 2 Count
Learning to Sew *Learn*Sew* 3
Experiencing Biology *Experiencing*Biology* *Cycles*Biology* 2
Result
Class Standard Title
Learning to Sew Learning to Sew
Learning Sewing Learning to Sew
Learn to Sew Learning to Sew
Experiencing Biology Experiencing Biology
Cycles of Biology Experiencing Biology
I created two named tables (Ctrl+T then name them on the Table Design tab):
On each table, I used Data>Get & Transform Data>From Table/Range to create a query. In the power query editor, I used Home>Close & Load>Close & Load To>Only Create Connection such that I had two query connections, one to each table:
I then used Data>Get & Tranform Data>Get Data>Combine Queries>Merge and configured the merge dialog like this:
Depending on the rest of your data, you may need to experiment with an optimal value for the similarity threshold. You can also explore using a transformation table to make the job easier if you find that you're getting false positives.
After clicking OK on the merge, then using the double-arrow at the top of the column with the word Table in each row and expanding the joined data, I got this:
To put this resulting dataset back into the workbook, I would just use Home>Close & Load.
This is not foolproof, but with the right parameters in the fuzzy matching, you may find something that works for your entire dataset.
Answered by FlexYourData on February 7, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP