TransWikia.com

Making a tagged Part-of-Speech corpus with the help of a lexicon

Data Science Asked by Aziz Qadeer on July 2, 2021

I have a part-of-speech lexicon with two columns, words and part-of-speech tags, stored in a pandas DataFrame.

lexicon

I also have a list of tokens (words) in another DataFrame.

untagged corpus

I want to look up each token of the untagged corpus in the entire lexicon. If the token is found, its tag should be added to a new column in the untagged DataFrame; if it is not found, the tag should be ‘X’.

Here is how I did it:

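# Lexicon as a 2-D NumPy array: column 0 holds the words, column 1 the tags
lexicon_rows = lexicon.iloc[:,:].values

def add_tag(untagged_row):
    # Tags of all lexicon rows whose word equals this token
    tag = lexicon_rows[:, [1]][lexicon_rows[:, 0] == untagged_row['word']]
    # Exactly one match: return its tag; zero or several matches: return 'X'
    if tag.size == 1:
        return str(tag[0][0])
    else:
        return 'X'

# Apply the lookup to every row of the untagged corpus
untagged['tag'] = untagged.apply(add_tag, axis=1)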

tagged corpus

I am not sure whether each word in the untagged corpus is actually searched against the entire lexicon.
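One way to check this is to look at the boolean mask that the comparison produces. The following is only a minimal sketch: the lexicon is made-up toy data, the column order (word, then tag) is assumed from the description above, and `matching_tags` is a hypothetical helper that is not part of the original code.

```python
import pandas as pd

# Toy lexicon (made-up data); "run" is deliberately listed with two tags.
lexicon = pd.DataFrame({
    "word": ["the", "cat", "run", "run"],
    "tag": ["DET", "NOUN", "VERB", "NOUN"],
})
lexicon_rows = lexicon.to_numpy()

def matching_tags(word):
    """Hypothetical helper: tags of every lexicon row whose word equals `word`."""
    mask = lexicon_rows[:, 0] == word  # one boolean per lexicon row
    assert mask.shape == (len(lexicon),)  # the comparison covers the whole lexicon
    return lexicon_rows[mask, 1].tolist()

print(matching_tags("cat"))  # ['NOUN']
print(matching_tags("run"))  # ['VERB', 'NOUN']
print(matching_tags("dog"))  # []
```

Because the mask has one entry per lexicon row, every token is compared against the whole lexicon. The sketch also shows a side effect of the `tag.size == 1` test: a word that appears in the lexicon more than once produces several matches and would therefore be tagged 'X', just like a word that is missing.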

My question is: am I doing this right? If so, is there a better approach to this task? If not, could you please suggest a correct one?

Thank you.
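As one possible alternative, the lookup can be expressed with `Series.map` instead of a row-wise `apply`. This is a sketch under assumptions, not a definitive answer: the data is made up, the lexicon column names `word` and `tag` are assumed, and keeping only the first tag of a word listed several times is a choice made here for illustration.

```python
import pandas as pd

# Toy data (made up); column names "word" and "tag" are assumptions.
lexicon = pd.DataFrame({
    "word": ["the", "cat", "sat", "sat"],
    "tag": ["DET", "NOUN", "VERB", "NOUN"],
})
untagged = pd.DataFrame({"word": ["the", "dog", "sat"]})

# Build a word -> tag lookup; keep the first tag when a word is listed twice.
lookup = lexicon.drop_duplicates("word").set_index("word")["tag"]

# Map every token to its tag; tokens missing from the lexicon become 'X'.
untagged["tag"] = untagged["word"].map(lookup).fillna("X")

print(untagged["tag"].tolist())  # ['DET', 'X', 'VERB']
```

Unlike the `tag.size == 1` test, this version assigns a tag to words with several lexicon entries, so the two approaches only agree when every word appears at most once in the lexicon.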
