Pythonic way to identify names in a URL and match them against an existing set of names

Question

Hello, this is a problem I want to solve, but I am stuck.
Given a list of URLs, I want to do the following:

extract the name within each URL
match the name found in the URL against a dictionary of existing names
build one dictionary of all the names found, and split the found names into two separate dictionaries: one for names that exist in the dictionary and another for names that do not

example:
INPUT : 
urls = ['www.twitter.com/users/aoba-joshi/$#fsd=43r', 
        'www.twitter.com/users/chrisbrown-e2/#4f=34ds', 
        'www.facebook.com/celebrity/neil-degrasse-tyson',
        'www.instagram.com/actor-nelson-bigetti']

# the key is the ID associated to the names, and the values are all the potential names

existing_names = {1 : ['chris brown', 'chrisbrown', 'Brown Chris', 'brownchris'] ,
                  2 : ['nelson bigetti', 'bigetti nelson', 'nelsonbigetti', 'bigettinelson'],
                  3 : ['neil degrasse tyson', 'tyson neil degreasse', 'tysonneildegrasse', 'neildegrassetyson']}

OUTPUT : 
# names_found will be a dictionary with the key as the URL and the values as the found name
names_found = {'www.twitter.com/users/aoba-joshi/$#fsd=43r' : 'aoba joshi',
               'www.twitter.com/users/chrisbrown-e2/#4f=34ds' : 'chris brown',
               'www.facebook.com/celebrity/neil-degrasse-tyson' : 'neil degrasse tyson',
               'www.instagram.com/actor-nelson-bigetti' : 'nelson bigetti'}

# existing_names_found is a dictionary where the keys are the found name, and the values are the corresponding list of names in the existing names dictionary

existing_names_found = {'chris brown' : ['chris brown', 'chrisbrown', 'Brown Chris', 'brownchris'],
                        'neil degrasse tyson' : ['neil degrasse tyson', 'tyson neil degreasse', 'tysonneildegrasse', 'neildegrassetyson'],
                        'nelson bigetti' : ['nelson bigetti', 'bigetti nelson', 'nelsonbigetti', 'bigettinelson']}

# new_names_found is a dictionary with the keys as the new name found, and the values as the url associated to the new found name
new_names_found = {'aoba joshi' : 'www.twitter.com/users/aoba-joshi/$#fsd=43r'}

user14037529 · Answer

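For the first part, identifying the name in the URL, you can do something like:
# keep only the URLs that carry a 'name' parameter
urls = [url for url in urls if 'name' in url]

found_celeb = {}
for url in urls:
    # take the text after the last '=' and split it on commas;
    # this assumes URLs that end in something like 'name=first,last'
    link_split = url.split('=')[-1].split(',')
    celeb_name = ' '.join(link_split)
    found_celeb[url] = celeb_name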
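
For the URL format shown in the question, where the name is a hyphenated path segment rather than a `name=` parameter, a minimal sketch might look like the following. The helper name `extract_slug_name` and the rule of taking the last path segment before any `#` fragment are assumptions made for illustration; they are not part of the answer above.

```python
def extract_slug_name(url):
    """Return the last path segment of `url` with hyphens turned into spaces.

    Hypothetical helper: assumes the name is the last non-empty path segment
    before any '#' fragment, as in the question's example URLs.
    """
    path = url.split('#')[0]      # drop the fragment, e.g. '#4f=34ds'
    path = path.rstrip('/$')      # drop trailing '/' and '$' characters
    slug = path.split('/')[-1]    # e.g. 'neil-degrasse-tyson'
    return ' '.join(slug.split('-'))


urls = ['www.twitter.com/users/aoba-joshi/$#fsd=43r',
        'www.twitter.com/users/chrisbrown-e2/#4f=34ds',
        'www.facebook.com/celebrity/neil-degrasse-tyson',
        'www.instagram.com/actor-nelson-bigetti']

found_celeb = {url: extract_slug_name(url) for url in urls}
```

The extracted strings can still contain extra tokens such as "e2" or "actor", so they would need to be matched against `existing_names` in a later step.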

StyleZ · Answer

Well, if I understood correctly what you want to do, here is something that should work:

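new_names_found = {}
existing_names_found = {}
for link in urls:
    link_split = link.split('/')
    # the name is the third segment when present, otherwise the last one
    slug = link_split[2] if len(link_split) > 2 else link_split[-1]
    name = ''.join(slug.split('-')).lower()     # makes from chris-brown-xx => chrisbrownxx
    for key, value in existing_names.items():   # check if the name is in the list
        # strip spaces from each known variant, the same way hyphens were stripped above
        if any(name_x.replace(' ', '').lower() in name for name_x in value):
            existing_names_found[value[0]] = value
            break
    else:
        # no known variant matched, so record it as a new name
        new_names_found[' '.join(slug.split('-'))] = link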

(Sorry in advance, I am typing this on my phone, but hopefully it will be helpful :))
(Alternatively, you can check whether the text contains both parts of a name, but that would fail on something like "Luke Luk" when checking it against "Luke O'Niel") ... There is a lot of proble
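
The "Luke Luk" pitfall mentioned above can be reproduced in a few lines. This is only a sketch: the function names `parts_contained` and `tokens_contained` are hypothetical, and whole-word comparison is one possible way to avoid the false positive, not necessarily what the answer had in mind.

```python
def parts_contained(candidate, text):
    """Naive check: every word of `candidate` appears somewhere in `text`."""
    text = text.lower()
    return all(part in text for part in candidate.lower().split())


def tokens_contained(candidate, text):
    """Stricter check: every word of `candidate` equals a whole word of `text`."""
    words = set(text.lower().split())
    return all(part in words for part in candidate.lower().split())


# 'luk' is a substring of 'luke', so the naive check wrongly reports a match
naive_result = parts_contained("Luke Luk", "Luke O'Niel")

# comparing whole words rejects it, because 'luk' is not a word in the text
strict_result = tokens_contained("Luke Luk", "Luke O'Niel")
```

Here `naive_result` is `True` and `strict_result` is `False`.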

Max · Answer

To get you started, here are the steps to build this program:

Create a for loop that goes through each URL, use split('/') to break the URL into a list, and take the element at index 2 of that list.
Then use another for loop to go through the keys and values of the existing_names dictionary. Within that loop, include an if statement that compares the name you extracted to the names present.
Then add those values to the dictionaries or lists you want.
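
The steps above can be sketched as follows. Several details are assumptions rather than part of the answer: the function names `extract_name` and `match_names` are hypothetical, the fallback to the last segment exists because one example URL has no element at index 2, the first entry of each list is treated as the canonical name, and the substring comparison is a loose heuristic that can produce false positives on other data.

```python
def extract_name(url):
    """Step 1: split on '/' and take index 2, falling back to the last segment."""
    parts = url.split('/')
    slug = parts[2] if len(parts) > 2 else parts[-1]
    return slug.split('-')


def match_names(urls, existing_names):
    """Steps 2 and 3: compare each extracted name and fill the three dictionaries."""
    names_found, existing_names_found, new_names_found = {}, {}, {}
    for url in urls:
        words = extract_name(url)
        squashed = ''.join(words).lower()       # 'chrisbrown-e2' -> 'chrisbrowne2'
        for variants in existing_names.values():
            # compare with spaces removed so 'chris brown' matches 'chrisbrown'
            if any(v.replace(' ', '').lower() in squashed for v in variants):
                canonical = variants[0]         # assumption: first variant is canonical
                names_found[url] = canonical
                existing_names_found[canonical] = variants
                break
        else:
            # no variant matched, so treat it as a new name
            name = ' '.join(words)
            names_found[url] = name
            new_names_found[name] = url
    return names_found, existing_names_found, new_names_found


urls = ['www.twitter.com/users/aoba-joshi/$#fsd=43r',
        'www.twitter.com/users/chrisbrown-e2/#4f=34ds',
        'www.facebook.com/celebrity/neil-degrasse-tyson',
        'www.instagram.com/actor-nelson-bigetti']

existing_names = {
    1: ['chris brown', 'chrisbrown', 'Brown Chris', 'brownchris'],
    2: ['nelson bigetti', 'bigetti nelson', 'nelsonbigetti', 'bigettinelson'],
    3: ['neil degrasse tyson', 'tyson neil degreasse', 'tysonneildegrasse', 'neildegrassetyson'],
}

names_found, existing_names_found, new_names_found = match_names(urls, existing_names)
```

On the question's sample data this yields the three dictionaries in the shape the question describes, with 'aoba joshi' as the only new name.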
