Stack Overflow Asked by user7041266 on January 9, 2021
I am not very proficient at Python, but my aim is to extract data from my share-dealing website so I can analyse it further down the line. The code below worked for me once, and now I get an error about arrays not being the same length, even though I believe they are. It literally worked with no modification to the code, but now it has suddenly stopped working.
Code and error below:
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
pd.set_option('display.max_rows', None)
r = requests.get("https://www.somewebsite.co.uk/shares/shares-search-results/a")
soup = bs(r.content, features = "lxml")
a = soup.find('ul', {'class':'list-unstyled list-indent'}).find_all("strong")
r3 = requests.get("https://www.somewebsite.co.uk/shares/shares-search-results/b")
soup = bs(r3.content, features = "lxml")
b = soup.find('ul', {'class':'list-unstyled list-indent'}).find_all("strong")
r5 = requests.get("https://www.somewebsite.co.uk/shares/shares-search-results/c")
soup = bs(r5.content, features = "lxml")
c = soup.find('ul', {'class':'list-unstyled list-indent'}).find_all("strong")
header = a + b + c
r1 = requests.get("https://www.somewebsite.co.uk/shares/shares-search-results/a")
soup = bs(r1.content, features = "lxml")
links = [a['href'] for a in soup.find('ul', {'class' : 'list-unstyled list-indent'}).find_all("a", href=True)]
r4 = requests.get("https://www.somewebsite.co.uk/shares/shares-search-results/b")
soup = bs(r4.content, features = "lxml")
links1 = [a['href'] for a in soup.find('ul', {'class' : 'list-unstyled list-indent'}).find_all("a", href=True)]
r6 = requests.get("https://www.somewebsite.co.uk/shares/shares-search-results/c")
soup = bs(r6.content, features = "lxml")
links2 = [a['href'] for a in soup.find('ul', {'class' : 'list-unstyled list-indent'}).find_all("a", href=True)]
column2 = links + links1 + links2
com_list = []
for b in header[0:]:
    result = b.text.strip()
    com_list.append(result)
com_com = pd.DataFrame({'COMPANY': com_list, 'LINKS': column2})
print(com_com)
The error I get:
Traceback (most recent call last):
  File "hls.py", line 42, in <module>
    com_com = pd.DataFrame({'COMPANY': com_list, 'LINKS': column2})
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 392, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/internals/construction.py", line 212, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/internals/construction.py", line 51, in arrays_to_mgr
    index = extract_index(arrays)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/internals/construction.py", line 317, in extract_index
    raise ValueError('arrays must all be same length')
ValueError: arrays must all be same length
com_list and column2 contain different numbers of elements. Every list passed to pd.DataFrame in a dict must have the same length.
You can check this with:
len(com_list) == len(column2)
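A slightly fuller version of that check, as a rough sketch: the helper name report_mismatch and the sample data are hypothetical and are not part of the question or the answer. It reports both lengths and uses itertools.zip_longest to show which items at the end of the longer list have no partner.

```python
from itertools import zip_longest


def report_mismatch(names, links):
    """Summarise how two lists that are supposed to pair up differ in length."""
    report = {
        "names": len(names),
        "links": len(links),
        "same_length": len(names) == len(links),
    }
    # zip_longest pads the shorter list with None, so the unpaired
    # tail of the longer list becomes visible.
    report["unpaired"] = [
        (name, link)
        for name, link in zip_longest(names, links)
        if name is None or link is None
    ]
    return report


# Made-up sample data standing in for com_list and column2.
com_list = ["Company A", "Company B"]
column2 = ["/shares/a", "/shares/b", "/shares/extra"]
print(report_mismatch(com_list, column2))
```

Note that this only shows how many items are left over, not where the two lists first went out of step; for that, print the zipped pairs and read through them.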
Correct answer by HimanshuGahlot on January 9, 2021
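One possible way to avoid the mismatch altogether, offered as a sketch rather than as part of the accepted answer: take the company name and the link from the same list item, so the two columns cannot drift apart when the page gains a link without a "strong" tag (or the reverse). The function name extract_rows and the sample HTML are made up, and the assumption that each company sits in its own "li" is a guess about the page structure that the question does not confirm. The sketch uses the built-in "html.parser" instead of "lxml" only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup


def extract_rows(html):
    """Return one {'COMPANY', 'LINKS'} dict per list item that has both parts."""
    soup = BeautifulSoup(html, "html.parser")
    listing = soup.find("ul", {"class": "list-unstyled list-indent"})
    if listing is None:
        return []
    rows = []
    # ASSUMPTION: each company is one <li> holding a <strong> name and an <a href>.
    for item in listing.find_all("li"):
        name = item.find("strong")
        link = item.find("a", href=True)
        if name is None or link is None:
            continue  # skip items missing either part instead of misaligning columns
        rows.append({"COMPANY": name.get_text(strip=True), "LINKS": link["href"]})
    return rows


# Made-up HTML; the middle item has a link but no <strong> name.
sample_html = """
<ul class="list-unstyled list-indent">
  <li><a href="/shares/aaa"><strong>AAA plc</strong></a></li>
  <li><a href="/shares/more">More results</a></li>
  <li><a href="/shares/bbb"><strong>BBB plc</strong></a></li>
</ul>
"""
rows = extract_rows(sample_html)
print(rows)
```

Passing the resulting list of dicts to pd.DataFrame(rows) gives COMPANY and LINKS columns of equal length by construction. Calling extract_rows(requests.get(url).content) once per results page would also avoid requesting each URL twice, as the code in the question does.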