Data Science Asked on September 25, 2020
I have this script:
import requests
from requests import get
from bs4 import BeautifulSoup
import csv
import pandas as pd
f = open('olanda.csv', 'wb')
writer = csv.writer(f)
url = ('https://www......')
response = get(url)
soup = BeautifulSoup(response.text, 'html.parser')
type(soup)
table = soup.find('table', id='tablepress-94').text.strip()
print(table)
writer.writerow(table.split(), delimiter = ',')
f.close()
When it writes to the CSV file, everything ends up in a single cell, like this:
Sno.,CompanyLocation,1KarifyNetherlands,2Umenz,Benelux,BVNetherlands,3TovertafelNetherlands,4Behandeling,BegrepenNetherlands,5MEXTRANetherlands,6Sleep.aiNetherlands,7OWiseNetherlands,8Healthy,WorkersNetherlands,9&thijs,|,thuis,in,jouw,situatieNetherlands,10HerculesNetherlands, etc.
I want the output in a single column, with each comma-separated value in its own row.
I tried to use delimiter=',' but I got:
TypeError: a bytes-like object is required, not 'str'
How can I do this? Thanks!
Please see the documentation for csv.writer: the delimiter parameter is used when instantiating the writer, not on the writerow method.
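A minimal sketch of the correct writer setup (the column names here are taken from the sample output in the question). Note that the file must also be opened in text mode: the 'wb' mode in the question is what triggers the "a bytes-like object is required, not 'str'" error, since csv.writer writes str, not bytes.

```python
import csv

# delimiter belongs on csv.writer(), not on writerow();
# open in text mode with newline='' -- 'wb' causes the TypeError
with open('olanda.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerow(['Sno.', 'Company', 'Location'])
    writer.writerow(['1', 'Karify', 'Netherlands'])
```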
Furthermore, from what I understand, your approach is wrong. You want to scrape the HTML table and store it as a CSV file, but what you actually do is get the text of the whole table:
table = soup.find('table', id='tablepress-94').text.strip()
which is a single concatenated string of all columns with no structure, and then split it on whitespace, which is wrong. Instead you should build a dict of table values, e.g. using:
table = soup.find('table', id='tablepress-94')
# assuming the table has a
# <thead><tr><th>Column 1</th>..<th>Column n</th></tr></thead>
# section with column headers, else adjust accordingly
column_headers = [th.get_text() for th in table.find('thead').find('tr').find_all('th')]
print(column_headers)

data = {}
row_index = 0
for row in table.find('tbody').find_all('tr'):
    row_index += 1
    column_index = 0
    columns = row.find_all('td')
    for td in columns:
        if column_index >= len(column_headers):
            column_headers.append('Column_' + str(column_index + 1))
        if column_headers[column_index] not in data:
            data[column_headers[column_index]] = []
        data[column_headers[column_index]].append(td.get_text())
        column_index += 1
print(data)

# now output the scraped table data into the CSV properly
writer.writerow(column_headers)  # write the headers
# write each row of data
for row in range(row_index):
    writer.writerow([data[column][row] for column in column_headers])
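To see how the dict-of-columns shape maps back to CSV rows, here is a self-contained sketch with hypothetical data standing in for the scraped values (no network or parsing involved; company names are illustrative, based on the sample output in the question):

```python
import csv
import io

# hypothetical data in the same dict-of-columns shape built above
column_headers = ['Sno.', 'Company', 'Location']
data = {
    'Sno.': ['1', '2'],
    'Company': ['Karify', 'Umenz Benelux BV'],
    'Location': ['Netherlands', 'Netherlands'],
}
row_count = 2

buf = io.StringIO()
writer = csv.writer(buf, delimiter=',')
writer.writerow(column_headers)           # header row
for row in range(row_count):              # one CSV row per table row
    writer.writerow([data[column][row] for column in column_headers])
print(buf.getvalue())
```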
Correct answer by Nikos M. on September 25, 2020