Stack Overflow Asked by c120 on November 27, 2021
I am scraping an Amazon product page with Beautiful Soup to get the product name and price. Sometimes the "title" variable is returned correctly, and other times I get the error "'NoneType' object has no attribute 'get_text'".
import requests
from bs4 import BeautifulSoup
URL = 'https://www.amazon.com/Lenovo-ThinkPad-i5-10210U-i7-7500U-Wireless/dp/B08BYZD4H9/ref=sr_1_2_sspa?dchild=1&keywords=thinkpad&qid=1595377662&sr=8-2-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEyMVhTU1BOODg5TlgmZW5jcnlwdGVkSWQ9QTAzMTc5MDFMNjhGMUE0VlRHT1gmZW5jcnlwdGVkQWRJZD1BMDY3MDc3MzJPQzc2QkI5UlcwSUEmd2lkZ2V0TmFtZT1zcF9hdGYmYWN0aW9uPWNsaWNrUmVkaXJlY3QmZG9Ob3RMb2dDbGljaz10cnVl'
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
title = soup.find(id="productTitle").get_text()
price = soup.find(id="priceblock_ourprice").get_text()
converted_price = int(price[1:6].replace(',',''))
print(converted_price)
print(title)
Try to specify more HTTP headers, for example User-Agent and Accept-Language. Also, change the parser to lxml or html5lib.
import requests
from bs4 import BeautifulSoup
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0',
'Accept-Language': 'en-US,en;q=0.5'
}
URL = 'https://www.amazon.com/Lenovo-ThinkPad-i5-10210U-i7-7500U-Wireless/dp/B08BYZD4H9/ref=sr_1_2_sspa?dchild=1&keywords=thinkpad&qid=1595377662&sr=8-2-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEyMVhTU1BOODg5TlgmZW5jcnlwdGVkSWQ9QTAzMTc5MDFMNjhGMUE0VlRHT1gmZW5jcnlwdGVkQWRJZD1BMDY3MDc3MzJPQzc2QkI5UlcwSUEmd2lkZ2V0TmFtZT1zcF9hdGYmYWN0aW9uPWNsaWNrUmVkaXJlY3QmZG9Ob3RMb2dDbGljaz10cnVl'
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'lxml') # <-- change to `lxml` or `html5lib`
title = soup.find(id="productTitle").get_text(strip=True)
price = soup.find(id="priceblock_ourprice").get_text(strip=True)
converted_price = int(price[1:6].replace(',',''))
print(converted_price)
print(title)
Prints (in my testing always):
1049
2020 Lenovo ThinkPad E15 15.6 Inch FHD 1080P Laptop| Intel 4-Core i5-10210U (Beats i7-7500U)| 16GB RAM| 1TB SSD (Boot) + 500GB HDD| FP Reader| Win10 Pro+ NexiGo Wireless Mouse Bundle
Answered by Andrej Kesely on November 27, 2021
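A side note on the price conversion used above: the slice price[1:6] only works when exactly five characters follow the currency symbol. For "$1,049.00" it yields "1,049", but for a price such as "$999.99" it yields "999.9", and int() then raises ValueError. The following is a minimal sketch of a more tolerant conversion, not part of either answer. The helper name parse_price is hypothetical, and the sketch assumes US-style formatting (comma as thousands separator, dot as decimal point).

```python
import re
from decimal import Decimal


def parse_price(text):
    """Return the first number found in `text` as a Decimal, or None.

    Hypothetical helper; assumes US-style formatting such as "$1,049.00".
    """
    # A digit, then any run of digits/commas, then an optional decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None  # no number in the text, e.g. "Currently unavailable"
    # Drop thousands separators before converting.
    return Decimal(match.group(0).replace(",", ""))
```

Decimal is used here instead of int so that the cents are not silently dropped; wrap the result in int() if only whole dollars are wanted.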
You are getting the error 'NoneType' object has no attribute 'get_text' because the page content changes between requests, and sometimes there is no element with id="productTitle" or no element with id="priceblock_ourprice". In that case soup.find() returns None, and calling get_text() on None raises the error.
Add some debug statements like these and you will see exactly why the error occurs.
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)
title_soup = soup.find(id="productTitle")
print(title_soup) # <- this might print None
print(title_soup.get_text())
price_soup = soup.find(id="priceblock_ourprice")
print(price_soup) # <- this might print None
print(price_soup.get_text())
Answered by Kaushal Kumar on November 27, 2021
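Once the debug output confirms that find() sometimes returns None, one option is to guard the lookup so the script does not crash on pages where the element is missing. This is a minimal sketch rather than something taken from the answers. The helper name text_by_id is hypothetical, and it relies only on the soup object having a find(id=...) method, as BeautifulSoup does.

```python
def text_by_id(soup, element_id):
    """Return the stripped text of the element with this id, or None if absent.

    Hypothetical helper; `soup` is expected to be a BeautifulSoup object.
    """
    element = soup.find(id=element_id)  # None when no element has this id
    if element is None:
        return None
    return element.get_text(strip=True)
```

With the soup from the code above, text_by_id(soup, "productTitle") returns None instead of raising when the element is missing, so the caller can retry the request or report that the page layout differs from what was expected.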