Stack Overflow Asked by Edison on February 8, 2021
My regex produces split results, so as a quick fix I have to subscript the result with [0].
Code
```python
import re

my_url = 'https://www.zoopla.co.uk/for-sale/property/b23/?page_size=100&q=B23&radius=0&results_sort=newest_listings&search_source=refine'
# Assumes page_soup is a BeautifulSoup object already built from my_url (not shown here)
house_listings = page_soup.find_all("div", {"class": "listing-results-right clearfix"})
listings = house_listings[3]  # item 3 for prototyping
# .text is already a str, so no str() call is needed; use a raw string for the pattern
house_type = re.findall(r'(?:(?!.for).)*', listings.h2.a.text)
print(house_type)
# ['4 bed detached house', '', 'for sale', '']
```
Fix
```python
house_type = re.findall(r'(?:(?!.for).)*', listings.h2.a.text)[0]  # keep only the first match
print(house_type)
# 4 bed detached house
```
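The split output comes from the trailing `*`, which lets the pattern match zero characters. `re.findall` returns every match: the text before " for", an empty match at the space where the negative lookahead fails immediately, "for sale", and a final empty match at the end of the string. The sketch below reproduces this on a hard-coded title (an assumption standing in for the scraped `listings.h2.a.text`) and shows that `re.match`, which only tries position 0, gives the first match directly without subscripting a list.

```python
import re

# Hard-coded stand-in for listings.h2.a.text (assumption: the real value is scraped)
title = "4 bed detached house for sale"

# Tempered token: consume one character at a time as long as the text ahead
# does not start with "<any character>for". The * also allows empty matches.
pattern = re.compile(r'(?:(?!.for).)*')

# findall keeps the empty matches at the space before "for" and at the end.
all_matches = pattern.findall(title)
print(all_matches)  # ['4 bed detached house', '', 'for sale', '']

# match() only tries the start of the string and returns that single match.
first = pattern.match(title).group(0)
print(first)  # 4 bed detached house
```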
Beyond that quick fix, I need a new regex that matches more precisely.
Desired Match
Start from the word after "bed" (excluding the space that follows it) and ignore the "for sale" portion.
Example results: detached house, terrace house, semi-detached house, flat, maisonette.
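A minimal sketch of one possible pattern for this: search for "bed", skip the whitespace after it, and lazily capture everything up to " for sale" (or to the end of the title if "for sale" is missing). The function name `property_type` and the sample titles are hypothetical, modelled on the example results above; real titles would come from the `h2 > a` text of each element in `house_listings`.

```python
import re
from typing import Optional

# Capture the text after "bed" + whitespace, stopping lazily at " for sale"
# or at the end of the title (with optional trailing whitespace).
PROPERTY_TYPE = re.compile(r'\bbed\s+(.+?)(?:\s+for\s+sale\b|\s*$)', re.IGNORECASE)


def property_type(title: str) -> Optional[str]:
    """Return the property type from a listing title, or None if "bed" is absent."""
    match = PROPERTY_TYPE.search(title)
    return match.group(1) if match else None


# Hypothetical titles in the same shape as the question's example
titles = [
    "4 bed detached house for sale",
    "3 bed terrace house for sale",
    "2 bed semi-detached house for sale",
    "1 bed flat for sale",
    "2 bed maisonette for sale",
]

for t in titles:
    print(property_type(t))
```

A capturing group is used instead of a lookbehind such as `(?<=bed\s+)` because Python's `re` module only supports fixed-width lookbehind, so a variable amount of whitespace after "bed" cannot go inside one. Titles without "bed" (for example a studio listing) return None and would need separate handling.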
Answered by jdaz on February 8, 2021