regex with bs4 is splitting the results

Question

My regex returns the match split into several pieces, so as a quick fix I have to index into the result.
Code
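import re

import requests
from bs4 import BeautifulSoup

my_url = 'https://www.zoopla.co.uk/for-sale/property/b23/?page_size=100&q=B23&radius=0&results_sort=newest_listings&search_source=refine'

# Assumed: the page is fetched with requests and parsed with html.parser (this step was not shown)
page_soup = BeautifulSoup(requests.get(my_url).text, "html.parser")

house_listings = page_soup.find_all("div", {"class": "listing-results-right clearfix"})

listings = house_listings[3]  # item 3 for prototyping

# .text is already a str, so no str() call is needed
house_type = re.findall(r'(?:(?!.for).)*', listings.h2.a.text)

print(house_type)
# ['4 bed detached house', '', 'for sale', '']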

Fix
# Keep only the first element of the findall result
house_type = re.findall(r'(?:(?!.for).)*', listings.h2.a.text)[0]
print(house_type)
# 4 bed detached house
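The extra items appear because this pattern can match the empty string. re.findall returns every non-overlapping match, including empty ones: after '4 bed detached house' it records an empty match at the space before 'for', then 'for sale', then an empty match at the end of the string. A minimal sketch, using only the standard library and a hard-coded title standing in for listings.h2.a.text (an assumed sample), that takes just the first match with re.match instead of indexing into the findall list:

```python
import re

# Hard-coded stand-in for listings.h2.a.text (assumed sample title)
title = "4 bed detached house for sale"

pattern = r"(?:(?!.for).)*"

# findall returns every match, including the empty ones
all_matches = re.findall(pattern, title)
print(all_matches)  # ['4 bed detached house', '', 'for sale', '']

# re.match tries only at the start of the string and returns a single match object
first = re.match(pattern, title).group(0)
print(first)  # 4 bed detached house
```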

Beyond that quick fix, I'd like a better regex that matches only the part I need.
Desired Match
Start from the word after 'bed' (excluding the space that follows it) and leave out the 'for sale' portion.
Example results: detached house, terrace house, semi-detached house, flat, maisonette.
Source
https://www.zoopla.co.uk/for-sale/property/b23/?page_size=100&q=B23&radius=0&results_sort=newest_listings&search_source=refine

jdaz · Answer

This should be all you need:
(?<=bed ).*(?= for)
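The lookbehind (?<=bed ) makes the match start right after 'bed ', .* takes the property type, and the lookahead (?= for) makes it stop before ' for sale'. Lookarounds are not part of the match, so no capture group is needed. A minimal sketch, assuming the titles have already been extracted as plain strings; the sample titles are built from the desired results in the question, and extract_property_type is a hypothetical helper name:

```python
import re

# Lookbehind for "bed ", greedy .* for the type, lookahead for " for"
PROPERTY_TYPE = re.compile(r"(?<=bed ).*(?= for)")


def extract_property_type(title):
    """Return the text between 'bed ' and ' for', or None if there is no match."""
    match = PROPERTY_TYPE.search(title)
    return match.group(0) if match else None


# Hypothetical sample titles in the format shown in the question
titles = [
    "4 bed detached house for sale",
    "3 bed terrace house for sale",
    "3 bed semi-detached house for sale",
    "2 bed flat for sale",
    "2 bed maisonette for sale",
    "Land for sale",  # no "bed ", so no match
]

for title in titles:
    print(extract_property_type(title))
```

re.search returns a single match object rather than a list, so there is nothing to index into. Because .* is greedy, a title containing ' for' more than once would match up to the last occurrence; .*? would stop at the first. Also note that . does not match a newline by default, so if the scraped text contains line breaks, calling .strip() on it first may help.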

