Data Science Asked by randomizer0000 on December 23, 2020
How can I select only certain entries that match my condition and from those entries, filter again using regex?
For instance, I have this dataframe (df):
+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
|   A|   f|   5|   g|
|   D|  er|  2e|  sd|
|   F|   g|23sd|   a|
|   F| fgf|  45|   d|
|   E|   r|   3|   e|
|   A|  sd|  8f|  dw|
|   F|  sd| 3h1|   d|
+----+----+----+----+
I want to select the entries whose col1 value is ‘F’, and then filter those again with a regex ([a-zA-Z0-9]+) to keep only the entries that contain both numbers and letters.
+----+----+----+----+       +----+----+----+----+
|col1|col2|col3|col4|       |col1|col2|col3|col4|
+----+----+----+----+       +----+----+----+----+
|   F|   g|23sd|   a|  -->  |   F|   g|23sd|   a|
|   F| fgf|  45|   d|       |   F|  sd| 3h1|   d|
|   F|  sd| 3h1|   d|       +----+----+----+----+
+----+----+----+----+
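You can use the filter method on Spark's DataFrame API:
# Keep the rows whose col1 is "F"; don't call collect() yet, since it returns a list of Rows rather than a DataFrame
df_filtered = df.filter(df.col1 == "F")
Column expressions also support regex through rlike. Because rlike matches anywhere in the string, [a-zA-Z0-9]+ alone would keep 45 as well; to require both letters and digits (assuming col3 is the column to test, as the expected output suggests), anchor the pattern and add lookaheads:
pattern = r"^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]+$"
df_filtered_regex = df_filtered.filter(df_filtered.col3.rlike(pattern)).collect()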
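The regex step can be sanity-checked outside Spark. The sketch below runs two patterns over the col3 values of the ‘F’ rows from the question, using Python's re module. The stricter pattern and the helper name is_mixed are illustrative assumptions, not part of the original answer. Spark's rlike uses Java regular expressions, which also support this lookahead syntax.

```python
import re

# Pattern from the question: matches any value containing an alphanumeric run.
LOOSE = r"[a-zA-Z0-9]+"

# Assumed stricter pattern: the whole value must be alphanumeric and must
# contain at least one letter and at least one digit.
MIXED = r"^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]+$"

# col3 values of the rows where col1 == "F" in the question's dataframe.
col3_values = ["23sd", "45", "3h1"]


def is_mixed(value: str) -> bool:
    """Hypothetical helper: True if value has both letters and digits."""
    return re.search(MIXED, value) is not None


# The loose pattern keeps every value, including the digits-only "45".
loose_kept = [v for v in col3_values if re.search(LOOSE, v)]

# The stricter pattern keeps only the mixed values.
kept = [v for v in col3_values if is_mixed(v)]

print(loose_kept)  # ['23sd', '45', '3h1']
print(kept)        # ['23sd', '3h1']
```

In PySpark, the same pattern string could presumably be passed to rlike on the col3 column after the col1 filter, which would yield the two rows shown in the expected output.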
Answered by Brian Spiering on December 23, 2020