Data Science Asked on March 24, 2021
Overall context:
I have a data frame that contains observations every five minutes, starting at 5 AM and ending at 8 PM, for several days. I need to keep only the observations from 9 AM to 5 PM for each day.
The input data frame looks like this:
Date Time
2019-09-20 05:00:00,..,..
2019-09-20 05:05:00,..,..
...
2019-09-20 09:00:00,..,..
...
2019-09-20 17:00:00,..,..
2019-09-20 17:05:00,..,..
...
2019-09-20 20:00:00,..,..
2019-09-21 05:00:00,..,..
2019-09-21 05:05:00,..,..
...
2019-09-21 09:00:00,..,..
...
2019-09-21 17:00:00,..,..
2019-09-21 17:05:00,..,..
...
2019-09-21 20:00:00,..,..
and the output data frame should look like this:
2019-09-20 09:00:00,..,..
...
2019-09-20 17:00:00,..,..
2019-09-21 09:00:00,..,..
...
2019-09-21 17:00:00,..,..
Steps taken so far
To extract the rows between 9 AM and 5 PM, I computed the number of seconds since midnight for every row by extracting the hours, minutes and seconds with vectorized operations, so the input data frame now has a column like this:
Date Time, Number of seconds since midnight
2019-09-20 05:00:00,xxxx,..,..
2019-09-20 05:05:00,yyyy,..,..
...
2019-09-21 05:00:00,xxxx,..,..
2019-09-21 05:05:00,yyyy,..,..
Note that for the same time on every day, the number of seconds will remain the same
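A minimal sketch of the seconds-since-midnight column described above, assuming the 'Date Time' column has already been parsed as datetimes. The four sample rows are hypothetical and only illustrate that the same clock time gives the same value on different dates:

```python
import pandas as pd

# Hypothetical sample: the same two clock times on two different days
df = pd.DataFrame({
    "Date Time": pd.to_datetime([
        "2019-09-20 05:00:00",
        "2019-09-20 09:00:00",
        "2019-09-21 05:00:00",
        "2019-09-21 09:00:00",
    ])
})

# Vectorized: combine hour, minute and second into seconds since midnight
ts = df["Date Time"]
df["Number of seconds since midnight"] = (
    ts.dt.hour * 3600 + ts.dt.minute * 60 + ts.dt.second
)
print(df)
```

Because the date part never enters the calculation, rows from different days with the same time share one value in this column.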
Now I was hoping to extract all the rows between 9 AM and 5 PM with
df[(df['Number of seconds since midnight'] > 9 * 3600) & (df['Number of seconds since midnight'] < 17 * 3600)]
but I get the rows between 9 AM and 5 PM from the last date only.
To me, it looks like it is ignoring all the duplicate rows with the same time.
Can anyone suggest a solution that does not iterate over each row and uses vectorized operations, as the data set is very large?
I think you have defined midnight as today's 00:00, so the rows from earlier dates fall outside your range.
I think this may work in this case:
import pandas as pd

# Convert string to datetime format
df['Date Time'] = pd.to_datetime(df['Date Time'])
# Minutes since midnight for each row; the date part plays no role
minutes = df['Date Time'].dt.hour * 60 + df['Date Time'].dt.minute
selected_rows = df[(minutes >= 9 * 60) & (minutes <= 17 * 60)]
The filter rule uses only the time of day and ignores the date.
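As a self-contained check of a time-only filter like the one above, here is a sketch on hypothetical data (two days of five-minute rows from 05:00 to 20:00; the `value` column is made up for illustration). It also compares the result with pandas' `DataFrame.between_time`, which is an alternative not mentioned in the original answer and requires a datetime index:

```python
import pandas as pd

# Hypothetical data: one row every five minutes, 05:00-20:00, on two days
day1 = pd.date_range("2019-09-20 05:00", "2019-09-20 20:00", freq="5min")
day2 = pd.date_range("2019-09-21 05:00", "2019-09-21 20:00", freq="5min")
stamps = day1.append(day2)
df = pd.DataFrame({"Date Time": stamps, "value": range(len(stamps))})

# Time-only mask: minutes since midnight, independent of the date
minutes = df["Date Time"].dt.hour * 60 + df["Date Time"].dt.minute
by_mask = df[(minutes >= 9 * 60) & (minutes <= 17 * 60)]

# Alternative: between_time on a DatetimeIndex (both endpoints included by default)
by_between = (
    df.set_index("Date Time").between_time("09:00", "17:00").reset_index()
)

print(len(by_mask), by_mask["Date Time"].dt.date.nunique())
```

Both approaches keep the 09:00 and 17:00 rows for every day in the data, not only the last one.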
Answered by Felix Chan on March 24, 2021