Data Science Asked by lucia on June 23, 2021
I am trying to read and plot several files that look like the sample below when I open them in Python using pandas read_table. The columns to plot are "dev" vs. timestamp.
The timestamp should be composed from the columns mm, dd, time and yyyy.
mm dd time yyyy dev st fault typ
0 Jul 5 2:48:29 2018 aaa STANDBY HRW_FAULT neg
1 Jul 5 2:48:29 2018 aaa SOFT SWF_FAULT ack
2 Jul 5 2:48:29 2018 aaa HARDWARE disable
3 Jul 5 2:50:47 2018 bbb STANDBY HRW_FAULT pos
……………
df['DateTime'] = df[['Year', 'Month', 'Day', 'Hour']].apply(lambda s : datetime.datetime(*s),axis = 1)
This may raise an error if any component is out of range: the hour must be between 0 and 23, the minutes and seconds between 0 and 59, and so on.
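For the columns in the question, a row-wise version of the same idea could look like the sketch below. It assumes the month is an English abbreviation such as "Jul" (`%b` is locale-dependent), and the helper name `row_to_datetime` is only illustrative.

```python
import datetime
import pandas as pd

# Sample rows shaped like the question's table (values copied from it).
df = pd.DataFrame({
    "mm": ["Jul", "Jul"],
    "dd": [5, 5],
    "time": ["2:48:29", "2:50:47"],
    "yyyy": [2018, 2018],
})

def row_to_datetime(row):
    # Join the four pieces into one string, e.g. "Jul 5 2018 2:48:29",
    # and parse it with an explicit format.
    text = f"{row['mm']} {row['dd']} {row['yyyy']} {row['time']}"
    return datetime.datetime.strptime(text, "%b %d %Y %H:%M:%S")

df["DateTime"] = df.apply(row_to_datetime, axis=1)
```

An out-of-range value (for example an hour of 25) makes `strptime` raise a `ValueError`, which is the kind of error mentioned above.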
Or try this: pd.to_datetime can assemble a datetime from several columns, and you can pass only the columns that you need to assemble.
In [33]: pd.to_datetime(df[['year', 'month', 'day']])
Out[33]:
0 2015-02-04
1 2016-03-05
dtype: datetime64[ns]
From the docs: pd.to_datetime looks for standard designations of the datetime components in the column names, including:
required: year, month, day
optional: hour, minute, second, millisecond, microsecond, nanosecond
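The question's columns are not in that shape yet: the month is a name and the time is a single string. One possible way to convert them into the numeric columns that pd.to_datetime assembles is sketched below; the `MONTHS` lookup and the `parts` frame are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "mm": ["Jul", "Jul"],
    "dd": [5, 5],
    "time": ["2:48:29", "2:50:47"],
    "yyyy": [2018, 2018],
})

# Hypothetical lookup table from English month abbreviation to month number.
MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Split "h:mm:ss" into three integer columns (labelled 0, 1, 2).
hms = df["time"].str.split(":", expand=True).astype(int)

# Column names must match the designations pd.to_datetime looks for.
parts = pd.DataFrame({
    "year": df["yyyy"],
    "month": df["mm"].map(MONTHS),
    "day": df["dd"],
    "hour": hms[0],
    "minute": hms[1],
    "second": hms[2],
})
df["timestamp"] = pd.to_datetime(parts)
```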
Another approach: since it is probably a .csv file, you can do this while reading/parsing the file itself.
This is done by passing the column names as a nested list (e.g. [['yyyy', 'mm', 'dd', 'time']]) to the parse_dates parameter of pd.read_csv(), optionally together with infer_datetime_format=True or a custom date_parser. Note that in pandas 2.x these options are deprecated in favour of calling pd.to_datetime on the columns after reading.
>...
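Because the read-time options behave differently across pandas versions, the sketch below swaps in a slightly different technique: it reads the columns as plain strings and combines them immediately afterwards with an explicit format. The file content is a made-up, whitespace-separated sample with only the leading columns from the question.

```python
import io
import pandas as pd

# Hypothetical file content: whitespace-separated, same leading columns as the question.
raw = io.StringIO(
    "mm dd time yyyy dev\n"
    "Jul 5 2:48:29 2018 aaa\n"
    "Jul 5 2:50:47 2018 bbb\n"
)

# Read everything as strings so the date pieces can be concatenated safely.
df = pd.read_csv(raw, sep=r"\s+", dtype=str)

# Build one string per row, e.g. "Jul 5 2018 2:48:29", and parse it
# with an explicit format instead of relying on format inference.
combined = df["mm"] + " " + df["dd"] + " " + df["yyyy"] + " " + df["time"]
df["timestamp"] = pd.to_datetime(combined, format="%b %d %Y %H:%M:%S")
```

For a real file, `raw` would be replaced by the file path.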
Hope this helps!
Answered by Aditya on June 23, 2021
@Aditya's answer is already very good. I just want to contribute an alternative.
pd.to_datetime has a built-in parser that accepts datetime strings. We can build one string per row that looks like
5/Jul/2018 T2:48:29
which is a format pandas is able to parse.
Take the following dataframe:
import pandas as pd
data = {'mm': ['Jul', 'Jul', 'Jul', 'Jul'],
        'dd': ['5', '5', '5', '5'],
        'time': ['2:48:29', '2:48:29', '2:48:29', '2:50:47'],
        'yyyy': ['2018', '2018', '2018', '2018']}
df = pd.DataFrame(data)
We can get the desired column by doing
df['formatted_datetime'] = pd.to_datetime(df['dd'] + '/' + df['mm'] + '/' + df['yyyy'] + ' T' + df['time'])
We can also verify the types using
df.dtypes
dd                            object
mm                            object
time                          object
yyyy                          object
formatted_datetime    datetime64[ns]
dtype: object
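If some rows might hold malformed values, a variation on the string concatenation above is to pass an explicit `format` together with `errors="coerce"`, so that unparseable rows become NaT instead of raising. This is only a sketch; the third row is deliberately invalid sample data.

```python
import pandas as pd

# Hypothetical rows; the last one has an invalid time on purpose.
df = pd.DataFrame({
    "mm": ["Jul", "Jul", "Jul"],
    "dd": ["5", "5", "5"],
    "time": ["2:48:29", "2:50:47", "25:61:00"],
    "yyyy": ["2018", "2018", "2018"],
})

text = df["dd"] + "/" + df["mm"] + "/" + df["yyyy"] + " " + df["time"]

# An explicit format avoids format guessing; errors="coerce" yields NaT
# for rows that do not match instead of raising an exception.
df["formatted_datetime"] = pd.to_datetime(
    text, format="%d/%b/%Y %H:%M:%S", errors="coerce"
)

# Rows that could not be parsed, for inspection.
bad_rows = df[df["formatted_datetime"].isna()]
```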
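You may get an error when you try to plot this df, particularly with older Matplotlib versions, which could not handle these column types in the plain plot function. Matplotlib provides the plot_date function for datetime x-values, which can be used as shown here (recent Matplotlib releases deprecate plot_date, and plt.plot accepts datetime values directly):
import matplotlib.pyplot as plt
plt.plot_date(df['formatted_datetime'], df['dev'])
plt.show()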
For example, we will recreate the same data as above with the dev column as well. We will do the same type conversion for the datetime.
data = {'mm': ['Jul', 'Jul', 'Jul', 'Jul'],
        'dd': ['5', '5', '5', '5'],
        'time': ['2:48:29', '2:48:29', '2:48:29', '2:50:47'],
        'yyyy': ['2018', '2018', '2018', '2018'],
        'dev': ['aaa', 'aaa', 'aaa', 'bbb']}
df = pd.DataFrame(data)
df['formatted_datetime'] = pd.to_datetime(df['dd'] + '/' + df['mm'] + '/' +
                                          df['yyyy'] + ' T' + df['time'])
Now we will convert the categorical values of dev into numerical codes. We will also keep track of this conversion so that we can use the original names as y-axis tick labels.
df['dev'] = df['dev'].astype('category')
categorie_codes = dict(enumerate(df['dev'].cat.categories))
df['dev'] = df['dev'].cat.codes
Then we can plot
plt.plot_date(df['formatted_datetime'], df['dev'])
plt.yticks(range(len(categorie_codes)), list(categorie_codes.values()))
plt.show()
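The same plot can also be sketched with Matplotlib's object-oriented interface and plain `plot`, which accepts datetime values in current releases. The `Agg` backend and the axis labels are choices made for this example so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-07-05 02:48:29", "2018-07-05 02:48:29",
                                 "2018-07-05 02:48:29", "2018-07-05 02:50:47"]),
    "dev": ["aaa", "aaa", "aaa", "bbb"],
})

# Encode each device name as an integer code and remember the labels.
dev = df["dev"].astype("category")
labels = list(dev.cat.categories)
codes = dev.cat.codes

fig, ax = plt.subplots()
# Markers only, no connecting line, since dev is categorical.
ax.plot(df["timestamp"], codes, marker="o", linestyle="none")
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
ax.set_xlabel("timestamp")
ax.set_ylabel("dev")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` at the end.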
Answered by JahKnows on June 23, 2021