plot multiple strings from a dataframe

Question

I am trying to read and plot several files that look like the sample below when I read them in Python with pandas read_table. I want to plot the "dev" column against the timestamp.

The timestamp should be composed from the columns mm, dd, time and yyyy.

mm dd time yyyy dev st fault typ
0   Jul 5 2:48:29 2018 aaa STANDBY HRW_FAULT neg
1   Jul 5 2:48:29 2018 aaa  SOFT SWF_FAULT  ack
2   Jul 5 2:48:29 2018 aaa HARDWARE  disable
3   Jul 5 2:50:47 2018 bbb STANDBY HRW_FAULT pos

...............

Aditya · Answer

import datetime
df['DateTime'] = df[['Year', 'Month', 'Day', 'Hour']].apply(lambda s: datetime.datetime(*s), axis=1)

This may raise an error if any component is out of range: the hour must be between 0 and 23, the seconds between 0 and 59, and so on.
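To make that caveat concrete, here is a minimal sketch with made-up numeric columns. The values and the helper name build_datetime are illustrative assumptions, not part of the question's data. An hour of 24 makes datetime.datetime raise ValueError, which the helper turns into a missing value instead of stopping the whole conversion.

```python
import datetime

import pandas as pd

# Hypothetical numeric components; the second row has an invalid hour (24).
df = pd.DataFrame({"Year": [2018, 2018],
                   "Month": [7, 7],
                   "Day": [5, 5],
                   "Hour": [2, 24]})


def build_datetime(row):
    """Return a datetime for one row, or NaT if a component is out of range."""
    try:
        return datetime.datetime(*(int(v) for v in row))
    except ValueError:
        return pd.NaT


df["DateTime"] = df[["Year", "Month", "Day", "Hour"]].apply(build_datetime, axis=1)
```

Whether to drop, fix, or keep such rows as missing depends on the data; the sketch only shows where the error comes from.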

Alternatively, pass pd.to_datetime only the columns that you need to assemble:

In [33]: pd.to_datetime(df[['year', 'month', 'day']])
Out[33]: 
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

From the docs:

pd.to_datetime looks for standard designations of the datetime component in the column names, including:

required: year, month, day
optional: hour, minute, second, millisecond, microsecond, nanosecond
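Applied to the columns in the question, this could look like the sketch below. It assumes the values arrive as strings, that the month is a three-letter English abbreviation, and that time is always H:M:S. The names month_numbers, hms and parts are made up for illustration.

```python
import pandas as pd

# Sample rows shaped like the question's data (strings, as read from a file).
df = pd.DataFrame({"mm": ["Jul", "Jul"],
                   "dd": ["5", "5"],
                   "time": ["2:48:29", "2:50:47"],
                   "yyyy": ["2018", "2018"]})

# Assumed lookup from month abbreviation to month number.
month_numbers = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
                 "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

# Split "H:M:S" into three integer columns labelled 0, 1 and 2.
hms = df["time"].str.split(":", expand=True).astype(int)

# Build a frame whose column names match what pd.to_datetime looks for.
parts = pd.DataFrame({"year": df["yyyy"].astype(int),
                      "month": df["mm"].map(month_numbers),
                      "day": df["dd"].astype(int),
                      "hour": hms[0],
                      "minute": hms[1],
                      "second": hms[2]})

df["timestamp"] = pd.to_datetime(parts)
```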

Another approach: since it's probably a .csv file, you can build the datetime while reading/parsing the file itself. pandas is quite capable here!

In pd.read_csv() this is done by passing the column names as a list ([year, month, day, time]) to the parse_dates parameter, by setting infer_datetime_format to True, or by supplying a custom date_parser (which of these options is available depends on your pandas version)...
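Because the read_csv options for combining date columns have changed between pandas releases, the sketch below takes a slightly different route from the one described above: it reads every column as a string and combines them immediately after reading, using an explicit format. The file content is made up and held in memory via io.StringIO; a real file path would go in its place.

```python
import io

import pandas as pd

# Made-up file content with the same leading columns as the question.
raw = """mm dd time yyyy dev
Jul 5 2:48:29 2018 aaa
Jul 5 2:50:47 2018 bbb
"""

# Read whitespace-separated columns, keeping everything as strings.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype=str)

# Join the pieces into one string per row and parse with an explicit format.
combined = df["dd"] + " " + df["mm"] + " " + df["yyyy"] + " " + df["time"]
df["timestamp"] = pd.to_datetime(combined, format="%d %b %Y %H:%M:%S")
```

Giving the format explicitly avoids relying on pandas guessing it, at the cost of failing on rows that do not match.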

Hope this helps!

JahKnows · Answer

@Aditya's answer is already very good. I just want to contribute an alternative.

pandas pd.to_datetime has a built-in parser that takes datetime strings as input. We can construct strings in a form that it recognizes, such as

5/JUl/2018 T2:48:29

pandas can parse strings in this form without further hints.

Take the following dataframe:

import pandas as pd

data = {'mm': ['JUl', 'Jul', 'Jul', 'Jul'],
        'dd': ['5', '5', '5', '5'],
        'time': ['2:48:29', '2:48:29', '2:48:29', '2:50:47'],
        'yyyy': ['2018', '2018', '2018', '2018']}
df = pd.DataFrame(data)

We can get the desired column by doing

df['formatted_datetime'] = pd.to_datetime(df['dd'] + '/' + df['mm'] + '/' + df['yyyy'] + ' T' + df['time'])

We can also verify the types using

df.dtypes

dd                            object
mm                            object
time                          object
yyyy                          object
formatted_datetime    datetime64[ns]
dtype: object

Plotting the columns

You may find that plotting this df directly raises an error, depending on your matplotlib version: older releases could not handle these values in the plot function, while recent ones accept datetime objects directly. matplotlib also offers a plot_date function, which can be used as

import matplotlib.pyplot as plt

plt.plot_date(df['formatted_datetime'], df['dev'])
plt.show()

For example, we will recreate the same data as above, this time including the dev column, and apply the same datetime conversion.

data = {'mm': ['JUl', 'Jul', 'Jul', 'Jul'],
        'dd': ['5', '5', '5', '5'],
        'time': ['2:48:29', '2:48:29', '2:48:29', '2:50:47'],
        'yyyy': ['2018', '2018', '2018', '2018'],
        'dev': ['aaa', 'aaa', 'aaa', 'bbb']}

df = pd.DataFrame(data)
df['formatted_datetime'] = pd.to_datetime(df['dd'] + '/' + df['mm'] + '/' + 
                                          df['yyyy'] + ' T' + df['time'])

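Now we will convert the categorical values of dev into numerical values. We also keep track of this conversion so that we can use the original names as y-axis tick labels.

# Treat dev as a categorical column
df['dev'] = df['dev'].astype('category')
# Remember which integer code stands for which category name
categorie_codes = dict(enumerate(df['dev'].cat.categories))
# Replace the names by their integer codes
df['dev'] = df['dev'].cat.codes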
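To see what the conversion keeps track of, here is a small standalone sketch on the four dev values alone. The names code_to_label, codes and round_trip are illustrative only.

```python
import pandas as pd

dev = pd.Series(["aaa", "aaa", "aaa", "bbb"]).astype("category")

# Code -> label lookup, built from the category index.
code_to_label = dict(enumerate(dev.cat.categories))

# Integer code for each row.
codes = dev.cat.codes

# Mapping the codes back through the lookup recovers the original labels.
round_trip = codes.map(code_to_label)

print(code_to_label)   # {0: 'aaa', 1: 'bbb'}
print(codes.tolist())  # [0, 0, 0, 1]
```

Because the lookup goes from code to label, its values can be used directly as tick labels at positions 0, 1, and so on.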

Then we can plot

plt.plot_date(df['formatted_datetime'], df['dev'])
plt.yticks(range(len(categorie_codes)), list(categorie_codes.values()))
plt.show()
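For readers on a recent matplotlib, the same figure can be sketched with the plain plot method on an Axes object, which accepts datetime values directly. This is a variation on the approach above, not a replacement for it. The Agg backend line is only there so the sketch runs without a display, and the column name timestamp is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-07-05 02:48:29", "2018-07-05 02:48:29",
                                 "2018-07-05 02:48:29", "2018-07-05 02:50:47"]),
    "dev": ["aaa", "aaa", "aaa", "bbb"],
})

# Encode dev as integer codes and keep the labels for the y axis.
dev = df["dev"].astype("category")
labels = list(dev.cat.categories)

fig, ax = plt.subplots()
ax.plot(df["timestamp"], dev.cat.codes, "o")  # markers only, no connecting line
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
ax.set_xlabel("timestamp")
ax.set_ylabel("dev")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Call plt.show() (with an interactive backend) or fig.savefig(...) afterwards to view or store the figure.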
