Data Science Asked by James Arten on January 24, 2021
I’m dealing with a materials science dataset, and my data is organized like this:
Chemical_Formula | Property_name | Property_Scalar |
---|---|---|
He | Electrical conduc. | 1 |
NO_2 | Resistance | 50 |
CuO3 | Hardness | |
... | ... | ... |
CuO3 | Fluorescence | 300 |
He | Toxicity | 39 |
NO2 | Hardness | 80 |
... | ... | ... |
As you can see, it is really messy: the same chemical formula appears more than once throughout the dataset, each time with a different property. My question is, how can I easily split the dataset into smaller ones, or otherwise group every formula with its descriptors in order? I really need help with this… thank you. (I used fictional names and values, just to explain my problem.)
I’m on Jupyter Notebook and I’m using Pandas.
I’m editing my question to make it clearer:
My goal is to plot some histograms of, for example, the number of materials vs. conductivity at different temperatures (100K, 200K, 300K). So I need both conductivity and temperature for each material, so that they are clearly comparable. For example, I guess a more convenient layout would be:
Chemical formula | Conductivity | Temperature |
---|---|---|
He | 5 | 10K |
NO_2 | 7 | 59K |
CuO_3 | 10 | 300K |
... | ... | ... |
He | 14 | 100K |
NO_2 | 5 | 70K |
... | ... | ... |
Given that your DataFrame is:
import pandas as pd

# Long format: one row per (formula, property) pair
df2 = pd.DataFrame({
    "Chemical_Formula": ["He", "NO_2", "CuO3", "CuO3", "He", "NO2"],
    "Property_name": ["Electrical conduc.", "Resistance", "Hardness", "Fluorescence", "Toxicity", "Hardness"],
    "Property_Scalar": [1, 50, 10, 300, 39, 80],
})
| | Chemical_Formula | Property_name | Property_Scalar |
|---|---|---|---|
| 0 | He | Electrical conduc. | 1 |
| 1 | NO_2 | Resistance | 50 |
| 2 | CuO3 | Hardness | 10 |
| 3 | CuO3 | Fluorescence | 300 |
| 4 | He | Toxicity | 39 |
| 5 | NO2 | Hardness | 80 |
You can use pivot to "unmelt" it into a wide format, with one row per formula and one column per property:
df3 = df2.pivot(index="Chemical_Formula", columns="Property_name")
Chemical_Formula | ('Property_Scalar', 'Electrical conduc.') | ('Property_Scalar', 'Fluorescence') | ('Property_Scalar', 'Hardness') | ('Property_Scalar', 'Resistance') | ('Property_Scalar', 'Toxicity') |
---|---|---|---|---|---|
CuO3 | nan | 300 | 10 | nan | nan |
He | 1 | nan | nan | nan | 39 |
NO2 | nan | nan | 80 | nan | nan |
NO_2 | nan | nan | nan | 50 | nan |
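The pivot above keeps a two-level column header because no `values` argument is given. As a minimal sketch of two variations: passing `values="Property_Scalar"` yields flat column names, and `pivot_table` can be used when the same formula/property pair occurs more than once (in which case `pivot` raises a `ValueError`). The duplicate row and the choice of `"mean"` as the aggregation are assumptions made purely for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({
    "Chemical_Formula": ["He", "NO_2", "CuO3", "CuO3", "He", "NO2"],
    "Property_name": ["Electrical conduc.", "Resistance", "Hardness", "Fluorescence", "Toxicity", "Hardness"],
    "Property_Scalar": [1, 50, 10, 300, 39, 80],
})

# Selecting a single values column gives plain (non-tuple) column labels
wide = df2.pivot(index="Chemical_Formula", columns="Property_name", values="Property_Scalar")

# Hypothetical duplicate measurement, added only to illustrate the duplicate case
extra = pd.DataFrame({
    "Chemical_Formula": ["He"],
    "Property_name": ["Toxicity"],
    "Property_Scalar": [41],
})
with_dup = pd.concat([df2, extra], ignore_index=True)

# pivot_table aggregates repeated (formula, property) pairs instead of failing;
# averaging is an arbitrary choice and may not suit every property
wide_mean = with_dup.pivot_table(
    index="Chemical_Formula",
    columns="Property_name",
    values="Property_Scalar",
    aggfunc="mean",
)
```

Whether averaging duplicates is meaningful depends on the property, so it is worth checking for repeated pairs (for example with `df2.duplicated(["Chemical_Formula", "Property_name"])`) before choosing an aggregation.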
From there, you can drop the columns you don't need and plot the rest.
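For the histogram goal described in the question (number of materials vs. conductivity at different temperatures), a rough sketch could bin both quantities and count the rows in each bin. It uses the illustrative values from the question's second table; the helper column names (`Temperature_K`, `T_band`, `cond_bin`) and all bin edges are assumptions, not something taken from the real dataset.

```python
import pandas as pd

# Illustrative values copied from the question's desired layout
df = pd.DataFrame({
    "Chemical_Formula": ["He", "NO_2", "CuO_3", "He", "NO_2"],
    "Conductivity": [5, 7, 10, 14, 5],
    "Temperature": ["10K", "59K", "300K", "100K", "70K"],
})

# Strip the trailing unit so temperatures can be compared numerically
df["Temperature_K"] = df["Temperature"].str.rstrip("K").astype(float)

# Hypothetical temperature bands; bins are right-inclusive, so 100K falls in "0-100K"
df["T_band"] = pd.cut(
    df["Temperature_K"],
    bins=[0, 100, 200, 300],
    labels=["0-100K", "100-200K", "200-300K"],
)

# Hypothetical conductivity bins, also right-inclusive
df["cond_bin"] = pd.cut(
    df["Conductivity"],
    bins=[0, 5, 10, 15],
    labels=["0-5", "5-10", "10-15"],
)

# Rows: temperature band, columns: conductivity bin, cells: number of materials
counts = (
    df.groupby(["T_band", "cond_bin"], observed=False)
    .size()
    .unstack(fill_value=0)
)
```

With matplotlib installed, something like `counts.T.plot.bar()` should then draw one group of bars per conductivity bin, with one bar per temperature band.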
Answered by lytseeker on January 24, 2021