
Specify column data type when importing

Mathematica Asked by KHAAAAAAAAN on October 27, 2020

I’m running into an annoying issue when importing a table of tab-separated data. Several columns are numeric, while several are strings. Using Import[url,"TSV"] basically works perfectly – however, some of the strings are “5d2”, “4e1” or things of that nature, which Import then interprets as scientific notation. For instance, ImportString["4d2", "TSV"] yields {{400.}}, which I do not want. However, there are some columns which are properly in scientific notation (e.g. 2.3e+02) which I do want interpreted as numbers – is there a clean way to selectively import certain table columns as numbers, leaving others as strings?

3 Answers

Without knowing more, I would first say to look at using the "Numeric" -> False option in

Import["data.tsv", "TSV", "Numeric" -> False]

This seems to leave everything as strings. (I've never worked with this functionality until now, and I got the idea from here.)

It also takes care of the scientific-notation misinterpretation, as the following shows (standard output first, then the InputForm):

ImportString[#, "TSV", "Numeric" -> False] & /@ {"4e1", "5d2"}
InputForm@%

{{{4e1}},{{5d2}}}

{{{"4e1"}}, {{"5d2"}}}

Then, once everything is imported as strings, you can change the columns of scientific notation strings to numbers. For example,

data[[;;, column]] = Internal`StringToDouble /@ data[[;;, column]]

(Also stole the Internal`StringToDouble from here.)

All together

data = Import["data.tsv", "TSV", "Numeric" -> False]
data[[-1]] = Internal`StringToDouble /@ data[[-1]];
data

{{"1"}, {"2"}, {"3"}, {"4"}, {"5"}, {"6"}, {"7"}, {"8"}, {"9"}, {"10"}, {"4e1"}, {"5d2"}, {"2.3e+02"}}

{{"1"}, {"2"}, {"3"}, {"4"}, {"5"}, {"6"}, {"7"}, {"8"}, {"9"}, {"10"}, {"4e1"}, {"5d2"}, {230.}}
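The same idea extends to a table with several columns. What follows is only a rough sketch: the sample string raw and the list numericColumns are made up for illustration, and Internal`StringToDouble is an undocumented function, so its behaviour may differ between versions.

```mathematica
(* Hypothetical sample: column 1 must stay a string, column 2 is numeric *)
raw = "4d2\t2.3e+02\n5e1\t-1.3e-05";

(* Import every field as a string *)
data = ImportString[raw, "TSV", "Numeric" -> False];

(* Column indices to convert (an assumption; adjust to match the real file) *)
numericColumns = {2};

(* Convert only those columns; Map at level {2} reaches each individual string *)
data[[All, numericColumns]] =
  Map[Internal`StringToDouble, data[[All, numericColumns]], {2}];

data
```

Listing the numeric columns explicitly keeps the decision in one place, so a string such as "4d2" in any other column is never handed to the number parser.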

Answered by NonDairyNeutrino on October 27, 2020

Suppose your file is like this (two different types of columns for simplicity):

"4d2"    2.3e+02
"5e1"    -1.3e-05

Here I use StringToStream to simulate the file; with a real file, pass its path and name instead (ReadList["path to file", {Word, Number}]), listing the number and types of columns that your file actually has:

ReadList[StringToStream["\"4d2\"\t2.3e+02\n\"5e1\"\t-1.3e-05"], {Word, Number}]

which gives

{{"4d2", 230.}, {"5e1", -0.000013}}
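For a file on disk with more columns, the call might look like the sketch below. The file name readlist-demo.tsv and its three-column contents are invented for the example; the list of types has to match your own columns.

```mathematica
(* Write a small hypothetical three-column, tab-separated file to a temporary location *)
file = FileNameJoin[{$TemporaryDirectory, "readlist-demo.tsv"}];
Export[file, "4d2\t2.3e+02\t105.5\n5e1\t-1.3e-05\t235", "Text"];

(* One type per column: Word keeps the field as a string, Number parses it *)
rows = ReadList[file, {Word, Number, Number}];

(* Inspect which entries remained strings *)
Map[StringQ, rows, {2}]
```

One caveat: with the default WordSeparators, Word splits on spaces as well as tabs, so a string column containing spaces would need a different setting.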

Answered by Alx on October 27, 2020

If your file.tsv has data such as:

"4d2" 2.3e+02 105.5

"5e1" -1.3e05 235

Then SemanticImport may help:

data = SemanticImport["file.tsv", {"String", "Number", "Number"}, "HeaderLines" -> 0]

Set HeaderLines appropriately to reflect the presence of header row(s) in your file.
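To try this without a file, SemanticImportString takes the same type specification. This is a minimal sketch with made-up sample data; the explicit Delimiters setting and the "Rows" output form are choices made here for illustration, not something your file necessarily needs.

```mathematica
(* Hypothetical tab-separated sample with no header row *)
raw = "4d2\t2.3e+02\t105.5\n5e1\t-1.3e-05\t235";

(* "Rows" asks for a plain list of rows instead of a Dataset *)
rows = SemanticImportString[raw, {"String", "Number", "Number"}, "Rows",
   Delimiters -> "\t", HeaderLines -> 0];

(* Check that the first column was kept as strings *)
Map[StringQ, rows, {2}]
```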

Answered by Lee on October 27, 2020
