Mathematica Asked by ATomek on May 27, 2021
I have the following .txt file:
ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
5500
ITEM: BOX BOUNDS pp pp pp
-8.3282525640612670e-02 1.2318704739253342e+01
-5.7499999999999996e-01 1.0925000000000001e+01
-5.5720358308663780e+00 5.0148322477797400e+01
ITEM: ATOMS id element xu yu zu
1 A 1.074 0.000 4.843
2 A 0.691 0.000 3.919
...
ITEM: TIMESTEP
1000
ITEM: NUMBER OF ATOMS
5500
ITEM: BOX BOUNDS pp pp pp
-8.3282525640612670e-02 1.2318704739253342e+01
-5.7499999999999996e-01 1.0925000000000001e+01
-5.5720358308663780e+00 5.0148322477797400e+01
ITEM: ATOMS id element xu yu zu
1 A 1.074 0.000 4.843
2 A 0.691 0.000 3.919
...
Based on this file, I would like to create a list with the following pattern:
{{{0,1,A,1.074,0.000,4.843},{0,2,A,0.691,0.000,3.919},...},{{1000,1,A,1.074,0.000,4.843},{1000,2,A,0.691,0.000,3.919},...}}
In other words, I would like to take the value on the line after "ITEM: TIMESTEP", skip the following lines up to "ITEM: ATOMS id element xu yu zu", and then take all rows until the next "ITEM: TIMESTEP" line, prepending the timestep value to each row. This process is repeated $n$ times, where $n$ is the number of timesteps.
I am aware that I can do this by importing the file as a list with Import and then using Table to pick out the desired elements, but that seems to be the least efficient method.
High-performance file handling is not my expertise, especially in Mathematica. Here is another solution that uses Python together with ExternalFunction. First you need to install Python from the official site. Then save this code as a Python file (I will save it as "C:file.py"):
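def read_my_format(file):
    # read: 0 = skip the line, 1 = next line holds the timestep value,
    #       2 = inside an ATOMS block
    read = 0
    r1 = []      # timestep values
    r2 = []      # atom rows, one list per timestep
    temp = []    # atom rows of the timestep currently being read
    result = []
    with open(file, 'r') as f:
        for line in f:
            header = line.strip()  # drops the newline and any trailing spaces
            if header == 'ITEM: TIMESTEP':
                read = 1
                continue
            elif header in ('ITEM: BOX BOUNDS pp pp pp', 'ITEM: NUMBER OF ATOMS'):
                read = 0
                continue
            elif header == 'ITEM: ATOMS id element xu yu zu':
                read = 2
                continue
            if read == 1:
                r1.append(int(line))
                # a new timestep starts: store the rows of the previous one
                if temp:
                    r2.append(temp)
                    temp = []
                read = 0
            elif read == 2:
                c = line.split()  # splits on any run of whitespace
                c1 = float(c[0])
                c2 = map(float, c[2:])
                temp.append([c1, c[1], *c2])
    r2.append(temp)  # rows of the last timestep
    # prepend the timestep value to every row of its block
    for i, vs in enumerate(r2):
        temp2 = []
        for v in vs:
            temp2.append([r1[i], *v])
        result.append(temp2)
    return result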
Use the new function introduced in V12:
myFormat = ExternalFunction["Python", ReadString@"C:file.py"]
Now apply it:
myFormat["C:data.txt"];
Note: I tested it on an 11.5 MB data file, and it took 3.8 seconds to call Python from Mathematica, read and parse the file in Python, and translate the result back. If you only use Python and keep the result in the Python environment, it takes around 0.6 seconds (which means Mathematica takes about 3 seconds to translate the data to its own data types).
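For anyone who wants to check the Python-only timing on their own data, here is a rough sketch, not the code used for the numbers above. It assumes the same file layout as in the question; `iter_timesteps` and `BLOCK` are hypothetical names introduced only for this illustration, and the atom id is kept as an integer rather than converted to a float. It reads the file line by line, yields one timestep at a time, and times the parse with `time.perf_counter` on a tiny two-timestep sample file.

```python
import os
import tempfile
import time


def iter_timesteps(path):
    """Yield (timestep, rows) pairs while reading the file line by line."""
    timestep, rows, state = None, [], "skip"
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if line == "ITEM: TIMESTEP":
                if timestep is not None:
                    yield timestep, rows  # previous block is complete
                timestep, rows, state = None, [], "timestep"
            elif line.startswith("ITEM: ATOMS"):
                state = "atoms"
            elif line.startswith("ITEM:"):
                state = "skip"  # NUMBER OF ATOMS, BOX BOUNDS, ...
            elif state == "timestep":
                timestep, state = int(line), "skip"
            elif state == "atoms" and line:
                atom_id, element, *xyz = line.split()
                rows.append([timestep, int(atom_id), element, *map(float, xyz)])
    if timestep is not None:
        yield timestep, rows  # last block in the file


# One block of the sample from the question; {t} is the timestep value.
BLOCK = """ITEM: TIMESTEP
{t}
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS pp pp pp
-8.3282525640612670e-02 1.2318704739253342e+01
-5.7499999999999996e-01 1.0925000000000001e+01
-5.5720358308663780e+00 5.0148322477797400e+01
ITEM: ATOMS id element xu yu zu
1 A 1.074 0.000 4.843
2 A 0.691 0.000 3.919
"""

# Write a two-timestep sample to a temporary file and time the parse.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(BLOCK.format(t=0) + BLOCK.format(t=1000))

start = time.perf_counter()
result = [rows for _, rows in iter_timesteps(tmp.name)]
elapsed = time.perf_counter() - start
os.remove(tmp.name)

print(result[1][0], f"parsed in {elapsed:.6f} s")
```

Because the generator holds only one timestep in memory at a time, it may also be a reasonable starting point for files that are much larger than the 11.5 MB test file.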
You could also register it as an import format.
Correct answer by Ben Izd on May 27, 2021
For creating a custom import format, read this article.
ImportExport`RegisterImport["MyFormat", MyFormat`MyFormatImport]
MyFormat`MyFormatImport[filename_String, options___] :=
 Module[{stream, result, lines, parser1, parser2},
  (* parser1 reads a timestep value; parser2 reads one atom row *)
  parser1[line_String] := SemanticImportString[line, "Integer", "Rows"];
  parser2[line_String] :=
   First@SemanticImportString[
     StringReplace[line, "  " -> " "],
     {"Real", "String", "Real", "Real", "Real"}, "Rows",
     Delimiters -> " "];
  stream = OpenRead[filename];
  lines = ReadList[stream, "String"];
  (* pair each timestep with the rows between its ATOMS header
     and the next TIMESTEP header (or the end of the file) *)
  result =
   Tuples /@
    Transpose@{
      parser1 /@
       Part[lines, Flatten@Position[lines, "ITEM: TIMESTEP"] + 1],
      parser2 /@ lines[[#[[1]] + 1 ;; #[[2]] - 1]] & /@
       Flatten /@
        Transpose@{
          Position[lines, "ITEM: ATOMS id element xu yu zu "],
          Append[Position[lines, "ITEM: TIMESTEP"][[2 ;;]], {0}]}
      };
  Close[stream];
  Return[Map[Flatten, #, {2}] &@result]]
Import your file with the defined format:
Import["C:data.txt", "MyFormat"]
(*Out: {{{0, 1., "A", 1.074, 0., 4.843}, {0, 2., "A", 0.691, 0., 3.919}}, {{0, 1., "A", 1.074, 0., 4.843}, {0, 2., "A", 0.691, 0., 3.919}}} *)
Note: ITEM: BOX BOUNDS pp pp pp data should be separated by 1 or 2 spaces. C:data.txt is your raw data without the ... lines.
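If the rows are parsed in Python instead, the restriction on the number of spaces can be sidestepped by splitting on any run of whitespace. A minimal sketch follows; `parse_atom_row` is a hypothetical helper name used only for this illustration and is not part of the import format above.

```python
def parse_atom_row(line):
    """Split one ATOMS row on any run of whitespace and convert the fields."""
    atom_id, element, *coords = line.split()
    return [int(atom_id), element, *(float(c) for c in coords)]


# The same row with single spaces and with irregular spacing (including a tab).
one_space = parse_atom_row("1 A 1.074 0.000 4.843")
many_spaces = parse_atom_row("1   A  1.074\t0.000    4.843 ")
print(one_space == many_spaces)
```

Calling `str.split()` with no argument treats consecutive spaces and tabs as one separator and ignores leading and trailing whitespace, so no separate space-collapsing step is needed.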
Answered by Ben Izd on May 27, 2021
Here is one way to read this. It is assumed that the path to the data file is stored in the variable name:
name = "d:/tmp/test.txt";
getDat[nam_] := Module[{fil, lin, tab},
  fil = OpenRead[nam];
  tab = Reap[
     While[Find[fil, "ITEM: ATOMS"] =!= EndOfFile,
      While[((lin = ReadLine[fil]) =!= EndOfFile) &&
        (lin = StringSplit[lin]; lin[[2]] == "A"),
       lin = MapAt[ToExpression, lin, {{1}, {3}, {4}, {5}}];
       Sow[lin];];];][[2]];
  Close[fil];
  tab]
getDat[name]
Answered by Daniel Huber on May 27, 2021