TransWikia.com

Sophisticated read data from .txt file

Mathematica Asked by ATomek on May 27, 2021

I have the following .txt file:

ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
5500
ITEM: BOX BOUNDS pp pp pp
-8.3282525640612670e-02 1.2318704739253342e+01
-5.7499999999999996e-01 1.0925000000000001e+01
-5.5720358308663780e+00 5.0148322477797400e+01
ITEM: ATOMS id element xu yu zu 
1 A  1.074  0.000  4.843 
2 A  0.691  0.000  3.919 
...
ITEM: TIMESTEP
1000
ITEM: NUMBER OF ATOMS
5500
ITEM: BOX BOUNDS pp pp pp
-8.3282525640612670e-02 1.2318704739253342e+01
-5.7499999999999996e-01 1.0925000000000001e+01
-5.5720358308663780e+00 5.0148322477797400e+01
ITEM: ATOMS id element xu yu zu 
1 A  1.074  0.000  4.843 
2 A  0.691  0.000  3.919
...

Based on this file, I would like to create a list following this pattern:

{{{0,1,A,1.074,0.000,4.843},{0,2,A,0.691,0.000,3.919},...},{{1000,1,A,1.074,0.000,4.843},{1000,2,A,0.691,0.000,3.919},...}}

In other words, I would like to take the value that follows the line "ITEM: TIMESTEP", skip the next lines up to "ITEM: ATOMS id element xu yu zu", and then take all rows until the next "ITEM: TIMESTEP" line. This process repeats $n$ times, where $n$ is the number of "TIMESTEP" entries.

I am aware that I can do this by importing the file as a list with Import and then using Table to pick out the desired elements, but that seems to be the least efficient method.

3 Answers

High-performance file handling is not my area of expertise, especially in Mathematica. Here is another solution that uses Python together with ExternalFunction. First you need to install Python from the official site.

Save this code as a Python file (I will save it as "C:file.py"):

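def read_my_format(file):
    # Parser state: 0 = skip, 1 = next line is a timestep, 2 = atom rows
    read = 0
    r1 = []      # timestep values
    r2 = []      # atom rows, one list per timestep
    temp = []    # atom rows of the timestep currently being read
    result = []
    with open(file, 'r') as f:
        for line in f:
            # Strip the newline and trailing spaces before comparing headers
            line = line.strip()
            if line == 'ITEM: TIMESTEP':
                read = 1
                continue

            elif line in ('ITEM: BOX BOUNDS pp pp pp', 'ITEM: NUMBER OF ATOMS'):
                read = 0
                continue

            elif line == 'ITEM: ATOMS id element xu yu zu':
                read = 2
                continue

            if read == 1:
                r1.append(int(line))
                # A new timestep closes the previous block of atom rows
                if temp:
                    r2.append(temp)
                    temp = []
                read = 0

            elif read == 2 and line:
                c = line.split()  # split on any run of whitespace
                c1 = float(c[0])
                c2 = map(float, c[2:])
                temp.append([c1, c[1], *c2])
        r2.append(temp)

    # Prepend each timestep to every atom row of its block
    for i, vs in enumerate(r2):
        temp2 = []
        for v in vs:
            temp2.append([r1[i], *v])
        result.append(temp2)

    return result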

Use the new function introduced in V12:

myFormat = ExternalFunction["Python", ReadString@"C:file.py"]

Now apply it:

myFormat["C:data.txt"];

Note: I tested it on an 11.5 MB data file. Calling Python from Mathematica, reading and parsing the file in Python, and translating the result back took 3.8 seconds. Running the parser in Python alone takes around 0.6 seconds, which means Mathematica spends roughly 3 seconds converting the data to its own data types.
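To check the expected output shape without Mathematica in the loop, here is a condensed, self-contained sketch of the same state-machine idea. The function name parse_dump and the inline sample are illustrative assumptions, not part of the answer above. It also assumes that every atom row has exactly five whitespace-separated fields and that a TIMESTEP section always precedes the first ATOMS section.

```python
import io

def parse_dump(lines):
    """Group atom rows by timestep, prepending the timestep to each row."""
    frames = []      # one list of rows per timestep
    timestep = None
    section = None   # first word after "ITEM:" for the section being read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("ITEM:"):
            section = line[len("ITEM:"):].split()[0]  # TIMESTEP, NUMBER, BOX, ATOMS
            continue
        if section == "TIMESTEP":
            timestep = int(line)
            frames.append([])        # start a new block for this timestep
        elif section == "ATOMS":
            atom_id, element, x, y, z = line.split()
            frames[-1].append([timestep, int(atom_id), element,
                               float(x), float(y), float(z)])
        # rows of any other section (NUMBER OF ATOMS, BOX BOUNDS) are skipped
    return frames

# Hypothetical two-timestep sample in the layout shown in the question
sample = """ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS pp pp pp
-8.3e-02 1.2e+01
-5.7e-01 1.0e+01
-5.5e+00 5.0e+01
ITEM: ATOMS id element xu yu zu 
1 A  1.074  0.000  4.843 
2 A  0.691  0.000  3.919 
ITEM: TIMESTEP
1000
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS pp pp pp
-8.3e-02 1.2e+01
-5.7e-01 1.0e+01
-5.5e+00 5.0e+01
ITEM: ATOMS id element xu yu zu 
1 A  1.074  0.000  4.843 
2 A  0.691  0.000  3.919
"""

frames = parse_dump(io.StringIO(sample))
for frame in frames:
    print(frame)
```

Because the function takes any iterable of lines, an open file object can be passed in place of the io.StringIO wrapper, so the file is read line by line rather than loaded at once.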

You could also register it as an import format.

Correct answer by Ben Izd on May 27, 2021

For creating a custom import format, read this article.

ImportExport`RegisterImport["MyFormat", MyFormat`MyFormatImport]
MyFormat`MyFormatImport[filename_String, options___] := 
 Module[{stream, result, lines, parser1, parser2},
  
  parser1[line_String] := SemanticImportString[line, "Integer", "Rows"];
  parser2[line_String] := 
   First@SemanticImportString[
     StringReplace[line, "  " -> " "], {"Real", "String", "Real", 
      "Real", "Real"}, "Rows", Delimiters -> " "];
  
  stream = OpenRead[filename];
  lines = ReadList[stream, "String"];
  result = 
   Tuples /@ 
    Transpose@{parser1 /@ 
       Part[lines, Flatten@Position[lines, "ITEM: TIMESTEP"] + 1],
      parser2 /@
         lines[[#[[1]] + 1 ;; #[[2]] - 1]] & /@
       Flatten /@ 
        Transpose@{Position[lines, "ITEM: ATOMS id element xu yu zu "],
          Append[Position[lines, "ITEM: TIMESTEP"][[2 ;;]], {0}]}
      };
  
  Close[stream];
  Return[Map[Flatten, #, {2}] &@result]]

Import your file with the defined format:

Import["C:data.txt", "MyFormat"]

(*Out: {{{0, 1., "A", 1.074, 0., 4.843}, {0, 2., "A", 0.691, 0., 3.919}}, {{0, 1., "A", 1.074, 0., 4.843}, {0, 2., "A", 0.691, 0., 3.919}}} *)

Note: ITEM: BOX BOUNDS pp pp pp data should be separated by 1 or 2 spaces.

C:data.txt is your raw data without the "..." placeholder lines.
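The Mathematica parser above collapses double spaces and then splits on a single space, which is where the spacing restriction comes from. In Python, the analogous pattern and a whitespace-tolerant alternative look roughly like this. The sample row is made up for illustration and is not taken from the data file.

```python
# Hypothetical atom row with irregular spacing (3, 2 and 5 spaces between fields)
row = "1 A   1.074  0.000     4.843 "

# Collapse double spaces once, then split on a single space:
# runs longer than two spaces leave empty strings in the result
fragile = row.strip().replace("  ", " ").split(" ")

# str.split() with no argument splits on any run of whitespace
robust = row.split()

print(fragile)
print(robust)
```

With the no-argument split, the field count stays at five regardless of how many spaces or tabs separate the columns.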

Answered by Ben Izd on May 27, 2021

Here is one way to read this. It assumes the path to the data file is stored in the variable name:

name = "d:/tmp/test.txt";
getDat[nam_] := Module[{fil, lin, tab}, fil = OpenRead[nam];
  tab = Reap[
     While[Find[fil, "ITEM: ATOMS"] =!= EndOfFile, 
       While[((lin = ReadLine[fil]) =!= EndOfFile) && (lin = 
            StringSplit[lin]; lin[[2]] == "A"),
         lin = MapAt[ToExpression, lin, {{1}, {3}, {4}, {5}}];
         Sow[lin];];];][[2]];
  Close[fil];
  tab]

getDat[name]


Answered by Daniel Huber on May 27, 2021
