TransWikia.com

Import Fortran unformatted binary

Mathematica Asked by kale on April 1, 2021

I have an unformatted binary file generated using the Compaq Visual Fortran compiler (big endian).

Here’s what the little bit of documentation states about it:

The binary file is written in a general format consisting of data arrays, headed by a descriptor record:

  1. An 8-character keyword which identifies the data in the block.
  2. A 4-byte signed integer defining the number of elements in the block.
  3. A 4-character keyword defining the type of data. (INTE, REAL, LOGI, DOUB, or CHAR)

The header items are read in as a single record. The data follows the descriptor on a new record. Numerical arrays are divided into block of up to 1000 items. The physical record size is the same as the block size.

Additional keyword info:

  1. SEQHDR – 1 item – INTE – Sequence header, with data value. If number is present it is an encoded integer corresponding to the time the file was created.
  2. MINISTEP – 1 item – INTE – Ministep number is essentially the data number (ex: psi on day 1)
  3. PARAMS – n items – REAL – Vector parameter at ministep value.

Attempts to read such data into Mathematica including Import

data=Import["file", "Binary", ByteOrdering -> +1];
data = FromCharacterCode[data]

and OpenRead

OpenRead["file", BinaryFormat -> True]

enter image description here

show me some identifiable text, but no useful numerical values.

A file in question is available here.

Is Mathematica able to parse this file type, and if so, what is the best way?

2 Answers

The file appears to be a Unified Summary File from the Schlumberger Eclipse Reservoir Simulator. This file format uses Compaq Visual Fortran variable length record encoding.

Mathematica does not offer any built-in functionality to read this file format, so we will have to parse it ourselves.

We start by defining a convenience function to read big-endian binary data from a file:

read[s_, t_] := BinaryRead[s, t, ByteOrdering -> +1]

Logical records in Eclipse files come in two parts: the header and the data. The following function reads the header:

readEclHeader[s_] :=
  read[
    s
  , {"Integer32"
    , Sequence@@ConstantArray["Character8", 8]
    , "Integer32"
    , Sequence@@ConstantArray["Character8", 4]
    , "Integer32"
    }
  ] /. {EndOfFile, ___} :> EndOfFile

The CVF leading and trailing record lengths are skipped, leaving the record type keyword, the number of data elements, and the type of the data elements. Each element type requires special handling:

readEclData[s_, "INTE", n_] := readEclElements[s, "Integer32", 4, n]
readEclData[s_, "REAL", n_] := readEclElements[s, "Real32", 4, n]
readEclData[_, t_, _] := (Message[readEclData::unknowntype, t]; Abort[])

This code only handles the integer (INTE) and real data types (REAL), although it would be easy to extend this to handle the other types as well. readEclElements is used in each case to read the required number of data elements -- which may span multiple variable records:

readEclElements[s_, t_, b_, n_] :=
  Module[{len, next, r}
  , len[] := read[s, "Integer32"]
  ; next[] := (If[r == 0, len[]; r = len[]]; r -= b; read[s, t])
  ; r = len[]
  ; (len[]; #) &@ Table[next[], {n}]
  ]

These helper functions are used to read a complete header/data pair:

readEclRecord[s_] :=
  readEclHeader[s] /. {_, k__String, n_, t__String, _} :>
           {StringJoin[k], readEclData[s, StringJoin[t], n]}

All that remains is to open the file, read all of the records, and close the file:

readEclFile[filename_] :=
  Module[{s = OpenRead[filename, BinaryFormat -> True], r}
  , r = Reap[
          While[readEclRecord[s] /. {EndOfFile -> False, d_ :> (Sow[d]; True)}]
        ][[2, 1]]
  ; Close[s]
  ; r
  ]

Here is readEclFile in action, reading the supplied data file (assuming that file is in the same directory as the notebook):

$file = FileNameJoin[{NotebookDirectory[], "INITIAL-TEST.UNSMRY"}];

readEclFile[$file] // Column

(*
{SEQHDR  ,{-1163229266}}
{MINISTEP,{0}}
{PARAMS  ,{0.,0.,0.,0.,0.,0.,0.,4085.81,4085.81,0.,0.,0.}}
{MINISTEP,{1}}
{PARAMS  ,{1.,0.00273785,3348.6,3468.9,0.,0.,0.,3694.18,3662.5,0.,0.,0.}}
{MINISTEP,{2}}
{PARAMS  ,{4.,0.0109514,3348.6,3468.9,0.,0.,0.,3561.9,3519.26,0.,0.,0.}}
{MINISTEP,{3}}
{PARAMS  ,{11.5,0.0314853,3348.6,3468.9,0.,0.,0.,3422.25,3369.69,0.,0.,0.}}
{MINISTEP,{4}}
{PARAMS  ,{19.,0.0520192,3348.6,3468.9,0.,0.,0.,3343.98,3286.4,0.,0.,0.}}
{SEQHDR  ,{-1163229208}}
{MINISTEP,{5}}
{PARAMS  ,{37.,0.1013,6419.3,6882.3,0.,0.,0.,2591.91,2425.78,0.,0.,0.}}
...
{SEQHDR  ,{-1163228692}}
{MINISTEP,{30}}
{PARAMS  ,{616.,1.68652,1826.6,2386.1,0.,0.,0.,2616.22,2432.4,0.,0.,0.}}
*)

I do not know the time encoding used in the SEQHDR records.

Disclaimer: I have no affiliation with Schlumberger.

Correct answer by WReach on April 1, 2021

To decipher the header structure is not really a question that we should solve here, but I can show you how to read this file in a structured way. In order to assist you in deciphering and to demonstrate the necessary tools I have build the below file and data browser. Everything you need can be found in this short program.

Mathematica graphics

(*Use this out-commented part if you have downloaded the OP's 
original binary file,add the full path to the file name if it isn't 
in the current directory.
file="INITIAL-TEST.UNSMRY";
fbc=FileByteCount[file];
str=OpenRead[file,BinaryFormat->True];
*)

(*Easy demo for those that haven't downloaded the OP's binary 
file:The following part reads the binary in a 'string file'. Data is 
fetched from an image that contains the original data.Comment-out the 
following block if you are using the original downloaded file,and 
have uncommented the preceding block.*)
(*begin demo block*)
fileString = 
  FromCharacterCode@
   ImageData[Import@"http://i.stack.imgur.com/DmCwW.gif", "Byte", Interleaving -> False][[1, 1]];
str = StringToStream[fileString];
fbc = StringLength[fileString];
(*end demo block*)

lineWidth = 30;
lineNum = 20;

Manipulate[
 SetStreamPosition[str, i];
 chars = BinaryReadList[str, "Character8", lineNum lineWidth, ByteOrdering -> +1];
 SetStreamPosition[str, i];
 bytes = BinaryReadList[str, "Byte", lineNum lineWidth, ByteOrdering -> +1];
 SetStreamPosition[str, i + pos[[1]] + (lineNum - (pos[[2]] + 1)/2) lineWidth - 1];
 foundData = BinaryRead[str, dataType, ByteOrdering -> +1];
 Column[{Framed[Style[foundData, 48, FontFamily -> "Arial", Bold], Background -> LightBlue], 
   Grid[Riffle @@ (Partition[#, lineWidth] & /@ {chars, bytes}), 
    Dividers -> {{{True}}, {True, {False, True}}}, 
    Background -> {None, None, 
                    {
                     {2 lineNum - pos[[2]], pos[[1]]} -> Pink, 
                     {2 lineNum - pos[[2]] + 1, pos[[1]]} -> Pink}
                    }
   ]}], 
 {{i, 1, "File Position"}, 1, fbc - lineNum lineWidth, lineWidth}, 
 {{pos, {1, 2 lineNum - 1}, "Data Select"}, {1, 1}, {lineWidth, 2 lineNum}, {1, 2}},
 {dataType, {"Byte", "Character8", "Character16", "Complex64", "Complex128", 
   "Complex256", "Integer8", "Integer16", "Integer24", "Integer32", 
   "Integer64", "Integer128", "Real32", "Real64", "Real128", 
   "TerminatedString", "UnsignedInteger8", "UnsignedInteger16", 
   "UnsignedInteger24", "UnsignedInteger32", "UnsignedInteger64", 
   "UnsignedInteger128"
  }}]

Close with Close[str]; when done.

enter image description here

Answered by Sjoerd C. de Vries on April 1, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP