
Import and export a list of SparseArrays efficiently?

Mathematica Asked on October 2, 2021

I have a list of integers dims and a list of SparseArrays bdrs (representing a chain complex $\mathbb{Z}^{d_0}\overset{\partial_1}{\leftarrow}\mathbb{Z}^{d_1}\overset{\partial_2}{\leftarrow}\mathbb{Z}^{d_2}\leftarrow\ldots$).

I wish to import/export such data from/to a file.txt (each line should be a matrix entry). For instance, the data
$$\mathbb{Z}^{2}\xleftarrow{\left[\begin{smallmatrix}5&0&0\\0&6&7\end{smallmatrix}\right]} \mathbb{Z}^{3}\xleftarrow{\left[\begin{smallmatrix}0&8&0&0\\9&0&0&0\\0&0&-1&-2\end{smallmatrix}\right]}\mathbb{Z}^{4}$$
corresponds to a file

2 3 4

1 1 5
2 2 6
2 3 7

1 2 8
2 1 9
3 3 -1
3 4 -2

and $$
\mathbb{Z}^{7}\xleftarrow{0} \mathbb{Z}^{0}\xleftarrow{0} \mathbb{Z}^{5}
\xleftarrow{\left[\begin{smallmatrix}0&0\\0&0\\0&0\\0&0\\0&0\end{smallmatrix}\right]}
\mathbb{Z}^{2}\xleftarrow{\left[\begin{smallmatrix}0&0&0&14\\21&0&0&0\end{smallmatrix}\right]} \mathbb{Z}^{4}$$

corresponds to a file

7 0 5 2 4




1 4 14
2 1 21
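
As a small illustration of this format (a sketch added for clarity, not code from the original post): each line "i j v" of a block corresponds to the rule {i, j} -> v, and the dimensions of the k-th block are the k-th consecutive pair of numbers from the first line. The names d1 and d2 are made up for this example, which rebuilds the first file above.

```mathematica
(* First example file: dims line "2 3 4", then two blocks of "i j v" lines *)
dims = {2, 3, 4};

(* Block 1 is a 2 x 3 matrix, block 2 is a 3 x 4 matrix *)
d1 = SparseArray[{{1, 1} -> 5, {2, 2} -> 6, {2, 3} -> 7}, dims[[1 ;; 2]]];
d2 = SparseArray[{{1, 2} -> 8, {2, 1} -> 9, {3, 3} -> -1, {3, 4} -> -2}, dims[[2 ;; 3]]];

Normal[d1]            (* {{5, 0, 0}, {0, 6, 7}} *)

(* ArrayRules appends a default rule {_, _} -> 0; Most drops it again *)
Most[ArrayRules[d1]]  (* {{1, 1} -> 5, {2, 2} -> 6, {2, 3} -> 7} *)

(* An empty block stands for a zero matrix, e.g. the 5 x 2 map of the second example *)
SparseArray[{}, {5, 2}]
```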

My solution is:

chcxIn[file_] := Module[{s, dims, bdrs = {}, k = 1, i = 1},
  s = Import["/home/" <> file, "List"];
  s = Map[If[# == "", {}, ImportString[#, "Table"][[1]]] &, s];
  dims = s[[1]];
  s = ParallelMap[If[# == {}, {}, #[[;; 2]] -> #[[3]]] &, s[[3 ;;]], {1}];
  (* an empty entry closes the current block of rules *)
  Do[
   If[s[[j]] == {},
    AppendTo[bdrs, SparseArray[s[[i ;; j - 1]], dims[[k ;; k + 1]]]]; k += 1; i = j + 1;],
   {j, Length@s}];
  Return@{bdrs, dims}];
chcxOut[bdrs_, dims_, file_] := Export["/home/" <> file,
   {StringReplace[ToString@dims, {"{" -> "", "}" -> "", "," -> ""}], ""}~Join~
    Flatten[Table[ArrayRules[b][[;; -2]]~Join~{""} /. ({u_, v_} -> w_) :> (ToString[u] <> " " <> ToString[v] <> " " <> ToString[w]), {b, bdrs}], 1]~Join~{""},
   "List"];

However, this is hopelessly inefficient (time and memory wise). For 50MB of data, chcxOut needs 65 seconds and 700MB of RAM. This seems excessive. I wish to deal with files of size 10GB. Is there an efficient way of doing this?


Edit: With the help of @HenrikSchumacher, here is an improvement.

chcxIn[fileName_] := Module[{s = OpenRead[fileName], r (*read*), l (*line*), dims, bdrs = {}, k = 0, e = {}},
  dims = ImportString[Read[s, String], "Table"][[1]];
  r := Read[s, Record, NullRecords -> True];
  Monitor[
   If[s =!= $Failed,
    While[l =!= EndOfFile,
     l = r;
     Which[
      l == "0", ,
      l == "", k += 1; AppendTo[bdrs, SparseArray[e, dims[[k ;; k + 1]]]]; e = {},
      True, l = ImportString[l, "Table"][[1]]; AppendTo[e, l[[1 ;; 2]] -> l[[3]]]];
     ]],
   k];
  Close[s];
  {bdrs, dims}];
chcxOut[bdrs_, dims_, fileName_] := Module[{f = OpenWrite[fileName], w (*write*)},
  w = WriteString[f, ExportString[#, "Table"]] &;
  w@{dims}; WriteString[f, "\n\n"];
  Monitor[
   Do[
    If[Times @@ dims[[k ;; k + 1]] == 0 || bdrs[[k]]["Density"] == 0,
     w@{0},
     w@Join[bdrs[[k]]["NonzeroPositions"], Partition[bdrs[[k]]["NonzeroValues"], 1], 2]];
    WriteString[f, "\n\n"],
    {k, Length@bdrs}],
   k];
  Close[f];];

For a 2MB file, the time and memory usage are:
Export 0.1 sec, 4MB; Import 0.2 sec, 8MB; chcxOut 1.1 sec, 12MB; chcxIn 265 sec, 9MB. As we can see, importing from my custom format is still much slower than the built-in Import. Hopefully, there is a better way to do this.
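
One possible direction for the import side, sketched here under assumptions and not measured: every AppendTo call copies the growing list, and ImportString is invoked once per line, so reading all lines in one call and parsing each block in one call may help. The names chcxInFast and parseBlock are hypothetical. The sketch assumes the file layout described above (dims line, blank line, then blocks that each end with a blank line), treats a lone "0" line as an empty block because that is what chcxOut writes, and takes on trust from the code above that SparseArray accepts a zero dimension such as {7, 0}.

```mathematica
(* Hypothetical helper: turn the "i j v" lines of one block into a SparseArray *)
parseBlock[block_List, dims_List] := Module[{entries = DeleteCases[block, "0"], stream, data},
  If[entries === {},
   SparseArray[{}, dims],
   (* parse all numbers of the block in one call, then regroup them as triples *)
   stream = StringToStream[StringRiffle[entries, "\n"]];
   data = Partition[ReadList[stream, Number], 3];
   Close[stream];
   SparseArray[data[[All, 1 ;; 2]] -> data[[All, 3]], dims]]]

(* Hypothetical replacement for chcxIn; assumes a well-formed file *)
chcxInFast[fileName_String] := Module[{lines, dims, rest, ends, starts, blocks},
  (* NullRecords -> True keeps the blank separator lines as "" *)
  lines = ReadList[fileName, Record, NullRecords -> True];
  dims = ToExpression /@ StringSplit[First[lines]];
  (* drop the dims line and the blank line after it; the extra "" closes an unterminated last block *)
  rest = Append[Drop[lines, 2], ""];
  ends = Flatten[Position[rest, "", {1}]];
  starts = Prepend[Most[ends] + 1, 1];
  (* a span i ;; i - 1 is empty, so consecutive blank lines give empty blocks *)
  blocks = Take[MapThread[rest[[#1 ;; #2 - 1]] &, {starts, ends}], Length[dims] - 1];
  {MapThread[parseBlock, {blocks, Partition[dims, 2, 1]}], dims}]
```

It would be called as {bdrs, dims} = chcxInFast["file.txt"]. Any extra blank lines at the end of the file are ignored, because only the first Length[dims] - 1 blocks are kept.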

One Answer

Something like this should work.

dims = Prepend[(Dimensions /@ bndrs)[[All, 2]], Dimensions[bndrs[[1]]][[1]]];
file = OpenWrite["a.txt"];
WriteString[file, ExportString[{dims}, "Table"]];
Do[
  WriteString[file, "\n\n"];
  WriteString[
   file,
   ExportString[
    Join[A["NonzeroPositions"], Partition[A["NonzeroValues"], 1], 2],
    "Table"
    ]],
  {A, bndrs}];
Close[file]

The result is a human-readable file, so it is not really super compressed.

Answered by Henrik Schumacher on October 2, 2021
