Mathematica Asked on January 22, 2021
In Python scipy.sparse, there are methods to convert between CSR, CSC, LIL, DOK, etc. implementations of a sparse matrix. What is the most efficient way in Mathematica to construct a mxn
SparseArray
from the LIL data? (inverse of this question)
More specifically, I have a list ll={l1,...,ln}
, where each lv
is of the form {{u1,w1},...}
, which means the matrix has an entry {u,v}->w
. Note that lv
may be empty (zero column). Note that lv
may have repeated entries, which should be summed (solution for this is here). For testing purposes, my cases are similar to the following example (e.g. millionXmillion matrix with 10 entries per column, all from the list R):
m=n=10^6; r=10; R={-1,1};
ll=Table[Transpose@{RandomInteger[{1,m},r],RandomChoice[R,r]},n];
My current solution is:
SetSystemOptions["SparseArrayOptions"->{"TreatRepeatedEntries"->1}];
LIL[ll_,m_,n_] := Module[{l,uu,vv,ww}, l=Length/@ll;
If[Plus@@l==0,Return@SparseArray[{},{m,n}]];
vv=Flatten[Table[ConstantArray[v,l[[v]]],{v,n}],1];
{uu,ww}=Transpose@Flatten[ll,1]; SparseArray[Transpose[{uu,vv}]->ww] ];
AbsoluteTiming[LIL[ll,m,n];]
{5.07803,Null}
Is there a better way? What about parallelization? How could I compile this code? (the matrix entries are integers or rationals)
P.S. Let me just mention that in Python, I haven’t yet found a library for sparse matrices that allows rational number entries (exact fractions). Also, when I set every second column and every second row in a matrix to 0, the scipy.sparse implementation is waaay slower than Mathematica’s SparseArray (by a factor of 100). So I’m incredibly happy we have this data structure implemented in Mathematica in such an efficient way.
You seem to do something wrong because the LIL you provide is more suitable to assemble the transpose of the desired matrix in CRS format (or to assemble the desired matrix in CCS format). Since Mathematica uses CRS, I show you how to assemble the transpose.
First two compiled helper functions:
getColumnIndices = Compile[{{p, _Integer, 1}, {a, _Integer, 2}},
Block[{b, label, newlabel, counter, pointer, n, pos, boolean},
n = Min[Length[p], Length[a]];
b = Table[0, {n}];
counter = 0;
pointer = 0;
label = 0;
pos = 0;
While[pointer < n,
pointer++;
pos = Compile`GetElement[p, pointer];
newlabel = Compile`GetElement[a, pos, 1];
boolean = Unitize[label - newlabel];
counter += boolean;
label += boolean (newlabel - label);
b[[counter]] = label;
];
b[[1 ;; counter]]
],
CompilationTarget -> "C",
RuntimeAttributes -> {Listable},
Parallelization -> True,
RuntimeOptions -> "Speed"
];
getNonzeroValues = Compile[{{p, _Integer, 1}, {a, _Integer, 2}},
Block[{b, label, newlabel, counter, pointer, n, pos, boolean},
n = Min[Length[p], Length[a]];
b = Table[0, {n}];
counter = 0;
pointer = 0;
label = 0;
pos = 0;
While[pointer < n,
pointer++;
pos = Compile`GetElement[p, pointer];
newlabel = Compile`GetElement[a, pos, 1];
boolean = Unitize[label - newlabel];
counter += boolean;
label += boolean (newlabel - label);
b[[counter]] += Compile`GetElement[a, pos, 2];
];
b[[1 ;; counter]]
],
CompilationTarget -> "C",
RuntimeAttributes -> {Listable},
Parallelization -> True,
RuntimeOptions -> "Speed"
];
I am not really happy with them because both tasks can actually be fused into one loop. But since CompiledFunction
s cannot return more than one array and because messing around with unpacked arrays is so expensive, I leave it like this for now.
Here is the interface; CompiledFunction
s don't like empty arrays as input, so I have to clean up first. Unfortunately, this has some extra cost.
LIL2[ll_, m_, n_] := Module[{idx, llclean, orderings, vals, rp, ci},
idx = Pick[Range[Length[ll]], Unitize[Length /@ ll], 1];
llclean = ll[[idx]];
rp = ConstantArray[0, Length[ll] + 1];
orderings = Ordering /@ llclean;
vals = Join @@ getNonzeroValues[orderings, llclean];
With[{data = getColumnIndices[orderings, llclean]},
ci = Partition[Join @@ data, 1];
rp[[idx + 1]] = Length /@ data;
];
rp = Accumulate[rp];
SparseArray @@ {Automatic, {n, m}, 0, {1, {rp, ci}, vals}}
]
Here is how the two methods compare:
m = n = 10^6;
r = 10;
R = {-1, 1};
ll = Table[Transpose@{RandomInteger[{1, m}, r], RandomChoice[R, r]}, n];
A = LIL[ll, m, n]; // AbsoluteTiming // First
B = LIL2[ll, m, n]; // AbsoluteTiming // First
A == Transpose[B]
4.02563
1.81523
True
Correct answer by Henrik Schumacher on January 22, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP