TransWikia.com

Function arguments are taking way too much RAM

Mathematica Asked on October 22, 2021

I have to compute a function “func” with 24 arguments. Unfortunately when I apply the arguments the RAM fills up immediately. I even change the swap size of my Linux system to 100 G.

I hope somebody can help me to implement this more efficiently

Outer[func[##]&,{q,-q},{q},DeleteDuplicates[momentum[q,p1,p2,p3,p4,p5]],{p1},DeleteDuplicates[momentum[q,p1,p2,p3,p4,0]],{p2},DeleteDuplicates[momentum[q,p1,p2,p3,0,0]],{p3},DeleteDuplicates[momentum[q,p1,p2,0,0,0]],{p4},DeleteDuplicates[momentum[q,p1,0,0,0,0]],{p5},{0},{0},{0},{0},{0},{0},{-5,-4,-3,-2,-1,0,1,2,3,4,5},{-5,-4,-3,-2,-1,0,1,2,3,4,5},{-5,-4,-3,-2,-1,0,1,2,3,4,5},{-5,-4,-3,-2,-1,0,1,2,3,4,5},{-5,-4,-3,-2,-1,0,1,2,3,4,5},{-5,-4,-3,-2,-1,0,1,2,3,4,5}]];

with

momentum[q_,p1_,p2_,p3_,p4_,p5_]:=Flatten@Outer[{q,p1,p2,p3,p4,p5}.{##}&,{1,-1},{1,-1},{1,-1},{1,-1},{1,-1},{1,-1}]

I’ve found a method called lazy lists (see https://github.com/ssmit1986/lazyLists) but need help to implement it for my example. Can anybody help me out with this.

2 Answers

Nice to see that someone found my lazy list package. Like the others said: the sheer amount of data involved here is going to make this quite difficult because even if you solve the memory problem it's still going to take forever. What's even worse is that your list of arguments is symbolic, so packed arrays are out. I'd highly recommend trying to re-formulate this problem purely in terms of integers because that would make things a lot more efficient. Regardless, I'd like to help a little bit by showing how to do this with my package in principle, even if I'm a bit late to the party.

First of all, we need to generate a list of all the tuples of arguments. This can be done with lazyTuples. Let's first put all the arguments to Outer in one list:

list={{q,-q},{q},DeleteDuplicates[momentum[q,p1,p2,p3,p4,p5]],{p1},DeleteDuplicates[momentum[q,p1,p2,p3,p4,0]],{p2},DeleteDuplicates[momentum[q,p1,p2,p3,0,0]],{p3},DeleteDuplicates[momentum[q,p1,p2,0,0,0]],{p4},DeleteDuplicates[momentum[q,p1,0,0,0,0]],{p5},{0},{0},{0},{0},{0},{0},{-5,-4,-3,-2,-1,0,1,2,3,4,5},{-5,-4,-3,-2,-1,0,1,2,3,4,5},{-5,-4,-3,-2,-1,0,1,2,3,4,5},{-5,-4,-3,-2,-1,0,1,2,3,4,5},{-5,-4,-3,-2,-1,0,1,2,3,4,5},{-5,-4,-3,-2,-1,0,1,2,3,4,5}};

Next, define the lazy version of the Tuples of this list:

<< lazyLists`;
lzTuples = lazyTuples[Hold[list], "PartitionSize" -> 100000 (*or however much you want to handle in a single go *)];

In principle, your answer is now defined by:

Map[Apply[func], lzTuples]

or (slightly more efficient):

Map[{func @@@ #&, Listable}, lzTuples]

The second form skips an inner map that maps Apply[func] over a batch of tuples. Depending on the nature of func, you could try

Map[{ParallelMap[Apply[func], #] &, Listable}, lzTuples]

to parallelize the inner maps over the batches. It's not guaranteed to speed things up, however. You should only do this if func is non-trivial.

Since you probably want to store the computed results on disk while you're iterating over the tuples, you'll want to add an Export step as well. For example:

output = Module[{i = 1},
 Map[
   {Function[Export["file" <> ToString[i++] <> ".m", func @@@ #]], Listable},
   lzTuples
 ]
]

Alternatively you can keep writing to the same file with PutAppend or Write or something like that.

After that, you should be able to iterate over the whole thing with:

output[[-1]]

or

LengthWhile[output, True&]

I do recommend trying this code with a smaller example first, obviously. The example notebook on the repository also has some examples of how to use the function bulkExtractElementsUsingIndexList to improve the performance of lazyTuples. Basically, instead of generating tuples of list, you generate tuples of integers that represent indices of the sublists of list instead. That should be a little faster since integer arrays can be packed.

Answered by Sjoerd Smit on October 22, 2021

The problem here is that the exhibited Outer expression generates a 24-dimensional tensor that contains nearly 4 x 10^12 elements at the deepest level. There is no practical way to operate upon a structure of this size. Even simply iterating over the deepest elements will require a significant amount of time.

For discussion purposes, let us consider a smaller set of basis vectors when forming this outer product:

$small =
  { {q,-q}
  , {p1, p2, p3}
  , {4,5}
  };

For this small example, it is feasible to evaluate the Outer expression:

Outer[func[##]&, Sequence @@ $small]

(*
{ { {func[q, p1, 4], func[q, p1, 5]}
  , {func[q, p2, 4], func[q, p2, 5]}
  , {func[q, p3, 4], func[q, p3, 5]}
  }
, { {func[-q, p1, 4], func[-q, p1, 5]}
  , {func[-q, p2, 4], func[-q, p2, 5]}
  , {func[-q, p3, 4], func[-q, p3, 5]}
  }
}
*)

The result is a 2 x 3 x 2 tensor. It is challenging to process such structures incrementally at anything other than the deepest level (the func[...] elements), especially for gigantic tensors. So we will focus upon generating the individual elements without regard to preserving the overall tensor structure.

We will define tupleSpigot, a facility for generating the individual tuples that comprise the deepest tensor level:

ClearAll[tupleSpigot]

tupleSpigot[l:{{__}..}] :=
  With[{ns = Length/@l}, tupleSpigot[l, MixedRadix[ns], Times@@ns, Length@l]]

Length[tupleSpigot[_, _, n_, _]] ^:= n

tupleSpigot[l_, r_, n_, c_][[i_]] ^:=
  MapThread[#[[#2]]&, {l, 1+IntegerDigits[i-1, r, c]}]

We can use this to represent all combinations of basis vectors from $small:

$s = tupleSpigot[$small]

(* tupleSpigot[{{q, -q}, {p1, p2, p3}, {4, 5}}, MixedRadix[{2, 3, 2}], 12, 3] *)

Note that this representation is relatively small. In particular it does not explicitly contain all of the generated tuples. But we can see that there are 12 elements as we saw from the earlier Outer expression:

Length[$s]
(* 12 *)

We can generate the tuples individually on demand, so that the whole structure is never materialized into memory all at once:

Do[Print[$s[[i]]], {i, 1, Length@$s}]

(*
  {q,p1,4}
  {q,p1,5}
  {q,p2,4}
  {q,p2,5}
  {q,p3,4}
  {q,p3,5}
  {-q,p1,4}
  {-q,p1,5}
  {-q,p2,4}
  {-q,p2,5}
  {-q,p3,4}
  {-q,p3,5}
*)

So, now let's try the same trick with the original full set of basis vectors from the question:

momentum[q_,p1_,p2_,p3_,p4_,p5_] := 
  Flatten@Outer[{q,p1,p2,p3,p4,p5}.{##}&,{1,-1},{1,-1},{1,-1},{1,-1},{1,-1},{1,-1}]

$original =
{ {q,-q}
, {q}
, DeleteDuplicates[momentum[q,p1,p2,p3,p4,p5]]
, {p1}
, DeleteDuplicates[momentum[q,p1,p2,p3,p4,0]]
, {p2}
, DeleteDuplicates[momentum[q,p1,p2,p3,0,0]]
, {p3}
, DeleteDuplicates[momentum[q,p1,p2,0,0,0]]
, {p4}
, DeleteDuplicates[momentum[q,p1,0,0,0,0]]
, {p5}
, {0}
, {0}
, {0}
, {0}
, {0}
, {0}
, {-5,-4,-3,-2,-1,0,1,2,3,4,5}
, {-5,-4,-3,-2,-1,0,1,2,3,4,5}
, {-5,-4,-3,-2,-1,0,1,2,3,4,5}
, {-5,-4,-3,-2,-1,0,1,2,3,4,5}
, {-5,-4,-3,-2,-1,0,1,2,3,4,5}
, {-5,-4,-3,-2,-1,0,1,2,3,4,5}
};

$o = tupleSpigot[$original];

How many tuples does this generate?

Length@$o

(* 3715232694272 *)

Yikes, that is a big number. If each element were 1000 bytes long, then this would require 3715 terabytes of memory to store.

But we can access the elements individually. For brevity, we will only show the first and last three elements from this monstrous set:

Do[Print[$o[[i]]], {i, 1, 3}]

(*
{q,q,p1+p2+p3+p4+p5+q,p1,p1+p2+p3+p4+q,p2,p1+p2+p3+q,p3,p1+p2+q,p4,p1+q,p5,0,0,0,0,0,0,-5,-5,-5,-5,-5,-5}
{q,q,p1+p2+p3+p4+p5+q,p1,p1+p2+p3+p4+q,p2,p1+p2+p3+q,p3,p1+p2+q,p4,p1+q,p5,0,0,0,0,0,0,-5,-5,-5,-5,-5,-4}
{q,q,p1+p2+p3+p4+p5+q,p1,p1+p2+p3+p4+q,p2,p1+p2+p3+q,p3,p1+p2+q,p4,p1+q,p5,0,0,0,0,0,0,-5,-5,-5,-5,-5,-3}
*)

Do[Print[$o[[i]]], {i, 3715232694270, 3715232694272}]

(*
{-q,q,-p1-p2-p3-p4-p5-q,p1,-p1-p2-p3-p4-q,p2,-p1-p2-p3-q,p3,-p1-p2-q,p4,-p1-q,p5,0,0,0,0,0,0,5,5,5,5,5,3}
{-q,q,-p1-p2-p3-p4-p5-q,p1,-p1-p2-p3-p4-q,p2,-p1-p2-p3-q,p3,-p1-p2-q,p4,-p1-q,p5,0,0,0,0,0,0,5,5,5,5,5,4}
{-q,q,-p1-p2-p3-p4-p5-q,p1,-p1-p2-p3-p4-q,p2,-p1-p2-p3-q,p3,-p1-p2-q,p4,-p1-q,p5,0,0,0,0,0,0,5,5,5,5,5,5}
*)

We might have overcome the memory problem, but if we want to process all those elements it is going to take a very long time. Somewhat optimistically, let's assume that we are running 256 parallel kernels and each kernel can process 10,000 elements per second. How many days will it take to perform the scan?

Length@$o / 256 / 10000 / 60 / 60 / 24 // N

(* 16.797 *)

If our expensive hardware doesn't die after pegging the CPUs at 100% for two straight weeks then maybe this is in the realm of possible. But it is certainly an expensive undertaking.

It would be fruitful look for an algorithmic solution to this problem. Will the domain allow us to reduce the dimensionality of the problem space so that we do not have to process so many elements? Can the we solve our problem, perhaps only probabilistically, by sampling a small subset of the large space?

Answered by WReach on October 22, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP