StackOverflowException when big numbers in recursive algorithm. Optimization?

Stack Overflow Asked by Ivan G on August 4, 2020

Task: I need to distribute a sum of packages into containers with different capacity.

Each capacity has restrictions: minimum and maximum package count.

Package count should be equal or less than sum of selected container capacities.

Capacities are stored in database. Packages sum is prompted from user and could be decimal.

Result should be a collection of container capacities for every container.
There is no limit to results count.

I have a recursive solution written in C# but it crashes with StackOverflowException for large package sums.

// returns all combinations of capacity max and min values which package sum could include
static IEnumerable<List<int>> GetCombinations(int[] set, int sum, List<int> values)
{
for (var i = 0; i < set.Length; i++)
{
var left = sum - set[i];
var vals = new List<int>(values);

if (left == 0)
{
yield return vals;
}
else
{
int[] possible = set.Where(n => n <= sum).ToArray();
if (possible.Length > 0)
{
foreach (var s in GetCombinations(possible, left, vals))
{
yield return s;
}
}
}
}
}


Calling code where packageCapacities contains properties Count (which is MaxCount for real) and MinCount:

var allCapacityValues = packageCapacities
.SelectMany(x => Enumerable.Range((int)x.MinCount, (int)x.Count - (int)x.MinCount + 1))
.OrderByDescending(x => x)
.ToArray();
// gets first combination, sort numbers in it and distinct it
var combination = GetCombinations(allCapacityValues, (int)Math.Ceiling(contentData.FactCount), new List<int>())
.Select(x => x.OrderByDescending(o => o))
.Distinct(new EnumerableComparer<int>())
.FirstOrDefault();


where there are two capacities and sum of 13 which is distributed into 3 container sizes.

Reproducible code:

using System;
using System.Collections.Generic;
using System.Linq;

namespace Data.Services
{
public class ContainerGenerationService1
{
public void GenerateContainersWorks()
{
int capacity1Min = 4;
int capacity1Max = 5;
int capacity2Min = 2;
int capacity2Max = 2;
int[] set = Enumerable.Range(capacity1Min, capacity1Max - capacity1Min + 1)
.Concat(Enumerable.Range(capacity2Min, capacity2Max - capacity2Min + 1))
.ToArray();
int sum = 13;

var combination = GetCombinations(set, sum, new List<int>())
.Select(x => x.OrderByDescending(o => o))
.Distinct(new EnumerableComparer<int>())
.FirstOrDefault();
}

public void GenerateContainersFails()
{
int capacity1Min = 3;
int capacity1Max = 9;
int[] set = Enumerable.Range(capacity1Min, capacity1Max - capacity1Min + 1).ToArray();
int sum = 999999;

var combination = GetCombinations(set, sum, new List<int>())
.Select(x => x.OrderByDescending(o => o))
.Distinct(new EnumerableComparer<int>())
.FirstOrDefault();
}

static IEnumerable<List<int>> GetCombinations(int[] set, int sum, List<int> values)
{
for (var i = 0; i < set.Length; i++)
{
var left = sum - set[i];
var vals = new List<int>(values);

if (left == 0)
{
yield return vals;
}
else
{
int[] possible = set.Where(n => n <= sum).ToArray();
if (possible.Length > 0)
{
foreach (var s in GetCombinations(possible, left, vals))
{
yield return s;
}
}
}
}
}

class EnumerableComparer<T> : IEqualityComparer<IEnumerable<T>> where T : IComparable<T>
{
public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
{
if (first == second)
return true;
if ((first == null) || (second == null))
return false;

return new HashSet<T>(first).SetEquals(second);
}

public int GetHashCode(IEnumerable<T> enumerable)
{
return enumerable.OrderBy(x => x)
.Aggregate(1, (current, val) => current + val.GetHashCode());
}
}
}
}


Calling code:

var svc = new ContainerGenerationService1();
svc.GenerateContainersWorks(); // works
svc.GenerateContainersFails(); // fails with StackOverflowException


Stack overflow is by no means the only problem of your code, but lets start there. GetCombinations calls itself recursively. When you get hundreds of thousands call deep, you run out of the stack. You can not use system stack in this case, you need bigger data storage.

You are looking just for one solution here, but the code is obviously written with the intent to return all distinct sets. But you should reconsider the approach. You generate all variations and then pick unique sets and discard the rest. This is very expensive. Like, several orders of magnitude worse. You should generate distinct sets directly. Like, if you have number 6, the next number can be 6 or 5 or 4, but not 7.

The next big problem are situations with no solution. You can find some solution quite fast probably if it exists. But if not, you would loop in many combinations. You can use dynamic programming to solve this. It would tell you which amounts are valid with containers you have a which are not. And you can use it to further improve efficiency of the recursion.

You create new List every time you return from the function. This is safe approach. But you can often just return the same list and modify it. For cases like GetCombinations(...).Count() it is more efficient. Lets put it all together

static IEnumerable<List<int>> GetCombinations(int[] set, int sum)
{
var orderedSet = set.Distinct().OrderByDescending(o => o).ToArray();

bool[] valid = new bool[sum + 1];
valid[0] = true;
for (int i = 0; i < sum; ++i)
{
if (valid[i])
{
for (int j = 0; j < orderedSet.Length; ++j)
{
int next = i + orderedSet[j];
if (next < valid.Length)
{
valid[next] = true;
}
}
}
}

if (!valid[sum])
{
return new List<int>[0]; //no solution
}

return GetCombinationsRecurse(orderedSet, sum, new List<int>(), valid, 0);
//return GetCombinationsNoRecurse(orderedSet, sum, valid);
}

static IEnumerable<List<int>> GetCombinationsRecurse(int[] set, int sum,
List<int> values, bool[] valid, int setIterator)
{
for (var i = setIterator; i < set.Length; i++)
{
var left = sum - set[i];
if (left < 0 || !valid[left])
{
continue;
}

if (left == 0)
{
yield return values;
}
else
{
foreach (var s in GetCombinationsRecurse(set, left, values, valid, i))
{
yield return s;
}
}
values.RemoveAt(values.Count - 1);
}
}


I gave here the recursive version because it matches your original code and is easier to follow. But the stack overflow for big numbers obviously remains. Recursive function can always be rewritten into non recursive version. You need to use some data structure instead of the system stack. Either stack, or array or whatever. But it tends to be not pretty

static IEnumerable<List<int>> GetCombinationsNoRecurse(int[] set, int sum, bool[] valid)
{
List<int> sums = new List<int>() { 0 };
List<int> setIterators = new List<int>() { 0 };
int iter = 0;
List<int> values = new List<int>() { set[iter] };

while (true)
{
int actSum = sums[iter] + values[iter];
int left = sum - actSum;
if (left == 0)
{
yield return values;
}

if (left <= 0 || !valid[left])
{
while (++setIterators[iter] >= set.Length)
{
if (--iter < 0) { yield break; }
values.RemoveAt(values.Count - 1);
}
values[iter] = set[setIterators[iter]];
continue;
}

{ // left > 0
if (sums.Count > iter + 1)
{
sums[iter + 1] = actSum;
setIterators[iter + 1] = setIterators[iter];
}
else
{
}

iter++;
}
}
}


Correct answer by Antonín Lejsek on August 4, 2020