Architecture to show the number of results within the filter

Question

I am building a web application connected to a MySQL database that is supposed to work like an e-shop. Users should be able to filter products, and next to each choice I want to display the number of results it would add, like in this picture:

This should work across all parameters (such as all the display, battery or price options). The filter should also show the total number of results for the current selection, updated each time the user changes something in the filter.

I know this is a common feature, but I need to understand the general idea of how it works. These are my ideas:

There might be a tool that periodically calculates and caches, in the background, the number of results for each possible combination.
Or are SELECT COUNT(*) FROM database WHERE <conditions> queries run each time the user checks a field in the filter, once for each possible combination? But that would take too long, right?

I would be grateful for any ideas, especially article links or anything related to this topic.

Andy · Answer

But that would take too long, right?

Have you actually run a benchmark on that? Databases are more capable than you probably think. Quite often the GROUP BY isn't the biggest problem, as long as the database has appropriate indices and the query is constructed properly. We were running a seemingly extremely heavy PostgreSQL query containing multiple aggregations (GROUPING SETS and the like) on a database with tens of millions of records. The database handled it just fine, because the query was well constructed.
A simple design that works up to a point is therefore a plain GROUP BY at the database level, plus a little work on the front end.
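A rough way to answer the benchmark question before building anything: the sketch below fills an in-memory SQLite table (a stand-in for MySQL, so absolute timings will differ) with synthetic products and times a filtered GROUP BY. The table and column names (`products`, `brand_id`, `screen_size_id`, `battery_id`), the value ranges and the row count are assumptions for illustration only.

```python
import random
import sqlite3
import time


def build_db(n_products: int, seed: int = 42) -> sqlite3.Connection:
    """Create an in-memory products table filled with random attribute ids."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, brand_id INTEGER, "
        "screen_size_id INTEGER, battery_id INTEGER)"
    )
    rows = [
        (rng.randint(1, 5), rng.randint(1, 4), rng.randint(1, 3))
        for _ in range(n_products)
    ]
    conn.executemany(
        "INSERT INTO products (brand_id, screen_size_id, battery_id) VALUES (?, ?, ?)",
        rows,
    )
    # Composite index on the filtered column plus the grouped column, so the
    # query can be served from the index instead of scanning the table rows.
    conn.execute("CREATE INDEX idx_screen_brand ON products (screen_size_id, brand_id)")
    conn.commit()
    return conn


def time_brand_counts(conn, screen_size_id, repeats=10):
    """Run the facet query `repeats` times; return the rows and mean seconds."""
    sql = (
        "SELECT brand_id, COUNT(*) FROM products WHERE screen_size_id = ? "
        "GROUP BY brand_id ORDER BY brand_id"
    )
    result = []
    start = time.perf_counter()
    for _ in range(repeats):
        result = conn.execute(sql, (screen_size_id,)).fetchall()
    return result, (time.perf_counter() - start) / max(repeats, 1)


if __name__ == "__main__":
    conn = build_db(100_000)
    counts, seconds = time_brand_counts(conn, screen_size_id=2)
    print(counts)
    print(f"mean time per query: {seconds * 1000:.2f} ms")
```

Running the same measurement against a copy of the real MySQL data is what actually settles the question; this only shows the shape of such a test.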
Given the following filter object structure (JSON representation for simplicity):
{
  "brand": [
    1 // Apple
  ]
}

you could construct a query giving you the following result:
+-------+----+-------+----------+
| type  | id | count | selected |
+-------+----+-------+----------+
| BRAND | 1  | ?     | 1        |
+-------+----+-------+----------+
| BRAND | 2  | 73    | 0        |
+-------+----+-------+----------+
| BRAND | 3  | 113   | 0        |
+-------+----+-------+----------+
| BRAND | 4  | 92    | 0        |
+-------+----+-------+----------+
| BRAND | 5  | 97    | 0        |
+-------+----+-------+----------+

where type is an enum defined in the application layer, which would also have other values such as SCREEN_SIZE or BATTERY_CAPACITY.
Depending on the table structure, retrieving such data may take a few UNIONs of GROUP BY queries, or something even simpler.
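One possible shape of that UNION query, sketched with Python's built-in sqlite3 module standing in for MySQL. It assumes a single hypothetical `products` table with one id column per facet, OR semantics within a facet and AND semantics across facets. Each facet is counted with every filter except its own, so an unselected brand's count is the number of results that choosing it would add. The `FACETS` mapping is an assumption; only its whitelisted column names are interpolated into the SQL, and all ids are bound as parameters.

```python
import sqlite3

# Hypothetical mapping from the application-level enum to table columns.
# Only these whitelisted names are interpolated into SQL; ids are bound.
FACETS = {"BRAND": "brand_id", "SCREEN_SIZE": "screen_size_id"}


def where_clause(filters, exclude):
    """AND across facets, OR (IN) within a facet, skipping `exclude`."""
    parts, params = [], []
    for facet, ids in filters.items():
        if facet == exclude or not ids:
            continue
        placeholders = ", ".join(["?"] * len(ids))
        parts.append(f"{FACETS[facet]} IN ({placeholders})")
        params.extend(ids)
    return (" WHERE " + " AND ".join(parts) if parts else ""), params


def facet_counts(conn, filters):
    """Return (type, id, count, selected) rows, one GROUP BY per facet."""
    selects, params = [], []
    for facet, column in FACETS.items():
        where, where_params = where_clause(filters, exclude=facet)
        selects.append(
            f"SELECT '{facet}' AS type, {column} AS id, COUNT(*) AS count "
            f"FROM products{where} GROUP BY {column}"
        )
        params.extend(where_params)
    # ORDER BY column positions applies to the whole UNION ALL result.
    sql = " UNION ALL ".join(selects) + " ORDER BY 1, 2"
    rows = conn.execute(sql, params).fetchall()
    # The selected flag comes from the filter object, not from the database.
    return [(t, i, c, int(i in filters.get(t, []))) for t, i, c in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, brand_id INTEGER, screen_size_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO products (brand_id, screen_size_id) VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 1), (2, 1), (3, 2)],
    )
    for row in facet_counts(conn, {"BRAND": [1]}):
        print(row)
```

With the toy data above and brand 1 selected, the brand counts ignore the brand filter while the screen size counts are restricted to brand 1.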
This is then very easily transformed into a data representation like this:
{
  "brands": [
    {
      "id": 1, // apple
      "count": ?,
      "selected": true
    },
    {
      "id": 2,
      "count": 73,
      "selected": false
    },
    {
      "id": 3,
      "count": 113,
      "selected": false
    },
    // ...
  ],
  "screenSizes": [
    // ...
  ],
  // ..
}

and displaying the data properly to the user is a job for simple transformers.
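A minimal transformer from the (type, id, count, selected) rows to that nested representation might look like this. The `RESPONSE_KEYS` mapping and its camelCase key names are assumptions modelled on the JSON above; the example input uses `None` where the table shows `?`.

```python
import json

# Hypothetical mapping from the enum stored in the "type" column to response keys.
RESPONSE_KEYS = {
    "BRAND": "brands",
    "SCREEN_SIZE": "screenSizes",
    "BATTERY_CAPACITY": "batteryCapacities",
}


def to_response(rows):
    """Group (type, id, count, selected) rows into one list per facet."""
    response = {key: [] for key in RESPONSE_KEYS.values()}
    for type_, id_, count, selected in rows:
        response[RESPONSE_KEYS[type_]].append(
            {"id": id_, "count": count, "selected": bool(selected)}
        )
    return response


if __name__ == "__main__":
    rows = [("BRAND", 1, None, 1), ("BRAND", 2, 73, 0), ("BRAND", 3, 113, 0)]
    print(json.dumps(to_response(rows), indent=2))
```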
Lowering the database load
By introducing very simple caching, you can easily avoid (I'd say quite a lot of) unnecessary database requests. If you treat filter identifiers as sets, so that:
[1, 2, 3] == [3, 1, 2] // true

which should generally hold when filtering data, the following two filter objects are considered equal:
{
  "brand": [1, 2, 3]
}

{
  "brand": [3, 2, 1]
}

and should therefore lead to the same results (for a certain period of time). That makes introducing a cache for the database results very easy: you simply compute a hash of the filter object and use it as the cache key.
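A minimal sketch of such a key, assuming the filter object is a dict mapping facet names to lists of ids. Sorting and de-duplicating each list gives the set semantics described above, `sort_keys=True` makes the facet order irrelevant, and SHA-256 is just one reasonable hash choice. Dropping empty selections assumes an empty list means "no filter" for that facet; the `facets:` prefix is a hypothetical key namespace.

```python
import hashlib
import json


def filter_cache_key(filters: dict) -> str:
    """Hash a filter object so that set-equal filters share one cache key."""
    # Set semantics: sort and de-duplicate ids, drop empty selections.
    canonical = {facet: sorted(set(ids)) for facet, ids in filters.items() if ids}
    # sort_keys makes the order of facets irrelevant as well.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return "facets:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(filter_cache_key({"brand": [1, 2, 3]}))
    print(filter_cache_key({"brand": [3, 2, 1]}))  # same key as the line above
```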
From then on, it's pretty straightforward. If the key exists in the cache, use the cached result. Otherwise, compute the result in the database, store it in the cache and return it.
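That get-or-compute flow is the cache-aside pattern. Below is a sketch with an in-process dictionary standing in for an external cache such as Redis or Memcached; the TTL models the "for a certain period of time" caveat, and the class name and injectable clock are illustrative assumptions.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache-aside helper; a stand-in for an external cache."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry: no database request
        value = compute()  # miss or expired: ask the database
        self._store[key] = (now + self.ttl, value)
        return value
```

In a request handler this would be called as `cache.get_or_compute(key, lambda: query_counts(filters))`, where `key` is the hash of the filter object and `query_counts` is a hypothetical function running the counting query.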
As application traffic grows, the filters of many users are very likely to match, which is why caching the results this way is beneficial.
Do not overcomplicate things unless you have an actual problem at hand. Once the problem arrives, you can temporarily disable the counting feature (so that it does not slow down your application) and then start thinking about another approach. The problem will quite likely arrive at a time when many aspects of the project are already settled (infrastructure, application architecture, ...). Focusing only on the performance part at that point will make finding a solution faster.

Kain0_0 · Answer

What works?
Honestly, what Amazon has to do to make this work is not what Etsy does, and neither is what a niche web store does.
Where you are on that scale will greatly affect what is reasonable.
At a minimum, you do need to count the items in your database. How fast is that, and what sort of load does it generate? Is that tolerable?
Okay, maybe it's too slow. How about caching? Or a smarter front end that doesn't keep asking each time the page is updated?
Right, so caching isn't working. What about a specialty data structure in which the counts are prepopulated at a fine level? Perhaps it's not even in the DB but in a RAM store somewhere, dynamically updated as products are added or removed, so that it only needs to be rebuilt very occasionally.
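One reading of "prepopulated at a fine level", sketched under assumptions: keep product counts per full attribute combination in RAM, update them as products are added or removed, and sum over combinations to answer any filter. The number of distinct combinations is often much smaller than the number of products, which is what keeps this cheap. The facet names are hypothetical, each product is assumed to have exactly one value per facet, and the usual semantics are assumed (OR within a facet, AND across facets).

```python
from collections import Counter
from typing import Collection, Dict, Tuple

# Hypothetical facet names; each product has exactly one value per facet.
FACETS = ("brand", "screen_size", "battery")


class CombinationCounts:
    """Product counts per full attribute combination, held in RAM.

    Updated incrementally as products are added or removed, so a full
    rebuild from the database is only needed occasionally.
    """

    def __init__(self) -> None:
        self.by_combo: Counter = Counter()

    def _key(self, attrs: Dict[str, int]) -> Tuple[int, ...]:
        return tuple(attrs[f] for f in FACETS)

    def add(self, attrs: Dict[str, int]) -> None:
        self.by_combo[self._key(attrs)] += 1

    def remove(self, attrs: Dict[str, int]) -> None:
        key = self._key(attrs)
        self.by_combo[key] -= 1
        if self.by_combo[key] <= 0:
            del self.by_combo[key]

    def facet_counts(self, filters: Dict[str, Collection[int]]) -> Dict[str, Counter]:
        """Per facet, counts under every active filter except that facet's own."""
        active = {f: set(ids) for f, ids in filters.items() if ids}
        result = {f: Counter() for f in FACETS}
        for combo, n in self.by_combo.items():
            for i, facet in enumerate(FACETS):
                matches_others = all(
                    combo[j] in active[other]
                    for j, other in enumerate(FACETS)
                    if other != facet and other in active
                )
                if matches_others:
                    result[facet][combo[i]] += n
        return result
```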
That's not working? Well then, how about dropping the feature? Is it really the crowd pleaser you think it is? Many web stores do not in fact have it, and they are arguably successful.

Cap Barracudas · Answer

I am not sure, but you might want to do that on the front end using JavaScript. So:

a) fetch all your results

b) filter them on the front end upon request.

I'm not sure whether JavaScript has built-in caching for filters and the like; you could look into that. In any case, if you have around 1,000 products, this should run in no time. And yes, querying the database each time you need a subset seems like a very bad solution to me too.
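The fetch-then-filter idea, sketched in Python to keep the logic language-neutral (in the browser it would be the same loops written in JavaScript). It assumes each product is a dict of facet values and the filter maps facet names to sets of allowed ids; the counts for a facet ignore that facet's own selection, so they show what each extra choice would add, and the total is simply the length of the filtered list.

```python
from collections import Counter
from typing import Dict, List, Set

Product = Dict[str, int]


def filter_products(products: List[Product], filters: Dict[str, Set[int]]) -> List[Product]:
    """AND across facets, OR within a facet; empty selections are ignored."""
    return [
        p for p in products
        if all(p[facet] in ids for facet, ids in filters.items() if ids)
    ]


def counts_for(products: List[Product], filters: Dict[str, Set[int]], facet: str) -> Counter:
    """Counts per value of `facet`, ignoring that facet's own selection."""
    others = {f: ids for f, ids in filters.items() if f != facet}
    return Counter(p[facet] for p in filter_products(products, others))


if __name__ == "__main__":
    products = [{"brand": 1, "screen": 1}, {"brand": 1, "screen": 2}, {"brand": 2, "screen": 1}]
    filters = {"brand": {1}, "screen": set()}
    print("total:", len(filter_products(products, filters)))
    print("brand:", counts_for(products, filters, "brand"))
    print("screen:", counts_for(products, filters, "screen"))
```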
