Computer Science Asked by Pragya on November 11, 2021
Is there any possibility of improvement in the result of count-min-sketch algorithm if we will use Map Reduce approach?
Improvement in performance can be in terms of accuracy, time complexity or the work done needed.
yes. section 3.2 from here.
A key feature of the operations which manipulate the sketch is that they are largely oblivious to the current state of the data structure. That is, the updates to the sketch do not require inspection of the current state. This means that data structure is highly suitable for parallelization and distributed computation. Firstly, each row of the sketch is updated independently of others, so the sketch can be partitioned row-wise among threads on a single machine. But more than this, one can build sketches of different subsets of the data (after agreeing on the same parameters w, d and set of hash functions to use), and these sketches can be combined to give the sketch of the union of the data. Sketch combination is straightforward: given sketch arrays of size w × d, they are combined by summing them up, entry-wise. This implies that sketches can be a useful tool in large scale data analysis, within a distributed model such as MapReduce. Each machine can build and “emit” a sketch of its local data, and these can then be combined at a single machine (i.e. a single reducer simply sums up all the sketches, entry-wise) to generate the sketch of a potentially huge collection of data. This approach can be dramatically more efficient in terms of network communication (and hence time and other resources) than the solution of computing exact counts for each item, and filtering out the low counts.
Answered by Felipe on November 11, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP