
Best topology for joining data from multiple sensors

Stack Overflow Asked by utxeee on October 23, 2020

I have n sensors generating measurements every t minutes to their own topic as follows:

Topic_1: {timestamp: 1, measurement: 1}, {timestamp: 2, measurement: 4}, ...

Topic_2: {timestamp: 1, measurement: 5}, {timestamp: 2, measurement: 3}, ...
 
Topic_n: {timestamp: 1, measurement: 3}, {timestamp: 2, measurement: 5}, ...

The number of sensors is dynamic, but for the sake of simplicity let’s assume I have 3 sensors and therefore 3 topics receiving data every t minutes.

What is the best topology for joining all measurements with the same timestamp as shown below?

{timestamp: 1, measurement: 1} 
{timestamp: 1, measurement: 5}  --------> {timestamp: 1, measurements: [1,5,3]}
{timestamp: 1, measurement: 3}

2 Answers

TANSTAAFL: There Ain't No Such Thing As A Free Lunch

Each tradeoff applies in different situations.

I recommend writing a stupidly simple service first, like an in-memory default dictionary. Something simple and slow can validate that your tests work, and it can sometimes be run in parallel with the real system to check that your more complex algorithms produce the same results.
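As a rough illustration of that kind of baseline, here is a minimal Python sketch built on collections.defaultdict. The function name join_by_timestamp, the (sensor_id, reading) input shape, and the n_sensors parameter are assumptions made for this example, not something from the question. It keeps everything in memory and never expires incomplete timestamps, so it is only meant as a reference to test against.

```python
from collections import defaultdict


def join_by_timestamp(readings, n_sensors):
    """Group readings by timestamp and emit a joined record once
    every sensor has reported for that timestamp.

    `readings` is an iterable of (sensor_id, reading) pairs, where
    `reading` looks like {"timestamp": 1, "measurement": 5}.
    (This input shape is an assumption for the sketch.)
    """
    pending = defaultdict(dict)  # timestamp -> {sensor_id: measurement}
    joined = []
    for sensor_id, reading in readings:
        ts = reading["timestamp"]
        pending[ts][sensor_id] = reading["measurement"]
        if len(pending[ts]) == n_sensors:
            by_sensor = pending.pop(ts)
            # Order measurements by sensor id so the output is deterministic.
            measurements = [by_sensor[s] for s in sorted(by_sensor)]
            joined.append({"timestamp": ts, "measurements": measurements})
    return joined


readings = [
    (1, {"timestamp": 1, "measurement": 1}),
    (2, {"timestamp": 1, "measurement": 5}),
    (1, {"timestamp": 2, "measurement": 4}),
    (3, {"timestamp": 1, "measurement": 3}),
]
print(join_by_timestamp(readings, n_sensors=3))
```

Timestamp 2 stays pending here because only one of the three sensors has reported it. A real service would need some rule for evicting such entries.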

I've used 'hop and consolidate' star networks for sensors that collect and forward on a schedule (sleep 6 minutes, wake for 40 ms). Coupled with telemetry, this can give a very low transmission cost: it adds one bit per sensor to mark that nothing was received. The downside is that it doesn't handle out-of-order readings, retransmission, etc. Consolidation systems also have a minimum latency.

There has been a lot of work on extremely compact, read-only database readings for logs. Basically, timestamps allow you to distribute your queries correctly across your compute and drive resources. Sensage and others did this.

Like most Stack Overflow questions, I'm just guessing about your actual problem. :)

Answered by Charles Merriam on October 23, 2020

You have a few options. You can use join and define a joiner that builds the list. However, it would have to be a windowed stream after the join. If your measurements always arrive within the grace period, this should not be a problem.

A little more complicated: if your timestamps do not have duplicates, you can groupByKey and then aggregate into lists. This will form a table with the results you want. If you need it to be a stream, you can use toStream and filter out any update whose list does not have length n.
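To make the second option more concrete, here is a plain-Python model of the logic, not Kafka Streams code. It assumes the per-sensor topics have already been combined into one stream of (timestamp, measurement) pairs keyed by timestamp. The helper names aggregate_by_key and complete_only are made up for this sketch. Every incoming record updates the aggregate and emits the new value downstream, which is roughly how a table's update stream behaves if caching and suppression are ignored.

```python
def aggregate_by_key(records):
    """Model of groupByKey + aggregate: keep one list per key and
    yield (key, current_list) after every update."""
    table = {}
    for key, value in records:
        table[key] = table.get(key, []) + [value]
        yield key, list(table[key])


def complete_only(updates, n):
    """Model of toStream + filter: keep only updates whose list
    holds exactly n measurements."""
    return [(key, agg) for key, agg in updates if len(agg) == n]


# (timestamp, measurement) pairs from three sensors, interleaved.
records = [(1, 1), (2, 4), (1, 5), (1, 3), (2, 3)]
print(complete_only(aggregate_by_key(records), n=3))
```

The intermediate updates for timestamp 1, such as [1] and [1, 5], are dropped by the filter, and timestamp 2 never appears because only two sensors reported it. A duplicated reading would inflate a list, which is why the no-duplicates condition matters.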

There are probably a few other ways of doing this as well, but these come to mind first.

Answered by wcarlson on October 23, 2020
