Data Science Asked on June 16, 2021
A vehicle moves past a station. When the vehicle passes the station (determined, for simplicity, by GPS), it sends RF signals to the station: "Hello, I am here now", marked as A, and "Bye, I am leaving", marked as B. The station is just a microcontroller (µC) that logs the received signals to a file with a timestamp and the corresponding letter (A or B). The clock on the µC is set by hand and drifts over time.
The vehicle also logs the sent messages with a timestamp.
Usually this works well: 98% of all passes are good (A followed by B).
But for a few stations, 5% or so, only about 60% of passes are good.
Now my task is to find out why this is not working correctly!
So I plotted the data:
The x-axis is the period under review. The y-axis is, for each station entry, the minimal time delta in seconds to any corresponding (A or B) log entry on the vehicle.
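A minimal sketch of how such a per-entry minimal delta could be computed, assuming timestamps are plain seconds and that the delta is signed (station minus vehicle). The function name and the sample values are hypothetical; in practice it would be called once per letter (A or B) so that only corresponding entries are compared.

```python
import numpy as np

def min_time_delta(station_ts, vehicle_ts):
    """For each station timestamp, return the signed delta (station - vehicle)
    to the nearest vehicle timestamp, in the same unit as the input."""
    station_ts = np.asarray(station_ts, dtype=float)
    vehicle_ts = np.sort(np.asarray(vehicle_ts, dtype=float))
    # Index of the first vehicle timestamp >= each station timestamp
    idx = np.searchsorted(vehicle_ts, station_ts)
    last = len(vehicle_ts) - 1
    # Nearest candidates on the left and right, clipped to the valid range
    left = vehicle_ts[np.clip(idx - 1, 0, last)]
    right = vehicle_ts[np.clip(idx, 0, last)]
    d_left = station_ts - left
    d_right = station_ts - right
    # Keep whichever candidate is closer in absolute terms
    return np.where(np.abs(d_left) <= np.abs(d_right), d_left, d_right)

# Hypothetical timestamps in seconds
vehicle = [100.0, 130.0, 500.0, 525.0]
station = [112.0, 141.0, 513.0]
deltas = min_time_delta(station, vehicle)
```

Note that a nearest-neighbour delta like this can silently pair a station entry with the wrong vehicle entry once the clock error approaches the spacing between events.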
Assuming that the slope you can see is the clock drift and the offset is the time shift caused by manually setting the clock on the µC, I fitted the function y = k*x + d and corrected the shifts.
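For illustration, a linear fit of this kind can be sketched with NumPy's polyfit. The data below is made up, and ordinary least squares is an assumption here: the post does not say how k and d were actually obtained.

```python
import numpy as np

# Hypothetical data: x = station time in seconds since the start of the
# period, y = minimal delta (station - vehicle) in seconds
x = np.array([0.0, 86400.0, 172800.0, 259200.0])
y = np.array([30.0, 32.0, 34.0, 36.0])

# Least-squares fit of y = k*x + d; polyfit returns the highest degree first
k, d = np.polyfit(x, y, deg=1)

# Subtract the modelled clock error (drift plus offset) from the station times
x_corrected = x - (k * x + d)

# What remains of the delta after the correction
residual = y - (k * x + d)
```

Entries that were matched to the wrong vehicle event (such as deltas of more than 60 seconds) act as outliers and can bias a plain least-squares fit, so a robust fit may be worth considering.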
So I got this:
(which does not look better, to be honest)
Same axes as before. The data with a minimal delta of more than 60 seconds is shown in red.
Suspecting that the clocks are not perfectly synchronized, I experimented with two algorithms: DTW (an implementation from GitHub) and the needle algorithm from Biopython's pairwise2.
The input data for DTW was the sequence, ordered by timestamp, of the time differences between consecutive A or B entries, computed as np.delete(n1.diff().to_numpy(), 0), where n1 is a pandas DataFrame.
It found missing data, both on the vehicle side and on the station side.
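To make the DTW input concrete, here is a small sketch that builds the lagged differences and runs a textbook DTW on them. This is a plain dynamic-programming implementation written for illustration, not the GitHub implementation mentioned above (which is not specified); the function names and timestamps are hypothetical.

```python
import numpy as np
import pandas as pd

def inter_event_gaps(timestamps):
    """Lagged differences between consecutive timestamps; the leading NaN
    produced by diff() is dropped."""
    return pd.Series(timestamps).diff().to_numpy()[1:]

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j],      # step in a only
                                     cost[i, j - 1],      # step in b only
                                     cost[i - 1, j - 1])  # step in both
    return cost[n, m]

# Hypothetical timestamps: same events, station clock offset by 50 s
vehicle_ts = [0.0, 30.0, 400.0, 425.0]
station_ts = [50.0, 80.0, 450.0, 475.0]

gaps_v = inter_event_gaps(vehicle_ts)
gaps_s = inter_event_gaps(station_ts)
distance = dtw_distance(gaps_v, gaps_s)
```

Because only the gaps between events enter the comparison, a constant clock offset drops out entirely: in this example both gap sequences are identical and the DTW distance is zero.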
The input data for the align class was just the sequence of As and Bs, without timestamps.
It found missing data, both on the vehicle side and on the station side.
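The idea of aligning the bare letter sequences can be sketched with a textbook Needleman-Wunsch global alignment. This is a pure-Python illustration rather than Biopython's pairwise2, and the scoring values (match 1, mismatch -1, gap -1) and the sample sequences are assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two strings; returns (aligned_a, aligned_b, score).
    A '-' marks an entry that is missing on that side."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the score matrix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

# Hypothetical logs: the station missed one B
vehicle_seq = "ABABAB"
station_seq = "ABAAB"
aligned_v, aligned_s, total = needleman_wunsch(vehicle_seq, station_seq)
```

With strictly alternating A/B sequences many alignments score equally well, so the position of a reported gap is ambiguous without timing information.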
I then realized that the drift correction was unnecessary, since the timestamp is either not used at all (align) or only a lagged difference is calculated (DTW). I verified this by plotting the DTW path and the alignment score of align before and after the correction.
What I want to know is: which algorithm can I use to determine a shift for synchronizing the two data sets, assuming that one data set (the vehicle side) is perfect? And how can I find the subset on the station side that most probably corresponds to it?