
How to improve the efficiency of periodicity detection for long-time-baseline, gapped datasets

Computational Science Asked on February 16, 2021

Our data set has $10^4$ data points, but it spans a long baseline with many gaps. As the histogram shows, the horizontal axis is time and the vertical axis is data counts; for most of the time span there are no data. The total observing time is in fact short (green lines), but the gaps make the time baseline very long.
[Histogram: data counts versus time, with short observing intervals (green lines) separated by long gaps]

If we bin the data, there would be $10^8$ [t, value] data points, but only about $1\%$ of them are non-zero; after binning, most of the values are zero because of the gaps.

How can we improve the detection efficiency (i.e., find a faster method)?

Is a multi-threaded approach possible (especially for Lomb-Scargle)?

One Answer

One way to approach it could be to do the analysis in two steps. First, do a sweep along your dataset and collect all nonzero values and their times into a (much shorter!) data structure; basically, you collect a list of [t, value] tuples. Every sweep after that will be extremely fast, since you can safely assume that every data point not in your list is zero (a small sketch of this follows below).

Clicking on your link, I have a hard time understanding what I see. These are not simple CSV-style data points. They are of the form:

139459196.2742752731

139462208.5806673169

139462689.1677284241

139467485.6161292493

...

Are these just your times, with the second number (the value) omitted?
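For concreteness, here is a minimal sketch of the two-step idea, assuming the data have already been binned into [t, value] pairs; the file name and column layout are placeholders, not your actual format:

    import numpy as np

    # Placeholder file of binned rows "t value"; name and layout are assumptions.
    t, value = np.loadtxt("binned_data.txt", unpack=True)

    # Step 1: one sweep that keeps only the non-zero bins (~1% of the 10^8 bins).
    mask = value != 0
    t_nz, value_nz = t[mask], value[mask]

    # Step 2: all later passes work on the short arrays; any time that is
    # not present in t_nz can safely be treated as zero.
    print(f"kept {t_nz.size} of {t.size} bins")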

How do you want to go about the periodicity detection? The obvious way here is to use a Fourier transform, which will show you at which frequencies you have periodicity. Have you looked into fast Fourier transforms for Python (scipy.fftpack)? I would be surprised if there were no parallel implementation of a Fourier transform in Python. If that is still not fast enough, there is the FFTW library, boldly calling itself the "Fastest Fourier Transform in the West", but I have to warn you that it is a pain in the ass to use. [EDIT:] There is a Python wrapper for FFTW, pyFFTW.
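As an illustration of the FFT route (using scipy.fftpack, as mentioned above), here is a toy example on an evenly binned series; in practice `value` would be your binned counts and `dt` your bin width:

    import numpy as np
    from scipy import fftpack

    dt = 1.0                                      # bin width in seconds (assumed)
    t = np.arange(100_000) * dt
    value = 1.0 + np.sin(2 * np.pi * t / 500.0)   # toy signal with a 500 s period

    spectrum = fftpack.fft(value)
    freqs = fftpack.fftfreq(value.size, d=dt)

    # Look for the strongest peak among the positive frequencies.
    pos = freqs > 0
    peak = freqs[pos][np.argmax(np.abs(spectrum[pos]))]
    print(f"dominant period ~ {1.0 / peak:.1f} s")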

You seem to have an impressive amount of data. You might get away with doing some averaging, depending on your precision needs. If you take, say, the first 10 data points, take the maximum amplitude, store it in a shorter array, and repeat for the following chunks, you may shorten your dataset significantly (with a loss of precision on your time axis, of course).
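A rough sketch of that reduction, with a chunk size of 10 and simply trimming any leftover samples at the end:

    import numpy as np

    chunk = 10
    value = np.random.rand(100_003)   # placeholder for the binned series

    # Trim so the length is a multiple of the chunk size, then take the
    # maximum of every block of `chunk` consecutive samples.
    n = (value.size // chunk) * chunk
    reduced = value[:n].reshape(-1, chunk).max(axis=1)

    # The time axis is now 10x coarser, but the array is 10x shorter.
    print(value.size, "->", reduced.size)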

Answered by MPIchael on February 16, 2021

