TransWikia.com

What's going on with my "triplet optimization" algorithm?

Data Science Asked by Algorithm on September 8, 2020

I’ve tried StackOverflow and Comp Sci SE, and everyone points here.

I wrote an algorithm for data analysis using the CERN ROOT framework. It takes in three files of sorted UNIX timestamps from a single year, and pairs them up in the closest triplets, where each file contributes a triplet, and each triplet is unique. I know there are some more "well known" algorithms for accomplishing this, however this algorithm completes the task much faster, clocking in at about 20 minutes on my machine, as compared to many, many hours for some of the other algorithms I’ve tried. When complete, the algorithm plots the triplets (of the form {a,b,c]) on a 2-dimensional histogram, where the horizontal axis is a-b, and the vertical axis is a-c.

Problem is, it seems to be acting very weird. Namely, when I feed the algorithm one file of real data (these are timestamps generated by an experiment) and two files of completely random data, I get these weird diagonal lines. When I feed the algorithm three files of real data, a single diagonal line through the middle (and two more lines, running horizontally and vertically) appears if I use enough bins. Any idea what’s going on with my algorithm?

    void unbiasedAnalysis(){

TNtupleD *D  = new TNtupleD("D","D","x:y");

ROOT::RDataFrame statn1("D", "./pathtodata");
ROOT::RDataFrame statn2("D", "./pathtodata");
ROOT::RDataFrame statn3("D", "./pathtodata");

vector<double> vec_1, vec_2, vec_3;
statn1.Foreach([&](double tstamp){ vec_1.push_back(tstamp); },{"UNIX"});
statn2.Foreach([&](double tstamp){ vec_2.push_back(tstamp); },{"UNIX"});
statn3.Foreach([&](double tstamp){ vec_3.push_back(tstamp); },{"UNIX"});

vector<vector<double>> pairs;
for(auto tstamp : vec_1){

    double first,second;

    //get iterator pointing to closest element greater than or equal to
    auto geq = std::lower_bound(vec_2.begin(), vec_2.end(), tstamp);
    //get iterator pointing to nearest element less than
    auto leq = geq - 1;

    double foo = tstamp - *geq;
    double bar = tstamp - *leq;

    //compare iterators, save the closest 
    if(dabs(foo) <  dabs(bar)){ first = *geq; }
    else { first = *leq; }




    //repeat
    geq = std::lower_bound(vec_3.begin(), vec_3.end(), tstamp);
    leq = geq - 1;

    foo = tstamp - *geq;
    bar = tstamp - *leq;

    if(dabs(foo) < dabs(bar)){ second = *geq; }
    else { second = *leq; }

    //add to pairs
    pairs.push_back({tstamp, first, second, (tstamp-first), (tstamp-second), std::min((tstamp-first), (tstamp-second))});

}

//sort vector of vectors by size of smallest difference
std::sort(pairs.begin(), pairs.end(),
    [](const vector<double>& A, const vector<double>& B){
        return A[5] < B[5];
});

std::set<double> cache;

ROOT::EnableImplicitMT();

for(auto pair : pairs){
    //if not in cache, add to TNtuple
    if(cache.find(pair[1]) == cache.end() && cache.find(pair[2]) == cache.end()){

            D->Fill(pair[3],pair[4]);

        //add to cache
        cache.insert(pair[1]); cache.insert(pair[2]);
    }
}

D->Draw("x:y>>htemp(100,-0.02,0.02,100,-0.02,0.02)","","colz");
}

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP