Bioinformatics Asked by runeblaze on August 27, 2021
In the Clustal Omega paper, it says the following about mBED ($N$ is the number of input sequences, I assume):
we use a modified version of mBed (Blackshields et al , 2010), which has complexity of $O(N log N)$
But in the mBED paper it says the following (section 2):
A number of $t$ sequences are initially sampled from the input dataset $X .$ Following the LLR
algorithm, this value is set by default to $t=left(log _{2} Nright)^{2} .$
After the seed sequences in $R$ have been chosen, all sequences in the input data are associated with a $t$-dimensional vector. This is done simply by computing the distances from all sequences to each of the $t$ seeds.
Then my conclusion is that this would lead to a complexity of $O(N (log N)^2)$ because for each sequence (in total $N$ of them), distances to $t = (log N)^2$ sequences must be calculated. Where did I do wrong?
Turns out that I was dumb: I misread.
The Clustal Omega paper (in the supplementary materials) chose $t = log N$, which I think originates from this claim from this part of the Linial et al. paper:
In random polynomial time $X$ may be embedded in $l_{2}^{O(log n)}$ with distortion $leq(1+epsilon) cdot c_{2}(X)$ (for any $epsilon>0)$
Several questions still remain though:
Answered by runeblaze on August 27, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP