Biology Asked on December 6, 2020
I have two datasets, from different sources, that I need to compare.
The first set is deep sequencing results of a directed evolution experiment, where I have the naive library and selected library counts, and have calculated enrichment/depletion (positive and negative values with no upper or lower bound).
The second set is a set of protein sequences for which I calculate amino acid frequencies (positive values from 0-1).
The goal is to calculate a similarity between the two datasets. Typically I have two of the second type of set (protein sequences) and I calculate similarity based on the amino acid frequencies… What’s the best way to convert enrichment/depletion to frequency so I can compare?
Example deep sequencing data, for position 77 of the protein:
$$text{enrichment} = log_2left(frac{F_S}{F_N}right)$$
Where $F_S$ is selected frequency and $F_N$ is naïve frequency.
I came up with a possible solution for frequency equivalent from enrichment ($F_E$) but am open to thoughts if it’s good or not:
$$F_E = frac{displaystylefrac{F_S}{F_N}}{displaystylesum_text{amino acid}frac{F_S}{F_N}}$$
Although the question is kind of confusing at some places, what I understood is that you are trying to compare the relative amino acid enrichments in the two datasets.
As far as I know, you could construct the protein sequences from directed evolution experiment (presumably a time series data. Please clarify that.) and make a multiple sequence alignment (MSA) of that. In order to construct the sequences, there would be some technical procedures that would depend on the type of deep sequencing data you have. Factors such as the read length, protein length and coverage would need to be taken into consideration.
You could similarly make MSA for second datasets too.
Then using tools such as Rate4site (https://www.ncbi.nlm.nih.gov/m/pubmed/12169533/) you would be able to get evoutionary rates per site from MSAs. Then compare the evolutionary rates per sites for two datasets by correlating them.
If the correlation is high, the enrichments in both datasets are similar, otherwise not.
Answered by rraadd88 on December 6, 2020
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP