Bioinformatics Asked by space_gladiator on March 25, 2021
I don’t have any theoretical background in biology, so forgive me if my question is a bit.. well..dumb. I’m trying to use a Monte Carlo approach to find the discriminating position of a given sequence alignment column. So essentially, given a sequence alignment, I want to find the position at which we can distinguish active and inactive conformations.
So, here is my code for the Monte Carlo method:
def MonteCarlo(S, max_iter, threshold):
"""
Computes the discriminating position for a sequence column
Input: S = sequence column, max_iter = max iterations, threshold = threshold of acceptance for boltzmann distribution (between 0 and 1)
Output: 2 classes
"""
n = len(S)
S_copy = S.copy()
num_class1 = int(0.30*n) #number in class1
print(num_class1)
for _ in range(num_class1): #intialisation
rand_position = random.randint(0,len(S)-1)
if S_copy[rand_position] == 'X':
rand_position = random.randint(0,len(S)) #reselect if used position
S_copy[rand_position] = 'X'
#Monte Carlo with Insertions and Deletions
s_entropy = entropy(S)
n_iter = 0 #counter for iterations
class_1 = dict() #class 1 determined by X's
class_2 = dict() #class 2 is rest of S
for i in range(len(S)):
if S_copy[i] == 'X':
class_1[i] =S[i]
else:
class_2[i] =S[i]
res = score(S, class_1, class_2)
while n_iter < max_iter:
r = random.random()
if r < 0.25: #25% chance to do an insertion
(c1,c2,pos) = Insertion(S, class_1, class_2)
elif r < 0.5: #25% chance to do a deletion
(c1,c2,pos) = Deletion(S, class_1, class_2)
else: # 50% chance to do a swap
(c1,c2,pos) = Swap(S, class_1, class_2)
res2 = score(S, c1, c2)
#Boltzmann Distribution coefficient
delta_S = abs(res2 - res)
lambda_, N = 0.2, 19
size = 6000
ind1 = random.randint(0,size - 1)
rv = boltzmann.rvs(lambda_, N, size=size)/N
beta = rv[ind1] #pick float between 0 and 1 with boltzmann distribution
if (res2 > res) or (beta*delta_S > threshold): #if better score i.e. higher entropy gain
res = res2
class_1 = c1
class_2 = c2
n_iter += 1
return class_1, class_2, pos
Here I’m using the Boltzmann distribution as a probability measure. All the other functions (swap, insertion, deletion, entropy) calculate pretty much what the name says (if you need more clarification on my side, I’d be happy to provide it!)
Let me explain what my code is trying to do: Given a sequence alignment (as a list) say [‘M’,’M’,’M’,…,’L’,’L’,’L’] I first set a bunch of random positions and mark them out as ‘X’ (the number of Xs are determined by num_class1). Next, in the while loop I do swaps,insertions, and deletions with the goal of obtaining a better score. This meansI’m adding, removing, or swapping positions between class1 and class2. Eventually, my code returns two classes and the discriminating position.
While my code works, I don’t think it’s giving the right result. When I use a base case example of a list with two letters ‘M’ and ‘L with a clear discriminating position, I get random results.
Could someone explain theoretically (or via code) how one can find the discriminating position using a Monte Carlo method as above?
Thanks!
Without asking further questions, my suspicion would be how indels (gaps) are counted in your code against the aligment. Its the classic place for alignment mis-position errors. In other words, based on my experience your calculation probably performs fine, but linking the calculated alignment position against the actual observed alignment position is classic area of logical errors and needs careful monitoring of variables using code breakpoints within your loops.
Answered by M__ on March 25, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP