Bioinformatics Asked on August 31, 2020
I am reading Section 5.2, Kinship and Inbreeding Coefficients, of Kenneth Lange, Mathematical and Statistical Methods for Genetic Analysis. There the kinship coefficient $\Phi_{i,j}$ is defined for two relatives $i$ and $j$ as the probability that a gene selected randomly from $i$ and a gene selected randomly from the same autosomal locus of $j$ are identical by descent.
I would think $\Phi_{i,j}$ should depend on the particular population frequency of the gene or allele at that locus. However, the book does not seem to indicate that or at least does not stress that at all. Is there a dependence or not? The same question applies to the inbreeding coefficient.
More importantly, I doubt this definition is mathematically rigorous.
Here is an example to make what puzzles me clearer. It must be that the probability I am computing is not the kinship coefficient, but this is what the definition means to me. Could some expert please elucidate the correct meaning of the definition?
Consider the simplest pedigree tree of one family of two parents with brothers $A$ and $B$. Allele $a$ is detected at a particular locus. Let $b$ be the other allele of the same gene. We ask for the probability of the brother $B$ having allele $a$ conditioned on $A$ having allele $a$.
In the following, to avoid cluttering the notation, we abuse the symbols by using $a$ to denote also the population frequency of allele $a$, i.e. the unconditional probability of finding allele $a$ at the locus in the whole population. The same goes for allele $b$. Thus $a,b\in[0,1]$ and $a+b=1$. We solve this problem by listing all possible parental gene configurations, the probability of each configuration (taking into account the symmetry of paternal and maternal loci), and the probability of finding allele $a$ in either $A$ or $B$ at the locus.
\begin{align}
\text{parent config}~~~~ & \text{Pr(config)} & \text{Pr($a$|config)} \\
aa|aa~~~~~~~~~ & ~~a^4 & 1 \\
aa|ab~~~~~~~~~ & 4a^3b & 1 \\
aa|bb~~~~~~~~~ & 2a^2b^2 & 1 \\
ab|ab~~~~~~~~~ & 4a^2b^2 & \frac34 \\
ab|bb~~~~~~~~~ & 4ab^3 & \frac12 \\
bb|bb~~~~~~~~~ & ~~~~b^4 & 0
\end{align}
Given the parental configuration, the two brothers inherit their alleles independently, so the joint conditional probability of both $A$ and $B$ having allele $a$ at the locus is Pr($a$|config)$^2$. So the required conditional probability is
$$P(r):=\text{Pr}(B\text{ has } a\mid A\text{ has }a)=\frac{\sum_\text{config} \text{Pr(config)}\,\text{Pr}(a|\text{config})^2}{\sum_\text{config} \text{Pr(config)}\,\text{Pr}(a|\text{config})} = \frac{r^3+4r^2+2r+4r(\frac34)^2+\frac4{2^2}}{r^3+4r^2+2r+4r\frac34+\frac42}, \quad r:=\frac ab\in [0,\infty).$$
$P(r)$ depends on $r$, which in turn depends on the population frequencies of the alleles, as I claimed.
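The table and the closed form for $P(r)$ can be cross-checked with exact arithmetic. The following is only a sketch: the names `ROWS`, `p_from_table` and `p_closed_form` are made up for this illustration, and it takes the configuration probabilities exactly as listed in the table (two unrelated parents whose four alleles are drawn independently with frequency $a$).

```python
from fractions import Fraction

# Each row: (multiplicity, exponent of a, exponent of b, Pr(child has a | config)),
# copied from the table; Pr(config) = multiplicity * a^i * b^j.
ROWS = [
    (1, 4, 0, Fraction(1)),     # aa|aa
    (4, 3, 1, Fraction(1)),     # aa|ab
    (2, 2, 2, Fraction(1)),     # aa|bb
    (4, 2, 2, Fraction(3, 4)),  # ab|ab
    (4, 1, 3, Fraction(1, 2)),  # ab|bb
    (1, 0, 4, Fraction(0)),     # bb|bb
]

def p_from_table(a):
    """Pr(B has a | A has a), summed directly over the table rows."""
    a = Fraction(a)
    b = 1 - a
    num = sum(m * a**i * b**j * q**2 for m, i, j, q in ROWS)
    den = sum(m * a**i * b**j * q for m, i, j, q in ROWS)
    return num / den  # undefined (division by zero) at a = 0

def p_closed_form(r):
    """The closed form P(r) with r = a / b."""
    r = Fraction(r)
    num = r**3 + 4 * r**2 + 2 * r + 4 * r * Fraction(3, 4) ** 2 + Fraction(4, 4)
    den = r**3 + 4 * r**2 + 2 * r + 4 * r * Fraction(3, 4) + Fraction(4, 2)
    return num / den

for a in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    r = a / (1 - a)
    print(a, p_from_table(a), p_closed_form(r), float(p_from_table(a)))
```

Both functions agree for every allele frequency tried, and the value changes with $a$, which is the dependence described above.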
Consider $[r^3, 4r^2, 2r, 3r, 2]$, the terms of the denominator, as a vector of nonnegative weights. After normalizing these weights, $P$ is a convex combination of the entries of $v:=\big[1,1,1,\frac34,\frac12\big]$. So $P$ lies between the minimum and maximum of the entries of $v$, and
$$\frac12\le P\le 1.$$
We see the extrema are achieved at
$$P(r)=\begin{cases}
\frac12, \quad r=0 \\[0.7ex]
1,\quad r\to\infty
\end{cases}$$
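As a short check of these two values, they can be read off the closed form after collecting terms. Note that the value $1$ is a limit as $r\to\infty$ (that is, $b\to0$) and is not attained at any finite $r$:

$$P(0)=\frac{\frac4{2^2}}{\frac42}=\frac12,\qquad \lim_{r\to\infty}P(r)=\lim_{r\to\infty}\frac{r^3+4r^2+\frac{17}4r+1}{r^3+4r^2+5r+2}=1.$$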
Kin selection has a long history in evolutionary genetics; it is where the whole "selfish gene" hypothesis was formed (Maynard Smith, Bill Hamilton, and George Price). I would be amazed if it was not rigorous. The problem is that the intuitive understanding of how the coefficient works in real data appears lacking. So, for example, $F_{st}$ simulation studies demonstrate that strong values occur above 0.2.
In terms of the impact of population size on the calculation, this is complex because it is a consequence of $N_{e}$. Studies have been done on this, and in general extremely small populations don't adhere and tend to exhibit "Muller's ratchet" mechanisms. The general opinion, e.g. for Hardy-Weinberg equilibrium, is that deviation from expected behaviour can occur due to a variety of factors not represented in the original equation, but given a reasonable $N_{e}$, $F_{st}$ and $F_{is}$, the essentials hold.
In everyday terms, the parameters can be explored by sensitivity analysis: for example, drive the genetic variation to extremes and look at the change in the coefficient, then reduce the genetic variation and assess again. Also look for past simulation studies.
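One way to try this kind of sensitivity analysis on the sibling example from the question is a small Monte Carlo sweep over the allele frequency. This is a minimal sketch under stated assumptions, not a standard tool: the function name `simulate_sib_pair` is invented here, the four parental alleles are drawn independently (random mating, unrelated parents), and the quantity estimated is the conditional probability from the question, Pr($B$ has $a$ | $A$ has $a$), not a pedigree kinship coefficient.

```python
import random

def simulate_sib_pair(p, n_families=100_000, seed=0):
    """Estimate Pr(B carries a | A carries a) for two full sibs.

    p is the population frequency of allele a (assumed 0 < p <= 1).
    """
    rng = random.Random(seed)
    a_has = 0      # families in which sib A carries allele a
    both_have = 0  # families in which both sibs carry allele a
    for _ in range(n_families):
        # True stands for allele a, False for allele b.
        father = (rng.random() < p, rng.random() < p)
        mother = (rng.random() < p, rng.random() < p)
        # Each sib receives one randomly chosen allele from each parent.
        sib_a = rng.choice(father) | rng.choice(mother)
        sib_b = rng.choice(father) | rng.choice(mother)
        if sib_a:
            a_has += 1
            both_have += sib_b
    return both_have / a_has

# Sweep the allele frequency from rare to common.
for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(f"p = {p:.2f}  estimate = {simulate_sib_pair(p):.3f}")
```

The estimates move from near $\frac12$ for a rare allele towards $1$ for a common one, so this particular conditional probability is sensitive to the allele frequency.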
Answered by Michael on August 31, 2020
I have figured out most of the answer. The definition as set by Kenneth Lange's book is indeed quite vague and thus unrigorous. The kinship coefficient is a simpler probability than I thought originally. It is a probability not conditioned on observing an allele $a$ at a locus of $i$ but conditioned on the following where we focus on one particular locus.
Allele $a$ is present at some of the common ancestors of $i$ and $j$. For each of the aforementioned common ancestors $C$, the probability of the other allele being also $a$ is $\Phi_{C,C}$, the inbreeding coefficient of $C$. $\Phi_{i,j}$ is the probability of finding $a$ in both $i$ and $j$.
So the kinship coefficient depends not only on the tree of lineage but also on the inbreeding coefficients, which are either set arbitrarily or dependent on the population frequency of allele $a$.
In the simple example considered in the question, $\Phi_{A,B}=\text{Pr}(a|\text{config})^2$. In particular, for the canonical case of the common ancestral (parental) configuration being $ab|bb$, $\Phi_{A,B}=\frac1{2^2}$.
Answered by Hans on August 31, 2020