TransWikia.com

Does the string "...CATCAT..." appear in the DNA of Felis catus?

Biology Asked by skytreader on June 16, 2021

In Hofstadter’s Gödel, Escher, Bach: An Eternal Golden Braid (GEB), the following claim appears:

…in the species Felis catus, deep probing has revealed that it is indeed possible to read the phenotype directly off the genotype. The reader will perhaps better appreciate this remarkable fact after directly examining the following typical section of the DNA of Felis catus:

…CATCATCATCATCATCATCAT…(OP note: truncated because, you get it)

Is this true? A cursory search for the DNA of Felis catus gives me this 1996 paper by Lopez, Cevario, and O’Brien and the given sequence does not appear – there are some instances of "CAT" but not repeated enough to make it as remarkable as claimed in GEB.

I don’t know enough Biology to judge the veracity of this claim. Some points I am considering are:

  • GEB is full of wordplay. However, the tone of this part of the text does not sound like wordplay to me.
  • GEB was written/published around 1978. The paper I linked to, which has been cited by some 236 others according to Google, was published in 1996, well after GEB’s time. If my impression is right that Lopez et al.’s work is significant because it was the first time Felis catus was sequenced, then there is no way Hofstadter could have known of it when he wrote GEB. Then again, I don’t know enough Biology to rule out some nuance in Lopez et al.’s paper that I’m missing (i.e., the results of the paper might not be mutually exclusive with the claim made in GEB).
  • GEB has reference notes and a bibliography, and no reference is cited to back this claim. However, GEB does not attempt to be a rigorous academic thesis: the reference notes are mostly used when Hofstadter quotes other works directly, while the bibliography is a list of readings on the book’s main thesis that the reader may want to check out.

So are cats recursions with no base cases?

4 Answers

The Felis catus genome has been published, annotated, and updated quite a bit since 1996, including spans of so-called intergenic regions, which are basically scaffolding and other structures, along with perhaps some unidentified genes, pseudogenes, regulatory sequences, etc. Pretty much the entire DNA sequence is available now, not just the sequence of the mitochondrial genome, which is what was published in the 1996 paper you referenced. Mitochondria are the power plants of the cell, but they are just organelles that happen to contain their own DNA, which is separate from the chromosomal DNA in the nucleus. All of this is available for free (if you know where to look) at the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH) in the United States. Other sites are also available, such as Ensembl, a joint project between the European Bioinformatics Institute (EMBL-EBI), part of the European Molecular Biology Laboratory (EMBL), and the Wellcome Trust Sanger Institute (WTSI). Both institutes are located on the Wellcome Trust Genome Campus in the United Kingdom.

So, to the genome. Genomic sequences can be searched in a couple of different ways, depending on what you're looking for, but the most common way is to use BLAST, the Basic Local Alignment Search Tool. As the name implies, it takes a query sequence as input and searches it against a database of sequences, aligning the results as well as possible using algorithms and parameters that the user can choose and tweak. The BLAST web interface to the cat genome is here. You don't need to worry about any of the other options here except the "Enter Query Sequence" box. For our purposes, FASTA format just means the single-letter abbreviations for the nucleotides (AGCT), all strung together.
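As a rough illustration, the query string for this kind of search can be generated rather than typed by hand. This is only a sketch: the helper name `make_fasta_query` and the record name `cat_repeat_query` are made up for the example. Strictly speaking, a FASTA record is a `>` header line followed by the sequence; the BLAST web form also accepts a bare sequence like the one described above.

```python
def make_fasta_query(unit="CAT", repeats=10, name="cat_repeat_query"):
    """Build a minimal FASTA record: a '>' header line, then the sequence.

    The function name and record name are hypothetical, chosen for illustration.
    """
    sequence = unit * repeats  # e.g. "CAT" * 10 gives a 30-base query
    return f">{name}\n{sequence}\n"


if __name__ == "__main__":
    # Paste the printed text into the "Enter Query Sequence" box.
    print(make_fasta_query(), end="")
```

Changing `repeats` makes it easy to try shorter or longer runs of CAT.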

The genome we're searching is of an Abyssinian cat named Cinnamon:

Cinnamon, the cat which was chosen to be the definitive genetic model for all cats in the feline genome project. Image courtesy of the College of Veterinary Medicine at the University of Missouri.

To start with, I typed in CATCATCATCAT and to my surprise got back over 200 hits, covering every chromosome the cat has. So, I doubled the length of the input to 8 CATs, and got back the same result set. Unfortunately, 12 CATs was too many (and really, it is too many), so I worked backwards.

The final results are here (sorry, link expires 10/13/16. To regenerate, go to BLAST link above and enter CATCATCATCATCATCATCATCATCATCAT). Apparently, popular wisdom is incorrect, and Felis catus chromosomes really contain 10 CATs each, one more than is needed for their 9 lives. No word yet as to why this may be, but scientists are presumably working on it.

Correct answer by MattDMo on June 16, 2021

While Matt's answer is perfectly correct, it is important to note that the sequence $(CAT)_n$ in DNA is not restricted to cats, and you would expect to find it anywhere.

For example, searching the human genome for the same tandemly repeated CAT sequence also returns many hits.

This is because you are essentially searching for short tandem repeats on the DNA strand. These repeats can occur in any organism. So while finding CAT substrings in the DNA of the cat may be amusing, they aren't special to cats (or any other animal); they are just a coincidence between the letters used to name the bases and the name of the animal.
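To make the idea of a tandem repeat concrete, here is a minimal sketch that finds the longest run of back-to-back copies of a unit in a sequence string. The function name `longest_tandem_run` and the toy sequence are invented for the example. It scans only the strand it is given and is not how BLAST works internally.

```python
import re


def longest_tandem_run(sequence, unit="CAT"):
    """Return the largest number of consecutive copies of `unit` in `sequence`.

    Hypothetical helper for illustration; it ignores the reverse-complement strand.
    """
    # (?:CAT)+ matches one or more copies of the unit placed end to end.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    best = 0
    for match in pattern.finditer(sequence.upper()):
        best = max(best, len(match.group(0)) // len(unit))
    return best


if __name__ == "__main__":
    toy = "GGACATCATCATTTGCATCATAGC"  # made-up sequence, not real cat DNA
    print(longest_tandem_run(toy))  # prints 3
```

The same function works for any organism's sequence and any repeat unit, which is the point: nothing about the search is specific to cats.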

Answered by March Ho on June 16, 2021

To augment the other answers, let's compute the probability of CATCATCATCAT occurring in random DNA sequence.

Cat DNA length is 2.7 gigabases (source), and there are 4 possible bases. One CAT is 3 bases long, giving an expected number of occurrences in 2.7 Gb of $\frac{2.7 \cdot 10^9}{4^3} \approx 42{,}188{,}000$.

Repeating the calculation for longer sequences gives:

  • 1 CAT: 42 188 000 occurrences
  • 2 CAT:      659 180 occurrences
  • 3 CAT:        10 300 occurrences
  • 4 CAT:             160 occurrences
  • 5 CAT:                2 occurrences
  • 6 CAT:                0 occurrences

So, indeed, there are many more CATs in cats than could be expected by pure chance alone.
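The calculation can be reproduced with a few lines of Python. This is a sketch under the same simplifying assumptions as the estimate above: every base is equally likely and independent, the genome is 2.7 Gb, and overlaps, edge effects and the second strand are ignored. The function and constant names are chosen for the example.

```python
GENOME_LENGTH = 2.7e9  # bases, the figure used in the text
ALPHABET_SIZE = 4      # A, C, G, T


def expected_occurrences(repeats, unit_length=3, genome_length=GENOME_LENGTH):
    """Expected count of one specific sequence of `repeats` units in random DNA.

    Each of the unit_length * repeats positions matches with probability 1/4,
    so the expectation is roughly genome_length / 4 ** (unit_length * repeats).
    """
    return genome_length / ALPHABET_SIZE ** (unit_length * repeats)


if __name__ == "__main__":
    for n in range(1, 11):
        print(f"{n:2d} CAT: {expected_occurrences(n):.3g} expected occurrences")
```

Each extra CAT divides the expectation by $4^3 = 64$, which is why the numbers drop off so quickly.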

Answered by jpa on June 16, 2021

So, there are a few great answers here already, but it seems nobody has addressed an interesting part of your question: GEB was published in 1979 and the genome of Felis catus was not sequenced until many years later... so how did Hofstadter know?

jpa's answer shows that you'd expect to get only about five CATs - not ten, and the chance of getting ten is astronomically low. I expanded his table to show the depressingly low chance of getting ten by perfect randomness:

5 CAT: 2.5 expected per Felis catus genome
6 CAT: 0.04 expected
7 CAT: 0.00061
8 CAT: 9.54e-6
9 CAT: 1.49e-7
10 CAT: 2.32e-9

That means you'd expect to find 10 CATs about 0.00000000232 times per random genome. So how on earth did the Felis catus genome end up with ten CATs in it? And how did Hofstadter know that there would be this many CATs?

As it turns out, this kind of repeated sequence of a few base pairs is called a "short tandem repeat" (STR), or "microsatellite": a 2-5 base pair sequence repeated several times in a row, usually 5 to 50 times.

So at this point, to recap: because microsatellites exist, a 10xCAT sequence is more probable than the random model suggests, but since we're restricted to just the Felis catus genome, we definitely aren't guaranteed one. So how did Hofstadter state it as if it were a fact?

As it turns out, one critical property of STRs, or short tandem repeats, is that mutations in these areas are far more common, and they represent a large amount of the genetic variation between individual members of a species. This discovery was made with the advent of DNA sequencing, which began only a few years before the book was published. Therefore, given a large population of nonidentical cats (which we have), we can confidently say that there is an extremely high chance for a 10xCAT sequence.

Hofstadter's genius perfectly combined math (only 2.32e-9 expected sequences per genome) with biology (microsatellites increase the chance of finding this sequence) and forensic genetics (in a population of the same species, individuals are likely to have many STR-related differences). All of this put together gave Hofstadter what he needed to confidently say: yes, CATCATCATCATCATCATCATCATCATCAT almost certainly exists in the Felis catus DNA. Little things like this are why Gödel, Escher, Bach is my favorite book of all time.

Answered by Owen Versteeg on June 16, 2021
