https://www.theatlantic.com/science/archive/2018/11/human-genome-300-million-missing-letters-dna/576481/
https://en.wikipedia.org/wiki/Reference_genome#cite_note-3
https://www.theatlantic.com/science/archive/2018/04/23andme-diversity-dna/558575/
An Email from a Subscriber:
I was interested in what you said about the reference genome, so I did some research.
On the human reference genome, I found a Wikipedia article that says this:
Donors were recruited by advertisement in The Buffalo News, on Sunday, March 23, 1997. The first ten male and ten female volunteers were invited to make an appointment with the project's genetic counselors and donate blood from which DNA was extracted. As a result of how the DNA samples were processed, about 80 percent of the reference genome came from eight people and one male, designated RP11, accounts for 66 percent of the total.
Apparently they wanted to keep the identity of RP11 anonymous — but it seems like it'd be pretty damn important to know this.
And then I found this article by searching that codename.
RP11, scientists later surmised, was probably African American himself, but the problem of using one reference genome to represent the whole human population still holds true.
Well, how about that. So 70% of the "reference human genome" comes from an African American...
There's no way that's an accident. There's no way these scientists just chose a random person to become the fully-sequenced reference human genome. There's no way they did not think about the consequences of using an African for the standard reference genome.
Also from that article:
Since the Human Genome Project began, scientists have slowly realized they underestimated human genetic diversity. At the time, they focused on single-letter mutations. But it’s becoming clear now that big structural variations—thousands of letters being inserted or deleted or flipped around—are common, too, says Deanna Church [...] It’s like comparing two copies of a boo..