The unknown 8% of our DNA has finally been uncovered

2022-05-26

The Telomere to Telomere (T2T) consortium, a group of nearly 100 international researchers, announced on March 31 that it had finally filled in the roughly 8 percent of the human genome sequence that had been missing for more than two decades, ever since the first draft was announced at the beginning of this century. The result is the most complete sequence of the human genome to date. On April 1, Science published a special issue with six papers on the subject.

Image: Human chromosomes under a fluorescence microscope. Credit: Steffen Dietzel, CC BY-SA 3.0.

The human genome consists of more than 6 billion DNA bases distributed across 23 pairs of chromosomes. But for more than two decades, the “complete human genome” has been a relative term. In 2001, when the Human Genome Project (HGP) published its first map of the human genome, about 200 million bases were missing, or 8% of the entire genome. The missing regions lie mainly in the centromeres and telomeres of chromosomes, which contain highly repetitive sequences. They also include the short arms of some chromosomes, which carry functional genes encoding ribosomal RNA.

Now scientists have finally filled in the 8 percent gap in our genetic code. The most complete human reference genome to date has been named T2T-CHM13. Unlike the old version, it clearly resolves the telomeres at both ends of each chromosome and the centromeres, which mostly sit near the middle of chromosomes and coordinate the separation of duplicated chromosomes during cell division. In addition, the short arms of five human chromosomes, which contain a large number of genes encoding the RNA backbone of the ribosome, have also been sequenced. The 200 million bases of “new sequence” contain 99 genes that might encode proteins and nearly 2,000 candidate genes that need further study.

In addition to resolving some of the most complex regions of the genome, such as telomeres and centromeres, T2T-CHM13 also corrects thousands of structural errors in the current reference sequence,
complementing the existing human reference genome (GRCh38).

Image: T2T-CHM13, the most complete human genome to date. Source: the paper.

Technological breakthroughs

One of the main reasons the map of the human genome was left with “blanks” two decades ago was the large number of repeated sequences it contains. The human genome was previously sequenced by cutting chromosomal DNA into short pieces, sequencing them, and then piecing the results back together. But the centromeres, telomeres and ribosomal DNA contain so many repeats that the pieces are too similar to tell apart, and scientists could not splice them together into the correct sequence. As a result, the human genome sequence published by the HGP in 2003 was incomplete, covering only about 92% of the genome.

Another obstacle is that human cells carry two sets of chromosomes, one from each parent. When researchers try to assemble all the pieces, sequences from the father and the mother get mixed up, obscuring the actual variation in each individual genome.

Scientists found a solution to the second problem first: a rare cell line containing only the father's genome. The cell line was derived from a hydatidiform mole removed from a woman's uterus more than two decades ago. Such a mole grows from a developmentally abnormal human zygote, formed when a sperm fertilizes an egg that has lost its maternal genome. A fertilized egg with only the sperm's genetic material cannot develop into an embryo, but because the sperm happened to carry an X chromosome rather than a Y, the cell retained the ability to replicate. Both copies in each of the 23 chromosome pairs in these cells came from the father and had the same sequence, exactly what the T2T consortium needed. By contrast, the first map of the human genome was a patchwork of sequences from multiple people, and the results were prone to errors.

When the HGP began in the last century, sequencing technology could not accurately read long stretches of DNA, so scientists had to cut chromosomes into short fragments, which resulted in
highly repetitive regions that could not be pieced together properly. In the past decade, advances in long-read sequencing have made it possible to read far longer stretches of a chromosome in one pass. Now Oxford Nanopore, which can sequence reads up to millions of base pairs long with moderate accuracy, and PacBio HiFi, which is highly accurate at about 20,000 base pairs, allow researchers to sequence across repeated regions and assemble them with high accuracy, thus successfully generating the complete human genome sequence.

Opening the “new map”

T2T-CHM13 allows a more accurate assessment of genetic variation. In clinical studies of genetic variation or diversity in disease, researchers compare sequencing results with the reference genome. Because the new sequence is “very accurate at the base level,” it greatly improves the identification and understanding of genetic variation, pinpointing hundreds of thousands of variants that were previously misinterpreted.

The new sequence also provides insights into the centromere region of human chromosomes. During meiosis, the process that forms sperm or eggs, the centromere is where paired chromosomes are attached before they separate. This region is uniquely structured and contains long repeats, and its DNA and proteins seem to be particularly tightly intertwined (it is therefore classed as heterochromatin, which has poor transcriptional activity). The new DNA sequences in and around the centromeres account for about 6.2% of the entire genome, or about 190 million bases, according to the study.

Nicolas Altemose, a researcher at the University of California, Berkeley, and his team used new techniques to locate a large protein complex called the kinetochore within the centromere. By attaching to chromosomes, this complex drives their separation. If this process goes wrong in meiosis, it can lead to chromosomal abnormalities, spontaneous abortion or congenital disorders. When the problem occurs in somatic cells, it can disrupt gene expression, which can lead
to cancer.

In addition, the team found unexpectedly high levels of genetic variation in the centromere and other regions. They found stacks of different sequences in and around the centromere, often with layers of newer sequence covering layers of older sequence. The older sequences usually carry more random mutations and deletions, indicating that these segments have been abandoned by the cell. The newer sequences have fewer mutations and less methylation, suggesting they are in use. The team also found a large number of long repetitive fragments in and around the centromere. The repeats are built on a unit of approximately 171 bases (roughly the length of DNA wrapped around a nucleosome), which is joined in tandem over and over to form a large repeating region around the centromere.

Another mystery of the centromere is whether its position is fixed. Comparing the new reference genome with other published centromere sequences, a team at the University of California, Davis, found that human centromeres may also move. Similar phenomena have previously been found in other species.

A team at the University of California, Santa Cruz, focused on satellite DNA, long runs of repeats found mainly near telomeres and centromeres. The researchers say centromeres have been found to be dysregulated in various human diseases, but methods to study them at the sequence level were previously lacking. With the new reference genome, scientists can finally study satellite DNA “base by base” for the first time and really understand how it works.

Future plans

Successfully completing a single human genome is not the end. The T2T-CHM13 sequence comes from a genome of European ancestry, and it does not contain the Y chromosome. Although the T2T consortium has supplemented it with a Y chromosome sequence from a sample donated by a Harvard biologist, it still needs to obtain more complete genome sequences from a more diverse population by similar means.

According to Science News, the T2T Consortium
plans to sequence 350 genomes from individuals of different ancestries and use the results to create a new “human pangenome reference.” The project will look for variations and hard-to-read regions, such as those in the short arms of chromosomes, that may be associated with diseases or genetic traits, leading to a more complete understanding of human diversity. So far, the T2T team has begun deciphering more than 70 genomes.

Benedict Paten, associate professor of biomolecular engineering at the University of California, Santa Cruz and one of the leaders of the T2T consortium, said: “Pangenomics will study the diversity of the human population and ensure the accuracy of the genomes we get. Without this cross-individual study of complex regional genetic maps, a large number of population genetic variations would be missed.”

6 Science papers:

· The complete sequence of a human genome. Sergey Nurk, Sergey Koren, Arang Rhie, et al. Science. 31 Mar 2022. Vol 376, Issue 6588, pp. 44-53. DOI: 10.1126/science.abj6987
· A complete reference genome improves analysis of human genetic variation. Sergey Aganezov, Stephanie M. Yan, Daniela C. Soto, et al. Science. 1 Apr 2022. Vol 376, Issue 6588. DOI: 10.1126/science.abl3533
· Segmental duplications and their variation in a complete human genome. Mitchell R. Vollger, Xavi Guitart, Philip C. Dishuck, et al. Science. 1 Apr 2022. Vol 376, Issue 6588. DOI: 10.1126/science.abj6965
· Complete genomic and epigenetic maps of human centromeres. Nicolas Altemose, Glennis A. Logsdon, Andrey V. Bzikadze, et al. Science. 1 Apr 2022. Vol 376, Issue 6588. DOI: 10.1126/science.abl4178
· From telomere to telomere: The transcriptional and epigenetic state of human repeat elements. Savannah J. Hoyt, Jessica M. Storer, Gabrielle A. Hartley, et al. Science. 1 Apr 2022. Vol 376, Issue 6588. DOI: 10.1126/science.abk3112
· Epigenetic patterns in a complete human genome. Ariel Gershman, Michael E. G. Sauria, Xavi Guitart, et al. Science. 1 Apr 2022. Vol 376, Issue 6588.
DOI: 10.1126/science.abj5089