
Rates of speed increase over 50 years: The trajectory of genetic code decoding!

Updated: 6 days ago

Hirokazu Kobayashi

CEO, Green Insight Japan, Inc.

Professor Emeritus and Visiting Professor, University of Shizuoka


The Summer Olympics will be held in Paris beginning July 26. The first modern Summer Olympics were held in Athens in 1896, and this year's Games are the 33rd. World records have been broken in many events at each Games, and the eyes of the world will be on the athletes again this time. My life as a researcher is approaching its 50th year, so let's compare how much various speeds have increased over the past 50 years.

Fifty years ago was 1974. Japan had been hit by its "first oil crisis" in October 1973, when consumers rushed to buy up all the toilet paper in the country. I was a student at the time; one of the two elevators in my university building was shut down, and the lights in the corridors were dimmed to save electricity. As a result, even my mood was darkened. Many of you were not born then, but for those who were, the following songs should help you recall the era in which they were popular. In order of release: “Kandagawa” by Kaguyahime, “Anata (You)” by Akiko Kosaka, “Cape Erimo” by Shinichi Mori, “Tsumiki no Heya (Room of Building Blocks)” by Akira Fuse, “Nagori Yuki (Remaining Snow)” by Iruka, “Shoro Nagashi (Spirit Boat Procession)” by Grape, and others. Late-night radio shows were very popular with young people at that time. My favorites were Shinji Tanimura (Chinpei: 1948-2023) on Nippon Cultural Broadcasting's "Say! Young" and Tsurukoh Shofukutei (1948-) on Nippon Broadcasting System's "All Night Nippon." If we kept listening, the program changed to "Singing Headlights" at 3:00 a.m., which affected the next day's classes.


Comparison between 1974 and today

100-meter dash: 9.95 s → 9.58 s, 1.04 times faster

200-meter dash: 19.72 s → 19.19 s, 1.03 times faster

400-meter dash: 43.86 s → 43.03 s, 1.02 times faster

100-meter freestyle: 49.44 s → 46.91 s, 1.05 times faster

200-meter freestyle: 1:45.85 → 1:42.00, 1.04 times faster

100-meter backstroke: 55.49 s → 51.85 s, 1.07 times faster

Maximum operating speed of the Shinkansen bullet train: 210 km/h (130 mi/h) → 320 km/h (199 mi/h), 1.52 times faster

Maximum recorded speed of a production car: 302 km/h (188 mi/h) → 490 km/h (304 mi/h), 1.62 times faster

Computer processor clock speed: 1-5 MHz → 2-5 GHz, about 1,000 times faster

Genetic code determination: 10 bases/day → 90 billion bases/day, about 9 billion times faster
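
The "times faster" figures in the list above follow from simple division: for a race time, the old time divided by the new one; for a speed or throughput, the new value divided by the old one. The following Python sketch reproduces them from the values quoted in the list. The function name `speedup`, the record labels, and the choice of 5 MHz and 5 GHz as representative processor clock speeds are assumptions made for illustration only.

```python
# Values as quoted in the comparison list: (label, 1974 value, current value, kind).
# kind "time": a smaller value is faster (seconds).
# kind "rate": a larger value is faster (km/h, Hz, bases/day).
RECORDS = [
    ("100-meter dash", 9.95, 9.58, "time"),
    ("200-meter dash", 19.72, 19.19, "time"),
    ("400-meter dash", 43.86, 43.03, "time"),
    ("100-meter freestyle", 49.44, 46.91, "time"),
    ("200-meter freestyle", 105.85, 102.00, "time"),  # 1:45.85 and 1:42.00 in seconds
    ("100-meter backstroke", 55.49, 51.85, "time"),
    ("Shinkansen operating speed", 210, 320, "rate"),
    ("Production car record", 302, 490, "rate"),
    # Assumption: the upper ends of the quoted ranges (5 MHz, 5 GHz) stand in for each era.
    ("Processor clock speed", 5e6, 5e9, "rate"),
    ("Genetic code determination", 10, 90e9, "rate"),
]


def speedup(old, new, kind):
    """Return how many times faster the new value is than the old one."""
    if kind == "time":
        return old / new
    return new / old


for label, old, new, kind in RECORDS:
    print(f"{label}: {speedup(old, new, kind):,.2f} times faster")
```

Treating race times and speeds separately keeps every ratio above 1, so the entries can be compared directly.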


The result is a resounding victory for genetic code decoding. In sports, form and sportswear have improved, but the limits of the human body's capabilities are being reached. In ground transportation, there is a limit to how much power wheels and tires can transmit; a maglev train (linear motor car), which does not rely on wheels, has reached 603 km/h (375 mi/h). In air transportation, there was the supersonic airliner Concorde, which was retired in 2003. I saw this plane at London Heathrow Airport. Reasons for its retirement include the crash in 2000 that killed all 109 people on board and four on the ground (113 in total), poor economics, and sonic boom noise. The storage capacity of personal computers has increased millions of times, but computation speed, measured by clock frequency, is only about a thousand times faster. By contrast, the speed of decoding genetic code has increased about 9 billion times, by far the largest gain.

My life as a researcher has paralleled the history of genetic code determination. DNA, the carrier of genetic information, is a long strand, and early technology could not analyze individual molecules because the amount of material was too small. Researchers therefore started by studying the genetic code (nucleotide sequence) of RNA; the first publication dates back to 1960, using a two-dimensional fractionation method with paper electrophoresis and chromatography. Then, in the 1970s, it became possible to fragment and amplify DNA (cloning) using E. coli. This led to the development of techniques for determining the genetic code of DNA (sequencing), which consists of the four letters A, C, G, and T. Chemical methods were developed to precisely cut the left (5') side of these letters (Maxam-Gilbert method). This method was published in 1977 and was widely used by researchers worldwide. I was surprised to find the doctoral dissertation of Alan Maxam (1942-) in the library of the Biological Laboratories at Harvard University, where I was a postdoctoral fellow, and to learn that this method had originated with a graduate student. Also in 1977, Frederick Sanger (1918-2013) developed a technique that uses an enzymatic reaction to extend the DNA strand toward the right (3') side (Sanger method). At the time, all of these techniques were manual. Walter Gilbert (1932-) and Sanger were awarded the Nobel Prize in Chemistry in 1980 for developing methods for determining DNA sequences, along with Paul Berg (1926-2023), who pioneered recombinant DNA technology, the basis of cloning. This was Sanger's second Nobel Prize in Chemistry.


A single gene often has a genetic code of about 100 to 5,000 letters. In the 1980s, progress was made in determining the codes of many genes from plants, animals, and microorganisms. Analysis began with the genes for ribosomal RNA and transfer RNA, followed by the genes that encode proteins. In plants, the first step was to analyze the gene for the L subunit of Rubisco, the enzyme that first fixes carbon dioxide in “photosynthesis,” a function characteristic of plants. Lawrence Bogorad (1921-2003) and his colleagues at Harvard University published this work in Nature in 1980. I was a graduate student at Nagoya University at the time and looked at this research with envy. In 1983, I had the opportunity to join Bogorad's laboratory. Later, regarding the origin of Rubisco, we were the first in the world to find that in primitive photosynthetic bacteria, this gene first appeared as a set with the S-subunit gene, and that the set was subsequently duplicated (1989). Other than that, I devoted little attention to gene decoding and focused instead on the regulatory mechanisms of gene expression.

The set of genetic information necessary for an organism to live is called the genome. Genome analysis progressed from viruses, whose genomes are small, to larger genomes. In animals, mitochondria carry their own genome in addition to the nuclear one, and analysis started with the human mitochondrial genome of 16,569 bases, published by Sanger et al. in Nature in 1981. In plants, chloroplasts also carry a genome, in addition to mitochondria and nuclei. Chloroplast genomes are about 120,000 to 150,000 bases in size, and Japan is a world leader in this field. The research group of Masahiro Sugiura (1936-) and a joint group of Haruo Koseki (1925-2009) and Kanji Ohyama (1939-), with whom I have a close relationship, completed whole-genome analyses of the chloroplasts of tobacco and the liverwort Marchantia polymorpha, respectively, and published their results in the EMBO Journal and Nature, respectively, in 1986.

The detection step of the Sanger method was mechanized in the 1990s, followed by the automation of reaction processing. Genome analysis in Japan has continued to lead the world, and people I know have been active in this field. The 3.57-million-nucleotide genome of a cyanobacterium (blue-green alga), a model of plant photosynthesis, was published in 1996 by a research group led by Tetsuyuki Tabata (1954-) at the Kazusa DNA Research Institute. The 4.64-million-nucleotide genome of E. coli, a model organism of molecular biology, was published in 1997 by a joint research team led by the United States. The nuclear genomes of animals and plants can be more than three hundred times larger, and their analysis took longer. For Arabidopsis thaliana, considered a model plant, Tabata's research group sequenced chromosomes 3 (23 million bases) and 5 (26 million bases), and the results were published in Nature in 2000. In the Human Genome Project, Prof. Yoshiyuki Sakaki (1942-: Human Genome Center, Institute of Medical Science, University of Tokyo / RIKEN Genomic Sciences Center, now University of Shizuoka, Member of the Management Council) was in charge of chromosomes 11 (134 million bases), 18 (76 million bases) and 21 (47 million bases). Their draft sequences were published in Nature in 2001. In rice, an important cereal and a model of monocotyledonous plants, Prof. Takuji Sasaki (1947-: National Institute of Agrobiological Sciences, NIAS / University of Tsukuba) and Prof. Takashi Gojobori (1951-: National Institute of Genetics, NIG) sequenced chromosome 1 (46 million bases), which was published in Nature in 2002.


In the late 1990s, next-generation sequencing (NGS) methods were developed. While conventional methods start from a population of identical DNA fragments, NGS starts with a mixture of different DNA fragments to which adapters are attached; these are amplified by PCR and sequenced using fluorescently labeled substrates or by detecting the release of pyrophosphate or protons. In addition, "third-generation sequencing" was announced in 2014. One approach uses nanopores to identify bases. In the other, hairpin adapters are attached to both ends of a DNA fragment, which, after denaturation, becomes a circular single-stranded DNA template for repeated incorporation of fluorescently labeled substrates. The PacBio Revio sequencer uses this latter approach and has a catalog throughput of 90 billion bases/day. I applied this method to the tea plant genome (4 billion bases) and was able to read 71 billion bases with a misread rate of about 1/400. This corresponds to about 18 times the genome size, a satisfactory result for use in our tea genome editing.
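
As a rough check of the tea genome figures, the sketch below computes the sequencing depth (bases read divided by genome size). As an added convention not mentioned in the text, it also converts the misread rate into a Phred quality score, Q = -10 log10(p). The function names are hypothetical, and the run-time estimate assumes the catalog throughput is actually achieved, which a real run may not do.

```python
import math

# Figures taken from the text.
GENOME_SIZE = 4e9          # tea plant genome, in bases
BASES_READ = 71e9          # total bases read
ERROR_RATE = 1 / 400       # approximate misread rate
CATALOG_THROUGHPUT = 90e9  # bases per day (catalog value)


def coverage(bases_read, genome_size):
    """Average number of times each genome position is read."""
    return bases_read / genome_size


def phred_quality(error_rate):
    """Phred score Q = -10 * log10(p); an added convention, not from the text."""
    return -10 * math.log10(error_rate)


def run_days(bases_read, throughput_per_day):
    """Days needed, assuming the catalog throughput is sustained."""
    return bases_read / throughput_per_day


print(f"Coverage: {coverage(BASES_READ, GENOME_SIZE):.2f}x")
print(f"Phred quality: Q{phred_quality(ERROR_RATE):.1f}")
print(f"Run time at catalog throughput: {run_days(BASES_READ, CATALOG_THROUGHPUT):.2f} days")
```

The coverage comes out just under 18, which matches the "about 18 times the genome size" figure quoted above.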




