Sequencing of Nucleic Acids: from the First Human Genome to Next Generation Sequencing in COVID-19 Pandemic

Despite being around for more than 40 years, DNA sequencing is regarded as a young technology in clinical medicine. As sequencing becomes cheaper, faster and more accurate, it is rapidly being incorporated into clinical laboratories. In 2003, the completion of the first human genome opened the door to personalized medicine. Ever since, genomics has been expected to have a wide impact on clinical care and public health. However, many years can pass before genomic discoveries translate into benefits for patients. DNA sequencing represents a less biased approach to diagnostics. It is not only a diagnostic tool but can also influence clinical management and therapy. As new technologies rapidly emerge, it is important for researchers and health professionals to have basic knowledge of the capabilities and drawbacks of existing sequencing methods and of their use in the clinical setting and in research. This review provides an overview of nucleic acid sequencing technologies from a historical perspective and then focuses on the clinical use of sequencing. Some of the most promising areas are presented with selected examples from Slovenian researchers.


Introduction
More than forty years ago, two papers described the first methods for determining the sequence of nucleotide bases in DNA, 1,2 and ever since, the development of new sequencing methods has been exponential. In 2003, the Human Genome Project (HGP) was completed, presenting the complete version of the human genome. 3 Once the reference human genome was established, scientists tried to explain disease mechanisms and susceptibility to certain diseases through population resequencing and the identification of disease-causing genomic variants. Recently, DNA sequencing has been moving into the clinical setting, where it is being implemented in diagnostics and clinical management. Owing to the rapid development of novel technologies, sequencing of nucleic acids will contribute to the discovery of the genomic, transcriptomic and epigenomic basis of unsolved diseases, to improved diagnostics, and to personalized therapies.
Advances in sequencing technologies have substantially reduced sequencing costs over the last 15 years (Figure 1). In 2004, the National Human Genome Research Institute started an initiative to reduce the cost of sequencing a whole human genome to US$1000, 4 accelerating the development of cheaper and faster sequencing methods. Around 2008, sequencing cost deviated from 'Moore's law'; this coincides with the transition from Sanger sequencing to next generation sequencing (NGS) technologies and resulted in a rapid fall in sequencing cost. Human genome sequencing for $1000 was achieved a few years ago, 5 and with novel technologies we are quickly approaching a $100 human genome. With today's enormous sequencing outputs and lower cost per base, the biggest challenge remains the meaningful interpretation and informative reporting of sequencing results.
This review provides an overview of nucleic acid sequencing technologies and examples of the clinical use of sequencing, including work by Slovenian researchers.
Figure 1: The departure of the sequencing cost curve from Moore's law coincides with the emergence of next generation sequencing (NGS). Moore's law originates in the computer hardware industry and describes the doubling of 'computing power' every two years. Technologies that keep pace with it are generally regarded as successful, 6 so it is a useful benchmark for comparing technological advances. Data shown in the figure were obtained from the National Human Genome Research Institute. 6
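The Moore's-law benchmark can be made concrete with a short calculation: a cost that merely kept pace with Moore's law would halve every two years. The sketch below, a minimal illustration under that assumption, projects such a cost and compares it with an observed value. The starting cost, time span and "observed" cost are hypothetical placeholders, not NHGRI data, and the function names are illustrative.

```python
def moore_projection(start_cost: float, years: float, halving_period: float = 2.0) -> float:
    """Cost after `years` if it halves every `halving_period` years (Moore's-law pace)."""
    if halving_period <= 0:
        raise ValueError("halving_period must be positive")
    return start_cost / 2 ** (years / halving_period)


def fold_below_moore(actual_cost: float, start_cost: float, years: float) -> float:
    """How many times cheaper an observed cost is than the Moore's-law projection."""
    return moore_projection(start_cost, years) / actual_cost


# Hypothetical numbers: 1,000,000 cost units projected 10 years ahead (5 halvings)
projected = moore_projection(1_000_000, 10)
print(f"Moore's-law projection: {projected:,.0f}")  # -> Moore's-law projection: 31,250
print(f"Observed cost is {fold_below_moore(10_000, 1_000_000, 10)}x below the projection")
```

A ratio well above 1 is what a deviation from Moore's law means in practice: the observed cost fell faster than the hardware benchmark would predict.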

Short Review of the Sequencing Technologies
In 1977, researchers developed two methods that enabled sequencing of nucleic acid fragments of several hundred base pairs (bp), Sanger's "dideoxy method" and the Maxam-Gilbert method, 1,2 causing a revolution in biology.

1. First-Generation DNA Sequencing
Sanger sequencing is known today as first-generation DNA sequencing and is based on chain termination during DNA synthesis by DNA polymerase. DNA polymerase is an enzyme that elongates an existing DNA strand by adding deoxynucleotides (dNTPs; each consisting of a nucleobase, deoxyribose, and phosphate groups) to its 3' end via a phosphodiester bond between the 3'-hydroxyl group of the last incorporated nucleotide and the 5'-phosphate group of the incoming dNTP. What made sequencing possible was the addition of dideoxynucleotides (ddNTPs), dNTP analogues that lack the 3'-hydroxyl group needed to form the next phosphodiester bond. Once a ddNTP is incorporated, DNA polymerase can no longer add nucleotides to that strand. This produces DNA fragments of varying length, all of which end with a ddNTP. Initially, the fragments were detected by radioactive labelling. About a decade later, Sanger sequencing was automated and commercialized using fluorescently labelled ddNTPs 7 and capillary electrophoresis, providing single base pair resolution. 8 Using automated Sanger sequencing, researchers were able to read up to 75,000 bp per day. 9 This method formed the foundation of the HGP, the biggest collaborative biological project, officially completed in 2003 (it took 13 years and cost almost US$3 billion to obtain the complete version of the human genome). 3,6,10 The next step was to identify genomic differences (variants) among people to explain disease mechanisms and/or disease susceptibility. Such projects required the genomes of many individuals to be sequenced; however, Sanger sequencing was far too time-consuming and expensive.
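The logic of chain termination can be illustrated with a toy simulation. The sketch below assumes an idealized dye-terminator reaction in which some copy of the new strand is terminated at every position, and it writes the template 3'-to-5' so that the complementary strand can be built position by position. Sorting fragments by length stands in for electrophoresis, and the terminal (labelled) base of each fragment gives the sequence. Function names are illustrative.

```python
import random

# Watson-Crick pairing used to build the newly synthesized strand
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_fragments(template: str) -> list[str]:
    """Every fragment a ddNTP could terminate, one per position of the new strand.

    Simplification: the template is given 3'-to-5', so the new strand (5'-to-3')
    pairs with it position by position.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    # A fragment of length i + 1 ends with the ddNTP incorporated at position i
    return [new_strand[: i + 1] for i in range(len(new_strand))]


def read_electropherogram(fragments: list[str]) -> str:
    """Order fragments by length, as electrophoresis does, and read each terminal base."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))


fragments = chain_termination_fragments("TACGGA")
random.seed(0)
random.shuffle(fragments)  # fragments leave the reaction in no particular order
print(read_electropherogram(fragments))  # -> ATGCCT
```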

2. Next (or Second) Generation Sequencing (NGS)
The "hunt" began for alternative DNA sequencing methods, ultimately resulting in the emergence of NGS technologies. NGS is based on massively parallel sequencing, meaning that billions of short DNA fragments are sequenced simultaneously, producing short sequence "reads". 11 Reads are computationally aligned to a reference sequence to assemble the consensus DNA sequence (Figure 2). NGS technologies significantly increased sequencing throughput and decreased both labour and sequencing cost. 12 In 2004, the company 454 Life Sciences released the first commercially available NGS platform, 13 which is no longer in use. The 454 technology was based on pyrosequencing, which uses the detection of light to determine the DNA sequence. In pyrosequencing, enzymes build the complementary DNA strand and report the base order of DNA strands immobilized on beads located in the wells of a picotiter plate. Each addition of a dNTP to the growing DNA strand, catalysed by DNA polymerase, releases a pyrophosphate. The pyrophosphate is then converted to adenosine triphosphate (ATP) by sulfate adenylyltransferase (ATP sulfurylase). Luciferase then uses the ATP to convert luciferin to oxyluciferin, generating light. Only one of the four dNTPs is added at a time, and unused dNTPs are degraded by the enzyme apyrase before the next dNTP is added. The intensity of the light is detected by sensors, indicating whether, and how many, dNTPs were added to the complementary DNA strand. 14
Furlani et al.: Sequencing of Nucleic Acids: from the First Human Genome ...
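The link between light intensity and the number of bases added in one step can be sketched as a "flowgram". The example below assumes an idealized signal exactly proportional to the number of bases incorporated in each flow and a hypothetical cyclic flow order of T, A, C, G; `strand` is the sequence of the newly synthesized strand. Real signals are noisy, which is one reason why long runs of identical bases (homopolymers) are known to be difficult to count with this chemistry.

```python
from itertools import cycle


def flowgram(strand: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Idealized pyrosequencing signal: one dNTP per flow, signal = bases incorporated."""
    if set(strand) - set(flow_order):
        raise ValueError("strand contains bases missing from the flow order")
    signals = []
    pos = 0
    for nucleotide in cycle(flow_order):
        if pos >= len(strand):
            break
        run = 0
        # A homopolymer run is incorporated in a single flow, giving a proportionally brighter flash
        while pos < len(strand) and strand[pos] == nucleotide:
            run += 1
            pos += 1
        # Any unused dNTP is assumed to be removed (apyrase) before the next flow
        signals.append((nucleotide, run))
    return signals


def decode(signals: list[tuple[str, int]]) -> str:
    """Rebuild the strand from the flowgram by repeating each base by its signal."""
    return "".join(nucleotide * count for nucleotide, count in signals)


signals = flowgram("AAGTTTC")
print(signals)  # -> [('T', 0), ('A', 2), ('C', 0), ('G', 1), ('T', 3), ('A', 0), ('C', 1)]
print(decode(signals))  # -> AAGTTTC
```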
Illumina, Ion Torrent, and Beijing Genomics Institute (BGI) are currently the main NGS companies on the market. 15 Today, over 90% of all sequencing in laboratories around the world is performed on Illumina platforms. 16 Illumina devices work on the principle of "sequencing by synthesis". Sequencing takes place on the surface of a glass flow cell. Single-stranded DNA fragments of the library to be sequenced bind by hybridization to oligonucleotides on the surface of the flow cell. DNA polymerase synthesizes the complementary strand, producing double-stranded DNA, and the original template strand is then removed, so that only the newly synthesized strand remains bound to the flow cell. Next, the bound strands undergo clonal amplification and denaturation, forming clusters of identical copies. Antisense strands are then removed from the surface of the flow cell. Sequencing then takes place simultaneously on all bound fragments. In each sequencing cycle, a single complementary dNTP is added to each bound strand; a blocker attached at the 3' position of the deoxyribose allows only one dNTP to be added per cycle. Three detection modes are available: four-channel chemistry, two-channel chemistry and, less often used, one-channel chemistry. In four-channel devices, each dNTP in the mixture is labelled with a specific fluorescent dye (dATP with red, dGTP with blue, dTTP with green, and dCTP with yellow). After each sequencing cycle, the device identifies the incorporated dNTP using lasers and four filters that distinguish all four possible bases. The number of sequencing cycles determines the read length. Within an individual cluster all sequences are identical, so in each cycle the whole cluster glows with the same colour.
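The dye logic of the three Illumina chemistries reduces to small lookup tables. The sketch below uses the four-channel colours given above, the two-channel rule described next (dTTP green, dCTP red, dATP both, dGTP dark) and the one-channel two-image rule (dATP and dTTP labelled in the first image, dTTP and dCTP in the second). It assumes intensities have already been converted to on/off signals, which real base callers do with far more sophisticated models; the function names are hypothetical.

```python
# Four-channel: one dye per base, as listed in the text
FOUR_CHANNEL = {"red": "A", "blue": "G", "green": "T", "yellow": "C"}

# Two-channel: (green signal, red signal) -> base; dGTP carries no dye
TWO_CHANNEL = {(True, False): "T", (False, True): "C", (True, True): "A", (False, False): "G"}

# One-channel: (signal in first image, signal in second image) -> base
# dATP loses its dye before image 2, dCTP gains one, dTTP stays labelled, dGTP is never labelled
ONE_CHANNEL = {(True, False): "A", (True, True): "T", (False, True): "C", (False, False): "G"}


def call_four_channel(intensities: dict[str, float]) -> str:
    """Pick the base whose dye channel is brightest for this cluster."""
    brightest = max(intensities, key=lambda channel: intensities[channel])
    return FOUR_CHANNEL[brightest]


def call_two_channel(green: bool, red: bool) -> str:
    return TWO_CHANNEL[(green, red)]


def call_one_channel(first_image: bool, second_image: bool) -> str:
    return ONE_CHANNEL[(first_image, second_image)]


# Example calls for one cluster in one cycle, one per chemistry
print(call_four_channel({"red": 0.1, "blue": 0.05, "green": 0.9, "yellow": 0.2}))  # -> T
print(call_two_channel(green=True, red=True))  # -> A
print(call_one_channel(first_image=False, second_image=True))  # -> C
```

With two binary signals there are exactly four combinations, so one of the four bases must be encoded as "no signal" in both reduced schemes.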
In two-channel chemistry, only two different fluorescent dyes are needed to label the dNTPs; after laser excitation, the device detects dTTP as a green signal, dCTP as a red signal, and dATP as a combination of both signals, while dGTP is unlabelled and is detected as the absence of a signal. One-channel chemistry, as the name suggests, uses a single dye to detect all four bases. Instead of glass flow cells, one-channel chemistry uses CMOS (complementary metal-oxide semiconductor) chips. After each sequencing cycle, newly incorporated bases are detected using two chemistry steps and a combination of two images. In the first chemistry step, dATPs and dTTPs are labelled with the dye and the first image is taken. In the second chemistry step, an added reagent removes the label from dATP (dATPs have a cleavable linker that allows removal of the dye) and adds the dye to dCTP (dCTPs have a linker group that allows dye binding). The second image is taken, and the combination of both signals determines which of the four bases was incorporated in each sequencing cycle. 17
Illumina also offers patterned flow cells. Their surface area is organized into billions of evenly spaced nanowells with fixed locations, and a distinct cluster is generated in each nanowell. This allows more efficient use of the flow cell surface area, increased data output, reduced costs, and faster run times. Thanks to this structured organization, patterned flow cells provide significant advantages over non-patterned cluster generation. They tolerate a broader range of library densities, and because the nanowell positions are fixed there is no need to map cluster sites, which saves sequencing run time. Higher cluster density also increases the usable data per flow cell, reducing the cost per gigabase (Gb). 18 Illumina offers numerous sequencing platforms for different applications, such as the MiSeq FGx system, the first validated benchtop sequencer designed specifically for forensic science, and the MiSeqDx and NextSeq 550Dx, both Food and Drug Administration-approved platforms for in vitro diagnostic testing. 19 NGS has substantially increased sequencing output; Illumina's production-scale NovaSeq 6000 System can read up to 6 terabases in 2 days and is suited to ultra-deep sequencing of the entire genome. 15
Figure 2: Steps common to different NGS platforms are DNA fragmentation (required or not depending on the library type), library preparation, massively parallel sequencing, bioinformatics analysis, and variant annotation and interpretation. 27 Sequencing generates billions of "reads" that are computationally aligned to the reference genome to assemble a linear consensus sequence. The number of reads covering a particular base is known as the read depth and determines the confidence with which a base or variant is called. 12 A single nucleotide variant should appear in at least 10 reads (i.e. a read depth of at least 10×) to be regarded as a genuine genomic variant. 12
Unlike the previously mentioned technologies, Ion Torrent detection of sequencing by synthesis is not based on optics and uses unlabelled dNTPs. As each new dNTP is incorporated into a growing DNA strand, a pyrophosphate and a hydrogen ion are released. Detection is therefore based on the change in pH, measured by an ion-sensitive field-effect transistor sensor inside a CMOS layer. 20 BGI sequencing platforms use DNA nanoball sequencing, a mechanism that bypasses the requirement for PCR amplification during library preparation. The DNA to be sequenced is first fragmented until the desired length is achieved. Next, the fragments are end-repaired, and specific adaptor sequences and split oligo sequences are added.
These adaptors enable the single-stranded DNA fragments to be circularized, forming single-stranded circular DNA, which is replicated many times by a modified rolling circle amplification using Phi29 polymerase until a long single-stranded DNA molecule is formed. This DNA then folds into a nanoball a few hundred nanometres in diameter. DNA nanoballs are adhered onto a patterned array flow cell. Similar to Illumina sequencing, one dNTP is incorporated per cycle and the sequence order is determined by laser excitation. BGI released the DNBSEQ-T7 instrument, which can produce 1-6 terabases of high-quality data per day for a wide range of applications. BGI also offers whole genome sequencing for only $600, including sample processing, sequencing, and data analysis. 21 The high sequencing output and the rapid decrease in sequencing cost made it possible for such platforms to be adopted by many clinical laboratories. 22,23 Exponential genomic data generation has accelerated translational research and the development of new genomic tests. 24 However, NGS also has a few pitfalls. Read length is important for the accuracy of the generated sequence; technologies that use longer reads generally produce longer, higher-quality assemblies. 9 However, most NGS platforms use short reads (35-600 bp) due to the nature of their sequencing chemistry. 9,12 Many genomic regions contain repetitive sequences 9 much longer than the sequencing reads, which may lead to misassemblies and sequencing gaps. 16,25 Moreover, short-read technologies detect larger structural variants, which are frequently clinically relevant, less accurately. 26 Another disadvantage is that most NGS platforms use PCR in the amplification step (to increase signal strength), which tends to be less accurate in genomic regions with high guanine-cytosine content and can introduce errors during DNA "photocopying". 9
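The read-depth criterion described in this section, that a single nucleotide variant should be seen in roughly 10 or more reads, can be sketched as a pileup followed by a threshold. The sketch assumes reads are already aligned as (start, sequence) pairs without insertions or deletions, and it counts only variant-supporting reads against the threshold; real variant callers also weigh base and mapping qualities. The reference and reads are hypothetical.

```python
from collections import Counter


def pileup(reads: list[tuple[int, str]], ref_len: int) -> list[Counter]:
    """Count observed bases at each reference position (reads aligned without indels)."""
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                columns[pos][base] += 1
    return columns


def call_snvs(reference: str, reads: list[tuple[int, str]], min_alt_reads: int = 10):
    """Report (position, ref, alt, alt_reads, depth) for alt bases seen in >= min_alt_reads reads."""
    calls = []
    for pos, counts in enumerate(pileup(reads, len(reference))):
        depth = sum(counts.values())
        for base, n in counts.items():
            if base != reference[pos] and n >= min_alt_reads:
                calls.append((pos, reference[pos], base, n, depth))
    return calls


reference = "ACGTACGT"
reads = [(0, "ACGAACGT")] * 12 + [(2, "GTACGT")] * 3  # 12 reads carry T>A at position 3
print(call_snvs(reference, reads))  # -> [(3, 'T', 'A', 12, 15)]
print(call_snvs(reference, reads[:9]))  # -> [] (only 9 supporting reads)
```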

3. Third Generation Sequencing
The main characteristic of third generation sequencing is the use of long (10,000-2 million bp) sequencing reads. 15 Long reads provide better resolution of repetitive genomic regions and structural variants 28 and allow the assembly of complex genomes. 15 Furthermore, third generation technologies do not require library amplification to increase signal strength and enable real-time sequencing. 9,29 In 2011, Pacific Biosciences released a technology named Single-Molecule Real-Time (SMRT) sequencing, 30 and in 2014, Nanopore sequencing (Oxford Nanopore Technologies) was introduced. 31 Compared with Illumina, SMRT technology uses differently labelled dNTPs: there is no blocker bound to the deoxyribose and the label is located on the phosphate group, so dNTPs can be added sequentially without an additional step to remove a terminating blocker, and incorporation is measured in real time. Nanopore sequencing is based on tiny changes in electrical current. During sequencing, DNA strands pass through protein nanopores about 1.8 nanometres in diameter, embedded in a polymer membrane. As a DNA strand passes through the pore, the current changes according to the DNA bases located inside the pore (about 6 DNA bases at a time). Nanopore sequencing platforms are distinguished by great portability (the MinION is the size of a USB stick 32), ultralong reads, and simple library preparation. Such devices can be used in virtually any environment, for example to identify infectious disease outbreaks, as already demonstrated in several studies. 33,34 Most recently, Nanopore sequencing has been used for the accurate and comprehensive detection of SARS-CoV-2 during the COVID-19 pandemic. 35-37 Third generation sequencing allows direct determination of epigenetic modifications 38 and of base modifications in RNA sequencing. 39 Methylation profiling by Nanopore sequencing has already been reported by Euskirchen et al., 40 who illustrated the power of this technology for rapid tumour classification. However, one drawback of Nanopore sequencing is its higher error rate compared with short-read technologies. 41
Despite the many intriguing possibilities of long-read technologies, lower output and accuracy limit their entry into the clinical environment for the time being. 15 The shortcomings of both second and third generation technologies can be compensated for by using them in combination, known as hybrid sequencing. 42,43 Recent Nanopore sequencing platforms already offer improved throughput; the PromethION 48 can run up to 48 flow cells at once, producing an output of up to 7.6 terabases. 44 Third generation technologies represent a new revolution in genomics, as they enable identification of previously undetermined or poorly determined genomic regions. 45,46 For example, ultra-long-read nanopore sequencing enabled the complete resolution of the human X chromosome, 47 and SMRT sequencing resolved 2.25-kb-long stretches of short tandem repeats implicated in Fragile X syndrome. 48 As the output and accuracy of third generation technologies further increase, they will likely play an important role in the clinical setting for identifying important structural variants that are poorly determined by short-read technologies.
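Because roughly six bases sit in the nanopore at once, each current level reflects a short k-mer rather than a single base. The sketch below, which assumes k = 6 as stated in the text and idealized one-base-at-a-time movement of the strand, lists the successive pore states for a sequence and counts how many distinct states a basecaller would have to tell apart. It is a simplification for illustration, not Oxford Nanopore's basecalling model.

```python
K = 6  # approximate number of bases inside the pore at any moment, per the text


def pore_states(sequence: str, k: int = K) -> list[str]:
    """Successive k-mers occupying the pore as the strand advances one base at a time."""
    return [sequence[i : i + k] for i in range(len(sequence) - k + 1)]


n_states = 4 ** K  # every possible 6-mer of A, C, G, T
print(n_states)  # -> 4096
print(pore_states("ACGTACGTAC"))  # 10 bases give 5 overlapping 6-mers
```

Each base therefore influences about six consecutive measurements, which helps explain why decoding raw current into bases is harder than reading a single dye colour per cycle.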

Genomic Sequencing in the Clinical Setting
Precision medicine is based on treating the patient as an individual, moving current clinical practice towards more individualized health care. 49 It seeks to apply genomics as a major strategy for tailoring care to maximize health and minimize harm to patients. As the genome is the best source of information about what makes an individual unique (e.g. more or less susceptible to neurodegenerative diseases), including genomic sequencing in clinical practice accelerates the advancement of precision medicine.

1. Sequencing Approaches
Sequencing represents a less biased approach to diagnostics and has great potential for ending the diagnostic odyssey of patients with rare diseases. 50 Clinicians choose between whole-genome sequencing (WGS), whole-exome sequencing (WES), transcriptome sequencing, and targeted panel sequencing. The most comprehensive, but also the most expensive, is WGS, as it interrogates the entire genome. On average, it detects 3 million genomic variants, 11,51 most of which lie in non-coding regions of the genome, making interpretation difficult. 50 Extensive bioinformatics analysis is therefore required to narrow all genomic variants down to the few that might be related to the patient's phenotype. WGS poses problems of data interpretation and storage because of the large amount of information it generates. 52 Its use has been studied in critically ill newborns in neonatal intensive care units 53,54 and in oncology patients, 55 but it is still mainly used in scientific research. WES focuses on a more manageable portion of the genome, the exome, which codes for proteins (only 1-2% of the genome). Variants in the exome are therefore easier to interpret. WES is also cheaper, faster and in most cases sufficiently informative for clinical practice. One disadvantage, however, is lower accuracy in certain gene regions that are relevant for some medical conditions, which can ultimately result in false-negative results. 15,54,56 Another disadvantage of WES is the limited detection of clinically relevant copy number variations (CNVs) and structural variants. Currently there are no accepted standard protocols or quality control measures for CNV identification in NGS data, and in many cases microarrays are used instead of WES. 57 WES achieves a diagnostic yield of 25-35%, compared with 40-60% for WGS. 12,58 Owing to its better diagnostic yield, WGS is expected to become clinically more important as genomic variant interpretation improves and sequencing costs decrease further. 58
Today, most clinical sequencing uses targeted gene panels, which interrogate selected regions of the genome associated with a disease. Targeted sequencing is highly reliable in identifying variants in disease-related genes, is cost-effective, and its results are easier to interpret than those of WES and WGS. 15 Numerous predesigned targeted gene sequencing panels are available from Illumina, for example AmpliSeq or TruSeq. 19 Based on numerous studies, small NGS panels focusing on a limited number of actionable genes are expected to become a standard diagnostic tool in oncology. 59-61 A comparison of different sequencing methods can be found in Table 1.
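Computationally, restricting analysis to a targeted panel is an interval lookup: only variants that fall inside the panel's target regions are reported. The sketch below assumes non-overlapping, 0-based, half-open intervals (the convention used by BED files); the panel coordinates and variants are hypothetical, and the helper names are illustrative.

```python
from bisect import bisect_right

# Hypothetical panel targets: (chromosome, start, end), 0-based, half-open, non-overlapping
PANEL = [("chr17", 100, 200), ("chr17", 500, 650), ("chr13", 300, 420)]


def build_index(regions):
    """Group target intervals by chromosome, sorted by start position."""
    index = {}
    for chrom, start, end in sorted(regions):
        index.setdefault(chrom, []).append((start, end))
    return index


def in_panel(index, chrom, pos):
    """True if `pos` lies inside one of the panel intervals on `chrom`."""
    intervals = index.get(chrom, [])
    # Index of the last interval whose start is <= pos
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


index = build_index(PANEL)
variants = [("chr17", 150, "G", "A"), ("chr17", 300, "C", "T"), ("chr13", 301, "A", "G"), ("chr2", 150, "T", "C")]
kept = [v for v in variants if in_panel(index, v[0], v[1])]
print(kept)  # -> [('chr17', 150, 'G', 'A'), ('chr13', 301, 'A', 'G')]
```

Binary search keeps each lookup logarithmic in the number of targets, which matters little for small panels but scales to exome-sized target lists.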
An illustrative example of how genomic sequencing has contributed to the development of minimally invasive diagnostics (originating in oncology) is liquid biopsy, in which cell-free DNA (cfDNA) from body fluids, primarily blood, is analysed. 62 Circulating tumor DNA (ctDNA) represents only a small fraction (<0.5%) of cfDNA 63 and is to some extent representative of the primary tumor DNA. Liquid biopsy is particularly suitable for tumors that are anatomically inaccessible for biopsy, for metastatic and advanced-stage cancers, and for minimizing the number of repeat biopsies (as part of a patient's follow-up after diagnosis has been made). 5 Sequencing of cfDNA has also been shown to be useful for monitoring response to targeted therapy and detecting new resistance mutations, for example in the epidermal growth factor receptor (EGFR) gene. 64 Owing to its short half-life, ctDNA is appropriate for assessing tumor dynamics, especially in more advanced disease stages. 62 Liquid biopsy also shows promise in population screening and early cancer diagnostics; high sensitivity and specificity were reported in early lung cancer diagnostics, 65 and some mutations were detected 2 years before disease onset. 66 Liquid biopsy also allows the study of epigenetic modifications 67,68 and methylation patterns (the methylome), which can aid disease classification. Mutations combined with epigenomic, proteomic, and even demographic data would provide a unique molecular profile of an individual patient's tumor and represent an important step forward in personalized medicine.
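The low ctDNA fraction explains why liquid biopsy needs deep sequencing. Under a simple binomial model, which assumes that each sequenced fragment independently carries the mutation with probability equal to the mutant fraction and ignores sequencing errors and PCR duplicates, the chance of observing enough mutant reads grows with depth. The minimum of three supporting reads used below is a hypothetical calling threshold.

```python
from math import comb


def detection_probability(depth: int, mutant_fraction: float, min_reads: int) -> float:
    """P(at least `min_reads` mutant reads at a site) under a binomial model without sequencing errors."""
    p_fewer = sum(
        comb(depth, k) * mutant_fraction**k * (1 - mutant_fraction) ** (depth - k)
        for k in range(min_reads)
    )
    return 1 - p_fewer


# A mutation carried by 0.5% of cfDNA fragments, called only with >= 3 supporting reads (assumed)
for depth in (100, 1_000, 10_000):
    print(f"depth {depth:>6}: P(detect) = {detection_probability(depth, 0.005, 3):.3f}")
```

The detection probability is small at around 100× and rises steeply only at depths of thousands of reads, which is why ctDNA assays are typically sequenced far more deeply than germline tests.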
Cell-free DNA sequencing is also becoming an important tool in non-invasive prenatal testing. Compared with traditional prenatal screening, it has proven to be an effective and safe screening method for trisomies 21, 18, and 13 and for sex chromosome aneuploidies. The tests have also proved to be fast and financially sustainable. 69
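Counting-based cfDNA screening for trisomy is often explained with a z-score: the fraction of reads mapping to, for example, chromosome 21 is compared with the same fraction in euploid reference pregnancies, and an excess of reads suggests an extra copy. The sketch below is a simplified illustration of that idea, not necessarily the method of the cited studies; real tests also correct for GC bias and fetal fraction, and the reference fractions, read counts and cut-off of 3 are assumptions.

```python
from statistics import mean, stdev


def chromosome_fraction(chrom_reads: int, total_reads: int) -> float:
    """Share of all sequenced cfDNA reads that map to one chromosome."""
    return chrom_reads / total_reads


def z_score(sample_fraction: float, euploid_fractions: list[float]) -> float:
    """Standard deviations between the sample and the euploid reference mean."""
    return (sample_fraction - mean(euploid_fractions)) / stdev(euploid_fractions)


# Hypothetical chromosome 21 read fractions from euploid reference pregnancies
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0130]
sample = chromosome_fraction(chrom_reads=135_000, total_reads=10_000_000)
z = z_score(sample, reference)
print(f"z = {z:.1f}, flagged: {z > 3}")  # cut-off of 3 is an assumed screening threshold
```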

2. Integration of Genomic Sequencing in the Clinical Practice
For successful implementation in clinical practice, sequencing must provide reliable results to the clinician. NGS requires end-to-end validation, from DNA extraction to bioinformatic analysis, to minimize false-positive and false-negative results. 70 Complete implementation also requires integration of genomic data into electronic health records (EHRs), which can be quite challenging for smaller healthcare institutions. 71 Successful implementation of personalized medicine will increasingly rely on EHRs to store vast amounts of genomic data and to integrate relevant genomic information appropriately into clinical care. 71 One of the barriers to integrating sequencing results into EHRs is that they are frequently entered as a summary rather than as raw data, which limits data access and reanalysis. 72 Although genomic data are static, their interpretation is not. As new knowledge arises, reinterpretation can yield additional diagnoses, and data reanalysis has been found to be cost-effective. 73 Routine reanalyses were shown to improve diagnostic rates 74-76 owing to newly established disease-gene associations, improved bioinformatics tools, and data sharing. 77 The integration of artificial intelligence (AI) is also promising; however, translating technical success in AI-driven analytics into meaningful clinical impact remains a challenge. 78 The proliferation of genomic sequencing also requires medical professionals who are adept at understanding genomic results and returning them to patients. Education of the next generation of health care providers is therefore of great importance. 79 Another factor affecting implementation is the coverage of sequencing costs by health insurance companies, which unfortunately lags behind advances in sequencing technology. 80 The clinician must sometimes provide notes on how genomic testing will affect the course of the disease or its management. 81 Increasing cost coverage by health insurance companies will be catalysed by a further drop in sequencing cost and by accumulating evidence of clinical usefulness. 81 Several studies 82-84 demonstrated a high diagnostic yield and cost-effectiveness of genomic sequencing, which was maximized by its early application in the diagnostic pathway. 81,82
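Why raw data matter for reanalysis can be shown with a toy example: if the stored record keeps the patient's full variant list, re-running it against an updated gene-disease knowledge base can surface a candidate that did not exist at the time of the original report, whereas a summary-only record cannot. All gene names, variants and knowledge-base versions below are hypothetical.

```python
# Hypothetical raw variant calls stored for one patient: (gene, variant)
patient_variants = {("GENE_A", "c.100A>G"), ("GENE_B", "c.250del"), ("GENE_C", "c.31C>T")}

# Hypothetical sets of disease-associated genes at two points in time
disease_genes_v1 = {"GENE_A"}
disease_genes_v2 = {"GENE_A", "GENE_C"}  # GENE_C newly linked to a disease


def candidate_genes(variants, disease_genes):
    """Genes carrying a patient variant that are currently linked to disease."""
    return {gene for gene, _variant in variants if gene in disease_genes}


def reanalysis_gain(variants, old_knowledge, new_knowledge):
    """Candidates that appear only after the knowledge base is updated."""
    return candidate_genes(variants, new_knowledge) - candidate_genes(variants, old_knowledge)


print(reanalysis_gain(patient_variants, disease_genes_v1, disease_genes_v2))  # -> {'GENE_C'}

# A summary-only record keeps just the genes reported at the time, so the gain is invisible
summary_only = {v for v in patient_variants if v[0] in disease_genes_v1}
print(reanalysis_gain(summary_only, disease_genes_v1, disease_genes_v2))  # -> set()
```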

3. National Genome Sequencing
To elucidate the genetic background of their populations, more and more countries are opting for national genome studies. Genomic data can greatly support a healthcare system: if the common alleles in the healthy population are known, it is easier to identify disease-related variants. National genome projects are a fundamental element in establishing effective identification and rapid diagnosis of rare diseases, as they allow potentially causative genetic variants to be distinguished from rare, benign genetic variants that are unique to the population. Such rare variants are the main source of false-positive results of genetic testing, which can directly affect the clinical diagnosis and the course of treatment. Some projects focus on rare diseases or cancer, whilst others are population-based. In the UK, for example, the goal of sequencing 100,000 genomes was reached and served to establish the infrastructure needed to integrate sequencing into the clinical setting. 85 The Chinese Academy of Sciences (CAS) launched the country's Precision Medicine Initiative with the goal of sequencing 1000 million human genomes by 2030.
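The value of population data for filtering can be sketched as an allele-frequency check: a variant that looks rare in global data but is common in the national cohort is removed from the candidate list, avoiding a false-positive finding. The frequency tables, variant coordinates and the 1% cut-off below are hypothetical, and the helper names are illustrative.

```python
# Hypothetical allele frequencies keyed by (chromosome, position, ref, alt)
GLOBAL_AF = {("chr1", 12345, "A", "G"): 0.0002, ("chr7", 55555, "C", "T"): 0.0001}
NATIONAL_AF = {("chr1", 12345, "A", "G"): 0.04}  # common in this population only


def max_population_af(variant, databases):
    """Highest allele frequency reported for the variant in any database (0.0 if absent)."""
    return max((db.get(variant, 0.0) for db in databases), default=0.0)


def rare_candidates(variants, databases, max_af=0.01):
    """Keep variants whose frequency stays at or below `max_af` in every database."""
    return [v for v in variants if max_population_af(v, databases) <= max_af]


patient = [("chr1", 12345, "A", "G"), ("chr7", 55555, "C", "T")]
print(rare_candidates(patient, [GLOBAL_AF]))  # both variants look rare
print(rare_candidates(patient, [GLOBAL_AF, NATIONAL_AF]))  # -> [('chr7', 55555, 'C', 'T')]
```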
In Europe, the 1+ Million European Genomes Project began in 2018, and Slovenia participates in it. 86 The Slovenian Genome Project began at the end of 2019. It is a pilot project that will outline the key directions of future genomic projects, develop bioinformatic tools for data exchange, prepare the legal and ethical basis, educate healthcare professionals and the general public, and carry out pilot genome sequencing of Slovenian patients with rare diseases and of the healthy Slovenian population. In Slovenia, NGS methodology is already well established and has been used in fields such as clinical neurology (dementia, 87 multiple sclerosis 88), paediatrics (newborn metabolism, 89 hearing loss 90), clinical oncology, 91 clinical microbiology (microbiota associated with preterm birth, 92 and with age and gender 93), pharmacogenomics 94 and forensic medicine. 95 The use of genomic screening in preventive health is interesting, although it raises some concerns about widespread implementation in routine clinical practice. 96,97 Current estimates suggest that 3-5% of people carry a medically actionable variant. 98 This estimate is based on the DiscovEHR project, which involved over 50,000 adult participants and aimed to connect high-performance sequencing to an integrated health system. Among ~4.2 million rare single nucleotide variants, insertions and deletions, adverse variants in 76 clinically relevant genes were found in about 3.5% of individuals. This study laid a foundation for individually tailored medicine based on genomics-guided therapeutic discoveries. 98 In the case of "medically actionable genes", genomic screening of the asymptomatic population could have a significant public health impact. 99 Finding new population-specific actionable genes that may save lives and/or substantially improve quality of life remains among the goals of national genome sequencing projects.

4. Sequencing During the COVID-19 Pandemic
Sequencing technologies can be used to detect and identify pathogens, determine their resistance to antibiotics, construct phylogenetic trees, and track disease outbreaks epidemiologically. 100 Genome sequencing can type microbial strains with greater accuracy than classical microbiological methods. 101 As of April 2021, the COVID-19 pandemic had affected over 136 million people worldwide and caused almost 3 million deaths. 102 On January 5, 2020, next-generation metatranscriptomic sequencing allowed researchers to obtain the first complete viral genome of SARS-CoV-2 from a patient in Wuhan, China. 103 Soon, several hundred genomes became publicly available (https://www.gisaid.org/), allowing rapid development of diagnostic tests, 104 vaccines and antivirals, 105 and disease tracking. 106 Several sequencing approaches have been used for SARS-CoV-2. 107 Most studies have used Illumina platforms; however, Oxford Nanopore Technologies has been used for shotgun metatranscriptomics, 108 which enables de novo genome assembly without prior knowledge of the sequence. 109 Another method is amplicon-based sequencing, which is limited to specific parts of the viral genome. Such libraries can be sequenced on mid-throughput benchtop platforms (Illumina NextSeq and MiSeq, Ion Torrent). 107 RNA sequencing using Oxford Nanopore Technologies and DNA nanoball sequencing was applied in a recent study 36 to characterize the SARS-CoV-2 transcriptome and epitranscriptome in detail.
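In whole-genome amplicon schemes, the "specific parts" are many overlapping amplicons tiled along the viral genome. The sketch below tiles the 29,903-nucleotide SARS-CoV-2 reference genome (Wuhan-Hu-1) with amplicons whose length and overlap are hypothetical parameters, and shows how the failure of a single amplicon leaves a coverage gap. Primer design and the layouts of real published schemes are ignored.

```python
def tile_amplicons(genome_length: int, amplicon_len: int = 400, overlap: int = 100) -> list[tuple[int, int]]:
    """Overlapping (start, end) amplicons, 0-based half-open, covering the whole genome."""
    if not 0 <= overlap < amplicon_len:
        raise ValueError("overlap must be non-negative and shorter than the amplicon")
    step = amplicon_len - overlap
    amplicons, start = [], 0
    while True:
        end = min(start + amplicon_len, genome_length)
        amplicons.append((start, end))
        if end >= genome_length:
            return amplicons
        start += step


def uncovered_positions(amplicons, genome_length):
    """Genome positions not covered by any amplicon."""
    covered = [False] * genome_length
    for start, end in amplicons:
        for pos in range(start, end):
            covered[pos] = True
    return [pos for pos, is_covered in enumerate(covered) if not is_covered]


GENOME_LENGTH = 29_903  # SARS-CoV-2 reference genome length in nucleotides
amplicons = tile_amplicons(GENOME_LENGTH)
print(len(amplicons), len(uncovered_positions(amplicons, GENOME_LENGTH)))  # -> 100 0

# Simulate drop-out of the 11th amplicon (positions 3000-3399)
dropped = amplicons[:10] + amplicons[11:]
gap = uncovered_positions(dropped, GENOME_LENGTH)
print(gap[0], gap[-1], len(gap))  # -> 3100 3299 200
```

Overlap between neighbouring amplicons shrinks, but does not remove, the gap left by a failed amplicon.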
Comparative genomics revealed that SARS-CoV-2 is a member of the genus Betacoronavirus and falls into the subgenus Sarbecovirus, which also includes SARS-CoV, 110 and enabled the "hunt" for its zoonotic origins. 103 Sequencing is vital for finding novel viral hosts in order to block interspecies transmission. The bat coronavirus RaTG13, sampled in Yunnan province, is approximately 96% similar to SARS-CoV-2 at the nucleotide level; however, there are major differences in key genomic features important for the infectivity of the virus. 103,111 Moreover, because humans and bats are ecologically separated, it is probable that some other species acted as an intermediate host. 103 Recent research reports that viruses in Malayan pangolins are closely related to SARS-CoV-2, 112 and these animals are of great interest because of their involvement in illegal animal trafficking. 103 Pangolin-CoV was shown to be 91.02% identical to SARS-CoV-2 at the whole-genome level, making it the second closest relative after RaTG13. 113 Betacoronaviruses appear to exist in a number of mammalian species, so it is imperative to sample animals more widely, both in wet markets and among those living close to human populations, to block potential interspecies transmission. 103 Furthermore, NGS allows viral strains to be tracked. RNA viruses continuously accumulate mutations. 114 New mutations in the SARS-CoV-2 genome will continuously arise over time and space and result in branching from the original "reference genome". NGS allows us to observe the evolutionary pattern of SARS-CoV-2, which is crucial for efficient disease prevention and control, for example to reveal new routes of infection. Tracking mutations and variable regions of the viral genome is thus imperative for developing effective therapy in the face of viral diversity and consequent drug resistance. 115 As of April 2021, more than 1 million coronavirus genome sequences had been uploaded to EpiFlu™ (GISAID) worldwide, and the number is rapidly increasing.
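Similarity figures such as the ~96% for RaTG13 or 91.02% for Pangolin-CoV are percent identities computed over an alignment, and how gaps are counted changes the number. The sketch below assumes the two sequences are already aligned to the same length and simply skips columns containing a gap; it is an illustrative convention, not necessarily the one used in the cited studies.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Identity over alignment columns, skipping any column with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    matches = sum(a == b for a, b in compared)
    return 100 * matches / len(compared)


print(percent_identity("ACGTTGCA", "ACGATGCA"))  # -> 87.5
print(percent_identity("ACG-TGCA", "ACGATGCT"))  # gapped column skipped: 6 of 7 columns match
```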
NGS is used to trace interpersonal transmission of the virus. 116 High-resolution genomic epidemiology is thereby becoming an effective tool for public health surveillance and disease control. 117 Knowing the SARS-CoV-2 genome also helps to achieve a more effective disease-control strategy and to investigate cases with unclear sources of infection within a short turnaround time. 110,118 Furthermore, NGS technology has been applied to investigate the mechanisms of SARS-CoV-2 infection. 116 For example, RNA sequencing has been used to identify susceptible organs with higher expression of angiotensin-converting enzyme 2 (ACE2), which serves as a receptor for SARS-CoV-2. 119 NGS and single-cell RNA sequencing were used to determine the expression of the ACE2 receptor in numerous organs and cells after infection with SARS-CoV-2, 116 which is expected to benefit the identification of diagnostic and therapeutic targets.
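Genomic tracing of transmission often starts by grouping cases whose viral genomes differ by only a few mutations. The sketch below links aligned genomes that differ at no more than a hypothetical threshold of two positions and merges linked cases into clusters with a union-find structure (single linkage). The genome strings and threshold are invented for illustration; real investigations combine phylogenetic analysis with epidemiological data.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Number of differing positions between two aligned, equal-length genomes."""
    if len(a) != len(b):
        raise ValueError("genomes must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def transmission_clusters(genomes: dict[str, str], max_snps: int = 2) -> list[set[str]]:
    """Single-linkage clusters of samples connected by <= max_snps differences."""
    parent = {name: name for name in genomes}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative, compressing the path
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(genomes, 2):
        if snp_distance(genomes[a], genomes[b]) <= max_snps:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in genomes:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


genomes = {
    "case1": "ACGTACGTAC",
    "case2": "ACGTACGTAT",  # 1 difference from case1
    "case3": "ACGAACGTAT",  # 1 difference from case2
    "case4": "TTGTACCTAC",  # 3 or more differences from every other case
}
print(transmission_clusters(genomes))
```

Single linkage chains cases together: case1 and case3 would still share a cluster through case2 even with a threshold of one difference.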

Future Outlook
Researchers aim to sequence as many human genomes as possible, known as population-scale resequencing, capturing not only genomic but also epigenomic data. Genomics and epigenomics are frequently studied separately, 120 but to reach the full potential of precision medicine it will be necessary to study them together. Only integrated data from various big "omics" will enable us to fully understand disease mechanisms and substantially increase the sensitivity of genomic sequencing. One of the obstacles, however, is the storage and processing of large amounts of data, which is why the parallel development of bioinformatic analysis is required. Determining which of the obtained information is important requires large genomic databases. Integrated analysis of medical "big data" will benefit from artificial intelligence technologies, such as machine learning and its subset, deep learning. 120 Novel long-read technologies will enable routine resequencing, allowing better determination of repetitive regions and structural variants. Epidemiologists will follow infectious disease outbreaks through microbial sequencing of various samples, such as wastewater. It will therefore be possible to detect a disease outbreak at an early stage and possibly even prevent it. Small, portable devices will be useful in such situations. Oxford Nanopore has recently started developing a device called SmidgION, which is even smaller than the MinION. 44 It will be used with smartphones and other portable devices. The portability, high output, and simplicity of such machines open up countless on-site applications in ecology, forensics, population screening, and epidemiology, to name a few.
However, although the technology represents an immense breakthrough, staff education has to become a central concern of all countries employing next generation technologies. Well-trained medical geneticists and consultants specializing in next generation sequencing will be the ones enabling meaningful interpretation of results, and thus their use in the clinical setting.