Novel DNA Sequencing Techniques In A Python-Based Gene Editing System

Author: Mariem El-Kady



Summary


In this study, we present the development and evaluation of two novel DNA sequencing techniques within a Python-based gene editing system. The first technique involves a mathematically programmed base calling algorithm, while the second technique utilizes circular loop operations for improved sequence alignment. Through empirical studies, we demonstrate the advantages, limitations, and practical applications of each technique in addressing common DNA sequencing challenges, such as error correction and sequence alignment. The findings of this research contribute to the advancement of DNA sequencing technologies and their integration into gene editing systems.



Introduction


Advancements in gene editing systems have revolutionized the field of genetics and biotechnology. DNA sequencing is a fundamental process that provides valuable insights into genetic information, enabling targeted modifications and personalized medicine. With the increasing demand for accurate and efficient DNA sequencing techniques, it is crucial to explore new methodologies that can enhance the capabilities of gene editing systems.


In this article, we present the development and experimental evaluation of two novel DNA sequencing techniques that have been integrated into a Python-based gene editing system. The first technique, the mathematically programmed base calling algorithm, leverages advanced mathematical models to improve the accuracy and reliability of base calling during DNA sequencing. The second technique, the circular loop alignment algorithm, utilizes innovative circular loop operations to overcome challenges in sequence alignment, a crucial step in DNA sequencing.


Through rigorous experimentation and comparative analysis, we demonstrate the advantages, limitations, and technical applications of each technique in addressing common DNA sequencing issues, such as error correction and sequence alignment. The findings of this research contribute to the ongoing efforts to enhance the capabilities of DNA sequencing technologies and their integration into gene editing systems, ultimately supporting advancements in genetic research and personalized medicine.



DNA Sequencing Techniques


Mathematical Programmed Base Technique

The mathematical programmed base technique is a powerful approach for sequence alignment and analysis. It employs a mathematical model and algorithm to accurately align sequences and identify genetic variations. The technique incorporates an operation formula that addresses missing nucleotide positions within the input sequence, allowing for the correction of sequencing errors like duplications or deletions.


Algorithm
Input two DNA sequences, sequence_1 and sequence_2, with lengths N and M, respectively.
Initialize an empty alignment array to store the aligned sequence.
Initialize pointers i and j to 0.
While i < N and j < M:
  If sequence_1[i] is equal to sequence_2[j]:
    Append sequence_1[i] to the alignment array.
    Increment both i and j by 1.
  Else if sequence_1[i] is a missing nucleotide (represented by a special character, e.g., "_"):
    Append sequence_2[j] to the alignment array.
    Increment both i and j by 1.
  Else:
    Append sequence_1[i] to the alignment array.
    Increment i by 1.
Append any remaining nucleotides from sequence_1 or sequence_2 to the alignment array.
Output the alignment array as the aligned sequence.
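The pointer walk above can be sketched in Python as follows. This is a minimal sketch, not the author's implementation: it assumes sequences are plain uppercase strings, that "_" marks a missing nucleotide, and that filling a gap advances both pointers, as in the worked example. The name programmed_base_align is hypothetical.

```python
GAP = "_"  # assumed gap symbol for a missing nucleotide


def programmed_base_align(sequence_1: str, sequence_2: str, gap: str = GAP) -> str:
    """Greedy two-pointer walk that fills gaps in sequence_1 from sequence_2."""
    n, m = len(sequence_1), len(sequence_2)
    alignment = []
    i = j = 0
    while i < n and j < m:
        if sequence_1[i] == sequence_2[j]:
            # Matching bases: keep one copy and advance both pointers.
            alignment.append(sequence_1[i])
            i += 1
            j += 1
        elif sequence_1[i] == gap:
            # Missing base in sequence_1: take it from sequence_2.
            alignment.append(sequence_2[j])
            i += 1
            j += 1
        else:
            # Mismatch: keep the base from sequence_1 and advance only i.
            alignment.append(sequence_1[i])
            i += 1
    # The loop stops when one sequence is exhausted, so at most one slice is non-empty.
    alignment.extend(sequence_1[i:])
    alignment.extend(sequence_2[j:])
    return "".join(alignment)


print(programmed_base_align("ATGC_AGC", "ATGCTAGC"))  # ATGCTAGC
```

Because every iteration advances i, the loop body runs at most N times, and the final appends add at most M more characters.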

Example

Consider the following input sequences: sequence_1 = "ATGC_AGC" and sequence_2 = "ATGCTAGC". Using the mathematical programmed base technique, the alignment process would produce the alignment array: "ATGCTAGC".


Explanation
  1. The first nucleotide in sequence_1 is "A", which is equal to the first nucleotide in sequence_2. Thus, "A" is appended to the alignment array.

  2. The second nucleotide in sequence_1 is "T", which is equal to the second nucleotide in sequence_2. Again, "T" is appended to the alignment array.

  3. The third nucleotide in both sequence_1 and sequence_2 is "G". Since they are equal, "G" is appended to the alignment array.

  4. The fourth nucleotide in both sequence_1 and sequence_2 is "C". Since they are equal, "C" is appended to the alignment array.

  5. The fifth nucleotide in sequence_1 is "_" (a gap character), while the fifth nucleotide in sequence_2 is "T". Because sequence_1 has a gap at this position, the alignment array receives the nucleotide from sequence_2, "T", and both pointers advance to the next position.

  6. The sixth nucleotide in sequence_1 is "A", which is equal to the sixth nucleotide in sequence_2. Therefore, "A" is appended to the alignment array.

  7. The seventh nucleotide in sequence_1 is "G", which is equal to the seventh nucleotide in sequence_2. Therefore, "G" is appended to the alignment array.

  8. The eighth nucleotide in sequence_1 is "C", which is equal to the eighth nucleotide in sequence_2. Therefore, "C" is appended to the alignment array.


Circular Loop Technique

The circular loop technique offers an alternative method for sequence alignment. It reverses and complements the input sequence, then iteratively aligns the original sequence against this reversed version. By comparing nucleotides from the ends of the reversed and original sequences, the technique can connect the ends of a sequence with minimal changes. The job can be done by reversing the sequence and/or rearranging its own segments, without needing a separate reference sequence for alignment. This approach is particularly useful where the ends of a sequence must be connected, for example when resolving missing or additional nucleotides.


Algorithm
Input a DNA sequence, named sequence, of length N.
Reverse and complement the input sequence to obtain reversed_sequence.
Initialize an empty alignment array to store the aligned sequence.
Initialize pointers i and j to 0.
While i < N and j < N:
  If sequence[i] is equal to reversed_sequence[j]:
    Append sequence[i] to the alignment array.
    Increment both i and j by 1.
  Else if sequence[i] is a missing nucleotide (represented by a special character, e.g., "_"):
    Append reversed_sequence[j] to the alignment array.
    Increment both i and j by 1.
  Else:
    Append sequence[i] to the alignment array.
    Increment i by 1.
Append any remaining nucleotides from sequence or reversed_sequence to the alignment array.
Output the alignment array as the aligned sequence.
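A direct Python transcription of the circular loop pseudocode might look like this. It is a sketch under stated assumptions: input is uppercase A/C/G/T with "_" as the gap symbol (left unchanged by complementing), the loop also stops once j reaches N so reversed_sequence is never read past its end, and filling a gap advances both pointers. The names reverse_complement and circular_loop_align are hypothetical.

```python
GAP = "_"  # assumed gap symbol; complementing leaves it unchanged
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", GAP: GAP}


def reverse_complement(sequence: str) -> str:
    """Reverse the sequence and replace each base with its Watson-Crick partner."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def circular_loop_align(sequence: str) -> str:
    """Two-pointer walk of a sequence against its own reverse complement."""
    reversed_sequence = reverse_complement(sequence)
    n = len(sequence)
    alignment = []
    i = j = 0
    while i < n and j < n:  # j < n keeps reads inside reversed_sequence
        if sequence[i] == reversed_sequence[j]:
            alignment.append(sequence[i])
            i += 1
            j += 1
        elif sequence[i] == GAP:
            # Missing base: take the nucleotide from the reverse complement.
            alignment.append(reversed_sequence[j])
            i += 1
            j += 1
        else:
            # Mismatch: keep the original base and advance only i.
            alignment.append(sequence[i])
            i += 1
    alignment.extend(sequence[i:])
    alignment.extend(reversed_sequence[j:])
    return "".join(alignment)


print(reverse_complement("ATGC"))     # GCAT
print(circular_loop_align("GAATTC"))  # GAATTC
```

For a reverse-palindromic sequence such as GAATTC (the EcoRI recognition site), the sequence equals its own reverse complement, so every position matches and the output is unchanged.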

Example

The following is a special example rather than a step-by-step illustration of the algorithm above. Consider the input sequence sequence = "ATGCTAAGC", and assume that of the two consecutive "A"s in the sequence, the first one (the sixth nucleotide) is a duplicate. Treat the sequence as a circle in which the "A" at the beginning is joined to the "C" at the end. First, cut between the duplicate "A" and the following "A"; reading from the cut gives "AGCATGCTA". Next, remove the duplicate "A", which is now the last nucleotide, and rejoin the "T" at the end with the "A" at the beginning, giving the circle "AGCATGCT". Finally, cut between the first "C" and the second "A" of this sequence, which yields "ATGCTAGC": the original sequence without the duplicate "A". This keeps fragmentation to a minimum and eliminates the duplicate nucleotide without using an alignment reference.
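The cut-and-rejoin steps of this example can be expressed as string rotations. A minimal sketch, assuming the circular sequence is stored as a string read from its first base and that the 0-based index of the duplicate (5 in the example) is already known; rotate and remove_circular_duplicate are hypothetical helper names.

```python
def rotate(sequence: str, k: int) -> str:
    """Cut a circular sequence just before index k and read it from there."""
    if not sequence:
        return sequence
    k %= len(sequence)
    return sequence[k:] + sequence[:k]


def remove_circular_duplicate(sequence: str, dup_index: int) -> str:
    """Remove the base at dup_index (0-based) using the cut-and-rejoin steps."""
    n = len(sequence)
    # The duplicate must equal the next base around the circle.
    if sequence[dup_index] != sequence[(dup_index + 1) % n]:
        raise ValueError("no consecutive duplicate at dup_index")
    # Cut just after the duplicate, so it becomes the last base.
    linear = rotate(sequence, dup_index + 1)
    # Drop the duplicate; the remaining ends are treated as rejoined.
    shorter = linear[:-1]
    # Cut again so the original first base is back at the front.
    return rotate(shorter, n - dup_index - 1)


print(rotate("ATGCTAAGC", 6))                     # AGCATGCTA
print(remove_circular_duplicate("ATGCTAAGC", 5))  # ATGCTAGC
```

The net effect equals deleting sequence[dup_index] directly; the rotations only mirror the cuts described in the example.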


Applications

The circular loop technique finds applications in scenarios where connecting the ends of a sequence is crucial. Some examples include:

Sequence assembly: Aligning fragmented DNA sequences obtained from high-throughput sequencing technologies to reconstruct the original sequence.

Gap filling: Filling gaps in DNA sequences caused by incomplete sequencing or missing data.

Primer design: Aligning primer sequences to target genomic regions for PCR amplification.



Discussion


In this section, we will provide a detailed comparative analysis of the two sequence alignment techniques discussed above: the mathematical programmed base technique and the circular loop technique. We will discuss their advantages, disadvantages, and applications in the context of gene editing systems. We will then analyze their computational efficiency and accuracy.


Mathematical Programmed Base Technique

Advantages:

  • Accuracy: The mathematical programmed base technique incorporates a rigorous mathematical model, which enhances the accuracy of sequence alignment. The algorithm fills each position marked as missing with the corresponding nucleotide from the other sequence, which reduces errors in the alignment process.

  • Flexibility: This technique is versatile and can be applied to a wide range of sequence alignment scenarios. It is particularly useful for correcting sequencing errors, identifying genetic variations, and analyzing complex genomic rearrangements.

  • Efficiency: The algorithm makes a single pass over both sequences: every iteration advances at least one of the two pointers, so the loop runs at most N + M times and the total work grows linearly with sequence length. This makes it suitable for large-scale alignments and high-throughput sequencing data analysis.


Disadvantages:

  • Not suitable for multiple sequences: The mathematical programmed base technique is primarily designed for aligning two individual sequences. It may not be suitable for aligning multiple sequences simultaneously.

  • Sensitivity to initial alignment: The accuracy of the alignment depends heavily on the initial alignment positions. Because both pointers start at the first position of their sequences and never move backward, an incorrect starting alignment propagates through the rest of the result.


Applications:

The mathematical programmed base technique is particularly useful in scenarios where accurate sequence alignment is crucial. It can be applied in various contexts, including:

  • Error correction: Identifying and correcting sequencing errors, such as missing or duplicated nucleotides.

  • Comparative genomics: Aligning sequences from different organisms to identify genetic variations and evolutionary relationships.

  • Structural variant detection: Analyzing complex genomic rearrangements, such as insertions, deletions, and inversions.


Circular Loop Technique

Advantages:

  • End connection: The circular loop technique excels in connecting the ends of a sequence. It can effectively handle scenarios where the sequence contains missing or additional nucleotides, allowing for accurate alignment and gap filling.

  • Sequence assembly: It is highly valuable in sequence assembly tasks, where fragmented DNA sequences need to be aligned and reconstructed.


Disadvantages:

  • Complexity for large sequences: The circular loop technique has higher computational complexity compared to the mathematical programmed base technique. It may be more challenging to apply this technique to large-scale sequence alignments.

  • Sensitivity to sequence length: The accuracy of the alignment using the circular loop technique may decrease with longer sequences. It becomes more challenging to identify correct alignment positions when dealing with extensive genomic regions.


Applications:

  • Sequence reconstruction: It can be used in sequence assembly tasks, where fragmented DNA sequences obtained from high-throughput sequencing technologies need to be aligned and reconstructed.

  • Gap filling: The technique is valuable for filling gaps in DNA sequences caused by incomplete sequencing or missing data, improving the completeness of genomic sequences.

  • Primer design: It assists in aligning primer sequences with target genomic regions for PCR amplification, facilitating experimental gene editing and molecular biology studies.


Analysis

Both the mathematical programmed base technique and the circular loop technique offer robust approaches to sequence alignment. In terms of computational efficiency, the mathematical programmed base technique generally exhibits faster performance due to its streamlined algorithm. However, the circular loop technique's computational complexity increases with longer sequences, potentially impacting its efficiency in large-scale alignments.

Regarding accuracy, both techniques strive to produce precise alignments. The mathematical programmed base technique's mathematical model enhances its accuracy by addressing missing nucleotides and minimizing errors. The circular loop technique excels in connecting sequence ends, enabling accurate alignment and gap filling. However, both techniques may be sensitive to the initial alignment positions, requiring careful consideration and adjustment.



Methods


Sample Collection and DNA Extraction

Sample collection is a critical initial step in DNA sequencing. To ensure accurate representation of the genetic material, samples are collected from the ancestral cell of a tumor during its preneoplastic or proliferation stage, and a trained AI model is used to identify the appropriate cells for analysis. The collection process involves obtaining tissue or liquid samples, such as blood or saliva, from patients.

Following sample collection, DNA extraction is performed to isolate the genetic material from the collected samples. The extracted DNA serves as the input for subsequent sequencing steps. Various extraction techniques, such as phenol-chloroform extraction or column-based purification methods, can be employed depending on the sample type and the desired level of purity. It is crucial to ensure high-quality DNA extraction to obtain accurate sequencing results.

Statistical analysis is employed to assess the quality and integrity of the extracted DNA. Metrics such as DNA concentration, purity (measured by the A260/A280 ratio), and fragment size distribution are evaluated to determine the suitability of the extracted DNA for sequencing. These statistics provide important insights into the quality of the extracted DNA and help in assessing the reliability of subsequent sequencing results.
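The concentration and purity metrics can be computed from UV absorbance readings. The sketch below uses the conventional conversion for double-stranded DNA (an A260 of 1.0 in a 1 cm cuvette corresponds to about 50 µg/mL) and the commonly cited A260/A280 value of about 1.8 for pure DNA; the 1.7 to 2.0 acceptance window and the function names are illustrative assumptions, not thresholds taken from this study.

```python
# Conventional UV-spectrophotometry conversion for double-stranded DNA
# (1 cm path length): an A260 of 1.0 corresponds to about 50 ug/mL.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration; ug/mL and ng/uL are numerically equal."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def a260_a280_ratio(a260: float, a280: float) -> float:
    """Purity ratio; protein contamination pulls it below ~1.8."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def passes_purity_check(ratio: float, low: float = 1.7, high: float = 2.0) -> bool:
    """Illustrative acceptance window around ~1.8 (assumed, not from the study)."""
    return low <= ratio <= high


ratio = a260_a280_ratio(0.9, 0.5)
print(dsdna_concentration_ng_per_ul(0.5, dilution_factor=10))  # 250.0
print(round(ratio, 2), passes_purity_check(ratio))             # 1.8 True
```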


Next-Generation Sequencing (NGS) and Data Analysis

Next-generation sequencing (NGS) technologies, such as Ion Torrent sequencing, are widely used for DNA sequencing due to their high throughput capabilities and cost-effectiveness. In this step, the extracted DNA is subjected to NGS, resulting in the generation of massive amounts of raw sequencing data. The generated data provides a comprehensive view of the DNA sequence and serves as the basis for subsequent analysis.

Data analysis plays a crucial role in extracting meaningful information from the raw sequencing data. In the context of the presented techniques, the mathematical programmed base technique and the circular loop technique, data analysis involves processing the sequencing data based on specific mathematical models and algorithms. These techniques aim to improve sequence alignment accuracy, identify genetic variations, and provide insights into the structural and functional characteristics of the DNA sequence.
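One possible way to prepare raw reads for these techniques is to mark low-confidence base calls as missing nucleotides, so that the gap character "_" used by both algorithms stands for an uncertain position. This is a hypothetical pre-processing step, not part of the described system: it assumes reads arrive as FASTQ-style base and quality strings with standard Phred+33 quality encoding, and the Q20 threshold (a 1% estimated error probability, since Q = -10 log10 P) and function names are illustrative choices.

```python
GAP = "_"  # gap symbol used for missing nucleotides


def phred33_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string: score = ASCII code - 33."""
    return [ord(ch) - 33 for ch in quality]


def mask_low_quality(bases: str, quality: str, min_q: int = 20) -> str:
    """Replace base calls scoring below min_q with the gap symbol."""
    if len(bases) != len(quality):
        raise ValueError("bases and quality strings must have the same length")
    return "".join(
        base if score >= min_q else GAP
        for base, score in zip(bases, phred33_scores(quality))
    )


# 'I' encodes Q40 and '#' encodes Q2, so only the fifth call is masked.
print(mask_low_quality("ATGCTAGC", "IIII#III"))  # ATGC_AGC
```

The masked read has the same form as sequence_1 in the mathematical programmed base example, with the uncertain call left as a gap to be filled.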



Conclusion


In conclusion, the DNA sequencing techniques presented in this article show how better handling of genetic information can drive progress in gene editing systems. The mathematical programmed base technique and the circular loop technique offer new ways to improve the accuracy and efficiency of DNA sequencing, which is the foundation for targeted genetic modifications. By addressing key limitations of existing sequencing methods, these techniques pave the way for more reliable and precise gene editing applications. As research in this field continues to evolve, the insights gained are likely to lead to significant developments in genetic studies and personalized medicine, with potential impact on our understanding and treatment of a wide range of health conditions. Overall, DNA sequencing remains central to unlocking the full potential of gene editing technologies.


