Microbial whole genome analysis

Getting Started…

Sequencing microbial genomes has enhanced our understanding of the diverse functions carried out by microorganisms. Researchers and clinical microbiologists can now use our pre-built bioinformatics pipeline to explore DNA sequencing data in depth.

This analysis uses DNA sequencing data generated on Illumina platforms.

INPUT

  • Illumina FASTQ.
  • Paired-end FASTQ files: one Read 1 (R1) file and one Read 2 (R2) file per sample, so each paired-end sample requires uploading 2 files (R1, R2).
  • FASTQ files compressed as *.zip archives.
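Before uploading, it can help to check that every sample really has both files. The sketch below pairs R1 and R2 files by name. The naming convention (`<sample>_R1` / `<sample>_R2`, with an optional numeric suffix) and the function name are assumptions made for illustration, not requirements of the platform.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_R1[_NNN].fastq[.gz] and <sample>_R2[_NNN].fastq[.gz]
_READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_(?P<read>R[12])(?:_\d+)?\.(?:fastq|fq)(?:\.gz)?$"
)


def pair_fastq_files(filenames):
    """Group FASTQ file names by sample and require exactly one R1 and one R2."""
    samples = defaultdict(dict)
    for name in filenames:
        match = _READ_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unrecognized FASTQ file name: {name}")
        sample, read = match.group("sample"), match.group("read")
        if read in samples[sample]:
            raise ValueError(f"Duplicate {read} file for sample {sample}")
        samples[sample][read] = name

    pairs = {}
    for sample, reads in samples.items():
        missing = {"R1", "R2"} - set(reads)
        if missing:
            raise ValueError(f"Sample {sample} is missing {sorted(missing)}")
        pairs[sample] = (reads["R1"], reads["R2"])
    return pairs
```

Calling `pair_fastq_files` on the list of files you intend to upload raises a `ValueError` as soon as a sample lacks its R1 or R2 file.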

OUTPUT: Full Access To Analyzed Data

We present our report in HTML format, making it simple to understand and act on the insights provided. The report covers a broad range of topics to quickly assess:

  • data quality
  • taxonomic classification
  • assembly statistics
  • annotation
  • plasmid detection
  • pangenome analysis
  • antibiotic resistance
  • virulence factors

You can also download the output files generated by each bioinformatics tool, as well as all data tables from the report in CSV format, compatible with Excel or Google Sheets.

How it works

Choose the sample count and then click on ‘Launch Analysis’. You will receive an email containing simple instructions on how to upload your data files. You don’t need to install any software.

After you upload your DNA sequencing data files, our cloud-based platform will process them and notify you via email when the results are ready.

Processing time varies with the number of sample files and takes several hours, not days.


Bioinformatics Pipeline

  • Quality Control - FASTQC

  • Trim sequence data & Quality Filtering - FASTP

  • Taxonomic Classification - KRAKEN2 + BRACKEN

  • Assembly - SPADES + PILON

  • Assembly quality evaluation - CHECKM + QUAST

  • Annotation - BAKTA

  • Pangenome Analysis - ROARY

  • Plasmid Identification - PLATON

  • Antibiotic resistance - ABRICATE (CARD + ARG-ANNOT)

  • Virulence factors - ABRICATE (VFDB)

  • Macromolecular system detection - MACSYFINDER

Analysis in detail…

This analysis is for RESEARCH USE ONLY. This pipeline is designed for researchers, bioinformatics experts and genomics professionals and is not intended to diagnose, treat, cure, or prevent any disease.

Genomix Cloud (ref1) implements this microbial whole genome analysis as a pipeline (workflow) built from popular third-party bioinformatics tools.

The pipeline processes raw DNA sequencing data (FASTQ) and generates interpretations in the form of user-friendly reports.

The bioinformatics tools used in the workflow are third-party and their credits belong to their creators. The tools are distributed as open-source software under GNU General Public License version 2+ or MIT License.

We perform the following workflow:

We use the FASTQC tool (ref2) to analyze the read quality of each sample.

Afterwards, we use the FASTP tool (ref3) to preprocess the reads. This tool performs quality control, trims adapters, and filters out low-quality reads.
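As a rough sketch of what a FASTP call for one paired-end sample can look like, the Python helper below only builds the argument list. The quality and length thresholds, the output file names, and the function name are illustrative assumptions, not the parameters used by this pipeline.

```python
def build_fastp_command(r1, r2, out_prefix, threads=4):
    """Return a fastp argument list for one paired-end sample."""
    return [
        "fastp",
        "-i", r1,                                   # Read 1 input
        "-I", r2,                                   # Read 2 input
        "-o", f"{out_prefix}_R1.trimmed.fastq.gz",  # Read 1 output
        "-O", f"{out_prefix}_R2.trimmed.fastq.gz",  # Read 2 output
        "--detect_adapter_for_pe",                  # adapter detection for paired-end data
        "--qualified_quality_phred", "20",          # assumed base-quality threshold
        "--length_required", "50",                  # assumed minimum read length
        "--json", f"{out_prefix}.fastp.json",
        "--html", f"{out_prefix}.fastp.html",
        "--thread", str(threads),
    ]
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where fastp is installed.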

Taxonomic classification uses the KRAKEN2 tool (ref4), which assigns reads to taxa using exact k-mer matches, and the BRACKEN tool (ref5), which estimates species abundance from the KRAKEN2 results.
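To give an intuition for classification by exact k-mer matches, here is a toy Python illustration. It is a deliberately simplified sketch and not KRAKEN2's actual algorithm or data structures. The reference sequences, labels and function names are invented for the example.

```python
from collections import Counter


def kmers(sequence, k):
    """Return all overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_index(references, k):
    """Map each k-mer to the set of reference labels that contain it."""
    index = {}
    for label, sequence in references.items():
        for kmer in kmers(sequence, k):
            index.setdefault(kmer, set()).add(label)
    return index


def classify_read(read, index, k):
    """Return the label with the most exact k-mer hits, or None if nothing matches."""
    votes = Counter()
    for kmer in kmers(read, k):
        labels = index.get(kmer, ())
        if len(labels) == 1:  # count only k-mers found in a single reference
            votes[next(iter(labels))] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Invented toy references, for illustration only
REFERENCES = {"taxon_A": "ACGTACGTGGA", "taxon_B": "TTGACCATGCA"}
INDEX = build_index(REFERENCES, k=4)
```

With this index, `classify_read("GTACGTG", INDEX, 4)` votes for `taxon_A`, while a read sharing no 4-mer with either reference stays unclassified.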

To assemble the reads (FASTQ files), we use the SPADES tool (ref6). Afterwards, we use the PILON tool (ref7) to correct the draft assemblies and call sequence variants of various sizes.
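The assembly and polishing steps could be sketched as the two command lines built below. This is a minimal sketch under stated assumptions: the directory layout, the `pilon` launcher name (some installations use `java -jar pilon.jar` instead), the output prefix and the function name are hypothetical, and the pipeline's actual parameters may differ.

```python
def build_assembly_commands(r1, r2, workdir, threads=8):
    """Return (spades_command, pilon_command) argument lists for one sample."""
    spades_dir = f"{workdir}/spades"
    # SPAdes writes its assembled contigs to contigs.fasta inside the output directory
    draft = f"{spades_dir}/contigs.fasta"
    spades = ["spades.py", "-1", r1, "-2", r2, "-o", spades_dir, "-t", str(threads)]

    # Pilon needs the reads aligned back to the draft as a sorted, indexed BAM file
    # (for example with bwa mem and samtools); that alignment step is not shown here.
    bam = f"{workdir}/reads_vs_draft.sorted.bam"  # hypothetical file name
    pilon = [
        "pilon",
        "--genome", draft,
        "--frags", bam,
        "--output", "polished",
        "--outdir", f"{workdir}/pilon",
        "--changes",  # also write a file listing the corrections made
    ]
    return spades, pilon
```

Building the commands as lists keeps file names with spaces safe when they are passed to `subprocess.run`.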

Once the assembly is complete, we evaluate it with the CHECKM tool (ref8) and the QUAST tool (ref9). The assembly is then annotated using the BAKTA tool (ref10).
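One of the assembly statistics commonly reported by QUAST is N50: the contig length such that contigs of that length or longer cover at least half of the assembly. The short Python function below shows how this number is defined. It is an illustration of the metric, not code taken from QUAST, and the function name is our own.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("At least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the assembly is the N50
        if running >= half_total:
            return length
```

For contigs of 100, 80, 50, 30 and 20 bases (280 in total), the two longest contigs already cover 180 bases, so the N50 is 80.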

To analyze the pangenome, we use the ROARY tool (ref11), and its alignments are used to infer a phylogeny with the FASTTREE tool (ref12).

To identify plasmids, we use the PLATON tool (ref13).

Additionally, we use the ABRICATE tool (ref14) along with the CARD, ARG-ANNOT, and VFDB databases to screen assemblies for antimicrobial resistance and virulence genes.

To detect macromolecular systems, we use the MACSYFINDER tool (ref15).
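ABRICATE writes its hits as a tab-separated table, so the downloaded output can be filtered with a few lines of Python. The sketch below assumes the `GENE`, `%COVERAGE`, `%IDENTITY` and `DATABASE` column names found in ABRICATE output. The sample rows use a shortened set of columns, the gene names are invented, and the thresholds are arbitrary examples rather than the pipeline's settings.

```python
import csv
import io

# Invented example rows with a shortened set of columns, for illustration only
SAMPLE_TSV = (
    "#FILE\tSEQUENCE\tGENE\t%COVERAGE\t%IDENTITY\tDATABASE\n"
    "asm.fasta\tcontig_1\tgeneA\t100.00\t99.50\tcard\n"
    "asm.fasta\tcontig_2\tgeneB\t60.00\t95.00\tvfdb\n"
    "asm.fasta\tcontig_3\tgeneC\t98.00\t85.00\targannot\n"
)


def filter_hits(tsv_text, min_identity=90.0, min_coverage=80.0):
    """Return (gene, database) pairs that pass both percentage thresholds."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        identity = float(row["%IDENTITY"])
        coverage = float(row["%COVERAGE"])
        if identity >= min_identity and coverage >= min_coverage:
            hits.append((row["GENE"], row["DATABASE"]))
    return hits
```

With the default thresholds only `geneA` is kept: `geneB` fails on coverage and `geneC` fails on identity.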

Genomix Cloud utilizes the Plotly Open Source Graphing Library for Python (ref16) to create and produce visualizations and HTML reports.

Check out the parameters used in each tool and the citations by clicking on the sections below.

Here are the key parameters used in the calls to third-party bioinformatics tools.

We’d love for you to cite us.

  1. Genomix Cloud: Unique online suite of genome analysis where scientists launch analysis with only a few clicks and receive user-friendly reports that can be easily interpreted and shared. https://www.genomixcloud.com.
  2. Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
  3. Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu; fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 1 September 2018, Pages i884–i890, https://doi.org/10.1093/bioinformatics/bty560
  4. Wood, D. E. Kraken 2 GitHub repository. at https://github.com/DerrickWood/kraken2
  5. Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L. Bracken: estimating species abundance in metagenomics data. https://github.com/jenniferlu717/Bracken/
  6. Using SPAdes De Novo Assembler. Andrey Prjibelski, Dmitry Antipov, Dmitry Meleshko, Alla Lapidus, Anton Korobeynikov. https://doi.org/10.1002/cpbi.102
  7. Bruce J. Walker, Thomas Abeel, Terrance Shea, Margaret Priest, Amr Abouelliel, Sharadha Sakthikumar, Christina A. Cuomo, Qiandong Zeng, Jennifer Wortman, Sarah K. Young, Ashlee M. Earl (2014) Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE 9(11): e112963. https://doi.org/10.1371/journal.pone.0112963
  8. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25: 1043-1055.
  9. Alexey Gurevich, Vladislav Saveliev, Nikolay Vyahhi and Glenn Tesler, QUAST: a quality assessment tool for genome assemblies, Bioinformatics (2013) 29 (8): 1072-1075. https://doi.org/10.1093/bioinformatics/btt086 First published online: February 19, 2013
  10. Schwengers O., Jelonek L., Dieckmann M. A., Beyvers S., Blom J., Goesmann A. (2021). Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microbial Genomics, 7(11). https://doi.org/10.1099/mgen.0.000685
  11. Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill, “Roary: Rapid large-scale prokaryote pan genome analysis”, Bioinformatics (2015). doi: http://dx.doi.org/10.1093/bioinformatics/btv421
  12. Price, M.N., Dehal, P.S., and Arkin, A.P. (2009) FastTree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix. Molecular Biology and Evolution 26:1641-1650, doi:10.1093/molbev/msp077.
  13. Schwengers O., Barth P., Falgenhauer L., Hain T., Chakraborty T., & Goesmann A. (2020). Platon: identification and characterization of bacterial plasmid contigs in short-read draft assemblies exploiting protein sequence-based replicon distribution scores. Microbial Genomics, 95, 295. https://doi.org/10.1099/mgen.0.000398
  14. Seemann T, Abricate, Github https://github.com/tseemann/abricate; CARD – doi:10.1093/nar/gkw1004; ARG-ANNOT – doi:10.1128/AAC.01310-13; VFDB – doi:10.1093/nar/gkv1239
  15. Abby SS, Néron B, Ménager H, Touchon M, Rocha EPC (2014) MacSyFinder: A Program to Mine Genomes for Molecular Systems with an Application to CRISPR-Cas Systems. PLoS ONE 9(10): e110726. https://doi.org/10.1371/journal.pone.0110726
  16. Plotly, Open Source Graphing Library for Python version 5.13.0, https://plotly.com/python/

We are here to help

If you need assistance launching the analysis, do not hesitate to contact us!