Single-genome and metagenome de novo assembly

Project duration: 2018 - 2022

Funding: Croatian Science Foundation   

Key collaborator: Niranjan Nagarajan (A*STAR GIS, Singapore)

The first modern software for DNA assembly was developed by Celera to generate the draft of the human genome in 2001. Since then, many methods have attempted to assemble genomes correctly, but a high-quality assembly still requires laborious work by large groups of scientists and many years of data curation. The biggest obstacle to achieving high accuracy and contiguity in genome assemblies has been long stretches of highly repetitive sequence.

The recent advent of a new generation of sequencing technologies, such as those from Pacific Biosciences and Oxford Nanopore Technologies, gives us hope that automated complete genome reconstruction is feasible. These technologies produce long but error-prone reads that can exceed hundreds of thousands of nucleotides in length, which should be long enough to span most repetitive regions. Nevertheless, researchers still struggle to completely assemble large genomes (i.e. animal and plant genomes) and the genomes of microbial communities. Assembly methods usually take a graph-based approach: they first build a graph by joining overlapping reads and then use heuristics to find a path that visits each read once. However, this is often infeasible because of tangles in the graph, which arise from incorrect read overlaps and repetitive regions. The problem is particularly acute for large genomes with many chromosomes and for metagenomic samples, which can contain anywhere from ten to several hundred genomes.

The primary aim of this project is to develop methods that will produce (i) complete assemblies of large genomes and (ii) accurate metagenomic assemblies. To achieve this aim, we will develop several graph-based and machine learning methods for detecting incorrect overlaps.
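The graph-based approach described above can be illustrated with a toy example. The following is a minimal sketch, not the project's method: it assumes error-free reads and exact suffix-prefix matches, whereas real long reads are error-prone and need approximate overlap detection. The function names (`overlap_length`, `build_overlap_graph`, `greedy_path`, `spell`) and the `min_overlap` parameter are hypothetical and chosen only for illustration.

```python
def overlap_length(a, b, min_overlap):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no exact overlap of at least `min_overlap` exists.
    Assumption: exact matching only; real reads need approximate matching.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_overlap=3):
    """Build a directed graph: edge i -> j weighted by overlap length."""
    graph = {i: {} for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap_length(a, b, min_overlap)
                if k > 0:
                    graph[i][j] = k
    return graph


def greedy_path(graph, start):
    """Heuristic walk: always follow the longest overlap to an unvisited read."""
    path = [start]
    visited = {start}
    current = start
    while True:
        candidates = {j: k for j, k in graph[current].items() if j not in visited}
        if not candidates:
            break
        current = max(candidates, key=candidates.get)
        path.append(current)
        visited.add(current)
    return path


def spell(reads, graph, path):
    """Merge the reads along `path` into one sequence, skipping overlaps."""
    sequence = reads[path[0]]
    for prev, nxt in zip(path, path[1:]):
        sequence += reads[nxt][graph[prev][nxt]:]
    return sequence


if __name__ == "__main__":
    reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
    graph = build_overlap_graph(reads)
    path = greedy_path(graph, start=0)
    print(spell(reads, graph, path))
```

In this toy case each read has a single successor, so the greedy walk is unambiguous. A repeat longer than the reads, or a spurious overlap, would give a node several outgoing edges; that is the kind of tangle that makes path finding hard in practice.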

Project team

Project members:

  • Prof. dr. sc. Mile Šikić - head
  • Izv. prof. dr. sc. Igor Mekterović
  • Dr. sc. Niranjan Nagarajan (A*STAR GIS, Singapore)
  • Dr. sc. Nino Antulov-Fantulin (ETH Zurich)
  • Robert Vaser - PhD student

Collaboration with other institutions:

  • Ph.D. Ivan Sović
  • Ph.D. Pauline C Ng (A*STAR GIS, Singapore)
  • Prof. Christophe Dessimoz (University of Lausanne)
  • Prof. Marc Robinson-Rechavi (University of Lausanne)
  • Ph.D. Julien Roux (Department of Biomedicine, University Hospital Basel)
  • Amina Echchiki (Swiss Institute of Bioinformatics, Lausanne)
  • Associate prof. Petra Korać (University of Zagreb, Faculty of Science, Department of Biology)
  • Prof. Karin Kovačević Ganić (University of Zagreb, Faculty of Food Technology and Biotechnology)
  • Associate prof. Snježana Židovec Lepej (University Hospital for Infectious Diseases "Dr. Fran Mihaljević")
  • Martin Šošić - student



Papers in scientific journals:

Presentations at scientific conferences:

Doctoral dissertations:

Graduate and undergraduate theses: