Single-genome and metagenome de novo assembly

Project duration: 2018 - 2022

Funding: Croatian Science Foundation

Key collaborator: Niranjan Nagarajan (A*STAR GIS, Singapore)

The first modern software for DNA assembly was developed by Celera to generate the draft of the human genome in 2001. Since then, many methods have attempted to assemble genomes correctly, but a high-quality assembly still requires laborious work by large groups of scientists and many years of data curation. The biggest obstacle to achieving high accuracy and contiguity in genome assemblies has been long stretches of highly repetitive sequence. The recent advent of a new generation of sequencing technologies, such as those from Pacific Biosciences and Oxford Nanopore Technologies, gives hope that automated complete genome reconstruction is feasible. These technologies produce long but error-prone reads whose length can exceed hundreds of thousands of nucleotides, which should be long enough to span most repetitive regions. Nevertheless, researchers still struggle to completely assemble large genomes (i.e. animal and plant genomes) and the genomes of microbial communities.

Assembly methods usually take a graph-based approach: they build a graph by joining overlapping reads and then use heuristics to find a path that visits each read once. However, this is often infeasible because of tangles in the graph, which arise from incorrect read overlaps and repetitive regions. The problem is particularly critical for large genomes with many chromosomes and for metagenomic samples containing anywhere from ten to several hundred genomes.

The primary aim of this project is to develop methods that will produce (i) complete assemblies of large genomes and (ii) accurate metagenomic assemblies. To achieve this aim, we will develop several graph-based and machine learning methods for detecting incorrect overlaps.
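The graph-based approach described above can be illustrated with a toy sketch. It is not the method developed in this project: it assumes error-free reads, exact suffix-prefix overlaps, and a simple greedy traversal, and the names `overlap_length`, `build_overlap_graph`, and `greedy_layout` are hypothetical helpers introduced only for illustration.

```python
from itertools import permutations


def overlap_length(a: str, b: str, min_overlap: int) -> int:
    """Return the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no exact overlap of at least `min_overlap` bases exists.
    Real long reads are error-prone and need approximate matching instead.
    """
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def build_overlap_graph(reads, min_overlap):
    """Build a directed graph: an edge i -> j means read i overlaps read j."""
    graph = {i: {} for i in range(len(reads))}
    for i, j in permutations(range(len(reads)), 2):
        length = overlap_length(reads[i], reads[j], min_overlap)
        if length:
            graph[i][j] = length
    return graph


def greedy_layout(reads, graph):
    """Walk the graph greedily, always following the longest unused overlap.

    This is a heuristic: it stops at the first dead end, and repeats or
    spurious overlaps can send it down the wrong branch.
    """
    has_incoming = {j for edges in graph.values() for j in edges}
    starts = [i for i in graph if i not in has_incoming]
    current = starts[0] if starts else 0
    path, visited, contig = [current], {current}, reads[current]
    while True:
        candidates = {j: n for j, n in graph[current].items() if j not in visited}
        if not candidates:
            break
        nxt = max(candidates, key=candidates.get)
        contig += reads[nxt][candidates[nxt]:]  # append only the non-overlapping part
        path.append(nxt)
        visited.add(nxt)
        current = nxt
    return path, contig


# Toy example: four overlapping 8-base reads from a 20-base sequence, given out of order.
genome = "ATGCGTACGTTAGCCTAGGA"
reads = [genome[8:16], genome[0:8], genome[12:20], genome[4:12]]
graph = build_overlap_graph(reads, min_overlap=4)
path, contig = greedy_layout(reads, graph)
print(path, contig == genome)
```

In this toy case every read has a single successor, so the greedy walk recovers the original sequence. When a repeat is longer than the reads, a node gains several outgoing edges and the walk can no longer choose reliably, which is the kind of tangle the project aims to resolve by detecting incorrect overlaps.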

Project team

Project members:

  • Prof. dr. sc. Mile Šikić - head
  • Associate prof. dr. sc. Igor Mekterović
  • Assistant prof. dr. sc. Krešimir Križanović
  • Dr. sc. Niranjan Nagarajan (A*STAR GIS, Singapore)
  • Dr. sc. Nino Antulov-Fantulin (ETH Zurich)
  • Robert Vaser - PhD student

Collaboration with other institutions:

  • Prof. Jianjun Liu, Genome Institute of Singapore, A*STAR Singapore
  • Prof. Ken Wing Kin Sung, National University of Singapore
  • Dr. Hwee Kuan Lee, Bioinformatics Institute, A*STAR Singapore
  • Dr. Mike Vella, NVIDIA
  • Prof. Christophe Dessimoz (University of Lausanne)
  • Prof. Marc Robinson-Rechavi (University of Lausanne)
  • Associate prof. Petra Korać (University of Zagreb, Faculty of Science, Department of Biology)
  • Prof. Karin Kovačević Ganić (University of Zagreb, Faculty of Food Technology and Biotechnology)
  • Associate prof. Antonio Starčević (University of Zagreb, Faculty of Food Technology and Biotechnology)



Papers in scientific journals:

Presentations at scientific conferences:

  • Robert Vaser and Mile Šikić, Yet another de novo genome assembler, 2019, 11th International Symposium on Image and Signal Processing and Analysis (ISPA)
  • Sara Bakić, Luka Požega, Robert Vaser and Mile Šikić, Assessing sequencing data for genome assembly, 2019, 27th Conference on Intelligent Systems for Molecular Biology and the 18th European Conference on Computational Biology, poster
  • J. Marić and M. Šikić, Approaches to metagenomic classification and assembly, 2019, MIPRO, Biomedical Engineering, Opatija, IEEE
  • Lovro Vrček and Mile Šikić, Supervised learning approach to long read classification, 2019, Fourth International Workshop on Data Science, Abstract Book, Zagreb, Croatia, pp. 71-72, poster

Doctoral dissertations:

Graduate and undergraduate theses:

  • Floreani, F. Classification of 1D-Signal Types Using Deep Learning (2019)
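  • Lipovac, J. Ocjena alata za identifikaciju vrsta u metagenomskim uzorcima [Evaluation of tools for species identification in metagenomic samples] (2019)
  • Batić, D. Mapiranje slijeda na graf [Mapping a sequence to a graph] (2019)
  • Pongračić, K. Mapiranje dugačkih očitanja [Long-read mapping] (2019)
  • Pavlić, S. Mapiranje kratkih očitanja [Short-read mapping] (2019)
  • Penić, R. J. Izgradnja biblioteke za poravnavanje parova dugačkih RNA očitanja [Building a library for pairwise alignment of long RNA reads] (2019)
  • Kosier, S. Pronalaženje varijanti gena iz podataka dobivenih sekvenciranjem [Finding gene variants from sequencing data] (2019)
  • Relić, B. Klasifikacija očitanja koristeći metode dubokog učenja [Read classification using deep learning methods] (2019)
  • Bakić, S. De novo sastavljanje genoma vođeno referencom [Reference-guided de novo genome assembly] (2019)
  • Vrček, L. Poliranje DNA slijeda koristeći metode dubokog učenja [DNA sequence polishing using deep learning methods] (2019)
  • Požega, L. Gornja granica u sastavljanju genoma [The upper bound in genome assembly] (2019)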