Algorithms for genome sequence analysis

Project duration: 2014 - 2017

Funding: Croatian Science Foundation   

Key collaborator: Niranjan Nagarajan (A*STAR GIS, Singapore)

The overarching goal of the project is to develop accurate, fast algorithms and tools for analyzing whole-genome sequencing data. The emphasis is on output from third-generation sequencing machines, which produce longer but more error-prone reads. The project builds on sequence alignment algorithms, graph algorithms and signal processing methods. These will be applied to DNA sequence assembly, mapping of RNA sequencing data, and sequence similarity database search. The algorithms should be able to handle data obtained from mammalian and plant genomes (sizes greater than 10^9 base pairs). Special emphasis will be put on multi-core, many-core (GPU, graphics processing unit) and intra-core (Intel's SSE, Streaming SIMD Extensions, and AVX, Advanced Vector Extensions) parallelism. Additionally, the algorithms should scale well with the available underlying computational architecture. All algorithms will be implemented in C/C++. The research will result in novel algorithms tailored to the characteristics of current and future long-read sequencing data. The implemented methods will advance the state of the art in sequence similarity database search, RNA-seq mapping and DNA assembly, ideally providing researchers with methods that return results in feasible time with limited computational resources. This, in turn, could affect current practices in genomic research, help design new medical strategies, and enable faster and more accurate analyses and diagnoses.

 

Project team

Project members:

  • Assoc. Prof. Mile Šikić - head
  • Prof. Branko Jeren - associate
  • Prof. Damir Seršić - associate
  • Asst. Prof. Ana Sović-Kržić - associate
  • Dr. Niranjan Nagarajan (A*STAR GIS, Singapore) - associate
  • Dr. Krešimir Križanović - postdoc
  • Robert Vaser - PhD student

Collaboration with other institutions:

  • Ph.D. Ivan Sović (Ruđer Bošković Institute)
  • Ph.D. Pauline C Ng (A*STAR GIS, Singapore)
  • Prof. Christophe Dessimoz (University of Lausanne)
  • Prof. Marc Robinson-Rechavi (University of Lausanne)
  • Ph.D. Julien Roux (Department of Biomedicine, University Hospital Basel)
  • Amina Echchiki (Swiss Institute of Bioinformatics, Lausanne)
  • Associate prof. Petra Korać (University of Zagreb, Faculty of Science, Department of Biology)
  • Prof. Karin Kovačević Ganić (University of Zagreb, Faculty of Food Technology and Biotechnology)
  • Associate prof. Snježana Židovec Lepej (University Hospital for Infectious Diseases "Dr. Fran Mihaljević")
  • Martin Šošić - student

 

Publications

Papers in scientific journals:

Presentations at scientific conferences:

  • Krešimir Križanović, Ivan Sović, Ivan Krpelnik, Mile Šikić; RNA Transcriptome Mapping with GraphMap; 13th International Symposium on Bioinformatics Research and Applications, ISBRA 2017
  • Robert Vaser, Mile Šikić; Ra - Rapid de novo genome assembler, poster; ISMB/ECCB 2017
  • Robert Vaser, Mile Šikić; Rala - Rapid layout module for de novo genome assembly, poster; ISMB/ECCB 2017
  • Neven Miculinić, Marko Ratković, Mile Šikić; MinCall — MinION end2end convolutional deep learning basecaller; ECML-PKDD 2017, Skopje, Macedonia
  • Tomislav Šebrek, Jan Tomljanović, Josip Krapac, Mile Šikić; Read classification using semi-supervised deep learning; ECML-PKDD 2017, Skopje, Macedonia
  • Jan Tomljanović, Tomislav Šebrek, Mile Šikić; Unsupervised learning of sequencing read types; ICCBB 2017, Newark, USA
  • Krešimir Križanović, Mladen Marinović, Ana Bulović, Robert Vaser, Mile Šikić; TGTP-DB – a database for extracting genome, transcriptome and proteome data using taxonomy; MIPRO 2016, DC VIS
  • Robert Vaser, Dario Pavlović, Mile Šikić; SWORD—a highly efficient protein database search; ECCB 2016: The 15th European Conference on Computational Biology
  • Andrej Novak, Krešimir Križanović, Alen Lančić, Mile Šikić; Some new results on assessment of Q-gram filter efficiency; 9th International Symposium on Image and Signal Processing and Analysis (ISPA) 2015

Doctoral dissertations:

  • Ivan Sović, Algoritmi za de novo sastavljanje genoma iz sekvenciranih podataka treće generacije [Algorithms for de novo genome assembly from third-generation sequencing data] - doctoral dissertation, 2016 (pdf)

Graduate and undergraduate theses:

  • Marko Ratković, Model dubokog učenja za određivanje očitanih baza dobivenih uređajem za sekvenciranje MinION [Deep learning model for base calling of reads obtained with the MinION sequencing device] - master's thesis, 2017 (pdf)
  • Jan Tomljanović, Identifikacija tipova 1D-signala pomoću nenadziranog dubokog učenja [Identification of 1D signal types using unsupervised deep learning] - master's thesis, 2017 (pdf)
  • Tomislav Šebrek, Identifikacija tipova 1D-signala pomoću polu-nadziranog dubokog učenja [Identification of 1D signal types using semi-supervised deep learning] - master's thesis, 2017 (pdf)
  • Antonio Jurić, Poravnanje dugačkih RNA očitanja [Alignment of long RNA reads] - bachelor's thesis, 2016 (pdf)
  • Ivan Krpelnik, Poravnanje RNA očitanja na poznate gene [Alignment of RNA reads to known genes] - bachelor's thesis, 2016 (pdf)
  • Luka Škugor, Stablo Bloomovih filtara za spremanje sljedova [Bloom filter tree for storing sequences] - bachelor's thesis, 2016 (pdf)
  • Mario Kostelac, De novo Assembly Using Long Error-prone Reads - master's thesis, 2016 (pdf)
  • Luka Šterbić, EAGLER - Eliminating Assembly Gaps by Long Extending Reads - master's thesis, 2015 (pdf)
  • Robert Vaser, De novo transcriptome assembly - master's thesis, 2015 (pdf)
  • Josip Marić, Long Read RNA-seq Mapper - master's thesis, 2015 (pdf)
  • Marko Čulinović, Scaffolding using long error-prone reads - master's thesis, 2015 (pdf)
  • Martin Šošić, An SIMD dynamic programming C/C++ Library - master's thesis, 2015 (pdf)
  • Dorija Humski, A reduced gene database for precision species detection - master's thesis, 2015 (pdf)
  • Dario Pavlović, Splice isoform identification from transcript graphs - master's thesis, 2015 (pdf)