Validation and submission are no longer available

The EBI BioSamples and OLS services changed their APIs, and InjectTool is now unable to validate and submit new data to BioSamples. More information is available in GitHub issues #119 and #120.

About IMAGE-InjectTool

Introduction

One key challenge that IMAGE addresses is the integration and transparent use of the vast information stored within more than 60 gene banks/genetic collections spanning 20 European countries, together with the collection of newly generated data. Information to be collected and integrated through the IMAGE Data Portal is currently stored in local databases at individual gene banks, research institutions, breeding companies etc., which use a wide range of systems, from simple Excel files to large, well-structured national databases.

Cryoweb is a web-based software for the documentation and storage of gene bank data. Currently, the majority of national gene banks use Cryoweb to manage their information. In France, the CRB-Anim database collects and stores information on reproductive and genomic material of domestic animals. Other countries have developed their own relational databases to manage their collections. Using InjectTool, data derived from these different sources can be standardized, validated against IMAGE metadata standards and finally submitted to the EBI BioSamples archive, which has been chosen as the sample reference archive for all IMAGE data.

From the user's point of view, the IMAGE InjectTool makes it easier to archive data in BioSamples.

The benefits of using InjectTool

The benefits of depositing data in BioSamples include clear data organization, with each record assigned a unique, globally recognized identifier; consistent sample descriptions for data records across different sequencing archives; improved reproducibility; conformance to FAIR data standards; and synchronization with sister databases at the National Center for Biotechnology Information (NCBI) and the DNA Data Bank of Japan (DDBJ).

BioSamples is widely used as the sample reference for other molecular archives, for example ENA (European Nucleotide Archive, which covers raw sequencing data, sequence assembly information and functional annotation), EVA (European Variation Archive, which stores all types of genetic variation data) and ArrayExpress, which stores data from high-throughput functional genomics experiments. These core data archives are important for the deposition of the genomic data generated during the IMAGE project. An example can be found in the Guide for submitting genotype data for inclusion in IMAGE.

Methods

InjectTool has been developed to help gene bank managers enhance, standardize, tag and submit their gene bank data to the BioSamples archive, which integrates gene bank records from across Europe. InjectTool uses a well-defined metadata ruleset to ensure high-quality, comparable data across diverse collections that originate in different storage formats and languages.

Users can import data into InjectTool from Cryoweb or CRB-Anim dump files, or from the template spreadsheet file, and the system will validate the data and submit it to the BioSamples archive. User interaction is required only when data fails validation against the IMAGE metadata standards and needs manual curation. Error messages are reported to the user within the user interface or by email, and the system guides the user through the whole submission process, providing different functionalities according to the submission stage.
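As a rough illustration of the "validate, then curate only what fails" workflow described above, the following minimal Python sketch checks records against a small ruleset and separates those ready for submission from those needing manual curation. The field names, the allowed values and the functions `validate_record` and `split_by_validity` are all hypothetical: they are not the real IMAGE metadata rules and not InjectTool's actual code.

```python
from typing import Dict, List, Tuple

# Hypothetical ruleset: field name -> (is mandatory, allowed values or None).
# Illustrative only; these are NOT the real IMAGE metadata rules.
RULES = {
    "species": (True, None),
    "sex": (True, {"male", "female", "unknown"}),
    "breed": (False, None),
}


def validate_record(record: Dict[str, str]) -> List[str]:
    """Return human-readable error messages (an empty list means valid)."""
    errors = []
    for name, (mandatory, allowed) in RULES.items():
        value = record.get(name)
        if value in (None, ""):
            # Missing values are only an error for mandatory fields
            if mandatory:
                errors.append(f"{name}: mandatory field is missing")
            continue
        if allowed is not None and value not in allowed:
            errors.append(f"{name}: '{value}' is not an allowed value")
    return errors


def split_by_validity(
    records: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[Tuple[Dict[str, str], List[str]]]]:
    """Separate records ready for submission from those needing curation."""
    ready, needs_curation = [], []
    for record in records:
        errors = validate_record(record)
        if errors:
            needs_curation.append((record, errors))
        else:
            ready.append(record)
    return ready, needs_curation
```

In this sketch only the records returned in the second list would require user attention, each paired with the messages that a user interface or an email could display.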

InjectTool can provide ontologies for Countries, Species, Breeds, and Physiological and developmental stages, where possible. When unannotated terms are found during data import, annotation tasks start in the background to look up the most suitable ontology term using EBI Zooma Tools. InjectTool can also provide translation tables for Species terms, so that the correct species ontology term can be referenced by providing the common name in the user's own language during data loading. In this way, you can use the term Pig while defining your species, and the system will translate it into Sus scrofa. A set of English words is defined for the most common species; however, when InjectTool finds a Species value it has not encountered before, it stops the data import process and asks the user to manually enter an appropriate ontology term. After that, all occurrences of that word are translated to the user-provided species. If you have other terms that require translation tables, please contact us.
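The species translation behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not InjectTool's implementation: the class `SpeciesTranslator`, the exception `UnknownSpeciesError`, the function `import_species`, the contents of the default table and the case-insensitive matching are all hypothetical.

```python
class UnknownSpeciesError(Exception):
    """Raised when a species term has no translation yet."""


# Hypothetical default table of English common names (assumption).
DEFAULT_TRANSLATIONS = {
    "pig": "Sus scrofa",
    "cattle": "Bos taurus",
    "sheep": "Ovis aries",
}


class SpeciesTranslator:
    def __init__(self, defaults=None):
        # Copy the table so user additions do not alter the defaults
        source = DEFAULT_TRANSLATIONS if defaults is None else defaults
        self._table = dict(source)

    @staticmethod
    def _key(term):
        # Case-insensitive matching is an assumption of this sketch
        return term.strip().lower()

    def translate(self, term):
        """Return the scientific name for a common name, or raise."""
        try:
            return self._table[self._key(term)]
        except KeyError:
            raise UnknownSpeciesError(term) from None

    def add_translation(self, term, scientific_name):
        """Record a user-provided translation for later occurrences."""
        self._table[self._key(term)] = scientific_name


def import_species(terms, translator):
    """Translate every term; stops at the first unknown one by raising."""
    return [translator.translate(term) for term in terms]
```

A possible usage: importing `["Pig", "Maiale"]` raises `UnknownSpeciesError` for the unseen term; after `translator.add_translation("Maiale", "Sus scrofa")` the same import succeeds, and every later occurrence of that word is translated the same way.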

The system is built on a set of customized Docker images managed with docker-compose: the user interacts with a web application powered by Nginx, while in the backend a PostgreSQL instance stores data and a Django image renders web pages using Bootstrap 4 to display, validate, curate and submit data. Celery and Redis are used to run time-consuming and monitoring tasks in the background, such as data validation, data submission to the BioSamples archive and monitoring of BioSamples submission statuses. The project is mainly written in Python 3 using a test-driven approach. A live version of the system is available at https://inject.image2020genebank.eu/, while the code is available on GitHub. BioSamples submission is done through the Data Submission Portal using the pyUSIrest Python package.

Contributing and Reporting Bugs

We welcome any feedback, comments or suggestions that can make InjectTool better; these can be submitted by opening a new issue. InjectTool is licensed under the GNU General Public License v3.0. Detailed instructions on how to build and deploy a local InjectTool instance are available in the InjectTool GitHub repository, while the InjectTool developers guide is available here.