Bioinformatics Data Manager
If you’re a data manager with a strong background in biology, this is an opportunity to be challenged in a fast-growing company while being recognized as an individual.
- Source, import and maintain high-quality data in ODM by performing data extract-transform-load (ETL) functions:
  - retrieving data regularly from large public repositories or ad hoc from individual high-profile projects,
  - creating, running, adapting and maintaining Python scripts for data transformation,
  - loading the transformed data into ODM by running existing data loading code.
- Collaborate with the data scientists by preparing “clean” data, allowing your colleagues to focus on tasks such as performing statistical analyses on the data.
- Maintain and update existing ontologies/vocabularies/dictionaries and auxiliary data (e.g. reference genome sequences) to ensure interoperability of the data.
- Develop and grow business relationships with our customers' data managers and/or engineers by providing expert advice on how to prepare “clean” data, so our customers can get the maximum value out of ODM.
- You are proficient in ingesting, persisting and exporting large volumes of biological omics data and metadata from disparate sources using common file transfer protocols (e.g. HTTP, FTP) and RESTful APIs.
- You have domain knowledge of omics data captured in common data formats (e.g. GFF, GCT, VCF, FASTA, FASTQ, BAM).
- You have used or constructed controlled vocabularies/ontologies and are familiar with the related file formats (e.g. OWL, OBO, SKOS).
- You are proficient in writing bespoke scripts in Python for wrangling and transforming data from a source into required formats.
- You understand how auxiliary/reference data (e.g. reference genome sequences) are used in the analysis and interpretation of omics data.
- You understand what a "data model" is and, using your omics domain knowledge, can make recommendations to software engineers/architects on its design.