Senior Data Engineer

Employer: Genmab
Location: Princeton, New Jersey
Salary: Highly competitive salary, incentives, and new hire equity grant; no-cost medical benefits
Closing date: Jun 30, 2022

Position: Bioinformatics Scientist
Category: Biotechnology
Hours: Full Time
Education: MS, BS

Data Engineering and Bioinformatics is part of an enterprise effort to enable data-driven decisions at Genmab. The partner-centric group is embedded with stakeholders across the entire pipeline value chain, from Discovery, Translational, and Clinical Innovation to Commercial and beyond. Our focus is to enable Data Science with data in a rapid, exploratory environment and to ensure data is democratized across the organization. We constantly push ourselves to innovate, improve our practices, and bring value to our stakeholders.

Your expertise

The successful candidate will contribute to the mission of the global data engineering function and be responsible for many aspects of data, including architecture, access, classification, standards, integration, pipelines, and visualization. Although the role involves a diverse set of data-related responsibilities, your expertise will center on pipeline and workflow management. You are an expert in tools such as Airflow: creating workflows, connecting systems, tracking data, implementing triggers, and configuring programmatic access to automate jobs, and you will be expected to share that expertise throughout the group. Your ultimate goal is to put data at stakeholders' fingertips and help science go faster. You will join an enthusiastic, agile, fast-paced, and exploratory global data engineering team.
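To make the workflow-management focus concrete, here is a minimal sketch of the kind of Airflow pipeline described above; the DAG id, schedule, and placeholder tasks are hypothetical, not actual Genmab pipelines.

    # Minimal Airflow DAG sketch: two dependent tasks on a daily schedule.
    # The dag_id, task ids, and callables are hypothetical placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        """Pull raw data from a source system (placeholder)."""

    def load(**context):
        """Load the extracted data into a target repository (placeholder)."""

    with DAG(
        dag_id="ingest_partner_data",      # hypothetical pipeline name
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",        # trigger once per day
        catchup=False,                     # do not backfill missed runs
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)

        extract_task >> load_task          # load runs only after extract succeeds

Sensors, triggers, and programmatic runs via Airflow's REST API all build on this same DAG structure.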


Responsibilities

  • Design, implement, and manage ETL data pipelines that ingest vast amounts of commercial and scientific data from public, internal, and partner sources into various repositories on a cloud platform (AWS); a rough ingestion sketch follows this list

  • Enhance end-to-end workflows with automation that rapidly accelerates data flow, using pipeline management tools such as Airflow

  • Implement and maintain databases for raw and processed commercial and scientific data

  • Innovate and advise on the latest technologies and standard methodologies in Data Engineering, and identify software solutions that can address hurdles in data enablement

  • Manage relationships and project coordination with external parties such as Contract Research Organizations (CROs) and vendor consultants/contractors

  • Define and contribute to data engineering practices for the group: providing expertise in your focus area, establishing templates and frameworks, determining the best usage of specific cloud services and tools, and working with vendors to provision cutting-edge tools and technologies

  • Collaborate with data science leads to determine the data enablement methods best suited to optimizing interpretation of the data, including creating presentations and leading tutorials on data usage as appropriate

  • Apply value-balanced approaches to the development of the data ecosystem and pipeline initiatives

  • Proactively communicate data ecosystem and pipeline value propositions to partnering scientific collaborators
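As a rough illustration of the first responsibility above, the sketch below pulls a public CSV, normalizes it with pandas, and lands it in an S3 raw zone; the source URL, bucket, and key layout are hypothetical placeholders, not Genmab systems.

    # Minimal ETL ingestion sketch: fetch a public CSV, tidy it with pandas,
    # and land it in an S3 "raw" zone as Parquet (requires pyarrow).
    import io
    from datetime import date

    import boto3
    import pandas as pd

    SOURCE_URL = "https://example.org/public/assay_results.csv"  # hypothetical source
    BUCKET = "my-data-lake"                                      # hypothetical bucket
    KEY = f"raw/assay_results/{date.today():%Y-%m-%d}.parquet"   # partition by load date

    def ingest() -> None:
        # Extract: pandas reads the CSV straight from the URL
        df = pd.read_csv(SOURCE_URL)

        # Transform: normalize column names so downstream SQL stays predictable
        df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

        # Load: serialize to Parquet in memory, then upload to S3
        buf = io.BytesIO()
        df.to_parquet(buf, index=False)
        boto3.client("s3").put_object(Bucket=BUCKET, Key=KEY, Body=buf.getvalue())

    if __name__ == "__main__":
        ingest()

In a production pipeline, each of these steps would typically be a separate, retryable task in an orchestrator such as Airflow rather than a single script.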


Requirements

  • BS/MS in Computer Science, Bioinformatics, or a related field with 8+ years of software engineering experience, or a PhD in Computer Science, Bioinformatics, or a related field with 5+ years of software engineering experience

  • Excellent skills and deep knowledge of ETL pipelines, automation, and workflow management tools such as Airflow, AWS Glue, Amazon Kinesis, and AWS Step Functions, as well as CI/CD, are a must

  • Excellent skills and deep knowledge of Python, Pythonic design, and object-oriented programming are a must, including common Python libraries such as pandas; experience with R is a plus (a short Python/SQL sketch follows this list)

  • Solid understanding of databases such as Postgres, Elasticsearch, Redshift, and Aurora, including distributed database design, SQL vs. NoSQL trade-offs, and database optimization

  • Solid understanding of AWS cloud computing services such as Lambda, ECS, Batch, and Elastic Load Balancing, and of other compute frameworks such as Spark, EMR, and Databricks

  • Proficiency with modern software development methodologies such as Agile, including source control, project management, and issue tracking with JIRA

  • Proficiency with container strategies using Docker, Fargate, and ECR

  • Proficiency with Linux and shell scripting

  • Experience working with GxP and non-GxP data is a plus
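As a small illustration of the Python and SQL fluency these requirements describe, the sketch below queries a Postgres table into pandas and computes an aggregate; the connection string, table, and column names are hypothetical placeholders.

    # Query Postgres into pandas and aggregate; DSN and schema are hypothetical.
    import pandas as pd
    import psycopg2  # Postgres driver; assumes the server and table exist

    DSN = "host=localhost dbname=analytics user=etl"  # hypothetical connection string

    QUERY = """
        SELECT study_id, sample_count, loaded_at
        FROM raw.sample_inventory          -- hypothetical table
        WHERE loaded_at >= %(since)s
    """

    with psycopg2.connect(DSN) as conn:
        df = pd.read_sql(QUERY, conn, params={"since": "2022-01-01"})

    # pandas groupby: total samples ingested per study, largest first
    totals = df.groupby("study_id")["sample_count"].sum().sort_values(ascending=False)
    print(totals.head())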

