IBM Watson Health - Data Scientist - Map, Extract, Transform, Load (METL) in Cleveland, Ohio

The IBM Watson Health business provides the healthcare industry with a protected, cloud-based analytics platform that harnesses big data for clinical integration, predictive analytics, and business intelligence. Its platform empowers the country's leading provider organizations to use their data more effectively to improve care quality and patient satisfaction and to deliver value-based care. Watson Health is working to enhance, scale, and accelerate health and wellness, and to facilitate collaboration across the community of care. The Explorys solution supports Population Health Management and Accountable Care models while applying the power of massively parallel data processing to help save lives and make healthcare more affordable.

Job Description

The Data Scientist on Map, Extract, Transform, Load (METL) is responsible for the data that is brought into the platform and for its quality. The candidate determines the critical data elements to extract from a variety of systems, including Electronic Health Records (EHR), claims and billing, flat files, and other databases. The candidate is also responsible for building the code to extract and load those data elements. The final step is to conduct Quality Assurance (QA) on the Extract, Transform, and Load (ETL) process and resolve any issues with the data that are identified internally or externally.

Essential Functions:

  • Identifying clinical, financial, and operational data elements within different systems (Electronic Medical Records (EMRs), billing, etc.)

  • Developing new methods of data extraction, abstraction, and mining to increase efficiency and reusability

  • Working on a day-to-day basis with Watson Health software engineering teams to ensure that our clinical, financial, and operational data are being accurately represented in our applications

  • Formulating validation strategies and methods to ensure accurate and reliable data

  • Supporting the rest of the Platform Services Team in understanding and processing the data in our system

  • Extracting data from traditional database architectures and flat files, transforming the extracted data using technologies such as Cloudera Impala, Apache Pig, Apache Hive, and Ruby, and loading the data into the Hadoop grid

The ideal candidate will possess:

  • Ability to work in a cross-functional environment

  • Self-motivation and the ability to multitask

  • 2 or more years of SQL experience

  • Familiarity with application architecture

  • Programming experience in at least one development environment (Python, Ruby, etc.)

  • Exposure to big data technologies (Pig, Hive, Impala, Hadoop Distributed File System (HDFS), HBase)

  • Experience working with EHR systems

  • Communication skills – effective interpersonal and customer service skills

  • Analytical skills – ability to compute descriptive statistics on data stored in a database

  • Minimum of 2 years of experience with SQL

  • Minimum of 1 year of experience with Linux/Shell

  • Minimum of 3 years of experience in a clinical and technical environment, or a degree in Computer Science, Mathematics, Operations Research, Bioinformatics, or a related field