IBM Watson Health - Site Reliability Engineer - Explorys in CLEVELAND, Ohio

Explorys, now a part of IBM and the newly formed IBM Watson Health business, is the premier company empowering population health analytics. We provide care teams with proven technology that delivers timely, actionable insight so they can provide coordinated care to their patients. The Software as a Service (SaaS)-based solution uses advanced data management technology, data sciences and predictive algorithms to identify opportunities for care improvement.

Are you interested in solving operations problems using modern software engineering practices? Do you get excited about running mission-critical infrastructure? Do you believe the only way to scale reliably is through automation?

Site Reliability Engineers approach traditional operations work as a software problem. Applying software engineering practices to our work enables our services to better adapt to changes and failure scenarios. We write software to manage the entire lifecycle of our infrastructure and build tools to help ensure it stays healthy.

Communication and cross-team coordination are paramount as you work to provide the best infrastructure and service possible.

Essential Functions:

  • Develop tools to deploy and manage server infrastructure

  • Develop monitoring and alerting platforms to help identify and resolve problems

  • Partner with engineering teams to troubleshoot outages and develop tools to prevent them from happening again

The ideal candidate will possess:

  • BS in Computer Science or equivalent experience

  • Experience troubleshooting complex systems, including the operating system, network, and application code

  • Proficiency in Java, Python, Perl, Ruby or another high-level programming language

  • Experience implementing and troubleshooting Linux systems

  • Proficiency in Java, Scala, Groovy or another JVM-based language

  • Experience with public cloud infrastructure

  • Hands-on experience with Hadoop and related technologies

  • Experience implementing and troubleshooting large-scale distributed systems
