IBM Senior Site Reliability Engineer in New York, New York

Job Description

Are you passionate about technology? Do you love building new things? Do you want to develop the future of IBM's Cloud offerings? If you answered YES, then we have the right opportunity for you!

The shift toward the consumption of IT as a service, i.e., the cloud, is one of the most important changes to happen to our industry in decades. At IBM, we are driven to shift our technology to an as-a-service model and to help our clients transform themselves to take full advantage of the cloud. With industry leadership in analytics, security, commerce, and cognitive computing and with unmatched hardware and software design and industrial research capabilities, no other company is as well positioned to address the full opportunity of cloud computing.

The Next Generation Cloud Network Engineering (NextGenCloud) team is dedicated to ensuring that the IBM Cloud is at the forefront of cloud technology, from data center design to network architecture to storage and compute clusters to flexible infrastructure services. While our focus is on Network as a Service (NaaS), we are part of the team building IBM's next generation cloud platform to deliver performance and predictability for our customers' most demanding workloads, at global scale and with leading efficiency, resiliency, and security. It is an exciting time, and as a team we are driven by this incredible opportunity to thrill our clients. We are looking for a Site Reliability Engineer who innovates and shares our passion for winning in the cloud marketplace to join our team.

This position is for a Senior Site Reliability Engineer who should have at least 12 years' industry experience. In this role, you will work as the lead member of the Site Reliability team with the following key responsibilities:

  • Troubleshoot and debug software delivered by various development teams within NextGenCloud and ensure that more junior members of the team are capable of the same; coach team members in this practice.

  • Provide detailed trouble reports back to the development teams including automated methods to reproduce any defects; ensure that these reports and those from others on the team are complete and accurate.

  • Drive the troubleshooting and maintenance of pre-production CI/CD systems in support of deployment.

  • Lead the team to ensure automation and the highest level of determinism possible in the installation and configuration of new systems (software and hardware).

  • Document automation and the interaction of software and systems as necessary to enable others; ensure that other members of the team meet the same high standard of documentation.

  • Lead the development of the processes and software necessary to maintain services post-deployment through data collection and monitoring, ensuring the overall health of the services provided.

  • Help drive the resolution of issues when on call.

  • Participate, collaborate and provide guidance in retrospectives.

  • Lead and encourage collaboration and a focus on issue resolution.

  • Lead meaningful planning to improve software, systems, and processes.

To summarize, in this role you will engage in all aspects of the lifecycle of IBM’s NaaS, from idea to architecture and through deployment, operation, and improvement, ensuring that our clients have the most reliable and performant experience possible.

This opportunity is for someone in the continental United States.

Job Requirements

  • 12+ years’ experience with systems and/or software engineering.

  • 5+ years’ experience with software development.

  • 5+ years’ experience with systems engineering.

  • 5+ years’ experience troubleshooting software.

  • 2+ years’ experience leading a team.

  • Experience in a DevOps environment.

  • Strong experience with Git.

  • Experience with OpenStack or a comparable proprietary cloud such as Azure or AWS.

  • Experience with CI/CD systems and their pipelines; experience with Zuul or Jenkins a plus.

  • Experience with containers and HA clusters; experience with Docker and Kubernetes a plus.

  • Excellent knowledge of TCP/IP networking.

  • Strong background in network engineering.

  • Hands-on data center operational experience.

  • Proven ability to collaborate and work well within a team.

  • Ability to communicate effectively both verbally and in writing.

Preferred Technical and Professional Experience

  • 15+ years’ experience in all of the above.

  • DevOps experience working with Ansible, Puppet, or Chef.

  • Experience with data center layout planning.

EO Statement

IBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.