Site Reliability Engineer

With JP Morgan Chase in Chicago, IL, US

Posted on May 13, 2020

About this job

Job type: Full-time
Role: System Administrator


java, python, sql

Job description

As a Site Reliability Engineer (SRE) you will help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure and reducing work through automation. You’ll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment you’ll take the lead on relevant projects, supported by an organization that provides the support and mentorship you need to learn and grow. As an SRE you’ll be focused on running better production applications and systems.  

  • Design, code, test and deliver software to automate manual operational work.

  • Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents.

  • Engage with development teams throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes.

  • Identify application patterns and analytics in support of better service level objectives.

  • Design self-healing and resiliency patterns.

  • Design automated software and product upgrades, change management, and release management solutions.

  • Coach or manage teams as applicable.

  • Participate in the 24x7 support coverage as needed.

  • Expertise in Incident, Problem and Change Management processes and tools.

  • Collaborate across Application Development, Product and production management to establish and maintain Service Level Objective (SLO), Service Level Indicator (SLI) and Error Budget for key Production services.

  • Implement the required telemetry and the ability to monitor and measure quality of service in real time against the established SLO.

  • Manage, track and validate all changes to the Production and Disaster Recovery environments.

  • Manage priority incidents and leverage cross-functional teams to quickly eliminate impacts.

  • Escalate issues and risks effectively when necessary across the supporting framework.

  • Ability to align IT service offerings with business strategies, goals, and objectives.

  • Troubleshoot key technical issues or escalate and work with appropriate technology teams to provide solutions.

  • Respond promptly to service requests from client-facing support teams, operations partners, etc.

  • Manage application and infrastructure to maximize stability and resiliency. Leverage and improve monitoring and alerting capabilities to ensure application SLAs are met.

  • Strong focus on automation and processes. Design, implement, improve and utilize key monitoring tools.

  • Bachelor’s degree or equivalent experience in a software engineering discipline

  • Expertise in designing, coding, testing, and delivering software in at least one technology stack

  • Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex, mission-critical problems within a business or across the firm

  • Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage and networks)

  • Excellent debugging and troubleshooting skills

  • Expert in performance monitoring and capacity management of large systems using various tools

  • Deep expertise in instrumentation, customization and usage of modern monitoring tools such as Dynatrace, AppDynamics, Grafana, Prometheus, ThousandEyes, Splunk, Geneos, etc.

  • Expert in at least one technology stack (Java/J2EE/C#.NET), with experience designing, coding, testing, and delivering software

  • Exposure to Python and willingness to learn and become expert in Python for creating application health dashboards and machine learning projects

  • Expert in at least one relational database (SQL Server, Oracle, DB2, etc.)

  • Working knowledge of Groovy, Batch scripting, Ansible, PowerShell or Shell Scripting

  • Working knowledge of infrastructure components like routers, load balancers and networks

  • Comfortable working in Agile mode and proficient in Continuous Integration and Continuous Delivery

  • Solid understanding of object-oriented design methodologies

  • Solid analytical and problem-solving skills

  • Attention to detail and time-management skills  

Our Corporate & Investment Bank relies on innovators like you to build and maintain the technology that helps us safely service the world’s important corporations, governments and institutions. You’ll develop solutions for a bank entrusted with holding $18 trillion of assets and $393 billion in deposits. CIB provides strategic advice, raises capital, manages risk, and extends liquidity in markets spanning over 100 countries around the world.

When you work at JPMorgan Chase & Co., you’re not just working at a global financial institution. You’re an integral part of one of the world’s biggest tech companies. In 14 technology hubs worldwide, our team of 40,000+ technologists design, build and deploy everything from enterprise technology initiatives to big data and mobile solutions, as well as innovations in electronic payments, cybersecurity, machine learning, and cloud development. Our $9.5B+ annual investment in technology enables us to hire people to create innovative solutions that will not only transform the financial services industry, but also change the world.

At JPMorgan Chase & Co. we value the unique skills of every employee, and we’re building a technology organization that thrives on diversity. We encourage professional growth and career development, and offer competitive benefits and compensation. If you’re looking to build your career as part of a global technology team tackling big challenges that impact the lives of people and companies all around the world, we want to meet you. 
