Site Reliability Engineer at Blackboard (Washington, DC)

This position can be located in DC, Indianapolis, or Austin.

Do you have a passion for running applications with zero downtime and zero problems? Do you believe that operating an application is a development problem? And finally, do you believe in researching problems until we are absolutely sure that we have found the root cause? Then you have to talk to us! As a member of our Site Reliability Engineering team, you will join the group responsible for running an industry-leading SaaS and Hosted Learn application that is used by millions of students every day.

Responsibilities

Support an always-available hosted and cloud platform
Join a scrum team of fellow engineers following DevOps philosophies
Identify and drive opportunities to improve automation for deployment, management, and tooling
Do thorough root cause analysis of problems to make sure they never happen again
Stay up to date on the latest Cloud development and deployment technologies
Solve larger, complex problems through automation. For example, copying a site from production to development and standing up the systems is already automated, but copying over course information, student data, etc. is more of a development problem than an operations one.

Desired Skills and Experience

Self-starter, collaborative team player
Experience with Linux in a production environment
Experience troubleshooting and resolving application and/or system-related issues
Strong written and spoken English
Demonstrable scripting experience in Python/Ruby
Experience with configuration management using Chef
Experience with large-scale software development processes and multi-region deployments
Java and/or Scala development experience
Experience working with a global team