Site Reliability Engineer at Blackboard (Washington, DC)
Position can be located in DC, Indianapolis, or Austin.
Do you have a passion for running applications with zero downtime and zero problems? Do you believe that operating an application is a development problem? And finally, do you believe in researching problems until we are absolutely sure that we have found the root cause? Then you have to talk to us!
As a member of our Site Reliability Engineering team, you will have the opportunity to join the group responsible for running an industry-leading SaaS and hosted Learn application that is used by millions of students every day.
Responsibilities
- Support an always-available hosted and cloud platform
- Join a scrum team of fellow engineers following DevOps philosophies
- Identify and drive opportunities to improve automation for deployment, management, and tooling
- Perform thorough root cause analysis of problems to make sure they never happen again
- Stay up to date on the latest cloud development and deployment technologies
- Solve larger, complex problems with automation. For example, copying a site from production to development and standing up the systems is automated, but copying over course information, student data, etc. is more of a development problem than an operations one.
Desired Skills and Experience
- Self-starter, collaborative team player
- Experience with Linux in a production environment
- Experience troubleshooting and resolving application and/or system-related issues
- Strong written and spoken English
- Demonstrable experience scripting in Python or Ruby
- Experience with configuration management using Chef
- Experience with large-scale software development processes and multi-region deployments
- Java and/or Scala development experience
- Experience working with a global team