Site Reliability Engineer jobs

RESPONSIBILITIES:

Kforce has a client that is seeking a Site Reliability Engineer in San Francisco, California (CA).

Primary Responsibilities:
- Contribute to a team responsible for the availability, scalability, and performance of the client’s enterprise SaaS platform
- Build and maintain automation systems to help the client manage their rapidly growing infrastructure
- Gain deep knowledge of the client’s complex applications to develop a bird’s-eye view of the platform
- Assist the client’s Software Engineering teams to ensure proper monitoring and metrics are built into the applications
- Maintain and develop custom systems and tools to improve the client’s ability to deploy, automate, and effectively monitor custom applications in a large-scale, mostly Linux environment
- Assist in the rollout and deployment of new product features and installations to facilitate the client’s rapid iteration and constant growth
- Gain and use knowledge of monitoring systems and configuration management systems (AWS-specific tools, Terraform, Chef, Nagios, NewRelic, etc.)
- Troubleshoot issues across the whole stack: hardware, software, applications, and network
- Document current and future configuration processes and policies

REQUIREMENTS:
- 6+ years of experience managing UNIX / Linux infrastructure
- Prior experience in an Internet-facing technical operations role with high uptime requirements
- Demonstrated ability to work successfully with Cloud architectures (the client is primarily an AWS shop)
- Strong personal and professional initiative with a focus on the success of the team and organization
- Self-starter who is able to take ownership of technical issues and be a productive member of the on-call rotation, including certain off-hours shifts
- Strong troubleshooting skills that span systems, network, and applications
- Experience with web-based Java / J2EE architectures
- Strong scripting ability in Bash, Ruby, Perl, and / or Python

Desired Skills and Experience

See application page for details