Linux System Reliability Engineer (Night Shift)
Clearwater is looking for a Linux Site Reliability Engineer to join our System Operations team. The System Operations team, composed of System Administrators, System Engineers and Python Developers, is responsible for the design, automation, maintenance and monitoring of our high availability production environment and its supporting infrastructure, applications and processes.

You will spend much of your time learning new technologies and concepts, and applying this knowledge and experience to solve complex problems in new and interesting ways. You will use industry standard DevOps applications and procedures, as well as applications developed within our team, to effectively manage and scale our system to accommodate our rapidly growing company.

The ideal candidate will excel at problem solving, adapt easily to change, contribute both as part of a team and on individually assigned projects, and thrive in an environment where they are constantly learning new technologies and concepts. They will work equally well on structured and unstructured tasks, with minimal supervision. The ideal candidate will have a passion for systems monitoring, automation and uptime.

Responsibilities:
- Assist with the design, deployment, management, scaling and monitoring of a large production Linux environment.
- Monitor site reliability and performance, ensuring and improving availability.
- Assist with the development of applications, scripts and other tools to manage and monitor our large, complex system.
- Assist with risk mitigation, root cause analysis and contingency planning.
- Participate in on-call rotation and occasional off-hours scheduled maintenance.
- Provide support and consultation for internal infrastructure users and application development teams.
- Use configuration management tools to create repeatable environments.
- Troubleshoot and resolve unique system/application problems without a predefined solution.
- Evaluate new software, hardware, and infrastructure solutions.

Requirements:
- Strong knowledge of Linux.
- Knowledge of high availability and disaster recovery concepts and technologies.
- Scripting skills in at least one of Python, Bash, or Ruby.
- Ability to quickly learn new concepts, applications, processes, technologies, etc.
- Highly motivated self-starter, comfortable working in a fast-paced environment.
- Strong understanding of fundamental technologies and concepts.
- Strong conceptual, analytical and problem-solving skills.
- Excellent written and verbal communication skills, with the ability to understand and communicate effectively both inside and outside the organization.
- Ability and interest in working the night shift, Monday through Friday.

Desired experience and skills:
- Knowledge of systems and applications monitoring concepts and related applications.
- Knowledge of best practices in security, configuration management, high availability and disaster mitigation.
- Experience automating systems (experience with Python preferred).