Site Reliability Engineer
Description: Builds, automates, monitors, and troubleshoots external-facing web services and systems. Ensures a high level of performance with consistent reliability for all services. Automates routine builds and systems into a repeatable and flexible infrastructure.
Responsibilities include:
- Builds and maintains enterprise systems running on a variety of enterprise operating systems such as Ubuntu Linux and Windows Server.
- Maintains system performance and uptime through proactive monitoring and the engineering of system redundancies. Builds and maintains a dedicated monitoring system for customer-facing services.
- Implements, manages, and maintains an enterprise-wide configuration management system to automate service builds and create reproducible environments for development, staging, QA, and production.
- Maintains the availability and performance of enterprise-grade database systems such as MySQL or Microsoft SQL Server.
- Implements and maintains a centralized service log management system such as Splunk or Logstash.
- Assists the System and Network Administrator in maintaining CommVault and Zerto enterprise backup and replication systems.
- Assists the System and Network Administrator in maintaining enterprise-wide network infrastructure such as routers, firewalls, and switches.
- Works with Marketing and Development in support of Ektron/Drupal CMS management and implementation.
- Automates routine tasks and configuration through scripting using a variety of languages such as PowerShell, Bash, or Ruby.
- Acts as lead administrator of load balancer clusters for stage and production services.
- Manages private and public cloud resources running on VMware vSphere, AWS, and Microsoft Azure. Responsible for the development of the Society’s enterprise SaaS offerings and platform.
- Assists the System and Network Administrator with enterprise system patching and routine stability and security updates.
- Maintains, manages, and supports Microsoft SharePoint in a multi-server web farm.
- Helps maintain and build enterprise image templates for laptops and desktops.
- Supports the Help Desk by assisting with ticket overflow.
Requirements:
- BS degree required
- Experience with Windows Server 2008/2012R2
- Experience with Ubuntu/CentOS
- Experience with load balancing/reverse proxy technology (HAProxy, Varnish)
- Experience with SQL Server 2005/2008/2012 and MySQL clusters
- Experience configuring and managing key-value systems (Memcached, Couchbase)
- Experience with configuration management systems (Chef, Puppet, DSC)
- Experience working with public and private cloud platforms (vSphere, AWS, Azure)
- Advanced scripting experience (PowerShell, PowerCLI, Bash, Python, Ruby)
- Advanced networking knowledge, including the TCP/IP stack and the OSI model
- Good analytical and troubleshooting skills are a must
- Excellent written and oral communication skills
- Excellent organizational and project management skills
- Must be self-directed, flexible and have the ability to prioritize and handle multiple projects simultaneously
- Excellent interpersonal skills; interacts effectively and professionally with individuals at all levels
Desired Skills and Experience
See application page for details