Senior Systems Engineer at Sauce Labs Inc (San Francisco, CA) (allows remote)

Desired Skills and Experience

Provide strong leadership on the team under the guidance of the Systems Manager/Architect
Write tools and scripts to provide automation and self service solutions for ourselves and other teams
Design new systems to support production services
Install, configure and debug hardware and systems in our data center
Creatively solve scale challenges regarding a rapidly expanding cloud environment
Work with real hardware - Cisco UCS B & C series servers, SuperMicro Twin-Pro, storage (NAS and SAN), Mac-in-a-datacenter, custom appliances for mobile devices, load balancers, and beyond
Help improve monitoring and identify key performance metrics
Proactive R&D - discovering and implementing new tools, emerging technology, etc.
Disaster recovery design, implementation, and maintenance
Create NOC runbooks, procedures, documentation, and diagrams of the environments you manage
Troubleshooting and resolution of server/network issues
Help maintain hardware in Sauce’s colocation facilities
Help build out new data centers around the globe
Participation in 24x7 on-call rotation
Optimize hardware and configuration to improve hypervisor performance
Automate deployment of operating systems to bare-metal servers
Build and optimize an ELK cluster for our development team to monitor and analyze production system usage
Able to execute on high-level goals independently and with cross-functional teams
8+ years of recent experience working as a Linux administrator/engineer at scale (hundreds of systems) and designing/deploying ‘highly available’ solutions
2+ years of recent professional experience designing, developing, and operating Configuration Management solutions such as Chef, Puppet, Salt (preferred), or Ansible (preferred) at scale
Solid experience in Linux tuning, profiling, and monitoring
Strong skills in at least one language: Python (preferred), Ruby, or Bash
Experience deploying/managing KVM-Qemu and LXC
Experience with Kubernetes, Docker, and their ecosystems
Experience managing day-to-day operations with Redis, Memcached
Solid understanding of cloud, networking, and distributed computing concepts, including TCP/IP connections, firewalls, VLANs, etc.
Familiar with ZFS on Linux and storage appliances (iSCSI and NFS)
Experience with and understanding of contemporary metrics, monitoring, and logging solutions, especially StatsD, Graphite, ELK, Splunk, and Nagios
Highly organized, able to multi-task, able to work individually, as well as within a team, and across teams
Excellent verbal and written communication skills across all user levels
Deployment automation in physical and virtual environments (PXE, MAAS (preferred))
Experience with InSpec or a similar tool for testing configuration management
Working knowledge of load balancing technologies (hard/soft)
Proven experience collaborating in a cross-functional team environment
Familiarity with software engineering practices, including n-tier architecture, configuration management, development methodologies (e.g. agile, waterfall, spiral, prototyping), etc.
This role can be remote from SF within the continental US. Some travel to the South Bay or SF is required.