Desired Skills and Experience

  • Provide strong leadership on the team under the guidance of the Systems Manager/Architect
  • Write tools and scripts to provide automation and self service solutions for ourselves and other teams
  • Design new systems to support production services
  • Install, configure and debug hardware and systems in our data center
  • Creatively solve scale challenges regarding a rapidly expanding cloud environment
  • Work with real hardware - Cisco UCS B & C series servers, SuperMicro Twin-Pro, storage (NAS and SAN), Mac-in-a-datacenter, custom appliances for mobile devices, load balancers, and beyond
  • Help improve monitoring and identify key performance metrics
  • Proactive R&D - discovering and implementing new tools, emerging technology, etc.
  • Disaster recovery design, implementation, and maintenance
  • Create NOC runbooks, procedures, documentation, and diagrams of the environments you manage
  • Troubleshooting and resolution of server/network issues
  • Help maintain hardware in Sauce’s colocation facilities
  • Help build out new data centers around the globe
  • Participation in 24x7 on-call rotation
  • Optimize hardware and configuration for improving hypervisor performance
  • Automate deployment of operating systems to bare-metal servers
  • Build and optimize an ELK cluster for our development team to monitor and analyze production system usage
  • Able to execute on high-level goals independently and with cross-functional teams
  • 8+ years recent experience working as a Linux administrator/engineer at scale (hundreds of systems) and designing/deploying ‘highly available’ solutions
  • 2+ years of recent professional experience designing, developing, and operating Configuration Management solutions such as Chef, Puppet, Salt (preferred), or Ansible (preferred) at scale
  • Solid experience in Linux tuning, profiling, and monitoring
  • Strong skills in at least one language: Python (preferred), Ruby, Bash
  • Experience deploying/managing KVM-Qemu and LXC
  • Experience with Kubernetes, Docker, and their ecosystems
  • Experience managing day-to-day operations with Redis, Memcached
  • Solid understanding of cloud/networking/distributed computing environment concepts, including TCP/IP connections, firewalls, VLANs, etc.
  • Familiar with ZFS on Linux and storage appliances (iSCSI and NFS)
  • Experience with and understanding of contemporary metrics, monitoring, and logging solutions, especially StatsD, Graphite, ELK, Splunk, Nagios, etc.
  • Highly organized, able to multi-task, able to work individually, as well as within a team, and across teams
  • Excellent communication skills, both verbal and written, across all user levels
  • Deployment automation in physical and virtual environments (PXE, MAAS (preferred))
  • Experience with InSpec or a similar tool for testing configuration management
  • Working knowledge of load balancing technologies (hard/soft)
  • Proven experience collaborating in a cross-functional team environment
  • Familiarity with software engineering practices, including n-tier architecture, configuration management, development methodologies (e.g. agile, waterfall, spiral, prototyping), etc.
  • This role can be based remotely from SF, anywhere in the continental US. Some travel to the South Bay or SF is required