Desired Skills and Experience

  • Eat, sleep, and breathe services. You have experience balancing live-site management, feature delivery, and retirement of technical debt.
  • Experience designing, developing, debugging, and operating resilient distributed systems.
  • Experience with managing large, complex systems in cloud-based infrastructure.
  • Resolve complex technical issues and drive innovations that improve system availability, resilience and performance.
  • Familiarity with crash-only and recovery-oriented software design.
  • Excited by building reliable, self-healing services on unreliable hardware.
  • Agilista capable of driving and delivering thin slices of functionality on a regular cadence with data-driven feedback loops.
  • Be passionate about automation and about avoiding doing things manually.
  • Create, maintain and share technical documentation used by engineers and other team members.
  • Having fun!
  • 5+ years of professional experience in systems engineering in large-scale Linux/UNIX data center environments
  • 5+ years of professional experience in Java, Go, Scala, C++, Python, Ruby, Perl, or another language
  • Solid understanding of how to configure, deploy, manage and maintain large cloud-hosted systems, including auto-scaling, monitoring, performance tuning, troubleshooting and disaster recovery.
  • Experience delivering effectively on strategic initiatives in a fast-paced environment while handling day-to-day issues
  • In-depth, hands-on experience with Linux, networking, server, and cloud architectures.
  • Knowledge of metrics and monitoring tools (e.g., Splunk, Nagios) and configuration management tools (e.g., Chef, Puppet).
  • Deep understanding of network technologies such as DNS, load balancing, SSL, TCP/IP, and HTTP, as well as SQL.
  • Proficiency with source control, continuous integration, and testing pipelines.
  • Bachelor’s degree in Computer Science or any engineering discipline, or equivalent experience
  • Experience with software-based compute infrastructure such as AWS, Azure, GCE, OpenStack, CoreOS
  • Experience with container orchestration systems such as Kubernetes and Docker Compose
  • Experience with resource management systems such as Borg, Mesos, Aurora, Marathon, YARN
  • Expertise in live-site operations for stateful services such as Hadoop and HBase
  • Understanding of industry security best practices.