Desired Skills and Experience

  • Empowering our R&D teams, engineers, and data scientists to work at scale.
  • Designing and delivering systems engineering and site reliability strategy, including health monitoring, growth, scalability, reliability and complexity management.
  • Building high-availability and disaster failover/recovery infrastructure and procedures.
  • Configuring, monitoring and administering Python services running in containers.
  • Managing high availability of database and data systems (MySQL, PostgreSQL, Elasticsearch, RabbitMQ).
  • Assisting dev/test/support teams, solving problems and automating common tasks.
  • Troubleshooting problems, performing triage and recommending resolutions.
  • Developing, testing and improving tools, system processes and documentation.
  • Performing regular essential maintenance tasks: patches/upgrades, rebuilding machine images, and phasing in infrastructure-as-code/versioned infrastructure.
  • Researching and developing toward increased cloud IaaS adoption (Kubernetes/Docker).
  • Strong Systems Engineering/Administration skills in a cloud environment, ideally GCP
  • Production experience with Kubernetes and Docker
  • Expert-level Linux Operating Systems administration experience
  • Strong skills in managing, deploying and scaling container-based services
  • Production exposure to microservices
  • Excellent problem-solving and troubleshooting skills
  • Proficiency in at least one programming language for administration and automation (Python, Go)
  • Good administration and optimization skills in at least one RDBMS (MariaDB/MySQL, PostgreSQL)
  • Familiarity with Redis, Elasticsearch, PostgreSQL
  • Storage and content delivery infrastructure management (GCS/S3, CloudFront, Cloudflare, caching)
  • Comfortable with at least one shell language for administration and scripting (Bash, POSIX sh, ksh)
  • Good network design/engineering skills (TCP/IP, routing, DNS, firewalling)
  • Good communication skills
  • Professional working proficiency in the English language