Desired Skills and Experience
- Empowering our R&D teams, engineers, and data scientists to work at scale.
- Designing and delivering systems engineering and site reliability strategy. Managing health/monitoring, growth, scalability, reliability, complexity, etc.
- Building high-availability, disaster-failover, and recovery infrastructure and procedures.
- Configuring, monitoring, and administering Python services in containers.
- Managing high availability of database and messaging systems (MySQL, PostgreSQL, Elasticsearch, RabbitMQ)
- Assisting dev/test/support teams, solving problems and automating common tasks
- Troubleshooting problems, performing triage and recommending resolutions.
- Developing, testing, and improving tools, system processes, and documentation
- Performing regular essential maintenance tasks: patches/upgrades, rebuilding machine images, and phasing in infrastructure-as-code/versioned infrastructure.
- Researching and developing increased cloud IaaS adoption (Kubernetes/Docker).
- Strong Systems Engineering/Administration skills in a cloud environment, ideally GCP
- Production use of Kubernetes and Docker
- Expert-level Linux operating system administration experience
- Strong skills in managing, deploying and scaling container-based services
- Production exposure to micro-services
- Excellent problem-solving and troubleshooting skills
- Proficiency in at least one administration/scripting language (Python, Go)
- Good administration and optimization skills in at least one database/RDBMS (MariaDB/MySQL, PostgreSQL)
- Familiarity with Redis, Elasticsearch, and PostgreSQL
- Storage and content delivery infrastructure management (GCS/S3, CloudFront, Cloudflare, caching)
- Comfortable with at least one shell language for administration and scripting (Bash, POSIX sh, ksh)
- Good network design/engineering skills (TCP/IP, routing, DNS, firewalling)
- Good communication skills
- Professional working proficiency in the English language