Desired Skills and Experience

  • Create a resilient, highly operable production environment in AWS with 24x7 availability, high performance, scalability, and zero-downtime releases.
  • Manage large MySQL database clusters and NoSQL systems such as Redis, DynamoDB, and Cassandra.
  • Manage regional deployments and set up disaster recovery for Kafka data pipelines, systems, and stores in AWS.
  • Collaborate with engineers to create a continuous delivery environment and processes.
  • Instrument and monitor the health and availability of services, with fault detection, alerting, triage and recovery (automated and manual).
  • Work closely with Twilio’s cloud infrastructure, orchestration, and security teams to help implement company-wide security and operability initiatives and to provide tooling requirements.
  • Manage performance (benchmarking and monitoring vital metrics), plan capacity, and resolve performance problems affecting service levels.
  • Write scripts and runbooks to automate procedures.
  • Enable auto-scaling.
  • Your background will be that of a Senior Engineer with considerable experience in a highly complex technical operations environment with cloud-based services.
  • At least 5 years of experience building complex distributed systems, with a focus on reliability, high availability, performance, scalability, capacity planning, backup and recovery, business continuity planning, and automation of everything.
  • Strong Amazon Web Services (AWS) experience in a production environment.
  • Experience with managing and automating configuration of MySQL database clusters.
  • Hands-on experience with cloud infrastructure technologies, including continuous integration tools, configuration management, systems monitoring and alerting tools.
  • Experience managing systems distributed across regions, in the cloud or on-premises.
  • Adept at troubleshooting and administering Linux systems, dealing with networking issues, and fine-tuning instrumentation and alerting systems.
  • Demonstrated experience with agile processes, continuous integration, test automation, and release management.
  • Significant development experience in at least one modern scripting language, preferably Python.
  • Preferably, experience operating a high-load data pipeline and exposure to technologies such as Kafka, Kinesis, Spark, S3, and Redshift.