Desired Skills and Experience

  • Improve the monitoring setup and tools to ensure all systems, services, application events and data are captured
  • A drive to automate all routine daily tasks
  • Assist with designing, building and scaling out our AWS infrastructure
  • Carry out incident postmortems, determining the root cause, to ensure incidents do not recur
  • Communicate any service degradation and outages appropriately
  • Report clearly, using graphs and other visuals, on usage, support and monitored-event trends
  • Willingness to mentor and share knowledge with the team is a must
  • Maintain documentation
  • Some travel may be required
  • Participate in the on-call rota
  • Basic working knowledge of core Amazon Web Services
  • Working knowledge of the Linux command line
  • Practical experience of a scripting language
  • Practical experience of troubleshooting and investigating new or recurring problems
  • Trend analysis – i.e. interpreting monitoring graphs, usage and data
  • Experience of monitoring tools (e.g. CloudWatch, Datadog, Splunk)
  • Knowledge of CI tools (e.g. CircleCI, Jenkins)
  • Basic working knowledge of networking: DNS, routing, gateways, TCP/IP, TLS and other network protocols
  • Experience working with multiple teams, local and/or remote
  • Ability to work on your own and within a team
  • Good written and verbal communication skills
  • Ability to learn on the job
  • Experience with high-traffic, scalable web applications
  • In-depth knowledge of the HTTP protocol
  • Experience with Infrastructure as Code (Terraform), Docker and GitHub