Desired Skills and Experience
- Improve monitoring setup and tools to ensure all systems, services, application events and data are captured
- Desire to automate all daily tasks
- Assist with the design, build and scaling of our AWS infrastructure
- Carry out postmortems of incidents, determining the root cause, to ensure they do not recur
- Communicate any service degradation and outages to those affected
- Report clearly, using graphs and other visuals, on usage, support and monitored event trends
- Willingness to mentor and share knowledge with the team is a must
- Maintain documentation
- Some travel may be required
- Participate in the on-call rota
- Basic working knowledge of core Amazon Web Services
- Working knowledge of the Linux command line
- Practical experience of a scripting language
- Practical experience of troubleshooting and investigating new and recurring problems
- Trend analysis – i.e. interpreting monitoring graphs, usage and data
- Experience of monitoring tools (e.g. CloudWatch, Datadog, Splunk)
- Knowledge of CI tools (e.g. CircleCI, Jenkins)
- Basic working knowledge of networking – DNS, routing, gateways, and protocols such as TCP/IP and TLS
- Working with multiple teams, local and/or remote
- Ability to work on your own and within a team
- Good written and verbal communication
- Ability to learn on the job
- Experience with high-traffic, scalable web applications
- In-depth knowledge of the HTTP protocol
- Experience with Infrastructure as Code and related tooling (Terraform, Docker, GitHub)