Desired Skills and Experience
- Operate and maintain a highly available real-time metrics monitoring and alerting platform
- Develop solutions for real-time and offline metric collection from various customers
- Work with customers to establish baseline parameters and implement their requests.
- Work closely with teams across the organization to facilitate measurement of applications and infrastructure
- Develop solutions for real-time issues including monitoring and feature requests
- Support a large 24x7 platform to ensure availability.
- Work closely with development on feature enhancements and requests, and to implement new functionality
- Linux
- OmniOS
- Time series Databases (TSDB)
- Git
- Nagios
- Slack
- Circonus
- Prometheus
- Grafana
- Graphite
- Sparc
- Splunk
- AWS
- OpenStack
- Load Balancers
- Go
- Bash
- Python
- Perl
- 5+ years systems engineering or systems administration experience
- Bachelor's or Master's degree in Computer Science or a related discipline, or comparable industry experience.
- Experience in operation of large-scale distributed systems including operational knowledge of backend systems that participate in a complex ecosystem
- Some knowledge of metrics analysis, related technologies and open source frameworks
- Good current knowledge of Unix/Linux environments
- Basic knowledge of scaled systems and how to administer and operate them.
- Enjoy working with metrics: metric analysis, metric quality, and visualization/dashboard creation
- Good communicator, able to analyze complex issues and technologies and explain them clearly and engagingly
- Great problem-solving skills, with a strong ability to troubleshoot real-time Unix-based distributed systems
- Adaptable, proactive and willing to take ownership of issues and complex tasks
- Keen attention to detail and high level of commitment
- Comfortable working in a fast-paced agile environment. Requirements change quickly and our team needs to constantly adapt to moving targets
- Basic scripting skill in common languages.
- Data collection, transformation and enrichment with computing frameworks such as Spark
- Knowledge of time-series data and ability to explain it to customers
- Extensive scripting skills in common languages such as Go and Perl
- Deep understanding of highly structured, speed-oriented databases
- Analytics background with focus on application and system level metrics
- Dashboard/UI configuration skills
- Knowledge of common monitoring platforms such as Nagios
- Advanced Linux troubleshooting skills from application to OS level
- Deep understanding of JIRA and Kanban
- Proficient in documentation and writing technical SMOPs
- Understanding of high-availability systems and how to build and maintain them