Desired Skills and Experience

  • Operate and maintain a highly available real-time metrics monitoring and alerting platform
  • Develop solutions for real-time and offline metric collection from various customers
  • Work with customers to establish baseline parameters and implement their requests
  • Work closely across the organization to facilitate measurement of applications and infrastructure
  • Develop solutions for real-time issues including monitoring and feature requests
  • Provide 24x7 support for a large platform to keep it available
  • Work closely with development on feature enhancements and requests, and on implementing new functionality
  • Linux
  • OmniOS
  • Time series databases (TSDB)
  • Git
  • Nagios
  • Slack
  • Circonus
  • Prometheus
  • Grafana
  • Graphite
  • SPARC
  • Splunk
  • AWS
  • OpenStack
  • Load Balancers
  • Go
  • Bash
  • Python
  • Perl
  • 5+ years systems engineering or systems administration experience
  • Bachelor's or Master's degree in Computer Science or a related discipline, or comparable industry experience
  • Experience in operation of large-scale distributed systems including operational knowledge of backend systems that participate in a complex ecosystem
  • Some knowledge of metrics analysis, related technologies and open source frameworks
  • Good current knowledge of Unix/Linux environments
  • Basic knowledge of scaled systems and how to administer and operate them
  • Enjoy working with metrics: metric analysis, metric quality, and visualization/dashboard creation
  • Good communicator, able to analyze complex issues and technologies and articulate them clearly and engagingly
  • Great problem-solving skills, with a strong ability to troubleshoot real-time Unix-based distributed systems
  • Adaptable, proactive and willing to take ownership of issues and complex tasks
  • Keen attention to detail and high level of commitment
  • Comfortable working in a fast-paced agile environment. Requirements change quickly and our team needs to constantly adapt to moving targets
  • Basic scripting skills in common languages
  • Collection, transformation and enrichment with computing frameworks such as Spark
  • Knowledge of time-series data and the ability to explain it to customers
  • Extensive scripting skills in common languages such as Go and Perl
  • Deep understanding of highly structured, speed-oriented databases
  • Analytics background with focus on application and system level metrics
  • Dashboard/UI configuration skills
  • Knowledge of common monitoring platforms such as Nagios
  • Advanced Linux troubleshooting skills from application to OS level
  • Deep understanding of JIRA and Kanban
  • Proficient in documentation and writing technical SMOPs
  • Understanding of high-availability systems and how to build and maintain them
