Cloud Operations Engineer (Site Reliability)

Looking for a Cloud Operations Engineer (SRE) to support the Cloud Platform team’s production environment. Responsible for lifecycle management of tools and frameworks used to maintain cloud infrastructure/services.

Cloud Operations Engineer (Site Reliability) Responsibilities:

  • Manage customer requests
  • Support a 24x7 cloud production environment
  • Build and deploy continuous integration pipelines
  • Perform Linux administration and troubleshooting in a large-scale system

Cloud Operations Engineer (Site Reliability) Required Skills:

  • BSCS degree; Master’s preferred
  • 3+ years’ experience administering large, complex systems (preferably in a cloud-based environment)
  • Strong Linux OS knowledge
  • Experience with Chef, Puppet, Ansible, or other configuration management tools
  • Scripting abilities in Python, Perl, Bash, or Go
  • Proficient in one or more monitoring tools: Zabbix, Elasticsearch, collectd, statsd, Logstash, Ganglia, or Nagios/OpsView

Desired Skills:

  • Strong knowledge of networking services and protocols a plus
  • Big Data experience (Hadoop, Spark, Kafka, Storm)
  • OpenStack platform experience
