Operations Engineer at EMS Software (Centennial, CO) (allows remote)
Desired Skills and Experience
- Design, provision, configure, and maintain platform operations at the scale needed to run several application stacks in the cloud for a worldwide user base
- Automate the deployment and maintenance of cloud platform technologies
- Oversee production operations, log management, data warehouse, and database operations, including management of Splunk services
- Ensure all monitoring systems (IT, development, service management, Apdex) are in place
- Enforce consistency of monitoring, reporting, and alarming systems
- Help drive process improvements for service management, including outage/incident management, rollbacks, and reporting
- Research emerging virtualization techniques and advise management
- Perform capacity management and load and scalability planning
- Ensure compliance with deployment and operations documentation
- Assist management in development and optimization of operational cost models
- Design cloud infrastructure for high reliability and availability
- Build strategic and tactical plans for continued improvement of cloud architecture and operations
- Assist in the establishment of 24x7 performance monitoring and response protocols
- Provide on-call support outside of normal work hours/days
- You’re driven, humble, and autonomous
- You’re a quick study, a strong communicator, and you’re able to adapt to a fast-paced environment
- You have a working knowledge of Agile development practices (e.g., Scrum, TDD)
- You are a developer, or have a developer's mindset, but are intrigued by the operational aspects of hosting the solutions you build
- You are devoted to automation
- You’re an expert in Windows (IIS, SQL Server) and Linux
- You have at least 1 year of hands-on production experience with Amazon Web Services (AWS), Google Cloud, or Microsoft Azure. This includes:
- Configuration of VPCs, with VPN to corporate network
- Experience setting up, maintaining and monitoring global production environments, QA and staging environments, with a strong understanding of the differing needs of such environments
- At least 6 months of experience in a professional production environment
- At least 6 months of experience managing networking infrastructure and monitoring at the application level
- Performance optimization experience, including troubleshooting and resolving network and server latency issues, performing hardware evaluation and selection, and analyzing performance vs. cost vs. time trade-offs
- At least 1 year of experience with automation or scripting tools (e.g., Go, Python, Shell, PowerShell)
- At least 6 months of experience with Ansible and Jenkins
- You’re detail-oriented, with excellent documentation skills, and you’re someone who can successfully manage multiple priorities
- Troubleshooting skills that range from diagnosing hardware/software issues to large scale failures within a complex infrastructure
- Bachelor's degree in Computer Science or equivalent work experience
- Experience with MongoDB, MS SQL Server, Splunk, Grafana, Terraform, and Prometheus
- Experience working with Docker, Kubernetes, and Go
- Hands-on experience with performance, load, and security penetration testing
- Hands-on experience with building out and maintaining a continuous integration and delivery pipeline
- We have current Production and Continuous Integration footprints in Google Cloud (primary), AWS, and Azure
- Our front-end applications leverage React and React Native, Redux, Node, C#, and Knockout
- Our APIs are built with Go (Golang), .NET, and .NET Core
- Our backend runs on MS SQL Server
- We have a mature CI pipeline that lets us deploy and stand up customers on demand
- We rely heavily on Ansible, Splunk (JSON logs) is our lifeblood, and we gain operational efficiency and accessibility through Hubot and StackStorm