Desired Skills and Experience

  • Design, provision, configure and maintain platform operations at the scale needed to run several application stacks in the cloud for worldwide consumption
  • Automate the deployment and maintenance of cloud platform technologies
  • Oversee production operations, log management, data warehouse, and database operations, including management of Splunk services
  • Ensure all monitoring systems (IT, development, service management, Apdex) are in place
  • Enforce consistency of monitoring, reporting, and alarming systems
  • Help drive process improvements for service management, including: outage/incident management, rollbacks and reporting
  • Research emerging virtualization techniques and advise management
  • Perform capacity management, load and scalability planning
  • Ensure compliance with deployment and operations documentation
  • Assist management in development and optimization of operational cost models
  • Design cloud infrastructure for high reliability and availability
  • Build strategic and tactical plans for continued improvement of cloud architecture and operations
  • Assist in the establishment of 24x7 performance monitoring and response protocols
  • Provide on-call support outside of normal work hours/days
  • You’re driven, humble, and autonomous
  • You’re a quick study, a strong communicator, and you’re able to adapt to a fast-paced environment
  • You have a working knowledge of Agile development practices (e.g., Scrum, TDD)
  • You are a developer, or have the mindset of one, and are intrigued by the operational aspects of hosting developed solutions
  • You are devoted to automation
  • You’re an expert in Windows (IIS, SQL Server) and Linux
  • You have at least 1 year of hands-on production experience with Amazon Web Services (AWS), Google Cloud or Microsoft Azure. This includes:

  • Configuration of VPCs, with VPN to corporate network
  • Experience setting up, maintaining and monitoring global production environments, QA and staging environments, with a strong understanding of the differing needs of such environments
  • At least 6 months of experience in a professional production environment
  • At least 6 months of experience managing networking infrastructure and monitoring at the application level
  • Performance optimization experience, including: troubleshooting and resolving network and server latency issues; performing hardware evaluation/selection tasks; performance vs cost vs time analysis
  • At least 1 year of experience with automation or scripting tools (e.g., Go, Python, Shell, PowerShell)
  • At least 6 months of experience with Ansible and Jenkins
  • You’re detail-oriented, with excellent documentation skills, and you’re someone who can successfully manage multiple priorities
  • Troubleshooting skills that range from diagnosing hardware/software issues to large-scale failures within a complex infrastructure
  • Bachelor's degree in Computer Science or equivalent work experience
  • Experience with Mongo, MS SQL Server, Splunk, Grafana, Terraform and Prometheus
  • Experience working with Docker, Kubernetes and Go
  • Hands-on experience with performance, load and security penetration testing
  • Hands-on experience with building out and maintaining a continuous integration and delivery pipeline
  • We have current Production and Continuous Integration footprints in Google Cloud (primary), AWS, and Azure
  • Our front-end applications leverage React and React Native, Redux, Node, C#, and Knockout
  • Our APIs are built with Golang, .NET and .NET Core
  • Our backend runs on MS SQL Server
  • We have a well-developed CI pipeline that allows us to deploy and stand up customers on demand
  • We leverage Ansible heavily; Splunk (JSON logs) is our lifeblood, and we enjoy operational efficiency and accessibility through Hubot and StackStorm