Desired Skills and Experience
- Maintain technical operations for our iPaaS cloud infrastructure; administer Linux systems, including configuration, troubleshooting, and automation, and some limited Windows system administration duties;
- Ensure high reliability and availability for production systems, including upgrade and release processes and incident handling;
- Manage and deploy internal and external monitoring solutions for maintaining high availability for production systems;
- Develop effective tooling, alerts, and response procedures to identify and address reliability risks;
- Scripting ability in Python or JVM-based languages;
- Define and evangelize cloud-related optimizations and best practices to improve reliability and performance;
- Troubleshoot cloud infrastructure, systems, network, and application stacks;
- Work with fellow operations engineers and development teams on complex problems, and make decisions and recommendations about systems improvements after analyzing possible courses of action;
- Perform on-call duty as part of a team maintaining the availability and performance of our cloud infrastructure as well as the various internal services and systems that these core services depend on.
- 5-7 years of cloud engineering experience;
- Bachelor's degree in Computer Science or a relevant field;
- Strong working knowledge of Linux (CentOS) systems and applications including Tomcat, Java, Apache, Elasticsearch, ActiveMQ, and Nginx proxy;
- Experience administering AWS or other IaaS/PaaS infrastructure;
- Experience with configuration management / systems automation tools at scale (e.g. Puppet, Chef);
- Experience with big data systems and/or database administration (e.g. MySQL, MongoDB, PostgreSQL, and other SQL and NoSQL databases) is a plus;
- Proficiency with scripting languages;
- Experience with network management systems and network monitoring tools such as Nagios, Icinga, Kibana, Logstash, and Cacti;
- Ability to work independently, with strong interpersonal and communication skills.