Desired Skills and Experience
- Lead the Site Reliability Engineering and Networking teams;
- Support and facilitate the work of a team of engineers who develop, scale and maintain critical systems in our infrastructure;
- Develop and grow the people in your teams, lead by example, and drive team performance;
- Innovate, design, and implement solutions to maintain availability, reliability and efficiency of the services offered by Kreditech;
- Keep our systems up and running and automate all handling of failure conditions;
- Engage with external vendors to identify, negotiate and implement efficient solutions for operating our systems;
- Collaborate closely with the engineering teams to ensure fast delivery at high quality;
- Manage an international team of highly skilled and motivated professionals in a hands-on way across two sites.
- 300+ servers operated by your team (10% AWS, 90% VPS);
- SaltStack, Debian, Ubuntu, ZFS, KVM, Open vSwitch;
- Transition from VPS to AWS;
- Grow internal adoption of container technologies (e.g. Docker).
- At least 5 years of experience managing IT operations of a similar size;
- Very good understanding of Linux (Debian/Ubuntu), networking, and databases;
- In-depth knowledge of state-of-the-art DevOps tooling, methodology, and current trends;
- Knowledge of load balancing and high availability tools and strategies;
- Strong software engineering background with programming experience in at least one language (Java or other JVM-based language, Go, Node.js, C, C++, etc.);
- Deep knowledge of database operations (PostgreSQL, MongoDB) would be a distinct advantage;
- Experience with container orchestration systems (e.g. Kubernetes) is a distinct advantage;
- Able to participate in a 24x7 on-call rotation;
- Hands-on management and leadership skills;
- A university degree in Mathematics or Computer Science;
- A proven track record in remote management is a plus;
- Proficient in English;
- Minimal travel is required.