Site Reliability Engineer (SRE) at HyperScience (New York, NY)

Desired Skills and Experience

You’ll ensure the reliability, scalability and performance of our services. You will tackle problems affecting critical services and prevent them from recurring.
Create and manage build/deployment pipelines for continuous integration and continuous delivery to improve the quality and availability of business products.
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
Practice sustainable incident response and blameless postmortems.
You love analyzing, monitoring, and troubleshooting large-scale distributed systems.
You have extensive knowledge of networking and operating systems (e.g., processes, threads, concurrency).
You are comfortable using at least one programming language such as Python, Go, C++, Java or Ruby, as well as scripting languages such as Shell and Perl.
You’re familiar with algorithms, data structures, and complexity analysis.
Experience with Unix/Linux operating system internals and administration (e.g., filesystems, inodes, system calls) or networking (e.g., TCP/IP, routing, network topologies and hardware, SDN).
Expertise in designing, analyzing and troubleshooting large-scale systems.
Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
Ability to debug and optimize code and automate routine tasks.
