At Apixio we are building a data infrastructure and family of products to help transform the healthcare industry. Our data scientists and engineering teams are creating smart products that use clinical patient data to do this, and they need great Site Reliability Engineers to build, deploy and support the infrastructure that brings their work to our customers.

Our Site Reliability Engineering team supports our AWS footprint as infrastructure as code, is super security-minded and knows how to help the engineering team create a highly reliable site. They also automate everything. Our operations team shares an on-call rotation backed by our engineering teams.

As one of our SREs you will be capable of doing many of the following:

  • Own and maintain Apixio services and data infrastructure in production
  • Analyze and improve the efficiency, scalability, and reliability of our backend systems
  • Perform capacity planning and buildout
  • Create scalable and reliable monitoring and alerting that works
  • When things go bad, perform advanced troubleshooting of our systems
  • Build scalable, secure and measurable infrastructure with code
  • Create automation of engineering deployments
  • Support engineering team in implementing system reliability
  • Support disaster recovery design, implementation and testing
  • Support Apixio’s office infrastructure (Cisco ASA, Meraki switches and APs, etc.)
  • Build tooling to allow the self-service of AWS offerings by engineering teams

And you will have knowledge of many of the following:

  • Strong understanding of Unix and system administration
  • Strong programming skills in languages such as Python, Go, Scala/Java and C++
  • Strong knowledge of best-in-class security practices and testing methods
  • Amazon Web Services (AWS) and APIs
  • Strong knowledge of the configuration and maintenance of common infrastructure components such as Cassandra, Redis, FluentD, Apache/Django/Flask, Kafka, Elasticsearch and Hadoop
  • Strong knowledge of internet service architecture (TCP/IP, HTTP, DNS, routing, load balancing)
  • Experience with monitoring and logging applications like FluentD, Graylog, Datadog
  • Experience with deployment and config management systems like Salt Stack, Ansible and HashiCorp

A strong candidate will have:

  • Past experience and success stories managing a significant infrastructure in AWS while maintaining a 24x7 commercial SLA
  • A passion for killer SLAs, secure infrastructure and automation of everything
  • A BS or MS in Computer Science / Engineering, or equivalent

Desired Skills and Experience

See application page for details