Machine Learning Infrastructure, Software Engineer
With GO-JEK in Bengaluru - IN
Posted on January 06, 2020
About this job
Job type: Full-time
cloud, java, python
The Data Science Platform (DSP) team is tasked with building out AI capabilities throughout Gojek. We do this both through our Machine Learning Platform and by building solutions that bridge data science and product engineering. Our work encompasses:
Collaboration with data scientists and product teams in the development of innovative AI solutions.
Development of an end-to-end platform that enables ML practitioners to rapidly experiment and deliver AI solutions to production.
Production support for all systems deployed to the platform, freeing data scientists from the operational burden while benefiting from economies of scale.
Use of our domain expertise to enable AI innovation throughout Gojek through wide collaboration, education, and the introduction of best practices.
This role requires a deep understanding of the machine learning life cycle and how data scientists turn hypotheses into production systems. You will design and build the products that data scientists rely on at each stage of the machine learning life cycle, ensuring a rapid time to market for ML projects.
Design and build our Machine Learning Platform to help data scientists productionize their models and features faster.
Automate all parts of the data science life cycle: feature engineering, model training, testing, and deployment.
Deploy, operate, and grow some of the largest ML systems in the region.
Collaborate with product teams to understand operational requirements.
Translate these requirements into observable architecture and SRE processes.
At least 5 years as an infrastructure or software engineer.
Experience with Go, Python, and shell scripting. Java optional.
Experience with cloud environments. Google Cloud preferred.
Experience with modern cloud deployment technology such as Terraform, Kubernetes, and Helm.
Understanding of Infrastructure as Code (IaC) concepts.
Experience with deploying, operating, and debugging Big Data frameworks such as Spark, Flink, Kafka, and Airflow. Experience with ML frameworks such as TFX, Kubeflow, and MLflow is a plus.
Experience with relational and non-relational databases, including clustering and high-availability configurations.
Proven track record of building and operating large-scale, high-throughput, low-latency production systems. Experience with microservice architectures and technology (Docker, Istio, nginx) is a huge plus.
Great understanding of DevOps and Site Reliability Engineering (SRE) principles.