Principal Site Reliability Developer (JoinOCI)
With Oracle Corporation in Hyderabad, IN
Posted on March 28, 2021
About this job
Job type: Full-time
Role: Database Administrator, System Administrator
Industry: Cloud Computing, Cloud Services, Software Development / Engineering
Company size: 10k+ people
Company type: Public
python, go, oracle
Oracle Information Technology is seeking a Site Reliability Engineer or a Software Development Engineer with 10 to 15 years of experience to join our innovative infrastructure tools development team. The successful candidate will use their experience to design, build, operate, and support all infrastructure tools that fall under our "Infrastructure as Code" projects, along with their day-to-day operations.
The role requires significant skills in two of the following three areas: Python/Go/Java, Kubernetes, and cloud infrastructure automation with Terraform and Ansible. Additional valued skills include Linux server administration, automation, and knowledge of networking and of services running on cloud platforms. The role's primary focus is providing solutions for infrastructure and services, using software development and industry-standard tools to automate the many tasks required to enable and manage our offerings. In addition, the engineer in this role is responsible for resolving complex problems, creating and improving procedures, and facilitating communication. Other duties include researching, proofreading, and authoring technical documentation that benefits the company. This is a great career opportunity for a highly motivated individual who wants to apply and extend a solid, broad skill set.
Responsibilities will include working with a global team of SREs and developers to deliver complete solutions. You will also work with other development teams to integrate multiple applications into a cohesive whole. End-to-end automation for deployment, configuration, monitoring, self-healing, and alerting will be a continual challenge.
-- Automate Kubernetes administrative functions
-- Create and manage CI/CD pipelines
-- Build reusable code and libraries for future use
-- Create automated unit and functional tests
-- Collaborate with other team members and stakeholders
-- Contribute to a DevOps team with a rotating on-call schedule
Skills and Qualifications
-- Proficient in at least two of the following three areas:
   -- Python, Go or Java
   -- Administering and automating Kubernetes clusters
   -- Administering, deploying and configuring cloud infrastructure with Terraform and Ansible
-- Proficient with creating and maintaining CI/CD pipelines
-- Proficient with code versioning tools, such as Git, Mercurial or Subversion
-- Good understanding of Agile software development principles, including common tools such as JIRA
-- Clear understanding of web technologies like REST
-- Basic understanding of database languages such as SQL
-- 10 to 15 years’ development experience
-- Experience with Development Operations or Site Reliability Engineering
-- The work can be demanding at times, particularly as deadlines approach, and extra hours may be required depending on the candidate's delivery capacity.
-- Bachelor's degree in science or engineering (Computer Science preferred)
Solve complex problems related to cloud infrastructure services and build automation to prevent problems from recurring. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Develop designs, architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.
Work with the Site Reliability Engineering (SRE) team on shared full-stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Take responsibility for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance. Serve as the authority for end-to-end performance and operability. Partner with development teams to define and implement improvements in service architecture. Articulate the technical characteristics of services and technology areas, and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, and performance attributes and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that are not yet documented in Standard Operating Procedures (SOPs). Use a deep understanding of service topology and dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Bring professional curiosity and a desire to develop a deep understanding of services and technologies.
A BS or MS in Computer Science, or equivalent. Ability to identify and implement complex solutions, drawing on knowledge of server hardware and software configuration, networking, standard internet services, scripting languages, cloud computing patterns, and technology security and compliance. Ability to identify and implement complex solutions, drawing on an understanding of load balancing technologies and on development experience with programming languages, databases and big data stores, and container technologies. The work involves defining and documenting the technical architecture of complex, highly scalable products. A minimum of 8 years of experience running large-scale, customer-facing web services.