Desired Skills and Experience
- Manage multiple large-scale Hadoop cluster environments, handling all Hadoop environment builds, including design, capacity planning, cluster setup, performance tuning, and ongoing monitoring and alerting.
- Evaluate, recommend, and implement system software and hardware for the enterprise system, including capacity modeling.
- Contribute to the evolving architecture of our storage service to meet changing requirements for scaling, reliability, performance, manageability, and price.
- Troubleshoot complex issues and proactively put in place changes to address them
- Ensure our testing capabilities protect our customers from a rapidly changing infrastructure
- Demonstrate leadership, both within the department and with other departments, through strong communication skills
- Create metrics and measures of utilization and performance, and fine-tune systems based on the data
- Plan capacity for, and implement, new or upgraded hardware and software releases, as well as storage infrastructure.
- Break down complex requests into manageable tasks
- Ability to work well with a global team of highly motivated and skilled personnel; interaction and dialog are requisites in this dynamic environment.
- Research and recommend innovative and, where possible, automated approaches for system administration tasks. Identify approaches that leverage our resources, provide economies of scale, and simplify remote/global support issues.
- 3 years of professional experience supporting medium- to large-scale production Linux environments.
- 1+ years of experience working with Hadoop (Apache, CDH, or Hortonworks) and the related technology stack.
- Experience setting up and running production clusters
- Experience proactively monitoring and fine tuning clusters
- Experience being the final technical level of escalation in your organization
- A deep understanding of Hadoop design principles, cluster connectivity, security, and the factors that affect distributed system performance.
- Solid understanding of configuration/state management tools (Puppet, Chef, Ansible).
- Expert-level experience with at least one of the following languages: Python, Perl, Ruby, or Bash.
- Good collaboration and communication skills, and the ability to participate in an interdisciplinary team.
- Strong written communication skills and documentation experience
- Knowledge of best practices related to security, performance, and disaster recovery.