Desired Skills and Experience

  • Manage multiple large-scale Hadoop cluster environments, handling all Hadoop environment builds, including design, capacity planning, cluster setup, performance tuning, and ongoing monitoring and alerting.
  • Evaluate, recommend, and implement systems software and hardware for the enterprise system, including capacity modeling.
  • Contribute to the evolving architecture of our storage service to meet changing requirements for scaling, reliability, performance, manageability, and price.
  • Troubleshoot complex issues and proactively put in place changes to address them
  • Ensure our testing capabilities protect our customers from a rapidly changing infrastructure
  • Demonstrate leadership within the department and to outside departments through strong communication skills
  • Create metrics and measures of utilization and performance, and fine-tune based on the data.
  • Plan capacity for and implement new or upgraded hardware and software releases, including for storage infrastructure.
  • Break down complex requests into manageable tasks
  • Ability to work well with a global team of highly motivated and skilled personnel; interaction and dialog are requisites in this dynamic environment.
  • Research and recommend innovative and, where possible, automated approaches for system administration tasks. Identify approaches that leverage our resources, provide economies of scale, and simplify remote/global support issues.
  • 3 years of professional experience supporting medium- to large-scale production Linux environments.
  • 1+ years of experience working with Hadoop (Apache, CDH, or Hortonworks) and the related technology stack.
  • Experience setting up and running production clusters
  • Experience proactively monitoring and fine-tuning clusters
  • Experience being the final technical level of escalation in your organization
  • A deep understanding of Hadoop design principles, cluster connectivity, security, and the factors that affect distributed system performance.
  • Solid understanding of configuration/state management tools (Puppet, Chef, Ansible).
  • Expert experience with at least one of the following languages: Python, Perl, Ruby, or Bash.
  • Good collaboration and communication skills, including the ability to participate in an interdisciplinary team.
  • Strong written communications and documentation experience
  • Knowledge of best practices related to security, performance, and disaster recovery.