Desired Skills and Experience

  • Implement, configure, and maintain specialized storage solutions and environments
  • Maintain documentation on system capabilities, storage environments, equipment, and procedures for use by other team members
  • Perform storage administration of data-intensive activities including:
  • Independently write, document, and maintain system or storage administration tools and scripts for internal use. Research and obtain storage-engineering tools available outside of NREL as appropriate
  • Install, configure, test, and maintain innovative storage-related computing hardware and software systems, coordinating with vendors as applicable. Document and report problems to the team, and follow issues through to successful problem resolution
  • Work with the HPC Operations team in a primarily Linux environment to identify and provide hardware, software, and support services that enable and advance NREL’s mission with data-intensive science and engineering goals
  • Make technical recommendations on system software configuration, hardware configuration, user policies, security procedures, and administration procedures
  • Monitor systems and inform operations staff of system events (new features, failures, patches) and their impact on the user community. Participate in the Operations Team’s Change Management process to track changes
  • Use, develop, and deploy monitoring tools to provide usage reporting and problem alerting
  • Apply all applicable NREL and DOE policies and procedures to ensure the security and reliability of provided services
  • Share knowledge with other team members, including training others in your areas of expertise
  • Learn and adapt to new technologies
  • A solid foundation in Linux systems engineering, storage engineering, and data systems architectures
  • Experience with backup software, SAN concepts, tape drives, and tape libraries
  • Experience with Network Attached Storage solutions
  • Storage zoning
  • Ability to research and formulate hardware designs and act on the resulting decisions
  • Scripting languages such as Python, Perl, or Bash
  • TCP/IP network environment concepts and management of TCP/IP network equipment
  • Some knowledge of InfiniBand
  • Experience with storage clusters such as Ceph, Swift, or Cinder
  • Sun/Oracle SAM-QFS management
  • Experience with parallel file systems commonly used in High Performance Computing clusters such as Lustre or GPFS
  • Some Windows systems administration knowledge a plus
  • Experience in systems administration for data-centric servers and services
  • Experience connecting to monitoring tools such as Zenoss or Nagios
  • Experience with Ansible a plus
  • Some experience with Git (DVCS)