Salesforce.com hosts web services and applications written by thousands of internal developers and tens of thousands of customers to provide the largest business automation cloud on the planet. The compute infrastructure that enables this innovation and value is evolving to fully embrace lights-out operations, single-click deploy to tens of thousands of nodes, and services that self-heal and self-optimize.

We are building out our compute infrastructure team to reinvigorate the way we deliver, deploy, operate, secure, monitor, and repair our data centers and the code that runs across them, all at consumer web scale. We are looking to add experienced distributed systems engineers who can step up and own big chunks of that vision.

Our team is building software infrastructure optimized for stateful services at Salesforce, using Docker containers and Kubernetes orchestration. There are lots of exciting problems to solve across the stack: declarative manifests, resource management and scheduling, monitoring and analytics, stateful service cluster operators, security, and the developer experience. You will get to start from the ground up, building and operating a large-scale distributed system, while working with and contributing to the open source community.

Desired Skills and Experience

  • Eat, sleep, and breathe services. You have experience balancing live-site management, feature delivery, and retirement of technical debt.
  • Experience designing, developing, debugging, and operating resilient distributed systems.
  • Experience with managing large, complex systems in cloud-based infrastructure.
  • Resolve complex technical issues and drive innovations that improve system availability, resilience and performance.
  • Familiarity with crash-only and recovery-oriented software design.
  • Excited by building reliable, self-healing services on unreliable hardware.
  • Agilista capable of driving and delivering thin slices of functionality on a regular cadence with data-driven feedback loops.
  • Passionate about automation and about avoiding manual work.
  • Create, maintain and share technical documentation used by engineers and other team members.
  • Having fun!
  • 5+ years of professional experience in systems engineering in large-scale Linux/UNIX data center environments.
  • 5+ years of professional experience in Java, Go, Scala, C++, Python, Ruby, Perl, or another language.
  • Solid understanding of how to configure, deploy, manage and maintain large cloud hosted systems; including auto-scaling, monitoring, performance tuning, troubleshooting and disaster recovery.
  • Experience delivering on strategic initiatives effectively in a fast-paced environment while supporting day-to-day issues.
  • In depth, hands-on experience with Linux, networking, server, and cloud architectures.
  • Knowledge of metrics and monitoring tools (e.g., Splunk, Nagios) and configuration management tools (e.g., Chef, Puppet).
  • Deep understanding of network and data technologies such as DNS, load balancing, SSL, TCP/IP, SQL, and HTTP.
  • Proficiency with source control, continuous integration, and testing pipelines.
  • Bachelor’s degree in Computer Science or any engineering discipline, or equivalent experience.
  • Experience with software-based compute infrastructure such as AWS, Azure, GCE, OpenStack, or CoreOS.
  • Experience with container orchestration systems such as Kubernetes or Docker Compose.
  • Experience with resource management systems such as Borg, Mesos, Aurora, Marathon, or YARN.
  • Expertise in live-site operations for stateful services such as Hadoop and HBase.
  • Understanding of industry security best practices.