Site Reliability Engineer

by Admin on 06-10-2022 at 3:48 pm

Are you data-driven?  We at NetApp believe in the transformative power of data – to expand customer touchpoints, to foster greater innovation, and to optimize operations.  We are designed for simplicity, optimized to protect, created to embrace future opportunity, and open to enrich choice.  We are the data authority for hybrid cloud, and we are helping our customers realize the full potential of their data.

We’ve built a Data Fabric for a data-driven world – to simplify and integrate data management across the resources that are best for the business.  With the Data Fabric, our customers can harness the power of cloud data services, build cloud infrastructures, and modernize storage through data management.

By harnessing the power of hybrid cloud data services, customers gain the freedom of choice to securely manage and move data – anywhere, on any cloud. Only NetApp can help organizations deliver data-rich customer experiences when they rapidly test and deploy new applications that easily use data and services regardless of where they reside or in what form.

Job Summary

The Cloud Insights team is one of NetApp’s fastest-moving teams. We provide a SaaS offering that our customers use to manage cloud and on-prem infrastructure and applications. We are seeking Site Reliability Engineers to help us delight customers and to improve and grow our service.

As a Cloud Insights SRE, you’ll have the opportunity to work with modern cloud and container orchestration technologies in a production setting. You’ll have broad responsibilities and the chance to develop your skills in many different areas. You’ll maintain services by measuring and monitoring availability, latency and overall system health. You’ll play an important role in scaling systems sustainably through automation and evolving them by pushing for changes to improve reliability and velocity.

Responsibilities

  • Work with other SREs and developer teams to ensure maximum performance and reliability of services
  • Work and consult with development teams on new features and software architecture
  • Develop software, both within our solution and outside it, for deployment automation, packaging, and monitoring visibility
  • Analyze and improve latency, performance, and availability of services
  • Resolve critical and high-visibility customer issues
  • Work with other SREs on compliance process automation

Qualifications

  • Strong understanding of cloud-based infrastructure and networking with regard to performance and scale
  • Strong experience in Linux system administration
  • Scripting and automation with Python or Bash
  • Experience with at least one major cloud provider (AWS preferred)
  • Experience with containerization and container orchestration tools such as Kubernetes
  • Experience operating a cloud service infrastructure, including:
    • Scaling and high availability patterns
    • Issue troubleshooting and resolution
    • Software deployment and CI/CD pipelines
    • Monitoring
  • Experience with infrastructure-as-code tools such as Terraform and SaltStack
  • Understanding of microservices architecture and REST interfaces
  • Experience with SaaS compliance certifications such as SOC 2 or FedRAMP is a plus