
Senior Site Reliability Engineer

by Admin on 09-12-2022 at 2:41 pm


Job Description

Process

Requires a mix of strategic engineering and design along with hands-on technical work and problem-solving skills. Passion for quality and automation, an ability to understand complex systems, and a desire for continual improvement and innovation. Explore and evaluate new technologies and solutions to push capabilities forward and get ahead of customers’ needs. Able to communicate concepts at different levels of abstraction and to exercise influence across multiple levels of the organization. Share results from incident investigations with a wide IT audience through a blameless postmortem process, with the goal of exposing faults so they are fixed rather than left unresolved. Ability to execute a change across an enterprise environment with consistency and reliability by applying modern software, operations, and quality principles such as progressive rollouts, problem detection, and rollbacks if needed.

Development

Experienced developer in one or more modern high-level languages such as JavaScript, C#, or Java, able to contribute at least 50% of their time to development of the application. Ability to effectively apply modern object-oriented software design patterns. Proficient in one or more scripting languages (e.g. Perl, Python, PowerShell, Bash). Knowledge of and experience with Continuous Integration tools such as Jenkins or TeamCity to design and create reliable releases of the software.

Operations

Work closely with software development engineers, systems engineers, network engineers, database administrators, the monitoring team, and the information security team in supporting new features, services, and releases.

Deep understanding of, and ability to debug, standard networking protocols and components such as HTTP, DNS, TCP/IP, ICMP, and load balancing. Experience with infrastructure provisioning on public, private, and hybrid clouds using state-of-the-art tools such as Terraform, vRealize, Cloud Foundry, or CloudFormation. Experience with configuration management tools such as Puppet, Chef, or Ansible.

Effectively use metrics, monitoring, and instrumentation of the application and infrastructure to:

  • proactively discover problems before users notice
  • achieve optimal application performance, stability and availability
  • determine optimal configurations for application software and application servers
  • scale infrastructure to meet demand

Experience with, and desire to influence, emerging operations techniques including, but not limited to: delivery and deployment through containers, Docker Swarm, or Kubernetes; auto-remediation to automatically resolve incidents; and applications that test the resiliency of systems (e.g. Chaos Monkey).


You must possess the below minimum qualifications to be initially considered for this position. Preferred qualifications are in addition to the minimum requirements and are considered a plus factor in identifying top candidates.

Minimum Qualifications

  • Bachelor’s degree in a technical discipline (Computer Science, Computer Engineering, Electrical Engineering, Mathematics, Physics, or a related technical discipline) with 6+ years of experience
  • 6+ years of experience with remote deployment and administration of Linux servers
  • Hands-on experience configuring monitoring tools such as Prometheus, Grafana, and Zabbix
  • Experience running a highly visible, 24×7 mission-critical service using SRE and DevOps practices

Preferred Qualifications

  • Experience working in a global team
  • Master’s degree in Computer Science, Computer Engineering, or a STEM field
  • Experience running serverless infrastructure
  • Experience with Git and parallel development, including branching strategies and CI/CD methodologies
  • Interesting personal projects or contributions to open-source projects

Apply for job

To view the job application please visit jobs.intel.com.
