SRE – Site Reliability Engineer, Senior

Website Synopsys
What You’ll Be Doing
- Ensure that underlying systems work as expected so that deployed services are reliable.
- Serve as an SME in observability and resolution capability.
- Conceive, design, build and run services (observable and self-resolvable) that improve the overall reliability of deployed platforms.
- Promote a culture of continuous improvement.
- Identify, craft, and maintain SLIs and SLOs for teams, as well as metrics such as MTTR and error budgets.
- Suggest architecture improvements and recommend process improvements.
- Evaluate new technology options.
What You Bring To The Team
- 7+ years of experience as an SRE, DevOps or Systems engineer
- Knowledge of and exposure to Linux operating systems and networking
- Knowledge of databases – DBA, SQL, PL/SQL, NoSQL
- Hands-on experience with tools like Git, Ansible, Helm, Robin bundles, Docker, Docker Registry, Kubernetes (open source and Robin platform) and Terraform
- Proficiency in shell scripting and Python programming (data structures, algorithms, OOP concepts and design patterns). PowerShell programming is a plus.
- Expertise in bare metal, cluster and cloud deployment practices
- Experience in Cloud Infrastructure Automation – Azure (preferred), GCP, AWS or Alibaba.
- Experience with cloud operations tools:
  - Monitoring tools – Nagios, Beats, Elastic Stack, Prometheus
  - ITSM tools – ServiceNow (preferred) or Remedy
- Good knowledge of and working experience with ITIL and Agile processes
- Critical problem-solving skills
- Cross-team partnership and program management skills
- Excellent command of spoken and written English, and native Korean
To view the job application, please visit sjobs.brassring.com.