Senior SRE Engineer

by Admin on 05-05-2023 at 2:03 pm

Full Time
Hyderabad, India
Posted 2 years ago
Applications have closed

Company: Synopsys

Job Description and Requirements

We are seeking a highly skilled professional to join our team. The successful candidate will be responsible for designing, implementing, and maintaining the observability platform that monitors the health of our production systems. The candidate should have a proven background in software development, system administration, and monitoring tools, as well as a passion for building scalable and reliable systems.

Key Responsibilities:

Design and implement the SRE & Observability platform to monitor the status and health of our production systems, providing a holistic view of the environment.
Partner with other teams to ensure that monitoring tools are effectively integrated with other systems and processes.
Ensure that the SRE & Observability platform is scalable, reliable, and can handle large volumes of data.
Implement SRE best practices for the team and identify KPIs for various systems, organizations, and stakeholders.
Automate the deployment and configuration of monitoring tools to reduce human error and increase efficiency.
Develop custom scripts and tools to extend the functionality of the monitoring platform, including, but not limited to, proactive remediation and self-healing.
Perform root cause analysis on incidents, prepare detailed reports for stakeholders, and develop solutions to prevent similar incidents from recurring.
Optimize and refine the SDLC and the on-call and escalation processes.
Create documentation for all systems, tools, and processes built by the team, and document lessons learned from incidents and escalations.
Provide guidance and mentorship to junior members of the team.
Drive the design and implementation of major SRE initiatives.
Act as a subject-matter expert (SME) on SRE & Observability, providing guidance to other teams across the organization.
Continuously evaluate and implement new tools and technologies to improve the SRE platform.

Qualifications:

Excellent programming skills and experience in Python.
Experience with data tools such as Elasticsearch is a must; experience with other technologies, such as Prometheus and Grafana, is a plus.
Good knowledge of Linux, networking, and NFS technologies.
Expertise in Cloud computing platforms such as AWS, GCP, or Azure.
Familiarity with containerization technologies such as Docker and Kubernetes.
Experience with and knowledge of the SDLC, including source code management tools, CI/CD pipelines, and end-to-end testing.
Excellent problem-solving skills and attention to detail.
Ability to work collaboratively with other teams and stakeholders.
Excellent communication skills, both verbal and written.
Ability to lead and mentor junior members of the team.
Experience leading major SRE & Observability initiatives.
Exceptional knowledge of SRE & Observability best practices and trends.
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
10+ years of experience in software development, system administration, or a related field.

If you meet the above qualifications and are passionate about building scalable and reliable systems, we encourage you to apply for this exciting opportunity.
