Senior Staff Site Reliability Engineer, Cloud Observability
Company: Google
Location: New York City
Posted on: April 3, 2026
Job Description:
Minimum qualifications:
Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
8 years of experience with data structures and algorithms.
8 years of experience with software development in one or more programming languages (e.g., Go, C, C++, Python, Java).
5 years of experience leading projects and designing, analyzing, and troubleshooting distributed systems.
4 years of experience leading projects and providing technical leadership.

Preferred qualifications:
Master's degree in Computer Science or Engineering.

About the job
Site Reliability Engineering (SRE) combines software and systems
engineering to build and run large-scale, massively distributed,
fault-tolerant systems. SRE ensures that Google Cloud's services,
both our internally critical and our externally visible systems,
have reliability and uptime appropriate to customers' needs and a
fast rate of improvement. Additionally, SREs keep an ever-watchful
eye on our systems' capacity and performance. Much of our software
development focuses on optimizing existing systems, building
infrastructure, and eliminating work through automation. On the SRE
team, you’ll have the opportunity to manage the complex challenges
of scale that are unique to Google Cloud, while using your expertise
in coding, algorithms, complexity analysis, and large-scale system
design. SRE's culture of intellectual curiosity, problem solving,
and openness is key to its success.

Our
organization brings together people with a wide variety of
backgrounds, experiences and perspectives. We encourage them to
collaborate, think big and take risks in a blame-free environment.
We promote self-direction to work on meaningful projects, while we
also strive to create an environment that provides the support and
mentorship needed to learn and grow.

DevEx SRE is responsible for the Observability Infrastructure
(Monarch/Cloud Monitoring/Cloud Logging/Alerting) as well as Cloud
Developer infrastructure (Firebase, Artifact Registry, Cloud Build,
Gemini Code Assist). Cloud Lifecycle SRE (the product area) makes it
easier for customers to onboard onto the cloud, to use more cloud
services together, and to build and manage applications. It is part
of the Google Cloud Platform Reliability organization, which aims to
provide a reliable, high-performance, and secure platform exposing
APIs on behalf of all API producers at Google.

Behind everything
our users see online is the architecture built by the Technical
Infrastructure team to keep it running. From developing and
maintaining our data centers to building the next generation of
Google platforms, we make Google's product portfolio possible.
We're proud to be our engineers' engineers and love voiding
warranties by taking things apart so we can rebuild them. We keep
our networks up and running, ensuring our users have the best and
fastest experience possible.

The US base salary range for this full-time position is
$262,000-$365,000 + bonus + equity + benefits. Our
salary ranges are determined by role, level, and location. Within
the range, individual pay is determined by work location and
additional factors, including job-related skills, experience, and
relevant education or training. Your recruiter can share more about
the specific salary range for your preferred location during the
hiring process. Please note that the compensation details listed in
US role postings reflect the base salary only, and do not include
bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities
Participate in setting the strategic direction of the Observability and Cloud Developer Infrastructure SRE teams, and partner with the Producer Foundations/D&E dev partners to ensure SRE engages in the most critical and impactful programs.
Set clear technical direction, intent, and best practices within your team.
Provide expert advice and guidance to other infrastructure team staff and software developers, and mentor and grow team leaders.
Respond effectively to service failures to maintain the platform's conformance to its SLOs.