Senior Site Reliability Engineer - Observability (x/f/m)
Doctolib
Berlin · 1 day ago
Engineering
DevOps
Senior
Your Impact
We are looking for a Senior Site Reliability Engineer to join the Core Reliability & Observability team in Platform Engineering.
Your mission will be to shape Doctolib's observability strategy and ensure our platform remains reliable, debuggable, and scalable at European scale. You will work in a feature team developing logging, metrics, tracing, and alerting capabilities, directly supporting 400,000 health professionals and 80 million patients in their daily healthcare journey.
Working in the tech team at Doctolib means building innovative products and features to improve the daily lives of care teams and patients.
What you'll do
Your responsibilities include but are not limited to:
- Lead the observability strategy across the platform, with an emphasis on building scalable, developer-friendly logging and tracing capabilities
- Identify and lead large-scale cross-cutting reliability initiatives, including improvements to our incident detection, response, and postmortem analysis capabilities
- Take part in the on-call rotation, and actively contribute to improving our on-call experience by refining alerting, reducing noise, and ensuring actionable telemetry
Who you are
Before you read on: if you don't have the exact profile described below, but you feel this job description matches your skill set, we still encourage you to apply.
You'll be a great fit if you:
- Have solid hands-on experience (3+ years) with a large-scale production platform
- Have proven experience with cloud platforms such as AWS, Azure or Google Cloud
- Have a solid understanding of containerization and orchestration technologies (Docker and Kubernetes)
- Have a strong understanding of Helm for managing Kubernetes manifests and ArgoCD for GitOps workflows
- Have deep expertise in observability tooling and architecture, such as:
  - Logging: Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Logstash, Vector
  - Tracing: OpenTelemetry or proprietary APMs
  - Metrics: Prometheus, Thanos, Datadog, or equivalent
- Have proficiency in at least one programming language (Ruby, Python, Go, Java, etc.) and a deep understanding of infrastructure as code principles
- Have experience with monitoring and observability tools
- Like troubleshooting performance issues in complex environments
- Are fluent in English
It would be fantastic if you:
- Have experience contributing to open-source observability projects
- Have worked in a high-growth tech environment
- Are passionate about developer experience and platform engineering
Life at Doctolib Tech
- Our solutions are built on a single fully cloud-native platform that supports web and mobile app interfaces, multiple languages, and is adapted to country and healthcare specialty requirements.
- Our stack is composed of Rails, TypeScript, Java, Python, Kotlin, Swift, and React Native.
- We leverage AI ethically across our products to empower patients and health professionals. Discover our AI vision here.
Want to learn more about our tech culture and environment? Visit the Doctolib Tech site.
What we offer
- A Deutschlandticket (Germany-wide public transport pass) fully paid for by Doctolib
- 28 vacation days + 1 additional day for each full calendar year of employment (up to a maximum of 30 days)
- Work from abroad for up to 10 days per year thanks to our flexibility days policy
- Company health insurance with great supplementary benefits through our partner Allianz
- Company pension scheme (bAV) through Allianz with an employer subsidy of 40% (15% within the probationary period)
- The Doctolib Parent Care program, which includes one additional month of parental leave and much more
- Enrollment in Doctolib's long-term employee value sharing plan called DoctoGrowth
- Free mental health and coaching services through our partner Moka.care
- Subsidized sports membership through our partner Urban Sports Club
- A flexible workplace policy offering both hybrid and office-based modes
- Alongside healthy snacks and our regular breakfast buffet, we provide a subsidized meal benefit
- For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
- Relocation support in case of international mobility
- Access to the best AI tools for coding and development, plus dedicated training
Our interview process
- Recruiter Interview
- Technical SRE Interview
- System Design Interview
- Behavioral Interview
- At least one reference check
We want your experience to be clear, respectful, and transparent. Learn more about our hiring process on our candidate experience page.
Job details
- Permanent position
- Tech stack: Kubernetes, Prometheus, OpenTelemetry, Loki, ArgoCD, Ruby, Python, Go
- Full-time
- Berlin, Germany
- Hybrid work setup (up to 2 remote days per week)
- Start date: as soon as possible
We welcome everyone
At Doctolib, we are committed to improving access to healthcare for everyone. This translates into our recruitment process. We evaluate candidates based solely on qualifications and motivation, without any form of discrimination.
The more diverse ideas are heard, the more our product will truly improve healthcare for all. You are welcome to apply to Doctolib, regardless of your gender, religion, age, sexual orientation, ethnicity, or disability.
To ensure equal opportunities, we invite you to exclude personal information (e.g., pictures, age) from your application. If you require any accommodation during the hiring process, please let us know.
Join us in building the healthcare we all dream of!
Your data privacy
All information provided is processed by Doctolib for application management. For data processing details, click here: Germany | France | Italy | Netherlands. Please contact hr.dataprivacy(at)doctolib.com for inquiries or to exercise your rights.