    Site Reliability Engineer

    deepset

    Germany Berlin
    10 days ago
    Engineering
    DevOps
    Mid-Level
    Remote

    TL;DR
    Site Reliability Engineer to own and evolve deepset's cloud and customer infrastructure end to end. You'll work across SaaS, private cloud, and on-prem environments to make our self-hosted platform production-ready, drive CI/CD and GitOps maturity, and reduce complexity at scale. Your work will directly shape how deepset's AI platform is built, deployed, and scaled, both for our own cloud and for customers running it in their own environments.

    Why deepset
    With Haystack, thousands of developers build advanced AI applications, while our Enterprise platform helps teams scale across use cases, users, and environments. We’re remote-first, flexible, and built on trust and ownership. You’ll work alongside strong technical talent, take on meaningful challenges, and help turn complex AI into solutions that are practical, reliable, and ready for the real world.

    What you will do
    • Build and operate real-world infrastructure: Design, configure, and evolve infrastructure that runs both in our cloud and inside customer environments (SaaS, private cloud, on-prem).
    • Make self-hosted production-ready: Help us deliver a production-grade, self-hosted platform that can be deployed on any Kubernetes setup in weeks, not months.
    • Drive automation & platform maturity: Improve CI/CD pipelines, GitHub workflows, and GitOps setups so teams can ship faster with confidence.
    • Reduce complexity and cost: Continuously simplify systems and optimize infrastructure spend without compromising performance or reliability.
    • Shape how we build: Champion best practices in reliability, scalability, and security across the organization, not as rules but as working systems.
    Requirements
  1. 2-5 years of experience working with large-scale production infrastructure
  2. Fluent German language skills
  3. Experience with distributed or service-oriented architectures
  4. Hands-on expertise with:
    • AWS
    • Kubernetes
    • CI/CD and GitOps (e.g. ArgoCD)
  5. Working knowledge of Infrastructure as Code (Terraform preferred)
  6. Solid troubleshooting skills - you can debug across systems, not just within one layer
  7. A pragmatic mindset: you balance speed, simplicity, and reliability
  8. Ownership and accountability - you take responsibility for systems end-to-end
  9. Ability to work independently while staying aligned with the team’s goals
    Nice to have
  1. Familiarity with observability stacks (e.g. Datadog, Prometheus)
  2. Experience optimizing cloud costs at scale
  3. Interest or experience in Machine Learning / LLM systems
  4. Experience improving developer experience and platform tooling using AI agents
  5. Contributions to SRE practices like postmortems, SLIs/SLOs, and reliability engineering culture
    Benefits
  1. Remote-first setup with flexible hours & tech of your choice
  2. 30 days vacation + extra days for family sick leave
  3. Competitive salary & stock options for every team member
  4. Monthly sports & mental health support allowance with Oliva
  5. Annual learning & development budget
  6. Monthly team socials & in-person meetups
  7. Dog-friendly Berlin HQ