
Principal Site Reliability Engineer
- Barcelona
- Permanent
- Full-time
- Assess and optimize current infrastructure by analyzing and documenting existing pipelines and workflows to identify areas for efficiency, scalability, and resilience improvements.
- Design and implement scalable, cloud-ready solutions using Infrastructure as Code (IaC) principles, and manage the deployment of microservices in K8s environments.
- Manage and maintain Kubernetes environments in production, ensuring high availability, reliability, and optimized performance.
- Lead complex troubleshooting efforts, diagnosing and resolving production issues related to Kubernetes, cloud environments, networking, and distributed systems.
- Act as a technical advisor, collaborating with cross-functional teams to ensure smooth deployment and operation of systems.
- Drive innovation by researching, evaluating, and implementing continuous improvement initiatives to optimize system performance, reliability, and security.
- Provide mentorship and guidance to other team members, fostering knowledge sharing and upskilling in DevOps, cloud technologies, and infrastructure best practices.
- Opportunity to work on cutting-edge technologies and complex challenges.
- A collaborative atmosphere, working alongside some of the most engaged and knowledgeable professionals in the industry.
- Advance your professional skills and technical expertise through individual competence development plans and tailored training.
- Be part of a growing, world-renowned organization with origins dating back to 1864.
- Hybrid working model
- Medical Scheme
- Commuting Allowance
- Life Insurance
- Pension Plan
- Kindergarten Allowance
- 40 hours per week with a flexible schedule
- Home working allowance (up to 2 days per week)
- Tax-friendly flexible scheme
- 23 days of annual leave
- Employee Referral Scheme
- Master's degree in Computer Science, Information Technology, or a similar discipline; or a bachelor's degree with equivalent relevant experience.
- 10+ years of experience in Site Reliability Engineering, Platform Engineering, DevOps, Cloud Infrastructure, or related roles.
- Expert-level knowledge of Kubernetes (K8s): managing, scaling, and troubleshooting clusters in production.
- Strong experience with Infrastructure as Code (IaC) using tools such as Terraform, Helm, Pulumi, or CloudFormation.
- Proficiency in managing and optimizing databases (e.g., ClickHouse, MongoDB) and message brokers (e.g., RabbitMQ).
- Familiarity with monitoring and observability tools (e.g., Prometheus, Grafana, ELK Stack).
- Strong scripting and automation skills (e.g., Bash, Python, Go).
- Experience with cloud platforms (e.g., AWS, Azure, GCP) and hybrid cloud environments.
- Solid understanding of CI/CD pipelines and tools (e.g., Jenkins, GitLab CI, ArgoCD) to automate deployments and infrastructure changes.
- Fluency in written and spoken English.
- Certifications in Kubernetes (e.g., CKA, CKAD) or cloud platforms (e.g., AWS Certified Solutions Architect, Azure DevOps Engineer).
- Experience with multi-cluster Kubernetes environments.
- Knowledge of security best practices for cloud and containerized environments.
- Hands-on experience with storage solutions (e.g., Ceph, S3-compatible storage).
- Familiarity with renewable energy or IoT systems.