Observability Engineer
As an Observability Engineer, you will build and evolve our modern observability platform, ensuring our systems stay healthy and performant for millions of users. We’re moving from simple monitoring to an observability-first mindset, and you’ll be at the heart of this shift. You’ll design solutions that give us deep insight into system health, helping us reduce MTTD and MTTR. You’ll work with a comprehensive toolkit to provide analytics, alerting and remediation strategies for our cloud and on-premise applications. This role is about more than just keeping the lights on; it’s about building a platform that lets us truly understand our systems. You’ll set the standards for observability, ensuring it’s baked into every new system we build. This role is eligible for inclusion in the Company’s hybrid working from home policy.
- Building sophisticated monitoring dashboards using log data, metrics, traces and profiles from sources like New Relic, Grafana, Splunk, Kibana and Pyroscope.
- Administering an incident response platform, such as PagerDuty, to enable fast and efficient resolution of incidents.
- Working with service owners on integrations while supporting the onboarding of telemetry data.
- Using automation and orchestration platforms to streamline manual processes and workflows.
- Promoting an observability-first mindset and encouraging best practices across teams.
- Contributing to the development of standards for monitoring, logging and tracing.
- Evolving team processes and approaches.
- Mentoring colleagues in the use of new technologies or practices.
- Maintaining and administering existing monitoring and analytics tools.
- Collaborating across teams to solve complex challenges and prevent recurrence.
- Excellent knowledge of contemporary monitoring and analytics tooling and best practices (required).
- Strong experience integrating systems and applications with monitoring and APM tools (required).
- Demonstrable experience instrumenting applications for observability, ideally with OpenTelemetry (required).
- Experience with IaC, automation and orchestration tools such as Ansible and Terraform (required).
- Basic programming experience, ideally with Python, Golang or JavaScript (required).
- Basic scripting ability with PowerShell and Bash (required).
- Strong experience working in a large-scale, 24/7 enterprise environment where system uptime is paramount (required).
- Experience with public and private cloud platforms (required).
- Proficiency with the Linux operating system (required).
- Ability to work with autonomy and collaborate well within a wider team (required).
bet365 is one of the world's leading online gambling companies, founded in 2000 by Denise Coates CBE. The company employs over 9,000 people and serves more than 100 million customers in 27 languages, with a market-leading position built on its In-Play betting product. It offers betting across 96 sports and hundreds of thousands of streaming events, handling millions of requests and bets at peak times. Headquartered in Stoke-on-Trent, England, bet365 is known for its software innovation and continues to develop its online betting and gaming platform.
