Job description
Technical Skills: Must‑Have
Observability & Reliability Engineering
· Strong hands‑on experience across core observability pillars including metrics, traces, service health and distributed systems visibility
· Practical experience implementing OpenTelemetry across application, platform and infrastructure layers
· Ability to design, deploy and operate end‑to‑end observability pipelines (collector‑to‑backend, agent management, data flows, routing and filtering)
· Strong understanding of SLI/SLO frameworks, error budgets and reliability‑focused operating models
· Experience defining alerting strategy, tuning thresholds and reducing operational noise through effective signal engineering
Observability Platforms & Tooling
· Hands‑on expertise in one or more enterprise‑grade observability platforms (Dynatrace, Splunk Observability, Datadog or equivalent)
· Proficiency with the Prometheus ecosystem, including Alertmanager
· Experience designing clear, insightful dashboards and visualisations using Grafana
· Strong troubleshooting capability using metrics, traces and dependency insights to diagnose performance and availability issues
Cloud & Platform Monitoring
· Strong technical experience with at least one major public cloud (AWS, Azure or GCP)
· Understanding of monitoring fundamentals for cloud‑native services, including compute, storage, networking, load balancers and managed services
· Solid understanding of cloud networking constructs (VPC/VNet, subnets, routing, NAT, firewalls and security groups)
Containers & Kubernetes
· Working knowledge of Kubernetes objects (pods, services, deployments) and operational lifecycle
· Experience monitoring containerised and application‑modernisation workloads
· Basic experience with Helm or Kustomize for packaging, configuration and deployment
· Ability to troubleshoot application behaviour and platform-level issues within container environments
Programming & Automation
· Proficiency in one or more languages (Python, Go, Java) to support automation and tooling
· Experience writing automation scripts and utilities supporting observability and SRE practices
· Awareness of how to integrate observability checks into CI/CD pipelines
· Comfort with shell scripting for diagnostics and operational tasks
Data & Analytics
· Strong understanding of time‑series data and telemetry characteristics
· Hands‑on experience with PromQL, SignalFlow or equivalent query languages, and with metric exploration tools such as Metrics Explorer
· Ability to analyse latency percentiles (p95/p99), error rates and throughput metrics
· Working knowledge of SQL for querying telemetry backends or data stores
Extra information
- Status: Open
- Education Level: Secondary School
- Location: Newbury
- Type of Contract: Full-time
- Published at: 03-05-2026
- Profession type: ICT
- Full UK/EU driving licence preferred: No
- Car preferred: No
- Must be eligible to work in the EU: No
- Cover letter required: No
- Languages: English