What part will you play? If you’re looking for a place where you can make a meaningful difference, you’ve found it.
The work we do at Markel gives people the confidence to move forward and seize opportunities, and you’ll find your fit amongst our global community of optimists and problem-solvers. We’re always pushing each other to go further because we believe that when we realize our potential, we can help others reach theirs.
Join us and play your part in something special!
About the Role
We are seeking a senior technology leader to build and lead our Site Reliability Engineering (SRE), Disaster Recovery (DR), and FinOps capabilities across Cloud and Data Center environments.
This role owns the reliability, availability, scalability, resilience, and cost efficiency of all US and Bermuda infrastructure and applications. The successful candidate will establish modern SRE practices, ensure disaster recovery readiness, and drive financial discipline across the platform—partnering closely with Infrastructure, Application, Architecture, Security, and Finance teams. This is a strategic, engineering-led leadership role, not a traditional operations position.
Key Responsibilities
Site Reliability Engineering (SRE)
- Establish and lead an enterprise SRE function across cloud and data center platforms
- Define and operationalize SLOs, SLIs, and error budgets for critical services
- Partner with engineering and application teams to embed reliability into system design and delivery
- Continuously improve service availability, performance, and operational predictability
Observability, Monitoring & Metrics
- Own the enterprise observability strategy (monitoring, logging, tracing, alerting)
- Standardize tools and practices across infrastructure and applications
- Ensure alerts are actionable and aligned to customer and business impact
- Provide executive-level dashboards on platform health, reliability, and risk
Disaster Recovery & Resilience
- Define and own the enterprise disaster recovery (DR) strategy, business continuity planning (BCP), and execution
- Establish RTO/RPO standards aligned to business criticality
- Ensure DR plans are architected, automated, tested, and audit-ready
- Lead regular DR testing, failover exercises, and resilience reviews
- Partner with Architecture and Security teams to balance resilience, risk, and cost
Reliability Security & Resilience Engineering
- Design and implement security controls that are highly available, scalable, and fault tolerant.
- Collaborate with the Security team to identify and remediate security-related single points of failure (e.g., IAM, secrets management, certificate lifecycles).
- Work with the Security team to ensure critical security services (IAM, PKI, WAF, secrets, logging) meet defined SLOs and recovery objectives.
- Partner with Enterprise Architecture to embed Zero Trust and defense-in-depth into resilient system designs.
Reliability Analytics & Data-Driven SRE
- Design and maintain reliability telemetry pipelines across metrics, logs, traces, and events.
- Build and maintain SLO, error budget, and reliability health dashboards aligned to business services.
- Develop and apply predictive analytics and AIOps techniques (anomaly detection, capacity forecasting, alert noise reduction).
Engineering & Platform Reliability
- Own the reliability architecture of cloud platforms across Azure and hybrid environments, ensuring availability, scalability, security, and cost efficiency are designed in by default.
- Partner with Cloud Engineering, Architecture, and Application teams to define standard platform patterns for high availability, resiliency, fault tolerance, and multi‑region design.
- Establish and enforce cloud reliability standards, including:
  - Resilience patterns (active/active, active/passive, graceful degradation)
  - Capacity planning and scalability strategies
  - Platform guardrails for reliability, security, and cost controls
- Drive Infrastructure as Code (IaC) and platform automation to reduce configuration drift, manual intervention, and operational risk.
Chaos Engineering & Resilience Testing
- Establish and lead an enterprise Chaos Engineering program to proactively test system resilience and validate failure assumptions across cloud and application platforms.
- Define chaos engineering strategy, principles, and guardrails aligned with business criticality, SLOs, and risk tolerance.
- Partner with SRE, Cloud Engineering, and Application teams to:
  - Design and execute controlled failure experiments (infrastructure, network, dependency, and application-level faults)
  - Validate system behavior against defined SLOs, SLIs, and error budgets
- Integrate chaos testing into continuous delivery pipelines, game days, and resilience readiness exercises where appropriate.
Qualifications
Required
- 15+ years of experience in infrastructure, platform engineering, SRE, or operations leadership
- Proven experience building or leading an enterprise-scale SRE or reliability function
- Strong knowledge of cloud platforms (AWS, Azure, and/or GCP) and data center environments
- Hands-on experience with observability, monitoring, and automation technologies
- Deep understanding of disaster recovery architectures and resilience strategies
- Experience implementing FinOps practices, cost governance, and optimization
- Strong leadership, communication, and stakeholder management skills
Preferred
- Experience in large, complex, or regulated environments
- Background in hybrid or multi-cloud platforms
- Familiarity with DevSecOps and platform engineering models
US Work Authorization
US Work Authorization required. Markel does not provide visa sponsorship for this position, now or in the future.
Pay information:
The base salary offered for the successful candidate will be based on compensable factors such as job-relevant education, job-relevant experience, training, demonstrated competencies, geographic location, and other factors. The base salary range for the Senior Director position is $188k - $259k/year with a 55% bonus potential.
Markel Group (NYSE – MKL), a Fortune 500 company with over 60 offices in 20+ countries, is a holding company for insurance, reinsurance, specialist advisory, and investment operations around the world.
We’re all about people | We win together | We strive for better
We enjoy the everyday | We think further
In keeping with the values of the Markel Style, we strive to support our employees in living their lives to the fullest at home and at work.
All full-time employees have the option to select from multiple health, dental and vision insurance plan options and optional life, disability, and AD&D insurance.
We also offer a 401(k) with employer match contributions, an Employee Stock Purchase Plan, PTO, corporate holidays and floating holidays, and parental leave.
Are you ready to play your part?
Choose ‘Apply Now’ to fill out our short application, so that we can find out more about you.
Caution: Employment scams
Markel is aware of employment-related scams where scammers will impersonate recruiters by sending fake job offers to those actively seeking employment in order to steal personal information. Frequently, the scammer will reach out to individuals who have posted their resume online. These "job offers" include convincing offer letters and frequently ask for confidential personal information. Therefore, for your safety, please note that:
Markel is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of any protected characteristic. This includes race; color; sex; religion; creed; national origin or place of birth; ancestry; age; disability; affectional or sexual orientation; gender expression or identity; genetic information, sickle cell trait, or atypical hereditary cellular or blood trait; refusal to submit to genetic tests or make genetic test results available; medical condition; citizenship status; pregnancy, childbirth, or related medical conditions; marital status, civil union status, domestic partnership status, familial status, or family responsibilities; military or veteran status, including unfavorable discharge from military service; personal appearance, height, or weight; matriculation or political affiliation; expunged juvenile records; arrest and court records where prohibited by applicable law; status as a victim of domestic or sexual violence; public assistance status; order of protection status; status as a smoker or nonsmoker; membership or activity in local commissions; the use or nonuse of lawful products off employer premises during non-work hours; declining to attend meetings or participate in communications about religious or political matters; or any other classification protected by applicable law.
Should you require any accommodation through the application process, please send an e-mail to rarecruiting@markel.com.
No agencies please.