VP of Site Reliability Engineering
Smart Data Solutions, a leading provider of data management, claim routing, and workflow solutions to health plans and TPAs, is looking for a VP of Site Reliability Engineering to join our team!
The VP of Site Reliability Engineering (SRE) will lead the Support & Monitoring organization at Smart Data Solutions (SDS), ensuring the reliability, scalability, and performance of our healthcare automation platform. This executive role oversees Engineering Managers and their teams of Software Engineers, driving operational excellence, proactive monitoring, and customer-focused support strategies. The VP will set the vision for SRE, implement best practices, and foster a culture of resilience and continuous improvement.
What you’ll be doing
Strategic Leadership
- Define and execute the SRE strategy to ensure high availability, reliability, and performance of SDS’s SaaS platform.
- Establish organizational goals, KPIs, and SLAs for system uptime, incident response, and customer satisfaction.
- Drive alignment between SRE, Product, Engineering, and Customer Success teams.
Team Management
- Lead and mentor Engineering Managers overseeing Support & Monitoring teams.
- Build a high-performing SRE organization focused on proactive monitoring, automation, and rapid incident resolution.
- Support career development and succession planning for technical leaders and engineers.
Operational Excellence
- Oversee incident management processes, ensuring timely resolution and root cause analysis.
- Implement robust monitoring, alerting, and observability frameworks to detect and prevent issues before they impact customers.
- Drive automation initiatives to reduce manual intervention and improve system reliability.
Customer-Centric Support
- Serve as an escalation point for critical customer issues impacting system reliability.
- Partner with Customer Success and Account Management to ensure transparency and trust during incidents.
- Develop strategies to improve customer experience through proactive communication and reliability improvements.
Process Improvement & Governance
- Establish best practices for incident response, change management, and post-mortem reviews.
- Ensure compliance with healthcare security and regulatory standards (HIPAA, HITRUST).
- Continuously analyze reliability metrics to identify trends and implement improvements.
Innovation & Technology
- Evaluate and adopt emerging technologies for observability, automation, and resilience.
- Champion cloud-native architectures and scalability enhancements.
- Collaborate with Product and Engineering to influence design decisions for reliability.
Perform other duties as assigned. The duties set forth above are essential job functions for the role. Reasonable accommodations may be made to enable individuals with disabilities to perform essential job functions.
What we’re looking for
Required skills:
Education & Experience
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 10+ years of experience in software engineering, operations, or site reliability, with at least 5 years in senior leadership roles.
- Proven track record of building and scaling SRE or DevOps teams in a SaaS environment.
- Experience managing engineering managers and large distributed teams.
- Strong background in incident management, monitoring, and operational excellence.
Technical Expertise
- Deep knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architectures.
- Expertise in observability tools (Datadog, Prometheus, Grafana, Splunk) and proactive monitoring strategies.
- Strong understanding of automation frameworks and Infrastructure as Code (Terraform, Ansible).
- Familiarity with containerization and orchestration (Docker, Kubernetes).
- Proficiency in programming or scripting languages (Java preferred).
- Knowledge of healthcare compliance and security standards (HIPAA, HITRUST).
- Experience with CI/CD pipelines and reliability engineering best practices.
Soft Skills
- Strategic thinker with the ability to align technical decisions to business objectives.
- Exceptional leadership and team-building skills; proven ability to mentor and develop technical leaders.
- Strong communication and stakeholder management skills for cross-functional collaboration.
- Analytical mindset with a focus on data-driven decision-making.
- Ability to thrive in a fast-paced, high-growth environment and manage competing priorities.
Location:
This role is located in our Dallas, TX office.
Why this is the company for you
Top Benefits & Perks:
- A company culture that is authentic, innovative, and collaborative! Our most powerful strength is our people! We build impactful solutions for our customers - their success is our success!
- A professional development and growth-oriented workplace
- Generous benefits, including health insurance and short-term and long-term disability coverage
- 401(k) with a company match to provide a better future in your retirement years
- A flexible environment with a competitive paid time off package, including vacation, holidays, a give-back day, and a floating day
Who is Smart Data Solutions?
Smart Data Solutions (SDS) is a technology leader in healthcare process automation and interoperability. As a strategic partner, SDS helps clients digitally transform their operations, delivering tangible value through reduced costs, streamlined workflows, and an improved customer experience. With data, AI, and automation at its core, SDS provides solutions in Digital Mailroom and Data Capture, Clearinghouse, Intelligent Medical Records, Prior Authorization, and Claim Operations. Trusted by over 500 clients, including multiple Blue Cross Blue Shield plans, regional health plans, TPAs, providers, and healthcare partners, SDS streamlines complex front-, middle-, and back-office operations.
Smart Data Solutions is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, sex, sexual orientation, gender identity, religion, national origin, disability, veteran status, age, marital status, pregnancy, genetic information, or other legally protected status.