Staff AI Engineer (Orchestration)
Who are Heidi?
Heidi is building an AI Care Partner that supports clinicians every step of the way, from documentation to delivery of care.
We exist to double healthcare's capacity while keeping care deeply human. In 18 months, Heidi has returned more than 18 million hours to clinicians and supported over 73 million patient visits.
Today, more than two million patient visits each week are powered by Heidi across 116 countries and over 110 languages.
Founded by clinicians, Heidi brings together clinicians, engineers, designers, scientists, creatives, and mathematicians, working with a shared purpose: to strengthen the human connection at the heart of healthcare.
Backed by nearly $100 million in total funding, Heidi is expanding across the USA, UK, Canada, and Europe, partnering with major health systems including the NHS, Beth Israel Lahey Health, MaineGeneral, and Monash Health, among others.
We move quickly where it matters and stay grounded in what's proven, shaping healthcare's next era.
Ready for the challenge?
The Role
You will operate as a Staff+ AI scientist/engineer on the Orchestration team. You will own the design and delivery of a clinician-grade retrieval and question-answering stack across data ingestion, indexing, ranking, grounding, and safe deployment.
You will set technical direction, establish quality bars, and lead cross-functional execution with engineering, product, and clinical experts.
You will move between research and production, turning prototypes into reliable services with clear SLAs, traceable outputs, and unit/acceptance metrics that matter in clinical contexts.
What you'll do:
- Define the end-to-end architecture for literature and guideline ingestion, normalization, metadata extraction, de-duplication, and versioning.
- Build hybrid search and retrieval: lexical + vector + re-ranking, with tight latency budgets and cost controls.
- Design grounding and answer synthesis that cite sources, preserve provenance, and expose confidence and abstention. 
- Lead model work across prompting, fine-tuning, distillation, and tool use to improve faithfulness, coverage, and utility.
- Stand up gold-standard evaluation: offline IR metrics (nDCG, MAP, recall), factuality/faithfulness audits, and human review with adjudication.
- Run online experiments at scale. Define guardrails and KPIs, and ship A/B tests to measure impact on clinician workflows.
- Productionize services with observability, tracing, canaries, rollbacks, and incident playbooks. 
- Set data governance for medical content: access control, PHI handling, audit logs, and retention policies. 
- Partner with clinicians to define intents, schemas, and acceptance criteria. Convert ambiguous questions into testable specs. 
- Coach engineers and scientists. Raise the technical bar through design docs, reviews, and reusable components. 
What we will look for:
- Staff-level track record shipping search, NLP, or LLM systems that serve real users at scale.
- Mastery of Python and SQL. Strong software engineering fundamentals, testing strategy, and API/service design. 
- Depth in modern IR/NLP: embeddings, ANN indexes, re-rankers, retrieval-augmented generation, and prompt/program synthesis.
- Experience building data pipelines: parsing PDFs/HTML, OCR when needed, metadata extraction, and content hashing/versioning. 
- Familiarity with PyTorch, plus distributed training/inference patterns. 
- MLOps and reliability: containers, Kubernetes, feature/model registries, experiment tracking, monitoring, and alerting. 
- Evidence of rigorous evaluation design: offline metrics, human-in-the-loop judging, power analysis for online tests.
- Clear thinking on safety: hallucination controls, calibration, abstention, red-teaming, and privacy/security by design.
- Ability to lead cross-functional initiatives and make crisp decisions with incomplete information.
Bonus:
- Search relevance expertise for long-form, citation-heavy domains.
- Knowledge of biomedical ontologies and standards (e.g., SNOMED CT, UMLS, ICD, RxNorm, FHIR). 
- Prior work with literature and guideline corpora, de-duplication, and document lineage tracking.
- Experience with hybrid retrieval stacks (e.g., BM25 + ANN) and learned re-rankers.
- Familiarity with clinical evaluation methods, EBM hierarchies, and annotation workflows. 
- Strong cost-performance tuning for LLM inference, caching, and batching in production.
What do we believe in?
- We create unconventional solutions to difficult problems and we build them fast. We want you to set impossible goals and make them happen. Think landing a rocket, but the medical version.
- You'll be surrounded by a world-class team of engineers, medicos and designers to do your best work, inspired by our shared beliefs:
  - We will stop at nothing to improve patient care across the world.
  - We design user experiences for joy and ship them fast.
  - We make decisions in a flat hierarchy that prioritizes the truth over rank.
  - We provide the resources for people to succeed and give them the freedom to do it.
 
Why will you flourish with us?
- Flexible hybrid working environment, with 3 days in the office. 
- Additional paid day off for your birthday, plus wellness days.
- Special corporate rates at Anytime Fitness in Melbourne (Sydney TBC).
- A generous personal development budget of $500 per annum.
- Learn from some of the best engineers and creatives, joining a diverse team.
- Become an owner with shares (equity) in the company: if Heidi wins, we all win.
- The rare chance to create a global impact as you immerse yourself in one of Australia's leading healthtech startups.
- If you have an impact quickly, the opportunity to fast-track your startup career!
Help us reimagine primary care and change the face of healthcare in Australia and then around the world.