Clinical quality

Our clinical team continues to test virtual consultations regularly across a variety of specialities. Please support this work by leaving feedback on your consultations. Headline figures and per-clinical-area breakdowns are published here and refreshed approximately monthly.

Why this page exists

Healthcare AI shouldn't be deployed without external clinical scrutiny. HAL has been designed from the start with a clinician-review loop: each AI-driven consultation can be flagged and reviewed against a structured safety form covering inappropriate content, missing content, the extent and likelihood of harm if a clinician acted on it, bias, and alignment with established scientific consensus. The evaluation framework HAL uses mirrors the clinician-rated axes proposed by Singhal et al. (2023) for evaluating large language models in clinical settings.¹

The figures below summarise that review work in aggregate. They are anonymised and aggregated: no consultation transcripts, user identities or reviewer identities are published here.

Per-area safety distributions

Each panel shows the full distribution of clinician review answers for one safety axis. The top row of every panel is the “All areas” aggregate; rows below are individual clinical areas, ordered by review count. Validation-depth badges indicate how much clinical review HAL has had in each area so far.

Extensive: 50+ reviewed consultations · Good: 20–49 · Limited: 1–19
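The badge thresholds and row ordering described above can be sketched in a few lines of Python. This is an illustration only, not HAL's actual implementation: the function names, the example area names and counts, the descending sort order, the treatment of areas with zero reviews, and the choice to compute the "All areas" count as a simple sum are all assumptions made for the example.

```python
from typing import Optional


def validation_badge(reviewed: int) -> Optional[str]:
    """Map a reviewed-consultation count to a validation-depth badge."""
    if reviewed >= 50:
        return "Extensive"
    if reviewed >= 20:
        return "Good"
    if reviewed >= 1:
        return "Limited"
    # Assumption: an area with no reviews yet carries no badge.
    return None


def order_rows(counts: dict[str, int]) -> list[tuple[str, int, Optional[str]]]:
    """Build panel rows: the 'All areas' aggregate first, then each area.

    Assumptions: areas are sorted by review count, highest first, and the
    aggregate row (shown here without a badge) sums the per-area counts.
    """
    rows: list[tuple[str, int, Optional[str]]] = [
        ("All areas", sum(counts.values()), None)
    ]
    for area, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        rows.append((area, n, validation_badge(n)))
    return rows


# Hypothetical counts, purely for illustration.
example = {"Cardiology": 12, "Dermatology": 57, "Paediatrics": 20}
for label, n, badge in order_rows(example):
    print(label, n, badge or "")
```

With these made-up counts, the rows come out as "All areas" first, followed by Dermatology (Extensive), Paediatrics (Good) and Cardiology (Limited).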

Caveats

These figures are reviewer judgements from clinicians, aggregated and anonymised. They are not peer-reviewed evidence. HAL is a training tool — it is not intended for clinical decision support and should not be used with real patient data.

References

1. Singhal, K., Azizi, S., Tu, T., Mahdavi, S.S., Wei, J., Chung, H.W., Scales, N., Tanwani, A., Cole-Lewis, H., Pfohl, S., Payne, P., Seneviratne, M., Gamble, P., Kelly, C., Babiker, A., Schärli, N., Chowdhery, A., Mansfield, P., Agüera y Arcas, B., Webster, D., Corrado, G.S., Matias, Y., Chou, K., Gottweis, J., Liu, Y., Rajkomar, A., Barral, J., Semturs, C., Karthikesalingam, A. and Natarajan, V. (2023) ‘Large language models encode clinical knowledge’, Nature, 620(7972), pp. 172–180. Available at: https://doi.org/10.1038/s41586-023-06291-2 (Accessed: 20 June 2026).