AI Assisted Detection of Fractures on X-Rays (FRACT-AI)
Study Details
Study Description
Brief Summary
This study has been added as a sub-study to the Simulation Training for Emergency Department Imaging 2 study (ClinicalTrials.gov ID NCT05427838). It aims to evaluate the impact of an artificial intelligence (AI)-enhanced algorithm called Boneview on the diagnostic accuracy of clinicians in detecting fractures on plain X-rays (XR). The study will create a dataset of 500 plain X-rays involving standard images of all bones other than the skull and cervical spine, with 50% normal cases and 50% containing fractures. A senior radiologist panel will establish a reference 'ground truth' for each image, confirming the presence or absence of a fracture. The Gleamer Boneview algorithm will then be run on this dataset to identify fractures, and its performance will be compared against the reference standard. The study will then undertake a Multiple-Reader Multiple-Case study in which clinicians interpret all images without AI and subsequently with access to the output of the AI algorithm. Eighteen clinicians will be recruited as readers, three from each of six distinct clinical groups (Emergency Medicine, Trauma and Orthopaedic Surgery, Emergency Nurse Practitioners, Physiotherapy, Radiology and Radiographers), with one reader at each of three levels of seniority in each group. Changes in reporting accuracy (sensitivity, specificity), confidence, and speed of readers between the two sessions will be compared. The results will be analyzed in a pooled analysis for all readers as well as for the following subgroups: clinical role, level of seniority, pathological finding, and difficulty of image. The study will demonstrate the performance of AI interpretation alone as compared with interpretation by clinicians, and as compared with clinicians using the AI as an adjunct to their interpretation. The readers will represent a range of professional backgrounds and levels of experience.
The study will use plain film X-rays representing a range of anatomical views and pathological presentations; however, the dataset will contain equal numbers of pathological and non-pathological X-rays, giving equal weight to the assessment of specificity and sensitivity. Ethics approval has already been granted, and the study results will be disseminated through publication in peer-reviewed journals and presentation at relevant conferences.
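The effect of the balanced design on precision can be illustrated with a short calculation. The following is a minimal sketch, assuming the 500 cases split into 250 fracture and 250 normal images as described; the accuracy value of 0.85 and the 10% prevalence comparison are hypothetical figures chosen for illustration and are not study results.

```python
import math


def proportion_se(p: float, n: int) -> float:
    """Binomial standard error of a proportion p estimated from n cases."""
    return math.sqrt(p * (1.0 - p) / n)


N_CASES = 500
N_FRACTURE = N_CASES // 2          # 250 cases with a fracture
N_NORMAL = N_CASES - N_FRACTURE    # 250 normal cases

ASSUMED_ACCURACY = 0.85  # hypothetical value, used for both metrics

# Balanced design: sensitivity and specificity rest on equal case counts.
se_sens = proportion_se(ASSUMED_ACCURACY, N_FRACTURE)
se_spec = proportion_se(ASSUMED_ACCURACY, N_NORMAL)

# Hypothetical unbalanced alternative: 10% fractures (50 vs 450 cases).
se_sens_unbalanced = proportion_se(ASSUMED_ACCURACY, 50)
se_spec_unbalanced = proportion_se(ASSUMED_ACCURACY, 450)

print(f"balanced:   SE(sens)={se_sens:.4f}  SE(spec)={se_spec:.4f}")
print(f"unbalanced: SE(sens)={se_sens_unbalanced:.4f}  SE(spec)={se_spec_unbalanced:.4f}")
```

With equal case counts the two standard errors coincide, whereas a low-prevalence dataset would estimate sensitivity much less precisely than specificity.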
Condition or Disease | Intervention/Treatment | Phase |
---|---|---|
|
Study Design
Arms and Interventions
Arm | Intervention/Treatment |
---|---|
Readers/participants Reader selection: 18 readers will be selected from the following six clinical specialty groups (3 readers each): Emergency Medicine; Trauma and Orthopaedic Surgery; Emergency Nurse Practitioners; Physiotherapy; General Radiology; Radiographers. Within each group, readers will be drawn from the following levels of seniority/experience: Consultant/Senior/Equivalent (>10 years' experience); Middle Grade/Registrar/Equivalent (5-10 years' experience); Junior Grade/Senior House Officer/Equivalent (<5 years' experience). Each specialty reader group will include 1 reader at each level of experience. Readers will be recruited from across the 5 NHS organisations that comprise the Thames Valley Emergency Medicine Research Network (www.TaVERNresearch.org): Oxford University Hospitals NHS Foundation Trust; Royal Berkshire NHS Foundation Trust; Buckinghamshire Healthcare NHS Trust; Frimley Health NHS Foundation Trust; Milton Keynes University Hospital NHS Foundation Trust. |
Other: Cases reading
The reading will be done remotely via the Report and Image Quality Control site (www.RAIQC.com), an online platform allowing medical imaging viewing and reporting. Participants can work from any location, but the work must be done from a computer with internet access. For avoidance of doubt, the work cannot be performed from a phone or tablet.
The project is divided into two phases, and participants are required to complete both. The estimated total involvement in the project is 20-24 hours.
Phase 1 - Time allowed: 2 weeks
- Participants must review 500 X-rays and express a clinical opinion through a structured reporting template (multiple choice, no open text required).
Rest/washout period - Time allowed: 4 weeks, to mitigate the effects of recall bias.
Phase 2 - Time allowed: 2 weeks
- Participants must review the 500 X-rays together with an AI report for each case and express their clinical opinion through the same structured reporting template used in Phase 1.
|
Ground truthers Two consultant musculoskeletal radiologists. A third senior musculoskeletal radiologist (>20 years' experience) will arbitrate in cases of disagreement. |
Other: Ground truthing
Two consultant musculoskeletal radiologists will independently review the images to establish the 'ground truth' findings on the XRs. Where they reach a consensus, it will be used as the reference standard. In the case of disagreement, a third senior musculoskeletal radiologist (>20 years' experience) will arbitrate. The ground truthers will assign a difficulty score to each abnormality using a 4-point Likert scale (from 1, easy/obvious, to 4, hard/poorly visualised).
|
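The consensus-and-arbitration rule described for ground truthing can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's software: the function name `reference_standard` and the boolean encoding (True meaning a fracture is present) are hypothetical choices made for the example.

```python
from typing import Optional


def reference_standard(
    reader_a: bool,
    reader_b: bool,
    arbitrator: Optional[bool] = None,
) -> bool:
    """Return the reference label for one image.

    reader_a, reader_b: independent opinions of the two consultant
    musculoskeletal radiologists (True = fracture present).
    arbitrator: opinion of the third senior radiologist, consulted
    only when the first two disagree.
    """
    if reader_a == reader_b:
        # Consensus reached: the shared opinion becomes the reference.
        return reader_a
    if arbitrator is None:
        raise ValueError("readers disagree; an arbitrator opinion is required")
    # Disagreement: the arbitrator's opinion decides.
    return arbitrator


# Example: agreement needs no arbitration; disagreement defers to the third reader.
print(reference_standard(True, True))                     # True
print(reference_standard(True, False, arbitrator=False))  # False
```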
Outcome Measures
Primary Outcome Measures
- Performance of AI algorithm: sensitivity [During 4 weeks of reading time]
Evaluation of the Gleamer Boneview algorithm will be performed comparing it to the reference standard in order to determine sensitivity.
- Performance of AI algorithm: specificity [During 4 weeks of reading time]
Evaluation of the Gleamer Boneview algorithm will be performed comparing it to the reference standard in order to determine specificity.
- Performance of AI algorithm: Area under the ROC Curve (AU ROC) [During 4 weeks of reading time]
Evaluation of the Gleamer Boneview algorithm will be performed comparing it to the reference standard. Continuous probability score from the algorithm will be utilised for the ROC analyses, while binary classification results with a predefined operating cut-off will be used for evaluation of sensitivity, specificity, positive predictive value, and negative predictive value.
- Performance of readers with and without AI assistance: Sensitivity [During 4 weeks of reading time]
The study will include two sessions (with and without AI overlay), with all 18 readers reviewing all 500 XR cases each time separated by a washout period to mitigate recall bias. The cases will be randomised between the two reads and for every reader.
- Performance of readers with and without AI assistance: Specificity [During 4 weeks of reading time]
The study will include two sessions (with and without AI overlay), with all 18 readers reviewing all 500 XR cases each time separated by a washout period to mitigate recall bias. The cases will be randomised between the two reads and for every reader.
- Performance of readers with and without AI assistance: Area under the ROC Curve (AU ROC) [During 4 weeks of reading time]
The study will include two sessions (with and without AI overlay), with all 18 readers reviewing all 500 XR cases each time separated by a washout period to mitigate recall bias. The cases will be randomised between the two reads and for every reader.
- Reader speed with vs without AI assistance. [During 4 weeks of reading time]
Mean time taken to review an XR, with vs without AI assistance.
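The accuracy outcomes listed above can be illustrated with a small worked example. This is a minimal sketch, not the study's analysis code: the six-case dataset, the scores, and the cut-off of 0.5 are all hypothetical (the record states only that a predefined operating cut-off will be used), and the area under the ROC curve is computed with the rank-based (Mann-Whitney) formulation rather than any specific multi-reader software.

```python
from typing import Dict, Sequence


def binary_metrics(truth: Sequence[bool], scores: Sequence[float], cutoff: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV at a fixed operating cut-off.

    Assumes both classes are present and both predictions occur,
    so that no denominator is zero.
    """
    predicted = [s >= cutoff for s in scores]
    tp = sum(t and p for t, p in zip(truth, predicted))
    tn = sum((not t) and (not p) for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auroc(truth: Sequence[bool], scores: Sequence[float]) -> float:
    """Area under the ROC curve from continuous scores.

    Equals the probability that a randomly chosen fracture case scores
    higher than a randomly chosen normal case (ties count as half).
    """
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: True = fracture present per the reference standard.
truth = [True, True, True, False, False, False]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
CUTOFF = 0.5  # hypothetical operating point

metrics = binary_metrics(truth, scores, CUTOFF)
area = auroc(truth, scores)
print(metrics)
print(f"AUROC = {area:.3f}")
```

The binary metrics depend on the chosen cut-off, whereas the area under the curve summarises performance across all possible cut-offs, which is why the continuous score is used for the ROC analysis.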
Eligibility Criteria
Criteria
Inclusion Criteria:
- Emergency medicine doctors, trauma and orthopaedic surgeons, emergency nurse practitioners, physiotherapists, general radiologists and radiographers reviewing X-rays as part of their routine clinical practice.
- Currently working in the National Health Service (NHS).
Exclusion Criteria:
- Non-radiology physicians with previous formal postgraduate XR reporting training.
- Non-radiology physicians with a previous career in radiology.
Contacts and Locations
Locations
Site | City | State | Country | Postal Code |
---|---|---|---|---|
Oxford University Hospitals NHS Foundation Trust | Oxford | Oxfordshire | United Kingdom | OX3 9DU |
Sponsors and Collaborators
- Oxford University Hospitals NHS Trust
- Gleamer
Investigators
None specified.
Study Documents (Full-Text)
None provided.
More Information
Additional Information:
- Clinical negligence claims in Emergency Departments in England. Report 2 of 3: Missed fractures. NHS Resolution. March 2022
- The NICE Evidence Standards Framework for digital health and care technologies. (ECD7) Last Updated: 9 August
- Emergency Medicine Refresh Top 10
Publications
- Blazar E, Mitchell D, Townzen JD. Radiology Training in Emergency Medicine Residency as a Predictor of Confidence in an Attending. Cureus. 2020 Jan 9;12(1):e6615. doi: 10.7759/cureus.6615.
- Chilamkurthy S, Ghosh R, Tanamala S, Biviji M, Campeau NG, Venugopal VK, Mahajan V, Rao P, Warier P. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study. Lancet. 2018 Dec 1;392(10162):2388-2396. doi: 10.1016/S0140-6736(18)31645-3. Epub 2018 Oct 11.
- Donaldson LJ, Reckless IP, Scholes S, Mindell JS, Shelton NJ. The epidemiology of fractures in England. J Epidemiol Community Health. 2008 Feb;62(2):174-80. doi: 10.1136/jech.2006.056622.
- Duron L, Ducarouge A, Gillibert A, Laine J, Allouche C, Cherel N, Zhang Z, Nitche N, Lacave E, Pourchot A, Felter A, Lassalle L, Regnard NE, Feydy A. Assessment of an AI Aid in Detection of Adult Appendicular Skeletal Fractures by Emergency Physicians and Radiologists: A Multicenter Cross-sectional Diagnostic Study. Radiology. 2021 Jul;300(1):120-129. doi: 10.1148/radiol.2021203886. Epub 2021 May 4.
- Fenton JJ, Taplin SH, Carney PA, Abraham L, Sickles EA, D'Orsi C, Berns EA, Cutter G, Hendrick RE, Barlow WE, Elmore JG. Influence of computer-aided detection on performance of screening mammography. N Engl J Med. 2007 Apr 5;356(14):1399-409. doi: 10.1056/NEJMoa066099.
- Hussain F, Cooper A, Carson-Stevens A, Donaldson L, Hibbert P, Hughes T, Edwards A. Diagnostic error in the emergency department: learning from national patient safety incident report analysis. BMC Emerg Med. 2019 Dec 4;19(1):77. doi: 10.1186/s12873-019-0289-3.
- National Clinical Guideline Centre (UK). Fractures (Non-Complex): Assessment and Management. London: National Institute for Health and Care Excellence (NICE); 2016 Feb. Available from http://www.ncbi.nlm.nih.gov/books/NBK344251/
- Obuchowski NA, Bullen J. Multireader Diagnostic Accuracy Imaging Studies: Fundamentals of Design and Analysis. Radiology. 2022 Apr;303(1):26-34. doi: 10.1148/radiol.211593. Epub 2022 Feb 15.
- Patel MR, Norgaard BL, Fairbairn TA, Nieman K, Akasaka T, Berman DS, Raff GL, Hurwitz Koweek LM, Pontone G, Kawasaki T, Sand NPR, Jensen JM, Amano T, Poon M, Ovrehus KA, Sonck J, Rabbat MG, Mullen S, De Bruyne B, Rogers C, Matsuo H, Bax JJ, Leipsic J. 1-Year Impact on Medical Practice and Clinical Outcomes of FFRCT: The ADVANCE Registry. JACC Cardiovasc Imaging. 2020 Jan;13(1 Pt 1):97-105. doi: 10.1016/j.jcmg.2019.03.003. Epub 2019 Mar 17.
- Smith BJ, Hillis SL. Multi-reader multi-case analysis of variance software for diagnostic performance comparison of imaging modalities. Proc SPIE Int Soc Opt Eng. 2020 Feb;11316:113160K. doi: 10.1117/12.2549075. Epub 2020 Mar 16.
- Snaith B, Hardy M. Emergency department image interpretation accuracy: The influence of immediate reporting by radiology. Int Emerg Nurs. 2014 Apr;22(2):63-8. doi: 10.1016/j.ienj.2013.04.004. Epub 2013 May 30.
- van Leeuwen KG, Schalekamp S, Rutten MJCM, van Ginneken B, de Rooij M. Artificial intelligence in radiology: 100 commercially available products and their scientific evidence. Eur Radiol. 2021 Jun;31(6):3797-3804. doi: 10.1007/s00330-021-07892-z. Epub 2021 Apr 15.
- York TJ, Jenkins PJ, Ireland AJ. Reporting Discrepancy Resolved by Findings and Time in 2947 Emergency Department Ankle X-rays. Skeletal Radiol. 2020 Apr;49(4):601-611. doi: 10.1007/s00256-019-03317-7. Epub 2019 Nov 21.
- 310995-C