Comparison of Different Feature Engineering Methods for Automated ICD Coding

Sponsor
China National Center for Cardiovascular Diseases (Other)
Overall Status
Active, not recruiting
CT.gov ID
NCT04849195
Collaborator
(none)

Study Details

Study Description

Brief Summary

Using traditional machine learning classifiers, this study compares bag-of-words, word2vec, and RoBERTa as feature engineering methods for automated ICD coding of cardiovascular diseases in a Chinese corpus.

Condition or Disease Intervention/Treatment Phase
  • Other: No intervention

Detailed Description

ICD coding is important because it serves as the basis for a wide range of economic and academic applications. Coding is currently done mainly by hand, which is time-consuming and error-prone, and this has made automated ICD coding via machine learning an active research topic.

As a necessary step in machine learning, feature engineering plays a crucial role in coding performance. Although existing studies have reached useful conclusions, they have not compared different feature engineering methods. Identifying which methods perform better under which circumstances would help promote practical applications of automated coding.

The investigators will base this study on inpatient data collected from electronic medical records at Fuwai Hospital, the world's largest medical center for cardiovascular disease. Bag-of-words, word2vec, and RoBERTa will each be used to extract features from the training data. Code-wise logistic regression classifiers and support vector machine classifiers will then be trained to assign codes automatically. Finally, the performance of the models on the test data will be evaluated.
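
The bag-of-words and code-wise logistic regression steps described above can be illustrated with a minimal sketch. It is not the investigators' implementation: the function names, the toy tokens, and the toy code assignments are hypothetical, the records are assumed to be already segmented into tokens, and plain batch gradient descent stands in for whatever solver the study actually uses. Support vector machines, word2vec, and RoBERTa are not shown; the latter two would replace `bag_of_words` with dense embeddings.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Map every token seen in the training documents to a column index."""
    vocab = {}
    for tokens in docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def bag_of_words(tokens, vocab):
    """Return a token-count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, w, b):
    """Predicted probability that a single code applies to feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_binary_lr(X, y, epochs=200, lr=0.5):
    """Fit one logistic regression (weights, bias) by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = score(xi, w, b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_code_wise(X, label_sets):
    """Train one independent binary classifier per ICD code."""
    codes = sorted(set().union(*label_sets))
    return {
        code: train_binary_lr(X, [1.0 if code in s else 0.0 for s in label_sets])
        for code in codes
    }


def predict_codes(x, models, threshold=0.5):
    """Assign every code whose classifier scores at or above the threshold."""
    return {code for code, (w, b) in models.items() if score(x, w, b) >= threshold}


# Toy, made-up admissions (pre-tokenized) with made-up code assignments.
train_docs = [
    ["coronary", "stenosis"],
    ["hypertension", "headache"],
    ["coronary", "hypertension"],
    ["arrhythmia", "palpitation"],
]
train_labels = [{"I25"}, {"I10"}, {"I25", "I10"}, {"I49"}]

vocab = build_vocab(train_docs)
X_train = [bag_of_words(doc, vocab) for doc in train_docs]
models = train_code_wise(X_train, train_labels)
```

Training one classifier per code treats multi-label coding as a set of independent yes/no decisions, so an admission can receive several codes or none.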

Study Design

Study Type:
Observational
Anticipated Enrollment:
6947 participants
Observational Model:
Other
Time Perspective:
Retrospective
Official Title:
Comparison of Different Feature Engineering Methods for Automated ICD Coding
Actual Study Start Date:
Mar 1, 2021
Anticipated Primary Completion Date:
Apr 1, 2021
Anticipated Study Completion Date:
Apr 1, 2021

Arms and Interventions

Arm Intervention/Treatment
Model training and test group

The data set will be split into a training group and a test group; the training group will be used for model building, and the test group for subsequent evaluation and verification.

Other: No intervention
No intervention
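
The split into training and test groups, and the evaluation on the test group, might look like the sketch below. The record does not state the split ratio, the random seed, or the evaluation metric, so the 80/20 split, the seed, and micro-averaged precision, recall, and F1 are all assumptions chosen for illustration, as are the function names and admission IDs.

```python
import random


def split_admissions(admission_ids, test_fraction=0.2, seed=42):
    """Shuffle admission IDs reproducibly and return (train_ids, test_ids).

    test_fraction and seed are illustrative assumptions, not study parameters.
    """
    ids = sorted(admission_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


def micro_prf(true_sets, pred_sets):
    """Micro-averaged precision, recall, and F1 over per-admission code sets."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical admission IDs; each admission is one sample.
train_ids, test_ids = split_admissions(range(10))

# Hypothetical coder-assigned codes versus model-assigned codes for two admissions.
coder_codes = [{"I10", "I25"}, {"I20"}]
model_codes = [{"I10"}, {"I20", "I49"}]
precision, recall, f1 = micro_prf(coder_codes, model_codes)
```

Splitting by admission ID with a fixed seed keeps the test group identical across the feature engineering methods being compared, so differences in the scores reflect the methods rather than the partition.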

Outcome Measures

Primary Outcome Measures

  1. ICD-10 codes for each admission [At the end of enrollment]

    Each admission will be a sample in this study. The ICD-10 codes assigned by medical coders for each admission will be collected as the primary outcome.

Eligibility Criteria

Criteria

Ages Eligible for Study:
N/A and Older
Sexes Eligible for Study:
All
Accepts Healthy Volunteers:
No
Inclusion Criteria:
  • Admissions in Fuwai Hospital, from January 1, 2019, to February 28, 2019
Exclusion Criteria:

Contacts and Locations

Locations

Site City State Country Postal Code
1 Fuwai Hospital Beijing Beijing China 100037

Sponsors and Collaborators

  • China National Center for Cardiovascular Diseases

Investigators

  • Principal Investigator: Wei Zhao, PhD, China National Center for Cardiovascular Diseases

Study Documents (Full-Text)

None provided.

More Information

Publications

None provided.
Responsible Party:
China National Center for Cardiovascular Diseases
ClinicalTrials.gov Identifier:
NCT04849195
Other Study ID Numbers:
  • 2021-1425-02
First Posted:
Apr 19, 2021
Last Update Posted:
Apr 19, 2021
Last Verified:
Apr 1, 2021
Individual Participant Data (IPD) Sharing Statement:
No
Plan to Share IPD:
No
Studies a U.S. FDA-regulated Drug Product:
No
Studies a U.S. FDA-regulated Device Product:
No
Additional relevant MeSH terms:

Study Results

No Results Posted as of Apr 19, 2021