Abstract
Efficient identification of patient, intervention, comparison, and outcome (PICO) components in medical articles is helpful in evidence-based medicine. The purpose of this study is to determine whether the first sentences of these components are sufficient to train naive Bayes classifiers for sentence-level PICO element detection. We extracted from PubMed 19,854 structured abstracts of randomized controlled trials containing at least one P, I, or O section label and used them to train naive Bayes classifiers. The performance of classifiers trained on the first sentence of each section (CF) and of those trained on all sentences (CA) was compared on all sentences using ten-fold cross-validation. Results measured by recall, precision, and F-measure show no significant difference between CF and CA in detecting O-elements (F-measure = 0.731 ± 0.009 vs. 0.738 ± 0.010, p = 0.123). However, CA perform better for I-elements in terms of recall (0.752 ± 0.012 vs. 0.620 ± 0.007, p < 0.001) and F-measure (0.728 ± 0.006 vs. 0.662 ± 0.007, p < 0.001). For P-elements, CF have higher precision (0.714 ± 0.009 vs. 0.665 ± 0.010, p < 0.001) but lower recall (0.766 ± 0.013 vs. 0.811 ± 0.012, p < 0.001). CF are therefore not always better than CA in sentence-level PICO element detection; their relative performance depends on the element being detected.
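The CF versus CA comparison can be illustrated with a small sketch. It assumes a multinomial naive Bayes model over lowercased bag-of-words tokens with add-one smoothing, a single multi-class model over P, I, and O labels, and the balanced F-measure (F1); these are assumptions, not the study's stated configuration. The section texts in `SECTIONS` and the helper names (`tokenize`, `NaiveBayes`, `training_set`, `precision_recall_f`) are hypothetical. For brevity the sketch scores each model on the same toy sentences rather than using ten-fold cross-validation as the study did.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercased alphanumeric tokens (an assumed, deliberately simple scheme).
    return re.findall(r"[a-z0-9]+", sentence.lower())


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words features, add-one smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, sentence):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            # Log prior plus smoothed log likelihoods of known words.
            score = math.log(n_c / n_docs)
            for w in tokenize(sentence):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical structured-abstract sections: (element label, sentences in order).
SECTIONS = [
    ("P", ["We enrolled 120 adults with type 2 diabetes.",
           "Patients were aged 40 to 70 years."]),
    ("I", ["Patients were randomized to metformin or placebo.",
           "The metformin dose was 500 mg twice daily."]),
    ("O", ["The primary outcome was change in HbA1c at 12 weeks.",
           "Secondary outcomes included body weight."]),
]


def training_set(sections, first_only):
    """CF when first_only=True (first sentence per section), CA otherwise."""
    sentences, labels = [], []
    for label, section_sentences in sections:
        chosen = section_sentences[:1] if first_only else section_sentences
        sentences.extend(chosen)
        labels.extend([label] * len(chosen))
    return sentences, labels


def precision_recall_f(gold, predicted, target):
    """Per-element precision, recall, and balanced F-measure (F1)."""
    tp = sum(g == target and p == target for g, p in zip(gold, predicted))
    fp = sum(g != target and p == target for g, p in zip(gold, predicted))
    fn = sum(g == target and p != target for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    # Evaluate both models on all sentences, mirroring the CF/CA comparison.
    all_sentences, all_labels = training_set(SECTIONS, first_only=False)
    for name, first_only in (("CF", True), ("CA", False)):
        model = NaiveBayes().fit(*training_set(SECTIONS, first_only))
        predicted = [model.predict(s) for s in all_sentences]
        for element in ("P", "I", "O"):
            p, r, f = precision_recall_f(all_labels, predicted, element)
            print(f"{name} {element}: P={p:.2f} R={r:.2f} F={f:.2f}")
```

On this toy data the CF model labels "Patients were aged 40 to 70 years." as I, because its only P training sentence shares no words with it, whereas the CA model labels it P. This shows the mechanism by which training only on first sentences can lower recall; it says nothing about the magnitudes the study reports.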
Original language | English |
---|---|
Pages (from-to) | 940-946 |
Number of pages | 7 |
Journal | Journal of Biomedical Informatics |
Volume | 46 |
Issue number | 5 |
Publication status | Published - Oct 2013 |
Keywords
- Evidence-based medicine
- Information extraction
- Information retrieval
- Natural language processing
- Question answering
- Text mining
ASJC Scopus subject areas
- Computer Science Applications
- Health Informatics