  • Cathleen Beliveau

Using ICD-10-CM codes to detect illicit substance use: A comparison with retrospective self-report

Post written by Cathleen Beliveau, Figures by Irene Liu


The Center on Substance Use and Health conducted the Transitions study in 2017 and 2018. The study enrolled 602 publicly insured or uninsured patients in San Francisco who had been prescribed opioid pain medication for chronic non-cancer pain. Data from the Transitions study were used in a paper recently published in the journal Drug and Alcohol Dependence, which sheds light on how data from electronic medical records can be used to learn about patterns of substance use in the general population. The authors asked: can ICD-10-CM codes be used to accurately detect substance use?


ICD-10-CM stands for International Classification of Diseases, 10th Revision, Clinical Modification. According to the CDC website, these ICD codes are ways to specifically identify health conditions. This data can be used to better understand health trends more broadly (e.g. what medical conditions are being reported most frequently?) and to conduct specific research, such as in the case of this paper. ICD-10-CM is the United States’ clinical modification of the WHO’s ICD-10 and is maintained by the CDC’s National Center for Health Statistics. Individual health care providers add the codes to a patient’s medical record to classify what happened in a visit (for billing purposes as well as research/surveillance activities).

The authors of this paper compared self-reported substance use history described during interviews to ICD-10-CM codes collected from charts.


Study participants came in for a single visit where they had an in-depth conversation with a research associate about their medication and substance use from 2012 to the date of their visit – using a technique called “historical reconstruction” in which participants described their substance use for each calendar quarter since 2012. This technique is described in the paper:

“The historical reconstruction interview procedure involved constructing a visual display of a personal timeline from 2012 to the date of the study visit using major autobiographical (e.g., marriage, deaths, housing transitions) and societal (e.g., natural disasters, sporting outcomes, local news) events and using this display as a basis to reconstruct the participant’s illicit substance use patterns by calendar quarter.”


After those interviews, other research staff gathered data from participants’ medical charts. This involved manually collecting data from individual charts, as well as gathering data through a “pull” for all of the patients at once. In this analysis, the authors used the ICD-10-CM billing codes. Only codes from October 2015 onwards were used, because ICD-9-CM codes were in use before then. Codes chosen for the analysis referred to disorders and poisoning (overdose) events associated with methamphetamine, cocaine, and opioids.
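As a rough illustration of the code-selection step described above, here is a minimal Python sketch. It is not the authors' method: the code prefixes, the function name `substances_flagged`, and the example records are all assumptions made for illustration, and the paper's actual code list may differ. The only detail taken from the post is the October 2015 cutoff.

```python
from datetime import date

# ICD-10-CM codes are only available from October 2015 onwards (per the post).
ICD10_CM_START = date(2015, 10, 1)

# Illustrative prefixes only (an assumption, not the paper's code list).
# F11/F14/F15 are disorder categories; the T-codes are poisoning categories.
# Note that F15 covers "other stimulants", which includes methamphetamine.
SUBSTANCE_PREFIXES = {
    "opioids": ("F11", "T40.1", "T40.2"),
    "cocaine": ("F14", "T40.5"),
    "methamphetamine": ("F15", "T43.62"),
}


def substances_flagged(records):
    """Return the set of substances with at least one matching code.

    `records` is an iterable of (code, service_date) pairs for one patient.
    Codes dated before the ICD-10-CM cutoff are ignored.
    """
    flagged = set()
    for code, service_date in records:
        if service_date < ICD10_CM_START:
            continue  # recorded under ICD-9-CM, so skip it
        for substance, prefixes in SUBSTANCE_PREFIXES.items():
            if code.startswith(prefixes):  # startswith accepts a tuple
                flagged.add(substance)
    return flagged


# Hypothetical chart records for one patient (made up for this example).
example_records = [
    ("304.00", date(2015, 6, 2)),   # ICD-9-CM era, ignored by date
    ("F11.20", date(2016, 3, 14)),  # opioid-related disorder
    ("E11.9", date(2017, 1, 9)),    # unrelated diagnosis
    ("F15.20", date(2018, 2, 1)),   # stimulant-related disorder
]
print(sorted(substances_flagged(example_records)))
```

Each substance flagged this way would count as a positive "test result" for that patient, to be compared against what the patient reported in the interview.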

In analyzing the data, authors looked at the sensitivity and specificity of ICD-10-CM codes compared to the self-reported data on substance use.


  • Specificity in this article refers to the ability of the ICD-10-CM codes to correctly identify people who do not use substances (e.g. methamphetamine, cocaine, or non-prescribed opioids): someone who does not use a substance should have no related code. If specificity of these codes is high, that means there will be very few false positives, or people counted as using substances who did not actually use them. High specificity is what makes a code useful for “ruling in” substance use when it does appear in a patient’s record.

  • Sensitivity in this article refers to the ability of the ICD-10-CM codes to correctly detect people who do use substances (e.g. methamphetamine, cocaine, or non-prescribed opioids). If the sensitivity of these codes is high, that means there will be very few false negatives, or people who were not counted but who actually used substances. High sensitivity is what makes the absence of a code useful for “ruling out” substance use.

See the figures below for additional explanation of the ideas of sensitivity and specificity in this analysis.

Fig. 1: The vertical lines represent the diagnostic test threshold, or how sensitive/specific the test is.

At line A, the test is at 100% sensitivity, meaning ICD-10 codes would be present for everyone who uses substances, or everyone under the blue line. There are no false negatives. At line B, the test is at 100% specificity, meaning ICD-10 codes would be absent for everyone who does not use substances, or everyone under the orange line. There are no false positives.


In practice, there is a tradeoff between sensitivity and specificity. In this paper, the specificity was 85-97% and the sensitivity was 35-50%, so our test threshold would be somewhere in between the center and line B.


Fig. 2: Table showing test result vs substance use status by self-report. Sensitivity and specificity may be calculated using the equations shown above. Again, sensitivity is the probability of an ICD-10 code present given substance use; specificity is the probability of no ICD-10 code given no substance use.
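The definitions in the Fig. 2 caption can be turned into a short calculation. The Python sketch below is only an illustration: the function name `sensitivity_specificity` and the counts are hypothetical and are not taken from the paper. Here "positive" means an ICD-10-CM code is present, and the self-report is treated as the truth.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Compute (sensitivity, specificity) from the four cells of a 2x2 table.

    tp: code present, self-reported use      (true positives)
    fn: code absent,  self-reported use      (false negatives)
    fp: code present, no self-reported use   (false positives)
    tn: code absent,  no self-reported use   (true negatives)
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one person with and one without use")
    sensitivity = tp / (tp + fn)  # P(code present | substance use)
    specificity = tn / (tn + fp)  # P(code absent | no substance use)
    return sensitivity, specificity


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sens, spec = sensitivity_specificity(tp=40, fn=60, fp=10, tn=190)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
# prints: sensitivity = 40%, specificity = 95%
```

In this made-up example, 60 of the 100 people who reported use have no code in their chart (low sensitivity), while only 10 of the 200 people who reported no use have a code (high specificity).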

The results of this paper tell us that the specificity of ICD-10-CM codes is fairly high, ranging from 85% to 97% depending on the substance. Sensitivity was fairly low, ranging from 35% to 50%. This means that these codes will allow someone to accurately rule in substance use for a patient, but will not be particularly useful for ruling out substance use.

The authors found that both sensitivity and specificity were higher for methamphetamine than other substances. They also found that sensitivity was higher for people who reported more frequent use. This makes sense, since people who use substances more frequently are probably going to have more healthcare encounters involving that substance than people who use it less frequently.


This analysis is important in that it can help people who work in public health better understand the tools we have to learn about community trends. The widespread use of electronic health records by healthcare systems is relatively new (in the past 10-20 years), and has exciting implications for the ability of public health researchers to understand what health conditions communities are facing at a given time. This paper will help future researchers and public health programs to understand the strengths and limitations of data from electronic health records, in order to support improved health conditions of a population.


Full citation:

Rowe CL, Santos GM, Kornbluh W, Bhardwaj S, Faul M, Coffin PO. Using ICD-10-CM codes to detect illicit substance use: A comparison with retrospective self-report. Drug Alcohol Depend. 2021 Apr 1;221:108537.


Want to learn more? You can find a list of all of CSUH’s publications here and the major findings of each study are summarized on our website here!
