Making open drug pricing data more accessible
If you build it, they might not come... unless it's super easy to use.
They didn’t teach us much (if anything) about drug pricing in pharmacy school. I’m not talking about “you should substitute generics for brands wherever possible to save the patient money” simplistic drug pricing concepts. I’m talking about concepts like MAC, WAC, AWP, SWP, spread, rebates, DIR fees, clawbacks, U&C, cash price, ASP, PBMs, discount cards, and NADAC. If that looks like alphabet soup to you, it’s fine. Just know that drug pricing in the US is overly complicated and likely intentionally obfuscated from the general public. Many of these pricing numbers are only available behind a paywall, and even then they are generally “made up numbers” that are only useful in the shell game that is US drug pricing.
That being said, there is one price point that is freely available and fairly intuitive: NADAC (or National Average Drug Acquisition Cost). Basically, it’s the average price that pharmacies in the US pay to acquire a drug. Centers for Medicare & Medicaid Services (CMS) contracts with an external firm to survey pharmacies about how much they pay to acquire drugs at an NDC level based on their invoice price. This is a voluntary survey and independent pharmacies are better represented than chain pharmacies (more about that and other caveats in 46brooklyn’s excellent Drug Pricing 101 podcast series). CMS then takes this data and uploads it to a Medicaid.gov Pharmacy Pricing website every week.
Great… So how do I use CMS NADAC data?
If you’re just curious what the most recent average price is for a medication, the data as presented on the CMS website is not the easiest to use. You have to take the following steps.
Click here to go to the Medicaid.gov Pharmacy Pricing website
Scroll past Federal Upper Limit pricing info
Click on the link for the current week and wait for a new page to load
Take a moment to figure out how to filter the table
Click the plus sign in the top right to add a filter
Change the Property field to NDC and delete the date that remains in the value field
Type in an NDC in NDC11 format (with no hyphens) and press enter or click Apply Filters
Click the Effective Date column twice to sort by date descending
The top row is the most recent average drug acquisition cost
Ignore duplicate rows that don’t get cleaned up in the data loading process every week
There has to be a better way!
As I always say, there’s a fine line between laziness and innovation. If I had to do this every time I wanted to know a drug price, I would instead spend time over-engineering an alternative to automate it for me somehow… which is more or less what we did.
Go to the SageRx NADAC Pricing website (demo below)
Type in an NDC in NDC11 format (with no hyphens) and press enter or click the search button
Review the data (automatically updated weekly)
Most recent NADAC price per unit
Interactive historical price trend line chart
Table with all of the pricing data, including dollar change and percent change between each effective date
Looks pretty simple - must have been easy
Heh... Well it wasn’t that hard, but there were a few things that went into building this.
Extracting and loading the data
As we mentioned in our last article, the CSV file names change every week and the naming convention is not consistent. So we first use the API available from CMS to get the correct filename.
Once we have the filename, we can use a Python script we wrote to automatically download the file and SQL scripts to:
Load the columns from the CSV source into a Postgres database
Create a staging table to store an enhanced version of the data (see next section)
We use Airflow to automate all of this - it’s basically a task scheduler on steroids. It’s configured to automatically start up every week on Thursday and run the Python and SQL scripts to load the current week’s data into the database. Set it and forget it.
Since NADAC has data back from 2013, we also configured Airflow to go back in the archives and get all the data from 2013 until today - not just the current week or month.
Transforming the data
As mentioned before, the data is severely duplicated as it exists in the CSV files that come from CMS. So the first thing we do is de-duplicate it.
Since we have many years’ worth of data, we wanted it to be easy to find the first price ever reported and also the most recent price reported. So we added a column for first_price
and a column for current_flag
.
Additionally, to make it easy to know how much a price increased or decreased since the last time it was reported, we added columns for dollar_change
and percent_change
.
Creating an API
We wanted the data in the Postgres database exposed so that we could create a frontend. So we threw together a basic FastAPI setup with an endpoint that would work with our frontend to return the appropriate data.
Creating a frontend search tool and visualization
Building user interfaces for a frontend is tedious and detail-oriented work with nearly unlimited options for tools to use. We stuck with the tools that we’re familiar with and which seem to be widely supported by a development community. The overall look and feel was inspired by stock market summary visualizations.
React - The main framework that is the foundation for the website / webapp functionality. Originally created by the Facebook team, but widely adopted by developers building user interfaces.
Material UI - Has excellent user interface components that follow Google’s Material Design philosophy. We use it for various parts of the website, but particularly for the input form and the data table.
Nivo - There’s a variety of web-based visualization tools, most of which built off of D3.js. We liked Nivo because of the consistent look and feel of the components and the variety of visualization components available.
Data nerds read on
Because we load the NADAC data into the SageRx database which also contains enhanced RxNorm data, we can easily write queries to filter and group pricing data by normalized RxNorm concepts.
For instance, let’s say we want to evaluate the claim that a banana tablet could cost $10. We could just try to string match the NADAC ndc_description
on things like “tablet” and “tab”, but that would be pretty unreliable and inconsistent. Instead, we can join NADAC data to RxNorm data and find products that have a normalized dose form (TTY = DF) of “Oral Tablet” (NDC → Clinical Product → Dose Form). Then we can find all the “Oral Tablet” products that have a NADAC price per unit (per “EA”) of at least $10.
As you can see below, there are approximately 762 NDCs that have an average acquisition cost of greater than or equal to $10 / oral tablet. I guess Lucille actually under-estimated the cost of a tablet in many cases. Note that some of these NDCs don’t have updated NADAC prices in the last 2 years, and since the SageRx search tool is filtered to only the past 2 years of data, they won’t return results.
What’s next
This is only one example of what can be done with cleaned up, enhanced, aggregated open drug data. Hopefully it helps to give you an idea of the possibilities. In terms of what we could do to make this specific NADAC Drug Pricing search tool use case better, we have a few thoughts.
Allow users to expand the lookback period of the visualization past 2 years, all the way back to 2013 if needed (would just require a UI change and an API change)
Allow users to download the data in CSV format from the data table (easily doable with the Material UI data grid component)
Allow users to search by ingredient / clinical product instead of by NDC to compare prices of NDCs across groups (more complicated, but doable)
Allow users to enter non-NDC11 format NDCs and automatically convert to NDC11 format for the search (a general feature we will eventually build out)
Calculate the cost per fill or cost per package instead of just displaying the cost per unit (would require either knowing pack size per NDC - which is more complicated to do than you might think - or a simple user input to enter a number to multiply by)
What other enhancement ideas do you have for the SageRx NADAC Pricing search? What other ideas do you have for use cases for SageRx in general? Drop us a comment with your thoughts! 🌿