There's so much open drug data available online, free to use and maintained by different parts of the federal government. NLM RxNorm and DailyMed. CMS NADAC and Medicaid Utilization. FDA NDC Directory and Drug Enforcement Reports. These are just a few examples off the top of my head, but there are dozens more.
We spent the last couple of years learning as much as we could about these data sources and using them in our work wherever possible. Along the way, we ran into many of the pain points that come with combining multiple data sources - and with open drug data specifically. These are things like different file formats, different update intervals, and non-standardized formatting of common fields like NDCs - just to name a few.
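To make the NDC pain point concrete, here is a minimal sketch of one common cleanup step: converting a hyphenated 10-digit NDC (written as 4-4-2, 5-3-2, or 5-4-1) into the 11-digit 5-4-2 form by zero-padding whichever segment is short. The function name ndc_to_11 is made up for this post, and this is not necessarily how SageRx itself handles NDCs.

```python
# Hypothetical helper for illustration; not taken from the SageRx codebase.
VALID_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1), (5, 4, 2)}


def ndc_to_11(ndc: str) -> str:
    """Convert a hyphenated NDC to the 11-digit 5-4-2 form without hyphens."""
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected three numeric segments, got {ndc!r}")

    layout = tuple(len(p) for p in parts)
    if layout not in VALID_LAYOUTS:
        raise ValueError(f"unrecognized NDC layout {layout} for {ndc!r}")

    labeler, product, package = parts
    # Left-pad each segment with zeros to reach 5, 4, and 2 digits.
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


if __name__ == "__main__":
    for example in ("1234-5678-90", "12345-678-90", "12345-6789-0"):
        print(example, "->", ndc_to_11(example))
```

Note that the hyphens matter: a 10-digit NDC with the hyphens stripped is ambiguous, because there is no way to tell which segment needs the extra zero.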
This led to us falling backwards down a rabbit hole of discovering and learning as much as we could about data engineering. Early on, we realized that applying some basic data engineering principles to these open drug data sources could address many of these pain points. After a while, we started taking it a step further and transforming the raw data with our clinical and domain knowledge in pharmacy (did I mention we're pharmacists who code?). Now, it feels like it could either be the start of something big or at least something really useful for a few groups of people.
We built a platform of one-click data pipelines that can automatically extract up-to-the-day current open drug data and not only load it all into a common database so it’s easier to work with, but also transform it into something greater than the sum of its parts.
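If "extract, load, transform" is new to you, here is a toy sketch of the general pattern using only Python's standard library. Everything in it is an assumption made for illustration: the sample rows are invented, the table and column names are hypothetical, and SQLite stands in for whatever database you prefer. It is not the actual SageRx implementation (see the repo for that).

```python
import csv
import io
import sqlite3

# Invented sample rows standing in for a downloaded source file (not real data).
RAW_CSV = """ndc,description,price_per_unit
1234-5678-90,EXAMPLE DRUG A 10 MG CAPSULE,1.25
12345-678-90,EXAMPLE DRUG B 5 MG TABLET,0.40
"""


def extract(text):
    """Extract: parse the source file into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def load(conn, rows):
    """Load: store the rows as-is (all text) in a raw table."""
    conn.execute(
        "CREATE TABLE raw_prices (ndc TEXT, description TEXT, price_per_unit TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_prices VALUES (:ndc, :description, :price_per_unit)", rows
    )


def transform(conn):
    """Transform: use plain SQL to build a cleaner, typed view over the raw table."""
    conn.execute(
        """
        CREATE VIEW prices AS
        SELECT ndc,
               TRIM(description) AS description,
               CAST(price_per_unit AS REAL) AS price_per_unit
        FROM raw_prices
        """
    )


def run_pipeline(text=RAW_CSV):
    conn = sqlite3.connect(":memory:")
    load(conn, extract(text))
    transform(conn)
    return conn


if __name__ == "__main__":
    conn = run_pipeline()
    for row in conn.execute("SELECT * FROM prices"):
        print(row)
```

The point of loading the raw data untouched and transforming it afterwards is that the transformation step is just SQL, so it can be changed or extended without re-downloading anything.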
Oh - and we're open sourcing it all. https://github.com/coderxio/sagerx
We've already built pipelines for the things that we use frequently (RxNorm, FDA NDC Directory, NADAC, etc.), but there's so much more potential. Our biggest challenge with this project has been focus. There are so many more data sources that could be loaded and transformed, but if we put a lot of time and effort into building one of these pipelines and nobody knows about it or uses it, then what's the point?
We want to explain what we've been working on and how it works. We want to open the doors and let people poke around. And we want people to try it out. And if somebody really likes it, we want to focus our efforts on building something that makes them like it even more.
This is different from the commercial drug databases you might be familiar with.
For one, not only are the code and SQL that do the data transformations completely open source, the documentation is also open and written by pharmacist / developer hybrids who know how to translate pharmacy domain knowledge into developer-friendly concepts.
Second, it is fairly lightweight, easy to spin up (using Docker), and pretty much runs itself. Even not-super-technical people can add their own custom data transformations just by writing some SQL. And - if you think your work could benefit others - you could even contribute a pull request to the overall open-source SageRx project.
Lastly, at its core, SageRx is based on open common standards that promote interoperability - instead of licensed, proprietary coding systems that make it difficult to share data between organizations.
To be clear, we’re not a huge organization of people scrubbing the source data and phoning manufacturers to fill in gaps… but it’s not our intention to be that. We want to build something sustainable with very little overhead that might make drug data more accessible and understandable for people who need to work with it. Our hope is that SageRx can benefit (at the very least):
Startup founders
Researchers
Data analysts
Maybe you?
If any of this interests you, please star the repo, join our Slack, and/or shoot us an email. Oh - and please be patient as we try to get our documentation in order over the next few weeks. If you have questions or need help getting started in the meantime, the #proj-sagerx channel of our Slack is an excellent resource for support.
Stay tuned for more fun-filled SageRx content in the near future. Expect to see some documentation on SageRx as well as tutorials and example queries. Thanks for reading and let us know if you have any questions or feedback in the comments!
🌿