Data Science
Use Cases
Development of the "Product Finder" for a Generics Manufacturer
through the Use of Web Crawling, Text Mining and Power BI

Project scope

The goal of the project was to develop a tool called “Product Finder”. With a view to expiring patents, the tool should help manufacturers of generic drugs to search easily for active ingredients, therapeutic areas and drugs that could be added to their development pipeline. The market attractiveness of the potential active pharmaceutical ingredients was to be determined as well.
A user-friendly dashboard was to visualize the information.

Data sets

Approximately 1,250 entries from the European Medicines Agency (EMA)¹ database were collected from various subpages using a web crawler and formed the first dataset.
The database of the U.S. Food and Drug Administration (FDA)², which can be downloaded as a *.csv file, comprised 95,000 entries and served as the second dataset for the “Product Finder”.
A third dataset resulted from text mining applied to both datasets and comprised the most important keywords.

Challenges & Solutions

The data of the EMA database was collected using a web crawler. Text mining was applied to the EMA dataset to extract the most important keywords.
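The keyword extraction step can be illustrated with a small sketch. The project description does not say which text mining method was used, so the TF-IDF ranking below is an assumption, as are the function names, the tiny stop-word list and the sample texts.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "and", "for", "in", "is", "of", "or", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_keywords(documents, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        if not tokens:
            results.append([])
            continue
        term_freq = Counter(tokens)
        # Smoothed inverse document frequency, so no term gets a zero weight.
        scores = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        }
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results


# Made-up indication texts, not taken from the EMA database.
sample = [
    "Treatment of hypertension in adults",
    "Treatment of diabetes in adults",
    "Treatment of hypertension and heart failure",
]
print(top_keywords(sample, k=2))
```

Terms that appear in every text, such as "treatment" here, are ranked below terms that distinguish one entry from the others, which is the property a keyword dataset needs.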
One of the biggest challenges was the quality of the data in the FDA database. The data required extensive cleaning and wrangling before it could be used in the dashboard application.
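What such cleaning might look like is sketched below. The actual defects in the FDA file are not described, so the steps shown (trimming whitespace, unifying case, dropping incomplete rows and duplicates) are assumptions, and the column names `ingredient` and `product` are hypothetical.

```python
import csv
import io


def clean_rows(csv_text):
    """Trim whitespace, unify case, and drop incomplete or duplicate rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    cleaned = []
    for row in reader:
        # Collapse runs of whitespace; missing fields are treated as empty.
        ingredient = " ".join((row.get("ingredient") or "").split()).upper()
        product = " ".join((row.get("product") or "").split()).title()
        if not ingredient or not product:
            continue  # incomplete row
        key = (ingredient, product)
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        cleaned.append({"ingredient": ingredient, "product": product})
    return cleaned


# Made-up rows with placeholder names, not real FDA data.
raw = (
    "ingredient,product\n"
    "  substance a ,brand one\n"
    "SUBSTANCE A,Brand One\n"
    ",Brand Two\n"
    "substance  b,brand three\n"
)
print(clean_rows(raw))
```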
Additionally, the FDA and EMA databases had very different levels of granularity. The two datasets were consolidated into one using filters, sorting and mapping algorithms to facilitate their integration into a Power BI application.
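One way to bridge the different levels of granularity is to map both sources onto a shared key and aggregate to that level. The sketch below assumes the normalized active ingredient as that key; the field names and sample rows are hypothetical and do not reflect the real EMA or FDA schemas.

```python
from collections import defaultdict


def normalize(name):
    """Collapse whitespace and lowercase a name so both sources share one key."""
    return " ".join(name.split()).lower()


def consolidate(ema_rows, fda_rows):
    """Merge both sources into one record per active ingredient."""
    merged = defaultdict(lambda: {"ema_products": [], "fda_products": []})
    for row in ema_rows:
        key = normalize(row["active_substance"])
        merged[key]["ema_products"].append(row["medicine_name"])
    for row in fda_rows:
        key = normalize(row["ingredient"])
        merged[key]["fda_products"].append(row["product"])
    # Sort ingredients and product lists so the output is deterministic.
    return [
        {
            "ingredient": key,
            "ema_products": sorted(value["ema_products"]),
            "fda_products": sorted(value["fda_products"]),
        }
        for key, value in sorted(merged.items())
    ]


# Made-up rows with placeholder names.
ema = [{"active_substance": "Substance A", "medicine_name": "EU Brand"}]
fda = [
    {"ingredient": "SUBSTANCE A ", "product": "US Brand 2"},
    {"ingredient": "substance a", "product": "US Brand 1"},
    {"ingredient": "Substance B", "product": "US Brand 3"},
]
for record in consolidate(ema, fda):
    print(record)
```

A flat list with one record per ingredient is easy to load as a single table, which suits a dashboard tool such as Power BI.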
We also determined the market attractiveness, assuming that an active ingredient becomes more attractive the fewer generics exist in its therapeutic area: the lower the ratio, the higher the attractiveness of the particular generic.
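The attractiveness rule can be written down as a short calculation. The text does not define the ratio precisely, so the sketch below assumes it is the share of generics among all products in a therapeutic area and scores attractiveness as one minus that share. The field names `area` and `is_generic` are hypothetical.

```python
def attractiveness_by_area(products):
    """Score each therapeutic area as 1 minus its share of generic products."""
    counts = {}
    for product in products:
        total, generics = counts.get(product["area"], (0, 0))
        counts[product["area"]] = (
            total + 1,
            generics + (1 if product["is_generic"] else 0),
        )
    # A low generic ratio gives a score close to 1 (more attractive).
    return {
        area: 1 - generics / total
        for area, (total, generics) in counts.items()
    }


# Made-up products with placeholder area names.
products = [
    {"area": "Area X", "is_generic": True},
    {"area": "Area X", "is_generic": False},
    {"area": "Area X", "is_generic": False},
    {"area": "Area X", "is_generic": False},
    {"area": "Area Y", "is_generic": True},
    {"area": "Area Y", "is_generic": True},
]
print(attractiveness_by_area(products))
```

Under this assumption, an area in which one of four products is a generic scores higher than an area in which every product is a generic.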

Project outcome

The “Product Finder” supports manufacturers of generic drugs in searching for expiring drug patents and helps to identify attractive active ingredients that can be added to the development pipeline.
The data is presented in an easy-to-use, dynamic dashboard that supports an effective search process. Depending on the individual use case, different user-friendly dashboards can be selected and displayed. Dynamic filters help to extract a list of expiring patents, supported by drilldowns into further details of an active ingredient, a therapeutic area or the products of a specific manufacturer. In this way, the “Product Finder” also allows you to monitor the activities of your competitors.
Thanks to the technical flexibility of the “Product Finder”, additional databases can easily be integrated into the application, and dashboards can be designed to meet any additional customer need.
¹ European Medicines Agency: European public assessment reports. Human medicines. July 2018.
² U.S. Food and Drug Administration: Medical Device Databases. July 2018.