Points of View Current Status of Secondary Use of Medical Information in the Pharmaceutical Industry
Hodai Okada, Senior Researcher, Pharmaceutical Industry Policy Institute
SUMMARY
- The Ministry of Health, Labour and Welfare (MHLW) has begun to study the secondary use of medical information. Since pharmaceutical companies in Japan and overseas have already started secondary use, this paper investigates examples of use in the pharmaceutical industry.
- We identified the characteristics of academic papers whose authors include employees of the top five companies by pharmaceutical business sales, worldwide and in Japan respectively.
- When the sources of anonymized medical information used by pharmaceutical companies in published research were tabulated by country, information from the United States was used most often, accounting for the majority. Among countries other than the U.S., information from Japan was used most frequently.
- The most frequently used information appears to fall into two types: large-scale information, mainly claims data, that can be used for research on various diseases, and disease-specific information containing detailed information that is difficult to structure.
- Discussions on the secondary use of Japanese public databases, currently under way, should take into account differentiation from existing information and consider improving the usage environment and expanding the information available.
1. Introduction
The Headquarters for Promotion of Medical DX, established in the Cabinet, released a process chart on the promotion of medical DX in June 2023 1). One of the items on the timetable is the "development of an environment for the secondary use of medical information." In November, a "working group on the secondary use of medical and other information" was newly established under the Health, Medical Care and Nursing Information Utilization Study Group of the Ministry of Health, Labour and Welfare, and studies have begun on developing the environment for the secondary use of medical and other information 2). Secondary use of medical information refers to the use of information obtained through routine medical treatment for purposes other than the original purpose for which it was obtained, such as public health or new drug development. The working group intends to study the direction of the legal system and the development of an information collaboration infrastructure, focusing on public medical databases, whereas pharmaceutical companies have already begun to use private commercial databases. Based on publicly available academic papers, this paper confirms the characteristics of the medical information currently used by pharmaceutical companies and the purposes for which it is used, and thereby presents information that can serve as a reference for promoting the secondary use of medical information in Japan.
2. Survey Methodology
To investigate the actual status of secondary use of medical information by pharmaceutical companies from publicly available information, we examined academic papers published from 2021 to 2022 whose authors included employees of the top five companies by pharmaceutical business sales (FY2021) 3), worldwide and in Japan respectively. The Web of Science® (Clarivate) topic search was used to search for articles (search terms: "Real World" OR "Medical Record" OR "Health Record" OR "Electronic Data" OR "Administrative" OR "Claim" OR "Receipt" OR "Commercial", "Commercially" OR "Database"). Among the articles obtained from the search, original articles reporting the results of studies conducted with secondary use of medical information were selected for the survey. Methods of protecting personal information when pharmaceutical companies make secondary use of patient-level medical information can be broadly classified into two types: consent from the patient and anonymization 4, 5). Part of the use of biobank information, which is constructed on the basis of consent, has been reported previously 6), so this paper focuses on anonymized information, which is used relatively frequently by the pharmaceutical industry. After reviewing the methods section or the research ethics section of each article, we extracted only those studies that used individual-level information that had been processed in accordance with the regulations of each country and that did not require the consent of patients or other information providers. Studies using biobanks or registries constructed with consent were excluded. In "3-4. Intended use of information" and subsequent sections, only the papers of the top five global companies by pharmaceutical business sales were examined.
3. Results
3-1. Target papers
A total of 630 papers were extracted for the top five global companies (the world's leading companies) and 129 papers for the top five Japanese companies (Japan's leading companies). On an individual company basis, the world's leading companies report about 60 papers per year, while Japan's leading companies report about 10 papers per year (Fig. 1). Considering that publication in academic papers represents only part of the use of medical information in pharmaceutical companies, it can be inferred that medical information is used continuously and to a certain extent in all major companies in Japan and abroad. The main sources of the anonymized medical information used in the published research were insurance claims and medical records. The information was obtained mainly through collaboration with companies that handle medical information commercially or through joint research with medical institutions that manage medical information.
3-2. Countries from which information was obtained (use by the world's leading companies)
To begin, we identified the countries in which the medical information used by the world's leading companies was acquired. The top 10 countries are shown in Figure 2. Approximately 70% of all studies used information originating from the U.S., far exceeding that of other countries. Japan, Germany, and the United Kingdom followed. While the majority of studies used information from the U.S., information from Japan was used more often than information from any of the other countries.
To confirm the accessibility of Japanese medical information from overseas, we checked how foreign pharmaceutical companies use Japanese medical information. Author affiliations showed that 85% of the papers that used Japanese information included employees of the Japanese offices of the companies concerned among the authors. The majority of the remaining papers were studies that used the Common Data Model (CDM) developed by the Observational Health Data Sciences and Informatics (OHDSI) (Table 1) 7). With the CDM, information obtained from different sources can be converted into a common data format, allowing analyses that integrate multiple sources. These results indicate that when overseas pharmaceutical companies access Japanese information, they either conduct the research through or under the leadership of their Japanese sites, or they unify the data format with that of other countries and use the Japanese information in an integrated analysis. In addition, we checked the departmental affiliation of the authors of the papers that used Japanese information. Of the 28 papers that stated a departmental affiliation, 22 included among the authors employees of departments involved in medical affairs, such as Medical Affairs and Medical Science. Although this finding is limited because many of the papers did not state departmental affiliations, the use of information for the purpose of publication was often conducted with the involvement of these departments. As for information from other countries, 95% of the papers that used information from the U.S., the most frequently used country, included U.S.-based employees among the authors. The fact that many U.S.-based companies were included in the survey is thought to have influenced this result.
For information from Germany and the U.K., the two countries with the next largest numbers of uses after Japan, 57% and 37% of the papers, respectively, included employees based in the country concerned among the authors. In Europe, there were many cases in which a country's information was used as one component of a study integrating information from multiple countries. Thus, compared to Japan, there were many papers whose authors were employees at sites outside the country from which the information was obtained.
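The idea of converting differently structured sources into one common format before a pooled analysis can be sketched as follows. This is a minimal illustration only: the two source layouts, the field names, and the `CommonRecord` type are hypothetical, and the actual OHDSI CDM defines its own tables and standard vocabularies, which this sketch does not reproduce.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical common format shared by all sources."""
    source: str
    patient_id: str
    diagnosis_code: str
    diagnosis_date: date


def from_source_a(row):
    # Hypothetical source A: keys "pid", "icd10", "dx_date" (ISO date string).
    return CommonRecord(
        source="A",
        patient_id=str(row["pid"]),
        diagnosis_code=row["icd10"].upper(),
        diagnosis_date=date.fromisoformat(row["dx_date"]),
    )


def from_source_b(row):
    # Hypothetical source B: keys "member", "code", and separate date parts.
    return CommonRecord(
        source="B",
        patient_id=str(row["member"]),
        diagnosis_code=row["code"].upper(),
        diagnosis_date=date(row["year"], row["month"], row["day"]),
    )


# Invented example rows, not taken from any real database.
source_a = [
    {"pid": 1, "icd10": "i63", "dx_date": "2021-04-01"},
    {"pid": 2, "icd10": "I26", "dx_date": "2021-06-15"},
]
source_b = [
    {"member": "x9", "code": "I63", "year": 2022, "month": 1, "day": 20},
]

# Once every row is in the common format, one analysis covers both sources.
pooled = [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b]
records_per_code = Counter(rec.diagnosis_code for rec in pooled)
```

After conversion, the same counting step runs unchanged over both sources, which is the property that makes multi-country integrated studies practical.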
3-3. Countries from which information was obtained (use by major Japanese companies)
Next, we identified the countries in which the medical information used by the major Japanese companies was obtained. The top five countries from which the medical information used was obtained are shown in Figure 3. Approximately 52% of all studies used information derived from the U.S., followed by Japan, the U.K., Spain, and Germany. Even among Japanese pharmaceutical companies, a larger percentage of studies used U.S. information compared to Japanese information.
As for how major Japanese companies use medical information from other countries, 97% of the papers that used U.S. information had U.S.-based employees among the authors, and none of the surveyed papers were co-authored by employees of the company's Japan headquarters. In other words, when major Japanese pharmaceutical companies use U.S. medical information, the majority of the studies are conducted under the leadership of their U.S. sites. When information from the U.K., Spain, and Germany was used, the majority of studies were likewise authored by employees of local overseas sites. In a number of the studies that used European medical information, a country's information was used as one component of a study integrating information from multiple countries.
3-4. Intended use of information
To investigate how medical information was used in research, we examined the titles of articles from the world's leading companies. Word-level bi-grams (sequences of two adjacent words) were used to tabulate the frequency of expressions used in the titles (Table 2). The top expressions identified were "treatment pattern," "medical resource burden," and "economic burden." Such findings are thought to be used as information that contributes to maximizing the value of the company's products, formulating business strategies, and preparing implementation plans for clinical research. To confirm the distribution of the broad purposes of use, papers in which explicit terms were used in the title or abstract were tabulated (Table 3). These results also showed that the studies mainly aimed to understand diseases and the actual status of treatment.
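The bi-gram tabulation described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the titles are invented, and the tokenization rule (lower-casing, keeping hyphenated words together) is an assumption, since the survey does not describe its preprocessing.

```python
import re
from collections import Counter


def title_bigrams(title):
    """Return the adjacent word pairs of a lower-cased title."""
    # Assumed tokenization: letters and digits, hyphenated words kept whole.
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower())
    return list(zip(words, words[1:]))


def count_bigrams(titles):
    """Tabulate bi-gram frequencies over a collection of titles."""
    counts = Counter()
    for title in titles:
        counts.update(title_bigrams(title))
    return counts


# Hypothetical titles, invented for illustration only.
titles = [
    "Treatment patterns and economic burden in patients with heart failure",
    "Economic burden and treatment patterns of migraine in Japan",
    "Real-world treatment patterns in advanced melanoma",
]

counts = count_bigrams(titles)
top = counts.most_common(2)
```

A real tabulation would probably also normalize singular and plural forms (for example "pattern" and "patterns") before counting; that step is omitted here.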
Next, the diseases covered by the surveyed papers were investigated. Since the diseases depend on the products and development pipelines of each company, we present the top 10 disease categories as a reference value, given that the information in this survey is from a small number of companies (Figure 4). Although many of the pharmaceutical companies surveyed in this study conducted research on neoplasms, diseases of the circulatory system, and diseases of the nervous system, it is clear that medical information is used for a wide range of diseases, not limited to research on specific diseases.
3-5. Use in effectiveness evaluation
Among the uses of medical information, its use in regulatory applications is increasingly discussed 8). The FDA and other regulatory authorities have issued guidance on the use of medical information databases and similar sources in regulatory applications, and there is growing interest in the use of medical information in the development of drugs for rare diseases, for which clinical trials are difficult to conduct 9, 10). At present, academic papers contain relatively few cases in which such databases are used for efficacy evaluation. From this limited number of cases, we confirmed the details as a reference for future use of medical information.
From the papers surveyed, we extracted studies in which a control group was established to compare the efficacy of drugs and studies in which medical information was used as an external control, and confirmed the trends in each. Sixty-one of the 630 papers (9.7%) assessed whether a drug was effective by setting up, within the secondarily used medical information, both a group that used the drug of interest and a control group. As for the method of efficacy evaluation, 52 of the 61 studies (85.2%) conducted a time-to-event analysis, in which the time until an event occurs is compared between the two groups. Medical information for secondary use is generated only when treatment is provided at a medical institution, and in daily medical practice it is difficult to obtain information at defined points in time, as is done in clinical trials. This is thought to be one of the reasons why most of the evaluation methods are based on time to event. As for the diseases studied, the majority of the 61 reports fell into two disease areas: 22 (36.1%) were for diseases of the circulatory system, and 21 (34.4%) were for neoplasms. For diseases of the circulatory system, many studies used as the evaluation index a diagnosis of ischemic stroke or venous thromboembolism, for which validation studies are being conducted, while for neoplasms, many studies used death or the discontinuation or change of treatment. Since information obtained in routine medical care is not collected for the purpose of evaluating individual treatments, the endpoints used tend to be those less susceptible to differences in measurement methods among medical institutions, such as death and diagnoses of diseases for which general diagnostic definitions have been established.
Since there are some diseases for which efficacy evaluation of pharmaceuticals uses indicators that are not measured in daily medical care, it can be inferred that the number of diseases for which efficacy evaluation is possible only by secondary use of medical information is limited at present.
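To make the notion of a time-to-event comparison concrete, the following sketch estimates survival curves for two groups with the Kaplan-Meier product-limit method. It is an illustration only: the surveyed papers are not described at this level of detail, Kaplan-Meier is chosen here simply as one common time-to-event estimator, and all follow-up times are invented.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function.

    durations: follow-up time of each patient.
    observed:  True if the event occurred at that time,
               False if follow-up ended without the event (censored).
    Returns a list of (time, survival_probability) at each event time.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")

    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical follow-up times in months, invented for illustration.
treated_curve = kaplan_meier([5, 8, 12, 12, 20], [True, False, True, True, False])
control_curve = kaplan_meier([3, 4, 4, 9, 15], [True, True, False, True, True])
```

Censored patients contribute to the risk set until their follow-up ends, so irregular observation periods can be handled without discarding those patients. An actual study would additionally need to adjust for differences between the groups, which this sketch does not attempt.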
Next, we identified studies that used medical information as an external control, that is, studies in which only the control group was constructed from secondarily used medical information in order to evaluate the efficacy of a drug for which single-arm treatment outcomes had been obtained in a clinical trial or similar study. Only 8 of the 630 reports (1.3%) fell into this category. The disease area of all these studies was neoplasms, and all performed time-to-event analyses. When the sources of information were checked, 7 of the 8 reports used information from Flatiron in the United States. When medical information is used as an external control, in addition to the aforementioned limitations in evaluating efficacy, detailed disease-related information is required to select patients who meet the inclusion and exclusion criteria of the clinical trial. The difficulty of obtaining such detailed information solely from insurance claims, which were used in the majority of the papers surveyed, may be another reason for the small number of efficacy evaluation studies. The Flatiron database, by contrast, is constructed in a disease-specific manner, so information on factors affecting prognosis is also available. To evaluate efficacy through the secondary use of medical information, it is presumably important to obtain disease-specific information that is structured to include prognostic factors, disease subtypes, and information related to the efficacy and safety outcomes of drugs.
3-6. Characteristics of secondary use of information in each country
This section identifies the characteristics of the information used in each country. Since some papers state only that insurance claims or medical records were used, accurate figures could not be compiled. However, to understand the characteristics of the information, we introduce the most frequently used sources among those specified in the papers.
The three most frequently used sources of U.S. information were MarketScan by Merative (IBM), with 119 of 439 reports; Optum by UnitedHealth Group, with 118 reports; and Flatiron, with 56 reports. MarketScan and Optum each contain medical information on more than 100 million people, making them the largest sources of information available to pharmaceutical companies, which is thought to be one of the main reasons for their high frequency of use 13). In addition to medical records obtained through its disease-specific EHR system, Flatiron makes available information on genetic mutations, image-reading reports, and other sources 14). Flatiron has been used in regulatory filings, and its use has increased even though its disease area is limited to malignant tumors 15). In the U.S., in addition to private commercial databases, the Centers for Medicare & Medicaid Services (CMS), which operates Medicare and Medicaid, also maintains information available for secondary use, and some studies used these databases.
JMDC and Medical Data Vision were the two most frequently used sources of Japanese information, with 23 and 16 of 46 reports, respectively. Both sources are based on insurance claims, and information on around 30 million people is available. For information from Germany, InGef (Institute for Applied Health Research Berlin) was used in 10 of 37 reports 16), IQVIA in 9, and AOK PLUS in 5. IQVIA provides a variety of medical information in each country, but in Germany, a high percentage of the information used is from Disease Analyzer, which is derived from medical records. For information from the United Kingdom, the most frequently used sources were the Clinical Practice Research Datalink (CPRD), with 7 of 30 reports, and the Royal College of General Practitioners Research and Surveillance Centre (RCGP RSC), with 6 reports. The CPRD is information derived from medical records and provided by a public institution in the U.K.; it is characterized by its wide range of information, as it can be linked to laboratory data, death information, and other data 17). The RCGP RSC is also based on medical records collected from medical institutions 18). Compared with other countries, where most of the information used is derived from insurance claims, the sources used in the U.K. show a different tendency.
In summary, the most frequently used information is centered on insurance claims, covers patients comprehensively, and is versatile enough to be used for a wide variety of diseases and purposes. Such information is very useful for a wide range of purposes, such as understanding diseases and the actual status of treatment transitions. On the other hand, disease-specific information, which is not easily structured at this time, is often needed to understand efficacy and safety in detail. Flatiron in the U.S. contains various types of information, and its use is increasing because it can meet these demands. Based on these results, it can be inferred that two types of information are currently mainstream: general-purpose information that accumulates a large amount of easily structured information, and disease-specific information that contains detailed information that is difficult to structure.
4. Conclusion and Discussion
In this paper, we identified the characteristics of the medical information used by domestic and foreign pharmaceutical companies in published papers. The medical information most frequently used by both domestic and foreign pharmaceutical companies was information from the U.S., mainly from private commercial databases, which accounted for the majority of the information used by major foreign pharmaceutical companies. Information from private commercial databases in Japan was also used more often than information from any other country except the United States. Studies that used Japanese information mostly involved departments engaged in medical affairs. Details of the secondary use of medical information in medical affairs were published last year by the Committee on Drug Evaluation of the Japan Pharmaceutical Manufacturers Association, which reported that the largest number of companies used the information for fact-finding studies on disease epidemiology, prescription patterns, and similar topics 19).
Versatile information that can be used for research on a wide variety of diseases, mainly claims data, is being developed in many countries. Important elements of such information are thought to include the number of people covered, generalizability, and the traceability of patients. In Japan, the public and private sectors are currently each collecting such information for secondary use independently, and it will be necessary to pay attention to what role each will play in the future. International collaboration is also beginning to be considered for this type of information, which is relatively easy to structure. To obtain clinical findings that include Japanese information in overseas studies integrating information from multiple countries, it will be important to promote the conversion of information using the CDM.
Disease-specific information is being developed mainly in the United States for malignant tumors. In addition to information from Flatiron, which was frequently used, there were also cases in which information obtained from McKesson Specialty Health's iKnowMed EHR system was used. iKnowMed information has also been used in regulatory filings for drugs for malignancies. This shows that disease-specific information not included in claims data is also important when comparing the efficacy and safety of drugs 20). As mentioned above, whether efficacy evaluation through secondary use is suitable depends on the disease. For diseases such as malignant tumors and COVID-19, clinical trials use time-to-event analyses with death or hospitalization with oxygen supplementation as evaluation indices, and these indices can also be extracted from routine medical records, so the affinity with secondary use is high. On the other hand, diabetes mellitus, which is evaluated by routine clinical laboratory tests, and cognitive impairment, which is evaluated by cognitive function tests that are rarely performed more than once in routine medical care, will be difficult to evaluate by secondary use alone. When considering the use of information obtained in routine medical care in place of a prospective disease registry, it is important to evaluate substitutability for each disease.
The Japanese information that pharmaceutical companies use secondarily comes mainly from private commercial databases. With regard to the use of anonymized information, the Next Generation Medical Infrastructure Act, which came into effect in Japan in 2018, is often discussed, but the low number of uses under this system has been pointed out. Among the papers surveyed in this study, there was only one use case under this system, a paper with a Pfizer employee among its authors, but it is a major achievement that has been covered by several media 21). This research differs from the majority of other papers, which deal only with structured information, in that it applies natural language processing to unstructured information in medical records. The revision of the law last year made it possible to use pseudonymized medical information. As more detailed unstructured information becomes available in the future, this information is expected to differentiate itself from other databases that focus on claims data and to become more valuable in research aimed at the development of artificial intelligence, as in this study. Members of the MHLW working group also pointed out that private commercial databases are easier to use than public medical databases in clinical research, pharmaceutical company research, and similar work, and that public databases also need to improve ease of access to information and ease of data manipulation in analysis 22). In a previous issue of this newsletter, we noted the desirability of dividing roles between public organizations and the authorized creators under the Next Generation Medical Infrastructure Act with respect to the information each collects 23).
For the secondary use infrastructure of public medical databases now under consideration, it is important to clarify its positioning in terms of information coverage, the types of information to be collected, and the consolidation of information, taking into account private-sector information that is already being used, and then to consider developing the usage environment and expanding the information.
1) Number of reports and countries from which data was obtained
2)
3)
4) El Emam, Khaled; Rodgers, Sam; Malin, Bradley. Anonymising and sharing individual patient data. BMJ, 2015, 350.
5) Hiramatsu, Katsutoshi, et al. Current status, challenges, and future perspectives of real-world data and real-world evidence in Japan. Drugs-Real World Outcomes, 2021, 8.4: 459-480.
6) National Institute of Biomedical Innovation Policy, Current Status of Biobank (UK Biobank) Utilization, Policy Research Institute News No. 68 (March 2023) (in Japanese).
7)
8)
9)
10)
11)
12)
13)
14)
15) Pharmaceutical and Industrial Policy Research Institute, Utilization of Clinical Trial Control Groups in Clinical Development, Policy Research Institute News No. 58 (November 2019).
16) Andersohn, Frank; Walker, Jochen. Institute for Applied Health Research Berlin (InGef) Database. Databases for Pharmacoepidemiological Research, 2021, 125-129.
17)
18) Correa, Ana, et al. Royal College of General Practitioners Research and Surveillance Centre (RCGP RSC) sentinel network: a cohort profile. BMJ Open, 2016, 6.4.
19)
20)
21) Araki, Kenji, et al. Developing artificial intelligence models for extracting oncologic outcomes from Japanese electronic health records. Advances in Therapy, 2023, 40.3: 934-950.
22)
23) Pharmaceutical and Industrial Policy Research Institute, Next Generation Medical Infrastructure Act to be a better system, Policy Research Institute News No. 70 (November 2023).
