Points of View: Organizing Stored Information and Use Cases: Report of a Survey of Pharmaceutical Companies on Utilization of Public Databases, etc. (2)
Makoto Edahiro, Senior Researcher, Pharmaceutical Industry Policy Institute
Natsuko Watanabe, Senior Researcher, Pharmaceutical Industry Policy Institute
Mariko Togashi, Senior Researcher, Pharmaceutical Industry Policy Institute
SUMMARY
- Based on the same questionnaire survey as the previous report, "Report on Survey of Pharmaceutical Companies on Utilization of Public Databases, etc. (1) - Possibilities and Issues of Utilization" 1), this report introduces the specific research and analysis concepts that pharmaceutical companies envision as future use cases once the utilization of public databases and medical and other information expands.
- Before the analysis, the information stored in 11 public databases and in the databases certified under the Next Generation Medical Infrastructure Act was organized in order to raise public awareness of these databases.
- The disease area most often envisioned in the use cases was "anti-cancer agents," but cross-disciplinary research not limited to a specific disease was also emphasized.
- Regarding the types of public databases, 74.5% of responses indicated a wish to link and use information from multiple databases, with combinations involving the NDB particularly preferred.
- For pharmaceutical companies, linking information from multiple public databases and analyzing it in an integrated way should make it possible to analyze the effectiveness of new treatments and patient profiles, to advance research on personalized medicine that optimizes treatment strategies, to strengthen post-marketing safety monitoring, and to build evidence efficiently. Progress is expected in many other areas as well.
- To promote the secondary use of medical and other information, legal arrangements and public understanding are necessary, and it is important for pharmaceutical companies to use the information actively and to communicate its value through concrete success stories.
1. Introduction
The use and provision of pseudonymized information from public databases held by the Minister of Health, Labour and Welfare and others, and its linked analysis with other pseudonymized information and with pseudonymized medical information under the Next Generation Medical Infrastructure Act, are currently under deliberation in the Diet. Once the bill is enacted and the related laws and regulations are amended, these uses are expected to become institutionally feasible2). In response to these legal developments, the Pharmaceutical Industry Policy Institute conducted the "Questionnaire Survey on the Use of and Intentions toward Public Databases, etc." in 2025 to clarify the actual status of, and issues in, the use of public databases, etc. by pharmaceutical companies. In the previous report (1), we presented the survey results with a focus on the "possibilities and challenges of utilization," analyzing the institutional and operational barriers as well as expectations for future use.
In this report, we summarize the public databases that the government is considering linking in the future and, based on the survey results, introduce the future use cases and the specific research and analysis concepts that pharmaceutical companies envision as the use of public databases and medical information expands.
In particular, we hope that visualizing practical images of use, based on free descriptions of which databases pharmaceutical companies would like to use, in which areas, and for what purposes, will support examination of how medical and other information could be used inside and outside pharmaceutical companies, as well as concrete discussion of the future design and operation of the system for the secondary use of medical information.
To clarify the intent of the free descriptions in the questionnaire, the Pharmaceutical Industry Policy Institute added and supplemented some information related to the use cases. In addition, because the number of companies that described use cases is limited, this report should be read as a collection of case studies.
2. Outline of each public database
This chapter briefly summarizes the main information items stored in each public database (11 databases in total), based on publicly available information from the Ministry of Health, Labour and Welfare (MHLW) and other sources, to provide an overview of each database and to raise public awareness. The list also includes databases certified under the Next Generation Medical Infrastructure Act, as well as databases currently under construction or scheduled for construction. A table listing all the databases is provided as supplementary material (Supplement 1).
The contents are based on information published on various websites and elsewhere. Note that the details of the system and its operation, the scope of data that can be provided, and the competent authorities may change as the system develops and further information is released, and some of the information may therefore be inaccurate.
2.1. Public databases
Anonymous database of information related to medical insurance (hereinafter referred to as "NDB")
(Competent authority: Medical Care Coordination Policy Division, Insurance Bureau, Ministry of Health, Labour and Welfare)
This is a public database established under the "Act on Securing Medical Care for the Elderly" that centrally manages information related to medical insurance. Third-party provision of claims (receipt) information and other data began on a trial basis in FY2011, and full-scale provision began in FY2013 3). The database stores patient information and information on medical practices (medical, DPC, dispensing, dental, and home-visit nursing) derived from medical fee statements. It also contains information on specific health checkups and specific health guidance conducted by checkup institutions (examinee information, checkup results, level of health guidance, etc.). In addition, since April 2023 the database has been accumulating death information (date and time of death, cause and type of death, etc.) collected from death notifications and death certificates, with a system in place to store this information annually around December of the following year (Figure 1, Table 1) 4). The database is highly comprehensive, covering almost all insured medical care provided throughout Japan, and is therefore highly representative of the entire national population. As of March 2024, it contained approximately 26.5 billion claims for medical care from 2009 to 2023 5).
Electronic medical record information sharing service database (hereinafter referred to as "electronic medical record information DB")
(Competent authority: Office of the Counselor for Medical Information, Medical Policy Bureau, Ministry of Health, Labour and Welfare)
This public database, currently under construction, manages electronic medical record information so that it can be shared among medical institutions nationwide6). It is positioned as one of the core components of the "Nationwide Medical Information Platform" and is also a pillar of the government's medical DX (digital transformation) policy. The database will share three documents ("health checkup result reports," "medical information forms," and "discharge summaries") and six types of information ("injuries and diseases," "infectious diseases," "drug allergies," "other allergies," "tests," and "prescriptions") (Tables 2 and 3) 7).
Anonymous medical treatment-related information database (hereinafter referred to as "DPCDB")
(Competent authority: Medical Care Division, Health Insurance Bureau, Ministry of Health, Labour and Welfare)
This public database manages medical information and reimbursement data from hospitals participating in the Diagnosis Procedure Combination (DPC) system, a government-established payment system for inpatient fees under the Health Insurance Act. As of June 1, 2025, there were 1,761 DPC hospitals in Japan8). The database stores information on the condition of hospitalized patients from admission to discharge, the treatments provided, the drugs used, and the fee points claimed, and is particularly suited to research on serious illnesses that often require hospitalization at large hospitals9). The stored information is shown in Table 4.
Anonymous database of information related to long-term care insurance, etc. (hereinafter referred to as "long-term care DB")
(Competent authority: Geriatric Health Division, Geriatric Health Bureau, Ministry of Health, Labour and Welfare)
Based on the Long-Term Care Insurance Act, this public database centrally manages long-term care benefit statements (long-term care receipts), information on certification of long-term care needs, and various information on long-term care services. The main information stored can be broadly classified into three types: long-term care receipts and related information (Table 5), information on certification of long-term care needs (Table 6), and LIFE information (Table 7). LIFE information is information on the condition of older people and the content of their care, registered in the scientific long-term care information system (LIFE), which manages the condition of long-term care service users and the planning and content of care provided at long-term care facilities and service providers10).
Municipal medical checkup information database (hereafter, Municipal medical checkup DB)
(Competent authority: Health Division, Health and Welfare Bureau, Ministry of Health, Labour and Welfare)
This public database is scheduled to be constructed in the future. Its purpose is to centrally manage and use the results of the various health checkups conducted by local governments nationwide, together with related information. Like the electronic medical record information DB, it is positioned as one of the core components of the "Nationwide Medical Information Platform," which aims to establish a system that enables necessary information to be shared mutually between checkup information held by local governments and medical information held by medical institutions, pharmacies, and others. The municipal checkups to be covered will include cancer screening (stomach, cervical, lung, breast, and colorectal cancer), osteoporosis screening, periodontal disease screening, hepatitis virus screening, and others. The checkup information will be reviewed and standardized from a system perspective, and linkage will begin with consideration for the progress of standardization of municipal systems11).
National Cancer Registry Database (hereafter, National Cancer Registry DB)
(Competent authority: Division of Cancer and Disease Control, Health and Lifestyle Division, Ministry of Health, Labour and Welfare)
Based on the "Act on the Promotion of Cancer Registries, etc.," this public database centrally manages information on cancer patients nationwide12). All medical institutions that diagnose cancer are legally required to notify the prefectural governor of information on the patients concerned, and the information collected by the prefectures is reported to the national government and managed in an integrated manner as the National Cancer Registry. The notified items include basic patient information; tumor information (primary site, basis of diagnosis, how the tumor was discovered, extent of disease (pre- and postoperative pathology), etc.); initial treatment information (chemotherapy, etc.); and information as of notification (date of death, etc.) (Table 8). This enables the collection and analysis of information on disease progression and survival rates, in addition to patient counts13).
Database of patients with designated intractable diseases (hereafter, Intractable Disease DB)
(Competent authority: Intractable Disease Control Division, Health and Lifestyle Health Bureau, Ministry of Health, Labour and Welfare)
Based on the "Act on Medical Care for Patients with Intractable Diseases (Intractable Disease Act)," this public database centrally manages medical information on patients with designated intractable diseases. As of April 1, 2025, there were 348 designated intractable diseases14). The database registers the information recorded in the "Individual Clinical Investigation Form" prepared by designated physicians, with the aim of promoting research and development and improving the quality of medical care. This form contains six main types of information (Figure 2): in addition to basic patient information such as name and date of birth, it records the diagnostic criteria and symptom severity information needed to review eligibility for medical expense subsidies, as well as clinical and laboratory findings used in research15).
Database of children with chronic diseases of childhood (hereinafter referred to as "chronic childhood diseases DB")
(Competent authority: Intractable Disease Control Division, Health and Lifestyle Health Bureau, Ministry of Health, Labour and Welfare)
Based on the Child Welfare Act, this public database manages information on patients with chronic childhood diseases. As of April 1, 2025, 801 diseases were covered16), encompassing almost all chronic diseases that occur in childhood. It is a world-class pediatric disease registry17), and information is registered based on the "medical opinion forms" prepared by designated physicians. These forms contain eight types of information (Figure 3): in addition to basic patient information such as name and date of birth, they record clinical findings, laboratory findings, clinical course, and future treatment plans18).
Database of anonymous information on infectious diseases (hereafter, iDB)
(Competent authority: Infectious Disease Control Division, Department of Infectious Disease Control, Health and Lifestyle Hygiene Bureau, Ministry of Health, Labour and Welfare)
Based on the "Act on the Prevention of Infectious Diseases and Medical Care for Patients with Infectious Diseases (Infectious Diseases Control Act)," this public database centrally manages information on infectious disease outbreaks and patients, collected by the government from physicians' notifications (outbreak notifications)19). It contains the information needed to understand the epidemiology of infectious diseases, including notification details (date of onset, date of diagnosis, etc.), symptoms, diagnostic methods, routes and causes of infection, and patient information (Figure 4). The framework for providing this information to third parties was established by the amendment of the Act in April 2024; as of April 2024, only information on novel coronavirus infection (COVID-19) was available20).
Database of information related to vaccinations, etc. (hereinafter referred to as "vaccination DB")
(Competent authority: Immunization Division, Department of Infectious Disease Control, Health and Welfare Bureau, Ministry of Health, Labour and Welfare)
This public database, currently under construction, is being developed under the "Immunization Act" and is intended to enable the surveys and research needed to improve the effectiveness and safety of immunizations21). It will store information on vaccinations recorded and kept by local governments (date of birth and sex of the vaccinated person, date and place of vaccination, type of vaccine administered, etc.) and information on suspected adverse reactions reported by physicians and others (vaccine type, manufacturer/distributor name, lot number, number of doses, main symptoms, date of onset, etc.) (Figure 5).
Database of welfare services for persons with disabilities (hereafter, disability welfare DB)
(Competent authorities: Planning Division, Disability Health and Welfare Department, Social and Support Bureau, Ministry of Health, Labour and Welfare; Support Division for Children with Disabilities, Support Bureau, Children and Families Agency)
Based on the Comprehensive Support for Persons with Disabilities Act and the Child Welfare Act, this public database centrally manages the status of welfare services provided to persons and children with disabilities, as well as information on the certification of support categories. It contains data on disability support category certification (certification information), data on statements of benefits for disability welfare services (disability welfare receipt information), and ledger data (ledger information) (Figure 6) 22).
2.2. Certified databases
Database of certified providers of the Next Generation Medical Infrastructure Act (hereinafter referred to as "Next Generation DB")
This is not a public database held directly by the Minister of Health, Labour and Welfare but a certified database owned and operated by private entities certified under the Next Generation Medical Infrastructure Act. As of the end of April 2025, the certified producers of anonymized processed medical information were the Life Data Initiative (LDI), the Japan Medical Information Management Organization of the Japan Medical Association (J-MIMO), and the Foundation for the Promotion of Fair Utilization of Anonymized Processed Medical Information (FAST-HDJ). Although the information collected differs slightly among certified entities, as of the end of May 2025 information on approximately 4.99 million patients had been collected from 153 cooperating medical institutions nationwide, mainly acute care hospitals23). The stored information includes claims information such as patient information and medical practices, DPC data such as admission and discharge information and diagnosis group classification, structured data such as prescriptions and test results, and unstructured data such as progress notes and clinical summaries (Table 9).
3. Survey method
3.1. Survey Overview
The survey covered 74 member companies of the Pharmaceutical Evaluation Committee of the Japan Pharmaceutical Manufacturers Association (JPMA) (in principle, up to five divisions per company: Research, Clinical Development, Post-Marketing Safety, Medical Affairs, and Health Economics and Outcomes Research) and was conducted from March 14 to April 14, 2025. This report is based on the "Questionnaire Survey on the Use of and Intentions toward Public Databases, etc." conducted by the Pharmaceutical Industry Policy Institute; details of the survey are described in the methodology section of the previous report.
3.2. Respondents analyzed in this report
As indicated in the previous report, a total of 139 valid responses were received. This report analyzes the use-case trends of 95 of these 139 responses, namely the 24 that answered "strongly intend" and the 71 that answered "intend" to use public databases in future research and analysis activities. Up to three use cases could be entered per department, and responses were tabulated by department rather than by company (Table 10).
4. Results
4.1. Areas of interest for research and analysis using public databases
This section analyzes the disease areas covered by each use case, based on the survey item "Specific research/analysis concepts you would like to work on in the future using public databases, etc." Because the survey limited responses to this item to a maximum of three per respondent, the results may not reflect all intentions and should be interpreted with care.
The most frequently cited "area of research/analysis that you would like to work on in the future using public databases, etc." was "anti-cancer agents," selected by 43.2% of all respondents, followed by "systemic anti-infectives (including vaccines)" (18.9%), "cardiovascular agents" (15.8%), and "nervous system agents" (14.7%). In addition, 30.5% of respondents chose "No specific disease area (basic research, cross-disciplinary research, etc.)," indicating interest in cross-disciplinary and structural research themes that do not depend on a specific disease. A wide variety of other areas were also entered freely under "Other," including rare diseases, pediatrics, and designated intractable diseases (Figure 7).
4.2. Types of public databases, etc. you intend to utilize in each use case
Next, we analyzed the types of public databases that respondents expected to use in each use case.
For this question, respondents were asked to select all applicable databases if they expected to use several in combination. The single largest group (25.5%) selected only one database (stand-alone use), followed by three databases (21.7%), two (18.6%), and four (18.6%), indicating that many respondents wanted to link and use information from multiple databases (Figure 8). In total, about 74.5% of responses involved linking information from two or more databases, far exceeding stand-alone use.
A breakdown of the combinations of public and other databases (Table 11) showed a particularly large number of two-database combinations centered on the NDB, such as "NDB, electronic medical record information DB," "NDB, Next Generation DB," and "NDB, DPCDB." Combinations of three or more databases also appeared in multiple responses, with "NDB, electronic medical record information DB, and Next Generation DB" and "NDB, electronic medical record information DB, and DPCDB" cited frequently.
Many respondents also wished to use disease-specific databases such as the National Cancer Registry DB and the Intractable Disease DB. Databases handling community-based medical and long-term care information, such as the Municipal Medical Checkup DB and the Long-term Care DB, were also mentioned as candidates for linkage.
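As a quick arithmetic check on these shares, the minimal sketch below recomputes the linked-use figure from the percentages reported above. It assumes that the share not listed in the text (the remainder) corresponds to use cases linking five or more databases; that reading is ours and is not stated in the survey results. The helper name `linked_share_summary` is hypothetical.

```python
# Share of use cases by number of databases to be linked (percent),
# as reported in the text for Figure 8.
REPORTED = {1: 25.5, 2: 18.6, 3: 21.7, 4: 18.6}

def linked_share_summary(dist):
    """Return (linked share, listed multi-DB share, unlisted remainder)."""
    # Everything that is not stand-alone use involves two or more databases.
    linked = round(100 - dist[1], 1)
    # Shares explicitly reported for two, three, and four databases.
    listed_multi = round(sum(v for k, v in dist.items() if k >= 2), 1)
    # Assumption: the unreported remainder is use cases linking five or more.
    remainder = round(100 - sum(dist.values()), 1)
    return linked, listed_multi, remainder

linked, listed_multi, remainder = linked_share_summary(REPORTED)
print(linked, listed_multi, remainder)  # 74.5 58.9 15.6
```

The explicitly listed multi-database shares (58.9%) plus the unlisted remainder (15.6%) add up to 74.5%, so the reported total is consistent with the single-database share of 25.5%.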
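Table 11 tallies combinations of databases across responses. Assuming each response is recorded as the set of databases selected, such a tally could be produced along the lines of the minimal sketch below. The responses and the abbreviation "EMR DB" (for the electronic medical record information DB) are hypothetical illustrations, not survey data. The sketch also contrasts exact-combination counts with pairwise co-occurrence counts, which answer a slightly different question.

```python
from collections import Counter
from itertools import combinations

# Hypothetical responses for illustration only (NOT survey data):
# each set holds the databases selected for one use case.
responses = [
    {"NDB", "EMR DB"},
    {"NDB", "Next Generation DB"},
    {"NDB", "EMR DB", "DPCDB"},
    {"National Cancer Registry DB"},
]

def tally_combinations(responses):
    """Count each exact combination of two or more databases."""
    return Counter(tuple(sorted(r)) for r in responses if len(r) >= 2)

def tally_pairs(responses):
    """Count how often each pair of databases appears together in a response."""
    pairs = Counter()
    for r in responses:
        pairs.update(combinations(sorted(r), 2))
    return pairs

combos = tally_combinations(responses)
pairs = tally_pairs(responses)
# Number of linked-use responses whose combination includes the NDB
ndb_combos = sum(n for combo, n in combos.items() if "NDB" in combo)
print(ndb_combos)                 # 3
print(combos[("EMR DB", "NDB")])  # 1: the exact two-database combination only
print(pairs[("EMR DB", "NDB")])   # 2: also counted inside the three-database response
```

Exact-combination counts keep "NDB, EMR DB" separate from "NDB, EMR DB, DPCDB," whereas pair counts show how often two databases are wanted together in any combination.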
4.3. Purposes of using public databases by sector
In this section, we examined the free responses to the survey item on "the purpose and content of research and analysis you would like to conduct in the future" and organized them into common themes and categories. In particular, the "purpose of using public databases, etc." was categorized by sector. Here, "purpose of use" describes the needs and issues that public databases, etc. will be used to address in the future, and is intended to clarify the aims behind each use case (Table 12).
In the Research sector, the use cases were grouped into four categories: "analysis of social demand for medical care," "investigation of unmet needs and treatment status," "disease statistics and analysis," and "evaluation of treatment and prognosis." Major examples included "analyzing the number of potential patients with unmet medical needs," "identifying drug targets based on patient populations predicted using AI," and "assessing and calculating the medical and health-economic benefits of preventive and early-onset interventions." For specific age groups, respondents also mentioned "advancing pediatric drug development research" and "analyzing medical information on the elderly that is not covered by drug market analysis companies."
In the Clinical Development sector, the use cases were grouped into four categories: "clinical trials and treatment strategies," "approval and submission processes," "disease distribution and data analysis," and "long-term evaluation and prognosis." Major examples included "determining the number of patients when planning clinical trials," "evaluating the efficacy of vaccines and infectious disease drugs," "analyzing treatment status by cancer stage and prognosis by cancer type," and "analyzing the natural history of rare and intractable diseases." More specific conditions and topics included "identifying patients with cardiovascular and renal conditions (dilated heart failure/systolic heart failure, acute myocardial infarction, cardiovascular disease with chronic kidney disease) and metabolic and endocrine conditions (type 2 diabetes, obesity, high BMI, metabolic dysfunction-associated steatohepatitis) and analyzing their treatment history," "identifying the groups of institutions where patients with these metabolic and endocrine conditions are treated and analyzing their treatment history," "relapse status and changes in prescribed drugs in patients with ulcerative colitis," and "long-term follow-up studies of the risk of colorectal cancer and other related diseases."
In the Post-Marketing Safety sector, the use cases were grouped into four categories: "signal evaluation and causality studies," "epidemiological studies and risk factor analysis," "post-marketing database studies and risk assessment," and "pharmacovigilance and monitoring." Major examples included "analysis of relatively mild gastrointestinal disorders, adverse reactions to which it is difficult to assign a diagnosis code," and "analysis of how treatment status, concomitant medications, medical history, and other factors affect the outcomes of serious skin disorders caused by antineoplastic agents."
In the Medical Affairs sector, the use cases were grouped into three categories: "evaluation of drugs and evidence generation under real-world clinical conditions," "research on rare and intractable diseases," and "cross-sectional analysis of medical care, long-term care, and immunization." Major examples included "evaluation of outcomes of treatment with antineoplastic agents," "analysis of the prevalence of infectious diseases and regional differences in vaccination status," and "research on the effectiveness and cost-effectiveness of immunization." Other topics included "patient journey research in pediatric rare diseases, exploring the time to diagnosis and the course of treatment" and "analysis of prognosis and of long-term care and disability welfare services for patients with mental disorders."
In the Health Economics and Outcomes Research sector, the studies were grouped into three categories: "evaluation of outcomes of treatment and preventive interventions," "evaluation of economic impact including medical and long-term care," and "analysis of actual prescribing and treatment." Major examples included "understanding actual medical costs and medical resource use by disease type and stage of progression in the cardiovascular field," "evaluating the effect of immunization on disease prevention," and "cost-effectiveness analysis that also takes long-term care costs into account."
Three sectors (Clinical Development, Medical Affairs, and Health Economics and Outcomes Research) shared an interest in visualizing the "patient journey," which captures each patient's course over time.
4.4. Details of use cases by sector
In this section, we examined the free responses to the item "databases, specific data contents, and information you would particularly like to use," and organized specific use cases by sector under five headings: disease area, expected effects, use case description, databases to be linked, and data items desired when linking databases.
The use cases were selected on the assumption that data from multiple databases would be linked, and the Pharmaceutical Industry Policy Institute combined, added to, and supplemented cases with similar objectives and intentions where possible. However, there are limits to organizing free-text responses, and the granularity of the listed data items varies to some degree. "Expected effects" describes the results or improvements the research is expected to yield, and "use case description" describes what will specifically be done.
For the Research sector, the main use case listed (Table 13) is to "identify unmet medical needs and analyze the number of potential patients, the number of patients by severity, the current treatment rate, and prognosis" in basic and cross-disciplinary research using information from seven databases. This is expected to enable "the selection of new drug candidates, evaluation of the need for new treatments, and verification of their effectiveness" across a wide range of areas. Notably, respondents listed a broad range of data items they would like to use, including "items related to the patient's medical condition," "items related to disease severity," "findings such as treatments and laboratory values," and "research information," in addition to "basic patient information."
For the clinical development sector, there are three main use cases (Table 14). The first is "information collection during development planning" in the area of anti-cancer drugs, where information from four databases is used to "ascertain the number of target patients, identify candidate patients for inclusion, ascertain treatment history and actual treatment status, select facilities, consider inclusion criteria, and consider establishing an external control group. The cases in which the "development plan" is implemented are listed below. By collecting this information, "optimization and efficiency in development planning" is expected. In particular, regarding the electronic medical record information DB, there were requests for the disclosure of "data on drug prognosis" and "image data" as detailed information on treatment. Although the data presented here are for the area of antineoplastic agents, there were also requests for use in other areas, indicating the potential for use in a wide range of fields.
The second case, in the area of systemic anti-infective agents, uses information from four databases as a "basis for research to develop new treatments and vaccines," analyzing information ranging from vaccination status through the occurrence and treatment of infectious diseases to death (outcome) in order to "evaluate the efficacy of vaccines and drugs to treat infectious diseases." From the NDB, respondents requested "data on otorhinolaryngological symptoms (olfactory and taste disorders after infection)" in order to analyze the incidence, course, and treatment status of sequelae after COVID-19 infection.
The third, in the "Other" area (rare pediatric diseases), is "use as a control group in clinical trials," drawing on information from three databases. This is expected to "enable control groups to be established in disease areas where comparative studies are difficult to conduct." Respondents requested disclosure of "information on clinical endpoints (test items, etc.)," "outcomes (death information)," "drug information," and other items needed for clinical efficacy and safety evaluation.
For the post-marketing safety sector, there are two main use cases (Table 15). The first, in the anti-cancer drug area, is "research to investigate potential adverse events of drugs," in which information from three databases is expected to enable "early identification and risk assessment of adverse events related to anti-cancer drugs." The specific data items requested were "external causes of death (accidents, suicide, violence, etc.)," which the drugs could influence, and "types and causes of death," which could reveal risks the drugs may pose.
The second, also in the anti-cancer drug area, is "a study of trends over time in descriptive statistics for a treated patient population." Making broad use of information from five databases is expected to "contribute to improving disease management and treatment policies." Respondents listed a wide range of detailed data items they wanted to use, including "basic patient information (age in months and days, race, education, annual household income)," "clinical data (severity of illness, blood pressure, temperature, weight, intraocular pressure)," "treatment-related information (treatment policy at each hospital)," and "time data (in hours, minutes, and seconds)." A wide range of detailed data items was thus characteristic of the post-marketing safety sector.
For the medical affairs sector, there are two main use cases (Table 16). The first, in the area of rare pediatric diseases, uses information from three databases to "visualize the patient journey from initial diagnosis to confirmation of diagnosis and start of treatment." This is expected to provide evidence that promotes early diagnosis and appropriate intervention in rare pediatric diseases and to support diagnosis and treatment decisions in clinical practice. The information sought included "time of onset," to understand medical history and clinical course; "course," to understand treatment efficacy and disease characteristics; and "laboratory findings," to provide a basis for diagnosis.
The second, in the area of systemic anti-infective agents (including vaccines), uses information from three databases to "track and evaluate over the long term the relationship between vaccination status and infectious disease incidence, disease severity, medical and nursing care costs, and incidence of sequelae, as well as regional differences in vaccination coverage and disease prognosis." This is expected to clarify the effects of vaccination and raise awareness of vaccination.
For the Health Economics and Outcomes Research sector, one of the main use cases is to use information from five databases in the area of systemic anti-infective agents (including vaccines) to "evaluate the long-term impact of vaccination on disease onset, death and sequelae, and medical and nursing care costs, and to analyze actual treatment status and patient progress" (Table 17). Such analysis is expected to "quantify the impact of immunization on reducing medical and nursing care costs and improving patient outcomes, and to evaluate its cost-effectiveness." The specific data items cited included "date of vaccination and type of vaccine," "diagnosis and prescription details," and "death information."
5. Summary and Discussion
5.1. Areas of research and analysis to be covered by the use of public databases
Although the number of respondents and responses in the survey was limited, the area most frequently cited for future use of public databases, etc. was "anti-cancer drugs" (43.2%), followed by "systemic anti-infective agents (including vaccines)" (18.9%).
These results suggest that public databases, etc. are attracting attention as a means of researching new approaches and treatments in cancer care. In particular, the National Cancer Registry DB holds cancer information collected nationwide, and expectations appear to be growing that it will enable a variety of analyses in cancer research.
In addition, the recent experience of the COVID-19 pandemic may have reaffirmed the importance of research and development of rapid and effective treatments for infectious diseases.
On the other hand, 30.5% of respondents answered "no specific disease area (basic research, cross-disciplinary research, etc.)," indicating a certain orientation toward cross-disciplinary, structural research that does not depend on specific diseases. Such research may help elucidate mechanisms common to multiple diseases and discover new treatments. In addition, broad knowledge accumulated through basic research that is not tied to particular diseases is expected to lead to innovative therapies in the future.
The "Other" category included a wide range of diseases, such as rare diseases, pediatric diseases, and designated intractable diseases. This shows that pharmaceutical companies are interested in using public databases across many diseases and underscores the importance of research and development for unmet medical needs. In rare diseases, pediatric diseases, and designated intractable diseases, the small number of patients and the need for specialized knowledge limit the number of researchers, making the actual situation difficult to grasp fully. Accumulating and analyzing data in these areas is expected to lead not only to new treatments but also to findings that directly improve the quality of healthcare and, in particular, patients' quality of life.
5.2. Types of public databases, etc. planned for use in each use case
The survey revealed a significant need to combine and use multiple databases. In particular, combinations with the NDB appeared in many cases, suggesting that the NDB is highly valued as a comprehensive information base covering the entire population. Combining the NDB with information from other databases makes it possible to obtain detailed patient profiles and treatment courses that are difficult to capture with the NDB alone, and to analyze diseases from a comprehensive perspective.
There is also strong interest in databases specialized for specific diseases, such as the National Cancer Registry DB and the Intractable Disease DB, and obtaining and analyzing detailed information in these disease areas is expected to help solve disease-specific issues. For example, using the National Cancer Registry DB together with other databases will make it possible to collect and analyze detailed medical information on cancer patients, the relationship between cancer and other diseases and complications, the medical economics of cancer treatment, and cancer patients' use of care services. Greater use of these disease-specific public databases will increase the likelihood of contributing to effective treatments grounded in real-world conditions.
Many respondents also wished to use databases that handle community-based medical and nursing care information, such as municipal health checkup databases and nursing care databases. Understanding regional health issues and disease trends through such information may contribute to planning and implementing measures tailored to regional characteristics, such as optimizing regional medical care and strengthening preventive medicine.
5.3. Purpose and details of use of public databases by sector
In the research sector, statistical data on unmet needs and diseases are expected to be used to explore research themes for new drugs and treatments. Specifically, the data are envisioned to be used to identify issues such as the insufficient effectiveness of current treatments for specific intractable diseases and patients' quality of life, to determine how widely treatments are used at each level of disease severity, and to evaluate which treatment methods yield better outcomes. In such exploratory studies, information recorded as unstructured data can support more sophisticated patient profiles and more accurate predictions, which is considered essential for advancing research on personalized treatment. Disclosure of information on the elderly is particularly important because they often have multiple chronic diseases and are prone to polypharmacy, the problem of taking many drugs at the same time. Overall, the responses suggest that integrated use of medical and other information could help the research sector set research themes grounded in social and clinical needs.
In the clinical development sector, diverse perspectives were observed from drug development to improvement of treatment methods and follow-up of patients. It is expected to reduce the time required for planning clinical trials, improve the efficiency of decision-making for conducting clinical trials in Japan, and increase the probability of success through more precise planning of clinical trials. In addition, understanding the number of eligible patients directly leads to the promotion of recruitment of clinical trial participants, and understanding the natural history of diseases is essential for evaluating the efficacy and safety of new drugs. This will enable the development of highly reliable evidence.
Use of NDB "receipt information" (health insurance claims data) and electronic medical record information DB "patient information and treatment details" is important for understanding the number of eligible patients and identifying candidates for enrollment. The receipt information would be used to understand the background and actual treatment of patients who received care, and the detailed treatment information would be used to gather evidence for improving treatment methods. Combining these sources is expected to add multiple perspectives to clinical trial planning and the evaluation of treatment methods.
Furthermore, by utilizing information from the NDB "ENT-related symptoms" and the electronic medical record information DB and iDB "test results," it will be possible to evaluate the effects of certain infectious diseases, the prevalence of infectious diseases, and the effectiveness of vaccines, suggesting the high potential for the utilization of this information. In addition, some are considering the evaluation of clinical efficacy of treatment methods, and it is thought that the use of "physician evaluation," "outcome," and "image data" in addition to "laboratory findings" will enable the evaluation of treatment efficacy based on more medical indicators. In order to promote the utilization of medical and other information, there is an urgent need for the early release of such medical information and the development of an access system, and further enhancement of the data utilization environment is strongly desired.
In the post-marketing safety sector, the role of assessing risks associated with drug use and ensuring patient safety was again highlighted. Across the stated purposes of use (signal evaluation and causality assessment, epidemiological studies and risk factor analysis, post-marketing database studies and risk assessment, and pharmacovigilance and monitoring), a common objective emerged: to identify potential adverse events of drugs and to establish a basis for responding to them appropriately.
In particular, in studies for early identification and risk assessment of adverse events related to antineoplastic agents, it is expected that the utilization of information from the three public databases and other sources will make it possible to identify risks of pharmaceutical products and take appropriate countermeasures. Specifically, by collecting details of a patient's treatment history and administered drugs from NDB, it will be possible to clarify the use of specific antineoplastic agents and develop data that will serve as a basis for treatment. In addition, by combining the electronic medical record information DB, detailed data on the duration and dosage of that prescribed antineoplastic agent can be supplemented to accurately track the timing and form of adverse events. Furthermore, by utilizing the National Cancer Registry DB, it will be possible to precisely identify the type and frequency of potential adverse events caused by a particular antineoplastic agent based on information on the pathological diagnosis and stage of the tumor. Thus, it is thought that comprehensive evaluation of both the risk of adverse events and treatment efficacy will be possible through integrated use of the three types of public databases.
In the medical affairs sector, responses suggested that public databases may serve not only for data analysis but also for generating evidence at the interface of clinical practice, government administration, and research. In particular, research on visualizing the patient journey is considered highly significant as a way to understand the chronological progression from initial diagnosis to confirmation of diagnosis and initiation of treatment, and to clarify the factors that delay diagnosis and treatment. To achieve such visualization, it is effective to use information such as "clinical findings, laboratory findings, and age of onset" from the Intractable Disease DB and the Pediatric Chronic Specified Disease DB in addition to "receipt information" from the NDB, enabling more precise tracking of patient progress and of the turning points in diagnostic decisions.
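To make the patient-journey idea more concrete, the following is a minimal Python sketch, assuming that records from the NDB and the disease-specific databases could be linked through a common pseudonymized key. The class `JourneyRecord`, its fields, and the helper functions are hypothetical illustrations, not the schema or method of any actual database; the sketch simply computes, per patient, the days from first visit to confirmed diagnosis and to treatment start, and summarizes each delay as a median.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import List, Optional


@dataclass
class JourneyRecord:
    """One patient's milestones after a hypothetical cross-database linkage."""
    patient_key: str                     # hypothetical pseudonymized linkage key
    first_visit: date                    # e.g. first related claim in an NDB-like source
    diagnosis_confirmed: Optional[date]  # e.g. from a disease-registry-like source
    treatment_start: Optional[date]      # e.g. first relevant prescription


def journey_intervals(rec: JourneyRecord) -> dict:
    """Days from first visit to confirmed diagnosis and to treatment start."""
    def days_since_first_visit(milestone: Optional[date]) -> Optional[int]:
        # None means the milestone has not been observed for this patient
        return None if milestone is None else (milestone - rec.first_visit).days

    return {
        "patient_key": rec.patient_key,
        "days_to_diagnosis": days_since_first_visit(rec.diagnosis_confirmed),
        "days_to_treatment": days_since_first_visit(rec.treatment_start),
    }


def median_delay(records: List[JourneyRecord], field: str) -> Optional[float]:
    """Median of one delay field, skipping patients who lack that milestone."""
    values = [journey_intervals(r)[field] for r in records]
    values = [v for v in values if v is not None]
    return median(values) if values else None


if __name__ == "__main__":
    # Toy records invented purely for illustration.
    records = [
        JourneyRecord("p1", date(2023, 1, 10), date(2023, 7, 10), date(2023, 8, 1)),
        JourneyRecord("p2", date(2023, 3, 1), date(2023, 3, 31), date(2023, 4, 15)),
        JourneyRecord("p3", date(2023, 5, 5), None, None),
    ]
    print(median_delay(records, "days_to_diagnosis"))
    print(median_delay(records, "days_to_treatment"))
```

Patients without a recorded milestone are excluded from the corresponding median rather than counted as a zero delay; an actual study would also need to handle censoring, since a patient without a confirmed diagnosis may simply not have reached it yet.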
Furthermore, an attempt to analyze the impact of immunization on morbidity and severity of infectious diseases, as well as medical and nursing care costs, is positioned as an important study to verify the value of the entire immunization program, beyond the evaluation of individual drugs. To realize this, it is necessary to link information from multiple databases, such as the Immunization DB, iDB, and NDB, and conduct an integrated analysis of medical information, vaccination history, and the incidence of infectious diseases. Such efforts are of great significance not only for clarifying the relationship between immunization coverage and outcomes, including regional differences, and for contributing to immunization awareness and policy evaluation, but also for providing concrete suggestions for the revision of strategies for the prevention of infectious diseases.
In the Health Economics and Outcomes Research sector, evaluating the outcomes of treatment and preventive interventions is expected to optimize treatment effectiveness and healthcare economics, thereby helping to ensure patient safety and reduce the economic burden. In particular, long-term evaluation of the impact of vaccination on the incidence and severity of infectious diseases, death and sequelae, and medical and nursing care costs is an important perspective for clarifying the value of preventive medicine. Such analysis requires a mechanism for linking information from multiple databases so that treatment history, vaccination history, and nursing care information can be understood comprehensively. Specifically, combining "medical history (diagnosis name, prescription details, medical procedures, and drug use) and death information" from the NDB, "test results and discharge summaries" from the electronic medical record information DB, and "nursing care level, service details, and frequency of use" from the nursing care DB with "vaccination history (date of vaccination, type of vaccine)" from the vaccination DB and "information on the date of onset and severity of infectious disease" from the iDB will enable multifaceted, long-term analysis to evaluate the effectiveness of vaccination and its economic impact.
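The linkage described above can be illustrated, under strong assumptions, with a short Python sketch. It assumes that the vaccination DB and the iDB share a hypothetical pseudonymized key, that the study population (the denominator) comes from a claims source such as the NDB, and that each record carries only the fields shown; `Vaccination`, `InfectionOnset`, `link_records`, and `crude_onset_proportions` are illustrative names, not part of any real database interface. The sketch links each person's earliest vaccination and earliest onset, then computes crude onset proportions by vaccination status.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Vaccination:
    patient_key: str      # hypothetical pseudonymized linkage key
    vaccinated_on: date
    vaccine_type: str     # illustrative label, e.g. "influenza"


@dataclass
class InfectionOnset:
    patient_key: str
    onset_on: date


def _earliest(current: Optional[date], new: date) -> date:
    """Keep whichever date is earlier (treating None as 'not yet seen')."""
    return new if current is None or new < current else current


def link_records(population: List[str],
                 vaccinations: List[Vaccination],
                 onsets: List[InfectionOnset],
                 vaccine_type: Optional[str] = None) -> Dict[str, dict]:
    """One linked row per person: earliest vaccination and earliest onset."""
    linked = {key: {"vaccinated_on": None, "onset_on": None} for key in population}
    for v in vaccinations:
        # Ignore people outside the population and, optionally, other vaccine types
        if v.patient_key in linked and (vaccine_type is None or v.vaccine_type == vaccine_type):
            row = linked[v.patient_key]
            row["vaccinated_on"] = _earliest(row["vaccinated_on"], v.vaccinated_on)
    for o in onsets:
        if o.patient_key in linked:
            row = linked[o.patient_key]
            row["onset_on"] = _earliest(row["onset_on"], o.onset_on)
    return linked


def crude_onset_proportions(linked: Dict[str, dict]) -> Dict[str, float]:
    """Share of people with an onset, by vaccination status.

    A person counts as vaccinated only if vaccinated before any onset.
    Crude proportions only: no adjustment for age, region, or follow-up time.
    """
    counts = defaultdict(lambda: [0, 0])  # status -> [people, people with onset]
    for row in linked.values():
        vac, onset = row["vaccinated_on"], row["onset_on"]
        vaccinated = vac is not None and (onset is None or vac < onset)
        status = "vaccinated" if vaccinated else "unvaccinated"
        counts[status][0] += 1
        if onset is not None:
            counts[status][1] += 1
    return {status: n_onset / n for status, (n, n_onset) in counts.items()}


if __name__ == "__main__":
    # Toy data invented for illustration only.
    population = ["a", "b", "c", "d", "e", "f"]
    vaccinations = [Vaccination(k, date(2023, 10, 1), "influenza") for k in "abcd"]
    onsets = [InfectionOnset("b", date(2023, 12, 1)), InfectionOnset("e", date(2023, 11, 20))]
    linked = link_records(population, vaccinations, onsets, vaccine_type="influenza")
    print(crude_onset_proportions(linked))
```

Counting a person as vaccinated only when vaccination precedes onset avoids attributing post-onset vaccination to the vaccinated group, but the proportions remain unadjusted. Severity, nursing care costs, and regional variables would be added as further linked fields, and a real evaluation would also need confounder adjustment and person-time denominators.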
Furthermore, tracking changes in the use of medical and nursing care services over the course of treatment and comprehensively analyzing medical and nursing care costs may help identify ways to improve the quality of nursing care services and strengthen coordination between medical and nursing care, leading to better health status for patients.
In light of the above, it is hoped that pharmaceutical companies will make progress in many areas in the future by linking information from multiple databases and analyzing them in an integrated manner to speed up drug research and development, improve the probability of success, strengthen post-marketing safety monitoring, and more efficiently build evidence. The utilization of medical and other information will not only optimize the R&D strategies of pharmaceutical companies, but also improve the quality of medical care and ensure patient safety, and the scope of utilization is expected to expand and become increasingly important in the future.
6. Conclusion
The government's efforts on the secondary use of medical and other information are advancing through the interplay of legal development, technological innovation, and the development of social infrastructure. Since 2019, the Japan Pharmaceutical Manufacturers Association (JPMA) has made policy proposals on the secondary use of medical and other information. If the relevant laws and regulations are amended in the future, the use and provision of pseudonymized information in public databases is expected to become possible on a systematic basis. This would enable consolidated analysis with other pseudonymized information and with pseudonymized medical information under the Next Generation Medical Infrastructure Act, and is expected to lead to richer secondary use of medical and other information. In addition, revision of the Pharmaceutical Affairs Law is under way, and discussions toward specific secondary uses of data are becoming more active. Against this background, the organization of information stored in public databases and the use cases presented in this paper should offer important suggestions for future institutional and practical discussions.
For pharmaceutical companies to make more effective use of medical and other information, it is first important that they actively use such information themselves and make specific successful cases and their effects visible and shareable, for example through diagrams and videos. It is also essential to deepen understanding of the significance and value of using medical and other information not only among pharmaceutical companies but also among various stakeholders, including government agencies, medical institutions, and the public. It is hoped that such efforts will build a shared understanding of the use of public databases and further promote the use of medical and other information.
We hope that this report will be of some help in considering the use of public databases.
1) Pharmaceutical Industry Policy Institute, "Report on Survey of Pharmaceutical Companies on Utilization of Public Databases, etc. (1) - Possibilities and Issues of Utilization -," Policy Research Institute News No. 75 (July 2025).
2)
3)
4)
5)
6)
7)
8)
9)
10)
11)
12)
13)
14)
15)
16)
17) Morikazu Kyotoku, "Overview of the Database for Children with Chronic Diseases," Health and Medical Science, Vol. 72, No. 4, pp. 303-309 (2023).
18)
19)
20)
21)
22)
23)
