Points of View: Report on Survey of Pharmaceutical Companies Regarding Utilization of Public Databases, etc. (1) - Possibilities and Challenges in the Utilization of Public Databases, etc.
Natsuko Watanabe, Senior Researcher, Pharmaceutical Industry Policy Institute
Mariko Togashi, Senior Researcher, Pharmaceutical Industry Policy Institute
Makoto Edahiro, Senior Researcher, Pharmaceutical Industry Policy Institute
SUMMARY
- Institutional arrangements are currently being considered to allow public medical and nursing care databases held by the Minister of Health, Labour and Welfare and others to be used and provided as pseudonymized information. These arrangements would also allow consolidated analysis of such data with other pseudonymized information and with pseudonymized processed medical information under the Next Generation Medical Infrastructure Act.
- Against this background, we conducted a web-based survey of the 74 member companies of the Drug Evaluation Committee of the Japan Pharmaceutical Manufacturers Association (JPMA). The aim was to clarify how pharmaceutical companies currently use public databases, etc., and what they expect from, and see as challenges to, future use.
- About 40% of all respondents had used or considered using public databases, etc., indicating that some engagement is already under way.
- About 70% of all respondents expressed an intention to use them in the future.
- Respondents wanted more patient background, medical record-derived, and laboratory-related information, with particularly frequent requests for laboratory values, imaging data, physician findings, and voluntary vaccination records.
- Barriers to future use include institutional and operational issues. They also include limited understanding and insufficient internal structures on the part of pharmaceutical companies. Both practical institutional design and stronger in-house structures will therefore be needed.
1. Introduction
In recent years, expectations for the secondary use of diverse medical information, including public databases, have risen rapidly as digitization advances in the medical and nursing care fields. Against this backdrop, both the public and private sectors have been actively promoting the use of medical information. The Japan Pharmaceutical Manufacturers Association (JPMA) has continuously made proposals to improve the environment.
These public and private efforts are consistent with the "Basic Policy on Economic and Fiscal Management and Reform 2024" (the so-called Honebuto Policy)1), approved by the Cabinet on June 21, 2024. The policy calls for three things. First, an environment in which information shared on the national medical information platform can be used to develop new medical technologies and for drug discovery. Second, promotion of the use of public databases in the medical and nursing care fields. Third, an infrastructure that enables researchers and companies to use high-quality data safely and efficiently.
Furthermore, the Cabinet approved the "Bill for Partial Revision of the Medical Care Act, etc."2) on February 26, 2025, and the bill is currently under deliberation in the Diet. It would allow public databases held by the Minister of Health, Labour and Welfare and others to be used and provided as pseudonymized information. It would also allow consolidated analysis of these data with other pseudonymized information and with pseudonymized processed medical information under the Next Generation Medical Infrastructure Act. If enacted, the bill will enable more advanced analyses than were previously possible, including cross-disease analysis, outcome evaluation, and life-course-based disease burden analysis, all of which have been difficult under the existing system. It is expected to take the use of real-world data (RWD) in Japan to a new level.
Against this background, this paper aims to clarify how pharmaceutical companies currently use public databases as pseudonymized information, and what they expect and see as issues for future use. Specifically, we summarize four topics: companies' experience with public databases, their future intentions to use them, institutional and operational barriers, and the expected benefits of use. We then discuss future institutional design and the operational infrastructure needed to support use.
We also hope to help promote innovation based on medical information infrastructure by raising awareness and encouraging companies to make use of this information.
2. Survey Methodology
2-1. Outline of the Survey
The "Questionnaire Survey on the Use and Intentions of Public Databases" (hereafter "the Survey") is the subject of this report. It was conducted online for two purposes: to ascertain how pharmaceutical companies use public databases, their future intentions to use them, and the issues they face; and to provide basic data that will help promote their use. The survey was administered with Microsoft Forms from March 14 to April 14, 2025.
The survey targeted the 74 member companies of the JPMA Drug Evaluation Committee. In principle, responses were requested from five divisions at each company: Research, Clinical Development, Post-Marketing Safety, Medical Affairs, and Health Economics and Outcomes Research.
In principle, one response per division was requested. Each response was prepared after internal coordination by the person in that division responsible for tracking the use of public databases. Some companies concentrate the relevant functions in cross-functional departments, such as data science departments. For these companies, responses were accepted per business function so that the information collected would reflect actual practice.
The survey was anonymous, so neither respondents nor companies could be identified. Individual responses were tabulated and analyzed in a way that could not be linked to specific companies. The questionnaire combined quantitative questions with open-ended questions, and both were used in the analysis. The open-ended questions were designed to capture specific issues and requests that the quantitative options could not fully capture. In the analysis, these responses were organized and categorized by common issues and perspectives.
JPMA is an industry association of R&D-oriented pharmaceutical companies3). Its Drug Evaluation Committee examines technical and institutional issues and makes policy recommendations4). These issues relate to drug research and development, post-marketing safety measures and appropriate use, and medical affairs activities. This survey was conducted by the Pharmaceutical Industry Policy Institute in cooperation with the Committee. The findings are intended to serve as evidence for future institutional design and for improving the practical environment.
2-2. Caveats and Characteristics of the Survey
This survey targeted member companies of the JPMA Drug Evaluation Committee and, in principle, requested responses from up to five divisions per company. The results should therefore not be taken as statistically representative of the pharmaceutical industry as a whole or of the diverse circumstances of individual companies. Responses were also counted per division rather than per company, so companies that returned more responses carry more weight in the totals.
On the other hand, the survey is distinctive in reflecting the perspectives of the divisions that actually carry out the work. It was designed to identify field-level issues and specific needs.
The survey also collected free-text responses on possible future use cases arising from wider use of public databases and medical information. These results are reported separately in the next report, "Report on Survey of Pharmaceutical Companies Regarding Utilization of Public Databases, etc. (2) - Organization of Stored Information and Use Cases"5).
2-3. Databases Covered by the Survey
The survey covered the various public medical and nursing care databases held by the Minister of Health, Labour and Welfare and others. It also covered the databases held by certified producers under the Next Generation Medical Infrastructure Act. The full list is: NDB, nursing care DB, DPCDB, immunization DB, disability welfare DB, national cancer registry DB, intractable disease DB, pediatric chronic specified disease DB, iDB, electronic medical record information DB, municipal medical checkup DB, and next-generation DB (see Table 1 for official names). Hereafter, these are collectively referred to as "public databases, etc." Institutional arrangements are currently being studied so that, through amendments to the Medical Care Act and other laws, these databases can be analyzed in combination as pseudonymized information. Pharmaceutical companies are expected eventually to be permitted to use them for purposes such as research and development, post-marketing safety measures, and medical affairs.
The National Institute of Biomedical Innovation Policy has made proposals on the secondary use of medical and other information in its position paper7). It recommends considering a system for providing pseudonymized information, a single point of contact for application procedures, and a cloud-based remote analysis environment, with reference to overseas developments such as the EU's European Health Data Space (EHDS) Regulation. The government is considering these same measures, which are expected to accelerate the use of this information by companies8).
The outline of each database is described in detail in the next report5).
3. Attributes of Respondents
The survey requested responses from member companies of the JPMA Drug Evaluation Committee, covering up to five divisions per company: research, clinical development, post-marketing safety, medical affairs, and health economics and outcomes research. A total of 139 valid responses were received, counted per division rather than per company9) (Table 2). Of these, 101 (72.7%) came from domestic companies and 38 (27.3%) from foreign-affiliated companies. By division, post-marketing safety accounted for the most responses (44, 31.7%). It was followed by clinical development (36, 25.9%), medical affairs (35, 25.2%), research (16, 11.5%), and health economics and outcomes research (8, 5.8%).
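The minimal Python sketch below is an illustrative cross-check, not part of the survey's own analysis. It recomputes the shares above from the reported counts. It assumes that every share is taken over all 139 division-level responses and rounded to one decimal place. The names `pct`, `affiliation`, and `division` are hypothetical.

```python
# Illustrative cross-check of the respondent attributes (hypothetical names).
# Counts are per responding division, not per company; denominator = 139.

TOTAL_RESPONSES = 139

affiliation = {"domestic": 101, "foreign": 38}

division = {
    "post-marketing safety": 44,
    "clinical development": 36,
    "medical affairs": 35,
    "research": 16,
    "health economics and outcomes research": 8,
}


def pct(count: int, denominator: int) -> float:
    """Return count as a percentage of denominator, rounded to one decimal."""
    return round(100 * count / denominator, 1)


# Each breakdown should account for every valid response exactly once.
assert sum(affiliation.values()) == TOTAL_RESPONSES
assert sum(division.values()) == TOTAL_RESPONSES

affiliation_pct = {k: pct(v, TOTAL_RESPONSES) for k, v in affiliation.items()}
division_pct = {k: pct(v, TOTAL_RESPONSES) for k, v in division.items()}

if __name__ == "__main__":
    for name, share in {**affiliation_pct, **division_pct}.items():
        print(f"{name}: {share}%")
```

Running the script prints each share. For example, post-marketing safety comes out at 31.7% and health economics and outcomes research at 5.8%.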
4. Results
4-1. Experience with and Consideration of Public Databases, etc.
We first asked respondents whether they had experience using public databases, etc.: "Do you currently conduct, or have you in the past conducted, research or analysis using public databases, etc.?" Of the 139 responses, 19 (13.7%) answered "currently or in the past (including in-house initiatives and joint research)." A further 40 (28.8%) answered "never, but have considered doing so." In total, 59 (42.4%) had used or considered using public databases in some form. By headquarters location, 55.3% of foreign-affiliated companies had used or were considering using public databases, compared with 37.6% of domestic companies (Figure 1). By division, the proportion that had used or considered use was relatively high in medical affairs (23, 65.7%) and health economics and outcomes research (5, 62.5%).
NDB was the database most often cited: 47 respondents (79.7%) had used or considered using it. It was followed by the next-generation DB (25, 42.4%), which consists of databases held by producers certified under the Next Generation Medical Infrastructure Act. Next came DPCDB (14, 23.7%), the intractable disease DB (11, 18.6%), and the national cancer registry DB (10, 16.9%) (Figure 2).
These percentages are based on the 59 respondents who had used or considered using public databases, etc.
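The shares in this section use two different denominators. The minimal Python sketch below is an illustrative cross-check rather than the authors' method. It recomputes the experience shares over all 139 responses and the Figure 2 shares over the 59 respondents who had used or considered public databases. Names such as `pct` and `by_database` are hypothetical.

```python
# Illustrative cross-check of the Section 4-1 percentages (hypothetical names).
# Counts are taken from the text; shares are rounded to one decimal place.

TOTAL_RESPONSES = 139  # all valid division-level responses

# Experience question, answered by all 139 responses.
used = 19        # "currently or in the past"
considered = 40  # "never, but have considered doing so"
used_or_considered = used + considered  # 59: the base for Figure 2

# Databases used or considered (Figure 2); one respondent may name several.
by_database = {
    "NDB": 47,
    "next-generation DB": 25,
    "DPCDB": 14,
    "intractable disease DB": 11,
    "national cancer registry DB": 10,
}


def pct(count: int, denominator: int) -> float:
    """Return count as a percentage of denominator, rounded to one decimal."""
    return round(100 * count / denominator, 1)


experience_pct = {
    "used": pct(used, TOTAL_RESPONSES),
    "considered": pct(considered, TOTAL_RESPONSES),
    "used_or_considered": pct(used_or_considered, TOTAL_RESPONSES),
}

# Figure 2 shares are taken over the 59 users/considerers, not over 139.
database_pct = {name: pct(n, used_or_considered) for name, n in by_database.items()}

if __name__ == "__main__":
    print(experience_pct)
    print(database_pct)
    print("sum of database counts:", sum(by_database.values()))
```

Respondents could name more than one database: the listed counts alone add up to 107, which exceeds 59. The Figure 2 percentages are therefore not expected to sum to 100%.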
Free-text responses on the challenges encountered in research and analysis using public databases revealed several commonly cited issues (Table 3).
Specifically, "institutional and operational issues" included the complexity of application and screening procedures, the lengthy and uncertain period of time required to provide data, the difficulty of estimating costs and the cost burden, security requirements, and restrictions on the analysis environment.
Under "data issues," respondents cited three concerns:
- data reliability (e.g., difficulty judging accuracy and fitness for purpose);
- the complexity and structural limitations of the coding systems;
- a lack of outcome information.

Outcome information is expected to be supplemented once the databases are linked with one another. For now they are not linked, and this remains a limitation when each database is used on its own.
Under "issues on the pharmaceutical company side," several factors emerged as hindering use: inadequate understanding of the institutional framework, database structures, and operational rules, together with a lack of internal structures and human resources.
Among database-specific issues, the small number of cases covered by the next-generation DB was frequently cited. This is seen as a practical disadvantage relative to other databases.
4-2. Intentions and Prospects for Using Public Databases, etc.
Asked about their intention to use public databases in future research and analysis, 68.3% of all respondents answered positively ("strongly intend" or "intend"). This included 17.3% who answered "strongly intend." The remaining 31.7% answered "do not really intend" or "do not intend" (Figure 3).
Future intentions were clearly associated with past experience. Among divisions that answered "currently or in the past," 78.9% said they "strongly intend" to use public databases in the future, suggesting that hands-on experience may shape the intention to use them. By contrast, only 1.3% of those that had "never used or considered" public databases answered "strongly intend" (Figure 4).
Figure 5 covers the 95 respondents (68.3% of all 139) who answered "strongly intend" or "intend" to use public databases in the future. It shows the share of these respondents who "would actively consider" or "would consider" using each database. Intention was highest for NDB, at 94.7%. It was followed by the electronic medical record information DB (87.4%), next-generation DB (82.1%), intractable disease DB (76.8%), and DPCDB (74.7%), all of which attracted strong interest (Figure 5).
In addition, 82.7% of all respondents answered "Yes" to the question, "Will the use of public databases, etc. enable research and analysis that could not be conducted before?" (Figure 6). This pattern held regardless of division, indicating high expectations across all responding divisions (Figure 7).
Respondents who answered that public databases would enable new research and analysis gave three main reasons: "coverage and completeness," "traceability at the patient level," and "integrated use of multiple data sources." They noted that claims (receipt) and clinical information collected nationwide would provide enough cases for analysis even where the target population is small, as with children and rare diseases. Several also pointed out that a data structure including outcome indicators such as date of death would allow analyses that have been difficult so far, such as long-term outcome evaluation and survival analysis. Respondents also hoped to link databases such as NDB, the next-generation DB, the national cancer registry DB, the intractable disease DB, and the immunization DB. Used as consistent data across the patient's life course, from diagnosis through treatment to prognosis, these linked data would make several things possible: elucidating mechanisms of disease progression, evaluating the effects of therapeutic interventions more precisely, and refining risk assessment.
In addition, "more diverse data items and progress in data structuring" were cited as factors that would open new research possibilities. Some variables are hard to obtain from commercial databases, such as immunization history, nursing care information, and municipal health checkup information. Respondents noted that including them in public databases would enable advanced analyses such as evaluation of drug efficacy and safety, signal detection for adverse reactions, and patient stratification. The immunization DB was singled out. Besides providing a comprehensive immunization history that includes routine immunizations, it could be linked with other databases such as NDB and the electronic medical record information DB. Respondents hoped such linkage would allow analysis of how vaccination status relates to disease onset, severity, and long-term health outcomes. Some also commented that developing such data items would lead to more accurate outcome evaluation reflecting actual clinical practice and to applications in personalized medicine.
Respondents also commented on the "national-scale data volume" of public databases. It would allow analysis of rare events, construction of control groups, and other analyses that strengthen statistical reliability, substantially improving the quality of observational studies. They hoped this would make it possible to generate real-world evidence under actual clinical conditions, which clinical trials cannot provide.
Public databases, etc. were also expected to serve as a "foundation for empirical research that contributes to administrative policy." For example, several respondents said public databases could support evidence-based social security policy. Possible uses include measures to raise immunization coverage, structural analysis of medical and long-term care costs, and estimation of the social and economic burden of disease.
Respondents also expected recent "progress in the development of legal systems and the construction of an information collaboration platform" to simplify database access and application procedures. This would make it easier for companies to use databases that were previously difficult to access.
On the other hand, some respondents answered "No" to this question. Their reasons included "insufficient in-house structures, lack of information on target diseases, complicated data access, and cost concerns." Some also said they "have not fully realized the usefulness of public databases, etc." These points should be kept in mind as issues to address in promoting future use.
4-3. Missing Information and Data Items, and Utilization Needs
An open-ended question asked about information and data items currently lacking in public databases and the specific purposes for which respondents want them added. The responses pointed to a wide range of gaps and to high expectations for advanced use in research, development, regulatory affairs, and policy. Table 4 classifies the main gaps identified and the purposes for which the data would be used.
First, regarding patient background, many respondents pointed out a shortage of basic data needed to understand patients from multiple perspectives in combination with medical information. Examples include age, weight, place of residence, socioeconomic status, daily-life data exemplified by Personal Health Records (PHR), and information related to pregnancy and childbirth. Such information is considered essential for three purposes: identifying disease risk factors, adjusting for confounding factors in evaluations of drug efficacy and safety, and evaluating policy, including from the perspective of equity.
Next, for medical record-derived information, respondents saw a need to structure and standardize information that captures physicians' clinical judgment and patients' condition qualitatively. Examples include physician comments, clinical findings, reasons for treatment discontinuation, descriptions of treatment effect, and imaging findings. Making such information available is expected to greatly expand its use as evidence in evaluating drug efficacy and safety and in applications for regulatory approval.
For laboratory-related information, respondents called for more vital signs, clinical laboratory values, imaging and pathology data, biomarkers, and genetic test results. These data would support disease diagnosis and prognosis prediction, identification of adverse effects, treatment monitoring, and supplementation of outcome measures.
The items mentioned most often across the free-text responses were laboratory test values, imaging data, medical record-derived information (physician comments, clinical findings, medical records, etc.), and voluntary vaccination records. Many respondents asked for these to be made available and usable.
4-4. Institutional and Operational Issues, and Points for Promoting Use
Respondents were asked about issues in using public databases other than the content of the data itself. The most frequently cited was the "data acquisition process (access procedures, complexity of the application flow, purpose and duration of use, etc.)," chosen by 62.6% of all respondents. It was followed by:
- "cost (costs of data acquisition, maintenance, analysis, etc.)" at 49.6%;
- "analysis environment (status of analysis tools, infrastructure, etc.)" at 38.8%;
- "content of data provision (format, scope, update frequency, etc. of the data provided)" at 36.7%.

Other issues included data quality (30.2%), security and privacy measures (20.9%), and rules on publication obligations (18.7%). Multiple answers were allowed, so these percentages sum to more than 100%. The results show that the issues extend beyond technical matters to a wide range of regulatory and administrative ones. Meanwhile, 13.7% answered "nothing in particular" and 9.4% "other." Most of the accompanying free-text comments, however, said they "cannot judge due to lack of experience," suggesting that these respondents have not yet formed a view of the issues (Figure 8).
To complement these quantitative results, we also collected free-text descriptions of specific institutional and operational problems and requests. Many comments addressed the seven areas listed in the quantitative question, each with specific requests for improvement (Table 5).
Requests were especially numerous and specific regarding five practical issues: complex application procedures, lack of transparency, difficulty justifying and estimating costs, an inadequate analysis environment, and variation in data formats and update frequencies.
Regarding the "data acquisition process," many respondents mentioned the complexity of the application process and the time needed to receive data. They also noted that multiple public databases are currently managed under different jurisdictions and operating arrangements, without uniform rules or application forms. This complicates the procedures for anyone who wishes to use linked data.

Regarding "cost," several respondents pointed out that the cost structure is not sufficiently transparent. In particular, the difficulty of estimating total costs and the unclear pricing structure are barriers to deciding whether to use the data.

Regarding the "analysis environment," some respondents said usability is inferior to that of commercial databases and called for more flexible infrastructure, including cloud environments. There is a particularly strong need to ease the practical constraints imposed by strict on-site analysis requirements.

Regarding "content of data provision," respondents noted the lack of clear specifications on the format, scope, and update frequency of the data provided. They hoped for operational improvements that would make the data easier to use, such as distinguishing structured from unstructured data and announcing updates promptly.

Regarding "data quality," a number of respondents said it is unclear how the reliability of public databases is assured when they are used for research and other purposes. They saw two needs: a clearly presented framework for ensuring data reliability, and clear criteria for judging whether data are fit for use.
Regarding "security and privacy measures," opinions were divided. Some respondents said that "excessive restrictions make realistic use difficult," while others acknowledged that "maximum protection of personal information is necessary," pointing to the need for a well-balanced system design. Some also commented that the unclear conditions attached to the "publication obligation" make it difficult to decide whether to use the databases. For example, NDB requires research results to be published at the time and in the manner (papers, reports, conferences, research meetings, etc.) specified in the application. It also stipulates certain procedures, such as reporting results to the Ministry of Health, Labour and Welfare (MHLW) for confirmation and approval before publication10). However, not all research results are suited to publication as an academic paper. A simple tabulation of the frequency of adverse reactions or adverse events in a specific disease is one example. When such a study must nonetheless be published in some form, the publication requirement has to be reconciled with the nature of the results. Some respondents said this is difficult to handle in practice.
A follow-up free-text question asked about the acceptable time from data application to provision. More than 70% of respondents wanted data "within 3 months," and within that range "within 1-2 months" was seen as desirable. Some respondents added that "there are cases in which an immediate response is required, such as in response to inquiries." These results indicate that the time to data provision is an important constraint on practical schedules, especially in development and regulatory strategy. Given these circumstances, there is an urgent need to design a system for rapid provision and to establish flexible provision schemes suited to the purpose of use.
Free-text responses on the acceptable fee per use ranged widely, from less than 1 million yen to more than 30 million yen. This spread may reflect two factors. First, the appropriate amount varies greatly with the purpose of the research and the nature of the project. Second, respondents do not yet have a clear idea of the expected cost of using public databases, etc. Indeed, many respondents who chose "cost" as an issue in the previous question expressed concern about the lack of transparency and predictability of costs. They noted that it is difficult to see the overall cost from acquisition to analysis. Going forward, a flexible and easy-to-understand pricing system tailored to the purpose and type of use will be needed. Cost levels should be set appropriately in light of the databases' status as a public resource.
Finally, the survey ended with an open-ended question soliciting a wide range of opinions and requests, not limited to institutional and operational matters. Many responses to this question overlapped with the earlier quantitative and qualitative questions on "issues other than the data content itself." We therefore excluded clearly overlapping responses and focused on those offering new perspectives. As a result, the following four points were identified as important issues for promoting future use.
(1) Lack of information on examples of utilization
Companies do not share information on their past use, including the cost and time required and the results obtained. As a result, companies considering public databases lack material on which to base their decisions.
(2) High barriers to entry for initial use
Many respondents commented that the complexity and uncertainty of the system is a psychological and practical barrier for companies considering public databases for the first time. There is currently no environment in which they can get started with confidence.
(3) Lack of information on database characteristics
The characteristics of public databases and their "unique strengths and weaknesses" are not clearly presented. Nor is it clear how they should be used differently from other data sources. This makes it difficult to choose the appropriate database for a given purpose.
(4) Lack of understanding of social significance
The public is not fully aware that the use of public databases serves not only corporate research and analysis but also contributes to healthcare policy and social security.
5. Summary and Discussion
5-1. Experience in using public databases and status of consideration
The results of this survey showed that about 40% of all respondents had some experience in considering or utilizing public databases, indicating that while some efforts are being made, there is still room for further development. By company attribute, 55.3% of foreign-affiliated companies had experience with or were considering the use of these data infrastructures, higher than the 37.6% of domestic companies. This difference may partly reflect that foreign-affiliated companies are more likely than their domestic counterparts to have internal data science systems and to utilize RWD, against the backdrop of the development of data infrastructure and the maturation of institutional frameworks in Europe and the US, as well as policies deployed from their home countries.
NDB was the database most often cited as having been used. This may be because NDB has nationwide coverage, mainly of receipt (claims) information, is highly versatile and usable for a variety of research purposes, and has relatively well-developed systems and technology11), creating an environment conducive to use by companies. Some of the less-used public databases are ones that companies currently find practically difficult to use because their systems and operational frameworks are not yet sufficiently established. In such cases, the preconditions for utilization, such as a practical operational scheme, are not yet in place, so it is important to note that a small number of use cases does not necessarily indicate low demand.
In addition, comments from companies with actual experience using public databases revealed that issues spanning multiple aspects, such as systems and operations, data, and corporate structures, are interrelated and create barriers in practice. For example, institutional and operational issues such as cumbersome procedures and uncertain timing of data provision are intertwined with corporate issues such as insufficient internal systems and knowledge, and together contribute to the impression that public databases are less practical than commercial databases.
As for the Next Generation DB, many respondents raised concerns about the small number of cases covered. Indeed, as of the end of May 2025, there were only 153 cooperating medical information providers (so-called "cooperating institutions") under the Act, most of them large medical institutions such as university hospitals and National Hospital Organization hospitals12). This regional imbalance and variation in case volume may constrain the use of this information.
Therefore, in addition to improving system design and technological infrastructure, it is important for companies to deepen their understanding of the purpose of the system and to build their own internal systems for utilization. Mutual maturation of both the institutional side and the user side is key to promoting the use of public databases.
5-2. Intention to use public databases and outlook
About 70% of all respondents expressed a positive intention to utilize public databases in the future, a willingness that was especially strong in departments that have actually used or considered using them. Expectations for the comprehensiveness, traceability, and diversity of public databases were high, pointing to their potential for purposes that have been difficult to pursue in the past, including long-term outcome evaluation, causal analysis, and evidence generation for administrative policy. In fact, more than 80% of all respondents, regardless of department, answered that the use of public databases, etc. would make possible "research and analysis that could not be conducted before." By stage, 75.0% (12/16) of respondents in research related to the early stages of drug discovery and 83.3% (30/36) of those in clinical development related to the later stages answered "Yes," indicating that public databases, etc. are expected to serve as a foundation for new research and development activities. These results indicate that the industry as a whole shares strong expectations for the use of public databases.
In terms of intention to utilize each database, NDB ranked highest, with 95% of respondents indicating an intention to consider it, followed by the Electronic Medical Record Information DB, Next Generation DB, Intractable Disease DB, and DPCDB. NDB, the Electronic Medical Record Information DB, Next Generation DB, and DPCDB are all highly versatile databases covering basic medical information; because they are not limited to specific disease groups, they can serve a wide range of research needs, which is thought to explain the high willingness to use them. Notably, the strong intention to use the Electronic Medical Record Information DB, even though it is still under development, reflects the industry's high expectations for its construction and implementation as public infrastructure. On the other hand, disease-specific databases such as the Intractable Disease DB and the Cancer Registry DB tend to have lower overall selection rates than general-purpose databases, because the need for them depends on each company's focus areas. Even so, this survey found a relatively high intention to use the Intractable Disease DB, indicating that it is recognized as an extremely useful information base for companies focusing on specific areas. Such disease-specific databases are also likely to support more practical research and analysis when linked with, and used to complement, NDB and the Electronic Medical Record Information DB. The Immunization DB, which provides basic information on vaccination status, is expected to be used not only for vaccine research but also as an analytical foundation in a wide range of fields such as immunology and infectious disease research.
Therefore, in future system development, the first priority is to establish, as early as possible, basic databases that cover a wide range of medical information, while also establishing and enhancing disease-specific databases. Because basic databases are a prerequisite for linkage with other databases, inadequate maintenance of them would restrict data linkage itself.
However, since the intention to utilize each database depends greatly on the focus areas and development pipeline of individual companies, a flexible system design that reflects these diverse needs is required, rather than a uniform assessment based on an industry-wide consensus.
5-3. Insufficient information/data items and utilization needs
Regarding information currently regarded as insufficient in public databases, many respondents pointed out that patient background information (socioeconomic status, daily-life information, etc.), medical record-derived information (physician findings, treatment effects, reasons for treatment discontinuation, etc.), clinical laboratory values, imaging and pathology information, and laboratory information such as biomarkers are inadequately developed. In particular, although there is a certain level of need for imaging and laboratory information, it is not included in the current concept of the Electronic Medical Record Information DB, and respondents requested its future inclusion. It is also widely recognized that the "three documents and six information items" currently envisioned for that database are not sufficient to meet the needs of advanced research, and that further enhancement of data items is required.
In addition, unstructured data such as physician notes and image findings are also useful for clinical judgment and for qualitatively understanding patient conditions, so efforts to structure and standardize such data are needed. Advances in natural language processing technology in particular provide an important opportunity to make unstructured data usable. Expanding such information is expected to enable more sophisticated use, such as better understanding of diseases, more precise drug evaluation, and follow-up research on specific populations, which in turn would support a wide range of applications including regulatory submissions, post-marketing surveillance, promotion of personalized medicine, and research to support policymaking.
In expanding and developing such data items, it is necessary to establish a sustainable system that accounts for the costs of collection, management, and operation, with consideration also given to cost-effectiveness and securing funding.
5-4. Issues in the system and its operation, and points for promoting utilization
As shown in Figure 8, survey respondents cited many complex issues faced by pharmaceutical companies in terms of systems and operations, including application procedures, costs, and analysis environments, as well as concerns about data quality and publication obligations. While it is important to address these issues in order of their magnitude, solving specific issues individually is not sufficient; comprehensive, step-by-step improvements are required.
Beyond addressing these practical issues in systems and operations, more comprehensive and strategic measures are needed to create an environment in which companies feel comfortable taking the step toward utilization, and to promote its future expansion. Below, we organize eight measures for promoting utilization that respond to the major issues identified in the survey. They are ordered to reflect a gradual progression: development of institutional infrastructure, improvement of the usage environment, support for first-time users, expansion of utilization, and social dissemination.
(1) Streamlining of institutional and operational processes
The practical barriers faced by companies should be removed through institutional design, such as simplification of application procedures and a single point of contact for use, shortening of the period until provision, clarification of the cost structure, introduction of a comprehensive contract system based on the assumption of use for a certain period of time, and development of specifications for the content to be provided.
(2) Improvement of data availability and reliability
It is necessary to form an information infrastructure that is both convenient and reliable through the expansion of provided data, clarification of specifications regarding format, update frequency, etc., development of structured and unstructured data, and establishment of a quality assurance system. In addition, in order to enhance data reliability over the medium to long term, it is considered effective to introduce a participatory operational model that promotes continuous data improvement by establishing a mechanism for feedback and quality reporting by users.
(3) Improvement of analysis and infrastructure environment
Flexible and sustainable utilization of data will be made possible through environmental improvements such as support for cloud computing and remote access, relaxation of on-site restrictions, and provision of easy-to-use analysis tools.
(4) Balance between information protection and operational effectiveness
It is necessary to design a system that balances practicality and security of utilization and ensure flexible operation so that excessive restrictions do not hinder practical use.
(5) Reduction of entry barriers for initial use
In order to lower the psychological and practical hurdles for first-time users, it is effective to introduce pilot programs, clear manuals, and a mechanism for prior access.
(6) Visualization and sharing of use cases
Visualization of past utilization results (including costs, duration, and results) and disclosure and sharing of use cases will help companies considering utilization to make decisions, and is expected to expand the scope of utilization.
(7) Improvement and flexibility of publication rules
It is necessary to ensure consistency between publication obligations and practice by making publication requirements flexible according to the nature of research results and by establishing outlets for analytical results that are difficult to publish.
(8) Communication of the public interest and social significance of utilization
It is important to widely communicate to society that public databases are "resources for the public" and that their utilization contributes to the sustainability of social security policies and the healthcare system.
Based on the above results, it is considered extremely important to comprehensively improve the transparency, flexibility, and usability of the entire system, rather than merely addressing technical and procedural issues related to the institutional and operational aspects of public databases, etc., in order to promote their utilization in the future.
In addition, not only the authorities responsible for designing the system but also the companies that use it must build internal structures aligned with the system's purpose and promote its utilization. It is important for companies not merely to use the databases but to play a proactive role as users, for example, by generating results through their utilization and by providing feedback on practical issues and needs that contributes to improving the system.
6. Conclusion
The survey revealed that the development of public databases is steadily progressing and that interest in and expectations for their utilization are growing in industry, especially among pharmaceutical companies. With the Diet deliberating on legal revisions that will allow consolidated analysis, and with technical efforts in pseudonymization and data structuring expanding, public databases are becoming more than mere information infrastructure: they are a core infrastructure supporting the sustainability of Japan's drug development, healthcare policy, and social security system.
However, no matter how sophisticated the database may be, unless it is actually utilized, it will be difficult to fully realize its expected policy and social significance. If the database is not accompanied by a system design and operation that meet the practical needs of the field, a gap may arise between the system and the field, resulting in a situation in which the utilization of the database does not progress.
Meanwhile, expectations for the use of public databases are steadily increasing, and new research concepts using various types of medical information are emerging, making previously difficult research and analysis increasingly realistic. As the medical information infrastructure is connected and developed, it is essential that both those running the system and its users share, in advance, a vision of how the information will be used for research and implementation. In other words, "establishing the infrastructure" should not be the end of the process; rather, "for what purpose and how it will be used" should be clarified and shared.
Thus, public databases are not an end in themselves, but a social resource that must be continuously updated and evolved. Rather than being developed once and left as is, they require an "interactive operation cycle" in which knowledge and requests from actual users are continuously collected and reflected, and systems and operations are flexibly improved. In particular, we believe that closing the gap between system design and actual operation through dialogue with companies and researchers experienced in utilization is the key to realizing the true value of public databases, etc.
For public databases, etc. to be used more practically and strategically in fields such as research, development, regulatory affairs, and policy, the various barriers between the system and its users must be carefully removed, and the system must be rebuilt as a resource that everyone can use. Moreover, because these databases are built on information provided by the public as patients and consumers, it is essential to earn public trust and understanding that their utilization will return benefits in the form of improved public health. It is important for society as a whole to share the recognition that the information people provide may prove useful for their own and their families' future medical care, and to broaden understanding of and support for data use.
Through these efforts, we hope that a reliable and sustainable medical information infrastructure will be established that combines both flexibility and usability of the system.
1) Number of reports and countries from which data was obtained
2)
3)
4)
5) Pharmaceutical Industry Policy Institute, "Report on Survey of Pharmaceutical Companies on Utilization of Public Databases, etc. (2): Organization of Stored Information and Use Cases," Policy Research Institute News No. 75 (July 2025)
6)
7) Pharmaceutical Industry Policy Institute, "Information Linkage for Effective Use of Health and Medical Information," Position Paper Series No. 4 (November 2024)
8)
9) The actual number of responding companies is not specified because multiple departments were asked to respond; however, based on the maximum number of responses by department for domestic and foreign companies, we estimate that at least 46 companies responded.
10)
11)
12)
