Policy Research Institute Page: How to Obtain Samples and Information for Research in the Field of Oncology

Pharmaceutical companies conduct a great deal of research using samples and information of human origin. To conduct this research efficiently, it is essential to cooperate with outside organizations that handle the distribution of biospecimens and the management of medical and biospecimen information, as well as with medical institutions. In this survey, after analyzing the access routes currently used to obtain samples and information, we report on the state of the supporting infrastructure in Japan and overseas, focusing on the environment for information use.

1. Introduction

Pharmaceutical companies use human-derived samples and information not only in basic research, but also in all phases of drug evaluation, including clinical trials and post-marketing surveillance studies, to derive scientific evidence regarding the efficacy and safety of drugs. Human-derived samples mainly include blood, body fluids, tissues, cells, excretions, and DNA extracted from them, while information includes information related to human health and genetic information, such as names of injuries and diseases, medication details, and test or measurement results obtained through diagnosis and treatment of research subjects*1.

While these samples and information are essential for pharmaceutical companies to conduct research on drugs, it is difficult for pharmaceutical companies to acquire them on their own initiative. Therefore, the cooperation of outside organizations that distribute biological samples and manage medical and biological information, as well as medical institutions, is required. In order to derive high-quality scientific evidence, pharmaceutical companies are required to work closely with these institutions, as it is essential to obtain samples and information that meet the objectives of the research.

In this survey, to gain an overview of the sources of samples and information for research involving pharmaceutical companies, we focused on the area of malignant tumors, where the largest number of new drugs have been developed in recent years and where the environment for providing samples and information is being developed ahead of other areas. In the latter half of the report, we focus on the environment for using medical and biological information, which pharmaceutical companies use frequently and whose availability depends heavily on external information-provision infrastructure, and we discuss the status of that environment in Japan and overseas.

2. Survey Methodology

We surveyed the channels through which pharmaceutical companies acquire human-derived samples and information for research, based on published scientific papers. A topic search in Web of Science(R) (Clarivate) was used to identify articles related to the malignant tumor area (search terms*2: "Cancer" OR "Tumor" OR "Oncology" OR "Carcinoma" OR "Sarcoma" OR "Myeloma" OR "Leukemia" OR "Lymphoma"). We identified the five pharmaceutical companies focusing on research in the field of malignant tumors (three in the U.S., one in the U.K., and one in Switzerland) that published the largest number of papers matching this search in 2023, and collected the original research papers published in 2023 that included employees of those companies as authors. To obtain an overall picture and distribution of the samples and information used in research, the classification also covered readily available samples and information, such as established human cell lines and the attribute information collected with preference questionnaires. The nationality of the targeted companies and the status of their development pipelines may affect the results, and this survey does not provide quantitative information on actual utilization by pharmaceutical companies as a whole.
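The topic search described above can be sketched as a boolean query string. The following is a minimal illustration only: it assumes the Web of Science advanced-search convention of a `TS=` (topic) field tag, and the helper name `build_topic_query` is hypothetical. The exact query syntax used in the survey is not stated in the text.

```python
# The eight search terms listed in the survey methodology.
SEARCH_TERMS = [
    "Cancer", "Tumor", "Oncology", "Carcinoma",
    "Sarcoma", "Myeloma", "Leukemia", "Lymphoma",
]


def build_topic_query(terms):
    """Join quoted terms with OR inside a topic field tag.

    The TS= prefix is an assumption about the search interface,
    not something the survey text specifies.
    """
    # strip() guards against stray whitespace inside a term
    quoted = [f'"{term.strip()}"' for term in terms]
    return "TS=(" + " OR ".join(quoted) + ")"


query = build_topic_query(SEARCH_TERMS)
print(query)
```

Because the terms are combined with OR, a paper matching any one of them is retrieved, which keeps the search broad across tumor types.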

3. Results

3-1. Samples and Information Used for Research

From the 1351 papers that met the search criteria, 1344 papers were selected for this survey, excluding 7 papers whose contents were difficult to confirm. The human-derived samples and information used in the surveyed papers can be broadly categorized in terms of their acquisition route, as shown in Figure 1. The acquisition methods can be broadly categorized into the prospective collection of necessary samples and information in accordance with the purpose of the research and the selection and use of samples and information that meet the purpose of the research from those that have already been collected.

Figure 1 Sources of human-derived samples and information used in research
Source: Created by the Pharmaceutical and Industrial Policy Research Institute
Note: To the right of the box for each information source, examples of information, sources, and methods of use assumed for each source are shown.
This classification was created based on the samples and information used in the papers covered in this survey, and does not cover all research.

Research that uses prospectively collected samples and information has the advantage that a research plan can be created in accordance with the purpose of the research, so that the necessary samples and information can be obtained in the necessary quantities.

Research using already collected samples and information has the advantage of reducing the time and effort required to obtain them, because it draws on samples and information that are already available when the research is designed, regardless of whether they were originally collected for research purposes. Research using cells and specimens purchased from suppliers and research using information in the electronic medical records of medical institutions fall under this category.

Table 1 shows the distribution of the sources of samples and information used in the papers covered by this survey. Research conducted using multiple categories of samples or information was counted in every category used. In clinical trials of pharmaceuticals, medical information such as efficacy evaluations and adverse event records and biological samples such as blood are often obtained within the same study. In addition, for many papers using biological samples, it was impossible to tell from the description whether the study was a prospective clinical study conducted to obtain samples or a study using residual samples. For these reasons, some of the classifications shown in Figure 1 have been changed in Table 1.

Table 1: Classification of Sources of Samples and Information Used in Research
Source: Compiled based on original research papers from five pharmaceutical companies focusing on research in the area of malignant tumors.
Note: Research conducted using samples and information derived from multiple sources was counted once in each applicable category.
Biospecimen suppliers: Organizations that aggregate and provide various types of biospecimens derived from healthy donors and patients for research use.
Database providers: Organizations that aggregate and provide information from medical institutions, insurance claims, public surveys, etc. for research use.
Data portals/data platforms: Environments that make information available on the Web or other means for broad research use.
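
The counting rule used for Table 1 (a paper using several source categories is counted once in each of them) can be sketched as follows. This is a minimal illustration with hypothetical example records; the paper identifiers and category labels are placeholders, not data from the survey.

```python
from collections import Counter

# Hypothetical records: each paper lists the source categories it used.
papers = [
    {"id": "paper-A", "sources": ["prospective clinical trial", "biospecimen supplier"]},
    {"id": "paper-B", "sources": ["database provider"]},
    # A repeated category within one paper must still count only once.
    {"id": "paper-C", "sources": ["prospective clinical trial", "prospective clinical trial"]},
]


def count_by_source(records):
    """Count each paper once in every distinct category it used."""
    counts = Counter()
    for record in records:
        # set() ensures one paper adds at most one count per category
        counts.update(set(record["sources"]))
    return counts


counts = count_by_source(papers)
total_counts = sum(counts.values())
```

Under this rule the sum of the category counts (4 in this toy example) can exceed the number of papers (3), so category totals in a table built this way need not add up to the number of papers surveyed.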

In studies using human-derived samples and information involving pharmaceutical companies that were published in the field of malignancy, the most frequently used acquisition pathways were prospective clinical trials and cohort studies. This result may be related to the high quality of research results obtained from samples/information obtained in accordance with the research objectives, as well as to the fact that submission of research results to journals is recommended as an activity to increase transparency in clinical trials.

Next, among studies using biological samples, most obtained them from suppliers that distribute biological samples. However, readily available cell lines accounted for the majority of this use, and when limited to samples other than cells, more studies directly used samples collected at medical institutions. Among studies using medical information, the largest number obtained it from providers that manage medical information.

3-2. Characteristics of Each Access Route

Within each of the categories of the sources of samples and information, the following are the contents and characteristics of the samples and information that were used with high frequency in the studies covered by this survey.

3-2-1. Prospective Clinical Trials, Cohort Studies, and Cross-Sectional Studies

The majority of the papers reported the results of clinical trials involving pharmaceutical interventions. Others included reports on the results of post-marketing surveillance of pharmaceuticals, cohort studies for the purpose of safety monitoring, and studies using questionnaires such as treatment preference.

In clinical trials and cohort studies led by pharmaceutical companies, consent for the use of acquired information is obtained from all subjects participating in the study, which increases both the accessibility of the information and the sufficiency of the information needed for the study. On the other hand, acquiring the information often requires human resources and time, which becomes a considerable burden when long-term follow-up is needed or large amounts of patient information must be collected.

3-2-2. Obtaining Biological Samples via Suppliers

Among biological samples, cells and other tissues differ greatly in how difficult they are to obtain from the standpoint of research ethics, so we discuss them separately. The majority of the cell-based studies used established cell lines distributed by suppliers. Most of these studies obtained the cells from non-profit organizations in the U.S., while others obtained them from Japanese cell banks or private U.S. suppliers. Among studies that used non-cellular samples, the majority used tissue section slides obtained from private suppliers in the United States.

Cells are available by purchase from suppliers, and accessibility to samples is high, with few barriers to obtaining them. The ethical guidelines for life science and medical research involving human subjects in Japan also exclude "samples and information that are already of established academic value, widely used for research purposes, and generally available to the public" from the scope of the guidelines, thus reducing obstacles to their use from an ethical perspective as well. For other biological samples, consent to use the samples for research is generally obtained from all patients at the time the samples are obtained, and if approval can be obtained through ethical review, accessibility of the samples is relatively high.

3-2-3. Acquisition of Biospecimens for Research at Medical Institutions

Most of the studies used biopsies of tumors or blood collected during the treatment period at medical institutions, and the majority of the samples were obtained from collaborations with medical institutions or from clinical biobanks or biorepositories attached to medical institutions. There were both studies that utilized residual samples from surgeries and examinations stored at medical institutions and studies that prospectively collected samples for the purpose of the study.

For these biological samples too, consent for research use is generally obtained from all patients at the time of collection, but there are cases in which use by a for-profit pharmaceutical company is not permitted. In the case of prospective collection, the clinical research is conducted for the purpose of collecting samples, so, as with clinical trials involving interventions, the human resource and financial burden of obtaining samples can be significant and it may take time to obtain all of them.

3-2-4. Secondary Use of Samples Obtained During Clinical Trials

The majority of the studies were conducted by collecting DNA or RNA from residual samples taken during the conduct of clinical trials to explore the relationship of sequence information and gene expression to patient information obtained in the clinical trial.

For these biological samples too, consent for research use is generally obtained from all subjects at the time of collection, but whether that consent covers secondary use affects the accessibility of the samples.

3-2-5. Acquisition via Database Providers

The majority of the studies used databases derived from insurance claims and electronic medical records provided by private companies, or cancer registries provided by public institutions. For medical information provided by the private sector, we observed frequent use both of disease-specific sources limited to the area of malignancies (Flatiron, US Oncology Network) and of information extracted from cross-disease sources (Optum, Medical Data Vision). For public patient registries, SEER (Surveillance, Epidemiology, and End Results), which provides U.S. cancer statistics, was the most commonly used source, and cancer registries in the US, UK, Sweden, and Denmark were also used in some studies.

This information is available by paying a fee to the provider or applying for access, so its accessibility is high if the requirements for use are met. Much of it is protected by removing personally identifiable information in accordance with the regulations of each country, which is an advantage because it facilitates the use of large amounts of patient information for research. On the other hand, in the process of protecting personal information, date information, genetic information, and other information necessary for research may also be deleted, reducing the sufficiency of the information.

3-2-6. Use of Information Stored at Medical Institutions

The majority of the studies used information from electronic medical records that were obtained during the treatment period and stored at the medical institution. There were many collaborative studies funded by pharmaceutical companies and studies in which the author from the medical institution was the first author and the author from the pharmaceutical company participated as a co-author. There were also some studies that used pathology results or imaging information such as MRI.

Medical institutions keep comprehensive records of the procedures performed and the medical findings made during hospitalizations and hospital visits, which increases the likelihood of obtaining most of the items of information needed in a study. On the other hand, to use such personal information for research, it is necessary to address information management, including appropriate ethical review and research systems, and the burden of obtaining such information is significant.

3-2-7. Secondary Use of Information Obtained During Clinical Trials

Many studies integrated and analyzed the results of multiple clinical trials for efficacy and safety evaluation or for population pharmacokinetic analysis. Other studies used the data to estimate parameters for simulations in cost-effectiveness evaluation or to compare efficacy with an external control group.

As with the secondary use of samples obtained during clinical trials, it is common practice to obtain consent from all subjects at the time the information is obtained for use in research, but whether or not consent is obtained that allows secondary use of the information for purposes other than the original clinical trial purpose will affect the accessibility of the information.

3-2-8. Use of Data Portal Platforms

The majority of the studies used transcriptome expression and sequence information, and most of them obtained it from TCGA (The Cancer Genome Atlas Program) and GEO (Gene Expression Omnibus). TCGA also provides images of pathology specimens, and some studies used them, although fewer than those using the other types of information.

For the information provided on these portals too, consent for research use was generally obtained from all subjects at the time the information was collected. TCGA has established portals and platforms to actively provide information to researchers, and the information is highly accessible, with some of it available for immediate download. TCGA distinguishes between information that can be used immediately and information that requires approval for use from the standpoint of research ethics.

3-3. Infrastructure for Sharing Medical and Biomedical Information

TCGA is continuously conducting research to promote the use of medical and biological information. This time, among the sources of human-derived information, we focused on "access via database providers" and "use of data portals/platforms," which pharmaceutical companies use frequently and whose availability depends largely on the information-provision environment outside pharmaceutical companies. For the purpose of this survey, a database provider is defined as an organization that manages and provides data based on user applications and contracts, and a data portal platform is defined as an environment in which the majority of information is widely accessible via the Web.

3-3-1. Access via Database Providers

Table 2 shows the information sources categorized as obtained via database providers that were used most frequently in this survey. The two top-ranked sources, Flatiron Health and SEER, are both specific to the area of malignancies in the United States. The former is information derived from disease-specific electronic medical records (EMR) and is provided by a private company. The latter is information from cancer registries maintained by the National Cancer Institute. Next is Optum, which is provided by a private company and consists of information derived from insurance claims, EMR, and other sources; information on malignant tumors is extracted from its all-disease information for use. In the area of malignant tumors, general-purpose information that can be used for various diseases and applications and disease-specific information that is not easily structured are being used to a similar extent, indicating that research use of EMR has already begun to be realized.

Table 2 Breakdown of database providers
Source: Created by the Pharmaceutical and Industrial Policy Research Institute
Note: Sources of information for which the frequency of use was five or more papers are shown.

In Japan, too, movement toward the secondary use of disease-specific information on malignant tumors has begun. The aforementioned Flatiron of the United States has established a subsidiary in Japan, which is to collaborate with the SCRUM-Japan project of the National Cancer Center to establish a database in Japan; this plan was announced in 2022. In addition, the Ministry of Health, Labour and Welfare's Health Science Council is currently discussing making public cancer registry information available to the private sector*3, and a plan has been announced to make it available on the secondary use platform of the National Medical Information Platform*4.

3-3-2. Use of Data Portal Platforms

In this survey, among studies that obtained information via data portal platforms, those using TCGA were the most frequent at 56% (45 of 80 reports), followed by those using information registered in GEO at 26% (21 of 80 reports); TARGET (Therapeutically Applicable Research to Generate Effective Treatments) and UK Biobank followed with five reports each. We also identified original articles mentioning TCGA in the Web of Science(R) (Clarivate) database and examined the countries of their authors. The share of authors from China is higher than that of authors from the U.S., and there is also substantial use from East Asian countries such as South Korea, Japan, and Taiwan (Fig. 2). While such open access to information is a desirable trend for the development of research in the health field, it has also been reported that information on genetic mutations and similar characteristics varies by race, and the MHLW subcommittee has pointed out the risks of relying solely on large-scale overseas information sources*5.
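
The shares quoted above follow directly from the report counts. The short sketch below reproduces the arithmetic using only the counts given in the text; the function name `share_percent` is hypothetical, and rounding to the nearest whole percent is an assumption about how the figures were presented.

```python
# Report counts stated in the text, out of 80 reports in this category.
TOTAL_REPORTS = 80
reports_by_platform = {"TCGA": 45, "GEO": 21, "TARGET": 5, "UK Biobank": 5}


def share_percent(count, total):
    """Return the share of `count` in `total`, rounded to a whole percent."""
    return round(100 * count / total)


shares = {
    name: share_percent(count, TOTAL_REPORTS)
    for name, count in reports_by_platform.items()
}
print(shares)
```

The four named platforms account for 76 report counts. Since a single report may use more than one platform, these counts should not be assumed to partition the 80 reports.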

Figure 2 Countries of authors of papers with descriptions related to TCGA
Source: Created based on Web of Science(R) Clarivate
Note: The top 10 countries by frequency are shown. Papers with authors from multiple countries were counted once for each country.

In Japan, genomic and medical information obtained from gene panel tests is managed by the Cancer Genome Information Management Center (C-CAT) at the National Cancer Center, and is now being utilized via a portal*6. In addition, the Ministry of Health, Labour and Welfare (MHLW) has been implementing a plan for whole genome analysis. An information infrastructure that incorporates genetic information as well as multi-omics information is under construction for cancer and intractable diseases, and the construction of an information infrastructure for genetic information that can be used for research and development is progressing in the area of malignant tumors ahead of other areas*7.

4. Conclusion

This report reviews the distribution of the frequency of use of sources of samples and information for cancer research involving pharmaceutical companies and summarizes the current status of medical information and genetic information, in which the availability for use in research depends largely on the status of the provision system outside of the company.

When considering how to obtain samples and information for research purposes, it is ideal for pharmaceutical companies to develop, for every study, a research plan that includes the acquisition of samples and information and to obtain them prospectively, so that they best meet the research objectives. However, it is often difficult to obtain information prospectively for all studies, for reasons including the enormous cost and time required for acquisition and the limited feasibility of studies that require large amounts of samples and information or samples and information on rare diseases. To resolve this, we believe that treating the provision of samples and information as a non-competitive area, and creating an environment in which they become available as soon as research needs them, will promote innovation through cooperation among multiple users.

In particular, the development of information at database providers, the expansion of the volume and types of information, and the development of the information-provision environment through data portals/platforms, discussed in the latter half of this report, are closely related to the research activities of pharmaceutical companies, and it is desirable for providers and users to cooperate in studying the direction of this environment and developing information. Medical and biological information handled in these infrastructures plays a very important role in maintaining the health of the public and in the development and evaluation of pharmaceutical products for the Japanese people, and we believe that it is necessary to promote the development of information for use in research. Currently, the Ministry of Health, Labour and Welfare (MHLW) and the Japan Agency for Medical Research and Development (AMED) are promoting the development of such information, especially in the area of malignant tumors. To ensure that this trend spreads to other disease areas, it is necessary to promptly consider developing the environment and designing a system for its use.

(Hōdai Okada, Senior Researcher, Pharmaceutical and Industrial Policy Research Institute)
