Points of View How to Obtain Samples and Information for Research in the Field of Oncology

Hodai Okada, Senior Researcher, Pharmaceutical Industry Policy Institute

SUMMARY

  • Pharmaceutical companies conduct a great deal of research using human-derived samples and information. Because it is difficult for pharmaceutical companies to acquire these samples and information on their own initiative, it is important for them to work closely with outside organizations that handle the transfer of biological samples and the management of medical and biological information, as well as with medical institutions.
  • In this survey, we focused on the area of malignant tumors, and investigated the distribution of the acquisition channels of samples and information used by pharmaceutical companies for research based on academic papers.
  • The most frequently used access route for research using samples and information of human origin was prospective clinical trials and cohort studies, which accounted for about 37% of all studies. In addition, use via organizations that collect and widely provide samples and information for research use (biospecimen suppliers and database providers) followed.
  • Among the sources of human-derived information, we looked more closely at "access via database providers" and "use of data portals/platforms," which pharmaceutical companies use frequently and whose availability depends largely on the state of the external environment for providing medical information. Activities to improve this provision environment were found to be under way in Japan as well.

1. Introduction

Pharmaceutical companies conduct research using human-derived samples and information not only in basic research but also in all phases of drug evaluation, including clinical trials and post-marketing surveillance studies, to derive scientific evidence on the efficacy and safety of drugs. Human-derived samples mainly refer to blood, body fluids, tissues, cells, excretions, and DNA extracted from them, while information refers to information on human health and genetic information obtained through diagnosis and treatment of research subjects, such as names of injuries and diseases, medication details, and test or measurement results1).

While these samples and information are essential for pharmaceutical companies to conduct research on drugs, it is difficult for pharmaceutical companies to acquire these samples and information on their own initiative. Therefore, the cooperation of outside organizations that distribute biological samples and manage medical and biological information, as well as medical institutions, is necessary. In order for pharmaceutical companies to derive high-quality scientific evidence, it is essential that they work closely with these institutions to obtain samples and information that meet the objectives of their research.

In this paper, to give an overview of the sources of samples and information for research involving pharmaceutical companies, we conducted a survey focusing on the area of malignant tumors, where the largest number of new drugs have been developed in recent years and where the environment for providing samples and information is being developed ahead of other areas. In the latter half of the paper, we focus on the environment for using medical and biological information, which pharmaceutical companies use frequently and whose availability depends heavily on external infrastructures for providing medical information, and we also touch on the state of development of provision environments in Japan and overseas.

2. Survey Methodology

Based on published scientific papers, we surveyed the acquisition channels of human-derived samples and information used for research at pharmaceutical companies. A topic search of Web of ScienceⓇ (Clarivate) was used to identify articles related to the malignant tumor area (search terms2): "Cancer" OR "Tumor" OR "Oncology" OR "Carcinoma" OR "Sarcoma" OR "Myeloma" OR "Leukemia" OR "Lymphoma"). Among pharmaceutical companies focusing on research in the field of malignant tumors, we identified the five (three in the U.S., one in the U.K., and one in Switzerland) that published the largest number of papers matching the above search in 2023, and collected the original papers published in 2023 that included employees of those companies as authors. The source of the samples and information used in each study was identified from the description in the methods section of each article. To capture the overall picture and distribution of the samples and information used in research, the classification also covers readily available samples and information, such as human cell lines and respondent attribute information collected with questionnaires investigating preferences. The survey presented in this paper does not quantitatively represent the actual utilization status of pharmaceutical companies as a whole, as the nationality of the target companies and the status of their development pipelines may affect the results.
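As a rough illustration of how the topic search expression described above could be assembled reproducibly, the following Python sketch joins the eight search terms into a quoted OR expression. The function name `build_topic_query` is hypothetical, and the sketch only builds the query string; it does not call Web of Science, whose interface is not described in this paper.

```python
# Search terms listed in the survey methodology.
TERMS = [
    "Cancer", "Tumor", "Oncology", "Carcinoma",
    "Sarcoma", "Myeloma", "Leukemia", "Lymphoma",
]

def build_topic_query(terms):
    """Join terms into a quoted OR expression (hypothetical helper).

    Whitespace is stripped and duplicates are dropped so that a stray
    space picked up when copying a term does not change the query.
    """
    cleaned = []
    for term in terms:
        term = term.strip()
        if term and term not in cleaned:
            cleaned.append(term)
    return " OR ".join(f'"{t}"' for t in cleaned)

query = build_topic_query(TERMS)
print(query)
```

Keeping the term list in one place makes it straightforward to rerun the same search for another publication year or another disease area.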

3. Results

3-1. Samples and Information Used for Research

There were 1,351 papers that met the search criteria. Of these, we excluded 7 papers whose contents were difficult to confirm and included the remaining 1,344 papers in this survey. The number of papers per company ranged from about 280 to 350 (including papers co-authored among the companies surveyed).

The human-derived samples and information used in the surveyed papers can be broadly categorized in terms of the access routes, as shown in Figure 1. The acquisition methods can be broadly classified into the prospective collection of necessary samples and information according to the purpose of the research and the selection and use of samples and information that meet the purpose of the research from those that have already been collected.

Research that uses samples and information collected prospectively on demand has the advantage that a research implementation plan can be developed in accordance with the purpose of the research, and the necessary samples and information can be obtained in the necessary quantities. Prospective clinical trials and cohort studies, for example, fall under this category.

Research using already collected samples and information has the advantage of reducing the time and effort required for obtaining samples and information by using samples and information that are already available at the time of research planning, regardless of whether they were collected for the purpose of research or not. Research using cells and specimens purchased from suppliers and research using information in electronic medical records of medical institutions fall under this category.

Table 1 shows the distribution of the acquisition routes of samples and information used in the papers covered by this survey. Research conducted using samples or information from multiple categories was counted in every category used. In clinical trials of pharmaceuticals, medical information (such as efficacy evaluations and adverse event records) and biological samples (such as blood) are often obtained within the same study. In addition, for many papers using biological samples, it was impossible to tell from the description whether the study was a prospective clinical study conducted to obtain samples or a study using residual samples. For these reasons, some of the classifications shown in Figure 1 have been changed in Table 1: samples and information obtained on demand (excluding studies conducted solely for the purpose of obtaining samples) are summarized as prospectively conducted research, and studies whose primary purpose was to use samples obtained at medical institutions (whether prospectively or retrospectively) are summarized as acquisition of biological samples at medical institutions.
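The counting rule for Table 1 (a paper using several acquisition routes is counted once in each route it used) can be sketched as follows. This is a minimal illustration only: the paper identifiers and their route assignments are hypothetical and are not data from this survey.

```python
from collections import Counter

# Hypothetical papers, each tagged with every acquisition route it used.
papers = {
    "paper_A": {"prospective study"},
    "paper_B": {"biospecimen supplier", "data portal/platform"},
    "paper_C": {"prospective study", "database provider"},
    "paper_D": {"database provider"},
}

def tally_routes(papers):
    """Count each route once per paper that used it."""
    counts = Counter()
    for routes in papers.values():
        counts.update(routes)  # routes is a set, so no double counting within a paper
    return counts

counts = tally_routes(papers)
total_papers = len(papers)

# Share of papers using each route; the denominator is the number of papers.
shares = {route: n / total_papers for route, n in counts.items()}

for route, n in counts.most_common():
    print(f"{route}: {n} papers ({shares[route]:.0%})")
```

Because a paper can appear in more than one category, the category counts can sum to more than the number of papers, and the shares can sum to more than 100%.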

 Figure 1 Sources of human-derived samples and information used in research

Among published studies in the area of malignant tumors that involved pharmaceutical companies and used human-derived samples or information, the most frequently used acquisition route was prospective clinical trials or cohort studies, which accounted for about 37% of all studies. Two factors may be related to this result: research based on samples and information obtained in accordance with the research objectives tends to yield high-quality results, and submission of results to journals is recommended as an activity to increase transparency in clinical trials.

Next, among studies using biological samples, most obtained the samples from suppliers that distribute biological samples, but the majority of these used readily available cell lines. When limited to samples other than cells, studies that directly used samples collected at medical institutions were more numerous. Medical information was most commonly obtained from providers that manage medical information.

To provide an overview of usage, we tabulated the categories assigned to journals in Web of ScienceⓇ by the category of the source of the samples and information. One or more categories are assigned to each journal3). Based on the categories in Table 1, Table 2 shows the categories of sample and information sources that show characteristic trends. Research in the categories of prospective clinical trials and cohort studies, as well as research using information obtained from database providers, is relatively more likely to be published in clinical research-related journals (e.g., Journal of Clinical Oncology and Future Oncology), whereas research using samples obtained at medical institutions or information obtained from data platforms is relatively more likely to be published in journals related to basic research (e.g., Cancer Cell and Cancer Discovery). With these trends in mind, the following sections describe the characteristics of each type of sample and information.

 Table 1 Classification of sources of samples and information used in research
 Table 2 Areas of contribution of research results using each sample/information

3-2. Characteristics of Each Access Route

The following section describes the contents and characteristics of the samples and information that were frequently used in the studies under review in each of the categories of the access routes of the samples and information.

Prospective clinical trials, cohort studies, and cross-sectional studies

The majority of the papers reported the results of clinical trials involving pharmaceutical interventions. Others included reports on the results of post-marketing surveillance of pharmaceuticals, cohort studies for the purpose of safety monitoring, and studies using questionnaires on treatment preferences and other factors.

Clinical trials and cohort studies led by pharmaceutical companies obtain consent from all participating subjects for the use of the information obtained, and thus both the accessibility of the information and its sufficiency for the study are high. On the other hand, acquiring the information often requires human resources and time, which becomes a considerable burden when long-term follow-up is needed or large amounts of patient information must be collected.

Acquisition of biological specimens via suppliers

Cells and other biological specimens differ greatly in how difficult they are to obtain from a research ethics perspective, so they are discussed separately. The majority of studies using cells involved the use of fractionated cell lines. In most of these, the American Type Culture Collection (ATCC), a non-profit organization in the United States, was the source of the samples. Other studies obtained cells from the JCRB cell bank in Japan or from private suppliers in the United States.

The majority of the studies that used samples other than cells were those that used tissue section slides and obtained them from private suppliers in the U.S. (Avaden Biosciences, Inc., Proteo-Genex, Inc., etc.).

Cells are available by purchasing from suppliers, and accessibility to samples is high, with few obstacles in obtaining them. The ethical guidelines for life science and medical research involving human subjects in Japan also state that "samples and information that are already of established academic value, widely used for research, and generally available" are exempt from the guidelines, so there are few obstacles to their use from an ethical standpoint.

For other biological samples, it is common practice to obtain consent from all patients at the time the samples are obtained for use in research, and if approval can be obtained through ethical review, accessibility of the samples is relatively high.

Acquisition of biospecimens for research at medical institutions

Many studies use biopsies of tumors or blood samples taken during treatment at medical institutions, and the majority of samples are obtained from collaborations with medical institutions or from clinical biobanks or biorepositories attached to medical institutions. There were both studies that utilized residual samples from surgeries or examinations stored at medical institutions and studies that prospectively collected samples for the purpose of the study.

For these biological samples as well, consent to use the samples in research is generally obtained from all patients at the time of sample acquisition, but there are cases in which use by a for-profit pharmaceutical company is not possible. In the case of prospective collection, where the purpose of the clinical research is to collect the samples, the human resources and financial burden of obtaining samples can be significant, as with clinical trials involving interventions, and it may take time to obtain all the samples.

Secondary use of samples obtained during clinical trials

The majority of studies have explored the relationship of sequence information and gene expression to patient information obtained in clinical trials by collecting DNA and RNA from residual samples taken during the conduct of the clinical trial.

For these biological samples as well, consent to use the samples for research is generally obtained from all subjects at the time the samples are obtained, but whether that consent covers secondary use affects the accessibility of the samples.

Acquisition via database providers

The majority of the studies used databases derived from insurance claims and electronic medical records provided by private companies, or cancer registries provided by public institutions. For medical information provided by private companies, many studies used disease-specific sources (Flatiron, US Oncology Network), and many others used information limited to the area of malignant tumors extracted from cross-disease sources (Optum, Medical Data Vision). Among public patient registries, SEER (Surveillance, Epidemiology, and End Results), which provides cancer statistics in the US, was the most widely used, and cancer registries in the US, UK, Sweden, and Denmark were also used in some studies.

This information is available by paying a fee to the provider or applying for access, and accessibility is high if the requirements for use are met. The advantage is that most of this information is protected by removing personally identifiable information in accordance with the regulations of each country, which facilitates the use of large amounts of patient information for research. On the other hand, in the process of protecting personal information, information necessary for research, such as date information and genetic information, may also be deleted, reducing the sufficiency of the information.

Use of information stored at a medical institution

The majority of studies have used information from electronic medical records kept at medical institutions that were obtained during the treatment period. There were many collaborative studies funded by pharmaceutical companies and studies in which the author from the medical institution was the first author and the author from the pharmaceutical company participated as a co-author. Some studies used pathology results or imaging information such as MRI.

Medical institutions hold comprehensive records of the procedures performed and the medical findings made during hospitalization and hospital visits, so most items of information needed for research are likely to be available. On the other hand, using such personal information for research requires an appropriate ethical review and consideration of information management, including the research system, and the burden of obtaining the information is significant.

Secondary use of information obtained during clinical trials

Many studies integrated the results of multiple clinical trials for the evaluation of efficacy and safety or for population pharmacokinetic analysis. Other studies used the results for parameter estimation in cost-effectiveness simulations or for comparison of efficacy with an external control group.

As with the secondary use of samples obtained during clinical trials, it is common practice to obtain consent from all subjects at the time the information is obtained for use in research, but whether or not consent is obtained that allows secondary use of the information for purposes other than the original clinical trial purpose will affect the accessibility of the information.

Use of data portals and platforms

The majority of studies used transcriptome expression and sequence information, and most obtained it from TCGA (The Cancer Genome Atlas Program) or GEO (Gene Expression Omnibus). TCGA also provided information on DNA sequences and images of pathology specimens, and some studies used this information, although the number of such studies was not large.

For the information provided in these portals as well, consent to use the information for research was generally obtained from all subjects at the time the information was obtained. From the standpoint of research ethics, TCGA distinguishes between information that is immediately available and information that requires approval for use, and a certain level of usage review is required before sequence information can be used.

3-3. Infrastructure for Sharing Medical and Biomedical Information

The Pharmaceutical Industry Policy Institute has been conducting ongoing research to promote the use of medical and biological information. This time, among the sources of human-derived information, we focused on "access via database providers" and "use of data portals/platforms," which pharmaceutical companies use frequently and whose availability depends largely on the state of the medical information provision environment outside of pharmaceutical companies, and examined in depth the state of development in Japan. In this survey, database providers are defined as organizations that manage and provide data based on user applications and contracts, and data portals/platforms are defined as environments that have been developed to make most information widely accessible via the Web4, 5, 6).

3-3-1. Access via Database Providers

Table 3 shows the information sources used most frequently in this survey in the category of acquisition via database providers. We touch on the characteristics of the three most frequently used sources. Flatiron Health and SEER both specialize in the area of malignant tumors in the United States: the former is information derived from disease-specific EMRs and provided by a private company, while the latter is information from the cancer registry maintained by the National Cancer Institute. Optum, in contrast, is a cross-disease source provided by a private company that draws on insurance claims and EMR-derived information, from which information on malignant tumors is extracted. As mentioned in the previous issue (News No. 71), in the area of malignant tumors, general-purpose information that can be used for various diseases and applications and disease-specific information that is not easily structured are being used to a similar extent7), indicating that research use of EMRs has already begun to be realized.

 Table 3 Breakdown of database providers

In Japan, too, there is a movement toward the secondary use of disease-specific information on malignant tumors. The aforementioned Flatiron of the U.S. has established a subsidiary in Japan and, as announced in 2022, aims to establish a database in Japan in collaboration with the SCRUM-Japan project of the National Cancer Center 8). In addition, the New Medical Real World Data Research Organization, Inc., which developed an electronic medical record entry support system based on the results of an AMED project, has been selected by the Cabinet Office's Strategic Innovation Creation Program, and public support is also being provided for research into structuring medical information9, 10). The Ministry of Health, Labour and Welfare's Health Science Council is currently discussing the use of public cancer registry information by the private sector11), and a plan to make it available on the secondary use platform of the National Medical Information Platform has been announced12).

3-3-2. Use of Data Portal Platforms

In this survey, the most frequently used sources of research information obtained via data portals and platforms were TCGA (56%, 45 of 80 reports) and GEO (26%, 21 of 80 reports), followed by TARGET (Therapeutically Applicable Research to Generate Effective Treatments) and UK Biobank with five reports each. We also searched the Web of ScienceⓇ (Clarivate) database for original articles mentioning TCGA, covering all articles and not only those involving pharmaceutical companies. Authors based in China accounted for a higher percentage than those based in the U.S., and there was also substantial use from other East Asian countries and regions such as South Korea, Japan, and Taiwan (Figure 2). While such open access to information is a desirable trend for the development of research in the health field, it has also been reported that information on genetic mutations and the like varies by race, and an MHLW subcommittee has pointed out the risks of relying solely on large-scale overseas information sources13).
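The shares quoted for the data portal category follow directly from the report counts given in the text. The short Python check below recomputes them from those counts (45, 21, 5, and 5 of 80 reports); the helper name `share_percent` is hypothetical, and rounding to whole percentages is an assumption about how the figures were presented.

```python
# Report counts stated in the text for the data portal/platform category.
reported = {"TCGA": 45, "GEO": 21, "TARGET": 5, "UK Biobank": 5}
TOTAL_REPORTS = 80

def share_percent(count, total):
    """Percentage of reports, rounded to the nearest whole number."""
    return round(100 * count / total)

shares = {name: share_percent(n, TOTAL_REPORTS) for name, n in reported.items()}

for name, pct in shares.items():
    print(f"{name}: {reported[name]} of {TOTAL_REPORTS} reports ({pct}%)")
```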

 Figure 2 Countries of authors of papers with mention related to TCGA

In Japan, genomic and medical information obtained from gene panel tests is managed by the Cancer Genome Information Management Center (C-CAT) of the National Cancer Center and is now being utilized via a portal14). In addition, the Ministry of Health, Labour and Welfare (MHLW) has been promoting its action plan for whole genome analysis and related analyses. Under this plan, an information infrastructure that incorporates genetic information together with multi-omics information is being constructed for cancer and intractable diseases, and the construction of an infrastructure for genetic information that can be used for research and development is progressing in the area of malignant tumors ahead of other areas15).

4. Conclusion

In this paper, we reviewed how frequently each source of samples and information is used in cancer research involving pharmaceutical companies, and summarized the current status of medical and genetic information in particular, since the availability of such information for research depends largely on the state of the provision system outside of the companies.

When considering how to obtain samples and information for research purposes, the ideal for pharmaceutical companies is to develop, for every study, a research plan that includes the acquisition of samples and information and to obtain them prospectively, so that they best meet the research objectives. However, prospective acquisition is often difficult for all studies for various reasons, including the enormous cost and time required and the limited feasibility of studies that require large amounts of samples and information or samples and information on rare diseases. To resolve such situations, we believe that building, as a non-competitive area, an environment in which the samples and information needed to fulfill research objectives are available as soon as they are needed will promote innovation through cooperation among multiple users.

In particular, the development of information at database providers, the expansion of the volume and types of information, and the development of the information provision environment through data portals/platforms, discussed in the latter half of this paper, are closely related to the research activities of pharmaceutical companies, and it is desirable that providers and users cooperate in considering the direction and developing the information. Since the medical and biological information handled in these infrastructures plays a very important role in maintaining the health of the public and promoting the development and evaluation of pharmaceuticals for the Japanese people, it is necessary to promote the development of information for use in research as part of measures such as the government-led promotion of medical DX. Currently, the Ministry of Health, Labour and Welfare (MHLW) and AMED projects, among others, are promoting such development, especially in the area of malignant tumors. To ensure that this trend spreads to other disease areas, it is necessary to promptly consider the development of an environment and design a system to facilitate their use.
