Overview

The Cancer Data Repository (CaDR) is a joint Yale Medical School–Yale-New Haven Hospital resource to provide clinical information on cancer patients to clinical investigators. It is being developed within the Yale Pathology Informatics Program. Built on the Tumor Registry data set, this system aggregates data from multiple hospital systems, both electronically and via manual curation, to provide diagnostic, treatment, and outcome information on patients diagnosed and/or treated at Yale for malignant neoplasms.

Traditional data warehouses aggregate large quantities of data from disparate systems. In medical environments, while there are definite advantages to having all of the information about a given patient in one electronic location, a number of factors have limited the general utility of these warehouses. The complexity of the data models, variations in coding conventions over time and across source systems, and the sheer volume of data make it difficult for the user to extract a relatively small signal from a large volume of “noise.” Doing so typically requires detailed knowledge of the repository's data structure and of the conventions and practices used in the primary source systems, which places a highly trained data extractor between the clinical investigator and the data. Identifying and maintaining funding for such individuals has been problematic. This model also prevents the end user from doing any interactive browsing, intelligent selection, or ad hoc discovery.

The CaDR is intended to be different. It is designed to be user-friendly. Rather than creating a comprehensive data repository and then attempting to make it usable by investigators, the CaDR will be first and foremost usable, and over time will become increasingly comprehensive.

Eventually (once the appropriate functionality is developed), approved investigators may be granted higher levels of access. Higher access can be granted only after the user has set up an initial profile and, in most cases, will require prior Human Investigation Committee approval. Contact the CaDR unit in the Pathology Informatics Program for more details.