The GAAIN Entity Mapper (GEM) is a software system for automated matching, or mapping, of data elements. Its application context is the harmonization of disparate, independently created datasets or databases in the biomedical domain, where corresponding data elements must be matched across sources. Manually mapping datasets is currently very resource- and time-intensive. We built GEM as an intelligent software assistant that aids data analysts in the mapping process by suggesting element matches.
GEM leverages technology from the areas of database record linkage, text mining and modeling, semantics, and machine-learning classification. GEM also incorporates active-learning capabilities for more efficient training of the system in new domains. An API is being developed so that the system can be used as a service.
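To make the idea of suggesting element matches concrete, here is a minimal sketch of similarity-based matching. It is not GEM's implementation: it swaps in two simple text measures from the Python standard library (character-level similarity on element names, token overlap on element descriptions) for the record-linkage and machine-learning techniques named above. The function names, the `name_weight` parameter, and the example data elements are all hypothetical and were invented for illustration.

```python
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(source, target, name_weight=0.4):
    """Weighted mix of name similarity and description token overlap.

    The 0.4 default is an arbitrary illustrative choice, not a tuned value.
    """
    name_sim = SequenceMatcher(
        None, source["name"].lower(), target["name"].lower()
    ).ratio()
    desc_sim = jaccard(tokens(source["description"]), tokens(target["description"]))
    return name_weight * name_sim + (1.0 - name_weight) * desc_sim


def suggest_matches(source, candidates, top_k=3):
    """Return up to top_k (candidate name, score) pairs, best match first."""
    scored = [(c["name"], score(source, c)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical data elements; they are not taken from any real data dictionary.
source_element = {"name": "subj_age", "description": "Age of the subject at enrollment"}
candidate_elements = [
    {"name": "age_enroll", "description": "Subject age at enrollment in years"},
    {"name": "edu_years", "description": "Years of formal education"},
    {"name": "visit_date", "description": "Date of the clinic visit"},
]

for name, value in suggest_matches(source_element, candidate_elements):
    print(f"{name}: {value:.3f}")
```

The ranked list corresponds to the suggestions an analyst would review and then accept or reject. In a learned approach, such similarity scores would more likely serve as input features to a classifier than as the final decision rule.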
In experimental evaluations on actual datasets, the system has proven effective, providing data mapping accuracies of around 90%.
The GEM system has been used for data mapping tasks within the BDDS project, starting with the Alzheimer's Disease Neuroimaging Initiative (ADNI) and Parkinson's Progression Markers Initiative (PPMI) databases, where the goal is to find corresponding data elements across the two databases. GEM was used to harmonize the data before it was loaded into the BDDS ERMRest data catalog. While the ERMRest catalog and its related set of tools provide capabilities for querying, viewing, and curating the data, GEM performs the complementary task of harmonizing the data before it is loaded into the data management system.
View details on how the GEM system has been used by the GAAIN project