Harvard Catalyst Profiles

Contact, publication, and social network information about Harvard faculty and fellows.

Use Our Data

Enhance Your Website
Data from Harvard Catalyst Profiles (e.g., faculty affiliations, publications, and coauthors) may be integrated into other websites or exported using public application programming interfaces (APIs). Some basic computer programming knowledge is needed to do this.
Technical Overview
As a Semantic Web application, Harvard Catalyst Profiles uses the Resource Description Framework (RDF) data model. In RDF, every entity (e.g., person, publication, concept) is given a unique URI. (A URI is similar to a URL that you would enter into a web browser.) Entities are linked together using "triples," each of which contains three URIs: a subject, a predicate, and an object. For example, the URI of a Person can be connected to the URI of a Concept through a predicate URI of hasResearchArea. Harvard Catalyst Profiles contains millions of URIs and triples.

Semantic Web applications use an ontology, which describes the classes and properties used to define entities and link them together. Harvard Catalyst Profiles uses the VIVO Ontology, which was developed under an NIH-funded grant as a standard for academic and research institutions. A growing number of sites around the world are adopting research networking platforms that use the VIVO Ontology. Because RDF can link different triple stores that use the same ontology, software developers can create tools that span multiple institutions and data sources. When RDF data is shared with the public, as it is in Harvard Catalyst Profiles, it is called Linked Open Data (LOD).
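
As a rough illustration of the triple model described above, the following Python sketch stores a few triples as tuples and looks up the research areas linked to a person. All URIs in it are hypothetical placeholders under example.org, not actual Harvard Catalyst Profiles or VIVO Ontology identifiers, and the helper name objects is made up for this example.

```python
# Each triple is (subject, predicate, object); every value is a URI string.
# All URIs below are hypothetical and exist only for illustration.
PERSON = "http://example.org/profile/1001"
CONCEPT_A = "http://example.org/concept/2001"
CONCEPT_B = "http://example.org/concept/2002"
PUBLICATION = "http://example.org/publication/3001"
HAS_RESEARCH_AREA = "http://example.org/ontology#hasResearchArea"
AUTHOR_OF = "http://example.org/ontology#authorOf"

triples = [
    (PERSON, HAS_RESEARCH_AREA, CONCEPT_A),
    (PERSON, HAS_RESEARCH_AREA, CONCEPT_B),
    (PERSON, AUTHOR_OF, PUBLICATION),
]


def objects(triples, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Research areas of the example person, in the order they were stored.
research_areas = objects(triples, PERSON, HAS_RESEARCH_AREA)
```

A real triple store holds millions of such triples and indexes them for fast lookup, but the underlying pattern of matching on subject and predicate is the same.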

There are four types of application programming interfaces (APIs) in Harvard Catalyst Profiles.

  • RDF crawl. Because Harvard Catalyst Profiles is a Semantic Web application, every profile has both an HTML page and a corresponding RDF document, which contains the data for that page in RDF/XML format. Web crawlers can follow the links embedded within the RDF/XML to access additional content.
  • SPARQL endpoint. SPARQL is a query language that enables arbitrary queries against RDF data. It provides the most flexibility in accessing data; however, SPARQL queries are complex to write and can be slow to run. In general, the XML Search API (see below) is a better choice than SPARQL. If you do require access to the SPARQL endpoint, please contact Griffin Weber.
  • XML Search API. This is a web service that supports the most common types of queries. It is designed to be easier to use and to perform better than SPARQL, at the cost of fewer options. It supports full-text search across all entity types, as well as faceting, pagination, and sorting. The request message to the web service is in XML format, but the output is in RDF/XML format. The URL of the XML Search API is https://connects.catalyst.harvard.edu/API/Profiles/Public/Search.
  • Old XML-based web services. These provide backwards compatibility for institutions that built applications using the older version of Harvard Catalyst Profiles. They do not take advantage of many of the new features of Harvard Catalyst Profiles, so users are encouraged to switch to one of the new APIs. The URL of the old XML web service is https://connects.catalyst.harvard.edu/ProfilesAPI.
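
To make the RDF crawl option above more concrete, here is a minimal Python sketch that parses an RDF/XML document and collects the URIs it links to, which is the step a crawler would repeat for each newly discovered URI. The sample document, the example.org URIs, and the function name extract_links are hypothetical. A real RDF document from Harvard Catalyst Profiles will use different namespaces and properties, and fetching it over HTTP is left out here.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical RDF/XML for one profile; real documents will differ.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ontology#">
  <rdf:Description rdf:about="http://example.org/profile/1001">
    <ex:hasResearchArea rdf:resource="http://example.org/concept/2001"/>
    <ex:authorOf rdf:resource="http://example.org/publication/3001"/>
    <ex:label>Example Person</ex:label>
  </rdf:Description>
</rdf:RDF>
"""


def extract_links(rdf_xml):
    """Return (subject, predicate, object) URI triples found in RDF/XML text.

    Handles only the simple pattern of rdf:Description elements whose
    children carry an rdf:resource attribute; literal values are skipped.
    """
    root = ET.fromstring(rdf_xml)
    links = []
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for child in description:
            target = child.get(f"{{{RDF_NS}}}resource")
            if target is not None:
                # ElementTree tags look like "{namespace}local";
                # joining the two parts gives the predicate URI.
                predicate = child.tag[1:].replace("}", "", 1)
                links.append((subject, predicate, target))
    return links


links = extract_links(SAMPLE_RDF)
# URIs a crawler would fetch next to reach additional content.
to_visit = [obj for _, _, obj in links]
```

This sketch covers only one of the several shapes that RDF/XML allows. For anything beyond a quick experiment, a dedicated RDF parsing library is likely a safer choice than hand-written XML handling.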
Documentation and Examples
For detailed information about the APIs, please see the documentation and example files.
Funded by the NIH National Center for Advancing Translational Sciences through its Clinical and Translational Science Awards Program, grant number UL1TR002541.