Sharing Data (Export RDF)
Harvard Catalyst Profiles is a Semantic Web application, which means its content
can be read and understood by other computer programs. This enables the data in
profiles, such as addresses and publications, to be shared with other institutions
and appear on other websites. If you click the "Export RDF" link on the left sidebar
of a profile page, you can see what computer programs see when visiting a profile.
The section below describes the technical details for building a computer program
that can export data from Harvard Catalyst Profiles.
As a Semantic Web application, Harvard Catalyst Profiles uses the Resource Description
Framework (RDF) data model. In RDF, every entity (e.g., person, publication, concept)
is given a unique URI. (A URI is similar to a URL that you would enter into a web
browser.) Entities are linked together using "triples" that contain three URIs: a
subject, a predicate, and an object. For example, the URI of a Person can be connected
to the URI of a Concept through a predicate URI of hasResearchArea. Harvard Catalyst
Profiles contains millions of URIs and triples. Semantic Web applications use an
ontology, which describes the classes and properties used to define entities and
link them together. Harvard Catalyst Profiles uses the VIVO Ontology, which was
developed as part of an NIH-funded grant to be a standard for academic and research
institutions. A growing number of sites around the world are adopting research networking
platforms that use the VIVO Ontology. Because RDF can link different triple-stores
that use the same ontology, software developers are able to create tools that span
multiple institutions and data sources. When RDF data is shared with the public,
as it is in Harvard Catalyst Profiles, it is called Linked Open Data (LOD).
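To make the triple model described above concrete, here is a minimal sketch in plain Python. It is not how Harvard Catalyst Profiles stores its data. Every URI below uses the placeholder domain example.org and is hypothetical, as are the authorOf predicate and the helper names match and research_areas. Only the hasResearchArea predicate name comes from the text above.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# A triple is (subject URI, predicate URI, object URI).
Triple = Tuple[str, str, str]

# Hypothetical URIs for illustration only; real Profiles URIs will differ.
EX = "http://example.org/"
PERSON = EX + "person/1"
ASTHMA = EX + "concept/asthma"
GENETICS = EX + "concept/genetics"
PAPER = EX + "publication/42"
HAS_RESEARCH_AREA = EX + "ontology#hasResearchArea"
AUTHOR_OF = EX + "ontology#authorOf"  # assumed predicate, not from the source

TRIPLES: List[Triple] = [
    (PERSON, HAS_RESEARCH_AREA, ASTHMA),
    (PERSON, HAS_RESEARCH_AREA, GENETICS),
    (PERSON, AUTHOR_OF, PAPER),
]


def match(
    triples: Iterable[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        yield (s, p, o)


def research_areas(triples: Iterable[Triple], person: str) -> List[str]:
    """Return the sorted Concept URIs linked to a Person by hasResearchArea."""
    return sorted(
        o for _, _, o in match(triples, subject=person, predicate=HAS_RESEARCH_AREA)
    )


if __name__ == "__main__":
    for uri in research_areas(TRIPLES, PERSON):
        print(uri)
```

Matching a pattern with wildcards against a set of triples is also the basic idea behind queries over RDF data, which is why the same small structure can answer questions such as "which concepts is this person linked to?"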
There are four types of application programming interfaces (APIs) in Harvard Catalyst Profiles:
- RDF crawl. Because Harvard Catalyst Profiles is a Semantic Web application, every
profile has both an HTML page and a corresponding RDF document, which contains the
data for that page in RDF/XML format. Web crawlers can follow the links embedded
within the RDF/XML to access additional content.
- SPARQL endpoint. SPARQL is a query language that enables arbitrary queries
against RDF data. This provides the most flexibility in accessing data; however,
the downsides are the complexity of writing SPARQL queries and slower performance. In general,
the XML Search API (see below) is better to use than SPARQL. However, if you require
access to the SPARQL endpoint, please contact
- XML Search API. This is a web service that provides support for the most common
types of queries. It is designed to be easier to use and to offer better performance
than SPARQL, at the cost of fewer options. It supports full-text search across
all entity types, with faceting, pagination, and sorting. The request message
to the web service is in XML format, but the output is in RDF/XML format. The URL
of the XML Search API is https://connects.catalyst.harvard.edu/API/Profiles/Public/Search.
- Old XML-based web services. These provide backwards compatibility for institutions
that built applications using the older version of Harvard Catalyst Profiles. These
web services do not take advantage of many of the new features of Harvard Catalyst
Profiles. Users are encouraged to switch to one of the new APIs. The URL of the
old XML web service is https://connects.catalyst.harvard.edu/ProfilesAPI.
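The RDF crawl approach in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Profiles implementation. The sample document, the example.org URIs, and the function names linked_uris, crawl, and fake_fetch are all hypothetical. The only standard detail relied on is that RDF/XML expresses links to other entities through the rdf:resource attribute. How a real RDF document is retrieved for a profile is not shown here, so the crawler takes a caller-supplied fetch function.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical RDF/XML for one profile; real documents will be larger.
SAMPLE_RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="http://example.org/ontology#">
  <rdf:Description rdf:about="http://example.org/person/1">
    <ex:hasResearchArea rdf:resource="http://example.org/concept/asthma"/>
    <ex:authorOf rdf:resource="http://example.org/publication/42"/>
    <ex:label>Example Person</ex:label>
  </rdf:Description>
</rdf:RDF>"""

EMPTY_RDF_XML = '<rdf:RDF xmlns:rdf="%s"/>' % RDF_NS


def linked_uris(rdf_xml: str) -> List[str]:
    """Return the distinct rdf:resource URIs in document order."""
    root = ET.fromstring(rdf_xml)
    attribute = "{%s}resource" % RDF_NS
    found: List[str] = []
    for element in root.iter():
        uri = element.get(attribute)
        if uri is not None and uri not in found:
            found.append(uri)
    return found


def crawl(
    start_uri: str,
    fetch: Callable[[str], str],
    max_documents: int = 10,
) -> List[str]:
    """Breadth-first crawl; fetch maps a URI to its RDF/XML text."""
    visited: List[str] = []
    queue = deque([start_uri])
    while queue and len(visited) < max_documents:
        uri = queue.popleft()
        if uri in visited:
            continue
        visited.append(uri)
        for link in linked_uris(fetch(uri)):
            if link not in visited:
                queue.append(link)
    return visited


def fake_fetch(uri: str) -> str:
    """Stand-in for an HTTP request so the sketch runs offline."""
    documents = {"http://example.org/person/1": SAMPLE_RDF_XML}
    return documents.get(uri, EMPTY_RDF_XML)


if __name__ == "__main__":
    for uri in crawl("http://example.org/person/1", fake_fetch):
        print(uri)
```

A real crawler would replace fake_fetch with an HTTP client, keep max_documents small or add a delay between requests, and skip URIs that point outside the site being crawled.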
For more information about the APIs, please see the