Banff Manifesto

A place to start mapping the Bioinformatics semantic web, because the Rockies don't move!

Develop best practices for designing RDF documents in bioinformatics. The first major milestone is to agree upon canonical URIs.

Some participants from the HCLS-DI2007 workshop and from the I3 Workshop propose that the life science community establish an authority for attributing the namespaces used in the bioinformatics semantic web, in order to build normalized URIs, which can be URLs or URNs. This authority will also establish a set of rules for the construction of well-formed RDF documents. When someone uses a URI approved by the BM authority, they will know that certain naming standards are upheld. The rules are good practices that need to be followed to be recognized by the BM community.

Much in the way DOIs work, BM will certify that a namespace belongs to a data provider and disambiguate its naming. With normalized URIs, RDF documents start to connect together, which is one of the main reasons to adopt the semantic approach to data integration.

Why Banff Manifesto?

Because to build a semantic coordinate system based on URIs we need a reference to start from. 'bm' is the proposed namespace authority, and if it is based in Banff it will not move, because the Rockies don't move! The Banff Manifesto wants to remove some creeps from the bioinformatics semantic web. Creeps rule the web; we gathered to adopt and share the Banff Manifesto. Join our community, it is up to you.

Some of the main ideas of the Manifesto come from Tim Berners-Lee's own reflections on linked data.

Join the Banff Manifesto by adding your name and web site in a comment on this page. All suggestions are welcome.

More on WhyManifesto.

The Banff Manifesto rules of thumb

  • Rule #1 : URIs are normalized and dereferenceable
  • Rule #2 : Authoritative public namespaces are used
  • Rule #3 : Mandatory predicates are used
  • Rule #4 : Resource predicates are prefixed with an "x"
  • Rule #5 : Blank nodes are forbidden
  • Rule #6 : RDFizer programs are made available according to the GNU GPL licence for open source
  • Rule #7 : Dereferenceable ontologies

Rule #1 : URIs are normalized and dereferenceable

A normalized URI is composed of three parts separated by ':'. The authority_namespace is 'bm', an acronym for Banff Manifesto. The public_namespace literals are attributed by the BM community (for example go, uniprot, pubmed). The private_identifier is the responsibility of the RDF provider of the corresponding document (for example 0000007 for go, p26838 for uniprot).

Normalized URL : http://bio2rdf.org/public_namespace:private_identifier

or

Normalized URN : urn:bm:public_namespace:private_identifier
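The two patterns above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the validation rules (a lowercase alphanumeric namespace, an identifier without whitespace) are assumptions, since the manifesto only states that namespaces are lowercase.

```python
import re

# Assumption: a public namespace is lowercase letters and digits only.
_NAMESPACE = re.compile(r"[a-z][a-z0-9]*")


def _check(namespace: str, identifier: str) -> None:
    """Reject values that cannot form a normalized URI (assumed rules)."""
    if not _NAMESPACE.fullmatch(namespace):
        raise ValueError(f"invalid public namespace: {namespace!r}")
    if not identifier or any(ch.isspace() for ch in identifier):
        raise ValueError(f"invalid private identifier: {identifier!r}")


def normalized_url(namespace: str, identifier: str) -> str:
    """Build http://bio2rdf.org/public_namespace:private_identifier."""
    _check(namespace, identifier)
    return f"http://bio2rdf.org/{namespace}:{identifier}"


def normalized_urn(namespace: str, identifier: str) -> str:
    """Build urn:bm:public_namespace:private_identifier."""
    _check(namespace, identifier)
    return f"urn:bm:{namespace}:{identifier}"


print(normalized_url("uniprot", "p26838"))
print(normalized_urn("go", "0000007"))
```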

URI examples

The official authoritative web page for the protein uniprot:p26838 is http://www.ebi.uniprot.org/entry/uniprot:p26838

The URI attributed according to the Banff Manifesto will be

urn:bm:uniprot:p26838

Its dereferencable URL version is:

http://bio2rdf.org/uniprot:p26838

Its purl.org version is as follows:

http://purl.org/bm/uniprot:p26838

Finally, its LSID equivalent is

urn:lsid:uniprot.org:uniprot:p26838
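Because the URN, bio2rdf.org and purl.org forms all end in the same public_namespace:private_identifier pair, a consumer can reduce any of them to that pair. The sketch below assumes exactly the three prefixes shown in the examples; the function name is hypothetical, and the LSID form is deliberately left out because its authority part (uniprot.org here) is not derivable from the namespace alone.

```python
# Prefixes taken from the URI examples; treating them as the complete set
# of accepted forms is an assumption of this sketch.
PREFIXES = ("urn:bm:", "http://bio2rdf.org/", "http://purl.org/bm/")


def split_normalized_uri(uri: str) -> tuple[str, str]:
    """Return (public_namespace, private_identifier) for a normalized URI."""
    for prefix in PREFIXES:
        if uri.startswith(prefix):
            namespace, sep, identifier = uri[len(prefix):].partition(":")
            # Reject paths such as http://bio2rdf.org/html/..., which are
            # not resource URIs.
            if sep and namespace and identifier and "/" not in namespace:
                return namespace, identifier
    raise ValueError(f"not a normalized Banff Manifesto URI: {uri!r}")


for uri in (
    "urn:bm:uniprot:p26838",
    "http://bio2rdf.org/uniprot:p26838",
    "http://purl.org/bm/uniprot:p26838",
):
    print(split_normalized_uri(uri))
```

All three calls yield the same pair, which is what lets documents from different providers connect on the same resource.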

Rule #2 : Authoritative public namespaces are used

The main resource defined by the Banff Manifesto is the table of authoritative namespaces used in the URIs of bioinformatics semantic web resources, defined here: URL2URI. Namespaces are all in lowercase. Most of the time they correspond to the domain name of the corresponding web site. Namespaces are attributed on a first come, first served basis. Obvious names like protein, gene or mouse cannot be attributed to a resource.

Namespace examples

Domain | Database | BM's namespace
www.genome.jp | KEGG PATHWAY Database | path
www.ncbi.nlm.nih.gov | Entrez PubMed | pubmed
http://www.uniprot.org | UniProt (Universal Protein Resource) | uniprot

BM's namespace | Namespace synonym
geneid | mmu
geneid | hsa
omim | mim
path | kegg
pubmed | pmid
uniprot | swissprot
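The synonym table lends itself to a simple lookup that maps any accepted name to its authoritative namespace. The following is a minimal sketch: the dictionary holds only the rows listed above, and the function name and the lowercasing of the input are assumptions.

```python
# Synonym -> authoritative BM namespace, copied from the table above.
SYNONYMS = {
    "mmu": "geneid",
    "hsa": "geneid",
    "mim": "omim",
    "kegg": "path",
    "pmid": "pubmed",
    "swissprot": "uniprot",
}


def canonical_namespace(name: str) -> str:
    """Return the authoritative namespace for a name or one of its synonyms."""
    key = name.strip().lower()  # namespaces are all lowercase (Rule #2)
    return SYNONYMS.get(key, key)


print(canonical_namespace("swissprot"))
print(canonical_namespace("PMID"))
print(canonical_namespace("uniprot"))
```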

Rule #3 : Mandatory predicates are used

Banff Manifesto RDF documents must contain at least the following predicates :

  • rdf:type Defines the class of the object described by the document.
  • dc:identifier Defines the unique identifier of the document.
  • dc:title The document's title.
  • rdfs:label The document's title and identifier, in the form "dc:title [dc:identifier]".
  • bio2rdf:url URL of the corresponding human-readable HTML page.

An example from pubchem:8007419, the NCBI PubChem definition of a substance, phosphoric acid:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:bio2rdf="http://bio2rdf.org/bio2rdf#"
    xmlns:pubchem="http://bio2rdf.org/pubchem#">

  <!-- The typed node element supplies the mandatory rdf:type (pubchem:Substance). -->
  <pubchem:Substance rdf:about="http://bio2rdf.org/pubchem:8007419">
    <!-- Mandatory predicates -->
    <rdfs:label>PO4 [pubchem:8007419]</rdfs:label>
    <dc:title>PO4</dc:title>
    <dc:identifier>pubchem:8007419</dc:identifier>
    <bio2rdf:lsid rdf:resource="urn:lsid:bio2rdf.org:pubchem:8007419"/>
    <bio2rdf:url>http://bio2rdf.org/html/pubchem:8007419</bio2rdf:url>

    <!-- Additional Bio2RDF predicates -->
    <bio2rdf:urlImage>http://bio2rdf.org/image/pubchem:8007419</bio2rdf:urlImage>
    <rdfs:comment>Mutant Monomer Of Recombinant Human Hexokinase Type I With Glucose And Adp In The Active Site</rdfs:comment>
    <bio2rdf:synonym>PO4</bio2rdf:synonym>
    <bio2rdf:xRef rdf:resource="http://bio2rdf.org/mmdb:12591.4"/>

    <!-- Predicates specific to the pubchem namespace -->
    <pubchem:comment>PDB Accession Code 1DGK</pubchem:comment>
    <pubchem:IUPAC_Name>phosphoric acid</pubchem:IUPAC_Name>
    <pubchem:InChI>InChI=1/H3O4P/c1-5(2,3)4/h(H3,1,2,3,4)/f/h1-3H</pubchem:InChI>
    <pubchem:Molecular_Formula>H3O4P</pubchem:Molecular_Formula>
    <pubchem:SMILES>OP(=O)(O)O</pubchem:SMILES>
  </pubchem:Substance>
</rdf:RDF>
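A provider may want to check a document against Rule #3 before publishing it. The sketch below uses only Python's standard xml.etree.ElementTree; the function name and the report format are hypothetical. It assumes a flat RDF/XML layout like the example above, with one element per described resource directly under rdf:RDF, and it treats a typed node element (such as pubchem:Substance) as supplying rdf:type.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Mandatory predicates from Rule #3, in ElementTree's {namespace}local form.
MANDATORY = {
    "dc:identifier": "{http://purl.org/dc/elements/1.1/}identifier",
    "dc:title": "{http://purl.org/dc/elements/1.1/}title",
    "rdfs:label": "{http://www.w3.org/2000/01/rdf-schema#}label",
    "bio2rdf:url": "{http://bio2rdf.org/bio2rdf#}url",
}


def missing_predicates(rdf_xml: str) -> dict[str, list[str]]:
    """Map each described resource to the mandatory predicates it lacks."""
    report = {}
    for node in ET.fromstring(rdf_xml):
        subject = node.get(f"{{{RDF_NS}}}about", "(no rdf:about)")
        present = {child.tag for child in node}
        missing = [name for name, tag in MANDATORY.items() if tag not in present]
        # rdf:type comes from a typed node element or an explicit rdf:type.
        has_type = (
            node.tag != f"{{{RDF_NS}}}Description"
            or f"{{{RDF_NS}}}type" in present
        )
        if not has_type:
            missing.insert(0, "rdf:type")
        if missing:
            report[subject] = missing
    return report


SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:pubchem="http://bio2rdf.org/pubchem#">
  <pubchem:Substance rdf:about="http://bio2rdf.org/pubchem:8007419">
    <rdfs:label>PO4 [pubchem:8007419]</rdfs:label>
    <dc:title>PO4</dc:title>
    <dc:identifier>pubchem:8007419</dc:identifier>
  </pubchem:Substance>
</rdf:RDF>"""

print(missing_predicates(SAMPLE))
```

The sample deliberately omits bio2rdf:url, so the report lists that predicate for the pubchem resource.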

Rule #4 : Resource predicates are prefixed with an "x"

Bio2RDF predicate names for resources always start with an x, which stands for eXternal reference: for example xPubMed or xPfam, or xRef when the namespace is unknown.

Here are some examples :

<bio2rdf:xPubMed rdf:resource="http://bio2rdf.org/pubmed:765489"/>
<bio2rdf:xRef rdf:resource="http://bio2rdf.org/mmdb:12591.4"/>
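An RDFizer could pick the predicate name from the target namespace and fall back to xRef otherwise. The sketch below is illustrative only: the mapping contains just the two names mentioned above, and the function name is hypothetical.

```python
from xml.sax.saxutils import quoteattr

# Assumption: only the predicates named in the text are known here;
# every other namespace falls back to the generic xRef.
KNOWN_X_PREDICATES = {"pubmed": "xPubMed", "pfam": "xPfam"}


def xref_element(namespace: str, identifier: str) -> str:
    """Return a bio2rdf:x... element pointing at a normalized bio2rdf.org URL."""
    predicate = KNOWN_X_PREDICATES.get(namespace, "xRef")
    target = f"http://bio2rdf.org/{namespace}:{identifier}"
    # quoteattr escapes the value and adds the surrounding quotes.
    return f"<bio2rdf:{predicate} rdf:resource={quoteattr(target)}/>"


print(xref_element("pubmed", "765489"))
print(xref_element("mmdb", "12591.4"))
```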

Rule #5 : Blank nodes are forbidden

Blank node: http://en.wikipedia.org/wiki/Blank_node

Blank nodes intentionally omit knowledge about the identifier which is being referenced, requiring an inference to derive the actual identifier from a large pre-populated database. This makes it hard to perform data integration, particularly as there are known identifiers for the subject areas in question.
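One simple way to enforce this rule is to scan a document for blank node labels before publishing it. The sketch below assumes the data is available in N-Triples, where blank nodes are written with a `_:` prefix; the function name is hypothetical and the line splitting is deliberately naive, so it expects well-formed statements.

```python
def blank_node_lines(ntriples: str) -> list[int]:
    """Return the 1-based line numbers of statements that use a blank node."""
    offenders = []
    for lineno, line in enumerate(ntriples.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate never contain spaces; the rest is the object.
        subject, _predicate, obj = line.split(None, 2)
        if subject.startswith("_:") or obj.startswith("_:"):
            offenders.append(lineno)
    return offenders


DOC = "\n".join([
    "<http://bio2rdf.org/pubchem:8007419> <http://bio2rdf.org/bio2rdf#xRef> <http://bio2rdf.org/mmdb:12591.4> .",
    '_:b0 <http://purl.org/dc/elements/1.1/title> "PO4" .',
    '<http://bio2rdf.org/pubchem:8007419> <http://www.w3.org/2000/01/rdf-schema#comment> "mentions _:b1 in text" .',
])

print(blank_node_lines(DOC))
```

Only the second statement is reported: the `_:b1` in the third line sits inside a literal, which is not a blank node.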

Rule #6 : RDFizer programs are made available according to the GNU GPL licence for open source

Because everyone should be able to modify the schema of an RDF document according to their needs, RDFizer programs are made open source so they can be modified by users.

The source code of all RDFizer programs converting public resources to RDF is available, like these:

Rule #7 : Dereferenceable ontologies

The ontologies that define the classes and predicates used in the RDF documents must be provided at a dereferenceable URI indicated in the namespace declaration part of the document. This is just as important, if not more important, as it is to provide dereferenceable URIs for the other identifiers used in the RDF. No more dead ends.

Signatories (add your name and web site by commenting this page)

Francois Belleau, http://bio2rdf.org/

Marc-Alexandre Nolin, http://bio2rdf.org/

Peter Ansell, http://www.mquter.qut.edu.au/

Potential signatories

BenjaminGood


SuggestedAdditions | GeneralComments |

This page (revision-2) was last changed on 19-Apr-2008 10:55 by FrancoisBelleau