Banff Manifesto
A place to start mapping the Bioinformatics semantic web, because the Rockies don't move!
To develop best practices for designing RDF documents in bioinformatics. The first major milestone is to agree upon canonical URIs.
Some participants from the HCLS-DI2007 workshop and from the I3 Workshop propose that the life science community establish an authority for attributing the namespaces used in the bioinformatics semantic web, in order to build normalized URIs, which can be URLs or URNs. This authority will also establish a set of rules for the construction of well-formed RDF documents. When someone uses URIs approved by the BM authority, they will know that certain naming standards are upheld. The rules are good practices that must be followed to be recognized by the BM community.
Much in the way DOIs work, BM will certify that a namespace belongs to a data provider and will disambiguate the naming for it. With normalized URIs, RDF documents start to connect together, which is one of the main reasons to adopt the semantic approach to data integration.
Why Banff Manifesto?
To build a semantic coordinate system based on URIs, we need a reference point to start from. 'bm' is the proposed namespace authority, and if it is based in Banff, it will not move, because the Rockies don't move! The Banff Manifesto aims to remove some creeps from the bioinformatics semantic web. Creeps rule the web; we gathered to adopt and share the Banff Manifesto. Join our community, it is up to you.
Some of the main ideas of the Manifesto come from Tim Berners-Lee's own reflections about linked data.
Join the Banff Manifesto by adding your name and web site in a comment on this page. All suggestions are welcome.
More on WhyManifesto.
The Banff Manifesto rules of thumb
- Rule #1 : URIs are normalized and dereferenceable
- Rule #2 : Authoritative public namespaces are used
- Rule #3 : Mandatory predicates are used
- Rule #4 : Resource predicates are prefixed with an "x"
- Rule #5 : Blank nodes are forbidden
- Rule #6 : RDFizer programs are made available under the GNU GPL open source licence
- Rule #7 : Dereferenceable ontologies
Rule #1 : URIs are normalized and dereferenceable
The syntax of a normalized URI is composed of three parts separated by ':'. The authority_namespace is 'bm', an acronym for Banff Manifesto. The public_namespace literals are attributed by the BM community (for example go, uniprot, pubmed). The private_identifier is the responsibility of the RDF provider of the corresponding document (for example 0000007 for go, p26838 for uniprot).
Normalized URL : http://bio2rdf.org/public_namespace:private_identifier
Normalized URN : urn:bm:public_namespace:private_identifier
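The following minimal Python sketch illustrates the syntax above by splitting a normalized URN or bio2rdf.org URL back into its public_namespace and private_identifier. It assumes that a namespace never contains ':', so the split happens at the first colon only, and that the identifier is kept exactly as the provider wrote it. The function and type names are hypothetical.

```python
from typing import NamedTuple

# Fixed prefixes of the two normalized forms given above.
URN_PREFIX = "urn:bm:"
URL_PREFIX = "http://bio2rdf.org/"


class BMName(NamedTuple):
    public_namespace: str
    private_identifier: str


def parse_normalized_uri(uri: str) -> BMName:
    """Split a normalized BM URN or URL into its namespace and identifier."""
    for prefix in (URN_PREFIX, URL_PREFIX):
        if uri.startswith(prefix):
            rest = uri[len(prefix):]
            break
    else:
        raise ValueError(f"not a normalized BM URI: {uri!r}")
    # Assumption: the namespace contains no ':', so split on the first one only.
    namespace, sep, identifier = rest.partition(":")
    if not sep or not namespace or not identifier:
        raise ValueError(f"missing namespace or identifier in {uri!r}")
    return BMName(namespace, identifier)


print(parse_normalized_uri("urn:bm:uniprot:p26838"))
# BMName(public_namespace='uniprot', private_identifier='p26838')
```

Because the URN and the URL carry the same two variable parts, either form can be converted to the other without a lookup table.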
URI examples
The official authoritative web page for the uniprot:p26838 protein is http://www.ebi.uniprot.org/entry/uniprot:p26838
The URI attributed according to the Banff Manifesto would be:
urn:bm:uniprot:p26838
Its dereferenceable URL version is:
http://bio2rdf.org/uniprot:p26838
Its purl.org version is as follows:
http://purl.org/bm/uniprot:p26838
Finally, its LSID equivalent is:
urn:lsid:uniprot.org:uniprot:p26838
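The four forms above differ only in their fixed prefixes, so they can be generated mechanically from the two variable parts. Here is a minimal sketch. The LSID authority is passed in explicitly because the rule does not fix it: this example uses uniprot.org, while the PubChem document under Rule #3 uses bio2rdf.org. The function name is hypothetical.

```python
def bm_forms(public_namespace: str, private_identifier: str,
             lsid_authority: str) -> dict:
    """Build the URN, URL, purl.org and LSID forms of one BM name."""
    name = f"{public_namespace}:{private_identifier}"
    return {
        "urn": f"urn:bm:{name}",
        "url": f"http://bio2rdf.org/{name}",
        "purl": f"http://purl.org/bm/{name}",
        "lsid": f"urn:lsid:{lsid_authority}:{name}",
    }


for kind, uri in bm_forms("uniprot", "p26838", "uniprot.org").items():
    print(kind, uri)
```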
Rule #2 : Authoritative public namespaces are used
The main resource defined by the Banff Manifesto is the table of authoritative namespaces used in the URIs of bioinformatics semantic web resources, defined in URL2URI. Namespaces are all in lowercase. Most of the time they correspond to the domain name of the corresponding web site. Namespaces are attributed on a first come, first served basis. Obvious names like protein, gene or mouse cannot be attributed to a resource.
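The attribution rules in this paragraph can be expressed as a simple admission check for a proposed namespace. This is only a sketch. The allowed character set (lowercase ASCII letters and digits), the list of generic names, and the registry dictionary are assumptions for illustration, and are not the actual URL2URI table.

```python
import re
from typing import Optional

# Assumption: only the three examples given in the text; a real list would be longer.
GENERIC_NAMES = {"protein", "gene", "mouse"}
# Assumption: namespaces use lowercase ASCII letters and digits only.
NAMESPACE_PATTERN = re.compile(r"[a-z0-9]+")


def check_new_namespace(candidate: str, registry: dict) -> Optional[str]:
    """Return why `candidate` cannot be attributed, or None if it can."""
    if candidate != candidate.lower():
        return "namespaces must be lowercase"
    if not NAMESPACE_PATTERN.fullmatch(candidate):
        return "namespace contains unexpected characters"
    if candidate in GENERIC_NAMES:
        return "generic names cannot be attributed"
    if candidate in registry:
        # First come, first served: an existing entry always wins.
        return f"already attributed to {registry[candidate]}"
    return None


registry = {"uniprot": "www.uniprot.org", "pubmed": "www.ncbi.nlm.nih.gov"}
print(check_new_namespace("gene", registry))     # generic names cannot be attributed
print(check_new_namespace("uniprot", registry))  # already attributed to www.uniprot.org
```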
Namespace examples
| Domain | Database | BM's namespace |
|---|---|---|
| www.genome.jp | KEGG PATHWAY Database | path |
| www.ncbi.nlm.nih.gov | Entrez PubMed | pubmed |
| www.uniprot.org | UniProt (Universal Protein Resource) | uniprot |
Namespace synonyms
| BM's namespace | Namespace synonym |
|---|---|
| geneid | mmu |
| geneid | hsa |
| omim | mim |
| path | kegg |
| pubmed | pmid |
| uniprot | swissprot |
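A client that receives identifiers written with a synonym prefix can rewrite them to the BM namespace before building URIs. The mapping below copies the synonym table above. The function name and the choice to lowercase the prefix are assumptions; lowercasing follows from Rule #2, since all namespaces are lowercase.

```python
# Synonym prefix -> BM namespace, copied from the table above.
SYNONYMS = {
    "mmu": "geneid",
    "hsa": "geneid",
    "mim": "omim",
    "kegg": "path",
    "pmid": "pubmed",
    "swissprot": "uniprot",
}


def canonical_name(name: str) -> str:
    """Rewrite 'prefix:identifier' so that the prefix is the BM namespace."""
    prefix, sep, identifier = name.partition(":")
    if not sep:
        raise ValueError(f"expected 'prefix:identifier', got {name!r}")
    prefix = prefix.lower()  # assumption: prefixes are compared case-insensitively
    return f"{SYNONYMS.get(prefix, prefix)}:{identifier}"


print(canonical_name("pmid:765489"))       # pubmed:765489
print(canonical_name("SwissProt:p26838"))  # uniprot:p26838
```

Only the prefix is rewritten. The private_identifier is left untouched because it remains the provider's responsibility.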
Rule #3 : Mandatory predicates are used
Banff Manifesto RDF documents must contain at least the following predicates:
- rdf:type Defines the class of the object described by the document.
- dc:identifier Defines the unique identifier of the document.
- dc:title The document's title.
- rdfs:label The document's title and identifier (dc:title [dc:identifier]).
- bio2rdf:url URL of the corresponding human-readable html page.
An example from pubchem:8007419, the NCBI PubChem definition of a substance, phosphoric acid:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:owl="http://www.w3.org/2002/07/owl#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:bio2rdf="http://bio2rdf.org/bio2rdf#"
  xmlns:pubchem="http://bio2rdf.org/pubchem#">
  <pubchem:Substance rdf:about="http://bio2rdf.org/pubchem:8007419">
    <rdfs:label>PO4 [pubchem:8007419]</rdfs:label>
    <dc:title>PO4</dc:title>
    <dc:identifier>pubchem:8007419</dc:identifier>
    <bio2rdf:lsid rdf:resource="urn:lsid:bio2rdf.org:pubchem:8007419"/>
    <bio2rdf:url>http://bio2rdf.org/html/pubchem:8007419</bio2rdf:url>
    <bio2rdf:urlImage>http://bio2rdf.org/image/pubchem:8007419</bio2rdf:urlImage>
    <rdfs:comment>Mutant Monomer Of Recombinant Human Hexokinase Type I With Glucose And Adp In The Active Site</rdfs:comment>
    <bio2rdf:synonym>PO4</bio2rdf:synonym>
    <bio2rdf:xRef rdf:resource="http://bio2rdf.org/mmdb:12591.4"/>
    <pubchem:comment>PDB Accession Code 1DGK</pubchem:comment>
    <pubchem:IUPAC_Name>phosphoric acid</pubchem:IUPAC_Name>
    <pubchem:InChI>InChI=1/H3O4P/c1-5(2,3)4/h(H3,1,2,3,4)/f/h1-3H</pubchem:InChI>
    <pubchem:Molecular_Formula>H3O4P</pubchem:Molecular_Formula>
    <pubchem:SMILES>OP(=O)(O)O</pubchem:SMILES>
  </pubchem:Substance>
</rdf:RDF>
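To make Rule #3 checkable, here is a minimal Python sketch that uses only the standard library's xml.etree.ElementTree. It reports the missing mandatory predicates for each top-level resource in an RDF/XML document. A typed node element such as pubchem:Substance is treated as already stating rdf:type. The sketch also checks that rdfs:label has the form "dc:title [dc:identifier]". This is a simplification under stated assumptions: it inspects only top-level node elements and ignores RDF/XML abbreviations such as rdf:type written as an attribute. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DC = "http://purl.org/dc/elements/1.1/"
BIO2RDF = "http://bio2rdf.org/bio2rdf#"


def q(namespace: str, local: str) -> str:
    """ElementTree's '{namespace}local' form of a qualified name."""
    return f"{{{namespace}}}{local}"


# Mandatory predicates other than rdf:type, which is handled separately.
MANDATORY = {
    "dc:identifier": q(DC, "identifier"),
    "dc:title": q(DC, "title"),
    "rdfs:label": q(RDFS, "label"),
    "bio2rdf:url": q(BIO2RDF, "url"),
}


def check_mandatory_predicates(rdf_xml: str) -> dict:
    """Map each top-level subject URI to its list of Rule #3 problems."""
    report = {}
    for node in ET.fromstring(rdf_xml):
        present = {child.tag for child in node}
        problems = [name for name, tag in MANDATORY.items() if tag not in present]
        # A typed node element (anything but rdf:Description) already states rdf:type.
        if node.tag == q(RDF, "Description") and q(RDF, "type") not in present:
            problems.insert(0, "rdf:type")
        if not problems:
            title = node.findtext(q(DC, "title"))
            identifier = node.findtext(q(DC, "identifier"))
            if node.findtext(q(RDFS, "label")) != f"{title} [{identifier}]":
                problems.append("rdfs:label is not 'dc:title [dc:identifier]'")
        if problems:
            report[node.get(q(RDF, "about"), "(no rdf:about)")] = problems
    return report


# A trimmed copy of the PubChem document above.
EXAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:bio2rdf="http://bio2rdf.org/bio2rdf#"
    xmlns:pubchem="http://bio2rdf.org/pubchem#">
  <pubchem:Substance rdf:about="http://bio2rdf.org/pubchem:8007419">
    <rdfs:label>PO4 [pubchem:8007419]</rdfs:label>
    <dc:title>PO4</dc:title>
    <dc:identifier>pubchem:8007419</dc:identifier>
    <bio2rdf:url>http://bio2rdf.org/html/pubchem:8007419</bio2rdf:url>
  </pubchem:Substance>
</rdf:RDF>"""

print(check_mandatory_predicates(EXAMPLE))  # {}
```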
Rule #4 : Resource predicates are prefixed with an "x"
Bio2RDF predicate names for resources always start with an x, which stands for eXternal reference: for example xPubMed or xPfam, or xRef when the namespace is unknown.
Here are some examples:
<bio2rdf:xPubMed rdf:resource="http://bio2rdf.org/pubmed:765489"/>
<bio2rdf:xRef rdf:resource="http://bio2rdf.org/mmdb:12591.4"/>
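An RDFizer can choose the predicate name from the namespace of the target URI and fall back to xRef when that namespace has no dedicated predicate. In this sketch the lookup table is hypothetical and contains only the two names mentioned above; a real RDFizer would carry its own list. The function name is also hypothetical.

```python
from xml.sax.saxutils import quoteattr

BIO2RDF_URL = "http://bio2rdf.org/"
# Hypothetical lookup: namespace -> dedicated x-predicate. Only the two
# predicates named in the text are listed; everything else becomes xRef.
X_PREDICATES = {"pubmed": "xPubMed", "pfam": "xPfam"}


def xref_element(target_uri: str) -> str:
    """Serialize a reference to target_uri with an x-prefixed bio2rdf predicate."""
    namespace = ""
    if target_uri.startswith(BIO2RDF_URL):
        namespace = target_uri[len(BIO2RDF_URL):].partition(":")[0]
    predicate = X_PREDICATES.get(namespace, "xRef")
    # quoteattr escapes the URI and wraps it in quotes.
    return f"<bio2rdf:{predicate} rdf:resource={quoteattr(target_uri)}/>"


print(xref_element("http://bio2rdf.org/pubmed:765489"))
# <bio2rdf:xPubMed rdf:resource="http://bio2rdf.org/pubmed:765489"/>
print(xref_element("http://bio2rdf.org/mmdb:12591.4"))
# <bio2rdf:xRef rdf:resource="http://bio2rdf.org/mmdb:12591.4"/>
```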
Rule #5 : Blank nodes are forbidden
Blank node: http://en.wikipedia.org/wiki/Blank_node
Blank nodes intentionally omit knowledge about the identifier being referenced, so an inference over a large pre-populated database is required to derive the actual identifier. This makes data integration hard, particularly because known identifiers already exist for the subject areas in question.
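Here is a minimal Python sketch of how a validator might enforce Rule #5 on RDF/XML. It walks the node/property striping and flags the common constructs that create blank nodes: node elements without rdf:about or rdf:ID, rdf:nodeID, and rdf:parseType="Resource" or "Collection". It is not exhaustive. For example, it does not flag property attributes on an empty property element, which also produce a blank node. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def find_blank_nodes(rdf_xml: str) -> list:
    """List the places where an RDF/XML document introduces a blank node."""
    problems = []

    def visit_node(node):
        # A node element with neither rdf:about nor rdf:ID has no URI.
        if node.get(RDF + "about") is None and node.get(RDF + "ID") is None:
            problems.append(f"node element without URI: {node.tag}")
        for prop in node:
            visit_property(prop)

    def visit_property(prop):
        parse_type = prop.get(RDF + "parseType")
        if parse_type == "Literal":
            return  # XML literal content, not RDF nodes
        if prop.get(RDF + "nodeID") is not None:
            problems.append(f"rdf:nodeID on property: {prop.tag}")
        if parse_type == "Resource":
            problems.append(f'rdf:parseType="Resource" on property: {prop.tag}')
            for child in prop:  # children are properties of the implied blank node
                visit_property(child)
            return
        if parse_type == "Collection":
            problems.append(f'rdf:parseType="Collection" on property: {prop.tag}')
        for child in prop:  # any child of a property element is a node element
            visit_node(child)

    for node in ET.fromstring(rdf_xml):
        visit_node(node)
    return problems


BAD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bio2rdf="http://bio2rdf.org/bio2rdf#">
  <rdf:Description rdf:about="http://bio2rdf.org/pubchem:8007419">
    <bio2rdf:xRef>
      <rdf:Description>
        <bio2rdf:synonym>PO4</bio2rdf:synonym>
      </rdf:Description>
    </bio2rdf:xRef>
  </rdf:Description>
</rdf:RDF>"""

print(find_blank_nodes(BAD))  # one problem: the inner rdf:Description has no URI
```

The fix is to give the inner resource a normalized URI and to reference it with rdf:resource, as in the Rule #4 examples.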
Rule #6 : RDFizer programs are made available under the GNU GPL open source licence
Because everyone should be able to modify the schema of an RDF document according to their needs, RDFizer programs are made open source so that users can modify them.
The source code of every RDFizer program that converts a public resource to RDF is available, for example:
- http://bio2rdf.org/jsp/ncbi-omim2rdf.txt
- http://bio2rdf.org/jsp/bio2rdf2creeps.txt
- http://bio2rdf.org/jsp/xml2rdf.txt
- http://bio2rdf.org/pubchem:xsl
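An RDFizer is simply a program that turns records from a public resource into RDF documents. The following is a hypothetical, minimal Python RDFizer for a single record. It is not one of the bio2rdf programs listed above. It names the resource with a normalized URL, emits the Rule #3 predicates, and writes every cross-reference as an xRef to a full URI instead of a blank node. The record fields and the function signature are assumptions. The html URL pattern and the pubchem:Substance class URI are taken from the PubChem example.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "bio2rdf": "http://bio2rdf.org/bio2rdf#",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # keep readable prefixes in the output


def tag(prefix: str, local: str) -> str:
    """ElementTree's '{namespace}local' form for a registered prefix."""
    return f"{{{NS[prefix]}}}{local}"


def rdfize(namespace: str, identifier: str, title: str,
           class_uri: str, xrefs: list) -> str:
    """Hypothetical minimal RDFizer: one record in, one RDF/XML document out."""
    name = f"{namespace}:{identifier}"
    root = ET.Element(tag("rdf", "RDF"))
    node = ET.SubElement(root, tag("rdf", "Description"),
                         {tag("rdf", "about"): f"http://bio2rdf.org/{name}"})
    # Rule #3: the mandatory predicates.
    ET.SubElement(node, tag("rdf", "type"), {tag("rdf", "resource"): class_uri})
    ET.SubElement(node, tag("dc", "identifier")).text = name
    ET.SubElement(node, tag("dc", "title")).text = title
    ET.SubElement(node, tag("rdfs", "label")).text = f"{title} [{name}]"
    ET.SubElement(node, tag("bio2rdf", "url")).text = f"http://bio2rdf.org/html/{name}"
    # Rules #4 and #5: references are x-predicates to full URIs, never blank nodes.
    for target in xrefs:
        ET.SubElement(node, tag("bio2rdf", "xRef"),
                      {tag("rdf", "resource"): f"http://bio2rdf.org/{target}"})
    return ET.tostring(root, encoding="unicode")


print(rdfize("pubchem", "8007419", "PO4",
             "http://bio2rdf.org/pubchem#Substance", ["mmdb:12591.4"]))
```

Because the program is short and open, a user who needs a different schema can change which predicates it emits, which is the point of Rule #6.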
Rule #7 : Dereferenceable ontologies
The ontologies that define the classes and predicates used in the RDF documents must be provided at a dereferenceable URI indicated in the namespace declaration part of the document. This is as important as, if not more important than, providing dereferenceable URIs for the other identifiers used in the RDF. No more dead ends.
Signatories (add your name and web site by commenting on this page)
Francois Belleau, http://bio2rdf.org/
Marc-Alexandre Nolin, http://bio2rdf.org/
Peter Ansell, http://www.mquter.qut.edu.au/
Potential signatories
BenjaminGood