Using RDF for scientific data sets
Posted 1 week, 2 days ago
The Resource Description Framework (RDF) is a very flexible data model that breaks information down into its most basic building blocks: statements of the form subject-predicate-object, called "triples". These statements, and their ability to describe any relation no matter how complex, make the separate schema we see in relational databases obsolete, allowing the implied schema to naturally follow the changes in the data.
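To make the triple idea concrete, here is a minimal sketch in plain Python, with no RDF library involved. The `ex:` names are made up for illustration and do not come from any real data set. It shows how the "schema" is nothing more than the set of predicates currently in use, so it changes as soon as the data does.

```python
# Each statement is a (subject, predicate, object) tuple.
# The whole data set is simply a set of such tuples.
# All "ex:" identifiers below are hypothetical examples.
triples = {
    ("ex:hippocampus", "ex:partOf", "ex:brain"),
    ("ex:hippocampus", "ex:label", "Hippocampus"),
}


def implied_schema(data):
    """Return the predicates in use, i.e. the schema implied by the data."""
    return sorted({predicate for _, predicate, _ in data})


schema_before = implied_schema(triples)

# A new kind of relation is just one more triple.
# There is no table to alter and no column to add first.
triples.add(("ex:hippocampus", "ex:projectsTo", "ex:septum"))

schema_after = implied_schema(triples)

print(schema_before)
print(schema_after)
```

The second list contains the new predicate without any separate schema migration step, which is the point of the paragraph above.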
One specific type of resource identifier, the URI, is the key to creating a self-describing data set. If all the predicate URIs are also valid URLs pointing to web pages describing those predicates, then anyone who comes in contact with the serialized RDF data will be able to make sense of it. The data can become self-documenting, facilitating the kind of collaboration that happens in the scientific community, for example. Furthermore, if some of those URIs point to external data sets we have interconnectivity, turning into the wet dream of semantic web proponents - the global graph.
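As a rough illustration of the self-describing idea, the sketch below writes triples out in an N-Triples-like form with full URIs as predicates. The `http://example.org/...` addresses are placeholders I made up; only the `rdfs:label` URI is a real, published vocabulary term. Treating every URL-shaped object as a resource is a simplification for the sake of a short example.

```python
from urllib.parse import urlparse

# Hypothetical namespace; a real project would use URLs it controls
# and serve a page describing each predicate at that address.
VOCAB = "http://example.org/vocab/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

triples = [
    ("http://example.org/region/hippocampus", VOCAB + "partOf",
     "http://example.org/region/brain"),
    ("http://example.org/region/hippocampus", RDFS_LABEL, "Hippocampus"),
]


def is_url(value):
    """True if the value looks like an http(s) URL a reader could follow."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def to_ntriples(data):
    """Serialize triples, one statement per line, N-Triples style."""
    lines = []
    for subject, predicate, obj in data:
        if is_url(obj):
            rendered = f"<{obj}>"  # another resource
        else:
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'  # plain string literal
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


# Every predicate can be pasted into a browser to look up its meaning.
assert all(is_url(predicate) for _, predicate, _ in triples)
print(to_ntriples(triples))
```

If an object URL points at somebody else's data set instead of our own, the same line becomes a link into the global graph.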
The Brain Architecture Management System (BAMS) is one real-world example of what RDF can do. The dynamic RDF/XML serialization alone goes a long way towards facilitating data access. Behind the scenes, RDF simplifies the addition of new predicates (compared to the management of columns, indices and foreign keys in an RDBMS) and query construction through SPARQL. This implementation is also the basis for a future OWL ontology on top of RDF and maybe an open SPARQL access point. Yes, I'm excited about it because it's an Odeon project and we got to do the actual migration from MySQL to a triple store, but the wind of change is blowing throughout the scientific community (at least in neuroscience), and people are starting to move towards semantic technologies. Granted, you'll see many static files processed with desktop software like Protégé, but the direction is clear.
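To give a feel for why query construction gets simpler, here is a toy pattern matcher over triples in plain Python. It is not how BAMS or any triple store is implemented, and the `ex:` identifiers are invented for the example. `None` plays the role of a SPARQL variable, and the comment shows roughly what the equivalent SPARQL query would look like, assuming a matching `ex:` prefix declaration.

```python
# Hypothetical data; none of these identifiers come from BAMS.
triples = [
    ("ex:hippocampus", "ex:partOf", "ex:brain"),
    ("ex:amygdala", "ex:partOf", "ex:brain"),
    ("ex:hippocampus", "ex:projectsTo", "ex:septum"),
]


def match(data, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None matches anything."""
    return [
        t for t in data
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly: SELECT ?region WHERE { ?region ex:partOf ex:brain . }
regions = [s for s, _, _ in match(triples, p="ex:partOf", o="ex:brain")]
print(regions)

# A brand new predicate needs no new table, index or foreign key.
# The same function queries it straight away.
triples.append(("ex:amygdala", "ex:projectsTo", "ex:hippocampus"))
inputs = [s for s, _, _ in match(triples, p="ex:projectsTo", o="ex:hippocampus")]
print(inputs)
```

Every query has the same subject-predicate-object shape regardless of which predicate it asks about, which is what spares us the join bookkeeping of the relational version.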
Category: RDF
