Thursday, March 25, 2010

A GSoC project idea around the Resource Description Framework

I just added an entry to the Google Summer of Code 2010 Ideas wiki page:
    Resource Description Framework (RDF) is a ~10-year-old W3C standard. Uptake is taking off now, and it would be nice to see an Eclipse project like the Web Tools Platform provide basic RDF-related functionality. This would include bundles for RDF libraries (Jena or OpenSesame) and editors for Notation3 and RDF/XML, and perhaps support for a catalog of common ontologies (RDF, RDFS, OWL, DublinCore, FOAF, ...). It could also include a Zest-based RDF graph viewer, SPARQL query editor, etc. There is existing code, for example, developed by the Bioclipse team using Jena, or the older Tripclipse. There also exist commercial offerings stressing the relevance of the RDF platform, such as Semantic Toolkit and the popular TopBraid.
For Bioclipse we have set up bundles for Jena in the bioclipse.rdf git repository, but I am sure they need some improvement. Yet, they might serve as a starting point.

Tuesday, March 16, 2010

IFile.getContentDescription() returns null on files from the workbench. Advice?

When Bioclipse reads files from its workspace, it uses IFile.getContentDescription(), as it did in versions 2.0 and 2.2. However, I now note that unit tests that use this method fail where they used to work in earlier versions. Instead of returning a content description, the method returns null. An example unit test looks like:
  propane = cdk.loadMolecule(path);
  Assert.assertNotNull(propane.getResource());
  Assert.assertTrue(propane.getResource() instanceof IFile);
  IFile resource = (IFile)propane.getResource();
  Assert.assertNotNull(resource.getContentDescription());
  IContentType type = resource.getContentDescription().getContentType();
  Assert.assertNotNull(type);
  IChemFormat format = cdk.determineFormat(type);
  Assert.assertNotNull(format);
  Assert.assertEquals(MDLV2000Format.getInstance(), format);
This test uses getContentDescription() to get a content description and converts its content type into a CDK-specific format type.

The JavaDoc lists this method as a more efficient alternative:
    Calling this method produces a similar effect as calling getDescriptionFor(getContents(), getName(), IContentDescription.ALL) on IContentTypeManager, but provides better opportunities for improved performance.

As it used to work, I am inclined to think it is a bug. But at the same time, maybe best practices have changed? Should I keep using this method, explore the cause, perhaps file a bug report, or start using getDescriptionFor(getContents(), getName(), IContentDescription.ALL)?
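
While that question is open, one defensive option is to try the fast call first and fall back to the slower one when it yields nothing. Below is a minimal sketch of that pattern only. To keep it runnable without Eclipse on the classpath it uses plain Callables as stand-ins; the DescriptionFallback class and its method names are hypothetical and not part of Eclipse or Bioclipse.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical helper: try a cheap source first, then a slower one.
final class DescriptionFallback {

    private DescriptionFallback() {}

    // Returns the first non-null result, or an empty Optional if both
    // sources return null or throw.
    static <T> Optional<T> firstAvailable(Callable<T> primary, Callable<T> fallback) {
        T value = callOrNull(primary);
        if (value == null) {
            value = callOrNull(fallback);
        }
        return Optional.ofNullable(value);
    }

    // Treats any exception from a source as "no answer from this source".
    private static <T> T callOrNull(Callable<T> source) {
        try {
            return source.call();
        } catch (Exception e) {
            return null;
        }
    }
}
```

In a plug-in, the first argument would wrap resource.getContentDescription() and the second the getDescriptionFor(getContents(), getName(), IContentDescription.ALL) call quoted from the JavaDoc. Swallowing the exceptions is a simplification for the sketch; real code would probably want to log them, since they may point at the actual cause.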

Monday, March 15, 2010

RDF-powered QSAR wizard: SPARQL end points providing wizard content

As you know from my blog, one of the things I am working on is to push RDF functionality in Bioclipse, as I believe it to be the missing link between molecular chemometrics and literature, databases, and other non-numerical information sources.

As part of the submission for the SWAT4LS special issue in the new Journal of Biomedical Semantics, Ola hacked up a cool wizard that sets up a new QSAR Project by downloading data directly from our RDF node for the chEMBL data using SPARQL. The paper is based on the SWAT4LS talk I gave and the proceedings paper that recently appeared, but with more cool stuff, such as this RDF graph browser that allows you to open up molecules from the RDF graph in a JChemPaint editor.

Well, this really nice New QSAR Project wizard was cool enough to trigger an I-want-more reaction, so I just had to hack it up with some additional SPARQL functionality. So, the next version not only uses RDF and SPARQL to aggregate the QSAR data set, it also uses SPARQL to make the wizard interactive. While the user is typing a target ID, the wizard checks the SPARQL end point in the background and downloads the target's type, title and organism, and updates the list of activities the user can select depending on what the chEMBL database has for that target:

The actual code base is pretty small, and that's what happens when you mash up the right technologies :)
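
To give a feel for the kind of query such a background lookup could send, here is a minimal sketch. It is not the wizard's actual code: the ex: vocabulary, the TargetQuery class, and the assumption that target IDs are purely numeric are all mine, for illustration only. The real predicate names depend on how the end point models the chEMBL data.

```java
// Hypothetical query builder; the vocabulary below is invented for illustration.
final class TargetQuery {

    // Assumed namespace, not the one used by the real RDF node.
    static final String PREFIX = "PREFIX ex: <http://example.org/chembl#>\n";

    private TargetQuery() {}

    // Builds a SPARQL SELECT for the type, title and organism of one target.
    // Assumes numeric target IDs, so half-typed or malformed input is
    // rejected before anything is sent to the end point.
    static String forTarget(String targetId) {
        if (targetId == null || !targetId.matches("\\d+")) {
            throw new IllegalArgumentException("target ID must be numeric: " + targetId);
        }
        return PREFIX
            + "SELECT ?type ?title ?organism WHERE {\n"
            + "  ?target ex:targetId " + targetId + " ;\n"
            + "          ex:type ?type ;\n"
            + "          ex:title ?title ;\n"
            + "          ex:organism ?organism .\n"
            + "}";
    }
}
```

Validating the ID before building the string also keeps whatever the user types from being pasted verbatim into the query.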

Thursday, March 4, 2010

RDF, Jena, Bioclipse, Eclipse, Zest #2: icons and an extension point

Jonathan worked this week on new features for the Bioclipse RDF editor (see these two earlier items). This version still does not edit; it only displays the graph using Zest. Jonathan created an extension point for me so that anyone can make the editor aware of domain objects, simply by registering the extension implementation along with the rdf:Class URI of the rdf:type of an object. This fixes the problem of having to hardcode dependencies of the RDF editor on all the domain code, as was the case earlier.

For example, the cheminformatics IMolecule object is now linked to the rdf:type <http://www.bioclipse.net/structuredb/#Molecule>:
<extension point="net.bioclipse.rdf.rdf2bioobjectfactory">
  <Factory
    instance="net.bioclipse.rdf.ui.RDFToCDKMoleculeFactory"
    uri="http://www.bioclipse.net/structuredb/#Molecule" >
  </Factory>
</extension>
The API for this factory looks like:
public IBioObject rdfToBioObject( Model model, Resource res );
public ImageDescriptor getImageDescriptor();
This is very much tied to the Jena data model, so it is not entirely clean, but it has to do for now. The first method converts RDF content into a Bioclipse IBioObject, such as an IMolecule (see this list of currently supported objects). The second method returns an icon, which makes the editor more visually pleasing and provides a nice way to see when you can double-click the RDF node to have it open in a domain-specific editor:
For example, double-clicking the ron:mol2 node would open up a JChemPaint editor.
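
Conceptually, the extension point boils down to a lookup table from rdf:type URI to factory. The sketch below shows only that idea in plain Java. In Bioclipse the Eclipse extension registry does this work, and the TypeFactoryRegistry class here is a hypothetical stand-in, not the editor's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for what the extension point provides:
// a mapping from an rdf:type URI to the factory registered for it.
final class TypeFactoryRegistry<F> {

    private final Map<String, F> factoriesByTypeUri = new HashMap<>();

    // Registers a factory for one rdf:type URI, replacing any earlier one.
    void register(String rdfTypeUri, F factory) {
        if (rdfTypeUri == null || factory == null) {
            throw new IllegalArgumentException("URI and factory are both required");
        }
        factoriesByTypeUri.put(rdfTypeUri, factory);
    }

    // Returns the factory for the given rdf:type URI, or empty if the
    // editor knows no domain object for that type.
    Optional<F> lookup(String rdfTypeUri) {
        return Optional.ofNullable(factoriesByTypeUri.get(rdfTypeUri));
    }
}
```

With this shape the editor only depends on the factory interface: a node whose rdf:type has a registered factory gets an icon and a double-click action, and any other node is drawn as plain RDF.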