Sunday, February 20, 2011

This blog is continued elsewhere

I decided to concentrate my blogs and have categories for specialist blogs like this one. You can continue reading about my chemical RCP adventure here.

Thursday, June 17, 2010

A StackExchange about developing Eclipse RCP applications?

Dear Eclipse Community,

I know there is a tremendous amount of information online, and StackOverflow is already full of Eclipse RCP questions. But the downside of that is that there is a lot of noise. Would it be an idea to set up a StackExchange dedicated to the development of Eclipse RCP-based applications?

If you like the idea, please contribute to the Definition process. And if you don't, you're more than welcome to express that too :)

Monday, May 10, 2010

How to use GitHub for [CDK|Bioclipse] code review

Triggered by posts in the past three days, I thought about writing up a short tutorial on how to perform code review for existing code on GitHub. This applies to CDK and Bioclipse source code, but will work for any project hosted on GitHub. Even if a project is not hosted there, you could consider putting up a copy yourself. This example will demonstrate the procedure on CDK functionality in Bioclipse in the bioclipse.cheminformatics repository.

Click on the images to get a higher resolution version.

Step 1: find the class you want to review
Use the GitHub web interface to browse your way towards the source code of the class you want to review. For example, the SmartsMatchingHelper.java:



Step 2: identify something you would like to comment on
The next step is to do some code reviewing. For example, we might want to ask something about how parseProperty() works:

Now, this page on GitHub does not provide the means to leave comments; instead, you comment on commits.

Step 3: find the last commit that touched the line you would like to comment on
Git has a blame option (also called annotate) which will show you for each line who last changed that line. The GitHub web page makes this functionality available with the 'blame' link just above the first line of the source code:

This link will lead us to a page with a new column on the left side showing commit hashes, the name of the commit author, and the first few characters of the commit message. For example, the part of the web page relevant to the code we want to comment on looks like:

This shows us that commit 3ce78ba5 is the one we are interested in:



Step 4: Look up the line again and add a comment
On the web page of the commit looked up in the previous step, scroll down to the line you want to comment on. If you hover over that line, a blue comment bubble will show up on the left side:

Clicking that blue comment icon, you get a dialog where you can enter your comment:

The 'Add Line Note' button confirms and saves your comment:



Step 5: inform the committer about your review
The next step would be to inform the commit author. GitHub actually helps here, and should send a message, like this one:

But it would certainly not hurt if you filed a bug report or sent an email.

Now, I should only convert this into a screencast...

Thursday, March 25, 2010

A GSoC project idea around the Resource Description Framework

I just added an entry to the Google Summer of Code 2010 Ideas wiki page:
    Resource Description Framework (RDF) is a ~10 year old W3C standard. Uptake is taking off now, and it would be nice to see an Eclipse project like the Web Tools Package provide basic RDF-related functionality. This would include bundles for RDF libraries (Jena or OpenSesame) and editors for Notation3 and RDF/XML, and perhaps support for a catalog of common ontologies (RDF, RDFS, OWL, DublinCore, FOAF, ...). It could also include a Zest-based RDF graph viewer, SPARQL query editor, etc. There is existing code, for example, developed by the Bioclipse team using Jena, or the older Tripclipse. There also exist commercial offerings stressing the relevance of the RDF platform, such as Semantic Toolkit and the popular TopBraid.
For Bioclipse we have set up bundles for Jena in the bioclipse.rdf git repository, but I am sure they need some improvement. Yet, they might serve as a starting point.

Tuesday, March 16, 2010

IFile.getContentDescription() returns null on files from the workbench. Advice?

When Bioclipse reads files from its workspace, it uses IFile.getContentDescription(), as it did in versions 2.0 and 2.2. However, I now note that unit tests that use this method fail where they used to work in earlier versions. Instead of returning something, I get a null. An example unit test looks like:
  // Load the molecule and make sure it is backed by a workspace file.
  propane = cdk.loadMolecule(path);
  Assert.assertNotNull(propane.getResource());
  Assert.assertTrue(propane.getResource() instanceof IFile);
  IFile resource = (IFile)propane.getResource();
  // getContentDescription() now returns null, so this assertion fails.
  Assert.assertNotNull(resource.getContentDescription());
  IContentType type = resource.getContentDescription().getContentType();
  Assert.assertNotNull(type);
  // Map the Eclipse content type to the matching CDK chemical format.
  IChemFormat format = cdk.determineFormat(type);
  Assert.assertNotNull(format);
  Assert.assertEquals(MDLV2000Format.getInstance(), format);
This test uses getContentDescription() to get a content description and converts it to a CDK-specific format type.

The JavaDoc describes this method as a more efficient alternative:
    Calling this method produces a similar effect as calling getDescriptionFor(getContents(), getName(), IContentDescription.ALL) on IContentTypeManager, but provides better opportunities for improved performance.

As it used to work, I am considering the possibility that it is a bug. But at the same time, maybe best practices have changed? Should I keep using this method, explore the cause, perhaps file a bug report, or start using getDescriptionFor(getContents(), getName(), IContentDescription.ALL)?
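One way to hedge between those options while the cause is unclear is to keep the fast call and only fall back to the slower lookup when it returns null. The sketch below shows just that control flow and is not Bioclipse code: the class and method names are made up, plain strings stand in for IContentDescription, and the two lookups are passed in as suppliers so that it compiles without Eclipse on the classpath.

```java
import java.util.Optional;
import java.util.function.Supplier;

class ContentDescriptionFallback {

    /**
     * Runs the fast lookup first and only runs the slow lookup when the
     * fast one yields null. Hypothetical helper, not part of any Eclipse API.
     */
    static <T> Optional<T> describe(Supplier<T> fastLookup, Supplier<T> slowLookup) {
        T description = fastLookup.get();
        if (description == null) {
            description = slowLookup.get();
        }
        return Optional.ofNullable(description);
    }

    public static void main(String[] args) {
        // In Bioclipse the two lookups would roughly be:
        //   fast: resource.getContentDescription()
        //   slow: getDescriptionFor(getContents(), getName(), IContentDescription.ALL)
        //         on the IContentTypeManager
        // Both throw checked exceptions, which would need wrapping in a Supplier.
        Optional<String> description = describe(() -> null, () -> "MDL molfile");
        System.out.println(description.orElse("no description"));
    }
}
```

This only masks the null; it does not explain why the behaviour changed, so a bug report may still be worthwhile.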

Monday, March 15, 2010

RDF-powered QSAR wizard: SPARQL end points providing wizard content

As you know from my blog, one of the things I am working on is to push RDF functionality in Bioclipse, as I believe it to be the missing link between molecular chemometrics and literature, databases, and other non-numerical information sources.

As part of the submission for the SWAT4LS special issue in the new Journal of Biomedical Semantics, Ola hacked up a cool wizard that sets up a new QSAR Project by downloading data directly from our RDF node for the chEMBL data using SPARQL. The paper is based on the SWAT4LS talk I gave and the proceedings paper that recently appeared, but with more cool stuff, such as this cool RDF graph browser that allows you to open up molecules from the RDF graph in a JChemPaint editor.

Well, this really nice New QSAR Project wizard was cool enough to trigger an I-want-more reaction, so I just had to hack it up with some additional SPARQL functionality. So, the next version not only uses RDF and SPARQL to aggregate the QSAR data set, it also uses SPARQL to make the wizard interactive. While the user is typing a target ID, the wizard will check the SPARQL end point in the background and download the target's type, title and organism, as well as update the list of activities the user can select depending on what the chEMBL database has for that target:

The actual code base is pretty small, and that's what happens when you mash up the right technologies :)
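To give an idea of what such a background lookup might send, here is a minimal sketch that builds a SPARQL query for one target ID and encodes it as a GET request. It is not the wizard's actual code: the endpoint URL, the ex: vocabulary and the class name are all hypothetical, and it assumes target IDs are plain numbers.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class TargetLookupQuery {

    // Hypothetical endpoint and vocabulary, chosen for illustration only.
    static final String ENDPOINT = "http://example.org/sparql";
    static final String PREFIX = "http://example.org/chembl#";

    /** Builds a SPARQL query for the type, title and organism of one target. */
    static String buildQuery(String targetId) {
        // Assumption: target IDs are plain numbers. Rejecting anything else
        // keeps half-typed or malicious input out of the query text.
        if (targetId == null || !targetId.matches("\\d+")) {
            throw new IllegalArgumentException("not a numeric target ID: " + targetId);
        }
        return "PREFIX ex: <" + PREFIX + ">\n"
             + "SELECT ?type ?title ?organism WHERE {\n"
             + "  ?target ex:targetId " + targetId + " ;\n"
             + "          ex:type ?type ;\n"
             + "          ex:title ?title ;\n"
             + "          ex:organism ?organism .\n"
             + "}";
    }

    /** Encodes the query as the 'query' parameter of a SPARQL protocol GET request. */
    static String requestUrl(String targetId) {
        return ENDPOINT + "?query="
             + URLEncoder.encode(buildQuery(targetId), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("12"));
        System.out.println(requestUrl("12"));
    }
}
```

In a wizard, a request like this would be fired off the UI thread each time the text field changes, with the result used to fill in the other fields once it arrives.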

Thursday, March 4, 2010

RDF, Jena, Bioclipse, Eclipse, Zest #2: icons and an extension point

Jonathan worked this week on new features for the Bioclipse RDF editor (see these two earlier items). This version still does not edit, but only displays using Zest. Jonathan created an extension point for me so that anyone can make the editor aware of domain objects, simply by registering the extension implementation along with the rdf:Class URI of the rdf:type of an object. This fixes the problem of having to hardcode dependencies of the RDF editor on all the domain code, as was the case earlier.

For example, the cheminformatics IMolecule object is now linked to the rdf:type <http://www.bioclipse.net/structuredb/#Molecule>:
<extension point="net.bioclipse.rdf.rdf2bioobjectfactory">
  <Factory
    instance="net.bioclipse.rdf.ui.RDFToCDKMoleculeFactory"
    uri="http://www.bioclipse.net/structuredb/#Molecule" >
  </Factory>
</extension>
The API for this factory looks like:
public IBioObject rdfToBioObject( Model model, Resource res );
public ImageDescriptor getImageDescriptor();
This is very much tied to the Jena data model, so not entirely clean, but it will have to do for now. The first method converts RDF content into a Bioclipse IBioObject, such as an IMolecule (see this list of currently supported objects). The second method returns an icon, which makes the editor more visually pleasing, and provides a nice way to see when you can double click the RDF node to have it open in a domain-specific editor:
For example, double clicking the ron:mol2 node would open up a JChemPaint editor.
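The idea behind the extension point, looking up a factory by the rdf:type URI of a node, can be sketched without Eclipse or Jena. The code below is only an illustration under simplifying assumptions: the real lookup goes through the Eclipse extension registry, whereas this uses a plain map, and the interface here is a stand-in that takes a resource URI string instead of Jena's Model and Resource and returns an icon name instead of an ImageDescriptor. All names in it are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class FactoryRegistrySketch {

    /** Simplified stand-in for the factory API; not the Bioclipse interface. */
    interface BioObjectFactory {
        Object rdfToBioObject(String resourceUri);
        String iconName();
    }

    private final Map<String, BioObjectFactory> factoriesByType = new HashMap<>();

    /** Mirrors what one Factory element (uri plus instance) declares. */
    void register(String rdfTypeUri, BioObjectFactory factory) {
        factoriesByType.put(rdfTypeUri, factory);
    }

    /** Returns the factory for an rdf:type, or empty if none was registered. */
    Optional<BioObjectFactory> factoryFor(String rdfTypeUri) {
        return Optional.ofNullable(factoriesByType.get(rdfTypeUri));
    }

    public static void main(String[] args) {
        FactoryRegistrySketch registry = new FactoryRegistrySketch();
        registry.register("http://www.bioclipse.net/structuredb/#Molecule",
            new BioObjectFactory() {
                public Object rdfToBioObject(String resourceUri) {
                    return "molecule for " + resourceUri;
                }
                public String iconName() {
                    return "molecule.png";
                }
            });

        // A node with a registered type gets an icon and can be opened...
        registry.factoryFor("http://www.bioclipse.net/structuredb/#Molecule")
            .ifPresent(f -> System.out.println(f.iconName()));
        // ...while a node of an unregistered type has no factory.
        System.out.println(registry.factoryFor("http://example.org/#Unknown").isPresent());
    }
}
```

Because the editor only ever sees the factory interface, domain plugins can be added or removed without the editor depending on them.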