Saturday, August 29, 2009

Reminder: my talk in Frankfurt on Monday; Want to meet up?

Quick and short reminder about my Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source and Open Standards talk on Monday. The session is great anyway, with other talks from Cameron, John and someone from Berlin on a Open Access HTS system (which reminds me to talk about the Open Access and that the term is tainted).

I still have a free program, other than I want to see Google Wave in action (and while I have receive my invitation, I have not received a login account yet). There is a potentially interesting talk about Second Generation Small Molecule Therapeutics by 15:00. But no plans otherwise for the afternoon and/or evening.

If you like to talk about CDK, Bioclipse and/or the Blue Obelisk movement. Or about my talk on Open Data, Open Standards and Open Source (ODOSOS) in chemoinformatics.

If you happen to be around the Frankfurt Westend campus. In building 4, I think, the Hörsaalzentrum, where the conference is. Please let me know if you like to meet up. I hope to be online :), but no promise on that... should work at a Uni location, not? Let's see... This is how to ping me, and don't worry about redundancy.

Email: egon.willighagen at gmail dot com
IRC: #cdk at
Twitter: egonwillighagen
Identica: chemblaics
Blog: just leave a reply to this message

Monday, August 17, 2009

Bioclipse and SPARQL end points

Last week, there was a very interesting thread on the DBPedia mailing list, on using Java for doing remote SPARQL queries. This was one of the features still missing in bioclipse.rdf. Richard Cyganiak replied pointing the code in Jena which conveniently does this and which bioclipse.rdf is already using anyway. Next, Fred Durao even gave a full code example relieving me from any further research, resulting in sparqlRemote() now implemented in the rdf manager:
> rdf.sparqlRemote(
"select distinct ?Concept where{[] a ?Concept } LIMIT 10"
[[], [],
[], [],
[], [],
[], [],
I reported earlier two example SPARQL queries for chemistry, which can now be rewritten as Bioclipse scripts:


Thursday, August 13, 2009

Making Bioclipse Development easier: the New Manager Wizard

Today, Jonathan, Carl, Arvid and I made writing managers for Bioclipse a bit easier. Plug-in development Eclipse in itself is already tricky to learn, and the use of Spring by the Bioclipse managers is not helping. And because very soon two new people will be starting with writing a new manager rather soon, we thought it was time to lower the activation barrier a bit.

The basic file structure of an Bioclipse manager looks like:
| `-- spring
| `-- context.xml
|-- plugin.xml
|-- .classpath
|-- .project
`-- src
`-- net
`-- bioclipse
`-- foo
`-- business
That is twelve files which need to be just right. I used to copy/paste from an earlier (simple) manager.

But we know and understand that setting up this framework is even more challenging if you have not done this at least 10 times before. So, today we implemented a New Wizard (source available from this Git repository: bioclipse.sdk).

It just asks you a project name:

and a few other settings:

Installing the Bioclipse SDK
Installing this new plugin is fairly easy, and we have set up an Update Site at Just add this as Update site in Eclipse 3.4.x (which is still required for Bioclipse2). It depends on the JDT and PDE, which you will likely already have installed being part of the default Eclipse RCP release.

Go to the Software Updates in the Help menu:

and pick Add Site.... Enter the aforementioned update site as shown here:

Then, select the Bioclipse plugin:

After you hit Install and Eclipse install the fews tens of kBs of the plugin, the plugin should show up in your installation, like it did in mine:

Implementation Details

Writing the plugin was a challenge to me, and I am happy we were doing this in a hackaton. The Bioclipse-QSAR project already had a New Project wizard, but not for a new Plug-in Project. Some things are just slightly different then. For example, it turned out that creating a .classpath cannot be done in the regular way (it never showed up), and I had to dig up some internal code of the PDE. Actually, our current implementation is still using a few internal classes because of this:
IClasspathEntry[] entries = new IClasspathEntry[3];
String executionEnvironment = null;
entries[0] = ClasspathComputer.createJREEntry(executionEnvironment);
entries[1] = ClasspathComputer.createContainerEntry();
IPath path = project.getProject().getFullPath().append("src/");
entries[2] = JavaCore.newSourceEntry(path);
Ideas are most welcome on how to clean up this code, and not make it use internal, non-exported classes. For the Java source files and even the MANIFEST.MF we are using templates, though I have seen this file being created programmatically too.

I'm sure we'll run in some needed plumbing here and there, but that's what update sites are for, not? Release soon, release often is an Open Source concept that works well in the Eclipse world.

Friday, August 7, 2009

Searching PubChem from within Bioclipse

For the application note which we are about to submit, I was working on improving the PubChem Bioclipse API a bit, resulting in new download methods:

The search allows using PubChem Filters which provides many simple means to restrict the search results. For example, we can search molecules and restrict on the molecular weight:
lists ="malaria 300:500[MW]"))
Other filters you can use in (provided by PubChem itself), includes (with examples):
  • [el]:"Au[el]")
  • [inchi]:"\"InChI=1S/CH4/h1H4\"[inchi]")
  • [inchikey]:"VNWKTOKETHGBQD-UHFFFAOYSA-N[inchikey]")
  • [mimass]:"375.9785:375.9786[mimass]")
And many, many more... see the linked Filters page.

Now, you surely want to look at the hits, for which we use the molecular table editor:
list ="375.9785:375.9786[mimass]"))
cdk.saveSDFile("/Virtual/hits.sdf", list)"/Virtual/hits.sdf")
Resulting in:

Wednesday, August 5, 2009

Running Bioclipse Plugin Unit tests: solving the XPCOM error

Sometimes you can feel so stupid. For example, when the answer is right on front of you, but only after many hours you realize the right question belonging to that answer. For example, take this answer:
    add the line: -Dorg.eclipse.swt.browser.XULRunnerPath=/usr/lib/xulrunner
This is the problem I was trying to solve: I'm running 64bit Ubuntu Jaunty with Eclipse 3.4.2 for Bioclipse development. The answer above is the correct answer. So, I added the line. To the $HOME/eclipse.ini and to the eclipse command line to start the program. But I still good not run Bioclipse plugin unit tests; I kept getting that stupid error:
    org.eclipse.swt.SWTError: XPCOM error -2147467262
    at org.eclipse.swt.browser.Mozilla.error( :1638)
    at org.eclipse.swt.browser.Mozilla.setText(Mozilla.ja va:1861)
In retrospect, I was sort of asking the wrong question. I should have asked myself not why I got that XPCOM error even though I was using the solution, but why running the unit tests was not affected by that solution. Realizing that, it became so obvious: the plugin unit testing was using a clean environment, not based on the Eclipse environment I was working in; therefore, adding that line to my Eclipse environment did not help. Instead, I only had to that line to the Run Configuration of my plugin unit tests too:

Surely, there are aspects to this which helped me overlook this solution. For example, I had installed Eclipse freshly yesterday, and then the it worked fine. Only after installing some EMF and GEF features, it stopped working again. Bitten by the correlation/causation pattern :(