Saturday, November 7, 2009

Call for Collaboration: JavaDoc validation with OpenJavaDocCheck

Dear all,

while it is still very much in progress, I have made good progress with writing a new BSD-licensed replacement for Sun's DocCheck utility for testing the library's JavaDoc quality. The DocCheck's never really satisfied me, and the most recent version is ancient. Because it is closed source, no one can continue on those efforts. DocCheck is MIA.

PMD is given nice, simple overviews instead. It provides me with a quick overview of what is wrong with the CDK JavaDoc, and also provides a decent XML format which allows extraction of information, which is used by, for example, SuperNightly as showed yesterday in PMD 2.4.5 installed in the CDK 1.2.x branch.

I have been pondering about it for a long time now, but writing a JavaDoc checking library is hardly core cheminformatics research; at least, you would not get funding for it, despite everyone always complaining about good documentation. Alas.

A few weeks ago, I was reviewing some more code, and again saw the very common error of the missing period at the end of the first sentence in JavaDoc. This one is sort of important for proper JavaDoc documentation generation, but the complexity of the current DocCheck reporting, people are not familiar enough with it. Being tired of having to repeat myself, I decided to address the problen, but creating better Nightly error reporting for the CDK JavaDoc.

So, I started OpenJavaDocCheck, or ojdcheck. As mentioned, I have made quite promising progress, and the current version provides the ability to write custom tests (which I plan to use for validating content of CDK taglet content), and create XML as well as XHTML which can be saved to any file. To give you a glimps of where things are going, here's a screenshot of the current XHTML output:



This list shows a mix of tests that are now implemented in OpenJavaDocCheck itself, but the third line is actually a test that is plugged in and specific for the CDK. This is an important feature, I think, and allows users of OpenJavaDocCheck to add functionality is that is not interesting to the general public, but very interesting for the JavaDoc being analyzed. Well, at least, it is to the CDK project :)

The current list of tests is still quite small, and consists of these tests:
  • test if each class and method has JavaDoc
  • test for missing @return tags
  • test for missing @param tags
  • test for @returns instead of @return
  • test @param template code, such as added by IDEs like Eclipse
  • test @exception template code, such as added by IDEs like Eclipse
  • test for redundant @version tags
I am now seeking feedback on the current code base, and potentially collaboration with writing more JavaDoc validation tests. There is enough to do, and I have been thinking on tests for:
  • spell checking JavaDoc
  • checking for 404s of web pages linked with <a href> in the JavaDoc
  • well-formedness of the HTML in the webpages
And about:
  • a PMD-like system to allow people to choose which testing they want or not
  • an Eclipse plugin

Wednesday, November 4, 2009

New Bioclipse Features: Kabsch Alignment, RMSD Distance and Tanimoto Simarlity Matrices

We recently submitted a second paper on Bioclipse, and have worked hard in the past two weeks on addressing the reviewers' questions (and we love these feature requests! See also these two blogs). One reviewer seemed very interested in seeing docking available in Bioclipse. While we do not have a full docking feature set up for Bioclipse, we do have functionality to deal with 3D structures, though our researched urged us to focus on the 2D side of cheminformatics so far.

To strengthen our intentions towards the 3D cheminformatics world, we have implemented a few new features, using CDK functionality. For example, we added Kabsch aligment and the related RMSD between molecular structures implemented as both popup menus as well as manager methods. The manager method you can see in action in MyExperiment workflow 937, which you can download directly into Bioclipse with one simple command (see Bioclipse Manager for MyExperiment.org):
var smileses = new Array("CC(C)C", "CCCN", "CCC=O");

var unaligned = cdk.createMoleculeList();
for (i=0; i<smileses.length; i++) {
  mol = cdk.fromSMILES(smileses[i]);
  mol = cdk.generate3dCoordinates(mol)
  unaligned.add(mol);
}

var aligned = cdk.kabsch(unaligned)

jmol.load(aligned.get(0));
for (i=1; i<aligned.size(); i++) {
  jmol.append(aligned.get(i));
}
Now, we do have to update the use of Jmol in Bioclipse, and a big overhaul is scheduled for the 2.4 released in February next year. But you get the idea.

As said, there are two stories to adding this new functionality. Because we want all GUI interaction the user performs to be recordable (Scientist 1: What did you do to get those nice results? Scientist 2: I pushed that button in the that long menu. Scientist 1: What button is that? Scientist 2: Wait, I send you the BSL script with a Google Wave.)

The managers that allow this recording is Bioclipse specific, and also the reason why it would not be trivial to make a general Bioclipse plugin for Eclipse... some Spring magic is used to inject the managers into the JavaScript language. Anyway, the second thing is to add a GUI element, like popup menus. Now, this is a particular area where Eclipse excels. Now, I did have to ask for the details, as I am not using this daily (I'm doing science, not IT), but Ola was kind enough to give me the pointers for it.

The below configuration snippet links the pop up action to Bioclipse Navigator content (you know, where your MDL SD, CML, script and other files show up in Bioclipse). But only if I have selected 3 or more files! And, only if those files are actually some molecular content with 3D coordinates! And Bioclipse inherits this functionality by using the Eclipse platform.
<menuContribution
  locationURI="popup:org.eclipse.ui.popup.any?after=additions">
  <command
    commandId="net.bioclipse.cdk.ui.handlers.kabschAlignment"
    label="Perform Kabsch Alignment"
    icon="icons/molecule2D.png">
    <visibleWhen>
      <with variable="selection">
        <count value="(2-"/>
        <iterate operator="and" ifEmpty="false">
          <adapt type="org.eclipse.core.resources.IResource">
            <or>
              <test property="org.eclipse.core.resources.contentTypeId"
                       value="net.bioclipse.contenttypes.cml.singleMolecule3d"/>
              <test property="org.eclipse.core.resources.contentTypeId"
                       value="net.bioclipse.contenttypes.cml.singleMolecule5d"/>
              <test property="org.eclipse.core.resources.contentTypeId"
                       value="net.bioclipse.contenttypes.mdlMolFile3D"/>
            </or>
          </adapt>
        </iterate>
      </with>
    </visibleWhen>
  </command>
</menuContribution>
When Bioclipse is run, this looks like:



And the alignment results will nicely show up in a Jmol viewer (while it is implemented as an Eclipse editor, it is not yet):


The first screenshot also shows the new pop-up menus for calculating two matrices for 3 or more molecules. One is based on the RMSD of the 3D atomic coordinats of the atoms in the MCSS (BTW, Asad's SMSD work is making its way into the CDK library, and will be available in a later Bioclipse version too.) and will create a distance matrix. The second new pop-up menu used the Tanimoto similarity measure based on CDK fingerprints on the selected chemical graphs. If the Bioclipse Statistics feature is installed, the created CSV files will open up in a matrix editor:


Wednesday, September 23, 2009

Extension point for running JUnit tests in a RCP Application instance?

One thing that has been on my wishlist is to be able to run the unit tests we have for Bioclipse from inside a running Bioclipse instance. That is, we have a Bioclipse Test Suite features on the update site, matching the functional features we have there. Each such test suite would run all JUnit tests we have for that feature.

The good thing about this is twofold:

  1. users can verify that their installation is working as intended
  2. the development team can easily run the test suite on foreign systems, without the need to install a fully operational Eclipse with Bioclipse development workspace
Now, the tricky thing is likely the following. How do we get to run all test suites? That is, I don't want to need to have to run the suites for each feature separately. Of course, this is exactly what extension points are for.

So, my question is, did anyone set up an system like this? And, is there an extension point that allows features to plugin additional JUnit test suites into a larger test suite dynamically?

Monday, September 7, 2009

New Plug-in Wizard template: Can I add Import-Package programmatically?

Dear Planet Eclipse readers, please take notice of my problem with adding an Import-Package to the MANIFEST.MF using the Plug-in Wizard templating mechanism. Any suggestions and pointers very much appreciated! I'd really like to remove step 7 from the following tutorial:

Last Friday, the Bioclipse 2.1 development series moved to Eclipse 3.5, so I had to update the Bioclipse SDK too, which we developed earlier.

With a new Eclipse version also comes new screenshots to talk you through the process of setting up a new Bioclipse manager plugin.

Step 1
Right click in your workspace navigator, and choose New -> Project:



Step 2
And select to create a new Plug-in Project:



Step 3
Give a project name, such as net.bioclipse.xml:



Step 4
Tune the ID, Version, Name, and Provider to your liking:



Step 5
Then select Bioclipse Manager:



Step 6
The next wizard page is specific the the Bioclipse manager, and asks a manager namespace, which will be used as prefix in the JavaScript Console. For example, if I make the namespace xml, then I will type xml.someMethod() inside the JavaScript. The default manager name is typically OK by default:

Then click Finish and let Eclipse set up the new project.

Step 7
Because I have not figured out yet how to add Import-Package to the MANIFEST.MF programmatically, you will have to do this manually. Add the last line of the next screenshot to the MANIFEST.MF of your new plugin:



Update: I found a hack to add the Import-Package programmatically, by overwriting the execute(IProject project, IPluginModelBase model, IProgressMonitor monitor) in the Template class.

Saturday, August 29, 2009

Reminder: my talk in Frankfurt on Monday; Want to meet up?

Quick and short reminder about my Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source and Open Standards talk on Monday. The session is great anyway, with other talks from Cameron, John and someone from Berlin on a Open Access HTS system (which reminds me to talk about the Open Access and that the term is tainted).

I still have a free program, other than I want to see Google Wave in action (and while I have receive my invitation, I have not received a login account yet). There is a potentially interesting talk about Second Generation Small Molecule Therapeutics by 15:00. But no plans otherwise for the afternoon and/or evening.

If you like to talk about CDK, Bioclipse and/or the Blue Obelisk movement. Or about my talk on Open Data, Open Standards and Open Source (ODOSOS) in chemoinformatics.

If you happen to be around the Frankfurt Westend campus. In building 4, I think, the Hörsaalzentrum, where the conference is. Please let me know if you like to meet up. I hope to be online :), but no promise on that... should work at a Uni location, not? Let's see... This is how to ping me, and don't worry about redundancy.

Email: egon.willighagen at gmail dot com
IRC: #cdk at irc.freenode.net
Twitter: egonwillighagen
Identica: chemblaics
Blog: just leave a reply to this message

Monday, August 17, 2009

Bioclipse and SPARQL end points

Last week, there was a very interesting thread on the DBPedia mailing list, on using Java for doing remote SPARQL queries. This was one of the features still missing in bioclipse.rdf. Richard Cyganiak replied pointing the code in Jena which conveniently does this and which bioclipse.rdf is already using anyway. Next, Fred Durao even gave a full code example relieving me from any further research, resulting in sparqlRemote() now implemented in the rdf manager:
> rdf.sparqlRemote(
"http://dbpedia.org/sparql",
"select distinct ?Concept where{[] a ?Concept } LIMIT 10"
);
[[http://dbpedia.org/ontology/Place], [http://dbpedia.org/ontology/Area],
[http://dbpedia.org/ontology/City], [http://dbpedia.org/ontology/River],
[http://dbpedia.org/ontology/Road], [http://dbpedia.org/ontology/Lake],
[http://dbpedia.org/ontology/LunarCrater],
[http://dbpedia.org/ontology/ShoppingMall], [http://dbpedia.org/ontology/Park],
[http://dbpedia.org/ontology/SiteOfSpecialScientificInterest]]
I reported earlier two example SPARQL queries for chemistry, which can now be rewritten as Bioclipse scripts:

and

Thursday, August 13, 2009

Making Bioclipse Development easier: the New Manager Wizard

Today, Jonathan, Carl, Arvid and I made writing managers for Bioclipse a bit easier. Plug-in development Eclipse in itself is already tricky to learn, and the use of Spring by the Bioclipse managers is not helping. And because very soon two new people will be starting with writing a new manager rather soon, we thought it was time to lower the activation barrier a bit.

The basic file structure of an Bioclipse manager looks like:
net.bioclipse.foo/
|--META-INF
| |--MANIFEST.MF
| `-- spring
| `-- context.xml
|-- plugin.xml
|-- .classpath
|-- .project
|-- build.properties
`-- src
`-- net
`-- bioclipse
`-- foo
|-- Activator.java
`-- business
|-- FooManager.java
|-- FooManagerFactory.java
|-- IFooManager.java
|-- IJavaFooManager.java
`-- IJavaScriptFooManager.java
That is twelve files which need to be just right. I used to copy/paste from an earlier (simple) manager.

But we know and understand that setting up this framework is even more challenging if you have not done this at least 10 times before. So, today we implemented a New Wizard (source available from this Git repository: bioclipse.sdk).

It just asks you a project name:

and a few other settings:



Installing the Bioclipse SDK
Installing this new plugin is fairly easy, and we have set up an Update Site at http://pele.farmbio.uu.se/sdk/. Just add this as Update site in Eclipse 3.4.x (which is still required for Bioclipse2). It depends on the JDT and PDE, which you will likely already have installed being part of the default Eclipse RCP release.

Go to the Software Updates in the Help menu:

and pick Add Site.... Enter the aforementioned update site as shown here:

Then, select the Bioclipse plugin:

After you hit Install and Eclipse install the fews tens of kBs of the plugin, the plugin should show up in your installation, like it did in mine:



Implementation Details

Writing the plugin was a challenge to me, and I am happy we were doing this in a hackaton. The Bioclipse-QSAR project already had a New Project wizard, but not for a new Plug-in Project. Some things are just slightly different then. For example, it turned out that creating a .classpath cannot be done in the regular way (it never showed up), and I had to dig up some internal code of the PDE. Actually, our current implementation is still using a few internal classes because of this:
IClasspathEntry[] entries = new IClasspathEntry[3];
String executionEnvironment = null;
ClasspathComputer.setComplianceOptions(
project,
ExecutionEnvironmentAnalyzer.getCompliance(executionEnvironment)
);
entries[0] = ClasspathComputer.createJREEntry(executionEnvironment);
entries[1] = ClasspathComputer.createContainerEntry();
IPath path = project.getProject().getFullPath().append("src/");
entries[2] = JavaCore.newSourceEntry(path);
Ideas are most welcome on how to clean up this code, and not make it use internal, non-exported classes. For the Java source files and even the MANIFEST.MF we are using templates, though I have seen this file being created programmatically too.

I'm sure we'll run in some needed plumbing here and there, but that's what update sites are for, not? Release soon, release often is an Open Source concept that works well in the Eclipse world.