Saturday, November 7, 2009

Call for Collaboration: JavaDoc validation with OpenJavaDocCheck

Dear all,

while it is still very much in progress, I have made good progress with writing a new BSD-licensed replacement for Sun's DocCheck utility for testing the library's JavaDoc quality. The DocCheck's never really satisfied me, and the most recent version is ancient. Because it is closed source, no one can continue on those efforts. DocCheck is MIA.

PMD is given nice, simple overviews instead. It provides me with a quick overview of what is wrong with the CDK JavaDoc, and also provides a decent XML format which allows extraction of information, which is used by, for example, SuperNightly as showed yesterday in PMD 2.4.5 installed in the CDK 1.2.x branch.

I have been pondering about it for a long time now, but writing a JavaDoc checking library is hardly core cheminformatics research; at least, you would not get funding for it, despite everyone always complaining about good documentation. Alas.

A few weeks ago, I was reviewing some more code, and again saw the very common error of the missing period at the end of the first sentence in JavaDoc. This one is sort of important for proper JavaDoc documentation generation, but the complexity of the current DocCheck reporting, people are not familiar enough with it. Being tired of having to repeat myself, I decided to address the problen, but creating better Nightly error reporting for the CDK JavaDoc.

So, I started OpenJavaDocCheck, or ojdcheck. As mentioned, I have made quite promising progress, and the current version provides the ability to write custom tests (which I plan to use for validating content of CDK taglet content), and create XML as well as XHTML which can be saved to any file. To give you a glimps of where things are going, here's a screenshot of the current XHTML output:

This list shows a mix of tests that are now implemented in OpenJavaDocCheck itself, but the third line is actually a test that is plugged in and specific for the CDK. This is an important feature, I think, and allows users of OpenJavaDocCheck to add functionality is that is not interesting to the general public, but very interesting for the JavaDoc being analyzed. Well, at least, it is to the CDK project :)

The current list of tests is still quite small, and consists of these tests:
  • test if each class and method has JavaDoc
  • test for missing @return tags
  • test for missing @param tags
  • test for @returns instead of @return
  • test @param template code, such as added by IDEs like Eclipse
  • test @exception template code, such as added by IDEs like Eclipse
  • test for redundant @version tags
I am now seeking feedback on the current code base, and potentially collaboration with writing more JavaDoc validation tests. There is enough to do, and I have been thinking on tests for:
  • spell checking JavaDoc
  • checking for 404s of web pages linked with <a href> in the JavaDoc
  • well-formedness of the HTML in the webpages
And about:
  • a PMD-like system to allow people to choose which testing they want or not
  • an Eclipse plugin

Wednesday, November 4, 2009

New Bioclipse Features: Kabsch Alignment, RMSD Distance and Tanimoto Simarlity Matrices

We recently submitted a second paper on Bioclipse, and have worked hard in the past two weeks on addressing the reviewers' questions (and we love these feature requests! See also these two blogs). One reviewer seemed very interested in seeing docking available in Bioclipse. While we do not have a full docking feature set up for Bioclipse, we do have functionality to deal with 3D structures, though our researched urged us to focus on the 2D side of cheminformatics so far.

To strengthen our intentions towards the 3D cheminformatics world, we have implemented a few new features, using CDK functionality. For example, we added Kabsch aligment and the related RMSD between molecular structures implemented as both popup menus as well as manager methods. The manager method you can see in action in MyExperiment workflow 937, which you can download directly into Bioclipse with one simple command (see Bioclipse Manager for
var smileses = new Array("CC(C)C", "CCCN", "CCC=O");

var unaligned = cdk.createMoleculeList();
for (i=0; i<smileses.length; i++) {
  mol = cdk.fromSMILES(smileses[i]);
  mol = cdk.generate3dCoordinates(mol)

var aligned = cdk.kabsch(unaligned)

for (i=1; i<aligned.size(); i++) {
Now, we do have to update the use of Jmol in Bioclipse, and a big overhaul is scheduled for the 2.4 released in February next year. But you get the idea.

As said, there are two stories to adding this new functionality. Because we want all GUI interaction the user performs to be recordable (Scientist 1: What did you do to get those nice results? Scientist 2: I pushed that button in the that long menu. Scientist 1: What button is that? Scientist 2: Wait, I send you the BSL script with a Google Wave.)

The managers that allow this recording is Bioclipse specific, and also the reason why it would not be trivial to make a general Bioclipse plugin for Eclipse... some Spring magic is used to inject the managers into the JavaScript language. Anyway, the second thing is to add a GUI element, like popup menus. Now, this is a particular area where Eclipse excels. Now, I did have to ask for the details, as I am not using this daily (I'm doing science, not IT), but Ola was kind enough to give me the pointers for it.

The below configuration snippet links the pop up action to Bioclipse Navigator content (you know, where your MDL SD, CML, script and other files show up in Bioclipse). But only if I have selected 3 or more files! And, only if those files are actually some molecular content with 3D coordinates! And Bioclipse inherits this functionality by using the Eclipse platform.
    label="Perform Kabsch Alignment"
      <with variable="selection">
        <count value="(2-"/>
        <iterate operator="and" ifEmpty="false">
          <adapt type="org.eclipse.core.resources.IResource">
              <test property="org.eclipse.core.resources.contentTypeId"
              <test property="org.eclipse.core.resources.contentTypeId"
              <test property="org.eclipse.core.resources.contentTypeId"
When Bioclipse is run, this looks like:

And the alignment results will nicely show up in a Jmol viewer (while it is implemented as an Eclipse editor, it is not yet):

The first screenshot also shows the new pop-up menus for calculating two matrices for 3 or more molecules. One is based on the RMSD of the 3D atomic coordinats of the atoms in the MCSS (BTW, Asad's SMSD work is making its way into the CDK library, and will be available in a later Bioclipse version too.) and will create a distance matrix. The second new pop-up menu used the Tanimoto similarity measure based on CDK fingerprints on the selected chemical graphs. If the Bioclipse Statistics feature is installed, the created CSV files will open up in a matrix editor: