Friday, January 16, 2009

Bioclipse and Gist integration

As you might have read, Bioclipse has scripting support (see for example, Scripting JChemPaint), and that we have been collection them on Gist and indexing them on Delicious with the tags bioclipse and gist. This provides a nice overview of what you can do with the current SVN version of Bioclipse2. And, hopefully, when released, allow users to quickly learn about Bioclipse features, allow people to share scripts etc. Think of it as for Bioclipse.

Now, what was missing until today, was easy access to gists in Bioclipse itself. No gist.load(33421) yet. There still is not, but I uploaded earlier today a Wizard for it. (The manager will follow later). Right click on an open Project, select New -> Other, and pick Download Gist:

and click Next:

Then, just type the number of the Gist you want to open in Bioclipse, for example 18315 (see Bioclipse2 Scripting #1: from SMILES to a UFF optimized structure in Jmol), and click another Next to select a file name and location:

The current code does require you to know the Gist number, so you'll need a web browser to look it up, but we do have search facilities in mind. Also, while the code attempts so, the resulting Gist is not automatically openend in an editor (a bug). Another idea is to just install the egit plugin in Bioclipse :)

Saturday, January 3, 2009

Editing and Validation of CML documents in Bioclipse

One advantage of using XML is that one can rely on good support in libraries for functionality. When parsing XML, one does not have to take care of the syntax, and focus on the data and its semantics. This comes at the expense of verbosity, though, but having the ability to express semantics explicitly is a huge benefit for flexibility.

So, when Peter and Henry put their first documents online about the Chemical Markup Language (CML), I was thrilled, even though is actually was still SGML when I encountered it. The work predates the XML recommendation. As I recently blogged, in '99 I wrote patches for Jmol and JChemPaint to support CML, which were published as preprint in the Chemical Preprint Server in a paper in 2000 in the Internet Journal of Chemistry. Neither of the two has survived.

Anyway, the Chemistry Development Kit makes heavy use of CML, and Bioclipse supports it too. Now, Bioclipse is based on the Eclipse Rich Client Platform architecture, for which there exist quite a few XML tools in the Web Tools Platform (WTP). Among these, a validation, content assisting XML editor. This means, I get red markings when I make my XML document not-well-formed or invalid. Just a quick recap: well-formedness means that the XML document has a proper syntax: one root node, properly closed tags, quotes around attribute values, etc. Validness, however, means that the document is well-formed, but also hierarchically organized according to some specification.

Enter CML. CML is such a specification, first with DTDs, but after the introduction of XML Namespaces with XML Schema (see There can be only one (namespace)). The WTP can use this XML Schema for validation, and this is of great help learning the CML language. Pressing Ctrl-space in Bioclipse will now show you what allowed content can be added at the current character position.

Yes, Bioclipse can do this now (in SVN, at least). This has been on my wishlist for at least two years now, but never really found the right information. Now, three days ago David wrote about End of Year Cramps in which he describes some of his work on the WTP for autocomplete for XPath queries. He see[s] a brighter future for XML at eclipse over the next year. I hope that those in the eclipse and XML community will help to continue to improve the basic support, so that first class commercial quality applications that leverage this support can continue to be built.

That was enough statement for me to ask in the comments on how to make the WTP XML editor aware of the CML XML Schema. It already picked up XML Schema's with xsi:schemaLocation, but I needed something to worked without such statements in the XML document itself. David explained that me that I could use the org.eclipse.wst.xml.catalog extension. This was really easy, and commited to Bioclipse SVN as:
<uri name=""
However, that does not make the WTP XML editor available in the Bioclipse application yet. Not ever in the "Open With"... So, I set up a CML Feature. After a follow up question, it turned out that the CML content type of Bioclipse was already a sub type of the XML type (see ):
name="Chemical Markup Language (CML)"
So, the only remaining problem was to actually get the WTP XML editor as part of the Bioclipse application. The new CML Feature takes care of that (I hope the export and building the update site work too, but that's yet untested), by important the relevant plugins and features. Last night, however, I ended up with one stacktrace which gave me little clue on which plugin I was still missing.

Therefore, I headed to #eclipse and actually met David of the blog that started this again. He asked nitind to think about it too, and they helped me pin down the issue. This relevant bit of the stacktrace turned out to be:
Caused by: java.lang.IllegalStateException
at org.eclipse.core.runtime.Platform.getPluginRegistry(
at org.eclipse.wst.common.componentcore.internal.impl.WTPResourceFactoryRegistry$ResourceFactoryRegistryReader.(
at org.eclipse.wst.common.componentcore.internal.impl.WTPResourceFactoryRegistry.(
at org.eclipse.wst.common.componentcore.internal.impl.WTPResourceFactoryRegistry.(
... 37 more
This refered to this bit of code of Eclipse'
Bundle compatibility = InternalPlatform.getDefault()
if (compatibility == null)
throw new IllegalStateException();

So, the plugin I turned to to have missing was org.eclipse.core.runtime.compatibility. Apparently, some parts of the WTP that the XMLEditor is using, still uses Eclipse2.x technology.

This screenshot shows the WTP XMLEditor in action in Bioclipse on a CML file. It shows the document contents with the 'Design' tab, which also shows allowed content, as derived from the XML Schema for CML. Also, note that the Outline and Properties view automatically come for free, which allows more detail and navigation of the content.

This screenshot shows the 'Source' tab for the same file, where I deliberately changed the value of the @id attribute of the first atom. The value does not validate against the regular expression defined in the CML schema for @id attribute values. It also shows the content assisting in action. At any location in the CML file, I can hit Ctrl-Space, and the editor will show me which content I can add at that location.

This makes Bioclipse a perfect tool to craft CML documents and learn the language.

Scripting JChemPaint

Stefan, Gilleain, Arvid and I had a JChemPaint Developers Workshop in Uppsala, to sprint the development of JChemPaint3, for which Niels layed out the foundation already a long time ago.

Gilleain and Arvid merged their branches into a single code base, while Stefan worked on the Swing application and applet. The Bioclipse SWT-based widget is being developed for Bioclipse2.

The new design separates widget/graphics toolkit specifics from the chemical drawing and editing logic. Regarding the editing functionality, this basically comes down to have a semantically meaningful edit API. This allows us to convert both Swing and SWT mouse events into things like addAtom("C", atom), which would add a carbon to an already existing atom. However, without too much phantasy, it allows adding a scripting language. This is what I have been working on. Right now, the following API is available from the Bioclipse2 JavaScript console (via the jcp namespace, in random order):
  • ICDKMolecule jcp.getModel()
  • IAtom getClosestAtom(Point2d)
  • setModel(ICDKMolecule) (for really fancy things)
  • removeAtom(IAtom)
  • IBond getClosestBond(Point2d)
  • updateView() (all edit command issue this automatically)
  • addAtom(String,Point2d)
  • addAtom(String,IAtom) (which works out coordinates automatically)
  • Point2d newPoint2d(double,double)
  • updateImplicitHydrogenCounts()
  • moveTo(IAtom, Point2d)
  • setSymbol(IAtom,String)
  • setCharge(IAtom,int)
  • setMassNumber(IAtom,int)
  • addBond(IAtom,IAtom)
  • moveTo(IBond,Point2d)
  • setOrder(IBond,IBond.Order)
  • setWedgeType(IBond,int)
  • IBond.Order getOrder(int)
  • zap() (sort of sudo rm -Rf /*)
  • cleanup() (calculate 2D coordinates from scratch)
  • addRing(IAtom,int)
  • addPhenyl(IAtom)
This API (many more method will follow) is not really aimed at the end user, who will simply point and click. The goal of this scripting language is, at least at this moment, to test the underlying implementation using Bioclipse. Future applications, however, may include simple scripts which use some logic to convert the editor content. For example, replacing a t-butyl fragment into a pseudo atom "t-Bu". The key thing to remember, is that this will allow Bioclipse to have non-CDK-based programs act on the JChemPaint editor content (e.g. using getModel() and setModel(ICDKMolecule)). More on that later.

A simple script could look like: Or, as screenshot:

A RCP dedicated blog

Yes, yet another blog by me. This one is special, being the first where I just copy/paste blog items from other blog I do. Just, so that people only interested in RCP related stuff, can tune in here. If only I knew how to do categories on