GeoNetwork, Me, and a Rubber Mallet (Pt. 5)
Eesh. Many, many months ago we began chopping at the guts of the open source GeoNetwork project (GN), with the intent of building a much cleaner, more fluid, more approachable front end to the GN platform. Why? Well, because the back end already handles a lot of stuff well, including all of the forms for metadata input; the external submit function (most importantly, probably, from ArcCatalog); and of course the Lucene indexing of that content. More important still, probably, is the built-in harvestability and harvestable…ness (the ability to both harvest via CSW or directly from other GN nodes and be harvested by same). In other words, GN already does a lot of what any metadata catalog would need to do. Its problem, in my estimation, was a rather ugly and cumbersome GUI, something we set out to fix with a combination of more minimalistic styles, OpenLayers, TileCache, and a lot of brush clearing in the tangle of .xsl files that make up the default GN GUI.
I wrote about this a little (not as much as I intended, of course), then had to set it aside as various other projects repeatedly pulled rank (that’s the sad truth of being a librarian, by the way: your own work is usually subsumed by funded projects on which you’re just a cog or supporting player). It had a lot of potential to not only solve a campus need for better spatial data distribution, but also become a node in The Libraries’ burgeoning master plan for data curation (not to mention that all of this is supposed to blossom into a campus resource that proposals to various funding agencies may cite as at least partial fulfilment of data curation requirements). Armed with just an undergraduate student worker (~10 hours/wk) and whatever time I could scrape together at night, I was able to get a working prototype that looked like this:
And it worked! It’s a largely untouched back end (except for some customization of the *index-fields.xsl files) plus a heavily customized front end that swapped out InterMap (wtf is InterMap?) for OpenLayers and ran all renderable layers through TileCache for better repeat performance. We also had a script, run as a nightly cron job, that would pre-configure TileCache so all layers were ready for the day to come. (We didn’t quite get around to seeding the TileCache, however.)
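For the curious, here is a minimal sketch of what a nightly pre-configure step along those lines might look like. It is not our actual script: the layer list, the WMS URL, and the cache directory are all made up for illustration, and the config keys (`type`, `url`, `layers`, `extension`, `base`) are the ones I recall from a stock tilecache.cfg, so check them against your TileCache version.

```python
import configparser
import io

# Hypothetical layer list. In practice this would be pulled from the
# catalog each night; the names and URL here are placeholders.
LAYERS = [
    {"name": "county_roads", "wms_url": "http://example.edu/wms", "wms_layers": "roads"},
    {"name": "soils", "wms_url": "http://example.edu/wms", "wms_layers": "soils"},
]


def build_tilecache_config(layers, cache_dir="/tmp/tilecache"):
    """Build an INI-style config with one [cache] section and one section per layer."""
    config = configparser.ConfigParser()
    # Disk cache section; cache_dir is an assumed location.
    config["cache"] = {"type": "Disk", "base": cache_dir}
    for layer in layers:
        config[layer["name"]] = {
            "type": "WMS",
            "url": layer["wms_url"],
            "layers": layer["wms_layers"],
            "extension": "png",
        }
    return config


def render(config):
    """Return the config as text, ready to be written to tilecache.cfg."""
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render(build_tilecache_config(LAYERS)))
```

A cron entry would simply run this and redirect the output over the existing config file each night.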
But the writing was on the wall almost from the start. The native GN GUI is a pretty rigid mess of Jeeves-run Java services that output XML, which is then filtered through XSL for styling and presentation. Guess what? All of that garbage doesn’t exactly lend itself to swift and efficient web dev. Meaning it took a lot of effort just to get where we got, and any future functionality and fanciness was going to come at a price: more and more of our blood and time.
So late last semester I finally pulled the plug on the customized GUI version (yes, before we ever even released an alpha of what we had worked so hard on) and split for more manageable waters. And I’m here now to announce the new name in this series, draw out the new stack of technologies, and renew a promise to document the process better than before.
New name: GeoNetwork, Solr, OpenLayers, Me, and a Rubber Mallet.
New stack: GN -> Lucene -> Solr -> OpenLayers/jQuery/PHP.
The new plan, which has just about caught up to our previous work already, is to simply use GN as-is for admin only. It will harvest, be harvested, edit and index metadata as it was meant to do. In front of that we have Apache’s Solr running, feeding on GN’s native Lucene index. This allows us to be much more flexible at every stage down the line from there, and the first place this has paid off is what I’ll write about next time: pulling records out of GN’s database (MySQL), using Solr, as JSON responses in a homemade OpenLayers/jQuery/PHP web app (here’s a preview: it’s about 10 times easier than fucking with all of those XSL stylesheets).
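To give a rough idea of why the Solr route is so much lighter, here is a small sketch of asking Solr for JSON and pulling the records out of the reply. It is in Python rather than the PHP of the actual app, purely to keep it short. The Solr URL and the field names (`id`, `title`) are assumptions for illustration; the `q`, `wt`, `rows`, and `start` parameters and the `response`/`numFound`/`docs` layout are standard Solr.

```python
import json
from urllib.parse import urlencode

# Assumed location of the Solr select handler; adjust for your install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(query, rows=10, start=0, base=SOLR_SELECT):
    """Build a Solr select URL that asks for a JSON response."""
    params = {"q": query, "wt": "json", "rows": rows, "start": start}
    return base + "?" + urlencode(params)


def parse_results(body):
    """Return (total hit count, list of record dicts) from a Solr JSON reply."""
    data = json.loads(body)
    response = data["response"]
    return response["numFound"], response["docs"]


if __name__ == "__main__":
    print(build_select_url("title:soils", rows=5))
    # A hand-written sample reply, not real output from our index.
    sample = '{"response": {"numFound": 1, "start": 0, "docs": [{"id": "abc", "title": "Soils"}]}}'
    total, docs = parse_results(sample)
    print(total, docs[0]["title"])
```

The web app only has to fetch that URL and loop over `docs`, with no stylesheet in sight.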

May 28th, 2009 at 9:28 am
Thanks for posting. I believe I have heard of this project through the GN mailing list quite a while ago. Actually, I believe there were several projects focused on the GN front end.
I totally agree with what you’re saying. That’s why for the repositories I’m working on, I use Drupal CMS as my front end. It’s a PHP/MySQL app with integrated jQuery, and does a great job handling user management/permissions. I use Apache Solr behind the scenes, which Drupal supports as well (better support coming very soon.) With Drupal modules such as CCK, Views, linkages with OpenLayers and Google Maps API for WMS services, and others, it gives me a lot of control. Of course, it already does a great job separating style from content and easily allows me to apply theme/style from my main websites to my catalogs.
However, I am not integrating the back end Drupal database with a harvesting engine. I believe Drupal does support OAI-PMH but replicating data into a database that supports direct harvesting and/or CSW would be nice. At least CSS. Of course, if your data is found through search engines like Google, you have to ask yourself how important is harvesting to your mission? I’m going through those questions right now.
Just curious but do you run any internal map servers, like MapServer or GeoServer? Also, have you looked at deegree as a back end? I like GN as well, esp 2.4 with supposedly improved CSW support.
Good luck!
February 3rd, 2010 at 7:05 pm
Really admire the work you are putting in. It must have been hard to dump the previous plan, but for my two cents, it was the right decision. Hope it all goes to plan. I’ll be cheering from the sides.
Good Luck!
February 7th, 2010 at 8:24 pm
Dumb question time. Why use Solr and the Lucene index? Why not use CSW to query the GeoNetwork repository?
Using CSW would seem to be more portable and uncoupled from the Lucene index. I understand though that there are some deficiencies in the current implementation of CSW in GeoNetwork.
February 7th, 2010 at 9:42 pm
Not dumb at all (I hope), since we actually started out with CSW. But Solr is attractive because it has a lot of extra power behind it in its faceted results, multiple index support, etc. We’re ultimately planning to do some interesting pre-query parsing to situate the query string in a given domain (and thus a pre-filtered resultset), since this will support many different disciplines who may be looking for quite different things. That’s the idea, anyway. CSW was already pretty finicky when we were testing it and Solr came out of the box with JSON — it just looked so much easier to work with.
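Since that pre-query idea is still only a plan, here is a toy sketch of the sort of thing I have in mind, not anything we have built. The keyword-to-domain table and the `domain` field are invented for the example; `fq`, `facet`, and `facet.field` are ordinary Solr parameters, and Solr's JSON facet counts come back as a flat value, count, value, count list.

```python
from urllib.parse import urlencode

# Hypothetical keyword -> domain table; a real one would be far richer.
DOMAIN_KEYWORDS = {
    "soil": "agriculture",
    "crop": "agriculture",
    "parcel": "planning",
    "zoning": "planning",
}


def guess_domain(query):
    """Return the domain of the first recognised keyword, or None."""
    for word in query.lower().split():
        if word in DOMAIN_KEYWORDS:
            return DOMAIN_KEYWORDS[word]
    return None


def build_params(query):
    """Build a Solr query string, adding a filter query when a domain is guessed."""
    params = [("q", query), ("wt", "json"), ("facet", "true"), ("facet.field", "domain")]
    domain = guess_domain(query)
    if domain is not None:
        # fq narrows the result set before the main query is scored.
        params.append(("fq", "domain:" + domain))
    return urlencode(params)


def facet_pairs(flat):
    """Turn Solr's flat [value, count, value, count, ...] list into pairs."""
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    print(build_params("soil maps"))
    print(facet_pairs(["agriculture", 12, "planning", 3]))
```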