GeoNetwork, Me, and a Rubber Mallet (Pt. 5)
Eesh. So many, many months ago we began chopping at the guts of the open source GeoNetwork project (GN), with the intent of building a much cleaner, more fluid, more approachable front end to the GN platform. Why? Well, because the back end handles a lot of stuff well already, including all of the forms for metadata input; the external submit function (most importantly, probably, from ArcCatalog); and of course the Lucene indexing of that content. More than that, probably, is the built-in harvestability and harvestable…ness (the ability to both harvest via CSW or directly from other GN nodes and be harvested by same). In other words, GN already does a lot of the stuff that any metadata catalog would need to do. Its problem, in my estimation, was a rather ugly and cumbersome GUI, something we set out to fix with a combination of more minimalistic styles, OpenLayers, TileCache, and a lot of brush clearing in the tangle of .xsl files that actually comprise the default GN GUI.
I wrote about this a little (not as much as I intended, of course), then had to set it aside as various other projects repeatedly pulled rank (that’s the sad truth of being a librarian, by the way: your own work is usually subsumed by funded projects on which you’re just a cog or supporting player). It had a lot of potential to not only solve a campus need for better spatial data distribution, but also become a node in The Libraries’ burgeoning master plan for data curation (not to mention further that all of this is supposed to blossom into a campus resource that various proposals to various funding agencies may cite as at least partial fulfilment of data curation requirements). Armed with just an undergraduate student worker (~10 hours/wk) and whatever time I could scrape together at night, I was able to get a working prototype that looked like this:
And it worked! It’s a largely untouched back end (except for some customization of the *index-fields.xsl files) plus a heavily customized front end that swapped out InterMap (wtf is InterMap?) for OpenLayers and ran all renderable layers through TileCache for better repeat performance. We also had a script, run as a nightly cron job, that would pre-configure TileCache so all layers were ready for the day to come. (We didn’t quite get around to seeding the TileCache, however.)
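For the curious, here is a rough Python sketch of what a nightly "pre-configure TileCache" step could look like. It is not our actual script: the layer names, the WMS URL, and the cache directory are all made up for illustration, and the section layout follows the sample config that ships with TileCache 2.x, so check it against whatever version you have installed.

```python
import configparser
import io


def build_tilecache_config(layer_names, wms_url, cache_dir="/tmp/tilecache"):
    """Return the text of a TileCache-style config: one [cache] section
    plus one WMS layer section per layer name."""
    config = configparser.ConfigParser()
    # Keep option names exactly as written (ConfigParser lowercases by default).
    config.optionxform = str

    # Where rendered tiles are stored on disk (cache_dir is an assumption).
    config["cache"] = {"type": "Disk", "base": cache_dir}

    # One section per renderable layer, all pointing at the same WMS endpoint.
    for name in layer_names:
        config[name] = {
            "type": "WMS",
            "url": wms_url,
            "layers": name,
            "extension": "png",
        }

    buffer = io.StringIO()
    # Write "key=value" without spaces, matching the style of the sample config.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical layer list and WMS endpoint; a real script would pull the
    # layer names from wherever the catalog keeps them.
    print(build_tilecache_config(["roads", "parcels"], "http://example.edu/wms"))
```

A cron entry would then just write that text over tilecache.cfg each night. Seeding the cache is a separate step that this sketch does not attempt.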
But the writing was on the wall almost from the start. The native GN GUI is a pretty rigid mess of Jeeves-run Java services that output XML that filters through XSL for styling and presentation. Guess what? All of that garbage doesn’t exactly lend itself to swift and efficient web dev. Meaning it took a lot of effort just to get where we got, and any future functionality and fanciness was going to come at a price: more and more of our blood and time.
So late last semester I finally pulled the plug on the customized GUI version (yes, before we ever even released an alpha of what we had worked so hard on) and split for more manageable waters. And I’m here now to announce the new name in this series, draw out the new stack of technologies, and renew a promise to document the process better than before.
New name: GeoNetwork, Solr, OpenLayers, Me, and a Rubber Mallet.
New stack: GN -> Lucene -> Solr -> OpenLayers/jQuery/PHP.
The new plan (just about caught up to our previous work already) is to simply use GN as-is for admin only. It will harvest, be harvested, and edit and index metadata as it was meant to do. In front of that we have Apache’s Solr running, feeding on GN’s native Lucene index. This allows us to be much more flexible at every stage down the line from there, and the first place this has paid off is what I’ll write about next time: pulling records out of GN’s database (MySQL), using Solr, as JSON responses in a homemade OpenLayers/jQuery/PHP web app (here’s a preview: it’s about 10 times easier than fucking with all of those XSL stylesheets).
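To give a sense of why the Solr route is so much lighter, here is a minimal sketch of the request/response shape. It is written in Python only to keep it short (the same request works from PHP or jQuery), and it is not our app's code: the host, port, and field names (`id`, `title`) are assumptions. What it does rely on is Solr's standard `select` handler with `wt=json`, which returns the hits under `response.docs`.

```python
import json
from urllib.parse import urlencode


def build_solr_url(base_url, query, rows=10, start=0):
    """Build a Solr select URL that asks for a JSON response."""
    params = {"q": query, "wt": "json", "rows": rows, "start": start}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def extract_docs(response_text):
    """Return (total hit count, list of documents) from a Solr JSON response."""
    body = json.loads(response_text)["response"]
    return body["numFound"], body["docs"]


# A canned response in the shape Solr returns for wt=json.
# The document fields here are invented for illustration.
sample = json.dumps({
    "responseHeader": {"status": 0, "QTime": 3},
    "response": {
        "numFound": 1,
        "start": 0,
        "docs": [{"id": "abc-123", "title": "County parcels"}],
    },
})

# Hypothetical Solr location; adjust to wherever your instance lives.
url = build_solr_url("http://localhost:8983/solr", "title:parcels")
total, docs = extract_docs(sample)

if __name__ == "__main__":
    print(url)
    for doc in docs:
        print(doc["id"], doc["title"])
```

Once the records arrive as plain JSON like this, handing them to OpenLayers or a jQuery template is ordinary web work, with no stylesheet chain in between.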

This works well, but you have to be sure that every file that points to your new file knows that its name has been changed. And this cascades upward, of course: if you edit a file to point to your new “_pugo” version of something, you then have to rename that edited file by adding “_pugo” too, which means every file that points to or includes that file must be altered as well, and so on and so on. It makes much more sense when you have the files in front of you, and I don’t want to beat this horse any more than I have already. Do it differently or better or not at all if you want. We don’t have any real genius to be telling you how to keep your shit together.







