Archive for the 'GeoNetwork' Category

GeoNetwork, Me, and a Rubber Mallet (Pt. 5)

Eesh. So many, many months ago we began chopping at the guts of the open source GeoNetwork project (GN), with the intent of building a much cleaner, more fluid, more approachable front end to the GN platform. Why? Well, because the back end handles a lot of stuff well already, including all of the forms for metadata input; the external submit function (most importantly, probably, from ArcCatalog); and of course the lucene indexing of that content. More than that, probably, is the built-in harvestability and harvestable…ness (the ability to both harvest via CSW or directly from other GN nodes and be harvested by same). In other words, GN does a lot of the stuff already that any metadata catalog would need to do. Its problem, in my estimation, was a rather ugly and cumbersome GUI, something we set out to fix with a combination of more minimalistic styles, OpenLayers, TileCache, and a lot of brush clearing in the tangle of .xsl files that actually comprise the default GN GUI.

I wrote about this a little (not as much as I intended, of course), then had to set it aside as various other projects repeatedly pulled rank (that’s the sad truth of being a librarian, by the way: your own work is usually subsumed by funded projects on which you’re just a cog or supporting player). It had a lot of potential to not only solve a campus need for better spatial data distribution, but also become a node in The Libraries’ burgeoning master plan for data curation (not to mention further that all of this is supposed to blossom into a campus resource that various proposals to various funding agencies may cite as at least partial fulfilment of data curation requirements). Armed with just an undergraduate student worker (~10 hours/wk) and whatever time I could scrape together at night, I was able to get a working prototype that looked like this:

[Image: geonetwork customized]

And it worked! It’s a largely untouched back end (except for some customization of the *index-fields.xsl files) plus a heavily customized front end that swapped out InterMap (wtf is InterMap?) with OpenLayers and ran all renderable layers through TileCache for better repeat performance. We also had a script that ran as a cron job that would pre-configure TileCache each night so all layers were ready for the day to come. (We didn’t quite get around to seeding the TileCache, however).
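For the curious, here is a minimal sketch of what a nightly pre-configuration script along those lines might look like. It is not the script we ran. It assumes TileCache reads an INI-style tilecache.cfg with a `[cache]` section plus one section per layer, and the layer names, WMS URL, and cache path are all hypothetical.

```python
import configparser
import io


def build_tilecache_config(layers, cache_dir="/tmp/tilecache"):
    """Return tilecache.cfg text with one WMS-backed section per layer.

    `layers` maps a (hypothetical) layer name to the WMS URL that serves it.
    """
    # interpolation=None so a literal "%" in a URL is written as-is.
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep option names exactly as written
    config["cache"] = {"type": "Disk", "base": cache_dir}
    for name, wms_url in sorted(layers.items()):
        config[name] = {
            "type": "WMS",
            "url": wms_url,
            "layers": name,
            "extension": "png",
        }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical layer list; a real job would pull it from the catalog.
    print(build_tilecache_config({"campus_parcels": "http://example.org/wms"}))
```

A cron entry would then only need to run this and write the result over the old config file each night.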

But the writing was on the wall almost from the start. The native GN GUI is a pretty rigid mess of Jeeves-run Java services that output XML that filters through XSL for styling and presentation. Guess what? All of that garbage doesn’t exactly lend itself to swift and efficient web dev. Meaning it took a lot of effort just to get where we got, and any future functionality and fanciness was going to come at a price: more and more of our blood and time.

So late last semester I finally pulled the plug on the customized GUI version (yes, before we ever even released an alpha of what we had worked so hard on) and split for more manageable waters. And I’m here now to announce the new name in this series, draw out the new stack of technologies, and renew a promise to document the process better than before.

New name: GeoNetwork, Solr, OpenLayers, Me, and Rubber Mallet.

New stack: GN -> Lucene -> Solr -> OpenLayers/jQuery/PHP.

The new plan — just about caught up to our previous work already — is to simply use GN as-is for admin only. It will harvest, be harvested, edit and index metadata as it was meant to do. In front of that we have Apache’s Solr running, feeding on GN’s native Lucene index. This allows us to be much more flexible at every stage down the line from there, and the first place this has paid off is what I’ll write about next time — pulling records out of GN’s database (mysql), using Solr, as JSON responses in a homemade OpenLayers/jQuery/PHP web app (here’s a preview: it’s about 10 times easier than fucking with all of those xsl stylesheets).
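To give a rough idea of the shape of that exchange, here is a small sketch. The real app is PHP and jQuery, so this Python version is only an illustration. It assumes Solr's standard select handler with `wt=json`. The base URL, the query, and the field names in the sample response are made up.

```python
import json
from urllib.parse import urlencode


def build_solr_url(base_url, query, rows=10):
    """Build a Solr select URL that asks for a JSON response."""
    params = {"q": query, "wt": "json", "rows": rows}
    return f"{base_url}/select?{urlencode(params)}"


def extract_docs(response_text):
    """Pull the list of matching documents out of a Solr JSON response."""
    payload = json.loads(response_text)
    return payload["response"]["docs"]


# Hypothetical sample shaped like a Solr JSON response; field names are invented.
SAMPLE = (
    '{"response": {"numFound": 1, "start": 0, '
    '"docs": [{"id": "42", "title": "Campus parcels"}]}}'
)

if __name__ == "__main__":
    print(build_solr_url("http://localhost:8983/solr", "title:parcels"))
    for doc in extract_docs(SAMPLE):
        print(doc["id"], doc["title"])
```

The point is just that the front end gets back a plain list of dictionaries to loop over, with no stylesheets in between.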

GeoNetwork, Me, and a Rubber Mallet (Pt. 4)

So last time it was on and on about how we’re trying to live-edit a GeoNetwork checkout by essentially creating a parallel copy of it right next to the original code. I used the word “alternatize,” which isn’t a word really, to describe how, for any file we add to or edit in GeoNetwork, we add a pseudo extension like “_pugo” to mark that it’s been touched by grubby hands.

[Image: file naming pattern]

This works well, but you have to be sure that all files that point to your new file know that its name has been changed. This cascades upward, of course: if you edit a file to point to your new “_pugo” version of something, you then have to rename that file by adding “_pugo,” which means you must make sure all files that point to or include that file are altered, and so on and so on. It makes much more sense when you have files in front of you and I don’t want to beat this horse any more than I have already. Do it differently or better or not at all if you want. We don’t have any real genius to be telling you how to keep your shit together.
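If it helps to see the cascade problem as code, here is a minimal sketch of a reference finder. It is not a tool we actually used. It walks a checkout and lists every .xsl or .xml file that still mentions a given filename. The function name and the extension list are my own assumptions.

```python
import os


def find_references(root, filename, extensions=(".xsl", ".xml")):
    """Return sorted paths of files under `root` whose text mentions `filename`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" so one oddly encoded file doesn't stop the scan.
            with open(path, encoding="utf-8", errors="replace") as handle:
                if filename in handle.read():
                    hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical checkout location.
    for path in find_references(".", "main-page.xsl"):
        print(path)
```

Because “main-page_pugo.xsl” does not contain the string “main-page.xsl,” searching for the original name only turns up the files that have not been repointed yet.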

Anyway, it was all a big bore and there are only a few people reading this series anyway. But those who do AND are interested in GeoNetwork might like this one better: it’s brush-clearing day in the GN user interface. The first problem I ever had with GeoNetwork was its cluttered, complicated, 1.0-esque UI. It’s surprising, too, because it has some cool shit going on (GeoRSS, ajax pulls of metadata results, dynamic mapping of data attached to found records). You can see slightly customized versions at fao.org or unocha.org. Both are good examples of how GeoNetwork is quite powerful out of the box, really, but also how it appears to not be in tune with how usability and web design have gone in recent years (no offense, I hope). And if you’re thinking “aw, don’t be a prick, it’s FOSS4G,” I dare you to click the “Advanced Search” form option. In fact I’ll give you 20 minutes to try and fill out that form.

Lest you still think I’m being hard on this project, remember what my own profession thought was acceptable for the last fifteen years (and in many places still to this day). Exhibits A and B. And for obvious reasons that constitute a pretty good excuse — when those 20 minutes of form-filling are up you’ve constructed a very powerful and accurate search. But there are better ways to do it and that’s one of the primary objectives of this project — to present a sleek, minimalist interface to all these datasets without sacrificing the real power of the system (just hiding it).

So the first thing on my list was to severely cut back the initial impression of the site, and the first thing to go was the table-based front page layout. It wasn’t that hard, really: I just went in and cut out almost all inputs, form widgets, and InterMap stuff. Then in what was left all table elements were replaced with divs or spans and styled with custom CSS. This mostly just took trial and error, and it’s still not done. Why? Well, it needs to be cleaned up for good, for one thing (some stuff didn’t get axed because I couldn’t tell what it was doing). But also because that fucking Internet Explorer is doing funky things with div clears and widths. I hate it so much.

Anyway, I know this is sort of a step-through series of how we’re actually doing this stuff, but this main page customization takes virtually no skill — I just kept chopping and styling, chopping and styling until I was left with this (draft):

[Image: current draft of geonetwork UI design]

…which is better than the way it was, in my opinion:


[Image: original geonetwork UI design]

GeoNetwork, Me, and a Rubber Mallet (Pt. 3)

Getting back to this GeoNetwork business. We’re moving along at a decent clip given who we are and the fact that my GA is .5, student worker is .25 time, and I’m 1.89 time with 1.8 going to other projects. So this entry starts with work we did near the end of the summer and will be limited to the method we’re using to inject pretty severe edits into a checked-out GeoNetwork source in a way that will make it as easy as possible to maintain our live edits along with the checked-out GeoNetwork. The next entry will get into the edit of the default home page itself (later will come the search results, the addition of elements like OpenLayers or the styling of full metadata pulled in by ajax, etc.), but this one will explain away some strange things you would see if you checked out our version today.

Starting out I knew that if we made any progress at all there would come a time when the real GeoNetwork is updated and we would want to fold our changes right into the improved version. In an attempt to ease this process, one decision made right away was to alternatize the stuff we changed. And since that really isn’t a word, I’ll explain:

Most of the changes to the home page happen in file “main-page.xsl,” which gets called by the main.home service (defined in config.xml). I’ll get into the changes to this file below, but let’s jump ahead to us having a (severely) altered “main-page.xsl.” In order to not have to keep original copies around and start changing filenames in order to safely roll back to native GeoNetwork, I saved a copy of main-page.xsl with a pseudo extension that could be used to mark all of the files we change (or add). So “main-page.xsl” becomes “main-page_pugo.xsl,” and in fact this pseudo extension will be the norm for changes the rest of the way out. So we do two things: add the “pugo” pseudo extension to the file name to make an obvious copy, and simultaneously mark edits inside of the files with “pugo” comments. For those files in GN whose file name could not be changed, it was obviously especially important for us to mark our edits with “pugo” comments. If worst comes to worst we can search for “pugo” to locate all of our edits, elisions, and additions.
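The naming rule and the “search for pugo” fallback are simple enough to sketch. The helper names below are hypothetical, and the sample comment text is invented for illustration.

```python
import os

MARKER = "pugo"


def pugo_name(filename, marker=MARKER):
    """Insert the pseudo extension before the real one: a.xsl -> a_pugo.xsl."""
    stem, ext = os.path.splitext(filename)
    return f"{stem}_{marker}{ext}"


def find_marked_lines(text, marker=MARKER):
    """Return (line_number, line) pairs for every line mentioning the marker."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if marker in line
    ]


if __name__ == "__main__":
    print(pugo_name("main-page.xsl"))
    # Invented sample content with one marked edit.
    sample = "<a/>\n<!-- pugo: removed search form -->\n<b/>"
    print(find_marked_lines(sample))
```

Running `find_marked_lines` over every file in the checkout would give a rough inventory of what we have touched.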

So to begin live-editing and testing main-page.xsl we make a copy and name it “main-page_pugo.xsl.” Then to make GN actually use this file we need to configure the main.home service to load ours instead of the native. In config.xml, then, we change that service’s block from

to

This results in two distinct benefits. 1) we avoid having to diff twenty files (or more) every time the main GeoNetwork is updated, and 2) we can easily swap out our altered GeoNetwork with the native, allowing us to test how things are supposed to work and how they are (or more likely are not) working in ours. What it also means, however, is that any time we change a file to add a “_pugo” pseudoextension to a file, we need to be damned sure that all of the files that call the original now call our new copy. This is tedious one time for each file, but will save us a lot of heartache in the future. Get a text editor with file search or use Spotlight or something similar if you want it to be easy to find all files that call a given file. If you always make copies of files you edit (when possible) and point all files that use that file to your new copy, you end up with a cascading effect which is perfect — to turn on “our” GeoNetwork with all of the _pugo versions of files working together, we use config.xml as described above. To remove practically every change we make and run GeoNetwork natively, we remove the pseudoextension “_pugo” from the config.xml block quoted above, and everything is instantly back to normal.
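The toggle itself could even be scripted. The sketch below uses a hypothetical stand-in for the service block, with an `output` element carrying a `sheet` attribute. The real config.xml block is not reproduced here and may be shaped differently, so treat the element and attribute names as assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the service block; the real config.xml may differ.
SAMPLE_CONFIG = """<services>
  <service name="main.home">
    <output sheet="main-page.xsl"/>
  </service>
</services>"""


def set_sheet(config_text, service_name, sheet):
    """Point the named service's <output> element at a different stylesheet."""
    root = ET.fromstring(config_text)
    for service in root.iter("service"):
        if service.get("name") == service_name:
            for output in service.iter("output"):
                output.set("sheet", sheet)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    ours = set_sheet(SAMPLE_CONFIG, "main.home", "main-page_pugo.xsl")
    print(ours)
    # Switching back to native is the same call with the original name.
    print(set_sheet(ours, "main.home", "main-page.xsl"))
```

One call turns “our” GeoNetwork on and the same call with the original file name turns it off, which is the whole appeal of the cascade.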

…Not a thrilling entry, but so far this method is serving us very well. In the next entry we start really slicing up that home page, so don’t give up.

WorldWind Search Results Renderer

As we pound on GeoNetwork in order to make it a little more usable and presentable, we’ve had a couple of opportunities to see how well data viewers can integrate with metadata search results. Primarily this means we’re including an OpenLayers instance that will automatically render either the data or the spatial footprint of a given search hit. Not a huge deal, this. A bigger deal is that we started toying with the idea of including an alternative globe render of search results. Google Earth was automatically disqualified because the embedded version is Windows-only (who does that in 2008?). So then we look at WorldWind Java and think… “why not?” Well, one reason “why not” is that I don’t know doink about Java, so it became a special project for a grad assistant.

And as we wait for a 0.6.0 release of WWj that purportedly has native CSW support (so it can be a stand-alone client for the catalog), my GA has gotten pretty far so far, able to get external page controls to act on an embedded WWj globe. I’ll post about this again when we’re further along (it will be part of my GeoNetwork series), but for now here are two rendered wms layers in an embedded WWj:


GeoNetwork, Me, and a Rubber Mallet (Pt. 2)

So GeoNetwork (GN here) is an open source java project. The open source part is great, of course, but the java? Well that’s…a problem. For me, anyway, given that java sits atop a huge, rich list of programming languages I know nothing about. This is nothing to be ashamed of if you’re a librarian, but it can be a problem if you’re the kind of librarian that complains, complains, complains about how most GIS web apps are practically cro-magnon (that’s right — they’re cro-magnon web apps). Why? Because ArcIMS made it easy to get something up. And once something was up, everybody quit trying, having fulfilled their county assessor’s office’s commitment to information transparency or whatever. The point is that these apps were clearly built by GIS people and whoever wanted them to be easy to use or (gasp!) attractive could go ahead and screw.

Then Google Maps proved that map interfaces didn’t have to be slow, cumbersome, lumbering gear-turners. Asynchronous calls to server side mechanisms meant you didn’t have to wait ten seconds every time you moved around, support for more common formats meant it was easier to get stuff on the map. Etcetera, right? You don’t need me to document the goodness of Google Maps.

Tragically, GeoNetwork’s great back end is almost entirely obscured by the 1.0-looking front end. Which is surprising, because the front end is built with some modular, ajaxy pieces that are usually praised when put into practice. To wit, here’s a simplified outline of a pretty standard request-response procedure using out-of-the-box GeoNetwork:

  • User loads default interface, probably at http://host:port/geonetwork/srv/main.home.
  • Shit starts to go down fast. The main.home service calls a bunch of xml and xsl files that together constitute the default page. Each element on this front page is a contribution from a different template out of xsl (the banner is created by banner.xsl, the header by header.xsl, etc.), so already it’s sort of easy to start busting up or removing pieces that offend your delicate sensibilities. You write a little or kill a little xsl and suddenly you’re “developing” GeoNetwork (of course not really, but a little gui tweaking goes a long way for some of us)
  • User draws a bounding box on the map for their area, fires off a search
  • Well there’s trouble already. The input map is a comically small Intermap embed, meaning it’s a little old school. But beware, if you choose “Advanced Search” as a means to present yourself with a little bit more space to make your choices, you’re going to be flooded with options and widgets and buttons and other garbage (flooded with a nifty ajax page element replacement, but flooded nonetheless). It’s almost as if it was built by the same genius that designed your basic Voyager-powered library opac (here’s an example of a classic look, here’s another — quite an orgy if you’re into filling out forms and reading about query syntax for ten minutes before you actually do any searching)
  • Anyway, the bbox and other parameters are sent to GN’s search service
  • GN uses Lucene to index its contents and in fact throws bbox and keyword queries right up against its Lucene index. I don’t know enough about Lucene to understand how it’s doing this minor spatial query, but have charged one of my student workers to find out for me (I’m a mean boss).
  • GN supports several metadata schemas, including FGDC, ISO 19115, ISO 19139, and Dublin Core. Search results from a query into metadata in these different schemas are then normalized with schema-specific stylesheets so they can all be returned to search-results_xhtml.xsl in a common format for rendering to the browser.

It’s in this results file where we have spent most of our time chopping here, trimming there, wildly flailing sharp things everywhere else. And it’s in there that the next post (first of several that will document the changes we’re playing with) will start.
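On the bbox step in the outline above: I can’t say how GN actually does it, but one plausible way to run a bounding-box query against an index is to store each record’s four extents as numeric fields and compare ranges. The sketch below shows only that idea, under that assumption. The class, field names, and record ids are hypothetical, and it ignores boxes that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    west: float
    south: float
    east: float
    north: float


def intersects(a, b):
    """True when two boxes overlap or touch (no antimeridian handling)."""
    return (
        a.west <= b.east
        and a.east >= b.west
        and a.south <= b.north
        and a.north >= b.south
    )


def search(records, query_box):
    """Return ids of records whose footprint intersects the query box."""
    return [record_id for record_id, box in records if intersects(box, query_box)]


if __name__ == "__main__":
    # Hypothetical records: (id, footprint).
    records = [
        ("parcels", BBox(-100.0, 30.0, -90.0, 40.0)),
        ("soils", BBox(10.0, 10.0, 20.0, 20.0)),
    ]
    print(search(records, BBox(-95.0, 35.0, -80.0, 45.0)))
```

Each of the four comparisons is the kind of range condition a text index can answer, which may be why a “minor spatial query” is feasible there at all.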

GeoNetwork, Me, and a Rubber Mallet (Pt. 1)

So there is a remarkable dearth of documentation for geospatial datasets on my campus. A number of different researchers, labs, and other agencies are consuming and producing geospatial data from within their own insulated pockets of campus. Not very many are then doing anything clever or coordinated that would make those data discoverable and therefore usable by other groups on campus or, better still, the world.

[Image: geonetwork logo]

The reason, naturally, is that it’s hard. PIs of these projects build machinery to help themselves first and foremost, secondarily fit the parameters of the project, provide ample fodder for publication and further research, and then somewhere down the line might think about how their stuff can be made more easily available to the world. Typically they mount some web-facing app and that’s that. Find it if you can.

So we started a project over the summer to build a utility that can be one portal to geodata for and from campus labs and researchers. It will be a catalog, of sorts, but has potential built into it to be a much more agile and robust machine than a lot of libraries put up. It’s built on the open source GeoNetwork platform, but almost all of our work so far has been devoted to making GN more usable, more accessible in the ways web apps need to be these days.

Let this post stand as an intro to a series of posts that will describe and document what we’ve done and what we’re doing with GN. None of us (it’s me, one of my GAs, and one student worker) are Java developers. None of us are developers at all, in fact, but the students are game and we’ve been able to wrestle GN into submission on a number of aspects that might be of interest to others. Minimally, this series can be rolled into the articles we write following a hopefully-successful deployment.

So the next post will discuss some basics about GN and its architecture (not all of which I understand). There are some aspects of how it’s built that are especially attractive to a dumb, clumsy librarian and I intend to start with those. The posts that follow will be devoted to the incisions and clubbings we dealt that platform in an effort to get it to where we wanted it.