Running XQuery from the Command Line with JRuby

2008.01.28 06:38

During daylight hours, I work on a CMS.

We use XQuery to transform a lot of our display-agnostic XML content to be more UI-friendly. We use XOM as our XML parser and Nux as a bridge between XOM and Saxon’s XQuery support.

I love XQuery, but I have a really hard time with the syntax. I can’t always remember if I need to use curly braces or not, what context I’m in, which variables I can use and which I can’t, etc. Sitting and waiting for things to publish and watching them blow up was losing its charm, so I wrote a small script to run an XQuery file against an XML file and print the results.

Here’s my xquery.rb script:

#!/usr/bin/env jruby

require 'java'
include_class 'nux.xom.xquery.XQuery'
include_class 'nux.xom.xquery.ResultSequence'
include_class 'nu.xom.Builder'
include_class 'nu.xom.Document'
include_class 'nu.xom.Serializer'
include_class 'java.io.ByteArrayOutputStream'

# need to distinguish Java's File from Ruby's File
module Java
  include_class 'java.io.File'
end

def fail(msg)
  $stderr.puts "ERROR -- #{msg}"
  exit 1
end

def assert_file_exists(*paths)
  paths.each do |path|
    fail("Could not find file '#{path}'")\
      unless File.exist?(path)
  end
end

fail('Usage: <xquery> <xml>') unless ARGV.size == 2

xquery_path, xml_path = ARGV

assert_file_exists(xquery_path, xml_path)

xquery = nil
open(xquery_path) do |stream|
  xquery = XQuery.new(stream.read, nil)
end

doc = Builder.new.build(Java::File.new(xml_path))

results = xquery.execute(doc)

nodes = results.toNodes()
fail("No nodes returned") if nodes.size == 0

(0 .. nodes.size-1).each do |i|
  outstream = ByteArrayOutputStream.new
  serializer = Serializer.new(outstream)
  serializer.setIndent(2)
  serializer.write(Document.new(nodes.get(i)))
  puts outstream.toString()
end

Below is my xquery.sh script. I hate that I have to write this, but I don’t see how to avoid setting up the classpath.

The jars are loaded from my local Eclipse/Equinox/OSGi target platform, which contains all of the bundles and jars we use in our application.

#!/bin/sh -e

tp_root=~/local/eclipse/sf-trunk-tplatform

bundles="cim.sf.core.text.xml\
 cim.sf.core.library.xom\
 cim.sf.core.library.saxon"

for bundle in $bundles; do
    for jar in `find $tp_root/$bundle* -name '*.jar'`; do
        export CLASSPATH="$CLASSPATH:$jar"
    done
done

xquery.rb "$@"
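
One possible way to drop the wrapper is to find the jars from Ruby and load them at the top of xquery.rb. This is only a minimal sketch that I have not run against the target platform above. It assumes JRuby's support for putting a jar on the classpath when its path is passed to require, and bundle_jars is a hypothetical helper name, not something from the original script.

```ruby
# Hypothetical helper (not part of the original script): collect every jar
# that belongs to the named bundles, whether the bundle is a directory
# containing jars or a single jar file.
def bundle_jars(tp_root, bundles)
  bundles.flat_map do |bundle|
    Dir.glob([
      File.join(tp_root, "#{bundle}*.jar"),
      File.join(tp_root, "#{bundle}*", '**', '*.jar')
    ])
  end.uniq.sort
end

# Only attempt to load jars when running under JRuby.
if RUBY_PLATFORM == 'java'
  tp_root = File.expand_path('~/local/eclipse/sf-trunk-tplatform')
  bundles = %w[
    cim.sf.core.text.xml
    cim.sf.core.library.xom
    cim.sf.core.library.saxon
  ]
  # Assumption: JRuby adds a jar to the classpath when it is required by path.
  bundle_jars(tp_root, bundles).each { |jar| require jar }
end
```

The lookup mirrors the find loop in the shell script, so the same bundle names should pick up the same jars. The jars would have to be loaded before the include_class lines, since those need the classes to be visible already.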

1 comment

Why don’t u just ENV['CLASSPATH']=".." at the top of the rb script?
