Cromulent Postings: 2008

Sunday, December 07, 2008

The 'Branches/Tags/Trunk' Convention

Q. I have several projects under Subversion which do not adhere to the branches/tags/trunk convention. How do I move to this convention?

A. Check out each project and 'cd' to the local copy. Then:


mkdir branches tags trunk; svn add branches tags trunk
ls -A | egrep -v '^(\.svn|branches|tags|trunk)$' | xargs -I X svn move X trunk
svn commit -m "Moved to the 'branches/tags/trunk' convention."
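
Before running the move, it may be worth previewing which entries would end up in trunk. The following is only a sketch, assuming a POSIX shell; the function name list_movable is my own invention and not part of Subversion, and it never calls svn itself:

```shell
# list_movable DIR: print the top-level entries of DIR that would be moved
# into trunk, i.e. everything except .svn, branches, tags and trunk.
# Hypothetical helper for illustration; it only lists names.
list_movable() {
    # -x matches whole names only, so an entry called "tagsoup" is kept.
    ls -A "$1" | grep -E -v -x '\.svn|branches|tags|trunk'
}
```

Calling list_movable . inside the working copy prints one name per line. Because the match is on whole names, only the four reserved entries are skipped.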

Wednesday, September 03, 2008

Europa to Ganymede

I recently upgraded my version of Eclipse from Europa (v3.3) to Ganymede (v3.4) on Mac OS X. I wanted to install the following plug-ins: Subclipse (for Subversion), M2Eclipse (for Maven), WTP (for various editors, e.g. XML and XSD), and PDT (for PHP).

First up, Subclipse. I added the update site and installed it without a hitch. Eclipse required a restart, an annoyance I was hoping Ganymede would have fixed.

On to M2Eclipse. I added the update site, tried an install and got the following cryptic error messages:


Unsatisfied dependency: [org.maven.ide.eclipse.ajdt.feature.feature.group
0.9.5.20080717-1821] requiredCapability:
org.eclipse.equinox.p2.iu/org.eclipse.ajdt.core/0.0.0
Unsatisfied dependency: [org.eclipse.jst.common.frameworks 1.1.102.v200709122200]
requiredCapability: osgi.bundle/org.eclipse.emf.ecore.xmi/[2.2.0,2.4.0)
... snip (many more lines of similar error messages) ...

After some Googling, I found out that M2Eclipse requires that the AspectJ Development Tools (AJDT) and the Web Tools Platform (WTP) are installed first. I added the AJDT update site and installed it with one peculiarity. Prior to the install, Eclipse complained that the http://md.pp.ru/~eu/12 website was unavailable. This website belongs to a Eugene Kuleshov, a developer who is interested in, amongst other things, AspectJ. Perhaps he is an AJDT developer. Anyhow, the warning did not prevent the install from completing successfully. I added the WTP update site and it (eventually) installed. I returned to M2Eclipse and it installed correctly.

Finally, the PDT. I added the update site, tried an install, and got another error message:


Cannot find a solution satisfying the following requirements Match[requiredCapability:
org.eclipse.equinox.p2.iu/org.eclipse.wst.web_ui.feature.feature.group
/[3.0.1.v200807220139-7R0ELZE8Ks-y8HYiQrw5ftEC3UBF,3.0.1.v200807220139-7R0ELZE8Ks-
y8HYiQrw5ftEC3UBF]].

I have had problems with this plug-in before when using Europa. I followed the instructions on the PDT website, but to no avail. Three out of four plug-ins will have to suffice for now :-( Maybe Eclipse v3.5 will have a smoother plug-in system.

Wednesday, August 13, 2008

MIT's Simile Timeline

MIT's Simile Timeline is a DHTML-based AJAX widget for visualizing temporal information. Here it is trying to visualize my travel.xml data (where I've been):

Unfortunately, this won't display correctly in an RSS reader. The file timeline-api.js must (annoyingly) be included within the 'head' tags of an HTML page. So, I modified Blogger's template and my own mirror. However, you will also notice that Blogger fails to load travel.xml (the timeline is empty), whereas my mirror, which is on the same server as travel.xml and simply gets its feed from Blogger, loads it correctly. Ugh!

Thursday, July 31, 2008

CiteSeer's Dataset

I am exploring the citation and co-authorship graphs of the documents (and contexts) indexed by CiteSeer. However, parsing their index has proved tricky. The good news is that CiteSeer provides an OAI-PMH compliant dump of their index. I downloaded and unzipped the index as follows:


$ wget http://cs1.ist.psu.edu/public/oai/oai_citeseer.tar.gz
$ tar -zxf oai_citeseer.tar.gz

The file is based on the Dublin Core standard with additional metadata fields, including citation relationships (References and IsReferencedBy), author affiliations, and author addresses. The index is split into many 'dump' files with no root XML tag. So:


$ { echo "<records>"; cat oai_citeseer/*; echo "</records>"; } > cs.xml

The file is quite big: approximately 1.9GB with over 36 million lines. The bad news is that:


$ xmllint --stream cs.xml

cs.xml:92025: parser error : attributes construct error
 <oai_citeseer:author name="L. "j. Svensson">

The XML is not well-formed. I tried some quick repairs with sed:


$ sed -e 's/L\.\ \"j\.\ Svensson/L\.\ J\.\ Svensson/g' cs.xml > csX.xml; mv csX.xml cs.xml
$ xmllint --stream cs.xml

cs.xml:168403: parser error : internal error
 <dc:title>Imagining CLP(^,= alpheta )</dc:title>

There also appear to be unprintable characters in the file. A post from the Xalan mailing list provides a solution:


$ java XMLFix cs.xml > csX.xml; mv csX.xml cs.xml
$ xmllint --stream cs.xml

cs.xml:418791: parser error : attributes construct error
 <oai_citeseer:author name="Nitin "nick Sawhney">

A recurring problem concerns people who put part of their name in quotation marks, e.g. Nitin "Nick" Sawhney. To fix these errors in the name attribute of the oai_citeseer:author tag:


$ sed -e 's/\(name="[^"]*\)"\([^"]*">\)/\1\2/g' cs.xml > csX.xml; mv csX.xml cs.xml
$ sed -e 's/\(name="[^"]*\)"\([^"]*\)"\([^"]*">\)/\1\2\3/g' cs.xml > csX.xml; mv csX.xml cs.xml
$ xmllint --stream cs.xml

cs.xml:25857443: parser error : attributes construct error
 <oai_citeseer:author name="Kai Voy"""zy Massachusettsiassachu">

I manually edited this line with vim, and I'm done! I have a well-formed XML file.
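
The repairs in this post can be collected into a pair of small filters. This is only a sketch under a few assumptions: both function names are hypothetical, the tr step is a rough stand-in for the XMLFix program (whose source is not reproduced here) and only removes the ASCII control characters that XML 1.0 forbids, and the sed expressions are written without backslashes before = and > because GNU sed treats \> as an end-of-word anchor.

```shell
# Both function names are hypothetical; neither comes from the tools above.

# strip_control_chars: drop ASCII control characters that XML 1.0 does not
# allow (everything below 0x20 except tab, newline and carriage return).
# Assumption: this approximates what XMLFix does; it does not repair
# invalid multi-byte sequences.
strip_control_chars() {
    tr -d '\000-\010\013\014\016-\037'
}

# fix_name_quotes: remove one stray double quote (first expression) or two
# stray double quotes (second expression) from inside a name="..." attribute.
fix_name_quotes() {
    sed -e 's/\(name="[^"]*\)"\([^"]*">\)/\1\2/g' \
        -e 's/\(name="[^"]*\)"\([^"]*\)"\([^"]*">\)/\1\2\3/g'
}

# Example: repair one of the offending lines from the dump.
printf '%s\n' '<oai_citeseer:author name="Nitin "nick Sawhney">' \
    | strip_control_chars | fix_name_quotes
```

The example prints the author tag with name="Nitin nick Sawhney". Both functions read standard input and write standard output, so they can be chained over the whole of cs.xml in a single pass. A line with three or more stray quotes would still need a manual edit.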

Saturday, May 17, 2008

JavaScript: 1997 vs 2008

After buying a very early issue of .net Magazine (1997?) that included the latest versions of both Internet Explorer (4.0?) and Netscape Navigator (4.0?) on a bonus CD, I wrote my very first lines of JavaScript. As far as I can remember, my script allowed a hyperlink to show/hide a block of text within the same page. The CD included many other DHTML examples, some of which needed separate code for the two browsers.

Recently, I've been using JavaScript to write something a little more complicated and I've been watching Douglas Crockford et al.'s JavaScript videos (JavaScript, The Theory of the DOM, Advanced JavaScript, Browser Wars, Quality, The Good Parts, The State of Ajax) to bring me up-to-date. JavaScript has changed a lot: first-class functions, prototype-based inheritance, closures, variadic functions, etc. I have some catching up to do :-(

Friday, March 21, 2008

More hyperref woes

Following on from my last post on hyperref and algorithm2e, I have run into difficulties with hyperref links and page breaks. Suppose I process the following with pdflatex:


\documentclass[11pt,english]{article}
\usepackage{hyperref}

\begin{document}
This is the top of the page!

\vspace{18cm}

This is the bottom of the page! Blah blah blah blah \url{http://www.martinharrigan.ie}?
\end{document}

Then I get:

It looks like the header has become a link as well! With natbib links, the page number is also active. My quick fix is to avoid page breaks in the middle of links altogether by using \clubpenalty:


\documentclass[11pt,english]{article}
\usepackage{hyperref}

\begin{document}
This is the top of the page!!

\vspace{17cm}

This is the bottom of the page! Blah blah blah blah blah blah blah blah blah blah
blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah
blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah
blah blah blah blah blah blah blah blah blah blah \clubpenalty10000\url{http://www.martinharrigan.ie}?
\end{document}

And then all is well in the world again:

By the way, the hopeful-sounding 'breaklinks' option of the hyperref package doesn't fix the problem.