Well, I figured out that you can edit the HTML in the files where the signatures are stored. Just take out the paragraph marks and everything works much better!
I found the files in
C:\Documents and Settings\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Emit plain text rather than XML -->
  <xsl:output encoding="ascii" method="text"/>
  <xsl:template match="/">
    <!-- Write out the text content of every <pre> element -->
    <xsl:for-each select="//pre">
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
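For comparison, here is a rough Python sketch of what that stylesheet does: it collects the text of every `pre` element in document order and concatenates it. It assumes the input is well-formed XML with no namespace on the elements; the function name `extract_pre_text` and the sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET


def extract_pre_text(xml_text: str) -> str:
    """Concatenate the text of every <pre> element, in document order."""
    root = ET.fromstring(xml_text)
    # itertext() also picks up text inside nested child elements,
    # which mirrors what <xsl:value-of select="."/> returns.
    return "".join("".join(pre.itertext()) for pre in root.iter("pre"))


# Hypothetical input; the real pages are not shown in this post.
sample = (
    "<html><body>"
    "<pre>first\n</pre>"
    "<p>skipped</p>"
    "<pre>second <b>bold</b>\n</pre>"
    "</body></html>"
)

if __name__ == "__main__":
    print(extract_pre_text(sample), end="")
```

Like the stylesheet, this fails on input that is not well-formed XML, so ordinary HTML pages would have to be tidied first.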
3) I then had a bunch of BibTeX files and needed to extract the titles and authors, so I decided to convert the BibTeX to XML. After a few false starts, I found bib2XML and converted my files to XML. However, I later found that bib2XML does not properly translate special characters (accented characters, superscripts, trademark symbols). Rather than fix the code, I fixed the files by hand using a few regular expression substitutions.
4) Then I wrote a .NET application to drive the Google web services API. This was fairly straightforward, except that the Google server for registering to use the API was down for a few days. Also, when I passed null for a default parameter, the call failed without a useful explanation. Passing empty strings instead worked.
5) My program created a tab-delimited (TSV) file with one line for each author of a paper, which I then loaded into Excel for analysis. I used the data analysis wizard to count and sum the number of hits for publications. But now I want to do some more sophisticated analysis.
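The count-and-sum step from item 5 could also be scripted, which might make the more sophisticated analysis easier to build on. Below is a minimal Python sketch. The column names `author`, `title`, and `hits` are assumptions, since the actual layout of the TSV file is not shown here, and `summarize_hits` is a made-up helper name.

```python
import csv
import io
from collections import defaultdict


def summarize_hits(tsv_text: str) -> dict:
    """Return {author: (paper_count, total_hits)} from tab-delimited rows."""
    totals = defaultdict(lambda: [0, 0])
    # Assumed header row: author, title, hits (hypothetical column names).
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        entry = totals[row["author"]]
        entry[0] += 1                 # one more paper for this author
        entry[1] += int(row["hits"])  # add this paper's hit count
    return {author: tuple(values) for author, values in totals.items()}


# Made-up sample data, one line per (author, paper) pair.
sample = (
    "author\ttitle\thits\n"
    "A. Author\tPaper One\t12\n"
    "B. Author\tPaper One\t12\n"
    "A. Author\tPaper Two\t30\n"
)

if __name__ == "__main__":
    for author, (papers, hits) in sorted(summarize_hits(sample).items()):
        print(f"{author}\t{papers}\t{hits}")
```

Reading the file with `csv` rather than splitting on tabs by hand keeps quoted fields intact if a title ever contains a tab or a quote character.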
You might say this is overkill for the purpose I had in mind (or even that it is irrelevant). But I sometimes try to automate processes like this just to find out how hard it is. This one was pretty difficult.