Use Google to estimate how many pages refer to each paper published at the conference, then analyze this data to find the most cited authors.
Here's what I did:
1) First I needed a list of papers. My first idea was to scrape the HTML from DBLP, but then I noticed that DBLP provides bibtex entries, so I used WinHTTrack Website Copier to download the bibtex files for that conference from DBLP. This was not too hard, but I did have to play with the configuration a fair bit to get it to work, because the bibtex files are stored on a different server.
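The mirroring tool's host filters matter because the bibtex files sit on a different server. The Python sketch below illustrates this. It is not what WinHTTrack does internally. It collects bibtex links from a saved conference page and lists the hosts they point to, so you can see which servers the download has to be allowed to reach. Two things in it are assumptions: the rule that a bibtex link is any href containing "bibtex", and the example.org URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class BibtexLinkCollector(HTMLParser):
    """Collects absolute URLs of <a href> links that look like bibtex pages.

    ASSUMPTION: a link counts as a bibtex link if its href contains
    "bibtex"; the real DBLP markup may need a different rule.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and "bibtex" in href.lower():
            # Resolve relative links against the page the HTML came from.
            self.links.append(urljoin(self.base_url, href))


def bibtex_links(html, base_url):
    """Return the bibtex links found in one saved HTML page."""
    collector = BibtexLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def hosts(urls):
    """Return the set of servers the links point to."""
    return {urlparse(u).netloc for u in urls}
```

If `hosts` returns more than one server, the mirroring configuration has to permit each of them.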
2) Next I had to get the bibtex out of the HTML pages. I decided to write an XSLT script to do this, but ran into a problem: HTML is not necessarily well-formed XML, so an XSLT processor cannot read it directly. After a few false starts, I was able to download, compile and run the .NET Html Agility Pack to clean up the HTML. Then I concatenated all the files and ran this XSLT script over them:
<xsl:output encoding="ascii" method="text"/>
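The same extraction step can be sketched without XSLT. Python's `html.parser` tolerates unclosed tags, so in this version the cleanup pass is not needed. The sketch assumes each bibtex entry sits inside a `<pre>` element; that is a guess about the page layout, not something stated above.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collects the text content of every <pre> element.

    ASSUMPTION: bibtex entries appear inside <pre> elements.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.in_pre = False
        self.current = []   # text fragments of the <pre> being read
        self.blocks = []    # finished <pre> contents

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.in_pre = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "pre" and self.in_pre:
            self.in_pre = False
            self.blocks.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.in_pre:
            self.current.append(data)


def extract_bibtex(html):
    """Return the bibtex text blocks found in one HTML page."""
    parser = PreTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Because the parser works page by page, there is also no need to concatenate the files first.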
3) I then had a bunch of bibtex files, and I needed to extract the titles and authors, so I decided to convert the bibtex to XML. After a few false tries, I found bib2XML and converted my files to XML. However, I later found that bib2XML does not properly translate special characters (accented characters, superscripts, trademark symbols). Rather than fix the code, I fixed the files by hand using a few regular expression substitutions.
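The regular-expression fix-up can also be applied to the bibtex directly, before any XML conversion. The Python sketch below is a swapped-in technique, not what bib2XML does. It does three things:

- pulls the title and author fields out of one entry;
- turns the common LaTeX accent commands (such as \'e and \"u) into Unicode;
- drops the protective braces.

Only five accent commands are handled. Superscripts, trademark symbols and other macros are left alone, and the accent table is an assumption to extend as needed.

```python
import re
import unicodedata

# Combining marks for a few LaTeX accent commands (ASSUMPTION: extend as needed).
ACCENTS = {
    "'": "\u0301",  # acute:      \'e -> é
    "`": "\u0300",  # grave:      \`e -> è
    "^": "\u0302",  # circumflex: \^o -> ô
    '"': "\u0308",  # umlaut:     \"u -> ü
    "~": "\u0303",  # tilde:      \~n -> ñ
}

# Matches \'e, \'{e} and {\'e}; the conditional (?(1)\}) only consumes a
# closing brace when an opening one was consumed, so field braces survive.
ACCENT_RE = re.compile(r"""(\{)?\\(['`^"~])(?:\{([A-Za-z])\}|([A-Za-z]))(?(1)\})""")


def field(entry, name):
    """Return the raw text of a brace-delimited bibtex field, or None."""
    m = re.search(r"\b" + re.escape(name) + r"\s*=\s*\{", entry, re.IGNORECASE)
    if m is None:
        return None
    depth = 1
    for i in range(m.end(), len(entry)):
        if entry[i] == "{":
            depth += 1
        elif entry[i] == "}":
            depth -= 1
            if depth == 0:
                return entry[m.end():i]
    return None  # unbalanced braces


def delatex(text):
    """Turn accent commands into Unicode, drop braces, collapse whitespace."""
    def to_unicode(m):
        letter = m.group(3) or m.group(4)
        return unicodedata.normalize("NFC", letter + ACCENTS[m.group(2)])
    text = ACCENT_RE.sub(to_unicode, text)
    text = text.replace("{", "").replace("}", "")
    return " ".join(text.split())


def title_and_authors(entry):
    """Return (title, [author, ...]) for one bibtex entry."""
    title = field(entry, "title")
    authors = field(entry, "author")
    return (
        delatex(title) if title is not None else None,
        [delatex(a) for a in re.split(r"\s+and\s+", authors)] if authors else [],
    )
```

The conditional group is the part worth keeping if you adapt this. A naive pattern with an optional leading brace would also swallow the opening brace of a field such as `title = {\'Ecole ...}`.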
4) Then I wrote a .NET application to drive the Google web services API. This was fairly straightforward, except that the Google server for registering to use the API was down for a few days. Also, when I passed null for a parameter I wanted to leave at its default, the call failed without a useful error message. Passing empty strings instead worked.
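A small wrapper can guard against the null-versus-empty-string problem. The Python sketch below is not the original .NET code and does not use the real Google API. Three parts of it are assumptions:

- `search_fn` stands in for whatever proxy method the web-service client exposes;
- the option names in `STRING_OPTIONS` are hypothetical;
- the query format, which searches for each title as an exact phrase.

```python
from typing import Callable, Dict, Optional

# HYPOTHETICAL string options; the real service's parameter names may differ.
STRING_OPTIONS = ("restrict", "language", "input_encoding", "output_encoding")


def phrase_query(title: str) -> str:
    """Wrap a paper title in quotes so it is searched as an exact phrase."""
    # A double quote inside the title would end the phrase early, so drop it.
    return '"' + title.replace('"', "") + '"'


def safe_search(search_fn: Callable[..., int], query: str,
                options: Optional[Dict[str, Optional[str]]] = None) -> int:
    """Call search_fn, sending "" rather than None for unused string options.

    search_fn stands in for the generated web-service proxy method; it is
    assumed to take the query plus keyword options and return a hit count.
    """
    args = {name: "" for name in STRING_OPTIONS}
    for name, value in (options or {}).items():
        args[name] = "" if value is None else value
    return search_fn(query, **args)
```

To check the wrapper, pass a stub that records its arguments in place of the real client call. That is enough to confirm no `None` values reach the service.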
5) My program created a tab-delimited (TSV) file with one line for each author of each paper, which I then loaded into Excel for analysis. I used the data analysis wizard to count each author's publications and sum their hits. But now I want to do some more sophisticated analysis.
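The per-author counting done in Excel can be reproduced in a few lines, which makes further analysis easier to script. This Python sketch makes the following assumptions:

- each TSV row is author, title, hit count, in that order, with no header row (the real column layout is not stated);
- every co-author gets the paper's full hit count;
- authors are ranked by total hits.

```python
import csv
import io
from collections import defaultdict


def author_stats(tsv_text):
    """Return [(author, papers, total_hits, mean_hits), ...], most-cited first.

    ASSUMPTION: rows are author<TAB>title<TAB>hits with no header row.
    """
    titles = defaultdict(set)   # author -> titles already counted
    totals = defaultdict(int)   # author -> summed hit counts
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in rows:
        if len(row) < 3:
            continue            # skip blank or malformed lines
        author, title, hits = row[0].strip(), row[1], int(row[2])
        if title not in titles[author]:   # count each paper once per author
            titles[author].add(title)
            totals[author] += hits
    stats = [(a, len(titles[a]), totals[a], totals[a] / len(titles[a])) for a in titles]
    stats.sort(key=lambda s: (-s[2], s[0]))   # ties broken alphabetically
    return stats
```

Ranking by total hits favours prolific authors. The mean column gives a rough per-paper view instead.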
You might say this is overkill for the purpose I had in mind (or even that it is irrelevant). But I sometimes try to automate processes like this just to find out how hard it is. This one was pretty difficult.