Stephen's BibTeX tools

I've written a script, bibtex-from-paper, which attempts the impossible feat of getting good-quality BibTeX from a plain PDF or PostScript file. It does so by a combination of text scraping, Google Scholar and publisher web sites. I want to avoid scraped BibTeX, such as that found on Google Scholar, Citeseer and other sites, because it's often of poor quality. So my script uses Google Scholar only for full-text search, and then downloads BibTeX from publisher sites such as ACM Portal and IEEE Xplore. This BibTeX is usually generated manually and is therefore of much higher quality.

Please note that these scripts are not intended to aid or enable any sort of automated crawling of the various web sites (ACM Portal, IEEE Xplore, springerlink.com, CiteseerX and others) which they use. They are simply a tool for making it more convenient to locate and download the best available BibTeX for papers that you have legally and manually downloaded separately. I am not responsible for their misuse.

I'm making the script available here, but note that it only works for papers that are full-text indexed on Google Scholar, and even then it can be a bit flaky. It has several prerequisites:

The main script is here: bibtex-from-paper. It works by extracting a plausible English sentence from the paper, feeding that as a query into Google Scholar, and looking for results at any site for which a getbibtex script is available. Each getbibtex script simply loads the page at the target site, follows the BibTeX link therein, and does any conversion necessary to yield plain-text BibTeX as output.
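To make the sentence-extraction and site-matching steps a bit more concrete, here is a minimal Python sketch. It is not the actual script: the word-count thresholds, the "mostly alphabetic words" heuristic, the host names and the per-site helper names in HANDLERS are all assumptions made up for illustration. It deliberately does no network access; it only picks a query sentence from already-extracted text and decides which helper would handle a given list of result URLs.

```python
import re
from urllib.parse import urlparse

# Hypothetical table: publisher host -> name of a getbibtex-style helper.
# Both the hosts and the helper names are illustrative assumptions.
HANDLERS = {
    "portal.acm.org": "getbibtex-acm",
    "ieeexplore.ieee.org": "getbibtex-ieee",
    "www.springerlink.com": "getbibtex-springer",
}


def plausible_sentence(text, min_words=8, max_words=30):
    """Return the first sentence that looks like ordinary prose, or None.

    Assumed heuristic: a sentence qualifies if it has a moderate number
    of words and at least 90% of them are purely alphabetic.
    """
    # Collapse whitespace so line breaks from text extraction
    # do not split a sentence in two.
    flat = re.sub(r"\s+", " ", text).strip()
    for candidate in re.split(r"(?<=[.!?])\s+", flat):
        words = candidate.split()
        if not (min_words <= len(words) <= max_words):
            continue
        # Strip surrounding punctuation before testing each word.
        alpha = [w for w in words if w.strip(".,;:()'\"").isalpha()]
        if len(alpha) / len(words) >= 0.9:
            return candidate
    return None


def pick_handler(result_urls):
    """Return (helper_name, url) for the first URL on a known host, or None."""
    for url in result_urls:
        host = urlparse(url).hostname
        if host in HANDLERS:
            return HANDLERS[host], url
    return None
```

In use, the sentence returned by plausible_sentence would be pasted into a full-text search, and the URLs from the results page would be passed to pick_handler to see whether any of them belongs to a site with a helper. A table keyed on host name keeps adding a new site down to one extra line plus its helper script.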

If you have any success with these scripts, or make any improvements to them, I'd be interested to find out, so please do contact me.

Content updated at Tue 9 Dec 17:07:00 GMT 2008.