Getting Started

From Paper Machines Wiki

In order to run Paper Machines, you will need:

  • Zotero with PDF indexing tools installed (see the Search pane of Zotero's Preferences)
  • a corpus of documents with full text PDF/HTML and metadata
  • Java 6 or higher (download page)
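
If you are not sure which Java you have, the sketch below is one way to check the "Java 6 or higher" requirement from a script. It is not part of Paper Machines; the function names are made up for illustration, and it assumes that a `java` command is on your PATH and that it prints a line such as `java version "1.6.0_45"` when run with `-version`.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Older releases use the "1.x" scheme, so "1.6.0_45" means Java 6.
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major():
    """Run `java -version` and return the major version, or None if unavailable."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except OSError:
        # No usable `java` command was found.
        return None
    # The version banner is normally written to stderr, so look at both streams.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("Java was not found; install Java 6 or higher.")
    elif major >= 6:
        print("Java", major, "found; this meets the requirement.")
    else:
        print("Java", major, "found; Paper Machines needs Java 6 or higher.")
```

Running `java -version` by hand in a terminal tells you the same thing; the script is only handy if you are setting up several machines.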

The latest version of the software is available at


  • Paper Machines is not a standalone application. It runs inside Zotero, the database of citations and archival documents written by historians for historians to better collaborate, so you must install Zotero first. Download and install the standalone version of Zotero (i.e. not the plugin for Firefox or Chrome, although you can install those as well if you like).
  • You will now download Paper Machines and ask Zotero to accept it as an add-on.
    • First, download the Paper Machines file.
    • Start Zotero Standalone and choose "Tools" in the top menu, then "Add-ons". Now drag and drop the Paper Machines file (*.xpi) into the Add-ons box.
    • Zotero should ask whether you want to install it. Click "yes."

You need data

  • Paper Machines runs on full-text, OCR'd PDFs that are attached to fully annotated citations in Zotero with good metadata.
    • A full-text, OCR'd PDF is a document in which the computer can recognize the text. If you cannot highlight words in a PDF, it has not been OCR'd. You can find OCR software online. OCR'd PDFs are frequently available for download on services like JSTOR.
    • Good metadata means that your citations in Zotero have author, title, date of publication, and place of publication, and are coded as books, articles, or chapters.
    • Paper Machines can run on HTML "snapshot" attachments (saved web pages with the icon of a camera), but make sure that these contain the full text you expect -- they may not download properly with some databases. It also reads text notes and plain text documents, configurable through the context menu -> "Paper Machines Preferences..."
    • It is a good idea to familiarize yourself with Zotero before attempting to use Paper Machines. Many librarians have written helpful guides about getting started with Zotero.
      • You need to know how to add files, use good metadata, attach PDFs to files, and organize files into folders to use Paper Machines. Play around a little with Zotero to learn how it works: try adding some citations, creating subfolders, and attaching PDFs to some of them. If you need help, learn these techniques at another, Zotero-related site before attempting to use Paper Machines, so that you have data to get started with when Paper Machines is installed!
  • Paper Machines will not work unless you already have some data in Zotero.
    • Learning how to subscribe to other users' libraries is also handy because it is a good way of getting data.
    • You can also scrape data off of sources already out there, for instance JSTOR or the World Bank.
      • Zotero makes scraping these files super-easy where websites have already learned how to talk to Zotero. It creates a wee icon in the corner of your URL bar, which allows you to automatically scrape whatever you are looking at.
      • For that to work, you first need to install the Zotero "connector" for your Safari, Chrome, or Firefox browser.
    • Some users have found that Paper Machines produces empty results for smaller datasets. We suggest beginning with at least 20 files before you attempt a wordcloud or relational diagram, and more like 50 to 100 before you attempt Topic Modeling.
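
The data requirements above can be summed up as a quick pre-flight check. The sketch below is only an illustration and does not talk to Zotero: it assumes you have already listed your items as plain Python dictionaries, and the key names (`author`, `title`, `date`, `place`, `fulltext`) are hypothetical stand-ins rather than Zotero's own field names. The thresholds are the rough suggestions given above (20 files for a wordcloud or relational diagram, and the lower end of 50 to 100 for Topic Modeling).

```python
# Hypothetical key names for the metadata the text above calls "good metadata".
REQUIRED_FIELDS = ("author", "title", "date", "place")

# Rough minimum corpus sizes suggested above; these are not hard limits.
MIN_ITEMS_BASIC = 20        # wordcloud or relational diagram
MIN_ITEMS_TOPIC_MODEL = 50  # lower end of the "50 to 100" suggestion


def missing_fields(item):
    """Return the required metadata fields that are absent or blank in one item."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(item.get(field) or "").strip()
    ]


def corpus_report(items):
    """Summarize how many items have complete metadata and some extracted text."""
    usable = [
        item
        for item in items
        if not missing_fields(item) and str(item.get("fulltext") or "").strip()
    ]
    return {
        "total": len(items),
        "usable": len(usable),
        "word_cloud_ok": len(usable) >= MIN_ITEMS_BASIC,
        "topic_model_ok": len(usable) >= MIN_ITEMS_TOPIC_MODEL,
    }


if __name__ == "__main__":
    sample = [
        {
            "author": "A. Author",
            "title": "An Example Title",
            "date": "1900",
            "place": "London",
            "fulltext": "recognized text from the attached PDF",
        },
        {"author": "B. Author", "title": "No Date or Text"},
    ]
    print(corpus_report(sample))
    print(missing_fields(sample[1]))
```

An item counts as usable here only if every metadata field is filled in and it has some text, which mirrors the point that Paper Machines needs both good metadata and readable full text.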

Run Paper Machines for the First Time

  • Once Paper Machines is installed, right-click on any folder in your Zotero library and notice that there is an expanded menu.
    • Several of the new menu items will be grayed out, but the item at the top ("extract text for paper machines") will be dark. Choose it.
    • "Extracting text for Paper Machines" will run for a minute while the computer reads all of your texts. You have a bunch of texts in that folder already, don't you?
    • After "extracting texts" the first time, you must click on another folder to allow Paper Machines to refresh. Then return to the folder that you were working on.