Getting Started

In order to run Paper Machines, you will need:

  • Zotero with PDF indexing tools installed (see the Search pane of Zotero's Preferences)
  • a corpus of documents with full-text PDF or HTML attachments and metadata
  • Java 6 or higher (download page)
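
As a quick way to confirm the Java requirement above, the following Python sketch runs `java -version` and extracts the major version number. It assumes `java` is on your PATH and that the version banner follows the common `version "1.6.0_45"` or `version "17.0.2"` pattern. The function names are illustrative and are not part of Paper Machines or Zotero.

```python
import re
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Return the major Java version found in a `java -version` banner, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Older releases report themselves as 1.x (Java 6 is "1.6.0_45").
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major() -> Optional[int]:
    """Run `java -version` and parse its banner; None if Java cannot be run."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except OSError:
        return None
    # The banner is normally written to stderr, so both streams are checked.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("Java was not found or its version could not be read.")
    elif major >= 6:
        print(f"Java {major} found: meets the Java 6 requirement.")
    else:
        print(f"Java {major} found: Paper Machines needs Java 6 or higher.")
```

If the script reports that Java was not found, install it from the download page before continuing.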

The latest version of the software is available at http://papermachines.org/install.

Installation

  • Paper Machines is not a standalone application. It runs inside Zotero, a database of citations and archival documents written by historians for historians to better collaborate, so you must install Zotero first. Go to http://zotero.org and install the standalone version of Zotero (i.e., not the plugin for Firefox or Chrome, although you can install those as well if you like).
  • Next, download Paper Machines and ask Zotero to accept it as an add-on.
    • First, download the Paper Machines file (*.xpi) from http://papermachines.org/install.
    • Start Zotero Standalone and choose "Tools" from the top menu, then "Add-ons". Drag and drop the Paper Machines file (*.xpi) into the add-ons box.
    • Zotero should ask whether you want to install it. Click "yes."

You need data

  • Paper Machines runs on full-text, OCR'd PDFs that are attached to fully annotated citations in Zotero with good metadata.
    • Full-text OCR'd PDFs are documents in which the computer can recognize the text. If you cannot highlight words in a PDF, it has not been OCR'd. You can find OCR software online. OCR'd PDFs are frequently available for download on services like JSTOR.
    • Good metadata means that your citations in Zotero have author, title, date of publication, and place of publication, and are coded as books, articles, or chapters.
    • Paper Machines can run on HTML "snapshot" attachments (saved web pages with the icon of a camera), but make sure that these contain the full text you expect -- they may not download properly with some databases. It can also read text notes and plain-text documents; this is configurable through the context menu -> "Paper Machines Preferences..."
    • It is a good idea to familiarize yourself with Zotero before attempting to use Paper Machines. Many librarians have written helpful guides about getting started with Zotero.
      • To use Paper Machines, you need to know how to add files, use good metadata, attach PDFs to files, and organize files into folders. Play around a little with Zotero to learn how it works -- try adding some citations, creating subfolders, and attaching PDFs to some of them. If you need help, learn these techniques from a Zotero-related site before attempting to use Paper Machines, so that you have data to get started with when Paper Machines is installed!
  • Paper Machines will not work unless you already have some data in Zotero.
    • Learning how to subscribe to other users' libraries is also handy because it is a good way of getting data.
    • You can also scrape data from existing sources, for instance JSTOR or the World Bank.
      • Zotero makes scraping these files super-easy where websites have already learned how to talk to Zotero. It creates a wee icon in the corner of your URL bar, which allows you to automatically scrape whatever you are looking at.
      • For that to work, you first need to install the Zotero "connector" for your Safari, Chrome, or Firefox browser.
    • Some users have found that Paper Machines produces empty results for smaller datasets. We suggest beginning with at least 20 files before you attempt a word cloud or relational diagram, and more like 50 to 100 before you attempt topic modeling.
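
The data requirements above can be sketched as a small readiness check. This is a minimal illustration, not something Paper Machines provides: it assumes you have described your items as plain Python dictionaries with hypothetical keys (`author`, `title`, `date`, `place`, `type`, `has_full_text`), which are not Zotero's own field names. The thresholds come from the suggestions above, taking 50 as the lower bound for topic modeling.

```python
from typing import Dict, List

# Hypothetical keys for illustration; these are not Zotero's schema.
REQUIRED_FIELDS = ("author", "title", "date", "place")
ACCEPTED_TYPES = ("book", "article", "chapter")

# Suggested minimum corpus sizes from the guidance above.
MIN_ITEMS_WORD_CLOUD = 20
MIN_ITEMS_TOPIC_MODEL = 50


def missing_metadata(item: Dict[str, object]) -> List[str]:
    """List the metadata fields that are empty or absent for one item."""
    missing = [
        field for field in REQUIRED_FIELDS
        if not str(item.get(field) or "").strip()
    ]
    if item.get("type") not in ACCEPTED_TYPES:
        missing.append("type")
    return missing


def corpus_report(items: List[Dict[str, object]]) -> Dict[str, object]:
    """Count items with complete metadata and full text, then compare to the thresholds."""
    usable = sum(
        1 for item in items
        if not missing_metadata(item) and bool(item.get("has_full_text"))
    )
    return {
        "usable": usable,
        "enough_for_word_cloud": usable >= MIN_ITEMS_WORD_CLOUD,
        "enough_for_topic_modeling": usable >= MIN_ITEMS_TOPIC_MODEL,
    }


if __name__ == "__main__":
    sample = [
        {"author": "A. Author", "title": "A Title", "date": "1900",
         "place": "London", "type": "book", "has_full_text": True},
        {"title": "No author or date", "type": "article", "has_full_text": False},
    ]
    for item in sample:
        print(item.get("title"), "-> missing:", missing_metadata(item))
    print(corpus_report(sample))
```

Items that are missing fields or full text are simply not counted as usable, which mirrors the advice to fix metadata and attachments in Zotero before running any analysis.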

Run Paper Machines for the First Time

  • Once Paper Machines is installed, right-click on any folder in your Zotero library. You will see an expanded menu.
    • Several of the new menu items will be grayed out, but the item at the top, "Extract text for Paper Machines", will be dark. Choose it.
    • "Extracting text for Paper Machines" will run for a minute while the computer reads all of your texts. You have a bunch of texts in that folder already, don't you?
    • After extracting text for the first time, you must click on another folder to allow Paper Machines to refresh. Then return to the folder you were working on.