From Paper Machines Wiki
In order to run Paper Machines, you will need:
- Zotero with PDF indexing tools installed (see the Search pane of Zotero's Preferences)
- a corpus of documents with full text PDF/HTML and metadata
- Java 6 or higher (download page)
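If you are not sure which Java version is installed, the check can be scripted. The following is a minimal Python sketch, not part of Paper Machines; the function names `parse_java_major` and `installed_java_major` are made up for this illustration. It assumes that `java` is on your PATH and that its version banner has the usual `version "..."` form.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Releases before Java 9 report themselves as 1.x (Java 6 is "1.6.0_45").
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major():
    """Run `java -version` and parse the result; None if Java cannot be run."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except OSError:
        return None
    # The version banner is normally written to stderr rather than stdout.
    return parse_java_major(result.stderr or result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("Java was not found; install Java 6 or higher.")
    elif major < 6:
        print("Java %d is too old; install Java 6 or higher." % major)
    else:
        print("Java %d should be recent enough." % major)
```

Running `java -version` by hand in a terminal gives the same information; the script only automates reading the banner.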
The latest version of the software is available at http://papermachines.org/install.
- Paper Machines is not a standalone application. It runs in Zotero -- a database of citations and archival documents, written by historians for historians to help them collaborate -- so you must install Zotero first. Go to http://zotero.org and install the standalone version of Zotero (i.e., not the plugin for Firefox or Chrome, although you can install those as well if you like).
- You will now download Paper Machines and ask Zotero to accept it as an add-on.
- First, download the Paper Machines file.
- Start Zotero Standalone and choose "Tools" from the top menu, then "Add-ons". Now drag and drop the Paper Machines file (*.xpi) into the Add-ons box.
- Zotero should ask whether you want to install it. Click "yes."
You need data
- Paper Machines runs on full-text, OCR'd PDFs that are attached to fully annotated citations in Zotero with good metadata.
- A full-text, OCR'd PDF is a document in which the computer can recognize the text. If you cannot highlight words in a PDF, it has not been OCR'd. You can find OCR software online, and OCR'd PDFs are frequently available for download from services like JSTOR.
- Good metadata means that your citations in Zotero have author, title, date of publication, and place of publication, and are coded as books, articles, or chapters.
- Paper Machines can run on HTML "snapshot" attachments (saved web pages with the icon of a camera), but make sure that these contain the full text you expect -- they may not download properly with some databases. It also reads text notes and plain text documents, configurable through the context menu -> "Paper Machines Preferences..."
- It is a good idea to familiarize yourself with Zotero before attempting to use Paper Machines. Many librarians have written helpful guides about getting started with Zotero.
- To use Paper Machines, you need to know how to add files, enter good metadata, attach PDFs to files, and organize files into folders. Play around a little with Zotero to learn how it works -- try adding some citations, creating subfolders, and attaching PDFs to some of them. If you need help, learn these techniques at another, Zotero-related site before attempting to use Paper Machines, so that you have data to get started with when Paper Machines is installed!
- Paper Machines will not work unless you already have some data in Zotero.
- Learning how to subscribe to other users' libraries is also handy because it is a good way of getting data.
- You can also scrape data from existing sources, for instance JSTOR or the World Bank.
- Zotero makes scraping these files super-easy on websites that have already learned how to talk to Zotero. It creates a wee icon in the corner of your URL bar, which lets you automatically scrape whatever you are looking at.
- For that to work, you first need to install the Zotero "connector" for your Safari, Chrome, or Firefox browser.
- Some users have found that Paper Machines produces empty results for smaller datasets. We suggest beginning with at least 20 files before you attempt a word cloud or relational diagram, and more like 50 to 100 before you attempt topic modeling.
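The data requirements above can be summed up as a small checklist. The following Python sketch is only an illustration, not something Paper Machines itself runs. It assumes each citation is represented as a plain dictionary. The keys `title`, `date`, `place`, `creators`, and `itemType` follow Zotero's item field names. The `has_full_text` flag, the choice of item types, and the function names are assumptions made for this example. The thresholds are the suggested minimums from the list above, not hard limits.

```python
# Item types assumed to correspond to "books, articles, or chapters".
SUPPORTED_TYPES = {"book", "journalArticle", "bookSection"}
REQUIRED_FIELDS = ("title", "date", "place")

MIN_ITEMS_BASIC = 20        # suggested minimum for a word cloud or relational diagram
MIN_ITEMS_TOPIC_MODEL = 50  # lower end of the suggested 50 to 100 for topic modeling


def is_usable(item):
    """True if an item dict has the metadata and full text described above."""
    has_fields = all(item.get(field) for field in REQUIRED_FIELDS)
    has_author = bool(item.get("creators"))
    has_type = item.get("itemType") in SUPPORTED_TYPES
    # Hypothetical flag: True when an OCR'd PDF, snapshot, or text note is attached.
    has_text = bool(item.get("has_full_text"))
    return has_fields and has_author and has_type and has_text


def corpus_report(items):
    """Count usable items and compare the count with the suggested minimums."""
    usable = sum(1 for item in items if is_usable(item))
    return {
        "usable_items": usable,
        "word_cloud_ready": usable >= MIN_ITEMS_BASIC,
        "topic_model_ready": usable >= MIN_ITEMS_TOPIC_MODEL,
    }


if __name__ == "__main__":
    example = {
        "title": "An Example Title",
        "date": "1900",
        "place": "London",
        "creators": [{"lastName": "Example"}],
        "itemType": "book",
        "has_full_text": True,
    }
    print(corpus_report([example] * 25))
```

A folder that falls short is not necessarily unusable; the report only flags when empty results become more likely.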
Run Paper Machines for the First Time
- Once Paper Machines is installed, right-click on any folder in your Zotero library and notice that the menu has expanded.
- Several of the new menu items will be grayed out, but the item at the top -- "Extract Text for Paper Machines" -- will be dark. Choose it.
- "Extracting text for Paper Machines" will run for a minute while the computer reads all of your texts. You have a bunch of texts in that folder already, don't you?
- After "extracting texts" the first time, you must click on another folder to allow Paper Machines to refresh. Then return to the folder that you were working on.