Getting Lots of Data

From Paper Machines Wiki
Revision as of 09:35, 12 February 2015 by Joguldi (Talk | contribs)


To ask larger questions about historiography in the humanities and social sciences, you need to run Paper Machines on many articles at once, and good results typically require a very large data set. There are several ways to get data in bulk.

1) Get a large body of data from another source, for example by joining a large collaborative Zotero group library with a private cache of PDFs.

2) Manually download many articles from a site like JSTOR, or use wget to export large batches of PDFs from a site like

3) Use JSTOR Data for Research (DfR). DfR is a back end to JSTOR that lets you download word counts for hundreds or thousands of articles at a time. Paper Machines is built to work with JSTOR, the nonprofit article repository that covers much of the scholarship published in the humanities and social sciences over the twentieth century.
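
Once a batch of per-article word counts has been downloaded, it can help to check how much material you actually have before importing it. The following is a minimal Python sketch, not part of Paper Machines, that sums word counts across a folder of files. It assumes each article is a two-column CSV file (word, then count) with one header row; that layout, and the function names load_word_counts and merge_word_counts, are assumptions made for illustration, so adjust them to match the files you receive.

```python
import csv
from collections import Counter
from pathlib import Path


def load_word_counts(csv_path):
    """Read one per-article CSV of (word, count) rows into a Counter.

    Assumed layout: a header row, then one "word,count" pair per line.
    """
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to be present)
        for row in reader:
            if len(row) < 2:
                continue  # ignore blank or malformed rows
            word, count = row[0].strip(), row[1].strip()
            if word and count.isdigit():
                counts[word] += int(count)
    return counts


def merge_word_counts(folder):
    """Sum the word counts of every CSV file found directly in `folder`."""
    total = Counter()
    for path in sorted(Path(folder).iterdir()):
        # Accept both ".csv" and ".CSV" file extensions.
        if path.is_file() and path.suffix.lower() == ".csv":
            total.update(load_word_counts(path))
    return total
```

Calling merge_word_counts("wordcounts") on a folder of such files (the folder name here is hypothetical) returns a Counter, and its most_common(20) method lists the twenty most frequent words across the whole batch.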