Getting Lots of Data
To ask larger questions about historiography in the humanities and social sciences, you need to run Paper Machines on many articles at once, and good results typically require a very large data set. There are several ways of getting data in bulk:
1) Get a lot of data from another source, for example by joining a large collaborative Zotero group library with a private cache of PDFs.
2) Download many articles manually from a site like JSTOR, or use a wget script to export large batches of PDFs from a site like archive.org.
3) Use JSTOR Data for Research (DfR). DfR is a back end to JSTOR that lets you download word counts for hundreds or thousands of articles at a time. Paper Machines is built to work with JSTOR, the nonprofit article repository that covers much of the twentieth-century publication in the humanities and social sciences.
- For more detailed instructions, see our page on Working with JSTOR Data for Research.
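For readers who would rather script the batch download in option 2 than use wget, the idea can be sketched in Python. This is only an illustrative sketch and not part of Paper Machines: the function names `target_path` and `download_all`, the `delay` parameter, and the output folder are all assumptions made for the example, and it presumes you already have a list of direct PDF links. It uses only the Python standard library.

```python
import time
from pathlib import Path
from urllib.parse import urlparse, unquote
from urllib.request import urlretrieve


def target_path(url, dest_dir):
    """Map a URL to a local path in dest_dir, named after the URL's last path segment."""
    name = Path(unquote(urlparse(url).path)).name
    if not name:
        raise ValueError(f"cannot derive a file name from {url!r}")
    return Path(dest_dir) / name


def download_all(urls, dest_dir, fetch=urlretrieve, delay=1.0):
    """Download each URL into dest_dir, skipping files that already exist.

    `fetch` is any callable taking (url, filename); it defaults to
    urllib.request.urlretrieve. Returns the paths that were newly fetched.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    fetched = []
    for url in urls:
        path = target_path(url, dest)
        if path.exists():
            continue  # already downloaded on an earlier run
        fetch(url, str(path))
        fetched.append(path)
        time.sleep(delay)  # pause between requests so the server is not flooded
    return fetched
```

Calling `download_all(urls, "pdfs")` with a list of PDF links would save each file into a `pdfs` folder, which can then be added to a Zotero collection. Because existing files are skipped, an interrupted batch can be restarted without downloading everything again.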