Welcome to the Paper Machines wiki.
This is a space for user-contributed content related to Paper Machines.
- Getting Started -- Installing Paper Machines, Getting Data, Running Paper Machines for the First Time
- Basic Troubleshooting -- Error messages and common problems
Joining In, Helping Out
Paper Machines is open-source software cobbled together with minimal research funds. Both the software and its documentation depend on People Like You!
- Edit these pages! Please consider contributing your best practices, successes, and error messages to these pages. If content is hard to understand, please help us clarify it. Editing the wiki requires a login.
- Share your code! If you are a computer scientist or a programmer who has developed tools like MALLET for analyzing large bodies of text, please consider making a connector between Zotero and your tool, posting it on GitHub, and letting us know so that it can be added to the Paper Machines suite of tools. Go ahead and fork this wiki page and tell us about it.
- Are you doing a visualization or text-mining project for CS that you wish other people would use with their own data? Connecting it to Paper Machines is a great way of plugging your code into a community of users.
- We also have a wish list of tools we hope someone will build for us, along with tools currently in progress. These include:
  - using the tf-idf engine to automatically generate (separate, dynamic) tags for Zotero library documents
  - a "chronoparser" that finds and graphs mentions of eras (The Great Depression), date ranges (1846-1848), and events over a range of writing -- what is the memoryshed of a collection of scholarly documents? (in progress, Zach Davis)
  - tools to extract the most commonly occurring adjective-noun or noun-verb pairs
  - tools to graph the prominence and change of particular key terms in economics, using Linked Open Data to identify keywords from this discipline
  - sentiment analysis (what are the most positive and negative discussions in a corpus?), including a sentiment analysis overlay onto a timeline and a map
  - tools to automatically cluster topics into master topics
  - tools to correct the OCR of documents, including finding words that have been run together without spaces, word fragments, and obvious (f/s) misreadings, and to provide an intelligent auto-correct suggestion or set of recommended corrections
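
If you would like to try the first wish-list item (tf-idf tags), here is a minimal sketch of the idea in plain Python. It is not the Paper Machines tf-idf engine and it does not talk to Zotero; the function names `tokenize` and `tfidf_tags`, the title-to-text dictionary, and the simple tf-idf weighting are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical helper: lowercase the text and keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_tags(documents, top_n=3):
    """Suggest tags per document from a dict mapping title -> plain text.

    Assumption: the texts have already been extracted from the library
    by some other means; this sketch only does the scoring.
    """
    tokens = {title: tokenize(text) for title, text in documents.items()}

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    n_docs = len(documents)
    tags = {}
    for title, words in tokens.items():
        counts = Counter(words)
        # tf-idf = (relative frequency in this document) * log(N / document frequency)
        scores = {
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties broken alphabetically so results are stable.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        tags[title] = ranked[:top_n]
    return tags


if __name__ == "__main__":
    library = {
        "Railways": "railway railway steam engine the",
        "Mills": "cotton cotton mill the",
        "Banking": "the bank bank credit",
    }
    for title, suggested in tfidf_tags(library, top_n=2).items():
        print(title, suggested)
```

Words that appear in every document (like "the" in the toy library) score zero, so they only surface as tags when a document has nothing more distinctive. A real connector would also need stop-word handling and a way to write the tags back to the library, neither of which is attempted here.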