A Crowdsourced Corpus of AAC-like Communications
We used Amazon Mechanical Turk to create a large set of fictional communications resembling those made with augmentative and alternative communication (AAC) devices.
Workers were asked to invent communications as if they were using a scanning-style AAC interface.
Our AAC corpus contains approximately six thousand communications.
We found our crowdsourced collection modeled conversational AAC better than datasets based on telephone conversations or newswire text.
We leveraged our crowdsourced messages to intelligently select sentences from much larger sets of Twitter, blog and Usenet data.
For details, see our paper.
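This page does not describe how sentences were selected from the larger data sets; the paper is the authoritative source. As an illustration only, one common technique for this kind of in-domain selection is cross-entropy difference scoring: a sentence is kept when a model trained on the in-domain text finds it less surprising than a model trained on the background text. The sketch below assumes that technique with toy add-one smoothed unigram models. The function names, the threshold, and the example sentences are all hypothetical and are not taken from the corpus or the paper.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return a function mapping a word to its add-one smoothed log2 probability."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen words

    def logprob(word):
        return math.log2((counts[word] + 1) / (total + vocab))

    return logprob


def cross_entropy(logprob, sentence):
    """Average negative log2 probability per word of a sentence."""
    words = sentence.split()
    return -sum(logprob(w) for w in words) / max(len(words), 1)


def select(in_domain, background, threshold=0.0):
    """Keep background sentences that look more in-domain than background.

    Simplification: the background model is trained on the full background
    set, which is an assumption of this sketch.
    """
    lp_in = train_unigram(in_domain)
    lp_bg = train_unigram(background)
    kept = []
    for s in background:
        diff = cross_entropy(lp_in, s) - cross_entropy(lp_bg, s)
        if diff < threshold:
            kept.append(s)
    return kept


# Hypothetical toy data, standing in for crowdsourced messages and web text.
in_domain = ["i am hungry", "i need help please", "i am tired"]
background = ["i am so hungry", "stocks fell sharply on monday", "please help me"]
kept = select(in_domain, background)
print(kept)
```

A real system would use higher-order n-gram models and tune the threshold on held-out data; the unigram models here only keep the example short.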
Below you can download our corpus of communications, some of the test sets we used, and some of our trained language models.
Language models are in ARPA text format.
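Because ARPA is a plain-text format, the models can be inspected without special tooling. The following is a minimal sketch of a reader, assuming the usual layout: a `\data\` header with `ngram N=count` lines, one `\N-grams:` section per order, and a closing `\end\`. The function name `read_arpa` and the tiny sample model are hypothetical and do not come from the downloads. For real use, an established language modeling toolkit is a better choice.

```python
import io
import re


def read_arpa(stream):
    """Parse an ARPA-format language model from a text stream.

    Returns (counts, ngrams): counts maps n-gram order to the count declared
    in the header; ngrams maps a tuple of words to a pair of
    (log10 probability, backoff weight or None).
    """
    counts = {}
    ngrams = {}
    order = 0        # 0 until the first \N-grams: section is reached
    in_header = False
    for raw in stream:
        line = raw.strip()
        if not line:
            continue
        if line == "\\data\\":
            in_header = True
            continue
        if line == "\\end\\":
            break
        section = re.fullmatch(r"\\(\d+)-grams:", line)
        if section:
            order = int(section.group(1))
            in_header = False
            continue
        if in_header:
            header = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
            if header:
                counts[int(header.group(1))] = int(header.group(2))
            continue
        if order > 0:
            fields = line.split()
            logprob = float(fields[0])
            words = tuple(fields[1:1 + order])
            # The backoff weight is optional and absent for the highest order.
            backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
            ngrams[words] = (logprob, backoff)
    return counts, ngrams


# Hypothetical two-order model used only to exercise the reader.
SAMPLE = """\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-1.0 <s> -0.5
-0.7 hello -0.3
-0.9 </s>

\\2-grams:
-0.3 <s> hello
-0.4 hello </s>

\\end\\
"""

counts, ngrams = read_arpa(io.StringIO(SAMPLE))
print(counts)
print(ngrams[("hello",)])
```

To read one of the downloaded models, pass an open file handle instead of the `io.StringIO` wrapper.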
If you use this resource in your research, please reference:
Keith Vertanen and Per Ola Kristensson.
The Imagination of Crowds: Conversational AAC Language Modeling using Crowdsourcing and Large Data Sources.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
ACL: 700-711, 2011.
We thank Keith Trnka for allowing us to provide the Switchboard test set.
We thank Horabail Venkatagiri for allowing us to provide the communication test set.
Our specialists test set was created from the phrases suggested by AAC professionals on these pages at the University of Nebraska-Lincoln:
page 1, page 2, page 3, page 4.
The corpus download contains the training, development and test sets of the communication corpus, along with the word lists we used to build our models. It also contains several of the test sets we used for evaluation.