Sunday, June 5, 2016

Voyant Tools Ratios and Language Density

By Kenton Rambsy

I use Voyant’s “ratio” feature to gauge the linguistic variety among different writers. The ratio is a measure of word density. It is derived from the number of unique words, or word types, in relation to the total number of words in a document (word types divided by total number of words).
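To make the arithmetic concrete, here is a minimal sketch of a types-to-total-words ratio in Python. It assumes a simple lowercase tokenizer that keeps only letters and apostrophes; Voyant's own tokenizer may split words differently, so the numbers will not necessarily match what Voyant reports for the same story.

```python
import re

def tokenize(text: str) -> list[str]:
    # Assumption: a simple tokenizer (lowercase letters and apostrophes only).
    # Voyant's real tokenizer may treat punctuation and contractions differently.
    return re.findall(r"[a-z']+", text.lower())

def vocabulary_ratio(text: str) -> float:
    """Word types (distinct words) divided by total words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

sample = "The dog saw the cat and the cat saw the dog."
print(round(vocabulary_ratio(sample), 3))  # 5 types / 11 words -> 0.455
```

A higher ratio means a writer repeats words less often within the text; because longer texts naturally reuse more words, ratios are easiest to compare across stories of similar length.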

Below is a table that contains the total number of words, word types, and ratios for 10 short stories by Zora Neale Hurston and Richard Wright.





Content (unique) words are parts of speech such as nouns, verbs, adjectives, and adverbs that shape a text by describing specific objects and actions, as well as the details of objects and the characteristics of actions. Each different word is counted as one type, however many times it is used. Hence, the “word types” figure is the total number of different words in a given document or corpus, with each word counted only once.
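The sketch below counts word types in Python using `collections.Counter`. A word type is counted once no matter how often it occurs, which is different from counting only the words that occur exactly once (sometimes called hapax legomena); the sketch reports both so the difference is visible. The tokenizer is an assumption (lowercase letters and apostrophes only), not Voyant's.

```python
import re
from collections import Counter

def type_and_once_counts(text: str) -> tuple[int, int]:
    # Assumption: a simple lowercase tokenizer, not Voyant's own.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    word_types = len(counts)  # every distinct word, counted once
    used_once = sum(1 for c in counts.values() if c == 1)  # words occurring exactly once
    return word_types, used_once

print(type_and_once_counts("The dog saw the cat and the cat saw the dog."))
# (5, 1): five distinct words, but only "and" appears a single time
```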

Voyant gives users the option to apply stop word filters, which help to distinguish common words from more distinctive ones. “Stop words” are common words such as “to,” “that,” “this,” “I,” “you,” “get,” “and,” and “is” that may be filtered out of searches for key words in a text.
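As a rough illustration of what a stop word filter does, here is a Python sketch. The stop list contains only the example words named in the paragraph above and is a hypothetical stand-in; Voyant ships its own, much longer, language-specific lists, and its tokenizer may differ from the simple one assumed here.

```python
import re

# Hypothetical stop list: only the example words from the paragraph above.
STOP_WORDS = {"to", "that", "this", "i", "you", "get", "and", "is"}

def tokenize(text: str) -> list[str]:
    # Assumption: lowercase letters and apostrophes only.
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens: list[str], stop_words: set[str] = STOP_WORDS) -> list[str]:
    # Keep only the tokens that are not on the stop list.
    return [t for t in tokens if t not in stop_words]

tokens = tokenize("I told you that this is the way to get home and rest.")
print(remove_stop_words(tokens))
# ['told', 'the', 'way', 'home', 'rest']
```

Eight of the thirteen words drop out, which shows how much of an ordinary sentence a stop list can remove, and why filtering changes any count built on top of it.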




When using Voyant Tools to mine African American short fiction, however, function words/stop words are of great importance. Function words are grammatical in nature and serve as auxiliary words that do not necessarily communicate specific characteristics of a given text.
