BDP: Big Data Poetry

Author: jhave

B/IOs

Biographies of poets. Generated with code. 2,513 bios of poets scraped from PoetryFoundation.org were batch-analyzed by Alchemy API (an online text-mining engine) for entities (employment roles, organizations, people, locations, etc.), concepts, keywords, and relations (subject, action, object). This analysis then guided word replacement and generation of new bios using NLTK (Natural Language Toolkit) part-of-speech tagging. Approx. 2000+ BIOs generated…

July 20, 2014
Prosody: using the CMUdict in NLTK

OK. Parsing. Prosody. Metre. Rhythm. It seems prehistoric in the age of free verse. But if poems are rhythm with/or/without rhyme, then parsing into metrical feet seems one precondition on the path to accurately generating poems. Unfortunately, as far as I could tell, few folks have done it. A Google search returned a few academic papers…
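
A minimal sketch of the first step toward metrical parsing: reading stress patterns off CMUdict-style pronunciations. The tiny `TOY_PRONUNCIATIONS` table is a hand-copied stand-in (an assumption, not the real dictionary), and the function names are hypothetical. It is not the method used in the post.

```python
import re

# Hand-copied, CMUdict-style entries (ARPAbet phonemes). Vowels carry a
# stress digit: 1 = primary, 2 = secondary, 0 = unstressed.
# Illustrative stand-in only; the real dictionary is far larger.
TOY_PRONUNCIATIONS = {
    "poetry": ["P", "OW1", "AH0", "T", "R", "IY0"],
    "rhythm": ["R", "IH1", "DH", "AH0", "M"],
    "again": ["AH0", "G", "EH1", "N"],
    "the": ["DH", "AH0"],
}


def stress_pattern(phonemes):
    """Return the stress digits of one pronunciation as a string."""
    return "".join(p[-1] for p in phonemes if p[-1].isdigit())


def line_stresses(line, pronunciations):
    """Pair each word of a line with its stress pattern (None if unknown)."""
    words = re.findall(r"[a-z']+", line.lower())
    return [
        (w, stress_pattern(pronunciations[w]) if w in pronunciations else None)
        for w in words
    ]


if __name__ == "__main__":
    for word, pattern in line_stresses("The rhythm again", TOY_PRONUNCIATIONS):
        print(word, pattern)
```

If NLTK and its cmudict corpus are installed, `nltk.corpus.cmudict.dict()` should be able to replace the toy table. It stores a list of pronunciations per word, so one pronunciation per word would need to be picked before passing it to `line_stresses`.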

July 13, 2014
Synset to Lyrics (how to write over 5000 poems in an hour or so)

Download a website or torrent of song lyrics. Parse the result (with BeautifulSoup). Use NLTK (Natural Language Toolkit) to look up WordNet synsets (hyponyms, holonyms, hypernyms, antonyms, meronyms, synonyms and lemmas) as replacement candidates. Replace all the words in the poem (retaining the words-per-line and verse and grammar of prepositions and pronouns). Sift through ‘the reservoir’…
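
The replacement step in this recipe can be sketched roughly as follows. This is an illustration under stated assumptions, not the post's actual code: `TOY_CANDIDATES` is a hand-made stand-in for a WordNet lookup, and `KEEP` and `replace_words` are hypothetical names.

```python
import random
import re

# Stand-in for a WordNet lookup: word -> single-word replacement candidates.
# Hand-made for illustration; a real run would build this from synsets.
TOY_CANDIDATES = {
    "love": ["passion", "devotion"],
    "heart": ["core", "bosom"],
    "night": ["dark", "evening"],
}

# Function words (prepositions, pronouns, articles) that are left untouched
# so the grammar of each line survives. An assumed, incomplete list.
KEEP = {"the", "a", "an", "in", "of", "on", "to", "and",
        "i", "you", "my", "your", "me", "we", "it"}


def replace_words(lyric, candidates, rng):
    """Swap content words for candidates, keeping line and verse layout."""
    out_lines = []
    for line in lyric.splitlines():
        # Split into alternating word and non-word runs so spacing and
        # punctuation are copied through unchanged.
        tokens = re.findall(r"[A-Za-z']+|[^A-Za-z']+", line)
        new_tokens = []
        for tok in tokens:
            key = tok.lower()
            if key in KEEP or key not in candidates:
                new_tokens.append(tok)
            else:
                new_tokens.append(rng.choice(candidates[key]))
        out_lines.append("".join(new_tokens))
    return "\n".join(out_lines)


if __name__ == "__main__":
    lyric = "My love in the night\nheart of the night"
    print(replace_words(lyric, TOY_CANDIDATES, random.Random(0)))
```

Because every candidate is a single word, the words-per-line count and the verse breaks are preserved. With NLTK's WordNet corpus installed, the candidate table could instead be filled from `nltk.corpus.wordnet.synsets(word)` and the lemmas of related synsets.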

July 13, 2014
Even the OpenLibrary is Locked

In my amateur quest to retrieve some archive that might approach moderately large data for a poetry analysis project, I imagined OpenLibrary.org might offer an opportunity to download some poetry that is in the open domain. My first encounter was not encouraging. Thousands of the books listed there under poetry are distributed…

July 3, 2014
A few rudimentary visuals of Poetry Foundation corpus (preliminary buggy results)

Word counting is the ‘Hello World’ of big data. And my data is relatively tiny. Below are 25 images in 5 increasingly small time scales for 5 different variables (line length, verse length, avg word length, # of words per poem, # of verses per poem) derived from an analysis of a corpus of 10.5k poems scraped from poetryfoundation.org.
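
For a single poem, the five variables could be computed along these lines. This is a sketch only: it assumes verses are separated by blank lines and measures line length in words and verse length in lines, none of which the post specifies. The name `poem_stats` is hypothetical.

```python
import re


def poem_stats(poem):
    """Compute five simple per-poem measures from plain text.

    Assumptions: verses are separated by blank lines; line length is
    counted in words; verse length is counted in lines.
    """
    verses = [v for v in re.split(r"\n\s*\n", poem.strip()) if v.strip()]
    lines = [ln for v in verses for ln in v.splitlines() if ln.strip()]
    words = [w for ln in lines for w in ln.split()]
    if not words:
        raise ValueError("poem contains no words")
    return {
        "avg_line_length": len(words) / len(lines),
        "avg_verse_length": len(lines) / len(verses),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "words_per_poem": len(words),
        "verses_per_poem": len(verses),
    }


if __name__ == "__main__":
    sample = "one two three\nfour five\n\nsix seven eight nine\n"
    for name, value in poem_stats(sample).items():
        print(name, value)
```

Punctuation attached to a word counts toward its length here; a stricter tokenizer would change the averages slightly.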

July 2, 2014
Review: Socher et al. Recursive Deep Models …

Review: Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Chris Manning, Andrew Ng and Chris Potts. Conference on Empirical Methods in Natural Language Processing (EMNLP 2013, Oral). PDF; website with live demo and downloads. Objective/Abstract: Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank by Socher et…

May 28, 2014
On Numeration (Khan meet Stiegler)

I’ve been spending some hours this weekend reviewing math at the amazing Khan Academy. The following reflection is meant as a contemplation of a trend and not in any way a critique of their valuable work. Consider the screenshot below, where the value of 66° assigned to IAK does not accurately reflect its value. Both…

April 7, 2014
