BDP – Page 7 – Big-Data Poetry

July 20, 2014 (updated January 9, 2015)

B/IOs

Biographies of poets. Generated with code.

2,513 bios of poets scraped from PoetryFoundation.org were batch-analyzed by Alchemy API (an online text-mining engine) for entities (employment roles, organizations, people, locations, etc.), concepts, keywords, and relations (subject, action, object).

This analysis then guided word replacement and the generation of new bios using NLTK (Natural Language Toolkit) part-of-speech tagging.

Approx. 2000+ BIOs generated in each run:

Code on GitHub
Made by Glia.ca

Text analysis by Alchemy API

July 13, 2014 (updated January 9, 2015)

Prosody: using the CMUdict in NLTK

OK. Parsing. Prosody. Metre. Rhythm. It seems prehistoric in the age of free verse. But if poems are rhythm with/or/without rhyme, then parsing into metrical feet seems one precondition on the path to accurately generating poems. Unfortunately, as far as I could tell, few folks have done it. A Google search returned a few academic papers and no code. There was one Stack Overflow question. So I wrote an email to Charles Hartman, who had written Virtual Muse, and who kindly replied: “I’ve been away from programming for quite a while. But by the end of this year Wiley-Blackwell will be publishing my textbook Verse: An Introduction to Prosody…” So I did it myself.

INPUT WORDS >>> OUTPUT NUMBERS: An Example

If by real you mean as real as a shark tooth stuck

‘1 1 1 1 1 1 1 1 0 1 1 1’

in your heel, the wetness of a finished lollipop stick,

‘0 1 1 *,* 0 1 0 1 0 1 0 1 0 2 1 *,*’

Aimee Nezhukumatathil, Are All the Break-Ups in Your Poems Real? http://www.poetryfoundation.org/poem/245516

## parseStressOfLine(line) 
# function that takes a line
# parses it for stress
# corrects the cmudict bias toward 1
# and returns two strings 
#
# 'stress' in form '0101*,*110110'
#   --Note: 'stress' also returns words not in cmudict '0101*,*1*zeon*10110'
# 'stress_no_punct' in form '0101110110'
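
For readers who want to try the idea, here is a minimal sketch of a stress parse in the spirit of the comment block above. It is not the code from the post: the function name `stress_of_line`, the tokenizer, and the tiny `SAMPLE_PRONUNCIATIONS` table are my own assumptions, the table is hand-typed in CMUdict's format so the example runs without downloading anything, and the correction for the cmudict bias toward 1 is not reproduced. It simply takes the first listed pronunciation of each word.

```python
import re

# Hand-typed sample entries in CMUdict's format (illustrative assumption):
# word -> list of pronunciations, each a list of ARPAbet phonemes whose
# vowels end in a stress digit (0 = none, 1 = primary, 2 = secondary).
# With NLTK and its cmudict corpus installed, the full mapping would be
# nltk.corpus.cmudict.dict().
SAMPLE_PRONUNCIATIONS = {
    "the": [["DH", "AH0"]],
    "wetness": [["W", "EH1", "T", "N", "AH0", "S"]],
    "of": [["AH1", "V"]],
    "a": [["AH0"]],
    "finished": [["F", "IH1", "N", "IH0", "SH", "T"]],
    "lollipop": [["L", "AA1", "L", "IY0", "P", "AA2", "P"]],
    "stick": [["S", "T", "IH1", "K"]],
}

# A word is a run of letters/apostrophes; anything else non-blank is punctuation.
TOKEN = re.compile(r"[A-Za-z']+|[^\sA-Za-z']")


def stress_of_line(line, pronunciations):
    """Return (stress, stress_no_punct) for one line of verse.

    'stress' keeps punctuation and unknown words wrapped in asterisks;
    'stress_no_punct' holds the stress digits only.
    """
    stress = []
    stress_no_punct = []
    for token in TOKEN.findall(line):
        entries = pronunciations.get(token.lower())
        if entries:
            # Use the first pronunciation; keep only the vowel stress digits.
            digits = "".join(p[-1] for p in entries[0] if p[-1].isdigit())
            stress.append(digits)
            stress_no_punct.append(digits)
        else:
            # Punctuation and out-of-dictionary words are passed through marked.
            stress.append("*" + token + "*")
    return "".join(stress), "".join(stress_no_punct)


result = stress_of_line("the wetness of a finished lollipop stick,",
                        SAMPLE_PRONUNCIATIONS)
print(result)  # ('01010101021*,*', '01010101021')
```

On the second half of the example line, this reproduces the digits shown in the example above. Words missing from the table come back marked, as in `*zeon*`.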


July 13, 2014 (updated January 9, 2015)

Synset to Lyrics (how to write over 5,000 poems in an hour or so)

1. Download a website or torrent of song lyrics.
2. Parse the result (with BeautifulSoup).
3. Use NLTK (Natural Language Toolkit) to look up WordNet synsets (hyponyms, holonyms, hypernyms, antonyms, meronyms, synonyms and lemmas) as replacement candidates.
4. Replace all the words in the poem (retaining the words-per-line, the verse structure, and the grammar of prepositions and pronouns).
5. Sift through ‘the reservoir’ of words that returned no results from the synset search to find dense tweets worthy of a bot.

Or check out the [for now, terrible] 5,067 poems created in 8,898 secs.
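
The replacement and reservoir steps above can be sketched roughly as follows. This is a hedged illustration, not the code behind the 5,067 poems: the names `rewrite_lyrics`, `wordnet_candidates`, `KEEP` and `TOY` are hypothetical, the list of protected prepositions and pronouns is a guess, and only synonyms, hypernyms and hyponyms are collected. The demo uses a toy lookup table so it runs without NLTK; `wordnet_candidates` shows how a WordNet-backed lookup could be plugged in if NLTK and its wordnet corpus are installed.

```python
import random

# Words left untouched so the grammar of prepositions and pronouns survives.
# This short list is an illustrative assumption.
KEEP = {"the", "a", "an", "of", "in", "on", "to", "and",
        "i", "you", "me", "my", "your", "we", "it"}


def wordnet_candidates(word):
    """Collect single-word replacement candidates from WordNet.

    Needs NLTK plus the wordnet corpus; imported lazily so the toy demo
    below runs without either.
    """
    from nltk.corpus import wordnet as wn
    names = set()
    for synset in wn.synsets(word):
        for related in [synset] + synset.hypernyms() + synset.hyponyms():
            for lemma in related.lemmas():
                # Skip multi-word lemmas so words-per-line stays the same.
                if "_" not in lemma.name():
                    names.add(lemma.name())
    names.discard(word)
    return sorted(names)


def rewrite_lyrics(text, candidates_for, rng):
    """Replace each content word, keeping words-per-line and verse breaks.

    Returns (new_text, reservoir), where reservoir lists the words that
    had no replacement candidates. Punctuation attached to a replaced
    word is dropped in this sketch.
    """
    reservoir = []
    out_lines = []
    for line in text.split("\n"):
        new_words = []
        for word in line.split():
            key = word.lower().strip(".,;:!?\"'")
            if not key or key in KEEP:
                new_words.append(word)
                continue
            options = candidates_for(key)
            if options:
                new_words.append(rng.choice(options))
            else:
                if key not in reservoir:
                    reservoir.append(key)
                new_words.append(word)
        out_lines.append(" ".join(new_words))
    return "\n".join(out_lines), reservoir


# Toy stand-in for WordNet (hypothetical data) so the example is self-contained.
TOY = {"love": ["passion", "devotion"], "heart": ["core"],
       "burns": ["blazes", "glows"]}
lyrics = "My love burns\nin your heart\n\nOh zzyzx"
new_text, reservoir = rewrite_lyrics(lyrics, lambda w: TOY.get(w, []),
                                     random.Random(0))
print(new_text)
print(reservoir)  # ['oh', 'zzyzx']
```

Passing `wordnet_candidates` instead of the lambda swaps the toy table for real synset lookups; the reservoir then holds whatever WordNet could not match.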


Have a peek at a few results posted on my Twitter feed:

The poetic problem is that these poems lack insight. Blastocytes have orders of magnitude more experiential insight than these poems.


July 3, 2014 (updated January 9, 2015)

Even the OpenLibrary is Locked

In my amateur quest to retrieve some archive that might have a semblance of approaching moderately large data for a poetry analysis project, I imagined OpenLibrary.org might offer an opportunity to download some poetry that is in the public domain. My first encounter was not encouraging. Thousands of the books listed there under poetry are distributed by the Library of Congress under what is known as a DAISY lock, which requires a key to open and is only accessible to the blind. Imagine a library where a significant portion of the books are locked shut. Aaron Swartz would not be amused.


July 2, 2014 (updated January 9, 2015)

A few rudimentary visuals of Poetry Foundation corpus (preliminary buggy results)

Word counting is the ‘Hello World’ of big data. And my data is relatively tiny.

Below are 25 images at 5 increasingly small time scales for 5 different variables (line length, verse length, avg word length, # of words per poem, # of verses per poem) derived from an analysis of a corpus of 10.5k poems scraped from poetryfoundation.org.
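
As a rough illustration of how those five variables might be computed for a single poem, here is a small sketch. The function name `poem_stats` and the units are assumptions on my part (line length counted in words, verse length in lines, verses separated by blank lines); the post does not say how the measures were defined.

```python
import re


def words_in(line):
    """Split a line on whitespace and drop tokens that are only punctuation."""
    stripped = (w.strip(".,;:!?\"'()") for w in line.split())
    return [w for w in stripped if w]


def poem_stats(poem):
    """Return five per-poem measures for a non-empty poem given as plain text.

    Assumes verses are separated by blank lines.
    """
    verses = [v for v in re.split(r"\n\s*\n", poem.strip()) if v.strip()]
    lines = [ln for v in verses for ln in v.split("\n") if ln.strip()]
    words = [w for ln in lines for w in words_in(ln)]
    return {
        "avg_line_length": len(words) / len(lines),      # words per line
        "avg_verse_length": len(lines) / len(verses),    # lines per verse
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "words_per_poem": len(words),
        "verses_per_poem": len(verses),
    }


stats = poem_stats("Roses are red,\nviolets are blue.\n\nSugar is sweet")
print(stats["words_per_poem"], stats["verses_per_poem"])  # 9 2
```

Run over a whole corpus and grouped by publication date, per-poem dictionaries like this would give the kind of series the images plot.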


May 28, 2014

Review: Socher et al. Recursive Deep Models …

Review

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Chris Manning, Andrew Ng and Chris Potts. Conference on Empirical Methods in Natural Language Processing (EMNLP 2013, Oral). pdf; Website with Live Demo and Downloads

Objective/Abstract

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank by Socher et al. introduces the Recursive Neural Tensor Network (RNTN), a model for extracting sentiment from longer phrases that outperforms previous models. The model is trained on a corpus of 215,154 phrases that were labelled using Amazon Mechanical Turk. Using RNTN, single-sentence positive/negative classification accuracy increases from 80% to 85.4%, and negation is accurately captured (a task no previous model succeeded at).


April 7, 2014

On Numeration (Khan meet Stiegler)

I’ve been spending some hours this weekend reviewing math at the amazing Khan Academy. The following reflection is meant as a contemplation of a trend and not in any way a critique of their valuable work.

Consider the screenshot below, where the value of 66° assigned to IAK does not accurately reflect its value. Both angles IAK and GCJ, if measured with a tool like a protractor, are 45° angles. Yet IAK is labelled 66° and the correct answer for the angle GCJ is 24°. It does not seem like an important mislabelling, yet there is a fundamental conceptual issue at stake here. And it has a relation to poetry (perhaps tenuous), but it’s an issue of trust. As Tom Waits said: “The large print giveth, the small print taketh away”. Exactly as here, where the faint inscription at the bottom states: “Note: Angles not necessarily drawn to scale.”
