Prosody: using the CMUdict in NLTK

OK. Parsing. Prosody. Metre. Rhythm. It seems prehistoric in the age of free verse. But if poems are rhythm with or without rhyme, then parsing lines into metrical feet seems a precondition for generating poems accurately. Unfortunately, as far as I could tell, few folks have done it. A Google search returned a few academic papers and no code. There was one Stack Overflow question. So I wrote an email to Charles Hartman, who wrote Virtual Muse, and he kindly replied: “I’ve been away from programming for quite a while. But by the end of this year Wiley-Blackwell will be publishing my textbook Verse: An Introduction to Prosody…” So I did it myself.


If by real you mean as real as a shark tooth stuck

‘1  1  1  1  1  1  1  1  0  1  1  1’

in your heel, the wetness of a finished lollipop stick,

‘0  1  1 *,* 0  1  0  1  0  1  0  1  0  2 1 *,*’

Aimee Nezhukumatathil, “Are All the Break-Ups in Your Poems Real?”

## parseStressOfLine(line) 
# function that takes a line
# parses it for stress
# corrects the cmudict bias toward 1
# and returns two strings 
# 'stress' in form '0101*,*110110'
#   --Note: 'stress' also includes words not in cmudict, wrapped in asterisks: '0101*,*1*zeon*10110'
# 'stress_no_punct' in form '0101110110'
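
The description above can be sketched roughly as follows. This is a minimal reconstruction from the comments, not the code I posted: the names `parse_stress_of_line`, `stress_digits`, `load_cmudict`, `TOKEN_RE` and `UNSTRESSED_WORDS` are made up for illustration, and the "correction" step (forcing a tiny list of one-syllable words to 0) is only an assumption about what correcting the bias toward 1 might look like. It also assumes unknown words are left out of 'stress_no_punct'.

```python
import re

# Hypothetical correction list (an assumption, not the original rule):
# one-syllable words forced to '0' even when the dictionary marks them '1'.
UNSTRESSED_WORDS = frozenset({"a", "an", "the"})

# A word (optionally with internal apostrophes) or one non-space, non-letter character.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*|[^\sA-Za-z]")


def load_cmudict():
    """Return NLTK's CMUdict as {word: [pronunciation, ...]}.

    Needs the corpus installed once via nltk.download('cmudict').
    """
    from nltk.corpus import cmudict
    return cmudict.dict()


def stress_digits(pronunciation):
    """Keep only the trailing stress digit of each vowel phoneme, e.g. 'AA1' -> '1'."""
    return "".join(ph[-1] for ph in pronunciation if ph[-1].isdigit())


def parse_stress_of_line(line, pron_dict, unstressed=UNSTRESSED_WORDS):
    """Return (stress, stress_no_punct) for one line of verse."""
    stress, stress_no_punct = [], []
    for token in TOKEN_RE.findall(line):
        word = token.lower()
        if word in pron_dict:
            # Use the first listed pronunciation of the word.
            digits = stress_digits(pron_dict[word][0])
            # Assumed correction: demote listed one-syllable words to unstressed.
            if word in unstressed and len(digits) == 1:
                digits = "0"
            stress.append(digits)
            stress_no_punct.append(digits)
        else:
            # Punctuation and unknown words are kept in 'stress', wrapped in asterisks.
            stress.append("*" + token + "*")
    return "".join(stress), "".join(stress_no_punct)
```

With NLTK and the cmudict corpus installed, calling `parse_stress_of_line("in your heel, the wetness of a finished lollipop stick,", load_cmudict())` would return the two strings for that line. Passing the dictionary in as an argument keeps the function easy to try with a small hand-made dictionary.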

“The Carnegie Mellon University Pronouncing Dictionary is a machine-readable pronunciation dictionary for North American English that contains over 125,000 words and their transcriptions. This format is particularly useful for speech recognition and synthesis, as it has mappings from words to their pronunciations in the given phoneme set. The current phoneme set contains 39 phonemes, for which the vowels may carry lexical stress.

0 No stress

1 Primary stress

2 Secondary stress”
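
To make the three stress markers concrete, here is a small sketch that pairs each vowel phoneme with the meaning of its digit. The names `STRESS_LABELS` and `describe_stress` are hypothetical, and the transcription of "lollipop" is typed in by hand for illustration rather than looked up; with NLTK installed, `nltk.corpus.cmudict.dict()` returns lists of this shape keyed by lowercase word.

```python
# Meaning of the digit that CMUdict appends to each vowel phoneme.
STRESS_LABELS = {"0": "no stress", "1": "primary stress", "2": "secondary stress"}


def describe_stress(pronunciation):
    """Pair each vowel phoneme (digit removed) with the label for its stress digit."""
    described = []
    for phoneme in pronunciation:
        # Consonants carry no digit, so they are skipped.
        if phoneme[-1] in STRESS_LABELS:
            described.append((phoneme[:-1], STRESS_LABELS[phoneme[-1]]))
    return described


# Illustrative ARPAbet transcription of "lollipop", entered by hand for this sketch.
lollipop = ["L", "AA1", "L", "IY0", "P", "AA2", "P"]
print(describe_stress(lollipop))
# [('AA', 'primary stress'), ('IY', 'no stress'), ('AA', 'secondary stress')]
```

Read left to right, the digits give 1 0 2 for the word.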

I posted my parsing code on Stack Overflow (my first post there!) as an answer to a question that had received no really relevant answers. Please give it an upvote!