BDP documents the process of exploring poetry generation using deep learning and feature extraction from large bodies of poetry.
LANGUAGE IS A PATTERN
POETRY IS ENCRYPTION
639,813 lines of primarily contemporary poetry
10,573 poems from PoetryFoundation.org
57,434 rap songs from Ohhla.com
4,702 pop lyrics
Poetry generation using machine learning (neural nets, auto-encoders, unsupervised learning).
Libraries: TensorFlow, Keras, PyTorch
Code on GitHub
How does the poetry generation work (in simple terms)?
Every word in every poem is tagged with its part of speech (POS) using NLTK (Natural Language Toolkit).
All poems are sent to an online deep-learning natural language processing API called Alchemy which identifies entities. “Named entities specify things such as persons, places and organizations. AlchemyAPI’s named entity extraction is capable of identifying people, companies, organizations, cities, geographic features and other typed entities”. These entities then form an archive.
All words that are not matched to a synonym in WordNet are put into a ‘reservoir’.
Every entity is replaced with an entity from another poem.
Words that are not entities and not prepositions are replaced with a word drawn from a WordNet synset (synonym, homonym, meronym…). If the synset offers no replacement, the word is replaced with a random word from the ‘reservoir’.
Rudimentary correction of verb tenses is done using pattern.en.
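The replacement steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the small `PREPOSITIONS` set and `SYNSETS` table are hypothetical stand-ins for NLTK's POS tags and for WordNet, the `entities` lists stand in for what AlchemyAPI would return, and the function names `build_reservoir` and `mutate` are invented here. Verb-tense correction with pattern.en is left out.

```python
import random

# Hypothetical stand-ins for the external resources named in the steps above.
PREPOSITIONS = {"in", "on", "over", "under", "of", "to", "with"}  # stand-in for NLTK POS tags
SYNSETS = {                                                       # stand-in for WordNet
    "moon": ["satellite", "orb"],
    "sings": ["chants", "hums"],
    "river": ["stream", "current"],
}


def build_reservoir(poems):
    """Collect every word that has no match in the synonym table."""
    reservoir = []
    for poem in poems:
        for word in poem["words"]:
            if word not in SYNSETS and word not in reservoir:
                reservoir.append(word)
    return reservoir


def mutate(poem, entity_archive, reservoir, rng):
    """Return a new word list: swap entities, keep prepositions, replace the rest."""
    # Candidate entities come only from *other* poems in the archive.
    foreign = [
        entity
        for title, entities in entity_archive.items()
        if title != poem["title"]
        for entity in entities
    ]
    out = []
    for word in poem["words"]:
        if word in poem["entities"]:
            out.append(rng.choice(foreign) if foreign else word)
        elif word.lower() in PREPOSITIONS:
            out.append(word)  # prepositions are left untouched
        elif word in SYNSETS:
            out.append(rng.choice(SYNSETS[word]))
        else:
            # No synonym available: fall back to a random reservoir word.
            out.append(rng.choice(reservoir) if reservoir else word)
    return out


# Toy corpus; the entity lists are hand-written assumptions, not API output.
poems = [
    {"title": "a", "words": ["Paris", "sings", "under", "the", "moon"], "entities": ["Paris"]},
    {"title": "b", "words": ["Ada", "walks", "to", "the", "river"], "entities": ["Ada"]},
]
entity_archive = {p["title"]: p["entities"] for p in poems}
reservoir = build_reservoir(poems)
rng = random.Random(0)  # seeded so repeated runs give the same line
new_words = mutate(poems[0], entity_archive, reservoir, rng)
print(" ".join(new_words))
```

Because the reservoir holds every unmatched word, including function words and entities, the fallback can drop unexpected words into a line; in a sketch like this, that is where much of the strangeness comes from.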
How about the graphs?
About the only true data science in the project is the t-SNE analysis.
Where is the code?
Big-data is big meme. Fluffy flarfy retroactive regurgitation of probabilistic entrails: sophic oracles updated with statistical analysis. Extroverted introversion. All answers concerning its validity for qualitative practice are questionable.
It might also seem that numbers obliterate the ambiguous juicy core of poetry. Yet, the techniques of big-data offer a chance to generate poetry from models of language that emerge at a scale previously unimaginable.
In the eras of the printing press, poets (writers and intellectuals) aspired to be well-read. Some aspired to breadth, others to depth, yet all recognized the cognitive benefits of reading: through the osmosis of many words, patterns and process and modes of communication became clear. Now that the amount written exceeds the capacity to read, a brain beyond the brain is needed to analyze and interpret the results. Big data can digest the literary torrent.