Source text on which the neural net is trained: MITPressDRAFT
After cleanup total chars: 61
Corpus length: 421183
Read a screen where code is rapidly producing poems.
Find a path through the words: construct a poem from machinic intuition.
The following recreates a performance made at
the Electronic Literature Organization conference
in Bergen, Norway on Aug. 4th 2015.
Technical process: the following poems were produced using a corpus of 10,000+ poems as templates. Each poem was sent to the Alchemy API to produce entity-recognition, POS (part-of-speech), and sentiment reports. That analysis influences the replacement algorithms. Replacement uses NLTK synsets, pattern.en, and a reservoir of words found in the corpus that do not have synonyms.
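For a concrete sense of the synset step, here is a minimal sketch using NLTK's WordNet interface. It shows only the synonym lookup; the actual pipeline also folds in the Alchemy API reports and pattern.en, and its selection logic is more involved than this.

```python
# Minimal sketch of WordNet-synset replacement (not the project's exact code).
# Requires the WordNet data: run nltk.download('wordnet') once beforehand.
import random
from nltk.corpus import wordnet

def replacement_for(word):
    """Pick a random WordNet synonym for a word, or keep the word itself."""
    lemmas = {l.name().replace('_', ' ')
              for s in wordnet.synsets(word)
              for l in s.lemmas()}
    lemmas.discard(word)
    return random.choice(sorted(lemmas)) if lemmas else word

line = "construct a poem from machinic intuition"
print(" ".join(replacement_for(w) for w in line.split()))
```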
In short, 10000 poems are
transformed by algorithms
into millions of poems
at an extremely rapid rate.
The reader must then
find a way to convert this
spew into spoken word.
One of the ends
of digital literature
is an external intuition.
is an engineering problem.
in this case
Skidding thru the
as they augment
I call this act of
cyborg/ skid/ spreedr poetry.
For the performance
at the ELO conference in Bergen, Norway on Aug. 5th 2015,
I generated 100 poems, pausing every 5 poems for 20 seconds,
and tried to weave spoken poetry from the output.
Unfortunately I forgot to record the audio/screenvideo
so my actual spoken-poem is gone!
If you want to try it yourself:
Download attached code and data corpus
— this includes the poems generated during the performance.
run it from the command line.
The code runs with the Anaconda Python 2.7 distribution installed.
Code will continue to be updated
on github: https://github.com/jhave/Big-Data-Poetry.
Loss Pequeño Glazier is celebrating the 20th anniversary of the Electronic Poetry Centre along with Charles Bernstein, cris cheek, Tony Conrad, Steve McCaffery, Myung Mi Kim, Tammy McGovern, Joan Retallack, Laura Shackelford, Danny Snelson, Dennis Tedlock, Cecilia Vicuña, Elizabeth Willis, & Wooden Cities with Ethan Hayden. Along with exhibitions by: “Abra” (Amaranth Borsuk, Kate Durbin & Ian Hatcher); Pry: iPad-based Novella (Samantha Gorman & Danny Cannizzaro); “Enter:in’ Wodies” (Zuzana Husárová & Lubomír Panák) & myself. http://epc.buffalo.edu/e-poetry/2014/EPC-at-20/
So I made a future-feature film of a computer writing in real time
Spreeder (approx. output 8,100 poems; 2 hour-long real-time episodes).
Implemented in Python with Alchemy API, NLTK and pattern.en
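For context, pattern.en supplies inflection helpers of the kind a replacement pipeline needs so that a substituted word agrees with the one it replaces. The two calls below are real pattern.en functions, though the post doesn't specify which ones Spreeder actually uses:

```python
# Sketch of pattern.en inflection helpers (a Python 2.7-era library).
from pattern.en import pluralize, conjugate, PAST

print(pluralize("poem"))               # -> 'poems'
print(conjugate("write", tense=PAST))  # -> 'wrote'
```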
Once again: there is much magic in the math. The era of numeration discloses a field of stippled language. Songlines, meridians, tectonics, the soft shelled crab, a manta ray, a flock of starlings.
In the image below, each dot is a poem. Its position is calculated by an algorithm called t-SNE (t-Distributed Stochastic Neighbour Embedding).
The image above is beautiful, but it’s impossible to know what is actually going on. So I built an interactive version (it’s a bit slow, but it functions…) where rolling over a dot reveals all the poems by that author.
Screengrabs (below) of the patterns suggest that poets do have characteristic forms discernible by algorithms. Position is far from random; note that the algorithm was not told the author of any poem; it was fed only the poems themselves. This is the equivalent of a blind taste test.
Still, these images don’t tell us much about the poems themselves, except that they exist in communities. That the core of poetry is a spine. That some poets migrate across styles, while others define themselves by a style. The real insights will emerge as algorithms like t-SNE are applied to larger corpora and allow nuanced investigation of the features extracted: on what criteria exactly did the probabilities grow? What are the 2 or 3 core dimensions?
My very basic non-math-poet comprehension of how it works: t-SNE performs dimensionality reduction: it reduces the number of parameters considered. Dimensionality reduction is useful when visualizing data; think about trying to graph 20 different parameters (dimensions). Another technique that does this is PCA: principal component analysis. Dimensionality reduction is in a sense a distillation process; it simplifies. In this case, t-SNE converts ‘pairwise similarities’ between poems into probability distributions, then uses gradient descent to minimize the (mysterious) Kullback-Leibler divergence between the high-dimensional and low-dimensional distributions.
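A minimal sketch of that process with scikit-learn's t-SNE, using random stand-in data (one row per poem, one column per extracted feature; the project's actual feature set isn't reproduced here):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.RandomState(0)
features = rng.rand(100, 20)            # 100 "poems", 20 features each

tsne = TSNE(n_components=2,             # reduce 20 dimensions down to 2
            perplexity=30,              # roughly: effective neighbour count
            random_state=0)
coords = tsne.fit_transform(features)   # one (x, y) dot per poem
print(coords.shape)                     # -> (100, 2)
```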
To know more about the Python version of t-SNE bundled into sklearn, read Alexander Fabisch
One of the few parameters I bothered tweaking over numerous runs is the (appropriately named) perplexity. In the FAQ, LJP van der Maaten (who created t-SNE) wrote:
What is perplexity anyway?
Perplexity is a measure for information that is defined as 2 to the power of the Shannon entropy. The perplexity of a fair die with k sides is equal to k. In t-SNE, the perplexity may be viewed as a knob that sets the number of effective nearest neighbors. It is comparable with the number of nearest neighbors k that is employed in many manifold learners.
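The die example can be checked in a couple of lines (a worked confirmation of the definition, nothing more):

```python
# Perplexity = 2 ** Shannon entropy; a fair six-sided die should give 6.
import math

probs = [1 / 6.0] * 6
entropy = -sum(p * math.log(p, 2) for p in probs)
print(2 ** entropy)  # -> 6.0 (up to floating-point error)
```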
Using Python (Anaconda), NLTK, WordNet, Alchemy, pattern.en, and pyenchant
to analyze and perform word replacement
on a corpus of 10,119 poems scraped from the Poetry Foundation website
and generate 7,769 poems in approx. 2 hours and 30 minutes.
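One plausible role for pyenchant in that toolchain is filtering candidate replacements against a dictionary; the sketch below is an assumption about the policy, not the project's exact logic:

```python
import enchant

dictionary = enchant.Dict("en_US")

def keep_real_words(candidates):
    """Discard candidate replacements that are not dictionary words."""
    return [w for w in candidates if dictionary.check(w)]

print(keep_real_words(["hous", "house", "habitation"]))  # -> ['house', 'habitation']
```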
This is a real-time hour-long screen-grab output
of the trace window in SublimeText
as the poetry-gen program runs.
And here is another episode of “Crawling Toward Creeley” (in this episode: a variation on the ‘Gnomic Verses’ generated then modified)
Markov chains are one of the traditional tricks in the NLP playbook. They are the apple pie-chart of text-generation.
Basic process: given a source text, record each word’s neighbours. If you know the neighbours of a word, you can form a chain if you wish, e.g. [(“you”),(“know”,”can”,”wish”)], and reconstruct a text which contains only pairs (bigrams) that occur in the source.
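A dead-simple bigram chain, in the spirit of the description above (a generic sketch, not the code used for these episodes):

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the source."""
    words = text.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, seed, length=20):
    """Walk the chain from a seed word, choosing successors at random."""
    out = [seed]
    for _ in range(length - 1):
        successors = chain.get(out[-1])
        if not successors:  # dead end: the word has no recorded neighbour
            break
        out.append(random.choice(successors))
    return " ".join(out)

source = "if you know the neighbours of a word you can form a chain if you wish"
print(generate(build_chain(source), seed="you"))
```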
The result is an example of what Charles Hartman might refer to as newbie-augmented-cyborg-poet (dead simple technically, but satisfying artistically, since the volume of generated texts from which new verses can be hand-crafted is massive). This sort of auto-suggest-from-a-corpus technique radically shifts the dimensions of creativity: in the first ‘modified’ example I edited the output, adding words, disguising some obvious quotations from Bernstein, truncating verses, changing lines, modulating rhythms. In the raw output below, it’s just the computer (fueled by Bernstein’s berning phrases); it could go on infinitely given a large enough corpus.
Poetry is both the easiest and the hardest text to generate. Since non-linear deflections and word-riffs are an aspect of contemporary poetry, slamming together ripe fertile conjunctions is easy. Migrating toward a sensitive, complex, experiential and contextual lived poetry is the real challenge (I didn’t even begin to touch it here).
what is there if
An exit is as
clear as dead.
Billboards poster our losses.
Better a barber
than a thousand one-line
I think I have begun to develop a mild form of insanity that often strikes those who fiddle around with computationally-generated text. After reading thousands of lines of dense incomprehensible gibberish, the gibberish clarifies and makes sense, often more sense than any mere linear thought. The brain acclimatises to syntactic pressure.
Recipe for mildly insane word-salad:
Mix all ingredients together using replacement algorithms.
To read 10116 poems (simple style) (in a single 24-mb html page) generated in 10356.4216051 seconds (2.87 hours, 3612 pph [poems per hour], 60 ppm [poems per minute]) on 2014-08-14 at 02:54, click here
Read the RAP Reservoir: 33,150 words extracted from 56k user-input rap songs that did not return any usable results from a WordNet synset search. If you are looking for the evolution of language that occurs through mutation (typos, misspellings, pop-cruft), this is it.
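The reservoir idea itself is easy to sketch: keep whatever WordNet cannot find. (The example words below are made up for illustration.)

```python
# Words with no WordNet synsets: slang, typos, coinages.
# Requires nltk.download('wordnet') once beforehand.
from nltk.corpus import wordnet

def build_reservoir(words):
    """Return the words for which WordNet returns no synsets."""
    return {w for w in words if not wordnet.synsets(w)}

print(build_reservoir(["house", "flossin", "shawty", "tree"]))
```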
Words disconnected from their primary communicative intent operate as lesions/lessons within the psyche.
Today, I generated another 10120 poems using a very mild modification of the alchemy-synset algorithm, with the average word-length constrained to be even shorter. Speed was 64 ppm (poems per minute). This reduction in word-length (to me) makes some of the absurd illogical elliptical generated fragments seem a bit more legible, taut, elusive and rare. It comes at a cost of coherence. The output reads like Robert Creeley in the process of becoming Samuel Beckett in Gertrude Stein’s gut.
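A purely illustrative guess at what such a length constraint could look like inside the replacement step (the post doesn't show the actual constraint logic; the function and threshold here are hypothetical):

```python
def prefer_short(candidates, original, max_len=6):
    """Pick the shortest candidate at or under max_len, else fall back."""
    short = [w for w in candidates if len(w) <= max_len]
    pool = short or candidates or [original]
    return min(pool, key=len)

print(prefer_short(["habitation", "abode", "home"], "house"))  # -> 'home'
```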
To read 10120 poems (simple shrink-gapped style) (in a single 20-mb html page) generated in 9500.10482717 seconds (2.63 hours, 3847 pph [poems per hour], 64 ppm [poems per minute]) on 2014-08-04 at 12:02, click here
Let me give a robot aging
And as it rains tag the sun with ‘almost’
while within the green fog
a tree that is a dot
Softly opens after the burn.
………
Gaza masked
as me masked
each heavy’s heart outcrying at halo’s burial
making a meal of soil
the city a scar
…..
enthusiasm’s ice. We have
Walked on the bore all nite.
Now in the light
We exchange smells and snot.
By dawn we will have buried our lack
And glued wet to the army of being.
…………