Wavenet for Poem Generation: preliminary results

For the past week, I’ve been running a port of the Wavenet algorithm to generate poems. A reasonable training result emerges in about 24 hours: a trained model that can generate immense amounts of text relatively quickly. On a laptop. (Code: github). By reasonable I mean the poems do not have any real sense, no sentient self, no coherent narrative, no epic structure. But they do have cadence, they do not repeat, new words are plausible, and they have adopted the scattered open-line style characteristic of the late twentieth-century corpus on which they were trained. Far more lucid than Schwitters’ Ursonate, the output is reminiscent of Beckett’s Not I: a ranting, incandescent, perpetual voice.


Remember, these are evolutionary amoebas, toddlers just learning to babble. The amazing thing is that without being given any syntax rules they are speaking, generating a kind of prototypical glossolalia poem, character by character. Note: models are like wines, idiosyncratic reservoirs; the output of each has a distinct taste: some have mastered open lines, others mutter densely, many mangle words into neologisms; each has its obsessions. The Wavenet algorithm is analogous to a winery: its processes ensure that all of the models are similar. Tensorflow is the local region; recursive neural nets form the ecosystem. The corpus is the grapes.

Intriguing vintages (models):

Dense intricate Model 33380 — trained with 1024 skip channels and dilation to 1024 (read a txt sample)

the mouth’s fruit
tiny from carrying
a generative cup

Loose uncalibrated Model 13483 with loss = 0.456 (1.436 sec/step), trained on 2016-10-15T20-46-39 with 2048 skip channels and dilation to 256 (read a txt sample)

 at night, say, that direction.

      now. so you hear we are shaking
          from the woods

Full results (raw output, unedited txt files from the week of Oct 10-16th 2016) here.

it’s there we brail,
  beautiful full
left to wish our autumn was floor

Edited micro poems

…extracted from the debris are here.

through lust,
and uptight winking cold
blood tree hairs
 in loss


Python source code + a few trained models, the corpus, and some sample txt are on github, which will be updated with new samples and code as they emerge.


The model number refers to how many steps the model was trained for. Skip channels weave material from different contexts; on this corpus, larger skip channels produce more coherent output. Dilations refer to the dilation factors of the stacked convolution layers, e.g. [1, 2, 4, 8, 16, 32, 64, 128, 256, etc.]; higher values, up to 1024, seem to be of benefit, but take longer to train. Loss is the mathematical calculation of the distance between the goal and the model; it is a measure of how tightly the model fits the topological shape of the corpus. As models are trained, they are supposed to learn to minimize loss; low loss is supposed to be good. For artistic purposes this is questionable (I describe why in the next section). For best results on this corpus, in general: 10k to 50k steps, dilations up to 1024, skip channels of 512 or more, and (most crucial) loss less than 0.6.
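For concreteness, here is that recipe as a minimal hyperparameter sketch, written in the style of a tensorflow-wavenet params file; the key names and the exact dilation stack are assumptions for illustration, not the literal settings of this port.

# Illustrative settings for the "best results" recipe above.
# Key names follow the common tensorflow-wavenet convention;
# treat them as assumptions, not the exact schema of this port.
wavenet_params = {
    # dilation factors of the stacked causal convolutions, doubling up to 1024
    "dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024],
    "skip_channels": 512,       # 512 or more gave more coherent output here
    "residual_channels": 32,
    "dilation_channels": 32,
    "filter_width": 2,
}

TRAIN_STEPS = 50000    # 10k to 50k steps worked well on this corpus
TARGET_LOSS = 0.6      # sample from checkpoints once loss drops below ~0.6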


Loss is not everything. An early-iteration model with low loss will generate cruft riddled with spelling errors. Thousands of steps later, a model with the same loss value will usually produce more sophisticated variations, with fewer errors. So there is more going on inside the system than is captured by the simple metric of loss optimization. Moreover, the system occasionally undergoes a catastrophic blowout of loss values, during which the loss ceases to descend along the gradient and instead oscillates exponentially (this tends to occur after approx 60k steps). Text generated just before such a collapse (even with good loss values below 1.0, or excellent loss values below 0.6) will contain some ok stuff interspersed with long stretches of nonsense or repeated —— and **** symbols. These repetitive stretches are symptoms of the imminent collapse. So loss is not everything. Nonsense can be a muse. Mutating small elements, editing, flowing, falling across the suggestive force of words in raw tumult provides a viable medium for finding voice.
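As an aside, the blowout can be watched for mechanically: keep a rolling window of training losses and stop sampling from a run once the recent average drifts far above the best loss seen so far. A minimal sketch, with an arbitrary window and threshold (not values from my training script):

from collections import deque

class LossWatcher(object):
    """Flags a run whose recent losses have drifted far above the best loss."""
    def __init__(self, window=200, blowup_ratio=2.0):
        self.history = deque(maxlen=window)
        self.blowup_ratio = blowup_ratio
        self.best = float("inf")

    def update(self, loss):
        self.history.append(loss)
        self.best = min(self.best, loss)
        recent = sum(self.history) / len(self.history)
        # if the recent average sits far above the best loss, the run is
        # probably diverging: checkpoint now and stop sampling from it
        return recent > self.blowup_ratio * self.best

# inside the training loop:
#     if watcher.update(step_loss):
#         save_checkpoint(); break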

2-layer 256-cell LSTM neural net trained on a source text of moderate size

Yesterday, sitting in Hong Kong, I launched a cluster of GPUs in North Carolina and ran a neural net for 9 hours to generate proto-language.

Using modified code from the Keras example on LSTM text generation (and aided by a tutorial on aws-keras-theano-tensorflow integration), the cluster produced 76,976 words.

Many of these words are new and never-seen-before. It’s like having Kurt Schwitters on tap.

mengaporal concents
typhinal voivat

the dusial prespirals
of Inimated dootion

Here is my bill for over 11 hours of GPU cluster time:


Neural nets learn through exposure to words, just like babies. The more they hear, the better they get. For preliminary testing of the code, a 2-layer 256-cell LSTM neural net was trained on a source text of moderate size: a draft of my book, a 430kb text file. So it’s as if a baby were exposed only to critical literary theory: no kids’ books, no conversation, just theory.
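For reference, a 2-layer, 256-cell character LSTM along the lines of the stock Keras text-generation example looks roughly like this; the context window, optimizer and learning rate are assumptions, and 61 is the size of the cleaned character set reported further down.

from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation
from keras.optimizers import RMSprop

maxlen = 40      # characters of context per training window (assumed)
num_chars = 61   # size of the character vocabulary after cleanup

model = Sequential()
model.add(LSTM(256, return_sequences=True, input_shape=(maxlen, num_chars)))
model.add(LSTM(256))
model.add(Dense(num_chars))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(lr=0.01))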

The results can be very very strange:

in a way of stumalized scabes occurs forms. and then paradigm for some control implicit and surjace of need)

And they evolve over time from simplicity:

and conceptual poetry and conceptual poetry in the same that the same that the same that the same that has been are conceptual poetry and conceptual process of the static entity and conceptual sourd subjects that in the same of the same that digital poetry in the static and conceptual sourd source of the static entity and conceptual sourd source of the static entity and conceptual poetry in the station of the static environment for the station of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a

… to complexity

u archetypht]mopapoph wrud mouses occurbess, the disavil centory that inturainment ephios into the reputiting the sinctions or this suinncage encour. That language, Y. untiletterforms, bear. and matter a nalasist relues words. this remagming in nearogra`mer struce. (, things that digital entibles offorms to converaction difficued harknors complex the sprict but the use of procomemically mediate the cunture from succohs at eyerned that is cason, other continuity. As a discubating elanted intulication action, these tisting as sourdage. Fore?is pobegria, prighuint, take sculptural digital cogial into computers to Audiomraphic ergeption in the hybping the language. /Ay it bodies to between as if your this may evorv: and or all was be as unityle/disity 0poeliar stance that shy. in from this ke

It is important to recognize that the machine is generating words character by character: it has never seen a dictionary; it is not given any knowledge of grammar rules; it is not preloaded with words or spelling. This is not random replacement or shuffling; it is a recursive groping toward sense, articulate math, cyber-toddler lit. And it is based on a 9-hour exposure to a single 200-page book with an intellectual idiom. More complex nets, based on vast corpora and trained over weeks, will surely produce results that are astonishing.
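The generation step itself is tiny: the net outputs a probability for each possible character, and the next character is drawn from that distribution. This is essentially the sample() helper from the Keras example; the “diversity” levels quoted below presumably play the role of the temperature parameter here, where low values stay close to the most probable character and high values gamble.

import numpy as np

def sample(preds, temperature=1.0):
    """Draw one character index from the net's predicted distribution."""
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-8) / temperature   # epsilon avoids log(0)
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return np.argmax(np.random.multinomial(1, preds, 1))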

Yet I somehow feel that there is a limit to this (LSTM) architecture’s capacity to replicate thought. Accuracy is measured through a loss function that began at 2.87 and, over 46 runs (known as epochs), descended to 0.2541. A kind of sinuous sense emerges from the babble, but it is like listening to an impersonation of thought done in an imaginary language by an echolaliac. Coherency is scarce.

Code & Complete Output

Code is on github.

Complete 80k word output saved directly from the Terminal: here.


Do you like nonsense? Nostalgic for Lewis Carroll?

Edited excerpts (I filtered by seeking neologisms, then hand-tuned for rhythm and cadence) are here. They include stuff like:

In the is networks, the reads on Gexture
orthorate moth process
and sprict in the contrate

in the tith reader, oncologies
appoth on the entered sure

in ongar interpars the cractive sompates
betuental epresed programmeds

in the contiele ore presessores
and practions spotute pootry

in grath porming
phosss somnos prosent

E-lit Celebrity Shout-Out

Of course, as the machine begins to learn words (character by character, assembling sense from matrices of data), familiar names (of theorists or poets cited in my book) crop up among the incoherence.

in Iteration 15 at diversity level 0.2:

of the material and and the strickless that in the specificient of the proposed in a strickles and subjective to the Strickland of the text to the proposing the proposed that seem of the stricklingers and seence and the poetry is the strickland to a text and the term the consider to the stricklinger

in Iteration 26 at diversity level 1.0:

 sound divancted image of Funkhouser surfaleders to dexhmeating antestomical prosting

in Iteration 34 at diversity level 0.6:

Charles Bernstein
Fen and Crotic

unternative segmentI
spate, papsict feent

& in Iteration 29 at diversity level 1.1:

Extepter Flores of the Amorphonate

Evocative of a Genet-made literary saint among the rubble of some machinic merz, Leo has temporarily adopted his cybernetic moniker:




LSTM CHARRNN: blossoming acronyms

Machine learning hacks.
Building poetic nonsense with neural nets.


mules and the technology, the created and the tractions and the tractional artically of the traction of the tractical processe of the prectional and and and structured the entional the eractions of the the tractions and the tractions of the termore the the creative of the ention of the metach of the interallention of the termore and the entions of the created the tractions and structured the media and contempler the tractically, and in the eranted the tractions of the metach of the contempler in


The network knows nothing of language.
Sense emerges from feedback.


Using Theano backend.
Source text on which the neural net is trained:  MITPressDRAFT


! ” # % & ‘ ( ) + , – . / 0 1 2 3 4 5 6 7 8 9 : ; = ? @ [ ] ` a b c d e f g h i j k l m n o p q r s t u v w x y z


After cleanup total chars: 61
Corpus length: 421183
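Those two counts come from the corpus-loading step of the stock Keras character-RNN example, roughly as sketched here; the filename and the cleanup pass are placeholders.

text = open('MITPressDRAFT.txt').read().lower()
# ... cleanup pass: drop characters outside the whitelist shown above ...

chars = sorted(set(text))
char_indices = dict((c, i) for i, c in enumerate(chars))   # char -> index
indices_char = dict((i, c) for i, c in enumerate(chars))   # index -> char

print('After cleanup total chars:', len(chars))   # 61 on this corpus
print('Corpus length:', len(text))                # 421183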


My days evolve:
Making micro mods to Keras code.
Reading up on LSTM.
Watching Sirajology.


As usual, there is tons of text to read…


ELO Performance (Brief Reproduction)

Read a screen where code is rapidly producing poems.
Find a path through the words: construct a poem from machinic intuition.

The following recreates a performance made at
the Electronic Literature Organization conference
in Bergen, Norway on Aug. 4th 2015.

Details: http://bdp.glia.ca/smaller-words-shrink-gapped
Code: https://github.com/jhave/Big-Data-Poetry

Technical process: the following poems were produced using a corpus of 10,000+ poems as templates. Each poem was sent to the Alchemy API to produce entity-recognition, POS, and sentiment reports. That analysis influences the replacement algorithms. Replacement uses NLTK synsets, Pattern.en, and a reservoir of words found in the corpus that do not have synonyms.
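Stripped of the Alchemy analysis that steers it, the replacement idea reduces to something like the sketch below: look each word up in the WordNet synsets, swap in a lemma when one exists, and otherwise draw from the reservoir of synonym-less words. The toy reservoir, the template line, and all names are illustrative, not the production code.

import random
from nltk.corpus import wordnet as wn

# words from the corpus that return no synsets (tiny illustrative sample)
reservoir = ['brail', 'spreedr', 'glossolalia']

def replace_word(word):
    synsets = wn.synsets(word)
    if not synsets:
        return random.choice(reservoir)
    lemmas = [l.name().replace('_', ' ') for s in synsets for l in s.lemmas()]
    candidates = [l for l in lemmas if l.lower() != word.lower()]
    return random.choice(candidates) if candidates else word

template = "the mouth's fruit tiny from carrying a generative cup"
print(' '.join(replace_word(w) for w in template.split()))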

In short, 10000 poems are
transformed by algorithms
into millions of poems
at an extremely rapid rate.

The reader must then
find a way to convert this
spew into spoken word.

ELO 2015 – Bergen – Performance

One of the ends
of digital literature
is an external intuition.

External intuition
is an engineering problem.

in this case
is me.

Skidding thru the
generated poems
as they augment
my imagination.

I call this act of
augmented imagination:
cyborg/ skid/ spreedr poetry.


For the performance
at the ELO conference in Bergen, Norway on Aug. 5th 2015,
I generated 100 poems, pausing every 5 poems for 20 seconds,
and tried to weave spoken poetry from the output.

Unfortunately I forgot to record the audio/screenvideo
so my actual spoken-poem is gone!


If you want to try it yourself:
Download attached code and data corpus
— this includes the poems generated during the performance.

Run from the command line:
>>cd code/poetryFoundation/ELO_July2015/
>>python ELO2015_PERF_Creeley-Aug4th.py

This code will run with the Anaconda Python 2.7 distribution installed.


Code will continue to be updated
on github: https://github.com/jhave/Big-Data-Poetry.

Spreeder: the feature film (EPC 20th Anniversary Celebration)

Loss Pequeño Glazier is celebrating the 20th anniversary of the Electronic Poetry Centre along with Charles Bernstein, cris cheek, Tony Conrad, Steve McCaffery, Myung Mi Kim, Tammy McGovern, Joan Retallack, Laura Shackelford, Danny Snelson, Dennis Tedlock, Cecilia Vicuña, Elizabeth Willis, & Wooden Cities with Ethan Hayden. Along with exhibitions by: “Abra” (Amaranth Borsuk, Kate Durbin & Ian Hatcher); Pry: iPad-based Novella (Samantha Gorman & Danny Cannizzaro); “Enter:in’ Wodies” (Zuzana Husárová & Lubomír Panák) & myself.  http://epc.buffalo.edu/e-poetry/2014/EPC-at-20/

So I made a future-feature film of a computer writing in real time:
Spreeder (approx. output: 8,100 poems; 2 hour-long real-time episodes),
implemented in Python with the Alchemy API, NLTK and pattern.en



t-SNE: Classification of 10,557 poems

Once again: there is much magic in the math. The era of numeration discloses a field of stippled language. Songlines, meridians, tectonics, the soft shelled crab, a manta ray, a flock of starlings.

In the image below, each dot is a poem. Its position is calculated by an algorithm called t-SNE (t-distributed Stochastic Neighbor Embedding).

[Screenshot, 2014-08-23: t-SNE map in which each dot is a poem]

The image above is beautiful, but it’s impossible to know what is actually going on. So I built an interactive version (it’s a bit slow, but it functions…) where rolling over a dot reveals all the poems by that author.

Screengrabs (below) of the patterns suggest that poets do have characteristic forms discernible by algorithms. Position is far from random; note that the algorithm was not told the author of any poem; it was fed only the poems themselves. This is the equivalent of blind taste-testing.

Still, these images don’t tell us much about the poems themselves, except that they exist in communities. That the core of poetry is a spine. That some poets migrate across styles, while others define themselves by a style. The real insights will emerge as algorithms like t-SNE are applied to larger corpora and allow nuanced investigation of the features extracted: on what criteria exactly did the probabilities grow? What are the 2 or 3 core dimensions?

What is t-SNE?

My very basic, non-math-poet comprehension of how it works: t-SNE performs dimensionality reduction; it reduces the number of parameters considered. Dimensionality reduction is useful when visualizing data; think about trying to graph 20 different parameters (dimensions). Another technique that does this is PCA: principal component analysis. Dimensionality reduction is in a sense a distillation process; it simplifies. In this case, t-SNE converts ‘pairwise similarities’ between poems into probability distributions, then uses gradient descent to minimize the (mysterious) Kullback-Leibler divergence between the high-dimensional and low-dimensional distributions.

To know more about the Python version of t-SNE bundled into sklearn, read Alexander Fabisch

One of the few parameters I bothered tweaking over numerous runs is the appropriately named perplexity. In the FAQ, L.J.P. van der Maaten (who created t-SNE) wrote:

 What is perplexity anyway?

Perplexity is a measure for information that is defined as 2 to the power of the Shannon entropy. The perplexity of a fair die with k sides is equal to k. In t-SNE, the perplexity may be viewed as a knob that sets the number of effective nearest neighbors. It is comparable with the number of nearest neighbors k that is employed in many manifold learners.
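In sklearn the step itself reduces to a few lines. The feature matrix (one row of numeric features per poem) is assumed to come from an earlier analysis pass, and the filename is a placeholder; perplexity is the knob discussed above.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

X = np.load('poem_features.npy')   # placeholder: 10557 poems x N features
coords = TSNE(n_components=2, perplexity=30.0).fit_transform(X)

plt.scatter(coords[:, 0], coords[:, 1], s=2)   # each dot is a poem
plt.show()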

SPREED: Speed Screen Reading: One-Hour Real-Time Poetry Generation ScreenGrab

Using Python (Anaconda), NLTK, WordNet, Alchemy, pattern.en, and pyenchant
to analyze and perform word replacement
on a corpus of 10,119 poems scraped from the PoetryFoundation
and generate 7,769 poems in approx. 2 hours and 30 minutes.

This is a real-time hour-long screen-grab output
of the trace window in SublimeText
as the poetry-gen program runs.

Code on Github
Made by Glia.ca  

And here is another episode of “Crawling Toward Creeley” (in this episode: a variation on the ‘Gnomic Verses’, generated then modified)

Markov Bern

Markov chains are one of the traditional tricks in the NLP playbook. They are the apple pie-chart of text-generation.

Basic process: given a source text, find words that are neighbours. If you know the neighbours of a word, you can form a chain, e.g. [(“you”), (“know”, ”can”, ”wish”)], and reconstruct a text which contains pairs (bigrams) from the source.
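A bare-bones version of that chain, with an arbitrary seed word and length, looks something like this:

import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the words that follow it in the source."""
    words = text.split()
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)       # e.g. {"you": ["know", "can", "wish"]}
    return chain

def generate(chain, seed, length=40):
    out = [seed]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return ' '.join(out)

# source = open('dark_city.txt').read()   # placeholder for the source texts
# print(generate(build_chain(source), seed='Billboards'))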

So I did that, using as source texts Charles Bernstein’s Dark City and Rough Trades (found on Bernstein’s EPC author page).

The result is an example of what Charles Hartman might refer to as newbie-augmented-cyborg-poet (dead simple technically, but satisfying artistically, since the volume of generated text from which new verses can be hand-crafted is massive). This sort of auto-suggest based-upon-a-corpus technique radically shifts the dimensions of creativity: in the first ‘modified’ example I edited the output, adding words, disguising some obvious quotations from Bernstein, truncating verses, changing lines, modulating rhythms. In the raw output below, it’s just the computer (fueled by Bernstein’s berning phrases); it could go on infinitely given a large enough corpus.

Poetry is both the easiest and the hardest to generate. Since non-linear deflections and word-riffs are an aspect of contemporary poetry, slamming together ripe fertile conjunctions is easy. Migrating toward a sensitive, complex, experiential and contextual lived poetry is the real challenge (I didn’t even begin to touch it here).

Code on Github
Made by Glia.ca  

what is there if
An exit is as
clear as dead.

Billboards poster our losses.
Better a barber
than a thousand one-line
sweat glands.


I think I have begun to develop a mild form of insanity that often strikes those who fiddle around with computationally-generated text. After reading thousands of lines of dense incomprehensible gibberish, it begins to clarify and make sense, often more sense than any mere linear thought. The brain acclimatises to syntactic pressure.

Recipe for mildly insane word-salad:

  • take 57,000 rap songs input by fans
  • extract all words that do not return results from a WordNet synset search and put them into a Reservoir
  • one list of scientific terminology (for sombre intellectual tone)
  • chop off “-ology” wherever it occurs
  • one list of swear words (for spice)
  • a call to the WordNet synset algorithm (for fibre and continuity)
  • use pattern.en to do conjugation (for a tiny bit of coherence)
  • use NLTK part-of-speech tagging
  • Alchemy for entity (people, places, etc…) replacement
  • 10,000 or more poems

Mix all ingredients together using replacement algorithms.
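A sketch of the ingredient prep, building the reservoir of WordNet-rejected rap words and chopping the “-ology” endings off the science list; the filenames are placeholders, and the mixing stage itself is the replacement pipeline sketched earlier.

from nltk.corpus import wordnet as wn

rap_words = set(open('rap_lyrics_words.txt').read().split())   # placeholder file
reservoir = sorted(w for w in rap_words if not wn.synsets(w))  # typos, slang, pop-cruft

science_terms = open('science_terms.txt').read().split()       # placeholder file
sombre_tone = [t.replace('ology', '') for t in science_terms]  # "musicology" -> "music"

print(len(reservoir), 'reservoir words')   # ~33,150 for the 56k-song corpus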

To read 10,116 poems (simple style, in a single 24-MB html page) generated in 10356.42 seconds (2.88 hours, roughly 3,500 poems per hour, or about 59 per minute) on 2014-08-14 at 02:54, click here.

Read a selection of just a few poems 

Read the RAP Reservoir: 33,150 words extracted from 56k user-input rap songs that did not return any usable results from a WordNet synset search. If you are looking for the evolution of language that occurs through mutation (typos, misspellings, pop-cruft), this is it.

Code on Github
Made by Glia.ca