PyTorch Poetry Generation [Pre-WordHack : Epoch 16 Video]

Another day of testing before going to NYC to perform neural-net poems at WordHack [NYC (Thursday 2/16/2017 @ Babycastles . 7-10pm) w. Sarah Rothberg, John Cayley and Theadora Walsh]


 In the cold weather going out of the snow, 
 She down the lawn. 
 The air moves and grows, while she walks smooth, 
 When a swan is born, 
 And it's almost happening 
 Who knows what to say 
 The change has brought 
 Throwing the first blood in its face.

It’s clear:

Never will this mode of randomized pattern-reasoning replicate the nuanced human heart. More robust ensemble methods that simulate embodied experience, temporal reflexes, and nested community idioms will be required.

Deep learning is still shallow. The cloud does not understand honey, home or heart. Yet in the short-term, this is the future of writing: a computational assistant for an engaged imagination intent on exploring the topological feature-space of potential phrases.


Modulated the parameters: raised both the embedding size and the hidden-layer size to 512. And did a bit more data mining and parsing to increase the corpus size by a third, to 20mb of .txt.

Mode: LSTM
Embedding size: 512
Hidden Layers: 512
Batch size: 20
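Assuming the run uses the command-line flags of the stock pytorch/examples word-language-model script that this project builds on (an assumption; the post does not show the actual invocation or data path), those settings translate to something like:

```shell
python main.py --model LSTM --emsize 512 --nhid 512 --batch_size 20 --data ./poetry_corpus
```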

Expanded Corpus to over 600,000 lines

639,813 lines of poetry from 5 websites.

Poetry Foundation
Capa - Contemporary American Poetry Archive
Evergreen Review
Shampoo Poetry


3 Survivors: 1397 Models, 16,548 txt files, 8+ hrs of video (& no poems yet): Wavenet for Poem Generation: Secondary Results (After training for 6+ weeks continuously)

From 26-10-2016 to 11-12-2016, Wavenet-for-Poem-Generation (code on github) trained on an 11k-poem corpus simultaneously in 7 different tabs of a terminal window (on an 8-core G5, each tab occupied a core of the CPU), with each tab using different parameter settings.

In the end only 3 settings exceeded 100k training epochs before succumbing to the exploding gradient dilemma (detailed here).

The 3 surviving threads were known as 26-03, 38-59, and 39-18. Each folder name references its time of birth, the time it began receiving models from its thread, the neural network learning as it groped its way thru the corpus. These threads alone (of myriad attempts) lived longest and saved out hundreds of models with loss under 0.7.


Warning: these videos are long! Total viewing time: 8+ hours.

Each is a silent realtime screen-capture of neural net models generating poems.

Poems from the same model are generated side-by-side to allow for comparative viewing. Note how young models create poems that rampage logic and merge less. Mature models from 50k-110k epochs begin to emulate deflections and balance, concealing and revealing. And ancient models (after they suffer an exploding-gradient data hemorrhage) create poems full of fragments and silences, aphasia and lack, demented seeking.

Suggested viewing: put on an extra monitor and let run. Consult occasionally as if the computer were a clever oracle with a debilitating lack of narrative cohesion.


16,548 text file poems on github


Common to each survivor were the following parameters:

  • Dilations = 1024
  • SkipChannels = 4096
  • Quantization Channels = 1024

Dilation channels differed for each survivor: 8, 16, and 32.
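In the ibab tensorflow-wavenet lineage this code descends from, those settings live in a wavenet_params.json file. A plausible reconstruction (the exact dilations list is my assumption: "1024" likely names the largest dilation in the doubling stack, with dilation_channels set to 8, 16 or 32 depending on the thread):

```json
{
  "dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024],
  "dilation_channels": 8,
  "skip_channels": 4096,
  "quantization_channels": 1024
}
```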

Training process: complete terminal output of the training runs.


A subset of the models used in demo readings can be found online at github.

39-18 (2016-10-26T18-39-18)

Dilation Channels : 8

Born: 26 October 2016 at 03:29
Died: Sunday, 11 December 2016 at 11:28
Models: 458
Epochs: 145070
Size: 80.37GB


38-59 (2016-10-27T10-38-59)

Dilation Channels : 16

Born: 26 October 2016 at 03:29
Died: Sunday, 11 December 2016 at 8:03
Models: 475
Epochs: 150000
Size: 130.68GB


26-03 (2016-10-26T15-26-03)

Dilation Channels : 32

Born: 26 October 2016 at 03:29
Died: Sunday, 11 December 2016 at 11:28
Models: 464
Epochs: 145070
Size: 98.1GB


2-layer 256-cell LSTM neural net trained on a source text of moderate size

Yesterday, sitting in Hong Kong, I launched a cluster of GPUs in North Carolina and ran a neural net for 9 hours to generate proto-language.

Using modified code from the Keras example on LSTM text generation (and aided by a tutorial on aws-keras-theano-tensorflow integration), the cluster produced 76,976 words.

Many of these words are new and never-seen-before. It’s like having Kurt Schwitters on tap.

mengaporal concents
typhinal voivat

the dusial prespirals
of Inimated dootion

Here is my bill for over 11 hours of GPU cluster time:


Neural nets learn through exposure to words, just like babies. The more they hear, the better they get. For preliminary testing of the code, a 2-layer 256-cell LSTM neural net was trained on a source text of moderate size: a draft of my book, a 430kb text file. So it’s as if a baby were exposed only to critical literary theory: no kids’ books, no conversation, just theory.
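That "exposure" is mechanically simple. A sketch (stdlib only, names are mine) of the data preparation behind the standard Keras LSTM text-generation example: the book is read as one long string and sliced into fixed-length character windows, each paired with the single character that follows it.

```python
# Sketch of char-level dataset preparation (the real training stacks
# Keras LSTM layers on top of windows like these).

def make_char_dataset(text, maxlen=40, step=3):
    """Return the alphabet plus (window, next-char) training pairs."""
    chars = sorted(set(text))                      # alphabet after cleanup
    char_to_idx = {c: i for i, c in enumerate(chars)}
    windows, targets = [], []
    for i in range(0, len(text) - maxlen, step):
        windows.append([char_to_idx[c] for c in text[i:i + maxlen]])
        targets.append(char_to_idx[text[i + maxlen]])
    return chars, windows, targets

# Toy corpus standing in for the 430kb book draft:
chars, X, y = make_char_dataset(
    "the corpus is read character by character", maxlen=10, step=3)
```

The network's only job is then to predict the target character from the window, over and over, which is why spelling emerges before grammar and grammar before sense.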

The results can be very very strange:

in a way of stumalized scabes occurs forms. and then paradigm for some control implicit and surjace of need)

And they evolve over time from simplicity:

and conceptual poetry and conceptual poetry in the same that the same that the same that the same that has been are conceptual poetry and conceptual process of the static entity and conceptual sourd subjects that in the same of the same that digital poetry in the static and conceptual sourd source of the static entity and conceptual sourd source of the static entity and conceptual poetry in the station of the static environment for the station of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a

… to complexity

u archetypht]mopapoph wrud mouses occurbess, the disavil centory that inturainment ephios into the reputiting the sinctions or this suinncage encour. That language, Y. untiletterforms, bear. and matter a nalasist relues words. this remagming in nearogra`mer struce. (, things that digital entibles offorms to converaction difficued harknors complex the sprict but the use of procomemically mediate the cunture from succohs at eyerned that is cason, other continuity. As a discubating elanted intulication action, these tisting as sourdage. Fore?is pobegria, prighuint, take sculptural digital cogial into computers to Audiomraphic ergeption in the hybping the language. /Ay it bodies to between as if your this may evorv: and or all was be as unityle/disity 0poeliar stance that shy. in from this ke

It is important to recognize that the machine is generating words character by character; it has never seen a dictionary; it is not given any knowledge of grammar rules; it is not preloaded with words or spelling. This is not random replacement or shuffling, it is a recursive groping toward sense, articulate math, cyber-toddler lit. And it is based on a 9-hour exposure to a single 200-page book with an intellectual idiom. More complex nets based on vast corpora trained over weeks will surely produce results that are astonishing.

Yet I somehow feel that there is a limit to this (LSTM) architecture’s capacity to replicate thought. Accuracy is measured through a loss function that began at 2.87 and over 46 runs (known as epochs) loss descended to 0.2541. A kind of sinuous sense emerges from the babble but it is like listening to an impersonation of thought done in an imaginary language by an echolaliac. Coherency is scarce.

Code & Complete Output

Code is on github.

Complete 80k word output saved directly from the Terminal: here.


Do you like nonsense? Nostalgic for Lewis Carroll?

Edited excerpts (I filtered by seeking neologisms, then hand-tuned for rhythm and cadence) are output here. They include stuff like:

In the is networks, the reads on Gexture
orthorate moth process
and sprict in the contrate

in the tith reader, oncologies
appoth on the entered sure

in ongar interpars the cractive sompates
betuental epresed programmeds

in the contiele ore presessores
and practions spotute pootry

in grath porming
phosss somnos prosent

E-lit Celebrity Shout-Out

Of course, as the machine begins to learn words (character by character, assembling sense from matrices of data), familiar names (of theorists or poets cited in my book) crop up among the incoherence.
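The "diversity" settings in the captions below are the sampling temperature of the stock Keras example. A stdlib sketch of that reweighting step (the real code draws from the network's softmax output; `preds` here is any probability distribution over characters):

```python
import math

def reweight(preds, temperature=1.0):
    """Sharpen (T < 1) or flatten (T > 1) a distribution before sampling."""
    logits = [math.log(p) / temperature for p in preds]
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = [0.5, 0.3, 0.2]
sharp = reweight(probs, 0.2)   # low diversity: top character dominates
flat = reweight(probs, 1.1)    # high diversity: closer to uniform babble
```

Hence the repetitive loops at diversity 0.2 and the neologistic babble above 1.0.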

in Iteration 15 at diversity level 0.2:

of the material and and the strickless that in the specificient of the proposed in a strickles and subjective to the Strickland of the text to the proposing the proposed that seem of the stricklingers and seence and the poetry is the strickland to a text and the term the consider to the stricklinger

in Iteration 26 at diversity level 1.0:

 sound divancted image of Funkhouser surfaleders to dexhmeating antestomical prosting

in Iteration 34 at diversity level 0.6:

Charles Bernstein
Fen and Crotic

unternative segmentI
spate, papsict feent

& in Iteration 29 at diversity level 1.1:

Extepter Flores of the Amorphonate

evocative of a Genet-made literary saint among the rubble of some machinic merz, Leo has adopted temporarily his cybernetic moniker:




LSTM CHARRNN: blossoming acronyms

Machine learning hacks.
Building poetic nonsense with neural nets.


mules and the technology, the created and the tractions and the tractional artically of the traction of the tractical processe of the prectional and and and structured the entional the eractions of the the tractions and the tractions of the termore the the creative of the ention of the metach of the interallention of the termore and the entions of the created the tractions and structured the media and contempler the tractically, and in the eranted the tractions of the metach of the contempler in


The network knows nothing of language.
Sense emerges from feedback.


Using Theano backend.
Source text on which the neural net is trained: MITPressDRAFT


! " # % & ' ( ) + , - . / 0 1 2 3 4 5 6 7 8 9 : ; = ? @ [ ] ` a b c d e f g h i j k l m n o p q r s t u v w x y z


After cleanup total chars: 61
Corpus length: 421183


My days evolve:
Making micro mods to Keras code.
Reading up on LSTM.
Watching Sirajology.


As usual, there is tons of text to read…


t-SNE: Classification of 10,557 poems

Once again: there is much magic in the math. The era of numeration discloses a field of stippled language. Songlines, meridians, tectonics, the soft shelled crab, a manta ray, a flock of starlings.

In the image below, each dot is a poem. Its position is calculated by an algorithm called t-SNE (t-distributed Stochastic Neighbour Embedding).


The image above is beautiful, but it’s impossible to know what is actually going on. So I built an interactive version (it’s a bit slow, but functions) where rollover of a dot reveals all the poems by that author.

Screengrabs (below) of the patterns suggest that poets do have characteristic forms discernible by algorithms. Position is far from random; note that the algorithm was not told the author of any of the poems, it was simply fed the poems; this is the equivalent of blind taste-testing.


Still, these images don’t tell us much about the poems themselves, except that they exist in communities. That the core of poetry is a spine. That some poets migrate across styles, while others define themselves by a style. The real insights will emerge as algorithms like t-SNE are applied to larger corpora, allowing nuanced investigation of the features extracted: on what criteria exactly did the probabilities grow? What are the 2 or 3 core dimensions?

What is t-SNE

My very basic non-math-poet comprehension of how it works: t-SNE performs dimensionality reduction; it reduces the number of parameters considered. Dimensionality reduction is useful when visualizing data; think about graphing 20 different parameters (dimensions). Another technique that does this is PCA: principal component analysis. Dimensionality reduction is in a sense a distillation process; it simplifies. In this case, it converts ‘pairwise similarities’ between poems into probability distributions, then uses a process of gradient descent to minimize the (mysterious) Kullback-Leibler divergence between those distributions.
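The "pairwise similarities into probability distributions" step can be sketched directly. This is the conditional p(j|i) from the t-SNE paper, stdlib only, with a fixed bandwidth rather than the perplexity-calibrated one the real algorithm solves for:

```python
import math

def conditional_probs(dists_from_i, sigma=1.0):
    """Turn distances from poem i to every other poem into a
    Gaussian-weighted probability distribution over neighbours."""
    weights = [math.exp(-d * d / (2 * sigma * sigma)) for d in dists_from_i]
    total = sum(weights)
    return [w / total for w in weights]

probs = conditional_probs([0.5, 1.0, 4.0])  # two near poems, one far one
```

Nearby poems soak up almost all of the probability mass, which is why the map clusters by stylistic neighbourhood.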

To know more about the Python version of t-SNE bundled into sklearn, read Alexander Fabisch

One of the few parameters I bothered tweaking over numerous runs is (appropriately) named perplexity. In the FAQ, LJP van der Maaten (who created t-SNE) wrote:

 What is perplexity anyway?

Perplexity is a measure for information that is defined as 2 to the power of the Shannon entropy. The perplexity of a fair die with k sides is equal to k. In t-SNE, the perplexity may be viewed as a knob that sets the number of effective nearest neighbors. It is comparable with the number of nearest neighbors k that is employed in many manifold learners.
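Van der Maaten's fair-die remark is easy to verify numerically (stdlib only):

```python
import math

def perplexity(probs):
    """2 raised to the Shannon entropy (base 2) of a distribution."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** h

fair_die = [1 / 6] * 6                          # perplexity exactly 6
loaded = [0.9, 0.02, 0.02, 0.02, 0.02, 0.02]    # effectively ~1.6 sides
```

So tweaking perplexity is tweaking how many "effective sides" each poem's neighbourhood die has.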


I think I have begun to develop the mild form of insanity that often strikes those who fiddle around with computationally-generated text. After reading thousands of lines of dense incomprehensible gibberish, it clarifies and makes sense, often more sense than any mere linear thought. The brain acclimatises to syntactic pressure.

Recipe for mildly insane word-salad:

  • take 57,000 rap songs input by fans
    • extract all words that do not return results from a WordNet synset search and put them into the Reservoir
  • one list of scientific terminology (for sombre intellectual tone)
    • chop off “-ology” wherever it occurs
  • one list of swear words (for spice)
  • calls to the WordNet synset algorithm (for fibre and continuity)
  • pattern.en to do conjugation (for a tiny bit of coherence)
  • NLTK part-of-speech tagging
  • Alchemy for entity (people, places, etc…) replacement
  • 10,000 or more poems

Mix all ingredients together using replacement algorithms.
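The shape of the mixing step can be sketched in a few lines. The real pipeline calls WordNet (via NLTK), pattern.en and AlchemyAPI; here a tiny hand-made lexicon stands in for the Reservoir and the synset lookups so the sketch runs on its own:

```python
import random
import re

# Toy stand-in for the Reservoir + synset tables (real ones hold
# tens of thousands of entries drawn from rap lyrics and WordNet).
RESERVOIR = {"sun": ["halo", "burn"], "city": ["scar", "ruin"]}

def remix(line, lexicon, rng=random):
    """Replace each word that has entries in `lexicon` with a random one."""
    def swap(match):
        word = match.group(0)
        return rng.choice(lexicon.get(word.lower(), [word]))
    return re.sub(r"[A-Za-z']+", swap, line)

random.seed(1)
print(remix("the sun over the city", RESERVOIR))
```

Scale the lexicons up to 33,150 reservoir words plus synsets and POS-constrained slots, and word-salad pours out at industrial speed.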

To read 10116 poems (simple style) (in a single 24-mb html page) generated in 10356.4216051 seconds (2.88 hours, ~3516 pph [poems per hour], ~59 ppm [poems per minute]) on 2014-08-14 at 02:54 click here
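Those rates are just seconds-to-throughput arithmetic, recomputed here from the logged runtime:

```python
# Poem count and elapsed seconds are taken from the run log above.
poems = 10116
seconds = 10356.4216051

hours = seconds / 3600    # ~2.88 hours
pph = poems / hours       # ~3516 poems per hour
ppm = pph / 60            # ~59 poems per minute
```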

Read a selection of just a few poems 

Read the RAP Reservoir: 33,150 words extracted from 56k user-input rap songs that did not return any usable results from a WordNet synset search. If you are looking for the evolution of language that occurs through mutation (typos, misspellings, pop-cruft), this is it.

Code on Github



Smaller Words (shrink-gapped at 64ppm)

Words disconnected from their primary communicative intent operate as lesions/lessons within the psyche.

Today, I generated another 10120 poems using a very mild modification of the alchemy-synset algorithm, with the average word-length constrained even shorter. Speed decreased to 64 poems per minute. This reduction in word-length seems (to me) to make some of the absurd, illogical, elliptical generated fragments a bit more legible, taut, elusive and rare. It comes at a cost of coherence. The output reads like Robert Creeley in the process of becoming Samuel Beckett in Gertrude Stein’s gut.

To read 10120 poems (simple shrink-gapped style) (in a single 20-mb html page) generated in 9500.10482717 seconds (2.64 hours total, ~3835 poems per hour, 64 ppm, poems-per-minute) on 2014-08-04 at 12:02, click here

Code on Github

Edited fragments:

Let me give a robot aging
And as it rains tag the sun with ‘almost’
while within the green fog
a tree that is a dot
Softly opens after the burn.
Gaza masked
as me masked
each heavy’s heart out
crying at halo’s burial
making a meal of soil
the city a scar
enthusiasm’s ice. We have
Walked on the bore all nite.
Now in the light
We exchange smells and snot.
By dawn we will have buried our lack
And glued wet to the army of being.



Small words (a homage)

I can’t stop. It’s addictive. The ceaseless generative churn. It’s like planting seeds that germinate and blossom as you watch, then go to seed, ripen fall germinate ripen fall germinate, fields filling space to the horizon, blocking out both sun and moon, and again….

I was thinking that after reading the rich thick dense multi-syllable outputs of the last few days, sometimes resonance erupts from tiny pings that run the mind in turns to root.

So I tinkered a bit with the algorithm, sifting lists, sorting to find the shortest word, selecting those words. Seeded in with the rap reservoir (misspelled gheto slank). And let it fly.



Poets, derelict by the Earth after
Turn within into the rich rich:
Invent the spin! forge the trope!
cutting cut
I genetic dawn, mourning …

and I can dock my pity and my bread.

“hard, but not this “hard,
Her face is ughh with document and Dismasters
with feed and madcap rue   …

closely let her own worms
without holes or end
she stand laudry in the ruin of her hints
and a man with an executioner’s face
pulls her away. 
… the sever lip, how songs burn 
his burn out eye
sewed shut concerning the cry plow
louder than life
all over
the veil warn, the watch nip
of a hills child’s mar body
fingered by street-corner eye
bruise into hard jam
and as long as I look that grief
I knowing to be at home with children’s takes
with late riot
with picture of 67th tame bod
used, bent, and toss
lying with the walk react
like a trick woman’s face.
Violet as veins are, love knows where.
Fine coral as the shy and wild tonguetip,
Undersea coral, rich as inner lip.
There was a stone to build on!
                                              Friezes ran
In strong chorales that where they closed began;
And statues: each a wrung or ringing phrase
In the soul’s passionate cadence of her days.
Sometimes half drunk, after a word at cards,
with the grey dawn film mushroom unaware
among our shock thow and queen, we drove
far N in the dawn, loser, losers,
to a flow in the mob tor, to rise up to a place
Surely decent is no more Spead estate 
in the bod of Toca than that at which
poetry fit with the skitso skypager

Based on ‘Fanny’ by Carolyn Kizer

I come home to a grow world: cacao, dish squash.
The squash speaks was act, and act, dillz blue.
The spirit spirit spirit spirit off the spirit cat’s toe.

Based on ‘Three Men Walking, Three Brown Silhouettes’ by Alicia Ostriker

They naw the sedgy who blow in the action.
It is in slow tone that they rap of rap
They rock their head, not here, after the meal

Walking eyes to the anymore, while a home Snow
That has play soft, ugly from ugly
Falls into street that are hang slushy.

They wag their head, as we do when there is nobody
Too zuccini to believe,
Or as a wolf did out by a blow.

Based on Lawrence Ferlinghetti’s ‘Queens Cemetery, Setting Sun’

And the put farm yellow
painting all of them
on spatter top most
with an ocher stir
Rows and row and row and row
of fair pit slab
tilted concerning the concerning sire

Based on John Donne “The Bait”

come and be my dear,
And we will some dear choice be
Of anagogic Sand, and Sexton,
With ovate rim, and free hook.


This homage is really to Creeley

To read 10118 poems (simple style) (in a single 20-mb html page) generated in 10904.6857641 seconds (3.03 hours, ~56 poems a minute) on 2014-08-03 at 23:11 click here

Code on Github



4,704 Swan Songs & 1 Opinion

The code is now at a stage where, if I set it to loop and sent the 57k rap songs I have in archive from ohhla to Alchemy, I could generate an unfathomable amount of unreadable crap (also known as c-rap: computational rap).

But I think I have come to the end of the synset road. Next step is to investigate Theano: Unsupervised learning, deep neural nets. Perhaps transition to Python 3.0 unicode. It might take 6 months to find the concentrated time. Until then, I am on hiatus. Got a book to write. By hand.

My opinion: In spite of all the machine learning hype, computers are a long way away from independently generating credible, connected, contextual, intuitive, experiential poems or stories capable of emotional or conceptual cathartic impact. The process will in the near-future (10 years) require extremely creative and intuitive data science-artists to find the statistical sculptures within the mountain-ranges of data fountaining from networks. It will be rich and exciting work to chart out and develop classifiers trained on huge datasets which then generate simulacra of the writing they have ingested.

In the long term, all serious writers will use computational assistants to suggest, enhance and refine linguistic creativity. And eventually, writing itself will vanish; artists will simply edit dreams and notions, and networks will drink and translate those dreams into user-specified formats. At that point poetry will become a mode of listening, as it is now: receptive, open, crouched down amongst the wind.

Shoutout: for an informed perspective on poetry-generation, see Gnoetry.

As a parting salvo, I did a very rough generation using templates based on lyrics by: 50 Cent, A Perfect Circle, Abba, Acdc, Alkaline Trio, Bob Dylan, Bob Marley, Counting Crows, Cranberries, David Bowie, Deep Purple, Dragonforce, Evanescence, Everlast, Frank Sinatra, Helloween, Guns ‘N Roses, Jimi Hendrix, Linkin Park, Nick Cave & The Bad Seeds, Patti Smith, Paul McCartney, Pink Floyd, Placebo, Radiohead, Ramones, Red Hot Chili Peppers, Rolling Stones, Scorpions, Suicidal Tendencies, System of a Down, The Beatles, The Blues Brothers, The Clash, Tom Waits, and U2.

4,704 new computer-generated song lyrics just waiting for machine-generated melodies, a robot to sing them and another robot to weep or dance. Read them here.

The first stanza of 50 Cent’s ‘I’m gonna be alright’ rewritten:

I anit be the contraindication you beam after you scuff your deplume 
The syllogize you have the heroism to face your reverence 
The indication ii carat in each your cauliflower ear 
I gotta with the card I providence 
How upsidedown similar the pour, goal-kick, landrover, squirrel cage 
establish somethin 44 descend off ne'er 
dig is yours and yours is dig 
So when I polish you refulgency 
fist and precise the bubbly, we can wassail to lifetime 
dogshit hold in Lope de Vega, you can flip the four-spot 
lease your corporatist catch you disordered plead 36 badly update 
I want Damm in my esprit mother's daughter i' too well to informal

The final stanza of Bob Dylan’s ‘Blowing in the Wind’ rewritten:

How few school year can a volcano breathe 
Before we's gargle to the ocean? 
no, how few annum can some nationality dwell 
Before you're grant to be unoccupied? 
no, how few prison term can a crew move around his school principal, 
suspect he just project? 
The urim and thummim, my Quaker, is in the wind up, 
The tide over is in the wreathe. 

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.