Screencast 2017-12-15 14:21:42 [ Averaged Stochastic Gradient Descent with Weight Dropped QRNN Poetry Generation with Random Seed ]

Got to messing around a bit more with awd-lstm-lm … and modified the poetry generation loop so that it reloads the model with a new seed each iteration, using the reload time as a tiny organic delay. The result is like a poem flick: every 300 ms (on average, on this machine) the system generates 222 words.
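A minimal sketch of that loop, assuming a toy word list in place of the trained model. The names `load_model`, `generate_burst` and `poem_flick` are hypothetical and are not taken from awd-lstm-lm; only the shape of the loop (reload, reseed, emit 222 words) follows the description above.

```python
import random

# Hypothetical stand-in vocabulary; the real system samples from a trained network.
VOCAB = ["rain", "light", "sea", "throat", "smoke", "garden", "voice", "snow"]

def load_model():
    """Placeholder for reloading a checkpoint from disk (assumed to be the slow step)."""
    return VOCAB

def generate_burst(seed, n_words=222):
    """Reload the 'model', reseed, and emit one burst of n_words words."""
    model = load_model()        # reloading each iteration supplies the small delay
    rng = random.Random(seed)   # a fresh seed sends each burst down a different path
    return " ".join(rng.choice(model) for _ in range(n_words))

def poem_flick(n_bursts=3, n_words=222):
    """Yield (seed, burst) pairs, drawing a new random seed for every burst."""
    for _ in range(n_bursts):
        seed = random.randrange(2**32)
        yield seed, generate_burst(seed, n_words)

if __name__ == "__main__":
    for seed, burst in poem_flick():
        print(f"--- seed {seed} ---")
        print(burst)
```

Keeping the seed alongside each burst means any fragment worth salvaging can be regenerated later from the same checkpoint.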

Is it presumptuous to call these 222-word bursts poetry?

A talented intellectual friend refers to them as nonsense. It’s an assessment I can accept. But within these nonsensical rantings are such lucid hallucinatory fragments that the act of writing poetry under such circumstances (rather than waiting for wings of inspiration, or the tickle of some conceptual tongue) becomes more geological, an act of patient sifting, weaving dexterity applied to the excess, editing/panning for nuggets among an avalanche of disconnected debris.

If some nuance of intimacy is buried in the process so be it; the muses are (often) indifferent to the sufferings of those who sing the songs; these epic sessions in all their incoherence signal an emergent rupture in the continuum of creativity.

Yet the lack of coherence does also signal limit-case challenges to so-called deep learning: context and embodiment. Poems are creatures that encompass extreme defiant agility in terms of symbolic affinities, yet they also demonstrate embodied coherence, swirl into finales, comprehend the reader. Without the construction of a functional digital emulation of emotional reasoning (as posited by Rosalind Picard and Aaron Sloman among others) that is trained on a corpus derived from embodied experience, such poetry will remain gibberish, inert until massaged by the human heart. So it is.

These poems will be used as the source text for January’s RERITES.



generated-2017-12-15T12-29-56_Screencast 2017-12-15 14:21:42_CL



Averaged Stochastic Gradient Descent with Weight Dropped QRNN Poetry Generation [Screencast 2017-12-14 15:23:19]

Trying out a new algorithm: using another mildly revised training corpus

Sources: a subset of Poetry Magazine, Jacket2, Bob Marley, Bob Dylan, David Bowie, Tom Waits, Patti Smith, Radiohead, 2 River, Capa, Evergreen Review, Cathay by Li Bai, Kenneth Patchen, Maurice Blanchot, and previous Rerites.

Same crappy results. But boy does it churn out verses quickly: 2651 poems in 16m09s (approx 2.75 poems per second, each poem is 88 words long).
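The throughput figure can be checked with a few lines of arithmetic. This sketch uses only the numbers quoted above; the words-per-second value is derived, not measured.

```python
# Figures quoted in the post.
poems = 2651
elapsed_s = 16 * 60 + 9          # 16m09s expressed in seconds
words_per_poem = 88

poems_per_s = poems / elapsed_s
words_per_s = poems_per_s * words_per_poem

print(f"{elapsed_s} s elapsed")
print(f"{poems_per_s:.2f} poems per second")
print(f"{words_per_s:.0f} words per second")
```

The result lands at roughly 2.7 poems per second, in line with the approximate rate given in the post.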

Pause video below to read.

TEXT file: generated-2017-12-14T15-07-07_Screencast 2017-12-14 15:23:19


2017-12-11 11:52:45 [SUBEST4+JACKET2+LYRICS]

Using a mildly revised (cleaner leaner) corpus

AND … dropout=0.65 (to prevent overfitting)
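For readers unfamiliar with the setting: dropout randomly silences a fraction of activations during training so the network cannot lean on memorized paths. Below is a minimal pure-Python sketch of the common "inverted" variant, which rescales the survivors; the function name and list-based interface are illustrative assumptions, not the code used for this model.

```python
import random

def dropout(values, p=0.65, rng=None, training=True):
    """Inverted dropout: zero each value with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)      # at generation time nothing is dropped
    rng = rng or random.Random()
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]

if __name__ == "__main__":
    activations = [1.0] * 10
    print(dropout(activations, p=0.65, rng=random.Random(0)))
```

With p=0.65 about two thirds of the units go silent on every training step, which is why verbatim recall of the corpus becomes much harder.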

jhave@jhave-Ubuntu:~/Documents/Github/pytorch-poetry-generation/word_language_model$ python --checkpoint=models/2017-12-11T06-42-23_dec_rerites_SUBEST4+JACKET2+LYRICS/ --cuda --words=600 --data=data/dec_rerites_SUBEST4+JACKET2+LYRICS

Generated: Screencast 2017-12-11 11:52:45_SUBEST4+JACKET2+LYRICS

SUBSETS: Small Data (Repeat Mimesis provoked by lack of Dropout)

So after last week’s disastrous expansion, I decided to get austere: implement strategic pruning and reduce the corpus.

Selections were integrated from Bob Dylan, Tom Waits, Patti Smith, 2 River, Capa, Evergreen Review, Tumblr, Cathay by Li Bai, Kenneth Patchen, Maurice Blanchot, and previous Rerites.

Because the training occurred without dropout, the model is overfit and a lot of direct quotations seep into the output.

Text: Screencast 2017-12-10 14:22:56_SUBSET2

Text: Screencast 2017-12-09 19:31:37

Errors on the Path

What follows documents futility. Effort expended for nothing.

It began when I (greedy for a more diverse vocabulary) expanded the corpus somewhat erratically.

Added my tumblr blog (for contemporary tech words), then Krasznahorkai’s War And War, Maurice Blanchot’s The Space of Literature, some Kenneth Patchen, and a set of contemporary poetry syllabus packages for added breadth.

Corpus swelled to 65.3 MB.

Tried it on WaveNet. After 24 hours got gibberish, no idea why, convergence not occurring, spastic spikes of loss disrupting system.

So shifted back to PyTorch.
And with an embedding size of 1500 it began crashing:

RuntimeError: cuda runtime error (2) : out of memory

Eventually reduced the embedding and hidden sizes to 500. It runs.

jhave@jhave-Ubuntu:~/Documents/Github/pytorch-poetry-generation/word_language_model$ python --cuda --data=data/dec_rerites --emsize=500 --nhid=500 --dropout=0.65 --epochs=80 --tied
INITIALIZING Directory: models/2017-12-03T18-34-4

Even with cuda enabled on a TitanX GPU, training is achingly slow: 17,811 batches at about 500ms per batch + validation time means a single epoch takes more than 3 hours to complete. It needs maybe 40? 60? 100? epochs to arrive anywhere interesting?
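The arithmetic behind that estimate, as a rough sketch. The batch count and per-batch time come from the paragraph above; the 3-hour epoch is the lower bound stated there, and the epoch counts are the guesses floated in the text, not measured requirements.

```python
batches_per_epoch = 17_811
seconds_per_batch = 0.5          # "about 500ms per batch"

# Pure training time per epoch, before validation is added.
train_hours = batches_per_epoch * seconds_per_batch / 3600
print(f"training only: {train_hours:.2f} hours per epoch")

# The post reports more than 3 hours per epoch once validation is included.
observed_epoch_hours = 3.0
for epochs in (40, 60, 100):
    days = epochs * observed_epoch_hours / 24
    print(f"{epochs} epochs: at least {days:.1f} days")
```

Even the most optimistic guess of 40 epochs works out to about five days of continuous GPU time, which matches the waiting described next.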

Wait a couple days. Sigh. Still not finished.

Wait another 2 days. Sigh. Still not finished.

Stop it anyway.

Run it.

First thing I notice is, it’s very slow. Big corpus slows down generation times as well as training time. Second thing: it’s not appreciably better. Third, the contemporary lexicon (scientific, net-based) that I had hoped to induce into the poetry, massaging it forward from the 15th century toward the 21st, is imperceptible.

Result: throw it all away. Start again. Reduce corpus size to 2.8 MB and run with a hidden size of 2000. Wait some more… Am waiting now ….

Rerites (6 months! 6 books!)

Poetry books.  One a month.

Generated by a computer. Edited by a human.

Project duration: 05.2017 – 05.2018

6 books completed in 6 months.

Rerites October 2017 just published today: preview online (entire text) or order copy available on Blurb.

162 pages, 8000 words.


All That I Know For All That Is Left

I know your blue throat, 
 writing as you sleep. 
Smoke-wreaths howling 
 over the hot sea 
 to see my voice 
 fall so near your feet.

To see the wild light
 as rain breaks
 into thin snow 
 as i pause
 on the stairs 
 above the kitchen.

These are minutes to worry 
 about. The rain caught 
 rushing at windows, begging 
 to hear the garden in the sun.


Added 5 months of Rerites into the training corpus.

Will use as source text for November Rerites.

pytorch-poet_Screencast 2017-10-30 23:38:22_OCTOBER-Rerites_source

To read other Rerite months, visit

BRERIN: A PhilosoBot (at random temperatures for 2 hours)

BRERIN : A Philosobot: Trained on the collected book-length works of Erin Manning and Brian Massumi: Producing single sentences for 2 hours and 2 minutes at random temperatures: Temperature is a hyperparameter of neural nets that influences randomness: Think of it as complexity fluctuation.

~ + ~

BRERIN is a homage to a sustained diligent fertile intellectual oeuvre.

Erin Manning and Brian Massumi are thinkers; they operate within diverse terrains (radical empiricists, speculative pragmatists, process philosophers); they utilize language to explore cultural thought as a process; they co-direct the SenseLab. I am grateful to them for the generosity of inviting me to explore their work with machine learning and donating their writings to this process. As they write:

“The SenseLab does not exist as such. It is not an organization. It is not an institution. It is not a collective identity. It is an event-generating machine, a processual field of research-creation whose mission is to inside itself out. Its job is to generate outside prolongations of its activity that ripple into distant pools of potential.” — Thought in the Act

~ + ~

BRERIN generates text which reflects the vocabulary and cadence of its origin. It operates as a container for modes of idiomatic discourse. Yet it is also an artefact of contemporary deep learning, utterly lacking in subtle contextuality or genuine cognition.



Library: PyTorch

Mode: GRU
Embedding size: 2500
Hidden units per layer: 2500
Batch size: 20

Epoch: 69
Loss: 0.71

Temperature range: 0.25 to 1.25
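A sketch of what "random temperatures" might look like in code, assuming one uniform draw from the quoted range per sentence (the post does not say how the temperatures were chosen). The vocabulary and scores are hypothetical toy values; the real model produces scores over its full vocabulary.

```python
import math
import random

def sample_index(logits, temperature, rng):
    """Sample one index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                                  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def random_temperature(rng, low=0.25, high=1.25):
    """Assumed scheme: one uniform draw from the quoted range per sentence."""
    return rng.uniform(low, high)

if __name__ == "__main__":
    rng = random.Random(0)
    vocab = ["event", "field", "process", "body", "relation"]   # hypothetical toy vocabulary
    logits = [2.0, 1.0, 0.5, 0.2, 0.1]                          # hypothetical scores
    for _ in range(3):
        t = random_temperature(rng)
        words = [vocab[sample_index(logits, t, rng)] for _ in range(8)]
        print(f"T={t:.2f}: {' '.join(words)}")
```

Near 0.25 the sampler keeps returning the highest-scoring word; near 1.25 the rarer words surface more often, which is the "complexity fluctuation" mentioned above.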

~ + ~

TEXT: BRERIN_2h02m_07092017_longSentence

~ + ~


Incremental TEMPERATURE Increases (Seed-text for Sept RERITES)

In the egg, the child is simple. A wet light. Lurching.

Body is wind. The sun in the sea.

Then as if more, motions, the shadows of trees.

The ineluctable diffusion of randomized complexity.

Drainfgdl gsod. Trainins spekcled!


Poetry evolves as language in organisms, from simple to complex, from simile and homily to histrionics. Increments in the temperature of a neural net model simulate time.

For high temperatures (τ → ∞), all actions have nearly the same probability, and the lower the temperature, the more expected rewards affect the probability. For a low temperature (τ → 0⁺), the probability of the action with the highest expected reward tends to 1.
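Both limits can be seen numerically with a temperature-scaled softmax. This is a minimal sketch with hypothetical scores, not the generation code used for the screencast; the stepped list of τ values only illustrates what incremental increases do to the distribution.

```python
import math

def softmax_with_temperature(scores, tau):
    """Return probabilities proportional to exp(score / tau)."""
    scaled = [s / tau for s in scores]
    peak = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    scores = [3.0, 1.0, 0.2]                    # hypothetical scores for three words
    for tau in (0.1, 0.5, 1.0, 2.0, 100.0):     # incrementally hotter
        probs = softmax_with_temperature(scores, tau)
        print(f"tau={tau}: {[round(p, 3) for p in probs]}")
```

At the cold end almost all probability sits on the top-scoring word; at the hot end the three words become nearly equiprobable, which is where the misspelled, speckled lines come from.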

~ + ~

Text: Screencast 2017-08-30 00:00:40_incrementalTEMPERATURE_PoetryPytorch

~ + ~

This source will become the first seed-text for September’s RERITES.

BRERIN (Sense Lab Philosobot – Ep69 )


A Philosobot:
Trained on the collected book-length works
of Erin Manning and Brian Massumi

Neural nets learn how to write by reading.
Each reading of the corpus is called an epoch.

This neural net read all the collected book-length works
of Erin Manning and Brian Massumi
69 times (in approx 8 hours
using a TitanX GPU).

+ ~ +

Now it writes 70-word segments (that end in a sentence).
Matching as best it can the vocabulary and cadence of the corpus.

It cannot match the thought, but reflects a simulacrum of thought:
the thought inherent within language, within reading, within writing.
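One way to get segments of about 70 words that still end on a sentence, sketched under an assumption: generation continues past the 70th word until the first word that closes a sentence. The post does not describe how the cut is actually made, and `take_segment` is a hypothetical helper.

```python
def take_segment(words, min_words=70, enders=(".", "!", "?")):
    """Return the shortest prefix of at least min_words words that ends a sentence."""
    segment = []
    for word in words:
        segment.append(word)
        if len(segment) >= min_words and word.endswith(enders):
            break                # stop at the first sentence end past the minimum
    return segment               # if no sentence end appears, the whole stream is returned

if __name__ == "__main__":
    stream = ["thought"] * 69 + ["moves", "through", "relation.", "Again"]
    segment = take_segment(stream)
    print(len(segment), segment[-1])
```

Any iterable of words works as input, including a generator that samples from a model one token at a time.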


Library: PyTorch


Mode: GRU
Embedding size: 2500
Hidden units per layer: 2500

Batch size: 20
Epoch: 69

Loss: 0.71

~ + ~

Text: 25-08-2017_Epoch69_Temperature0p95_1h39m

~ + ~

BRERIN (Epoch 39)

Epoch 39 is a roughly fermented gated recurrent unit (GRU) network that exemplifies the rough parabolic deflection contours of Sense Lab discourse.

jhav:~ jhave$ cd /Users/jhave/Desktop/github/pytorch-poetry-generation/word_language_model

jhav:word_language_model jhave$ source activate ~/py36 
(/Users/jhave/py36) jhav:word_language_model jhave$ python --checkpoint=/Users/jhave/Desktop/github/pytorch-poetry-generation/word_language_model/models/2017-08-22T12-35-49/

System will generate 88-word bursts, perpetually, until stopped.


A Philosobot:
Trained on the collected book-length works of Erin Manning and Brian Massumi

+~+Library: PyTorch+~+

Mode: GRU
Embedding size: 2500
Hidden units per layer: 2500
Batch size: 20
Epoch: 39
Loss: 1.59

Please be patient.



Text : Screencast_SL_BE_Epoch39_24-08-2017_16h12_1h04m_model-GRU-emsize-2500-nhid_2500-nlayers_2-batch_size_20-epoch_39-loss_1.59-ppl_4.90

For the tech-minded, let it be noted: this is an overfit model. While overfitting is taboo in science, it is a creator of blossoms in natural language generation. The texture of actual units of source text sutured into a collagen of authenticity.

Specifically: I used all the text sources in the training data, and basically did not care about the relevance or size of the test or validation data. The embedding size is made as large as the GPU will tolerate. Dropout is high so it gets confused.

Basically, for a deep learning expert, the loss and perplexity values are invalid or, to put it crudely, bullshit. Yet the texture of the language generated is superior.
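Whatever their worth as a measure of generalization, the two numbers are tied together by a simple relation: perplexity is the exponential of the average cross-entropy loss (assuming the loss is reported in nats, as word-level language model scripts commonly do). A short check against the values quoted in these posts:

```python
import math

def perplexity(loss):
    """Perplexity as the exponential of the average cross-entropy loss (in nats)."""
    return math.exp(loss)

if __name__ == "__main__":
    for label, loss in (("epoch 39", 1.59), ("epoch 69", 0.71)):
        print(f"{label}: loss {loss:.2f} -> perplexity {perplexity(loss):.2f}")
```

The epoch 39 value reproduces the ppl_4.90 in the screencast filename above. A perplexity of 4.90 means the model is, on average, about as uncertain as if it were choosing among five equally likely next words, which is unusually low for word-level text and consistent with the overfitting described here.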

Consider the analogy of training a child to read and write: does the wise teacher keep back part of the corpus of knowledge, or does the teacher give all to the student?

Brerin may have many moments of spasmodic incoherence, yet at an idiomatic cadence and vocabulary level the texts recreate the dexterity and delirium intensities of the source fields. In essence, reflecting the vast variational presence of both Erin and Brian. This bot is a homage to their massive resilient oeuvre.