Averaged Stochastic Gradient Descent with Weight Dropped QRNN Poetry Generation [Screencast 2017-12-14 15:23:19]

Trying out a new algorithm: https://github.com/salesforce/awd-lstm-lm using another mildly revised training corpus

Sources: a subset of Poetry Magazine, Jacket2, Bob Marley, Bob Dylan, David Bowie, Tom Waits, Patti Smith, Radiohead, 2 River, Capa, Evergreen Review, jhavelikes.tumblr.com, Cathay by Li Bai, Kenneth Patchen, Maurice Blanchot, and previous Rerites.

Same crappy results. But boy does it churn out verses quickly: 2651 poems in 16m09s (approx 2.75 poems per second, each poem is 88 words long).

Pause video below to read.

TEXT file: generated-2017-12-14T15-07-07_Screencast 2017-12-14 15:23:19


2017-12-11 11:52:45 [SUBEST4+JACKET2+LYRICS]

Using a mildly revised (cleaner leaner) corpus

AND … dropout=0.65 (to prevent overfitting)

jhave@jhave-Ubuntu:~/Documents/Github/pytorch-poetry-generation/word_language_model$ python generate_2017-INFINITE-1M_October.py --checkpoint=models/2017-12-11T06-42-23_dec_rerites_SUBEST4+JACKET2+LYRICS/model-LSTM-emsize-2400-nhid_2400-nlayers_2-batch_size_20-epoch_21-loss_3.58-ppl_36.04.pt --cuda --words=600 --data=data/dec_rerites_SUBEST4+JACKET2+LYRICS

Generated: Screencast 2017-12-11 11:52:45_SUBEST4+JACKET2+LYRICS

SUBSETS: Small Data (Repeat Mimesis provoked by lack of Dropout)

So after last week’s disastrous expansion, I decided to get austere: implement strategic pruning and reduce the corpus.

Selections were integrated from Bob Dylan, Tom Waits, Patti Smith, 2 River, Capa, Evergreen Review, Tumblr, Cathay by Li Bai, Kenneth Patchen, Maurice Blanchot, and previous Rerites.

Because the training occurred without dropout, the model is overfit and a lot of direct quotations seep into the output.
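One rough way to put a number on that seepage is to count how many word runs in the output also appear verbatim in the corpus. This is a hypothetical check, not something the scripts in this post are known to do; the names `ngrams` and `quoted_fraction` and the run length of 6 words are assumptions.

```python
def ngrams(words, n):
    """Return the set of all n-word runs in a list of words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def quoted_fraction(generated, corpus, n=6):
    """Share of n-word runs in `generated` that also occur verbatim in `corpus`."""
    generated_runs = ngrams(generated.lower().split(), n)
    if not generated_runs:
        return 0.0  # text shorter than n words: nothing to compare
    corpus_runs = ngrams(corpus.lower().split(), n)
    return len(generated_runs & corpus_runs) / len(generated_runs)

# Toy strings standing in for the real corpus and a generated poem.
corpus = "the river keeps its own slow counsel under the bridge at night"
copied = "The river keeps its own slow counsel under the bridge"
fresh = "a lantern folds the harbour into seven quiet rooms of salt"

print(quoted_fraction(copied, corpus))
print(quoted_fraction(fresh, corpus))
```

A fraction near 1 suggests the model is reciting; a fraction near 0 suggests it is recombining.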

Text: Screencast 2017-12-10 14:22:56_SUBSET2

Text: Screencast 2017-12-09 19:31:37

Errors on the Path

What follows documents futility. Effort expended for nothing.

It began when I (greedy for a more diverse vocabulary) expanded the corpus somewhat erratically.

Added my tumblr blog jhavelikes.tumblr.com (for contemporary tech words), then Krasznahorkai’s War and War, Maurice Blanchot’s The Space of Literature, some Kenneth Patchen, and a set of contemporary poetry syllabus packages for added breadth.

Corpus swelled to 65.3 MB.

Tried it on WaveNet. After 24 hours got gibberish, no idea why: convergence not occurring, spastic spikes of loss disrupting the system.

So shifted back to PyTorch.
And with an embedding size of 1500 it began crashing:

RuntimeError: cuda runtime error (2) : out of memory

Eventually reduced the embedding and hidden sizes to 500. It runs.

jhave@jhave-Ubuntu:~/Documents/Github/pytorch-poetry-generation/word_language_model$ python main_June2017.py --cuda --data=data/dec_rerites --emsize=500 --nhid=500 --dropout=0.65 --epochs=80 --tied
INITIALIZING Directory: models/2017-12-03T18-34-4

Even with cuda enabled on a TitanX GPU, training is achingly slow: 17,811 batches at about 500ms per batch + validation time means a single epoch takes more than 3 hours to complete. It needs maybe 40? 60? 100? epochs to arrive anywhere interesting?
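The arithmetic behind that estimate can be sketched as follows. The batch count and the 500ms figure come from the paragraph above; validation time is not stated anywhere in the post, so it is left as a parameter defaulting to zero, which makes these numbers lower bounds. The function name `training_hours` is just for illustration.

```python
BATCHES_PER_EPOCH = 17_811   # from the post
SECONDS_PER_BATCH = 0.5      # "about 500ms per batch"

def training_hours(epochs, validation_minutes_per_epoch=0.0):
    """Estimated wall-clock hours for `epochs` passes over the corpus."""
    seconds_per_epoch = (BATCHES_PER_EPOCH * SECONDS_PER_BATCH
                         + validation_minutes_per_epoch * 60)
    return epochs * seconds_per_epoch / 3600

for epochs in (1, 40, 60, 100):
    hours = training_hours(epochs)
    print(f"{epochs:>3} epochs: {hours:6.1f} hours ({hours / 24:.1f} days), validation excluded")
```

Even the 40 epoch guess works out to roughly four days of continuous training before validation is counted.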

Wait a couple days. Sigh. Still not finished.

Wait another 2 days. Sigh. Still not finished.

Stop it anyway.

Run it.

First thing I notice is, it’s very slow. Big corpus slows down generation times as well as training time. Second thing: it’s not appreciably better. Third, the contemporary lexicon (scientific, net-based) that I had hoped to induce into the poetry, massaging it forward from the 15th century toward the 21st, is imperceptible.

Result: throw it all away. Start again. Reduce corpus size to 2.8 MB and run with a hidden size of 2000. Wait some more… Am waiting now ….

Rerites (6 months! 6 books!)

Poetry books.  One a month.

Generated by a computer. Edited by a human.

Project duration: 05.2017 – 05.2018

6 books completed in 6 months.

Rerites October 2017 just published today: preview online (entire text) or order copy available on Blurb.

162 pages, 8000 words.


All That I Know For All That Is Left

I know your blue throat, 
 writing as you sleep. 
Smoke-wreaths howling 
 over the hot sea 
 to see my voice 
 fall so near your feet.

To see the wild light
 as rain breaks
 into thin snow 
 as i pause
 on the stairs 
 above the kitchen.

These are minutes to worry 
 about. The rain caught 
 rushing at windows, begging 
 to hear the garden in the sun.


Added 5 months of
Rerites http://glia.ca/2017/rerites/
into the training corpus.

Will use as source text for November Rerites.

pytorch-poet_Screencast 2017-10-30 23:38:22_OCTOBER-Rerites_source

To read other Rerite months, visit http://glia.ca/2017/rerites/

BRERIN: A PhilosoBot (at random temperatures for 2 hours)

BRERIN : A Philosobot: Trained on the collected book-length works of Erin Manning and Brian Massumi: Producing single sentences for 2 hours and 2 minutes at random temperatures: Temperature is a hyperparameter of neural nets that influences randomness: Think of it as complexity fluctuation.

~ + ~

BRERIN is a homage to a sustained diligent fertile intellectual oeuvre.

Erin Manning and Brian Massumi are thinkers; they operate within diverse terrains (radical empiricists, speculative pragmatists, process philosophers); they utilize language to explore cultural thought as a process; they co-direct the SenseLab. I am grateful to them for the generosity of inviting me to explore their work with machine learning and donating their writings to this process. As they write:

“The SenseLab does not exist as such. It is not an organization. It is not an institution. It is not a collective identity. It is an event-generating machine, a processual field of research-creation whose mission is to inside itself out. Its job is to generate outside prolongations of its activity that ripple into distant pools of potential.” — Thought in the Act

~ + ~

BRERIN generates text which reflects the vocabulary and cadence of its origin. It operates as a container for modes of idiomatic discourse. Yet it is also an artefact of contemporary deep learning, utterly lacking in subtle contextuality or genuine cognition.



Library: PyTorch

Mode: GRU
Embedding size: 2500
Hidden size: 2500
Batch size: 20

Epoch: 69
Loss: 0.71
Perplexity: 2.03

Temperature range: 0.25 to 1.25
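A minimal sketch of how a run at random temperatures might pick its setting for each sentence. The range is the one listed above; drawing uniformly, and the name `random_temperature`, are assumptions, since the post does not say how the temperatures were chosen.

```python
import random

TEMPERATURE_MIN, TEMPERATURE_MAX = 0.25, 1.25   # range listed above

def random_temperature(rng=random):
    """Draw one temperature uniformly from the listed range (assumed distribution)."""
    return rng.uniform(TEMPERATURE_MIN, TEMPERATURE_MAX)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
for sentence_number in range(1, 6):
    temperature = random_temperature(rng)
    print(f"sentence {sentence_number}: temperature {temperature:.2f}")
```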

~ + ~

TEXT: BRERIN_2h02m_07092017_longSentence

~ + ~

CODE: https://github.com/jhave/pytorch-poetry-generation/tree/master/word_language_model

Incremental TEMPERATURE Increases (Seed-text for Sept RERITES)

In the egg, the child is simple. A wet light. Lurching.

Body is wind. The sun in the sea.

Then as if more, motions, the shadows of trees.

The ineluctable diffusion of randomized complexity.

Drainfgdl gsod. Trainins spekcled!


Poetry evolves as language in organisms, from simple to complex, from simile and homily to histrionics. Increments in the temperature of a neural net model simulate time.

For high temperatures (τ → ∞), all actions have nearly the same probability and the lower the temperature, the more expected rewards affect the probability. For a low temperature (τ → 0⁺), the probability of the action with the highest expected reward tends to 1.
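The behaviour described above can be seen in a few lines. This is a plain-Python sketch of a softmax with temperature, not the generation script itself; here the "actions" are candidate words, and the word list, the scores and the function names are made up for illustration.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; low temperature sharpens, high flattens."""
    scaled = [score / temperature for score in scores]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

def sample_word(words, scores, temperature, rng=random):
    """Pick one word at random, weighted by the temperature-scaled probabilities."""
    probabilities = softmax_with_temperature(scores, temperature)
    return rng.choices(words, weights=probabilities, k=1)[0]

words = ["light", "sea", "shadow"]   # hypothetical candidate words
scores = [2.0, 1.0, 0.1]             # hypothetical model scores for each word

# Stepping the temperature upward, as in the incremental run.
for temperature in (0.25, 0.5, 1.0, 1.25, 10.0):
    probabilities = softmax_with_temperature(scores, temperature)
    print(temperature, [round(p, 3) for p in probabilities],
          sample_word(words, scores, temperature))
```

At the low end the top-scoring word wins almost every time; at the high end the three words become close to equally likely, which is where the gibberish starts.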


~ + ~

Text: Screencast 2017-08-30 00:00:40_incrementalTEMPERATURE_PoetryPytorch

~ + ~

This source will become the first seed-text for September’s RERITES.

BRERIN (Sense Lab Philosobot – Ep69 )


A Philosobot:
Trained on the collected book-length works
of Erin Manning and Brian Massumi

Neural nets learn how to write by reading.
Each reading of the corpus is called an epoch.

This neural net read all the collected book-length works
of Erin Manning and Brian Massumi
69 times (in approx 8 hours
using a TitanX GPU).

+ ~ +

Now it writes 70 word segments (that end in a sentence).
Matching as best it can vocabulary and cadence of the corpus.

It cannot match the thought, but reflects a simulacrum of thought:
the thought inherent within language, within reading, within writing.
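One possible reading of "70 word segments (that end in a sentence)" is: take up to 70 generated words, then cut back to the last word that closes a sentence. The sketch below follows that reading; the function name `trim_to_sentence` and the set of sentence-ending marks are assumptions, and the actual script may do it differently.

```python
def trim_to_sentence(words, max_words=70, enders=(".", "!", "?")):
    """Keep at most max_words words, cut back to the last sentence-ending word."""
    segment = list(words[:max_words])
    for index in range(len(segment) - 1, -1, -1):
        if segment[index].endswith(enders):
            return segment[:index + 1]
    return segment   # no sentence end found: return the untrimmed window

generated = "Thought moves in the middle. The event folds and then the field".split()
print(" ".join(trim_to_sentence(generated, max_words=10)))
```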


Library: PyTorch


Mode: GRU
Embedding size: 2500
Hidden size: 2500

Batch size: 20
Epoch: 69

Loss: 0.71
Perplexity: 2.03
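Loss and perplexity are two views of the same number: perplexity is the exponential of the average per-word loss. The sketch below assumes the training script reports it that way, which the figures in these posts are consistent with.

```python
import math

def perplexity(loss):
    """Perplexity as the exponential of the average per-word cross-entropy loss."""
    return math.exp(loss)

# Loss values reported for the two BRERIN checkpoints in these posts.
for label, loss in [("epoch 39", 1.59), ("epoch 69", 0.71)]:
    print(f"{label}: loss {loss} gives perplexity {perplexity(loss):.2f}")
```

Roughly, a perplexity of 2 means that at each word the model is as uncertain as if it were choosing between two equally likely options, which on a training corpus this small is a sign of memorization rather than skill.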

~ + ~

Text: 25-08-2017_Epoch69_Temperature0p95_1h39m

~ + ~

BRERIN (Epoch 39)

Epoch 39 is a roughly fermented gated recurrent network (GRU) that exemplifies the rough parabolic deflection contours of Sense Lab discourse.

jhav:~ jhave$ cd /Users/jhave/Desktop/github/pytorch-poetry-generation/word_language_model

jhav:word_language_model jhave$ source activate ~/py36 
(/Users/jhave/py36) jhav:word_language_model jhave$ python generate_2017-SL-BE_LaptopOPTIMIZED.py --checkpoint=/Users/jhave/Desktop/github/pytorch-poetry-generation/word_language_model/models/2017-08-22T12-35-49/model-GRU-emsize-2500-nhid_2500-nlayers_2-batch_size_20-epoch_39-loss_1.59-ppl_4.90.pt

System will generate 88 word bursts, perpetually, until stopped.


A Philosobot:
Trained on the collected book-length works of Erin Manning and Brian Massumi

+~+Library: PyTorch+~+

Mode: GRU
Embedding size: 2500
Hidden Layers: 2500
Batch size: 20
Epoch: 39
Loss: 1.59
Perplexity: 4.90.pt

Please be patient.



Text : Screencast_SL_BE_Epoch39_24-08-2017_16h12_1h04m_model-GRU-emsize-2500-nhid_2500-nlayers_2-batch_size_20-epoch_39-loss_1.59-ppl_4.90
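The checkpoint filename carries the whole configuration of the run. A small sketch of reading it back out, assuming the naming pattern seen in the filenames in this post holds for other checkpoints; the function name `parse_checkpoint_name` is hypothetical. Note that `nhid` is the hidden size (2500) and `nlayers` is the number of layers (2).

```python
import re

# Pattern inferred from the checkpoint names shown in the post.
PATTERN = re.compile(
    r"model-(?P<mode>[A-Za-z]+)"
    r"-emsize-(?P<emsize>\d+)"
    r"-nhid_(?P<nhid>\d+)"
    r"-nlayers_(?P<nlayers>\d+)"
    r"-batch_size_(?P<batch_size>\d+)"
    r"-epoch_(?P<epoch>\d+)"
    r"-loss_(?P<loss>\d+\.\d+)"
    r"-ppl_(?P<ppl>\d+\.\d+)"
    r"(?:\.pt)?$"
)

def parse_checkpoint_name(name):
    """Extract the run settings encoded in a checkpoint filename or path."""
    match = PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognised checkpoint name: {name}")
    fields = match.groupdict()
    settings = {"mode": fields["mode"]}
    for key in ("emsize", "nhid", "nlayers", "batch_size", "epoch"):
        settings[key] = int(fields[key])
    for key in ("loss", "ppl"):
        settings[key] = float(fields[key])
    return settings

name = ("model-GRU-emsize-2500-nhid_2500-nlayers_2-batch_size_20-"
        "epoch_39-loss_1.59-ppl_4.90.pt")
print(parse_checkpoint_name(name))
```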

For the tech-minded, let it be noted: this is an overfit model. While overfitting is taboo in science, it is a creator of blossoms in natural language generation. The texture of actual units of source text sutured into a collagen of authenticity.

Specifically: I used all the text sources in the training data. And basically did not care about the relevance or size of test or validation data. And the embedding size is made as large as the gpu will tolerate. Dropout is high so it gets confused.

Basically, for a deep learning expert, the loss and perplexity values are invalid or, to put it crudely, bullshit. Yet the texture of the language generated is superior.

Consider the analogy of training a child to read and write: does the wise teacher keep back part of the corpus of knowledge, or does the teacher give all to the student?

Brerin may have many moments of spasmodic incoherence, yet at an idiomatic cadence and vocabulary level the texts recreate the dexterity and delirium intensities of the source fields. In essence, reflecting the vast variational presence of both Erin and Brian. This bot is a homage to their massive resilient oeuvre.