t-SNE: Classification of 10,557 poems

Once again: there is much magic in the math. The era of numeration discloses a field of stippled language. Songlines, meridians, tectonics, the soft shelled crab, a manta ray, a flock of starlings.

In the image below, each dot is a poem. Its position is calculated by an algorithm called t-SNE (t-Distributed Stochastic Neighbor Embedding).

[Image: t-SNE map of the 10,557 poems, each dot a poem]

The image above is beautiful, but it’s impossible to know what is actually going on. So I built an interactive version (it’s a bit slow, but it functions…) where rolling over a dot reveals all the poems by that author.

Screengrabs (below) of the patterns suggest that poets do have characteristic forms discernible by algorithms. Position is far from random. Note that the algorithm did not know the author of any of the poems; it was fed only the poems themselves. This is the equivalent of blind taste-testing.

[Slideshow: screengrabs from the interactive t-SNE map]

Still, these images don’t tell us much about the poems themselves, except that they exist in communities. That the core of poetry is a spine. That some poets migrate across styles, while others define themselves by a style. The real insights will emerge as algorithms like t-SNE are applied to larger corpora and allow nuanced investigation of the extracted features: on what criteria exactly did the probabilities grow? What are the two or three core dimensions?

What is t-SNE?

My very basic non-math-poet comprehension of how it works: t-SNE performs dimensionality reduction: it reduces the number of parameters considered. Dimensionality reduction is useful when visualizing data; think about trying to graph 20 different parameters (dimensions) at once. Another technique that does this is PCA: principal component analysis. Dimensionality reduction is in a sense a distillation process; it simplifies. In this case, t-SNE converts ‘pairwise similarities’ between poems into probability distributions, then uses gradient descent to minimize the (mysterious) Kullback-Leibler divergence between the high-dimensional distribution and the low-dimensional map.

To learn more about the Python version of t-SNE bundled into sklearn, read Alexander Fabisch.
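For the curious, here is a minimal sketch of what a pipeline like this might look like with sklearn’s TSNE. The TF-IDF vectorization step, the toy poems, and the parameter values are illustrative assumptions of mine, not necessarily how the 10,557 poems were actually processed:

```python
# Minimal, illustrative sketch: vectorize a few toy poems, then let t-SNE
# squash the many word-weight dimensions down to 2 so each poem is an (x, y) dot.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

poems = [
    "text of the first poem ...",
    "text of the second poem ...",
    "text of the third poem ...",
]

# Each poem becomes a high-dimensional vector of word weights (an assumption;
# any numeric feature representation would work here).
vectors = TfidfVectorizer().fit_transform(poems).toarray()

# Perplexity must be smaller than the number of samples; on a corpus of
# thousands of poems a value around 30-50 would be more typical.
xy = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(vectors)

for poem, (x, y) in zip(poems, xy):
    print(f"({x:7.2f}, {y:7.2f})  {poem[:40]}")
```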

One of the few parameters I bothered tweaking over numerous runs is the (appropriately named) perplexity. In the FAQ, L.J.P. van der Maaten (who created t-SNE) wrote:

 What is perplexity anyway?

Perplexity is a measure for information that is defined as 2 to the power of the Shannon entropy. The perplexity of a fair die with k sides is equal to k. In t-SNE, the perplexity may be viewed as a knob that sets the number of effective nearest neighbors. It is comparable with the number of nearest neighbors k that is employed in many manifold learners.
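To make that definition concrete, here is a tiny worked example of my own (just an illustration of “2 to the power of the Shannon entropy”, not anything from the FAQ):

```python
import math

def perplexity(probs):
    """2 raised to the Shannon entropy (in bits) of a discrete distribution."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy

# A fair six-sided die: every face has probability 1/6, so perplexity is 6,
# exactly as the FAQ says.
print(perplexity([1 / 6] * 6))        # -> 6.0 (give or take floating-point dust)

# A loaded die concentrates the probability, so its perplexity drops:
# fewer "effective" outcomes.
print(perplexity([0.9, 0.05, 0.05]))  # -> about 1.48
```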