This visualisation scrapes articles from theguardian.com and displays them in a grid, arranged so that articles on related subjects appear near one another.

Firstly, the articles are analysed for content and flattened to two dimensions using principal components analysis (PCA), with every word from the headlines, standfirsts and tags used as a feature. The resulting positions are then skewed further so that the articles fall on a regular 2D grid. Secondly, and separately from the PCA, K-means clustering is applied to the articles, assigning a 'topic' to each one. To see these topic assignments, select 'Topic' as the colour source (this is the default setting). Each topic, as determined by K-means, is shown in a different colour, and the colours approximately align with the PCA-driven layout. Articles can also be coloured by popularity (based on Facebook share data) or by recency.
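As a rough illustration of the two steps above, here is a minimal sketch using only NumPy. It is not the code behind this visualisation: the sample texts, the function names, the choice of three clusters and, in particular, the sort-based method of snapping points to a grid are all assumptions made for the example.

```python
import numpy as np

# Toy stand-ins for each article's headline, standfirst and tags (hypothetical data).
docs = [
    "election vote parliament minister",
    "minister parliament budget vote",
    "football goal league match",
    "match league striker goal",
    "climate emissions carbon energy",
    "energy carbon climate policy",
]

def bag_of_words(texts):
    """Build an article-by-word count matrix, using every word as a feature."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for row, t in enumerate(texts):
        for w in t.lower().split():
            X[row, index[w]] += 1
    return X

def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):              # leave empty clusters where they are
                centres[j] = X[labels == j].mean(axis=0)
    return labels

def snap_to_grid(coords, n_cols):
    """Assumed grid method: sort by x into columns, then by y within each column."""
    n = len(coords)
    n_rows = -(-n // n_cols)                     # ceiling division
    order_x = np.argsort(coords[:, 0])
    cells = np.zeros((n, 2), dtype=int)
    for col in range(n_cols):
        members = order_x[col * n_rows:(col + 1) * n_rows]
        members = members[np.argsort(coords[members, 1])]
        for row, idx in enumerate(members):
            cells[idx] = (row, col)
    return cells

X = bag_of_words(docs)
coords = pca_2d(X)                 # 2D layout from PCA
labels = kmeans(X, k=3)            # topics, computed separately from the PCA
cells = snap_to_grid(coords, n_cols=3)

for text, label, (row, col) in zip(docs, labels, cells):
    print(f"topic {label}  cell ({row}, {col})  {text}")
```

Note that K-means runs on the full word-count matrix rather than on the 2D coordinates, which mirrors the description above: layout and colour come from two independent analyses, so they agree only approximately.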

All analysis is performed in Python, and the visualisation is built with the D3.js library.

Hover over the visualisation

Click on a block to open the article in a new window. All stories and data from theguardian.com.