I am a Swedish computer science and media technology student who loves working on exciting projects involving visualization, software development, human-computer interaction and education.

This blog contains random thoughts and some notes from the development of Continuous, a modern WebGL-based math visualization tool that helps students learn algebra and calculus.

emil axelsson

January 8 2014

Castafiore - an audio visualizer for Spotify

In the fall of 2013, Spotify released a new Audio API that lets you create apps within Spotify that make use of the waveform and the spectrum of the currently playing song. I wanted to try it out and see if I could create something similar to the old visualizations from Winamp and Windows Media Player in the early 2000s (while also trying to make something new and interesting).

Luckily, Spotify supports WebGL content inside third-party apps, which makes it possible to take advantage of the computational power of modern graphics hardware.

Meet Madame Castafiore! Lately I have been naming my hobby projects after Tintin characters, and since this one is very much about music, I had no option but to borrow the name from The Milanese Nightingale.

The idea

A good friend of mine recently made a painting by letting drops of paint run down a canvas (not an HTML5 canvas, I am talking about the real deal now). I thought it would be cool to create a Spotify app that automatically splatters paint on a surface according to the dynamics of the playing song. The paint should then blend with other colors while flowing down the surface.

How to do it?

I found this paper on real-time simulation of water drops on the GPU. To simulate water drops, the authors suggest using a texture where each pixel's intensity represents the amount of water in the corresponding area. Once a pixel's intensity exceeds a certain limit, the gravitational force on the water starts overcoming the frictional force, and the water starts flowing down the surface, leaving traces behind it. The more water, the more speed.
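To make the rule concrete, here is a minimal CPU sketch of one simulation step. The real thing runs as a GLSL fragment shader ping-ponging between two textures, and the parameter names and values here (FLOW_THRESHOLD, TRACE_RATIO) are my own illustrative assumptions, not the paper's:

```js
const FLOW_THRESHOLD = 0.4; // gravity beats friction above this amount
const TRACE_RATIO = 0.1;    // fraction of the moving water left as a trace

// water is a Float32Array of size*size cells; returns the next state.
function step(water, size) {
  const next = new Float32Array(water); // copy current state
  for (let y = 0; y < size - 1; y++) {
    for (let x = 0; x < size; x++) {
      const i = y * size + x;
      const w = water[i];
      if (w > FLOW_THRESHOLD) {
        // Move the excess above the threshold one cell down, so cells
        // holding more water move more of it per step ("more speed").
        const moving = (w - FLOW_THRESHOLD) * (1 - TRACE_RATIO);
        next[i] -= moving;
        next[i + size] += moving; // the cell directly below
      }
    }
  }
  return next;
}
```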

The paper also presents a way to render the water in an additional render pass, by calculating its surface normals and deriving the directions of reflection and refraction. I stole many of these ideas right away and implemented them in JavaScript and GLSL.
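The gist of that pass, again sketched in plain JavaScript rather than the GLSL it actually runs in: treat the water texture as a height field, get a normal from finite differences, and reflect the view direction around it. heightAt is an assumed lookup function, and the formula matches GLSL's built-in reflect():

```js
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const normalize = (v) => {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
};

// Normal from the height field via central differences; the constant z
// component controls how bumpy the surface appears.
function surfaceNormal(heightAt, x, y) {
  const dx = heightAt(x + 1, y) - heightAt(x - 1, y);
  const dy = heightAt(x, y + 1) - heightAt(x, y - 1);
  return normalize([-dx, -dy, 1.0]);
}

// Reflection of a view direction d around a normal n: r = d - 2(d.n)n.
function reflect(d, n) {
  const k = 2 * dot(d, n);
  return [d[0] - k * n[0], d[1] - k * n[1], d[2] - k * n[2]];
}
```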

What about color?

I wanted my drops to have colors, so I needed to come up with a way for the simulation to take this information into account as well. My first approach was to store the amounts of cyan, magenta and yellow pigment in the texture's r, g and b channels, and the amount of water in the alpha channel. When blending colors I would add up the pigments, and before rendering I would convert from CMY to RGB. However, this makes it impossible to represent white paint: the CMYA components (0, 0, 0, 1) would naturally mean no pigment and a lot of water.
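A small sketch of that first model, with the blend and conversion written out (the clamping to 1 is my own assumption about how pigment saturation would be handled):

```js
// Pigments add up when drops blend; water accumulates in the alpha slot.
function blendCMYA(a, b) {
  return [
    Math.min(a[0] + b[0], 1), // cyan
    Math.min(a[1] + b[1], 1), // magenta
    Math.min(a[2] + b[2], 1), // yellow
    a[3] + b[3],              // water
  ];
}

// Subtractive to additive color: rgb = 1 - cmy.
function cmyToRGB([c, m, y]) {
  return [1 - c, 1 - m, 1 - y];
}

// The problem: zero pigment renders as white, but a pixel holding
// (0, 0, 0, 1) is really just clear water, so white paint has no encoding.
cmyToRGB([0, 0, 0]); // => [1, 1, 1]
```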

I would need to store both absorption and reflectance for the three color channels in order to allow for black, white and transparent drops. But since a texture in WebGL can only store four values per pixel, and multiple render targets are not yet supported in the WebGL standard, I decided to use a simpler model where I assume the pigment concentration to be constant. This means that transparent water can no longer be represented. I consider it a valid tradeoff, since white paint will probably look more interesting than transparent water in an audio visualization. When blending colors, I simply calculate an average of the colors being blended, weighted by the contribution of each color.
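In other words, each pixel now holds an RGB color plus an amount of water, and blending is a weighted average. A minimal sketch, assuming the weight of each source is the amount of water it contributes:

```js
// colors: array of [r, g, b]; amounts: matching array of water amounts.
function blend(colors, amounts) {
  const total = amounts.reduce((sum, w) => sum + w, 0);
  const out = [0, 0, 0];
  colors.forEach((c, i) => {
    const weight = amounts[i] / total; // contribution of this source
    out[0] += weight * c[0];
    out[1] += weight * c[1];
    out[2] += weight * c[2];
  });
  return { color: out, water: total };
}

// Example: a lot of white paint meeting a little red paint gives pink.
blend([[1, 1, 1], [1, 0, 0]], [0.75, 0.25]); // => color [1, 0.75, 0.75]
```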

This is what I had come up with after two days of frenetic coding. The colors here were taken from The Scream by Edvard Munch and blended using the first approach discussed above.

How to drop the drops when the bass drops?

The whole point of an audio visualization is how it reacts to the dynamics of the music. I wanted color splats to pop up when there are large changes in the volume of the song. Through Spotify's Audio API, the application is fed short-time Fourier transforms about ten times per second. By calculating the spectral flux between frames, I could detect distinct volume changes in the audio track.
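Spectral flux is just the sum of the positive differences between consecutive magnitude spectra: energy increases count, decreases are ignored. A sketch, assuming each frame arrives as an array of spectral magnitudes:

```js
// Returns how much the spectrum grew since the previous frame.
function spectralFlux(current, previous) {
  let flux = 0;
  for (let i = 0; i < current.length; i++) {
    const diff = current[i] - previous[i];
    if (diff > 0) flux += diff; // only rising bins signal an onset
  }
  return flux;
}
```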

Some songs are much more dynamic in their volume than others, and it is not uncommon for one part of a song to be much more dynamic than the rest. I found that emitting splats whenever the flux exceeded a fixed threshold would cause the visualizer to go completely bananas during some songs and become totally inactive during others. I solved this with a buffer that keeps track of the most recent flux values, normalizing the current value with respect to the average flux in the buffer. The size of each splat depends on how much the flux exceeds the threshold.
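A sketch of that adaptive threshold. The buffer size, threshold value and emitSplat function are all illustrative assumptions on my part:

```js
const HISTORY_SIZE = 50; // ~5 seconds of frames at 10 per second
const THRESHOLD = 1.5;   // splat when flux reaches 1.5x the recent average
const history = new Float32Array(HISTORY_SIZE);
let frame = 0;

function onFlux(flux) {
  history[frame++ % HISTORY_SIZE] = flux; // ring buffer of recent flux
  const avg = history.reduce((sum, f) => sum + f, 0) / HISTORY_SIZE;
  const normalized = flux / (avg || 1); // avoid division by zero at startup
  if (normalized > THRESHOLD) {
    // Splat size grows with how far the flux exceeds the threshold.
    emitSplat(normalized - THRESHOLD); // emitSplat assumed to exist
  }
}
```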

Colors from the album art

Thanks to Spotify's API, it was easy to fetch the album art of the currently playing song. To make the visualization differ more from song to song, the colors are picked from the cover. At some volume peaks I also make a slight random adjustment to the dark background color, giving the whole thing a little more of a disco feeling.
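One simple way to pick those colors, assuming the cover is available as an Image: draw it onto a small canvas and read pixels back. This is a hypothetical sketch of the idea, not necessarily how Castafiore does it:

```js
// Builds a small palette by sampling random pixels from a downscaled cover.
function paletteFromCover(image, samples = 16) {
  const canvas = document.createElement('canvas');
  canvas.width = canvas.height = 64; // downscale; the exact size is a detail
  const ctx = canvas.getContext('2d');
  ctx.drawImage(image, 0, 0, 64, 64);
  const { data } = ctx.getImageData(0, 0, 64, 64); // RGBA bytes
  const palette = [];
  for (let i = 0; i < samples; i++) {
    const p = 4 * Math.floor(Math.random() * 64 * 64); // random pixel offset
    palette.push([data[p] / 255, data[p + 1] / 255, data[p + 2] / 255]);
  }
  return palette;
}
```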

Listen to Håkan Hellström, or try it yourself!

I made a video of what Castafiore painted while listening to Håkan Hellström's Du är snart där. You can watch the video here or download the source code from my GitHub repo.