Video: Yasuhiro Tani, Courtesy of Yamaguchi Center for Arts and Media [YCAM] / Edit: Qosmo, inc.

“AI DJ Project” is a live performance featuring an Artificial Intelligence (AI) DJ playing alongside a human DJ. Using various deep neural networks, the software (the AI DJ) selects vinyl records and mixes songs. Playing alternately, each DJ selects one song at a time, embodying a dialogue between human and AI through music. DJing “Back to Back” serves as a critical investigation into the unique relationship between humans and machines. The AI DJ system consists of the following three features:

1. Music Selection
We trained three different neural networks to infer the genres, musical instruments, and drum machines used in a track from its spectrogram. The AI DJ "listens" to what the human DJ plays and extracts auditory features using these networks. The extracted features are compared with those of all tracks in our pre-selected record box, so that the system can select the closest one, which presumably has a similar musical tone and mood.

2. Beatmatching
It is also the AI DJ's task to control the pitch (speed) of the turntable to match the beat. We used reinforcement learning (RL) to teach the model, through trial and error, how to speed up or slow down and nudge or pull the turntable to align the downbeats. For this purpose, we built an OSC-compatible custom turntable and robot fingers to manipulate it.

3. Crowd-Reading
A good DJ should pay attention to the energy of the audience. We use a deep-learning-based motion tracking technique to quantify how much people in the audience dance to the music the AI plays, and feed this back into future music selection.

We have performed several times in different locations in Japan and Europe. The AI's slight unpredictability always brought an amusing tension into the performance and gave the human DJs new ideas about what and how to play. AI is not a replacement for the human DJ. Instead, it is a partner that can think and play alongside its human counterpart, bringing forth a wider perspective on our relationship to contemporary technologies.

BACKGROUND

A DJ (or disc jockey) is a person who mixes different sources of pre-existing recorded music, usually for a live audience in a nightclub. Selecting appropriate tracks and mixing them in smooth and pleasant ways is regarded as a highly creative process.

The art of DJing has been one of many testbeds for computational creativity. 'AlgoRhythms' is a Turing test competition in which DJ software mixes given music automatically and tries to convince human evaluators that the mixes were made by human DJs [1]. '2045' is an AI-themed DJ party where each DJ brings a custom DJ algorithm and lets it play in their stead [4].

Unlike these previous attempts, our AI DJ project doesn't aim to automate the whole DJ process, but rather tries to achieve a successful collaboration between the AI and the human DJ. Hence, in our DJ sessions, the software and the human DJ play alternately, one track at a time (a format usually referred to as Back to Back, or B2B).

FEATURES

In B2B, the AI system and the human DJ perform under conditions that are as similar as possible. For example, the AI uses the same physical vinyl records and turntables as the human DJ. The system listens to the track played by the human DJ and chooses the next record to be played. (It is a human assistant's task to find the selected record and place it on the turntable.) Once a record is set, the AI begins the next step: adjusting the tempo of the new track to the tempo of the track played by its human counterpart. The beats of both tracks are matched by controlling the pitch (rotation speed) of the turntable. For this purpose, we built a custom DJ turntable and a robot finger, which can be plugged into a computer and manipulated via the OSC protocol.
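As a rough illustration, the snippet below drives such an OSC-compatible turntable from Python using the python-osc library. The OSC addresses, value ranges, and network parameters are hypothetical placeholders, not the actual protocol of the custom hardware described above.

```python
# Minimal sketch of remote-controlling an OSC-compatible turntable.
# Requires python-osc (pip install python-osc). All addresses and value
# ranges below are hypothetical, not the real protocol of the custom deck.
from pythonosc.udp_client import SimpleUDPClient

TURNTABLE_IP = "192.168.0.42"   # hypothetical controller address
TURNTABLE_PORT = 9000           # hypothetical OSC port

client = SimpleUDPClient(TURNTABLE_IP, TURNTABLE_PORT)

def set_pitch(percent: float) -> None:
    """Set the pitch fader; e.g. +2.0 plays the record 2% faster."""
    client.send_message("/turntable/pitch", percent)

def nudge(duration_ms: int) -> None:
    """Ask the robot finger to push the platter forward briefly."""
    client.send_message("/turntable/nudge", duration_ms)

# Example: raise the speed slightly, then nudge the platter into phase.
set_pitch(1.5)
nudge(120)
```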

1. MUSIC SELECTION
2. BEATMATCHING
3. CROWD-READING


1. MUSIC SELECTION

The minimum requirement for a DJ is to maintain the "flow" of the music, so it is common practice to select a next track that sounds somewhat similar to the one being played, yet introduces something new in its rhythmic structure, sound texture, and so on. DJs also often use the instruments, or sometimes prominent drum-machine sounds, in a track as clues for music selection (e.g., following a track with a piano solo with a track with an organ riff, or pairing two tracks that both feature a Roland TR-808 snare).

Based on these observations, we trained three different neural networks. The models, and the datasets used to train them, are the following:

  • Genre Inference (wasabeat dance music dataset)
  • Instrument Inference (IRMAS dataset)
  • Drum Machine Inference (200.Drum.Machines dataset)

See also: Recommending music on Spotify with deep learning | http://benanne.github.io/2014/08/05/spotify-cnns.html

t-SNE representation of the extracted auditory features
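For reference, a two-dimensional map like the one above can be produced by projecting the per-track feature vectors with t-SNE. The scikit-learn call and the perplexity value below are illustrative choices, not the exact settings behind the figure.

```python
# Sketch: project track feature vectors to 2-D with t-SNE for plotting.
import numpy as np
from sklearn.manifold import TSNE

features = np.random.rand(350, 128)   # stand-in for real track feature vectors
coords = TSNE(n_components=2, perplexity=30).fit_transform(features)
# coords[:, 0] and coords[:, 1] can now be scatter-plotted, one point per track.
```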

Each model is a convolutional neural network similar to [2]; it takes spectrogram images of sounds and infers genres (minimal techno, tech house, hip-hop, ...), instruments (piano, trumpet, ...), or drum machines (TR-808, TR-909, ...).
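As a concrete (if simplified) picture of such a model, here is a minimal PyTorch sketch of a spectrogram classifier in the spirit of [2]. The input size, layer widths, and the choice of PyTorch are assumptions for illustration, not a description of our production models.

```python
# Minimal sketch of a CNN that classifies genre, instrument, or drum
# machine from a spectrogram "image". Architecture details are illustrative.
import torch
import torch.nn as nn

class SpectrogramClassifier(nn.Module):
    def __init__(self, n_classes: int):
        super().__init__()
        # Three conv blocks, each halving the spatial resolution.
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),          # -> (batch, 128, 1, 1)
        )
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, n_frames) log-mel spectrogram
        h = self.features(x).flatten(1)       # (batch, 128) auditory features
        return self.classifier(h)

# One network per task, e.g. a 20-genre classifier:
genre_net = SpectrogramClassifier(n_classes=20)
logits = genre_net(torch.randn(1, 1, 128, 128))  # dummy spectrogram
```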

Once a network is trained, we can use the same model to extract auditory features as a high-dimensional vector. While the human DJ is playing, the system feeds the incoming audio into the model and generates a feature vector. The vector is compared with those of all tracks in our pre-selected record box (currently over 350 tracks), so that the system can select the closest track, which presumably has a similar musical tone, mood, and texture, as the next track to play.
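A simplified version of this selection step might look as follows. Treating the pooled activations of the network above as the feature vector and using cosine distance as the similarity metric are assumptions for illustration.

```python
# Sketch of the selection step: compare the feature vector of the incoming
# audio against precomputed vectors for every track in the record box.
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def select_next_track(incoming: np.ndarray,
                      record_box: dict[str, np.ndarray],
                      exclude: set[str]) -> str:
    """record_box maps track names to precomputed feature vectors."""
    candidates = {name: feat for name, feat in record_box.items()
                  if name not in exclude}      # skip recently played tracks
    return min(candidates,
               key=lambda name: cosine_distance(incoming, candidates[name]))
```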

It is worth noting that we initially collected and analyzed a dataset of DJ playlists (visualized in the image) and used it to select the most likely candidate, as in collaborative filtering. We soon realized, however, that this led to banal music selections, so we decided to ignore all metadata associated with the music (genre, artist name, label, etc.) and focus only on the audio data.


2. BEATMATCHING

The second task for the AI DJ is to control the pitch (speed) of the turntable to match the beat of the music the human DJ is playing. We used reinforcement learning (RL) to teach the model, through trial and error, how to speed up or slow down and nudge or pull the turntable to align the downbeats. We use the metrics in [3] to compute rewards for the model.

We found that it is relatively easy to match the tempo of two tracks, but very difficult to align the "phase" of the beats at the same time because of the long-term dependency involved: the result of any manipulation can only be observed as a change in tempo after several bars. Hence, beatmatching through RL remains an open challenge.
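To make the setup concrete, below is a toy Q-learning sketch of this control loop. The discretized state, the action set, and the simple distance-based reward are illustrative stand-ins: as noted above, the actual rewards were computed from the beat-tracking evaluation measures of [3], and the agent acted on a physical turntable rather than simulated numbers.

```python
# Toy Q-learning sketch of beatmatching: the agent observes the tempo gap
# and downbeat phase offset, and learns which manipulation to apply.
import random
from collections import defaultdict

ACTIONS = ["speed_up", "slow_down", "nudge", "pull", "hold"]

def reward(tempo_diff: float, phase_offset: float) -> float:
    # Best (zero) when both the tempo gap and the phase gap vanish.
    return -(abs(tempo_diff) + abs(phase_offset))

def discretize(tempo_diff: float, phase_offset: float) -> tuple:
    # Coarse bins keep the Q-table small.
    return (round(tempo_diff * 10), round(phase_offset * 8))

q_table = defaultdict(float)               # (state, action) -> estimated value
alpha, gamma, epsilon = 0.1, 0.95, 0.1     # learning rate, discount, exploration

def choose_action(state) -> str:
    if random.random() < epsilon:          # occasionally explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def update(state, action: str, r: float, next_state) -> None:
    # One-step Q-learning backup.
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += alpha * (r + gamma * best_next
                                         - q_table[(state, action)])
```

The long-term dependency mentioned above shows up here as delayed reward: an action taken now only changes the observed phase several bars later, which makes credit assignment hard.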

Reinforcement Learning of DJ Beat Matching (2016)

Custom Technics SL-1200 and Robot finger for beatmatching (designed and assembled by YCAM InterLab)


3. CROWD-READING

“A good DJ is always looking at the crowd, seeing what they like, seeing whether it’s working; communicating with them, smiling at them. And a bad DJ is always looking down at the decks and just doing whatever they practiced in their bedroom, regardless of whether the crowd are enjoying it or not.”

Norman Cook, aka Fatboy Slim


At the most recent AI DJ performance, in December 2017, we introduced a new feature: "reading" the crowd. It is an essential part of a DJ's role to read the audience and play music that suits the atmosphere. In the performance, we deployed a camera system that tracks the movement of bodies in the crowd using the OpenPose library. The system quantifies how much the audience appreciates (i.e., dances to) the music being played and uses this information in the music selection process.

During music selection, the system keeps choosing tracks with a similar mood, as described above, as long as the amount of body movement stays above a given threshold. Once the index drops below the threshold, random noise, inversely proportional to the amount of body movement, is added to the feature vector of the incoming music, so that the system can explore new musical territory and (hopefully) stimulate the seemingly bored audience.

Unsurprisingly, this randomness acted as a feedback loop in the performance: the randomness brought more confusion to the audience, which in turn led to more randomness. The experiment demonstrated how difficult it is to maintain the subtle balance between regularity and unexpectedness in a DJ's music selection.
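The sketch below illustrates this crowd-reading loop: a movement index computed from pose keypoints gates a noise term added to the query features. The keypoint array layout, the threshold, and the noise scale are illustrative assumptions, not the parameters used in the performance.

```python
# Sketch: quantify crowd movement from pose keypoints and, when the crowd
# seems static, blur the query features so the selector can wander.
import numpy as np

def movement_index(prev_kpts: np.ndarray, curr_kpts: np.ndarray) -> float:
    """Mean keypoint displacement between two frames.
    Both arrays have shape (n_people, n_joints, 2) in image coordinates."""
    return float(np.linalg.norm(curr_kpts - prev_kpts, axis=-1).mean())

def perturb_features(features: np.ndarray, movement: float,
                     threshold: float = 5.0, scale: float = 0.5) -> np.ndarray:
    if movement >= threshold:
        return features                    # crowd is dancing: stay the course
    sigma = scale / (movement + 1e-6)      # less movement -> more noise
    return features + np.random.normal(0.0, sigma, size=features.shape)
```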

Camera system for quantifying how much the audience dances to the music

PAST PERFORMANCES

2017

Sound Tectonics #20

2017/12/15 @ Yamaguchi Center for Arts and Media (YCAM), Yamaguchi, Japan
Guest DJ : tofubeats, Licaxxx

MUTEK.JP

2017/11/3 @ National Museum of Emerging Science and Innovation (Miraikan), Tokyo, Japan

SCOPITONE FESTIVAL 2017

2017/9/21 @ STEREOLUX, Nantes, France

Speculum Artium 2017

2017/9/14 @ Zavod za kulturo Delavski dom Trbovlje, Slovenia

YCAM presents AI DJ / WILD BUNCH FEST. 2017

2017/8/19 @ Kirara Memorial Park, Yamaguchi, Japan

DIGITAL CHOC #6 — Machines désirantes

2017/2/17 @ WWW, Shibuya, Tokyo, Japan

2016

2045 × LIFE PAINT Supported by VOLVO CAR JAPAN

2016/10/27 @ Daikanyama UNIT, Tokyo, Japan

2045 Generation #4

2016/9/4 “OKAZAKI LOOPS” @ The National Museum of Modern Art, Kyoto, Japan

CREDIT

Concept/Programming

Nao Tokui

Visualization

Shoya Dozono

Project Management

Miyu Hosoi

Assistance

Robin Jungers, Yuma Kajihara

Robot

TASKO, inc.

Customized turntable for AI

Mitsuhito Ando (YCAM)

Project Support

YCAM InterLab

CONTACT

contact@qosmo.jp

REFERENCE

[1] AlgoRhythms — Neukom Institute Turing Tests in the Creative Arts. | http://bregman.dartmouth.edu/turingtests/

[2] Keunwoo Choi, György Fazekas, and Mark Sandler. Transfer Learning for Music Classification and Regression Tasks. March 2017.

[3] Matthew E. P. Davies and Sebastian Böck. Evaluating the Evaluation Measures for Beat Tracking. ISMIR, 2014.

[4] Daito Manabe and Nao Tokui. 2045 AI DJ Party, 2014. | http://2045.rhizomatiks.com/

SPECIAL THANKS

wasabeat, Chris Romero, Yansu Kim, Rakutaro Ogiwara, Ametsub
