General Issues
Genre Classification
Artist Identification
Artist Similarity

Genre Classification
Artist Classification


ISMIR2004 Audio Description Contest - Genre/ Artist ID Classification and Artist Similarity

Compare algorithms for music classification and similarity. The contest is divided into three subcontests. Participants may choose to participate in any or all of them.

- Genre classification

- Artist identification

- Artist similarity

General Issues

- Participants should send us their postal adress so that we can distribute the data.
- A framework is provided to allow the participants to simulate an evaluation environment like the one that will be used during the final evaluation.
- We accept algorithms and systems that make use of external frameworks such as Weka, HTK, Torch and so on as long as they run on GNU/Linux.
- We have proposed, based on literature and feedback from researchers, a set of evaluation metrics that determine the winner. Should something inexpected occur, we leave the door open to redefine the winner after a vote by the community.
- Contestants can cancel their participation.
- Contestants may decide not to make their results public. Anonymity will be rigourously preserved.
- The audio content cannot always be distributed due to copyright licenses. We will compute and deliver low level descriptors of the music tracks as proposed in [1]. In order to allow participants to start preparing and testing, MFCC are already included in the distribution DVD.
- Low level descriptors of the training/evaluation data can be computed at our lab and delivered to the participants up to 3 times.


Important dates

June 28

Final definition of the contest rules

July 15

Tuning data made available to participants

September 7

Deadline for participant submission of the algorithms (participation can be cancelled)

October 10-14


Publication of the results of the tests. Prizes (to be defined) will be delivered during the ISMIR 2004 conference in Barcelona

>> back to top

Genre classification
The task is to classify songs into genres like Magnatune has organized its site. Magnatune is an example of a customer that would benefit from MIR tools.
Moreover, its licensing scheme allow the use of the audio content for research.

Training and development set
A training and a development set will be mailed to participants. The training and development set consist each of:

- classical: 320 samples.
- electronic: 115 samples
- jazz_blues: 26 samples
- metal_punk: 45 samples
- rock_pop: 101 samples
- world: 122 samples

Data for participants

- training/genres.txt: list of all genres (classes)
- training/tracklist.csv: list of all tracks with metadata (genre, artist, album,
track title, tracknumber, filename) in CSV format
- development/tracklist.csv: anonymous list for development as used later for
- development/ground_truth.cvs: mapping of tracks to genre (eg.:
- development/tracks/*.mp3: audio files in mp3 format
- training set: part1 part2 (LARGE FILES!)
- development set: part1 part2 (LARGE FILES!)

Test set
The evaluation set will not be distributed and consists of:
~ 700 tracks in the genres (classes) mentioned above with a similar distribution.

Evaluation metric
The evaluation corresponds to the accuracy of correctly guessed genres. overlap

The samples per class are not equidistributed. The evaluation metric normalizes the number of correctly predicted genres by the probability of appearance of each genre (P_c).


Artist Identification
The task is artist identification given three songs after training the system on 7 songs per artist.

Training and Development set
Low-level features corresponding to songs of 105 artists from uspop2002 [1-3].

The training set includes 7 songs from each artist and the development 3 songs.

Data for participants

- training/tracklist.csv: list of all tracks with metadata (artist, album, track
title, tracknumber, filename) in CSV format
- training/artists.txt: list of all artists
- training/tracks/<artist>/<album>/<num>-<track title>: mfc values for each track
- development/tracklist.csv: anonymous list for development as used later for
- development/tracklist_realnames.csv: all metadata for the development tracks
- development/ground_truth.csv: mapping of anonymous name to real artist names (eg."artist_1",,"aretha_franklin")
- development/tracks/*.mfc: mfc values for each track
- artist ID/similarity set: part1 part2 (LARGE FILES!)

Test set

~ 200 artists not present in uspop2002 (7 songs for training, 3 songs for evaluation).

Evaluation metric
Accuracy = number Of Correctly Identified Artists/Total


>> back to top

Artist Similarity

The task is building a system that proposes similar artists like the experts of All Music Guide do.

Training and Development set
105 artist from uspop2002 [1-3], each represented by 10 songs, divided in a training set of 53 artist and a development set of 52 artists.

Data for participants

- training/artists.txt: list of artist names - training/tracklist.csv: list of all tracks with metadata (artist, album, track title, tracknumber, filename) in CSV format.

- training/tracklist.csv: list of all tracks with metadata (artist, album, track title, tracknumber, filename) in CSV format
- training/tracks/artist/album/num-track title="".mfc: mfc coefficients for each track.
- development/artists.txt: list of artist names
- development/tracklist.csv: list of all tracks with metadata (artist, album, track title, tracknumber, filename) in CSV format
- development/tracks/artist/album/num-track title="".mfc: mfc coefficients for each track
- artist similarity data set (same as in artist ID)

Test set

~ 200 artists not included in uspop2002 (100 for training and a 100 for evaluation).

Evaluation metric

The goodness of the agreement between the automatic recomendations and the expert opinions will be assessed by different metrics against a similarity matrix derived as described in [1-3].

The expert matrix of this contest differs slightly to the one of [1-3] because it was derived using a different set of artists. There are 105 artist that appear in both sets. We have compared the agreement on the overlapping submatrices and we found a top-10 rank agreement of 0.345 (averaged on 10 runs) using topNrankagree.m (see [1] and [3]) between our AMG (expert allmusicguide derived data) and the one of [1-3]. Using the matrix of [1-3] as ground truth the agreement is 0.255.

The following formulae will be used for the evaluation of the recommendations. The exact implementation of the evaluations can be found in the framework.

- TopNrankagree: see [1-3]


where si is the score for artist i, N is how many similar artists are considered in each case (10), r is the `decay constant' for the reference ranking (0.50^.33), c is the decay constant for the candidate ranking (0.50^.67), and k_r is the rank under the candidate measure of the artist ranked r under the reference measure.

- Overlap: Measures the number of common artist among the 10 best recomendations of the automatic system and the expert's opinion.

- MTG distance

where d(i) represents the distance from the original artist to the result ranked i by the system, and c(r) represents the cardinal of all artists at distance r from the original one.

>> back to top

The testing framework is composed of python scripts that will take care of the administrative tasks inherent to running batches of tests. The user just has to plug-in his algorithms in a simple interface to be able to run tests and evaluate a system.

The framework can be downloaded from here

It is still in development so any suggestions or contributions are welcome.

>> back to top

For the development and test there are features corresponding to Mel Frequency Cepstrum Coefficients extracted using the HTK Hidden Markov Model Toolkit available from We used the following settings for the MFCC extraction.

HCopy 3.1 CUED 16/01/02 : $Id: HCopy.c,v 1.6 2002/01/16 8:11:29 ge204 Exp $

The files can be read with the HList utility from the HTK toolkit.

SOURCERATE     = 454		#22050 Hz sampling rate

TARGETKIND     = MFCC_E_D       #MFCC + Energy + Delta
TARGETRATE     = 150000		#15ms frame rate



WINDOWSIZE     = 300000.0	# 30ms window size
USEHAMMING     = TRUE		# Hamming window
PREEMCOEF      = 0.95		# Pre-emphasis

NUMCHANS       = 40		# Number of filter banks

NUMCEPS        = 20		# Number of MFCC
CEPLIFTER      = 20		# Use of Cepstral Filter

LOFREQ         = -1		
HIFREQ         = -1

ESCALE         = 1

>> back to top

[1] Berenzweig, Adam, Daniel Ellis, Beth Logan, Brian Whitman. "A Large
Scale Evaluation of Acoustic and Subjective Music Similarity Measures."
In Proceedings of the 2003 International Symposium on Music Information
Retrieval. 26-30 October 2003, Baltimore, MD.[ paper PDF ]

[2] Ellis, Daniel, Brian Whitman, Adam Berenzweig and Steve Lawrence.
"The Quest For Ground Truth in Musical Artist Similarity." In
Proceedings of the 3rd International Conference on Music Information
Retrieval. 13-17 October 2002, Paris, France. [ paper PDF | talk PDF ]


>> back to top

Universitat Pompeu Fabra

Institut Universitari de l'Audiovisual