Corpora

Audio Content Analysis Datasets
Companion datasets to the book Audio Content Analysis, by Alexander Lerch.
GiantSteps Datasets
Datasets for automatic evaluation of tempo estimation and key detection algorithms.
GTZAN dataset
Manually annotated metrical structure for 1000 audio tracks.
iKala Dataset
Comprises 252 30-second excerpts sampled from 206 iKala songs.
J-DISC
J-DISC is a resource for searching and exploring jazz recordings created by the Center for Jazz Studies at Columbia University.
MAPS Database
A piano database for multipitch estimation and automatic transcription of music.
MARG Note-level Singing Dataset
Dataset produced by the Music & Audio Research Group for work in automatic music transcription.
McGill Billboard Annotations
Annotations and audio features for the first 1000 randomly selected entries from Billboard chart slots presented at ISMIR 2011, and the additional 300 entries used to evaluate audio chord estimation for MIREX 2012.
Meertens Tunes Collections
The MTC consists of a number of melodic data sets (Dutch Songs), both vocal and instrumental. It is openly available for research purposes and is especially valuable for MIR research.
mididb.com
MIDI transcriptions of many popular songs, including EDM.
Million Song Dataset
A collection of audio features and metadata for a million contemporary popular music tracks.
MIR Datasets
A list of datasets maintained at the Music Information Retrieval Wiki.
MuseData
An electronic library of classical music scores encoded in MuseData, Humdrum, MIDI, and MIDI+ formats.
Music Technology Group Datasets
Various datasets compiled as part of research projects carried out at the MTG.
musiXmatch Database
Official lyrics collection of the Million Song Dataset.
Petrucci Music Library
The datasets backing the Music Ngram Viewer.
RECOLA Database
Multimodal recordings of spontaneous collaborative and affective interactions in French.
repovizz
A framework for remote storage, visual browsing, annotation, and exchange of multi-modal data.
RWC Music Database
The RWC (Real World Computing) Music Database is a copyright-cleared music database (DB) available to researchers as a common foundation for research.
Suomen Kansan eSävelmät
Digital Archive of Finnish Folk Tunes.
SymbTr
A Turkish Makam Music Symbolic Data Collection.
The Bellmann Corpus
The Bellmann Corpus, released in 2013, consists of musical scores for over 650 pieces (or complete sections of multi-movement works) for piano or harpsichord.
Tonal Harmony Excerpts
MIDI files for 46 excerpts from the workbook and instructor’s manual for Tonal Harmony by Stefan Kostka and Dorothy Payne.
Weimar Jazz Database (WJAZZD)
A component of the Jazzomat project, WJAZZD is a publicly available database of jazz solo transcriptions intended to support jazz and MIR research.