Acappella Dataset

The dataset contains ~46 hours of a cappella solo singing videos sourced from YouTube. It covers singing content in four language categories: English, Spanish, Hindi, and others. The singers also span a wide range of ethnicities, accents, professions, and ages.

The dataset comes with pre-defined splits: ~80% training, ~7% validation, ~13% test (seen-heard + unseen-unheard).

The samples in our dataset are defined by timestamps that mark the segments of interest in each video. We provide these timestamps as part of the dataset. They have been manually selected to exclude parts of the videos that violate any of the following criteria: single frontal face view without occlusions, minimal background noise, no beatboxing, no finger snapping, and songs with lyrics (e.g. we avoid humming and yodelling).

We provide YouTube URLs, associated face detections, and timestamps, as well as cropped audio segments and cropped face videos from the dataset.

The copyright of the original videos remains with the original owners of those videos. The data is covered under a Creative Commons Attribution 4.0 International license (please read the license terms at https://creativecommons.org/licenses/by/4.0/). Downloading this dataset implies agreement to follow the same conditions for any modification and/or re-distribution of the dataset in any form.

Additionally, any entity using this dataset agrees to the following conditions: THIS DATASET IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

These license terms were inspired by those of the VoxCeleb dataset (https://www.robots.ox.ac.uk/~vgg/data/voxceleb/).
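
Because each sample is defined by a pair of timestamps within a source video, a segment can in principle be cut from a locally available video file with a tool such as ffmpeg. The following is only a minimal sketch: it assumes the timestamps are start and end times in seconds, which this description does not specify, and the function names and file paths are hypothetical. It only builds the command and does not run it.

```python
from typing import List


def segment_duration(start: float, end: float) -> float:
    """Return the length in seconds of a segment, validating its bounds.

    Assumption: timestamps are expressed in seconds from the start of the video.
    """
    if start < 0 or end <= start:
        raise ValueError(f"invalid segment bounds: start={start}, end={end}")
    return end - start


def build_ffmpeg_cut_command(
    video_path: str, start: float, end: float, out_path: str
) -> List[str]:
    """Build (but do not execute) an ffmpeg command that cuts [start, end].

    The paths are hypothetical placeholders; -ss and -to are standard ffmpeg
    options that set the start and end positions of the output.
    """
    segment_duration(start, end)  # raises ValueError on invalid bounds
    return [
        "ffmpeg",
        "-i", video_path,
        "-ss", f"{start:.3f}",
        "-to", f"{end:.3f}",
        out_path,
    ]


if __name__ == "__main__":
    # Hypothetical example values, not taken from the dataset.
    cmd = build_ffmpeg_cut_command("video.mp4", 12.0, 18.5, "segment.mp4")
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` once the actual timestamp format shipped with the dataset has been checked.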