Open Voice Project

We're building an open dataset of tabletop RPG session recordings from real gaming groups on Discord. Every participant consents before recording starts, each speaker is captured on a separate track, and participants choose their own license restrictions. The dataset will be released for speech research, transcription, and AI. See how it works.

• • •
The gap

No open, consent-based TTRPG audio dataset exists

Tabletop RPG sessions produce some of the richest multi-speaker conversation in existence: character voices, narration, cross-talk, improvisation. Yet anyone building speech recognition, transcription tools, or conversational AI has no open, consent-based source for this kind of audio.

Dataset              | Audio         | Per-speaker | Consent  | Open
CRD3 (Critical Role) | Text only     | -           | Scraped  | Yes
Playing with Voices  | YouTube links | Mixed       | No       | No
FIREBALL             | Text only     | -           | Yes      | CC BY 4.0
Open Voice Project   | Raw PCM       | Per-speaker | Explicit | CC BY-SA 4.0 +

Dataset goals

What makes this different

Per-speaker tracks
Each participant is recorded as a separate audio file, so no diarization is needed.
Consent-first collection
Every participant explicitly agrees before any audio is captured.
Multi-system coverage
D&D, Pathfinder, Call of Cthulhu, and more. Not limited to one show or system.
Character voice acting
Players alter their voices for characters — a uniquely challenging ASR benchmark.
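
To illustrate why per-speaker tracks remove the need for diarization, here is a minimal sketch. It assumes each track has already been transcribed into time-stamped segments; the `Segment` type, the `speaker_a`/`speaker_b` labels, and the sample lines are hypothetical and not part of the project's published format. Because every segment already knows which track it came from, building a speaker-attributed timeline is just a merge by start time.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float   # seconds from session start
    end: float     # seconds from session start
    speaker: str   # hypothetical pseudonymous track label
    text: str


def interleave(tracks: dict[str, list[Segment]]) -> list[Segment]:
    """Merge per-speaker segment lists (each sorted by start time)
    into a single timeline. No diarization step is involved: the
    speaker label comes from the track the segment belongs to."""
    return list(heapq.merge(*tracks.values(), key=lambda s: s.start))


def cross_talk(timeline: list[Segment]) -> list[tuple[Segment, Segment]]:
    """Return adjacent timeline pairs from different speakers whose
    time ranges overlap."""
    return [
        (a, b)
        for a, b in zip(timeline, timeline[1:])
        if a.speaker != b.speaker and b.start < a.end
    ]


tracks = {
    "speaker_a": [
        Segment(0.0, 4.0, "speaker_a", "You enter the crypt."),
        Segment(9.0, 11.0, "speaker_a", "Roll initiative."),
    ],
    "speaker_b": [
        Segment(3.5, 6.0, "speaker_b", "I light a torch."),
    ],
}

timeline = interleave(tracks)
for seg in timeline:
    print(f"[{seg.start:6.1f}s] {seg.speaker}: {seg.text}")
```

With a single mixed recording, the same overlap would have to be untangled by a diarization model; here it is recoverable directly from the timestamps.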

Record, consent, contribute

A Discord bot joins your voice channel, collects consent from every participant, and records each speaker as a separate lossless audio track.

01
GM runs /record in a text channel. Bot detects voice participants and posts a consent prompt.
02
Each player clicks Accept or Decline. Recording starts once everyone responds.
03
After accepting, players can toggle "No LLM Training" or "No Public Release." Both are off by default, so a recording is fully open unless a player opts out.
04
Bot joins voice and announces "Recording has begun." Each speaker is captured as a separate stream.
05
GM runs /stop. Audio is pseudonymized and uploaded. Bot announces "Recording complete."
06
Participants can review transcripts, flag private info, correct lines, and manage consent anytime on the web portal.
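
The consent gate in steps 01 to 03 can be sketched as a small state object. This is an illustrative model, not the bot's actual code: `ConsentPrompt`, `Restrictions`, and the method names are hypothetical. The page does not say what happens to a participant who declines, so the sketch assumes their audio is simply not captured; that assumption is marked in the code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Response(Enum):
    ACCEPT = "accept"
    DECLINE = "decline"


@dataclass
class Restrictions:
    # Both flags are off by default, i.e. "fully open".
    no_llm_training: bool = False
    no_public_release: bool = False


@dataclass
class ConsentPrompt:
    participants: set[str]
    responses: dict[str, Response] = field(default_factory=dict)
    restrictions: dict[str, Restrictions] = field(default_factory=dict)

    def respond(self, user: str, response: Response) -> None:
        if user not in self.participants:
            raise KeyError(f"{user} is not in the voice channel")
        self.responses[user] = response
        if response is Response.ACCEPT:
            self.restrictions.setdefault(user, Restrictions())
        else:
            self.restrictions.pop(user, None)

    def set_restrictions(self, user: str, restrictions: Restrictions) -> None:
        # Toggles are only offered after a participant has accepted.
        if self.responses.get(user) is not Response.ACCEPT:
            raise PermissionError("restrictions can only be set after accepting")
        self.restrictions[user] = restrictions

    def everyone_responded(self) -> bool:
        # Recording may start only once every participant has answered.
        return set(self.responses) == self.participants

    def speakers_to_record(self) -> set[str]:
        # ASSUMPTION (not stated in the source): decliners are not recorded.
        return {u for u, r in self.responses.items() if r is Response.ACCEPT}


prompt = ConsentPrompt(participants={"gm", "player_1", "player_2"})
prompt.respond("gm", Response.ACCEPT)
prompt.respond("player_1", Response.ACCEPT)
print(prompt.everyone_responded())   # one answer is still missing
prompt.respond("player_2", Response.DECLINE)
prompt.set_restrictions("player_1", Restrictions(no_llm_training=True))
print(prompt.everyone_responded(), sorted(prompt.speakers_to_record()))
```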
Add to your Discord server

Your data, your rules

Participant portal

Every participant can sign in with Discord and manage their recordings. You stay in control of your voice data — before, during, and after publication.

Under construction — coming soon.

Review transcripts
Read through your session transcripts with per-line audio playback. See exactly what was captured.
Flag private info
Spot a phone number or address in pre-game chatter? One click to flag it. Flagged lines are excluded from the published dataset.
Correct transcriptions
Fix what the speech recognition got wrong. Every correction improves the dataset — and produces free ASR training data.
Change permissions
Toggle "No LLM Training" or "No Public Release" per session, anytime. Loosen or tighten your restrictions; it's your choice.
Download your data
Export all your sessions, transcripts, and audio files. Full GDPR data portability.
Withdraw consent
Changed your mind? Withdraw consent for any session. If it hasn't been published yet, your audio is permanently deleted from storage.
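
As a rough sketch of how these portal controls could feed into publication, the filter below drops flagged lines and anything from a speaker who has withdrawn consent or set "No Public Release." The `Line` and `SpeakerConsent` types and both function names are hypothetical. Treating "No LLM Training" as a second filter applied by dataset consumers is also an assumption, since the page only describes it as a license restriction.

```python
from dataclasses import dataclass


@dataclass
class Line:
    speaker: str                  # pseudonymous speaker label
    text: str
    flagged_private: bool = False


@dataclass
class SpeakerConsent:
    withdrawn: bool = False
    no_public_release: bool = False
    no_llm_training: bool = False


def publishable(lines: list[Line], consent: dict[str, SpeakerConsent]) -> list[Line]:
    """Lines eligible for the public dataset."""
    kept = []
    for line in lines:
        c = consent.get(line.speaker)
        # No consent record, withdrawn consent, or "No Public Release":
        # nothing from this speaker is published.
        if c is None or c.withdrawn or c.no_public_release:
            continue
        # Lines flagged as private info are excluded individually.
        if line.flagged_private:
            continue
        kept.append(line)
    return kept


def llm_trainable(lines: list[Line], consent: dict[str, SpeakerConsent]) -> list[Line]:
    """Subset of publishable lines whose speaker allows LLM training (assumed semantics)."""
    return [l for l in publishable(lines, consent) if not consent[l.speaker].no_llm_training]


consent = {
    "spk_a": SpeakerConsent(),
    "spk_b": SpeakerConsent(no_llm_training=True),
    "spk_c": SpeakerConsent(no_public_release=True),
}
lines = [
    Line("spk_a", "I search the room."),
    Line("spk_a", "My number is 555-0100.", flagged_private=True),
    Line("spk_b", "The door creaks open."),
    Line("spk_c", "I cast a spell."),
]
print([l.text for l in publishable(lines, consent)])
print([l.text for l in llm_trainable(lines, consent)])
```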

Session bundles

Each recording session produces a structured bundle uploaded to secure storage. Discord identities are pseudonymized — the public dataset contains no usernames or IDs.
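
The page does not describe how pseudonymization is implemented. One common approach, shown here purely as an assumption, is a keyed hash: each Discord ID is mapped to a stable label that cannot be reversed or recomputed without a secret key held by the project. The `pseudonymize` function, the `spk_` prefix, and the sample ID and key are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(discord_id: str, secret_key: bytes, length: int = 16) -> str:
    """Map a Discord user ID to a stable pseudonym using HMAC-SHA256.

    The same ID and key always give the same label, so a speaker can be
    followed across sessions without the ID appearing in the dataset.
    """
    digest = hmac.new(secret_key, discord_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"spk_{digest[:length]}"


# Hypothetical values for illustration only; a real key would be kept secret.
key = b"example-key-not-for-production"
label = pseudonymize("123456789012345678", key)
print(label)
```

A keyed hash is used rather than a plain hash because Discord IDs are numeric and enumerable, so an unkeyed digest could be matched by brute force.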

Audio
Per-speaker raw PCM, 48 kHz, 16-bit, mono
Metadata
Game system, duration, participant count
Consent
Pseudonymized records with two-flag license restrictions
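
For anyone planning to work with the audio, the stated format implies a fixed data rate: 48,000 samples per second at 2 bytes per sample on one channel is 96,000 bytes per second per speaker. The sketch below computes duration from file size and wraps raw PCM in a WAV header using Python's standard `wave` module. It assumes the samples are signed little-endian, which is what WAV expects for 16-bit audio but is not stated on this page; the function names are illustrative.

```python
import io
import wave

SAMPLE_RATE = 48_000      # Hz
SAMPLE_WIDTH = 2          # bytes per sample (16-bit)
CHANNELS = 1              # mono
BYTES_PER_SECOND = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS  # 96,000


def pcm_duration_seconds(num_bytes: int) -> float:
    """Duration of a raw PCM track, derived from its size alone."""
    return num_bytes / BYTES_PER_SECOND


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM in a WAV container so common audio tools can open it.

    ASSUMPTION: samples are signed 16-bit little-endian.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm)
    return buf.getvalue()


# One second of silence as a stand-in for a real track.
one_second = bytes(BYTES_PER_SECOND)
print(pcm_duration_seconds(len(one_second)))

wav_bytes = pcm_to_wav(one_second)
with wave.open(io.BytesIO(wav_bytes), "rb") as r:
    print(r.getframerate(), r.getnchannels(), r.getnframes())
```

At this rate, one hour of a single speaker's track is 96,000 × 3,600 = 345,600,000 bytes, roughly 346 MB before compression.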

Status

Early collection

We're recording sessions and validating the pipeline. The dataset will be published on Hugging Face once we have enough validated contributions. Participants choose their own license restrictions per session.

I need this dataset too

I'm building Session Helper, an open-source, self-hostable GM assistant that transcribes sessions, generates notes, and maintains a living campaign wiki. This dataset is how I'm training and evaluating the transcription pipeline, and I want it to be useful to everyone else working on speech and language tools too.