We're building an open dataset of tabletop RPG session recordings from real gaming groups on Discord. Every participant consents before recording, each speaker is captured as a separate track, and participants choose their own license restrictions. The dataset is intended for speech research, transcription, and AI.
Tabletop RPG sessions produce some of the richest multi-speaker conversation in existence: character voices, narration, cross-talk, improvisation. Yet anyone building speech recognition, transcription tools, or conversational AI has no good source for this kind of audio. Existing datasets are text-only or link to YouTube videos without participant consent.
| Dataset | Audio | Per-speaker | Consent | Open |
|---|---|---|---|---|
| CRD3 (Critical Role) | Text only | — | Scraped | Yes |
| Playing with Voices | YouTube links | Mixed | No | No |
| FIREBALL | Text only | — | Yes | CC BY 4.0 |
| Open Voice Project | Raw PCM | Per-speaker | Explicit | CC BY-SA 4.0 + |
A Discord bot joins your voice channel, collects consent from every participant, and records each speaker as a separate lossless audio track.
/record in a text channel. Bot detects voice participants and posts a consent prompt.
/stop. Audio is pseudonymized and uploaded. Bot announces "Recording complete."
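The consent step between /record and /stop can be pictured as a simple gate. The following is a minimal sketch, not the bot's actual code: the `ConsentSession` class, the `Consent` states, and the rule that a single decline blocks recording are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Consent(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    DECLINED = "declined"


@dataclass
class ConsentSession:
    """Tracks consent for one /record request (hypothetical structure)."""

    participants: dict[str, Consent] = field(default_factory=dict)

    @classmethod
    def start(cls, voice_member_ids: list[str]) -> "ConsentSession":
        # Everyone detected in the voice channel starts as PENDING.
        return cls({member_id: Consent.PENDING for member_id in voice_member_ids})

    def respond(self, member_id: str, accepted: bool) -> None:
        # Record one participant's answer to the consent prompt.
        if member_id not in self.participants:
            raise KeyError(f"{member_id} is not in this session")
        self.participants[member_id] = (
            Consent.ACCEPTED if accepted else Consent.DECLINED
        )

    def can_record(self) -> bool:
        # Assumption: recording is allowed only once every participant accepted.
        return bool(self.participants) and all(
            state is Consent.ACCEPTED for state in self.participants.values()
        )
```

In this sketch, `ConsentSession.start(["a", "b"])` reports `can_record()` as false until both `respond("a", True)` and `respond("b", True)` have been called.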
Every participant can sign in with Discord and manage their recordings. You stay in control of your voice data — before, during, and after publication.
Under construction — coming soon.
Each recording session produces a structured bundle uploaded to secure storage. Discord identities are pseudonymized — the public dataset contains no usernames or IDs.
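One common way to pseudonymize identifiers is a keyed hash, so the same person maps to the same label without the original ID being recoverable from the published data. The sketch below assumes that approach; the project's real method, the bundle layout, the `speaker_` label format, and the `.pcm` file naming are not described here and are hypothetical.

```python
import hashlib
import hmac
import json


def pseudonymize(discord_id: str, secret_key: bytes) -> str:
    """Map a Discord user ID to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(
        secret_key, discord_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Truncated for readability; the label format is an assumption.
    return f"speaker_{digest[:12]}"


def build_manifest(session_id: str, discord_ids: list[str], secret_key: bytes) -> str:
    """Return a JSON manifest with one track entry per pseudonymized speaker."""
    tracks = []
    for discord_id in sorted(discord_ids):
        speaker = pseudonymize(discord_id, secret_key)
        # Only the pseudonym appears in the manifest, never the Discord ID.
        tracks.append({"speaker": speaker, "file": f"{speaker}.pcm"})
    return json.dumps({"session_id": session_id, "tracks": tracks}, indent=2)
```

The secret key would have to stay private and out of the published bundle, since anyone holding it could recompute the mapping from known Discord IDs.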
We're recording sessions and validating the pipeline. The dataset will be published on HuggingFace once we have enough validated contributions. Participants choose their own license restrictions per session.
I'm building Session Helper, an open-source, self-hostable GM assistant that transcribes sessions, generates notes, and maintains a living campaign wiki. This dataset is how I'm training and evaluating the transcription pipeline, and I want it to be useful to everyone else working on speech and language tools too.