Podcast my Paper
We provide a module that processes documents and applies text-to-speech, so that
any document can be listened to in podcast style. The functionality can be
accessed through the podcast API or by running the script
`scripts/podcast_single_document.py`.
API access
The podcast API is available through the `podcast` module.
An end-to-end workflow looks like this (modified from the test module):
```python
from biochatter.podcast import Podcaster
from biochatter.vectorstore import DocumentReader

# Load document
reader = DocumentReader()
document = reader.load_document("test/dcn.pdf")

# Initialise podcaster
podcaster = Podcaster(document)

# Generate podcast (LLM task)
podcaster.generate_podcast(characters_per_paragraph=5000)

# Employ text-to-speech to generate audio file (optional)
podcaster.podcast_to_file("test/test.mp3", model="tts-1-hd", voice="alloy")
```
If you do not want audio output, you can simply access the generated text
through the function `podcaster.podcast_to_text()`.
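To keep the text-only result, you can write it to a file yourself. The sketch below assumes that `podcast_to_text()` returns a plain string once `generate_podcast()` has run; `save_podcast_text` is a hypothetical helper, not part of BioChatter.

```python
from pathlib import Path


def save_podcast_text(podcaster, path: str) -> str:
    """Write the generated podcast text to a UTF-8 file and return it.

    Hypothetical helper. Assumes podcaster.podcast_to_text() returns a str
    and that podcaster.generate_podcast() has already been called.
    """
    text = podcaster.podcast_to_text()
    Path(path).write_text(text, encoding="utf-8")
    return text
```

In the workflow above, you would call `save_podcast_text(podcaster, "test/test.txt")` instead of `podcast_to_file`, which skips the text-to-speech step entirely.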
This example uses the paid OpenAI text-to-speech API to generate the audio file.
By default, the `podcast_to_file` function uses the free Google text-to-speech
API. When using OpenAI, the podcast is split into multiple parts, indicated by
integer suffixes, because the OpenAI API limits the input to 4096 characters
per request.
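The splitting step can be illustrated with a minimal sketch. It assumes that whole sentences are packed greedily into parts of at most 4096 characters and that parts are named `<stem>_<n><ext>`; `split_for_tts` and `suffixed_paths` are hypothetical helpers, and BioChatter's actual splitting rule and file naming may differ.

```python
import re
from pathlib import Path

# Limit stated above for the OpenAI text-to-speech input.
MAX_CHARS = 4096


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack sentences into parts of at most max_chars characters.

    Hypothetical helper: sentences are split after '.', '!' or '?', and a
    single sentence longer than max_chars is hard-split into fixed-size pieces.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    parts: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that alone exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                parts.append(current)
                current = ""
            parts.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Current part is full: close it and start a new one.
            parts.append(current)
            current = sentence
    if current:
        parts.append(current)
    return parts


def suffixed_paths(path: str, n_parts: int) -> list[Path]:
    """One output path per part, e.g. test.mp3 -> test_1.mp3 (assumed naming)."""
    p = Path(path)
    return [p.with_name(f"{p.stem}_{i}{p.suffix}") for i in range(1, n_parts + 1)]
```

Packing whole sentences keeps each audio part from ending mid-sentence; the hard split only applies to sentences longer than the limit. For example, `split_for_tts("First sentence. " * 600)` yields three parts, which `suffixed_paths("test/test.mp3", 3)` would map to `test/test_1.mp3`, `test/test_2.mp3` and `test/test_3.mp3`.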
Command line access
To generate a podcast from a single document more quickly, you can use the
`scripts/podcast_single_document.py` script. It accepts two arguments: the path
to the document and the path to the desired output file. If the output file
ends in `.mp3`, the OpenAI text-to-speech API is used to generate the audio,
which is written to multiple files with integer suffixes. Otherwise, the script
generates a text file and skips the text-to-speech step. If you installed
BioChatter with poetry, you can run the script like this (from the root
directory of the repository):