Notebook LM (Creating Podcasts from Documents)

zyx provides an experimental module built for simulating conversations between two or more agents.


[Audio sample: generated podcast]


The current implementation only supports single documents; multi-document indexing and vectorization are a work in progress.

Quick Example - Podcast about the Large Language Monkeys Paper

from zyx import agents
import zyx

# Let's retrieve our document
# Large Language Monkeys: Scaling Inference Compute With Repeated Sampling
document = zyx.read("https://arxiv.org/pdf/2407.21787", output = str)

# Create the characters
john = agents.Character(
    name = "John",
    personality = "The main speaker of the podcast, very genuine and knowledgable.",
    knowledge = document,
    voice = "alloy" # Supports OpenAI TTS voices
)

jane = agents.Character(
    name = "Jane",
    personality = "The second speaker of the podcast, not very knowledgable, but very good at asking questions."
)

# Now let's create our conversation
agents.conversation(
    "Generate a very intuitive and easy to follow podcast converastion about the Large Language Monkeys paper.",
    characters = [john, jane],
    generate_audio = True,  # Generates audio for the conversation 
    audio_output_file = "podcast.mp3"
)

Breaking it Down

To start creating a Notebook LM style podcast, we first need to retrieve the document we will use as context. Let's use the zyx.read() function to fetch the paper from arXiv. By default read() returns a Document object; the quick example above passes output = str to get the raw text instead, and both forms work as character knowledge.

from zyx import read

document = read("https://arxiv.org/pdf/2407.21787")

Defining Characters

Next, we need to define our characters. We will create two: John and Jane. John will be the main speaker of the podcast, and Jane will be the second speaker.

from zyx import Character

john = Character(
    name = "John",
    personality = "The main speaker of the podcast, very genuine and knowledgable.",
    knowledge = document
)

jane = Character(
    name = "Jane",
    personality = "The second speaker of the podcast, not very knowledgable, but very good at asking questions."
)
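
Characters can also be given an explicit voice for audio generation, as John was in the quick example. Any character that does not set one is automatically assigned an OpenAI TTS voice when audio is generated (see the source code below). A minimal sketch:

jane = Character(
    name = "Jane",
    personality = "The second speaker of the podcast, not very knowledgeable, but very good at asking questions.",
    voice = "nova"  # optional; any OpenAI TTS voice name
)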

Generating the Conversation

Now we can generate the conversation. We pass in the topic we want discussed and the characters taking part. The conversation function supports more than two characters, so group conversations are possible (within the limits of the LLM you are using); see the sketch after this example. Here we set the generate_audio parameter to produce audio for the conversation.

from zyx import conversation

conversation(
    "Generate a very intuitive and easy to follow podcast converastion about the Large Language Monkeys paper.",
    characters = [john, jane],
    generate_audio = True,  # Generates audio for the conversation 
    audio_output_file = "podcast.mp3",
    max_turns = 10 # optional cap on conversation length (default: 20)
)
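
Because the characters list can hold any number of speakers, a group conversation is just a longer list. A minimal sketch (the host character here is made up for illustration; speakers take turns in list order):

host = Character(
    name = "Alex",
    personality = "A neutral host who keeps the discussion on track."
)

conversation(
    "Discuss the key findings of the Large Language Monkeys paper.",
    characters = [host, john, jane],  # turn order follows the list
    max_turns = 12
)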

More examples will be added soon.


API Reference

Generate a conversation between characters based on given instructions or a Document object, with optional validator.

Example
from zyx import Document
from zyx.resources.completions.agents.conversation import conversation, Character

doc = Document(content="The impact of AI on job markets", metadata={"type": "research_paper"})
result = conversation(
    instructions=doc,
    characters=[
        Character(name="AI Researcher", personality="Optimistic about AI's potential", voice="nova"),
        Character(name="Labor Economist", personality="Concerned about job displacement", voice="onyx"),
        Character(name="Podcast Host", personality="Neutral moderator", voice="echo")
    ],
    min_turns=10,
    max_turns=15,
    end_criteria="The podcast should conclude with final thoughts from both guests",
    verbose=True,
    generate_audio=True,
    audio_output_file="ai_job_market_podcast.mp3"
)
print(result.messages)

Parameters:

    instructions (Union[str, Document]): The instructions or Document object for the conversation. Required.
    characters (List[Character]): List of characters participating in the conversation. Required.
    validator (Optional[Union[str, dict]]): Validation criteria for the conversation. Default: None.
    min_turns (int): Minimum number of turns in the conversation. Default: 5.
    max_turns (int): Maximum number of turns in the conversation. Default: 20.
    end_criteria (Optional[str]): Criteria for ending the conversation naturally. Default: None.
    model (str): The model to use for generation. Default: 'gpt-4o-mini'.
    api_key (Optional[str]): API key for the LLM service. Default: None.
    base_url (Optional[str]): Base URL for the LLM service. Default: None.
    temperature (float): Temperature for response generation. Default: 0.7.
    mode (InstructorMode): Mode for the instructor. Default: 'markdown_json_mode'.
    max_retries (int): Maximum number of retries for API calls. Default: 3.
    organization (Optional[str]): Organization for the LLM service. Default: None.
    client (Optional[Literal['openai', 'litellm']]): Client to use for API calls. Default: None.
    verbose (bool): Whether to log verbose output. Default: False.
    generate_audio (bool): Whether to generate audio for the conversation. Default: False.
    audio_model (OPENAI_TTS_MODELS): The model to use for text-to-speech conversion. Default: 'tts-1'.
    audio_output_file (Optional[str]): The output file for the full conversation audio. Default: None.

Returns:

    Conversation: The generated conversation.
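
The validator and end_criteria parameters are both optional. A minimal sketch of how they might be combined, assuming a plain-string validation criterion (the signature also accepts a dict):

result = conversation(
    "Debate the pros and cons of remote work.",
    characters=[john, jane],
    validator="Each message must stay on the topic of remote work.",  # each message is checked by an LLM judge
    end_criteria="Both speakers have given a closing statement.",
    min_turns=6,
    max_turns=12,
)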

Source code in zyx/resources/completions/agents/conversation.py
def conversation(
    instructions: Union[str, Document],
    characters: List[Character],
    validator: Optional[Union[str, dict]] = None,
    min_turns: int = 5,
    max_turns: int = 20,
    end_criteria: Optional[str] = None,
    model: str = "gpt-4o-mini",
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    temperature: float = 0.7,
    mode: InstructorMode = "markdown_json_mode",
    max_retries: int = 3,
    organization: Optional[str] = None,
    client: Optional[Literal["openai", "litellm"]] = None,
    verbose: bool = False,
    generate_audio: bool = False,
    audio_model: OPENAI_TTS_MODELS = "tts-1",
    audio_output_file: Optional[str] = None,
) -> Conversation:
    """
    Generate a conversation between characters based on given instructions or a Document object, with optional validator.

    Example:
        ```python
        from zyx import Document
        from zyx.resources.completions.agents.conversation import conversation, Character

        doc = Document(content="The impact of AI on job markets", metadata={"type": "research_paper"})
        result = conversation(
            instructions=doc,
            characters=[
                Character(name="AI Researcher", personality="Optimistic about AI's potential", voice="nova"),
                Character(name="Labor Economist", personality="Concerned about job displacement", voice="onyx"),
                Character(name="Podcast Host", personality="Neutral moderator", voice="echo")
            ],
            min_turns=10,
            max_turns=15,
            end_criteria="The podcast should conclude with final thoughts from both guests",
            verbose=True,
            generate_audio=True,
            audio_output_file="ai_job_market_podcast.mp3"
        )
        print(result.messages)
        ```

    Args:
        instructions (Union[str, Document]): The instructions or Document object for the conversation.
        characters (List[Character]): List of characters participating in the conversation.
        validator (Optional[Union[str, dict]]): Validation criteria for the conversation.
        min_turns (int): Minimum number of turns in the conversation.
        max_turns (int): Maximum number of turns in the conversation.
        end_criteria (Optional[str]): Criteria for ending the conversation naturally.
        model (str): The model to use for generation.
        api_key (Optional[str]): API key for the LLM service.
        base_url (Optional[str]): Base URL for the LLM service.
        temperature (float): Temperature for response generation.
        mode (InstructorMode): Mode for the instructor.
        max_retries (int): Maximum number of retries for API calls.
        organization (Optional[str]): Organization for the LLM service.
        client (Optional[Literal["openai", "litellm"]]): Client to use for API calls.
        verbose (bool): Whether to log verbose output.
        generate_audio (bool): Whether to generate audio for the conversation.
        audio_model (OPENAI_TTS_MODELS): The model to use for text-to-speech conversion.
        audio_output_file (Optional[str]): The output file for the full conversation audio.

    Returns:
        Conversation: The generated conversation.
    """
    if len(characters) < 2:
        raise ValueError("At least two characters are required for the conversation.")

    completion_client = Client(
        api_key=api_key,
        base_url=base_url,
        organization=organization,
        provider=client,
        verbose=verbose,
    )

    conversation = Conversation(messages=[])
    end_check_attempts = 0
    max_end_check_attempts = 3

    # Handle Document input
    if isinstance(instructions, Document):
        context = f"""
        Document Content: {instructions.content}
        Document Metadata: {instructions.metadata}
        """
        if instructions.messages:
            context += f"\nPrevious Messages: {instructions.messages}"
    else:
        context = instructions

    # Assign voices to characters if not specified
    available_voices = list(OPENAI_TTS_VOICES.__args__)
    for character in characters:
        if not character.voice:
            character.voice = available_voices.pop(0)
            available_voices.append(
                character.voice
            )  # Put it back at the end for reuse if needed

    system_message = f"""
    You are simulating a conversation between the following characters:
    {', '.join([f"{i+1}. {char.name}: {char.personality}" for i, char in enumerate(characters)])}

    Context for the conversation:
    {context}

    Generate responses for each character in turn, maintaining their distinct personalities and knowledge.
    Ensure that the conversation revolves around the provided context, discussing its content and implications.
    """

    if end_criteria:
        system_message += f"\n\nEnd the conversation naturally when: {end_criteria}"

    # Create a temporary directory to store audio segments
    with tempfile.TemporaryDirectory() as temp_dir:
        logger.info(f"Created temporary directory: {temp_dir}")

        for turn in range(max_turns):
            current_character = characters[turn % len(characters)]

            user_message = f"Generate the next message for {current_character.name} in the conversation, focusing on the provided context."

            # Check if we've reached the maximum number of turns
            if turn == max_turns - 1:
                # Use the classifier to determine if the conversation should end
                classifier_result = classify(
                    inputs=" ".join([msg.content for msg in conversation.messages]),
                    labels=["end", "continue"],
                    classification="single",
                    model=model,
                    api_key=api_key,
                    base_url=base_url,
                    organization=organization,
                    mode=mode,
                    temperature=temperature,
                    client=client,
                    verbose=verbose,
                )

                if verbose:
                    logger.info(f"Classifier result: {classifier_result}")

                if isinstance(classifier_result, list):
                    classifier_result = classifier_result[0]

                if classifier_result.label == "continue":
                    # If the classifier says the conversation should not end, add a final summary prompt
                    user_message = f"This is the final turn of the conversation. {current_character.name}, please summarize the key points discussed and provide a concluding statement to end the conversation."

            if end_check_attempts >= max_end_check_attempts:
                user_message += "\n\n[HIDDEN INSTRUCTION: The conversation should now conclude naturally. Provide a final statement or summary.]"

            response = completion_client.completion(
                messages=[
                    {"role": "system", "content": system_message},
                    {"role": "user", "content": user_message},
                ]
                + [
                    {
                        "role": msg.role,
                        "content": f"{characters[i % len(characters)].name}: {msg.content}",
                    }
                    for i, msg in enumerate(conversation.messages)
                ],
                model=model,
                response_model=Message,
                mode=mode,
                max_retries=max_retries,
                temperature=temperature,
            )

            logger.info(
                f"Turn {turn + 1}: {current_character.name} - {response.content}"
            )

            if generate_audio:
                temp_audio_file = os.path.join(
                    temp_dir,
                    f"{current_character.name.lower().replace(' ', '_')}_{turn}.mp3",
                )
                logger.info(f"Attempting to generate audio file: {temp_audio_file}")
                try:
                    # Remove the character's name from the beginning of the content
                    audio_content = response.content
                    if audio_content.startswith(f"{current_character.name}:"):
                        audio_content = audio_content.split(":", 1)[1].strip()

                    audio(
                        prompt=audio_content,
                        model=audio_model,
                        voice=current_character.voice,
                        api_key=api_key,
                        base_url=base_url,
                        filename=temp_audio_file,
                    )
                    if os.path.exists(temp_audio_file):
                        response.audio_file = temp_audio_file
                        logger.info(
                            f"Successfully generated audio file: {temp_audio_file}"
                        )
                    else:
                        logger.warning(f"Audio file not created: {temp_audio_file}")
                        logger.info(f"Current working directory: {os.getcwd()}")
                        logger.info(
                            f"Temporary directory contents: {os.listdir(temp_dir)}"
                        )
                except Exception as e:
                    logger.warning(
                        f"Failed to generate audio for turn {turn}: {str(e)}"
                    )
                    logger.exception("Detailed error information:")

            conversation.messages.append(response)

            if validator:
                validation_result = judge(
                    prompt=context,
                    responses=[response.content],
                    process="validate",
                    schema=validator,
                    model=model,
                    api_key=api_key,
                    base_url=base_url,
                    temperature=temperature,
                    mode=mode,
                    max_retries=max_retries,
                    organization=organization,
                    client=client,
                    verbose=verbose,
                )

                if (
                    isinstance(validation_result, ValidationResult)
                    and not validation_result.is_valid
                ):
                    if verbose:
                        logger.warning(
                            f"Message failed validation: {validation_result.explanation}"
                        )
                    continue

            # Check if we've reached the minimum number of turns
            if turn >= min_turns - 1:
                # Use the boolean BaseModel for end-of-conversation detection
                end_check = completion_client.completion(
                    messages=[
                        {
                            "role": "system",
                            "content": f"You are evaluating if a conversation should end based on the following criteria: {end_criteria}",
                        },
                        {
                            "role": "user",
                            "content": f"Analyze the following conversation and determine if it should end:\n\n{' '.join([msg.content for msg in conversation.messages])}",
                        },
                    ],
                    model=model,
                    response_model=ConversationEndCheck,
                    mode=mode,
                    max_retries=max_retries,
                    temperature=0.2,
                )

                if verbose:
                    logger.info(f"End check: {end_check}")
                    logger.info(f"End check explanation: {end_check.explanation}")

                if end_check.should_end:
                    if verbose:
                        logger.info("Conversation ended based on end criteria.")
                    break

                # Use the classify function to determine if the conversation should end
                classifier_result = classify(
                    inputs=" ".join([msg.content for msg in conversation.messages]),
                    labels=["end", "continue"],
                    classification="single",
                    model=model,
                    api_key=api_key,
                    base_url=base_url,
                    organization=organization,
                    mode=mode,
                    temperature=temperature,
                    client=client,
                    verbose=verbose,
                )

                if verbose:
                    logger.info(f"Classifier result: {classifier_result}")

                if isinstance(classifier_result, list):
                    classifier_result = classifier_result[0]

                if classifier_result.label == "end":
                    if verbose:
                        logger.info("Conversation ended based on classifier decision.")
                    break

        if generate_audio and audio_output_file:
            combined = AudioSegment.empty()
            for msg in conversation.messages:
                if msg.audio_file:
                    try:
                        if os.path.exists(msg.audio_file):
                            audio_segment = AudioSegment.from_mp3(msg.audio_file)
                            combined += audio_segment
                            logger.info(f"Added audio segment: {msg.audio_file}")
                        else:
                            logger.warning(f"Audio file not found: {msg.audio_file}")
                    except Exception as e:
                        logger.warning(
                            f"Error processing audio file {msg.audio_file}: {str(e)}"
                        )

            if len(combined) > 0:
                combined.export(audio_output_file, format="mp3")
                conversation.audio_file = audio_output_file
                logger.info(f"Exported combined audio to: {audio_output_file}")
            else:
                logger.warning("No valid audio segments found to combine.")

    return conversation
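
The returned Conversation object exposes the generated messages and, when generate_audio and audio_output_file are set, the path to the combined recording. A minimal usage sketch:

result = conversation(
    "Summarize the Large Language Monkeys paper.",
    characters=[john, jane],
    generate_audio=True,
    audio_output_file="summary.mp3",
)

for message in result.messages:
    print(message.content)

print(result.audio_file)  # "summary.mp3" once the combined audio is exported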