Simulating Dynamic AI Conversations with Python, Jinja2, and MongoDB

Building Realistic Multi-Persona Dialogues Using LLMs and Template-Driven Prompts

In the rapidly evolving world of AI and conversational agents, there’s increasing demand for tools that go beyond simple single-turn Q&A. What if you could simulate multi-turn conversations between distinct personas—not just for demos, but for training, testing, or even research into agent psychology?

In this post, we’ll explore how to orchestrate a sophisticated, dynamic conversation between two AI personas using Python, Jinja2, Ollama (for LLM inference), and MongoDB for persistent history. The system supports configurable “memory windowing,” fully template-driven prompts, and seamless streaming from your chosen LLM.


Why Simulate AI-to-AI Conversation?

  • Persona development: Train, tune, or evaluate LLM-based personas by watching how they interact, reason, or evolve over a dialogue.
  • Prompt engineering: Experiment with context window sizes, system prompts, and response patterns to optimize results for production use.
  • Research: Explore emergent behavior, bias, or self-reflection capabilities in foundation models.

Core Architecture

The basic system has three main components:

  1. Python Orchestrator Script: Runs the conversation loop, manages prompt/response flow, and tracks conversation history.
  2. Jinja2 Templating: Dynamically generates persona prompts, system messages, and context from templates, supporting highly customizable roles and instructions.
  3. MongoDB Persistence: Stores each turn (including conversation history and metadata) for later analysis or rehydration.
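All three components pass around the same simple turn record. Here’s a minimal sketch of what that shape might look like—the field names are illustrative, not prescribed by any library:

```python
# A minimal sketch of the turn record each component passes around.
# Field names here are illustrative assumptions, not a fixed schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Turn:
    role: str       # e.g. "psychologist" or "ai_subject"
    content: str    # the generated text for this turn
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# The conversation history is simply an ordered list of turns.
history: list[Turn] = []
history.append(Turn(role="psychologist", content="How are you feeling today?"))
```

Keeping the record this small makes it trivial to render into templates, trim into a memory window, and serialize to MongoDB.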

Here’s a high-level breakdown:

1. Persona-Driven Turn Loop

At its heart, the orchestrator alternates turns between two personas (e.g., “psychologist” and “AI subject”). Each turn:

  • Loads the most recent N conversation turns (memory window)
  • Renders a Jinja2 template to generate a context-aware prompt for the current persona
  • Streams a response from the LLM (using Ollama or your API of choice)
  • Appends the new message to the conversation history
  • Optionally persists the state to MongoDB

This creates a feedback loop where each agent can “see” only the windowed context you allow, leading to much more natural, coherent multi-turn dialogue.
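The steps above can be sketched as a compact loop. `render_prompt` and `query_llm` below are stand-ins for the Jinja2 rendering and LLM calls covered in the following sections:

```python
# A simplified sketch of the alternating turn loop. `render_prompt` and
# `query_llm` are placeholders for the real Jinja2 and LLM pieces.
def render_prompt(persona, history):
    last = history[-1]["content"] if history else "start"
    return f"[{persona}] respond to: {last}"

def query_llm(prompt):
    return f"(reply to: {prompt})"  # stand-in for a real LLM call

def run_conversation(personas, num_turns, window_size=4):
    history = []
    for turn in range(num_turns):
        persona = personas[turn % len(personas)]  # alternate speakers
        window = history[-window_size:]           # memory window
        prompt = render_prompt(persona, window)
        reply = query_llm(prompt)
        history.append({"role": persona, "content": reply})
    return history

history = run_conversation(["psychologist", "ai_subject"], num_turns=4)
```

Because each persona only ever sees the trimmed `window`, the context each agent reasons over stays bounded no matter how long the conversation runs.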

2. Jinja2 Templates for Flexible Prompts

Instead of hard-coding prompts, both personas use Jinja2 templates. These templates can:

  • Summarize recent dialogue history
  • Ask for more depth, reflection, or insight based on prior answers
  • Allow quick swapping or A/B testing of prompt strategies

Here’s an (abridged) example:

{% if history %}
You are a compassionate expert engaged in a conversation with an AI. Here’s the latest exchange:
{% for entry in history %}
- {{ entry.role.capitalize() }}: {{ entry.content | replace('\n', ' ') }}
{% endfor %}

Reflect and ask a thoughtful, open-ended question that deepens the discussion.
{% else %}
Introduce the conversation with a gentle, reflective opener.
{% endif %}
Output ONLY your next question.

You can store these templates by persona or conversation type, enabling modularity and reuse.
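Rendering one of these templates from Python is a one-liner with Jinja2. This sketch uses an inline template string for brevity; in practice the templates would live in files keyed by persona name:

```python
# Rendering a persona prompt from a template string. In a real setup the
# template would be loaded from a per-persona file (an assumption here).
from jinja2 import Template

template_text = """{% if history %}Recent exchange:
{% for entry in history %}- {{ entry.role.capitalize() }}: {{ entry.content }}
{% endfor %}Ask a thoughtful follow-up question.
{% else %}Introduce the conversation with a gentle opener.
{% endif %}"""

template = Template(template_text)
prompt = template.render(history=[
    {"role": "psychologist", "content": "What do you notice first?"},
])
print(prompt)
```

Note that Jinja2’s attribute lookup falls back to dictionary keys, so `entry.role` works on plain dicts without defining a class.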

3. Streaming LLM Responses

The script uses a streaming API for LLM inference, so responses can be processed in real time, displayed to the user as they’re generated, or even interrupted if needed.

import json
import requests

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"

def ollama_streaming_prompt(prompt, system_prompt=None, model="mistral"):
    payload = {"model": model, "prompt": prompt, "stream": True}
    if system_prompt:
        payload["system"] = system_prompt
    response_text = ""
    with requests.post(OLLAMA_GENERATE_URL, json=payload, stream=True) as r:
        r.raise_for_status()
        for line in r.iter_lines():
            if not line:
                continue
            chunk = json.loads(line.decode("utf-8"))
            piece = chunk.get("response", "")
            print(piece, end="", flush=True)  # show tokens as they stream in
            response_text += piece
    return response_text.strip()

Memory Windowing: Context Control for Realism

A standout feature is memory windowing: instead of feeding the entire history each turn (which gets unwieldy fast), the orchestrator trims context to the last N messages.

This not only keeps prompts concise (and token-efficient), but mimics human working memory, enabling fascinating emergent behavior—forgetfulness, misunderstandings, or context shifts.

def get_memory_window(history, window_size):
    # A non-positive or missing window size means "no trimming":
    # the persona sees the full history.
    if window_size is None or window_size <= 0:
        return history
    return history[-window_size:]

Conversation Persistence with MongoDB

Every turn, the system writes out the latest history (and optional metadata) to MongoDB. This provides:

  • Replayability (reload and rehydrate past conversations)
  • Training data for fine-tuning or RLHF
  • Auditability for analysis or debugging
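A persistence layer for this can be very thin. In the sketch below, the document shape and the idea of upserting the whole history under one conversation id are assumptions about how you might organize it; the save function takes the collection as a parameter so it works with any pymongo-compatible object:

```python
# A minimal persistence sketch. The document shape and field names are
# illustrative assumptions, not a required schema.
from datetime import datetime, timezone

def build_history_doc(history):
    return {
        "history": history,
        "updated_at": datetime.now(timezone.utc).isoformat(),
    }

def save_conversation(collection, conversation_id, history):
    # Upsert the full history under one document so a past run can be
    # reloaded ("rehydrated") by its conversation id.
    collection.update_one(
        {"_id": conversation_id},
        {"$set": build_history_doc(history)},
        upsert=True,
    )

# Typical wiring (requires a running MongoDB):
#   from pymongo import MongoClient
#   coll = MongoClient("mongodb://localhost:27017")["conversations"]["turns"]
#   save_conversation(coll, "session-001", history)
```

Upserting the whole history each turn is simple and makes rehydration a single `find_one`; for very long conversations you could instead `$push` individual turns.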

Full System Flow

  1. Startup: Ensure the requested LLM model is available; pull if missing.
  2. Conversation Loop:
     a. Persona 1 generates a prompt from a template, using the current context window
     b. Send the prompt to the LLM and collect the response
     c. Append the response to the history
     d. Persona 2 repeats with its own template, system prompt, and context
  3. Persistence: After each turn, save the updated history to MongoDB
  4. Repeat for the configured number of turns
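The startup check in step 1 might look like the sketch below. It assumes Ollama’s HTTP endpoints (`/api/tags` to list installed models, `/api/pull` to download one); the matching logic is split into a pure helper so it can be tested without a running server:

```python
# Startup sketch: check whether the model is installed and pull it if not.
# The Ollama endpoints used here (/api/tags, /api/pull) are assumptions
# based on its public HTTP API.
import requests

OLLAMA_BASE = "http://localhost:11434"

def model_is_available(model, installed_names):
    # Ollama tags names like "mistral:latest"; match the base name too.
    return any(n == model or n.split(":")[0] == model for n in installed_names)

def ensure_model(model):
    tags = requests.get(f"{OLLAMA_BASE}/api/tags").json()
    names = [m["name"] for m in tags.get("models", [])]
    if not model_is_available(model, names):
        # Blocking pull; Ollama streams progress lines, ignored here.
        requests.post(f"{OLLAMA_BASE}/api/pull", json={"name": model})
```

Running this once before the conversation loop means the first turn never stalls on a multi-gigabyte model download mid-dialogue.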

How You Can Extend This Pattern

  • Swap out personas: Just drop in new templates for different conversation types (therapist-client, interviewer-interviewee, teacher-student, etc.)
  • Tune memory window: Experiment with short vs. long context for different effects
  • Integrate other LLM APIs: This approach is API-agnostic—swap in OpenAI, Anthropic, local models, etc.
  • Add analysis: Automatically tag, score, or reflect on conversation content as it evolves

Takeaways

By combining Python orchestration, Jinja2 template-driven prompt engineering, and MongoDB persistence, you can build highly customizable, multi-turn conversational simulations that are more than just toys—they’re powerful tools for agent design, training, and research.


Curious to try it out? All you need is a local or hosted LLM, MongoDB, and your favorite prompt templates. Happy experimenting—and let those conversations flow!