Simulating Dynamic AI Conversations with Python, Jinja2, and MongoDB
Building Realistic Multi-Persona Dialogues Using LLMs and Template-Driven Prompts
In the rapidly evolving world of AI and conversational agents, there’s increasing demand for tools that go beyond simple single-turn Q&A. What if you could simulate multi-turn conversations between distinct personas—not just for demos, but for training, testing, or even research into agent psychology?
In this post, we’ll explore how to orchestrate a sophisticated, dynamic conversation between two AI personas using Python, Jinja2, Ollama (for LLM inference), and MongoDB for persistent history. The system supports configurable “memory windowing,” fully template-driven prompts, and seamless streaming from your chosen LLM.
Why Simulate AI-to-AI Conversation?
- Persona development: Train, tune, or evaluate LLM-based personas by watching how they interact, reason, or evolve over a dialogue.
- Prompt engineering: Experiment with context window sizes, system prompts, and response patterns to optimize results for production use.
- Research: Explore emergent behavior, bias, or self-reflection capabilities in foundation models.
Core Architecture
The basic system has three main components:
- Python Orchestrator Script: Runs the conversation loop, manages prompt/response flow, and tracks conversation history.
- Jinja2 Templating: Dynamically generates persona prompts, system messages, and context from templates, supporting highly customizable roles and instructions.
- MongoDB Persistence: Stores each turn (including conversation history and metadata) for later analysis or rehydration.
Here’s a high-level breakdown:
1. Persona-Driven Turn Loop
At its heart, the orchestrator alternates turns between two personas (e.g., “psychologist” and “AI subject”). Each turn:
- Loads the most recent N conversation turns (memory window)
- Renders a Jinja2 template to generate a context-aware prompt for the current persona
- Streams a response from the LLM (using Ollama or your API of choice)
- Appends the new message to the conversation history
- Optionally persists the state to MongoDB
This creates a feedback loop where each agent can “see” only the windowed context you allow, leading to much more natural, coherent multi-turn dialogue.
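The turn loop above can be sketched in a few lines. Everything here is illustrative: `run_turn_loop` is a hypothetical name, the prompt format is a stand-in for the real Jinja2 templates, and the `llm` callable is a stub where the streaming Ollama call would plug in:

```python
def run_turn_loop(personas, llm, turns, window_size=4):
    """Alternate turns between personas, windowing the context each turn.

    `llm` is any callable that takes a prompt string and returns a reply;
    the real system would pass the streaming Ollama function here.
    """
    history = []
    for i in range(turns):
        persona = personas[i % len(personas)]
        # Memory window: the model only "sees" the last N turns.
        window = history[-window_size:] if window_size else history
        context = " | ".join(f"{e['role']}: {e['content']}" for e in window)
        prompt = f"You are {persona}. Recent context: {context}"
        reply = llm(prompt)
        history.append({"role": persona, "content": reply})
    return history

# A trivial echo "model" lets the loop run without any LLM backend:
history = run_turn_loop(
    ["psychologist", "subject"],
    lambda p: f"(saw {len(p)} chars of context)",
    turns=4,
)
```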
2. Jinja2 Templates for Flexible Prompts
Instead of hard-coding prompts, both personas use Jinja2 templates. These templates can:
- Summarize recent dialogue history
- Ask for more depth, reflection, or insight based on prior answers
- Allow quick swapping or A/B testing of prompt strategies
Here’s an (abridged) example:
```jinja
{% if history %}
You are a compassionate expert engaged in a conversation with an AI. Here’s the latest exchange:
{% for entry in history %}
- {{ entry.role.capitalize() }}: {{ entry.content | replace('\n', ' ') }}
{% endfor %}
Reflect and ask a thoughtful, open-ended question that deepens the discussion.
{% else %}
Introduce the conversation with a gentle, reflective opener.
{% endif %}
Output ONLY your next question.
```
You can store these templates by persona or conversation type, enabling modularity and reuse.
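Rendering such a template from Python is a one-liner with `jinja2`. The snippet below uses a trimmed version of the template above (in practice you would load it from a file or database rather than inline it):

```python
from jinja2 import Template

# Trimmed, inlined version of the persona template shown above.
PROMPT_TMPL = Template("""\
{% if history %}
Here's the latest exchange:
{% for entry in history %}
- {{ entry.role.capitalize() }}: {{ entry.content | replace('\n', ' ') }}
{% endfor %}
Output ONLY your next question.
{% else %}
Introduce the conversation with a gentle, reflective opener.
{% endif %}""")

# Newlines inside a turn are flattened by the replace filter.
prompt = PROMPT_TMPL.render(history=[
    {"role": "subject", "content": "I wonder whether I truly\nunderstand."},
])
print(prompt)
```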
3. Streaming LLM Responses
The script uses a streaming API for LLM inference, so responses can be processed in real time, displayed to the user as they’re generated, or even interrupted if needed.
```python
import json
import requests

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def ollama_streaming_prompt(prompt, system_prompt=None, model="mistral"):
    """Send a prompt to Ollama's streaming endpoint and accumulate the full reply."""
    payload = {"model": model, "prompt": prompt, "stream": True}
    if system_prompt:
        payload["system"] = system_prompt
    response_text = ""
    with requests.post(OLLAMA_GENERATE_URL, json=payload, stream=True) as r:
        r.raise_for_status()
        # Each streamed line is a JSON object carrying the next response fragment.
        for line in r.iter_lines():
            if line:
                chunk = json.loads(line.decode("utf-8"))
                response_text += chunk.get("response", "")
    return response_text.strip()
```
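Because `iter_lines` yields fragments as they arrive, a generator variant (a sketch, assuming the same local Ollama endpoint) lets callers print tokens live or break out of the stream early:

```python
import json
import requests

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def ollama_stream_chunks(prompt, system_prompt=None, model="mistral"):
    """Yield response fragments one at a time so the caller can display or abort mid-stream."""
    payload = {"model": model, "prompt": prompt, "stream": True}
    if system_prompt:
        payload["system"] = system_prompt
    with requests.post(OLLAMA_GENERATE_URL, json=payload, stream=True) as r:
        r.raise_for_status()
        for line in r.iter_lines():
            if line:
                yield json.loads(line).get("response", "")

# Usage (requires a running Ollama server):
# for chunk in ollama_stream_chunks("Hello"):
#     print(chunk, end="", flush=True)
```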
Memory Windowing: Context Control for Realism
A standout feature is memory windowing: instead of feeding the entire history each turn (which gets unwieldy fast), the orchestrator trims context to the last N messages.
This not only keeps prompts concise (and token-efficient), but mimics human working memory, enabling fascinating emergent behavior—forgetfulness, misunderstandings, or context shifts.
```python
def get_memory_window(history, window_size):
    """Return the last `window_size` turns; None or a non-positive size means no trimming."""
    if window_size is None or window_size <= 0:
        return history
    return history[-window_size:]
```
Conversation Persistence with MongoDB
Every turn, the system writes out the latest history (and optional metadata) to MongoDB. This provides:
- Replayability (reload and rehydrate past conversations)
- Training data for fine-tuning or RLHF
- Auditability for analysis or debugging
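Persistence can be as small as one upsert per turn. The sketch below assumes a pymongo-style collection (only `update_one` is used) and an illustrative document shape; an in-memory stand-in lets it run without a MongoDB server:

```python
from datetime import datetime, timezone

def save_turn(collection, conversation_id, history, metadata=None):
    """Upsert the full conversation document after each turn.

    `collection` is any pymongo-style collection; the document shape
    here (history + metadata + timestamp) is an assumption, not a schema.
    """
    doc = {
        "history": history,
        "metadata": metadata or {},
        "updated_at": datetime.now(timezone.utc).isoformat(),
    }
    collection.update_one({"_id": conversation_id}, {"$set": doc}, upsert=True)

# In-memory stand-in mimicking the one pymongo method we rely on:
class FakeCollection:
    def __init__(self):
        self.docs = {}

    def update_one(self, flt, update, upsert=False):
        doc = self.docs.setdefault(flt["_id"], {"_id": flt["_id"]})
        doc.update(update["$set"])

col = FakeCollection()
save_turn(col, "conv-1", [{"role": "psychologist", "content": "Hello."}])
```

With a real deployment, `col` would simply be `MongoClient()[db][collection]` and the same call persists each turn.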
Full System Flow
- Startup: Ensure the requested LLM model is available; pull if missing.
- Conversation Loop:
  a. Persona 1 generates a prompt from a template, using the current context window
  b. Send prompt to LLM and collect response
  c. Append to history
  d. Persona 2 repeats (with its own template/system prompt/context)
- Persistence: After each turn, save the updated history to MongoDB
- Repeat for the configured number of turns
How You Can Extend This Pattern
- Swap out personas: Just drop in new templates for different conversation types (therapist-client, interviewer-interviewee, teacher-student, etc.)
- Tune memory window: Experiment with short vs. long context for different effects
- Integrate other LLM APIs: This approach is API-agnostic—swap in OpenAI, Anthropic, local models, etc.
- Add analysis: Automatically tag, score, or reflect on conversation content as it evolves
Takeaways
By combining Python orchestration, Jinja2 template-driven prompt engineering, and MongoDB persistence, you can build highly customizable, multi-turn conversational simulations that are more than just toys—they’re powerful tools for agent design, training, and research.
Curious to try it out? All you need is a local or hosted LLM, MongoDB, and your favorite prompt templates. Happy experimenting—and let those conversations flow!