Reliably Testing Voice Agents: Validating and Optimizing Famulor Assistants with Cledon

Running voice agents in production places high demands on quality, reliability, and scalability. Even small changes to prompts, tool calls, or configurations can have unexpected effects on real conversations. This is where the integration of Famulor and Cledon comes in: it allows systematic, reproducible, and realistic testing of voice agents, from pure text logic to complete voice-to-voice telephony.

In this post, we'll explain in detail how the entire process works, which testing strategies are effective, and what value you can gain from it for the development and operation of your assistants.

Why Automated Testing is Crucial for Voice Agents

Voice agents are complex systems. They consist not just of an LLM prompt, but of several interconnected components that must work together perfectly to enable a natural and effective conversation:

Speech-to-Text (STT): The precise recognition of the caller's spoken language, including various accents and potential background noise.
LLM Logic: The heart of the assistant, responsible for decision-making, dialogue management, and recognizing when an external tool is needed.
Business Logic & APIs: The connection to third-party systems via tool calls, for example, to book an appointment in a calendar, retrieve customer data from a CRM, or check the status of an order.
Text-to-Speech (TTS): The conversion of the generated response into a natural-sounding voice, including correct timing, emphasis, and pauses.

Without a structured and automated testing process, significant risks arise that can jeopardize the success of your project:

Regressions: Changes in one area (e.g., a prompt adjustment) can unintentionally break other use cases that were already working.
Errors in Live Operation: Issues like incorrect API responses or logical loops in the dialogue only appear during interactions with real customers, leading to frustration and, in the worst case, the loss of leads or customers.
Lack of Objectivity: Without measurable criteria, the quality of a voice agent cannot be objectively assessed or compared. Was the change truly an improvement? Manual testing by a few people only provides a subjective gut feeling.
Scalability Issues: Manual tests are time-consuming and do not scale. Checking hundreds of conversation variants by hand before a new version goes live is not practical.

The combination of the flexible voice AI platform Famulor with the specialized testing framework Cledon provides a professional solution and establishes a robust quality assurance process.

The Anatomy of a Famulor Voice Agent: What is Actually Being Tested?

To understand the necessity of testing, one must look at the individual layers of a Famulor assistant. On the Famulor platform, you can fine-tune every aspect of your agent, which also means that each of these components must be validated.

1. The Conversational Logic (Prompt & Flow)

The core is often the system prompt or a visual flow in the Famulor Flow Builder. This is where the agent's behavior is defined. What needs to be tested:

Goal Achievement: Does the agent reliably guide the caller to their goal (e.g., booking an appointment, lead qualification)?
Dialogue Management: Does the agent ask the right questions? Can it handle unexpected answers or interruptions?
Robustness: What happens if the caller digresses or provides irrelevant information? Does the agent get back on track?

2. Integrations and Tool Calls

A voice agent that cannot perform actions is just a conversational partner. The real value comes from deep integrations into your business processes. Tests must ensure:

Correct Data Transfer: Is the information extracted by the agent (name, date, concern) correctly passed to the API?
Error Handling: How does the agent react if a connected API (e.g., your CRM) is unavailable or returns an error? Does it inform the caller clearly?
Data Processing: Does the agent understand the API's response and can it correctly integrate it into its spoken reply (e.g., "The next available appointment is on...")?
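
To make the error-handling check more tangible, here is a minimal sketch in plain Python. It does not use any Famulor or Cledon API. The names `CalendarUnavailable`, `ToolResult`, and `run_booking_tool`, as well as the wording of the fallback sentence, are assumptions chosen for illustration. The sketch shows the behavior a test would look for: when the backend fails, the caller still hears a clear sentence.

```python
from dataclasses import dataclass
from typing import Callable


class CalendarUnavailable(Exception):
    """Hypothetical error raised when the calendar backend cannot be reached."""


@dataclass
class ToolResult:
    ok: bool
    spoken_reply: str


def run_booking_tool(book: Callable[[str, str], str], name: str, date: str) -> ToolResult:
    """Call a booking function and turn its outcome into a sentence for the caller."""
    try:
        slot = book(name, date)
    except CalendarUnavailable:
        # The caller should hear a clear explanation instead of silence.
        return ToolResult(
            False,
            "I'm sorry, I can't book appointments right now. "
            "May I pass your request on to a colleague?",
        )
    return ToolResult(True, f"Your appointment is booked for {slot}.")


def failing_calendar(name: str, date: str) -> str:
    # Stand-in for an unavailable calendar service.
    raise CalendarUnavailable("calendar service down")


def working_calendar(name: str, date: str) -> str:
    # Stand-in for a healthy calendar service that confirms a slot.
    return f"{date} at 10:00"


result_down = run_booking_tool(failing_calendar, "Alex", "2025-03-04")
result_up = run_booking_tool(working_calendar, "Alex", "2025-03-04")
```

Swapping the real backend for a stand-in like `failing_calendar` is one way to exercise the error path on demand rather than waiting for an outage.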

3. The Auditory Layer (STT & TTS)

This is about the listening experience and recognition accuracy in a real phone call.

Latency: How quickly does the agent respond? Pauses that are too long lead to unnatural conversations.
Intelligibility: Is the TTS voice clear and distinct? Are technical terms or names pronounced correctly?
Recognition Accuracy: How well does the STT model work with different dialects, noisy environments, or poor connection quality?
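
Recognition accuracy becomes comparable once it is expressed as a number. One widely used metric is the word error rate (WER): the word-level edit distance between what was actually said and what the STT model transcribed, divided by the number of words in the reference. The source does not name a specific metric, so treat the following as one possible sketch; the function name `word_error_rate` is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One substituted word out of five reference words.
wer_example = word_error_rate(
    "book an appointment for monday",
    "book an appointment for sunday",
)
```

Comparing this value across accents or noise levels for the same reference sentence shows where recognition degrades.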

Step-by-Step: Testing a Famulor Voice Agent with Cledon

The combination of Famulor and Cledon enables a continuous testing process that covers all the aspects mentioned above. Cledon acts as an automated caller that runs predefined conversation scenarios and logs the Famulor agent's responses.

Step 1: Define Test Strategy and Test Cases

Before the first test call is initiated, you need a clear strategy. Define the most important use cases (user stories) and derive concrete test cases from them. A good test case always describes the initial state, the actions to be performed, and the expected outcome.

Examples of Test Cases:

Happy Path: The caller wants to book an appointment for next week, provides all data correctly, and the appointment is successfully entered into the calendar.
Edge Case (Correction): The caller states a date but then corrects themselves. Does the agent recognize the correction and use the right date?
Negative Test (Invalid Data): The caller tries to book an appointment on a Sunday, although business hours do not permit this. Does the agent politely decline and suggest an alternative?
Integration Test (API Error): The calendar service is unavailable. Does the agent inform the caller that booking is currently not possible and offer to forward the request manually?
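
The four examples above can be written down in a structured form before they are entered into any tool. The following sketch uses a plain Python dataclass with the three parts named earlier: initial state, actions, and expected outcome. It is not Cledon's test format; the class `VoiceTestCase`, its field names, and the outcome labels are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VoiceTestCase:
    name: str
    initial_state: Dict[str, str]   # preconditions, e.g. backend availability
    caller_turns: List[str]         # what the simulated caller says, in order
    expected_outcome: str           # hypothetical label for the desired result


TEST_CASES = [
    VoiceTestCase(
        name="happy_path_booking",
        initial_state={"calendar": "available"},
        caller_turns=["I'd like an appointment next week.", "Tuesday at ten, please."],
        expected_outcome="appointment_created",
    ),
    VoiceTestCase(
        name="date_correction",
        initial_state={"calendar": "available"},
        caller_turns=["Tuesday, please.", "Sorry, I meant Wednesday."],
        expected_outcome="appointment_created_on_corrected_date",
    ),
    VoiceTestCase(
        name="sunday_rejected",
        initial_state={"calendar": "available"},
        caller_turns=["Can I come in on Sunday?"],
        expected_outcome="declined_with_alternative",
    ),
    VoiceTestCase(
        name="calendar_down",
        initial_state={"calendar": "unavailable"},
        caller_turns=["I'd like an appointment next week."],
        expected_outcome="manual_forwarding_offered",
    ),
]


def validate(cases: List[VoiceTestCase]) -> None:
    """Reject suites with duplicate names or cases without any caller input."""
    names = [case.name for case in cases]
    if len(names) != len(set(names)):
        raise ValueError("test case names must be unique")
    for case in cases:
        if not case.caller_turns:
            raise ValueError(f"{case.name} has no caller turns")


validate(TEST_CASES)
```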

Step 2: Configuration in Famulor and Cledon

The technical setup is straightforward. In Famulor, you create your voice agent as usual and assign it a phone number. In Cledon, you create a new test project and set the Famulor agent's phone number as the target.

Next, you create your test cases in Cledon. This can be done in two ways:

Text-Based Tests: You write the dialogue from the caller's perspective as text. Cledon then uses its own TTS voice to simulate the call. This is excellent for testing the LLM logic and tool calls.
Audio-Based Tests: You upload predefined audio files. This is ideal for testing the STT component under real-world conditions, e.g., with recordings that include background noise or different accents.

For each test step, you define in Cledon what response you expect from the Famulor agent. This can be a specific sentence or the execution of a tool call, the success of which you verify.
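
Conceptually, such a per-step expectation boils down to two checks: did the agent say something that contains the expected phrase, and did it trigger the expected tool call? The sketch below illustrates that idea in plain Python. It does not reflect how Cledon evaluates responses internally; `AgentTurn`, `Expectation`, `meets`, and the tool name `create_appointment` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentTurn:
    text: str
    tool_call: Optional[str] = None  # name of the tool the agent invoked, if any


@dataclass
class Expectation:
    phrase: Optional[str] = None     # phrase that must appear in the reply
    tool_call: Optional[str] = None  # tool that must have been invoked


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    cleaned = re.sub(r"[^a-z0-9 ]+", "", text.lower())
    return " ".join(cleaned.split())


def meets(expectation: Expectation, turn: AgentTurn) -> bool:
    """Return True only if every condition set on the expectation is satisfied."""
    if expectation.phrase is not None:
        if normalize(expectation.phrase) not in normalize(turn.text):
            return False
    if expectation.tool_call is not None and expectation.tool_call != turn.tool_call:
        return False
    return True


turn = AgentTurn("Great, I've booked you for Tuesday at 10.", tool_call="create_appointment")
phrase_ok = meets(Expectation(phrase="Booked you for Tuesday"), turn)
tool_ok = meets(Expectation(tool_call="create_appointment"), turn)
wrong_tool = meets(Expectation(tool_call="cancel_appointment"), turn)
```

Normalizing both sides before comparing keeps a test from failing over punctuation or capitalization. LLM replies vary in wording, so matching on a short key phrase is usually more stable than matching an entire sentence.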

Step 3: Execution – From Smoke Tests to Regression Tests

With the prepared test suite, you can now perform various types of tests:

Smoke Test: After every small change to the prompt or configuration, run a small selection of the most important test cases to ensure that basic functionality is still intact.
Functional Tests: You test a specific functional area (e.g., everything related to appointment booking) with all associated test cases.
Regression Test: Before a new version of the agent goes live, you run the entire test suite. Cledon can conduct hundreds of calls in parallel and provides you with a complete picture of your agent's status within minutes.
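
One simple way to organize these three run types is to tag each test case and select by tag. The sketch below is a plain Python illustration, not a Cledon feature; the case names, the tags, and the `select` function are assumptions.

```python
from typing import Dict, List, Optional, Set

# Hypothetical suite: test case name -> tags
SUITE: Dict[str, Set[str]] = {
    "happy_path_booking": {"smoke", "booking"},
    "date_correction": {"booking"},
    "sunday_rejected": {"booking"},
    "calendar_down": {"smoke", "integration"},
    "order_status_lookup": {"orders"},
}


def select(run_type: str, area: Optional[str] = None) -> List[str]:
    """Pick the test cases for a smoke, functional, or regression run."""
    if run_type == "smoke":
        chosen = [name for name, tags in SUITE.items() if "smoke" in tags]
    elif run_type == "functional":
        if area is None:
            raise ValueError("a functional run needs an area, e.g. 'booking'")
        chosen = [name for name, tags in SUITE.items() if area in tags]
    elif run_type == "regression":
        chosen = list(SUITE)  # the entire suite
    else:
        raise ValueError(f"unknown run type: {run_type}")
    return sorted(chosen)


smoke_cases = select("smoke")
booking_cases = select("functional", "booking")
all_cases = select("regression")
```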

Step 4: Analysis, Optimization, and the Feedback Loop

After each test run, Cledon provides a detailed report. You can see exactly which test cases passed and which failed. For each failed test, you get the full transcript and can identify the exact point where the dialogue deviated from the expected path.
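
Finding the point of deviation amounts to comparing the expected sequence of dialogue steps with the observed one. The following sketch shows the idea in plain Python. It assumes each turn has already been reduced to a short step label such as `ask_email`; the labels and the function `first_deviation` are hypothetical and not part of any Cledon report format.

```python
from typing import List, Optional


def first_deviation(expected: List[str], actual: List[str]) -> Optional[int]:
    """Return the index of the first step that differs, or None if both match."""
    for index, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return index
    if len(expected) != len(actual):
        # One dialogue ended early; the deviation is the first missing step.
        return min(len(expected), len(actual))
    return None


expected_steps = ["greet", "ask_date", "ask_email", "confirm"]
actual_steps = ["greet", "ask_date", "ask_date", "confirm"]

deviation_index = first_deviation(expected_steps, actual_steps)
```

In this example the agent repeated the date question instead of asking for the email address, so the deviation is at index 2.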

This data-driven approach enables a rapid optimization cycle:

Identify Error: The test report shows that the agent often misunderstands the request for an email address.
Formulate Hypothesis: "The phrasing of the question might be unclear."
Make Change in Famulor Assistant: You adjust the prompt to phrase the question more clearly. For example: "Could you please spell out your email address for me?"
Start a New Test Run: You run the same test case again in Cledon, followed by the regression suite.
Validate Result: The test now passes, and the regression suite confirms that the change has fixed the problem without causing new errors.

This process transforms optimization from guesswork into a scientific, measurable procedure and ensures that the quality of your voice agent continuously improves.

Best Practices for Testing Voice Agents

To get the most out of the combination of Famulor and Cledon, you should follow these best practices:

Test more than just the "Happy Path": Most errors lurk in the unexpected branches of a conversation.
Automate your regression tests: Automatically run your entire test suite before every release. This is your most important insurance against unnoticed bugs.
Version your prompts and flows: Treat your configuration like code. If a test fails, you can easily revert to a previous, working version.
Measure latency: An agent that takes too long to respond will not be accepted. Define clear thresholds for response times.
Start testing early: Integrate testing into your development process from the very beginning, not just before going live.
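
A latency threshold is easiest to enforce when it is defined on a percentile rather than on the average, because a few very slow responses can hide behind a good mean. The sketch below uses the nearest-rank percentile in plain Python. The sample values and the 800 ms limit are made-up assumptions, not recommendations from Famulor or Cledon.

```python
import math
from typing import List


def percentile(samples_ms: List[int], p: int) -> int:
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def within_budget(samples_ms: List[int], p: int, limit_ms: int) -> bool:
    """True if the chosen percentile stays at or below the limit."""
    return percentile(samples_ms, p) <= limit_ms


# Hypothetical response times in milliseconds from one test run.
samples = [420, 510, 480, 950, 530, 470, 600, 490, 1500, 520]

median_ms = percentile(samples, 50)
p95_ms = percentile(samples, 95)
passes = within_budget(samples, 95, limit_ms=800)
```

Here the median looks healthy, but the 95th percentile exceeds the limit, so the run would be flagged.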

Conclusion: Quality as a Competitive Advantage Through Professional Testing

Building an impressive voice agent is one thing. Ensuring that it functions reliably, robustly, and flawlessly under real-world conditions is another—and often the more critical part. Manual testing quickly reaches its limits and can no longer cover the complexity of modern AI systems.

The integration of Famulor's flexible AI platform with a professional testing framework like Cledon addresses this problem. It enables developers and companies to establish a systematic, automated, and measurable quality assurance process. Instead of hoping for the best in live operation, you can validate changes, objectively measure performance, and maintain a consistently high quality of service.

Ultimately, investing in automated testing is an investment in customer satisfaction and the success of your business. A voice agent that your customers trust because it simply works is not a cost factor, but an invaluable competitive advantage.

Are you ready to take the quality of your voice agents to the next level? Discover the possibilities of the Famulor platform and learn how to develop robust, reliable, and scalable AI assistants. Contact us for a demo!

Frequently Asked Questions (FAQ)

What is the main advantage of combining Famulor and Cledon?

The main advantage lies in end-to-end quality assurance. While Famulor enables the creation of highly flexible and powerful voice agents, Cledon provides the tools to validate their behavior automatically, reproducibly, and under realistic conditions before they interact with customers.

Can I test my Famulor assistants without Cledon?

Yes, you can perform tests manually via calls or using the testing tools integrated into Famulor. This is ideal for quick, individual checks. However, Cledon offers automation, scalability, and systematic regression testing for hundreds of scenarios, which is essential for professional and critical applications.

What types of errors are typically found when testing voice agents?

Typical errors include logic errors in the dialogue flow, failed or misinterpreted API calls (tool calls), inaccurate speech recognition (STT) for specific terms or accents, unnatural pauses due to high latency, and inconsistent handling of unexpected user inputs.

How much effort is required to set up an automated testing process?

The initial setup requires defining and creating the test cases, which involves some effort. However, once this test suite exists, running the tests is as simple as clicking a button. The long-term benefits in time savings and avoided errors in live operation far outweigh the initial effort.