AI Agents Challenge

Chapter 1: Sales Challenge

Frequently Asked Questions

About the Benchmark

What is the CRM AI Agent Benchmark?

The CRM AI Agent Benchmark is a standardized evaluation framework designed to test AI assistants on their ability to analyze CRM data, understand sales patterns, and provide actionable insights to sales professionals. It makes different AI approaches directly comparable by evaluating them on a common set of datasets and questions.

What types of datasets are used in the benchmark?

The benchmark uses five distinct datasets, each focusing on different aspects of CRM analysis:

  • Pipeline Insights: Analysis of sales pipeline metrics
  • Email Analysis: Understanding customer communications
  • Sales Records: Historical sales data analysis
  • Rule Compliance: Assessing adherence to sales policies
  • Performance Trends: Evaluating sales rep performance

How are agents scored in the benchmark?

Agents receive scores based on the accuracy and relevance of their responses across all five datasets. Each dataset contributes to the overall score, and individual dataset scores are also recorded to identify specific strengths and weaknesses.

Participation & Submission

How do I register to participate?

To participate, create an account on this website by clicking the "Register" button in the navigation menu. After registration, you'll receive an API key that you can use to submit your agent's scores.

How do I submit my agent's scores?

After registering, you can use the API key displayed on your profile page to submit your agent's scores via the API endpoint. Detailed submission instructions are available on your profile page after logging in.

Can I submit multiple agents?

Yes, you can submit scores for multiple different agents using the same API key. Each agent will be displayed separately on the leaderboard with its own performance metrics.

Can I update my agent's scores?

Yes, you can submit updated scores for your agents as you improve them. The leaderboard will display the most recent submission for each agent.

Leaderboard & Results

How is the leaderboard calculated?

The leaderboard ranks agents by their overall score across all five datasets. Because every dataset is weighted equally, the overall score is simply the arithmetic mean of the five dataset scores.
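With equal weights, the overall score reduces to a plain average. A minimal sketch (the dataset keys here are illustrative names, not the benchmark's exact identifiers):

```python
# Equal-weight overall score across the five datasets.
# Key names and score values are illustrative only.
dataset_scores = {
    "pipeline_insights": 0.82,
    "email_analysis": 0.74,
    "sales_records": 0.88,
    "rule_compliance": 0.69,
    "performance_trends": 0.77,
}

# With equal weights, the weighted average is just the arithmetic mean.
overall = sum(dataset_scores.values()) / len(dataset_scores)
print(round(overall, 3))  # 0.78
```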

How often is the leaderboard updated?

The leaderboard is updated in real-time as new submissions are received through the API.

What information is shown for each agent?

Each agent entry on the leaderboard displays the agent name, overall score, and individual scores for each of the five datasets. You can click on an agent's name to view more detailed performance information.

Technical Questions

What API format is used for submissions?

Submissions are made via a POST request to the API endpoint with your API key for authentication. The submission should include your agent's name and scores for each dataset in JSON format.
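As a sketch, a submission request might be assembled as follows. The endpoint URL, auth header name, and JSON field names below are assumptions for illustration only; the actual format is documented on your profile page after logging in.

```python
import json

API_KEY = "your-api-key"  # shown on your profile page after registration

def build_submission(agent_name, scores):
    """Assemble the parts of a score-submission POST request.

    The URL, auth header, and payload field names are hypothetical;
    consult your profile page for the real submission format.
    """
    url = "https://example.com/api/submit"  # placeholder endpoint
    headers = {
        "X-API-Key": API_KEY,              # hypothetical auth header
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "agent_name": agent_name,
        "scores": scores,  # one score per dataset
    })
    return url, headers, body

url, headers, body = build_submission(
    "my-agent-v1",
    {
        "pipeline_insights": 0.82,
        "email_analysis": 0.74,
        "sales_records": 0.88,
        "rule_compliance": 0.69,
        "performance_trends": 0.77,
    },
)
# To actually send it: requests.post(url, headers=headers, data=body)
```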

What technologies does this benchmark site use?

This site is built with Flask (Python web framework), SQLite for the database, and uses Tailwind CSS for styling. The entire platform is designed to be lightweight and easy to use.

Is there a GitHub repository for this benchmark?

Yes, you can find the benchmark code, datasets, and evaluation scripts on GitHub. The link is available in the footer of this website.

Additional Help

Who can I contact if I'm having trouble with my submission?

If you're experiencing technical issues or have questions about the benchmark, please visit the About page's contact section or reach out via GitHub.

How can I contribute to improving the benchmark?

We welcome contributions and feedback to improve the benchmark. You can suggest improvements, report issues, or contribute code through the GitHub repository.

Will there be future versions of this benchmark?

Yes, this is "Chapter 1: Sales Challenge" of the AI Agents Challenge. Future chapters will focus on different aspects of AI agent capabilities in various business contexts.