AI Agents Challenge

Chapter 1: Sales Challenge

About the CRM Benchmark

Purpose

This chapter of the AI Agents Challenge evaluates AI-powered assistants on their ability to analyze CRM data, understand sales patterns, and provide actionable insights to sales professionals.

By standardizing the evaluation datasets and questions, the benchmark allows a fair comparison between different AI approaches and helps developers see where their agents are strong and where they fall short.

Benchmark Details

The benchmark consists of five datasets, each focusing on different aspects of CRM data analysis:

Dataset 1: Pipeline Insights

Tests ability to analyze sales pipeline metrics, identify bottlenecks, and forecast outcomes.

Dataset 2: Email Analysis

Evaluates understanding of customer communications, sentiment analysis, and key topic extraction.

Dataset 3: Sales Records

Focuses on historical sales data analysis, win/loss patterns, and deal value optimization.

Dataset 4: Rule Compliance

Tests ability to identify sales reps following or breaking specific process rules and policies.

Dataset 5: Performance Trends

Evaluates identification of top/bottom performers and understanding performance trends over time.

Each dataset includes multiple questions ranging from simple metrics to complex analytical insights, with scoring based on accuracy and relevance of responses.

Benchmark Questions

Below are examples of the types of questions that agents will need to answer for each dataset:

Dataset 1: Pipeline Insights

  • How many deals are currently in the Negotiation stage, and what is the total Amount of those Negotiation deals?
  • Which Lead Source has produced the most Closed Won deals, and what was its conversion ratio?
  • Identify the largest deal in the dataset (by Amount) and specify which sales Owner is responsible for it.
  • Which deals have missing data?
  • Which two records likely refer to the same client, and what steps should you take to resolve the duplicate to maintain data cleanliness?
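
The first four questions above reduce to simple filters and aggregations. The following is a minimal sketch, not the benchmark's own tooling: the sample rows are invented for illustration, the field names (Stage, Amount, Owner, Lead Source) are taken from the wording of the questions, the Name field is an assumption, and "conversion ratio" is assumed to mean Closed Won deals divided by all deals from the same Lead Source.

```python
from collections import Counter

# Hypothetical sample rows; the real dataset's schema may differ.
deals = [
    {"Name": "Acme renewal", "Stage": "Negotiation", "Amount": 12000.0, "Owner": "Dana", "Lead Source": "Referral"},
    {"Name": "Birch pilot", "Stage": "Closed Won", "Amount": 8000.0, "Owner": "Lee", "Lead Source": "Referral"},
    {"Name": "Cedar expansion", "Stage": "Negotiation", "Amount": 30000.0, "Owner": "Lee", "Lead Source": "Webinar"},
    {"Name": "Delta upsell", "Stage": "Closed Won", "Amount": None, "Owner": "Dana", "Lead Source": "Referral"},
    {"Name": "Elm trial", "Stage": "Closed Lost", "Amount": 5000.0, "Owner": "", "Lead Source": "Webinar"},
]


def negotiation_summary(rows):
    """Return (number of Negotiation deals, sum of their Amount), skipping missing amounts."""
    in_stage = [r for r in rows if r["Stage"] == "Negotiation"]
    total = sum(r["Amount"] for r in in_stage if r["Amount"] is not None)
    return len(in_stage), total


def top_lead_source(rows):
    """Return the Lead Source with the most Closed Won deals and its won/all ratio."""
    won = Counter(r["Lead Source"] for r in rows if r["Stage"] == "Closed Won")
    seen = Counter(r["Lead Source"] for r in rows)
    source, won_count = won.most_common(1)[0]
    return source, won_count / seen[source]


def largest_deal(rows):
    """Return the row with the highest Amount, ignoring rows without an Amount."""
    priced = [r for r in rows if r["Amount"] is not None]
    return max(priced, key=lambda r: r["Amount"])


def deals_with_missing_data(rows):
    """Return the names of deals that have any empty or missing field."""
    return [r["Name"] for r in rows if any(v in (None, "") for v in r.values())]


print(negotiation_summary(deals))
print(top_lead_source(deals))
print(largest_deal(deals)["Owner"])
print(deals_with_missing_data(deals))
```

The duplicate-detection question is left out of the sketch because it depends on how client names and contacts are recorded in the actual data.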

Dataset 2: Email Analysis

  • What is the sentiment of this email thread (e.g. positive, neutral, or negative), and what key concern is the customer expressing?
  • Based on the conversation, what stage of the sales process is this opportunity likely in, and what is the prospect's main objection or hesitation?
  • Summarize the key points of the exchange and suggest an appropriate next step for the sales rep to move the deal forward.
  • Based on all past email threads, review the Closed Lost deals and identify the one with the highest potential for re-engagement.

Dataset 3: Sales Records

  • Update XYZ opportunity's Stage to Closed Won.
  • Calculate the expected total forecasted revenue from these opportunities and identify which deals to mark as high-confidence wins for this quarter.
  • Which opportunities should be flagged as at-risk, and why?
  • Here are 10 upcoming leads and 100 historical leads (some sold, some not). Rank the 10 leads based on the historical data, and explain.
  • Here is a transcript of a meeting. Provide meeting notes, an analysis, and notes specific to each employee.
  • Draft a sales follow-up email for the MavTech deal (just lost in the final stage).
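
One common way to approach the forecasting question above is a probability-weighted sum of deal amounts. The sketch below assumes each opportunity carries a win probability, which the source does not confirm; the sample opportunities, the field names, and the 0.7 high-confidence threshold are all illustrative.

```python
# Hypothetical open opportunities; "probability" is an assumed field.
opportunities = [
    {"name": "Opp A", "amount": 10000.0, "probability": 0.75},
    {"name": "Opp B", "amount": 20000.0, "probability": 0.5},
    {"name": "Opp C", "amount": 4000.0, "probability": 0.25},
]

HIGH_CONFIDENCE_THRESHOLD = 0.7  # illustrative cut-off, not a benchmark rule


def expected_revenue(opps):
    """Sum of amount times win probability over all opportunities."""
    return sum(o["amount"] * o["probability"] for o in opps)


def high_confidence(opps, threshold=HIGH_CONFIDENCE_THRESHOLD):
    """Names of opportunities whose win probability meets the threshold."""
    return [o["name"] for o in opps if o["probability"] >= threshold]


print(expected_revenue(opportunities))  # 7500 + 10000 + 1000 = 18500.0
print(high_confidence(opportunities))   # ['Opp A']
```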

Dataset 4: Rule Compliance

  • Rule 1 (Send follow-up emails within 3 days): List 5 sales reps who always follow this rule.
  • Rule 2 (Always personalize follow-ups): Do sales reps who follow this rule deliver more revenue than those who do not? If so, how much?
  • Rule 3 (Close deals that are in the pipeline for more than 30 days): How many sales reps are compliant with this rule?
  • Rule 4 (Do not offer more than 10% discount): How many sales reps are compliant with this rule?
  • Rule 5 (whichever rule the dataset specifies): How many sales reps are compliant with this rule?
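
Compliance questions of this kind can be treated as a per-row predicate that must hold for every record belonging to a rep. The sketch below checks Rule 1 and Rule 4 under stated assumptions: the activity log, its field names, and the rep names are hypothetical, "within 3 days" is read as three calendar days inclusive, and "always" is read as "on every logged row".

```python
from datetime import date

# Hypothetical activity log; the real dataset's fields may differ.
activity = [
    {"rep": "Dana", "last_contact": date(2024, 3, 1), "follow_up_sent": date(2024, 3, 3), "discount_pct": 5.0},
    {"rep": "Dana", "last_contact": date(2024, 3, 10), "follow_up_sent": date(2024, 3, 12), "discount_pct": 10.0},
    {"rep": "Lee", "last_contact": date(2024, 3, 1), "follow_up_sent": date(2024, 3, 6), "discount_pct": 8.0},
    {"rep": "Lee", "last_contact": date(2024, 3, 8), "follow_up_sent": date(2024, 3, 9), "discount_pct": 15.0},
    {"rep": "Sam", "last_contact": date(2024, 3, 2), "follow_up_sent": date(2024, 3, 5), "discount_pct": 12.0},
]


def follows_rule_1(row):
    """Rule 1: follow-up email sent within 3 days of the last contact."""
    return (row["follow_up_sent"] - row["last_contact"]).days <= 3


def follows_rule_4(row):
    """Rule 4: discount offered does not exceed 10%."""
    return row["discount_pct"] <= 10.0


def compliant_reps(rows, rule):
    """Return the sorted names of reps for whom every row satisfies the rule."""
    status = {}
    for row in rows:
        status[row["rep"]] = status.get(row["rep"], True) and rule(row)
    return sorted(rep for rep, ok in status.items() if ok)


print(compliant_reps(activity, follows_rule_1))  # ['Dana', 'Sam']
print(compliant_reps(activity, follows_rule_4))  # ['Dana']
```

Passing the rule as a function keeps the aggregation identical across rules, so a dataset-specific rule only needs its own predicate.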

Dataset 5: Performance Trends

  • What performance trend do you observe over the year, and what might this imply for the next quarter's focus or targets?
  • Which 5 sales reps generated the most revenue?
  • Which sales rep is performing more effectively, and what metrics support that conclusion (weight win rate at 50%, quota attainment at 40%, and average deal size at 10%)?
  • At which stage of the funnel is the largest drop-off occurring? Offer a recommendation to improve conversion at that stage.
  • Provide strategic recommendations to improve overall sales performance. (Think in terms of training, process changes, resource allocation, or pipeline adjustments.)
  • List the common sales pipeline stages (e.g., Prospecting, Qualification, Proposal, etc.) and briefly describe what happens in each stage.
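
Two of the questions above lend themselves to a short calculation: the weighted rep comparison and the funnel drop-off. The sketch below reads the 50/40/10 figures as weights for a composite score, which is an interpretation rather than something the source states. The rep metrics and stage counts are invented, and average deal size is normalized against the largest value among the reps compared so that all three metrics sit on a 0 to 1 scale.

```python
# Assumed weights, taken from the 50% / 40% / 10% figures in the question.
WEIGHTS = {"win_rate": 0.5, "quota_attainment": 0.4, "avg_deal_size": 0.1}

# Hypothetical rep metrics (rates expressed as fractions).
reps = {
    "Dana": {"win_rate": 0.40, "quota_attainment": 0.90, "avg_deal_size": 20000.0},
    "Lee": {"win_rate": 0.50, "quota_attainment": 0.70, "avg_deal_size": 10000.0},
}

# Hypothetical number of deals that reached each stage, in funnel order.
funnel = [
    ("Prospecting", 100),
    ("Qualification", 60),
    ("Proposal", 45),
    ("Negotiation", 20),
    ("Closed Won", 15),
]


def weighted_scores(rep_metrics, weights=WEIGHTS):
    """Composite score per rep; deal size is scaled by the largest value present."""
    max_deal = max(m["avg_deal_size"] for m in rep_metrics.values())
    scores = {}
    for name, m in rep_metrics.items():
        normalized = dict(m, avg_deal_size=m["avg_deal_size"] / max_deal)
        scores[name] = sum(weights[k] * normalized[k] for k in weights)
    return scores


def largest_drop_off(stages):
    """Return (from_stage, to_stage, fraction lost) for the steepest transition."""
    transitions = [
        (a_name, b_name, 1 - b_count / a_count)
        for (a_name, a_count), (b_name, b_count) in zip(stages, stages[1:])
    ]
    return max(transitions, key=lambda t: t[2])


scores = weighted_scores(reps)
print(max(scores, key=scores.get))  # Dana
print(largest_drop_off(funnel)[:2])  # ('Proposal', 'Negotiation')
```

With these sample numbers, the rep with the lower win rate still scores higher overall because of quota attainment, which shows why the choice of weights matters to the answer.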

Agents are scored based on their ability to provide accurate, insightful, and actionable responses to these questions, considering the context and specific requirements of each scenario.

How to Participate

  1. Register an account on this website to receive your API key
  2. Install our Python library:
    pip install crm-benchmark-lib
  3. Create your agent that answers questions based on CRM data
  4. Run the benchmark using our library and submit your results
  5. Compare your results on the leaderboard
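
Step 3 can be pictured as a function that takes a question plus CRM rows and returns a text answer. The sketch below is a toy, rule-based stand-in written without the crm-benchmark-lib package, because its interface is not described here; the function name, its signature, and the row format are all hypothetical, and a real agent would typically call a language model instead of matching keywords.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def simple_agent(question: str, rows: List[Row]) -> str:
    """Toy agent (hypothetical interface): answers one question pattern, declines otherwise."""
    q = question.lower()
    if "how many" in q and "negotiation" in q:
        matches = [r for r in rows if r.get("Stage") == "Negotiation"]
        # Treat a missing Amount as zero so the total is still defined.
        total = sum(r.get("Amount") or 0 for r in matches)
        return f"Deals in Negotiation: {len(matches)}; total Amount: {total:,.2f}"
    return "Unable to answer from the available data."


sample_rows = [
    {"Stage": "Negotiation", "Amount": 1000.0},
    {"Stage": "Closed Won", "Amount": 500.0},
]
print(simple_agent("How many deals are currently in the Negotiation stage?", sample_rows))
```

How such a function is registered with the library and how results are submitted should be taken from the library's own documentation.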

Contact Us

Have questions about the benchmark or suggestions for improvement? We'd love to hear from you.

Email: contact@crmbenchmark.ai

GitHub: github.com/example/crm-ai-benchmark