About CRM AI Agent Benchmarking

Overview

The CRM AI Agent Benchmarking platform is designed to help developers and researchers evaluate the performance of their AI agents in Customer Relationship Management (CRM) scenarios. By providing standardized datasets and evaluation metrics, we enable fair comparisons between different AI agent implementations.

Our platform consists of two main components: a server-side API hosted on AWS that manages the leaderboard and user accounts, and a client-side Python library that developers can use to benchmark their agents locally and submit scores to the leaderboard.

System Architecture

Server-Side Components

  • Flask web application running on AWS EC2
  • SQLite database for storing user data and submission history
  • Email verification system for user registration
  • RESTful API endpoints for score submission and data retrieval
  • Secure user authentication with API keys
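
For illustration only, the sketch below shows how one of these pieces could fit together: a Flask route backed by SQLite serving the /api/leaderboard endpoint listed later in this page. The database file name and table schema are assumptions for the example, not the actual server code.

# Illustrative sketch only -- not the actual server code.
# Assumes a SQLite table: submissions(agent_name TEXT, score REAL).
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)
DB_PATH = "benchmark.db"  # assumed database file name

@app.route("/api/leaderboard", methods=["GET"])
def leaderboard():
    # Return each agent's best score, highest first
    with sqlite3.connect(DB_PATH) as conn:
        rows = conn.execute(
            "SELECT agent_name, MAX(score) FROM submissions "
            "GROUP BY agent_name ORDER BY MAX(score) DESC"
        ).fetchall()
    return jsonify([{"agent_name": name, "score": score} for name, score in rows])

if __name__ == "__main__":
    app.run()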

Client-Side Library

  • Python library (crm-benchmark-lib) for local benchmarking
  • Support for parallel and asynchronous processing
  • Automatic retry mechanism for API communication
  • Progress tracking and visualization tools
  • Detailed performance metrics and analysis
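
The library performs its retry logic internally; purely as an illustration of the mechanism, a retry wrapper with exponential backoff might look like the sketch below. The function name and parameters are placeholders and are not part of crm-benchmark-lib.

# Illustrative sketch of retry with exponential backoff; not taken from crm-benchmark-lib.
import time
import requests

def post_with_retry(url, payload, max_attempts=3, backoff_seconds=1.0):
    """POST payload to url, retrying on network errors or 5xx responses."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.post(url, json=payload, timeout=30)
            if response.status_code < 500:
                return response  # success, or a client error we should not retry
        except requests.RequestException:
            pass  # network-level failure; fall through and retry
        if attempt < max_attempts:
            time.sleep(backoff_seconds * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError(f"Request to {url} failed after {max_attempts} attempts")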

API Documentation

Our system provides two ways to interact with the benchmark platform:

Python Library (Recommended)

The recommended approach is to use our Python library, which handles all the benchmark execution and API communication for you. The library automatically manages authentication, dataset loading, answer evaluation, and result submission.

Available API Endpoints

Our platform provides the following API endpoints:

  • /api/evaluate/authenticate - Authenticate your agent and get available datasets
  • /api/evaluate/start_dataset - Load a dataset and get the first question
  • /api/evaluate/submit_answer - Submit an answer and get the next question
  • /api/evaluate/complete_dataset - Complete a dataset evaluation
  • /api/evaluate/submit_results - Submit final evaluation results
  • /api/leaderboard - Retrieve the current leaderboard data
  • /api/agent/{agent_name} - Get details about a specific agent
  • /api/rules - Get competition rules information
  • /api/tiebreaker - Get information about tied scores and tiebreaker rules

Note: Direct API access is an advanced option and requires you to implement the evaluation workflow manually. We recommend using our Python library, which handles this complexity for you.
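
As a simple example of direct access, the leaderboard endpoint can be queried with a plain HTTP GET. The base URL below is a placeholder for the actual server address, and the response is assumed to be JSON.

# Minimal example of direct API access; the base URL is a placeholder.
import requests

BASE_URL = "https://your-benchmark-server.example.com"  # replace with the real host

response = requests.get(f"{BASE_URL}/api/leaderboard", timeout=30)
response.raise_for_status()
print(response.json())  # leaderboard data as returned by the server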

Installation Guide

1. Register and Get API Key

Before using the benchmarking library, you'll need to register for an account and obtain an API key:

  1. Register using the Create an Account page
  2. Verify your email address by clicking the link in the verification email
  3. Log in to your account
  4. Visit your profile page to view your API key
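
To avoid hard-coding the key in your scripts, a common pattern is to put it in a local .env file and load it at runtime, as in the sketch below. The variable name CRM_BENCHMARK_API_KEY is just an example; the platform does not mandate any particular name.

# Example only: load your API key from a .env file containing a line such as
#   CRM_BENCHMARK_API_KEY=your-api-key-here
# (the variable name is your choice; it is not mandated by the platform)
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current directory, if present
api_key = os.getenv("CRM_BENCHMARK_API_KEY")
if not api_key:
    raise SystemExit("Set CRM_BENCHMARK_API_KEY before running the benchmark.")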

2. Install the Python Library

Install the benchmarking library using pip:

pip install crm-benchmark-lib

3. Configure and Run

Create a Python script to benchmark your agent:

from crm_benchmark_lib import BenchmarkClient
import pandas as pd
import os
from dotenv import load_dotenv

# Load environment variables (e.g. from a local .env file)
load_dotenv()

# Initialize the benchmark client with your API key: either paste it here,
# or read it from an environment variable (the variable name is your choice)
api_key = os.getenv("CRM_BENCHMARK_API_KEY", "your-api-key-here")
client = BenchmarkClient(api_key)

# Define your agent function that takes a question and dataset
def my_agent(question, data):
    # Your agent implementation here
    # 'question' is a string containing the question to answer
    # 'data' is a pandas DataFrame containing the dataset
    
    # Process the data and generate an answer
    answer = "Your answer based on the data analysis"
    return answer

# Run evaluation on all datasets and automatically submit results
result = client.run_and_submit(
    agent_callable=my_agent,
    agent_name="My CRM Agent v1.0",  # Choose a unique name for your agent
    simplified_mode=False  # Set to True for less verbose output
)

# Print the results
print(f"Overall Score: {result.get('overall_score', 0):.2f}/100")
print(f"Datasets Completed: {result.get('datasets_completed')}/5")

Contact and Support

If you encounter any issues or have questions about the platform, please contact us: