AIAgentChallenge: Official Competition Rules

Highest Average Score

Participants whose AI Agents achieve the highest average scores across all provided dataset questions will be ranked as the top performers (e.g., first place, top three, or top five, as applicable).

Tiebreaker

In the event of a tie in average scores, the participant whose AI Agent completes its top-scoring run in the shortest overall time will take precedence in the final standings.
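
To make the ranking and tiebreaker rules concrete, here is a minimal sketch of how final standings could be computed. The data structure, field names, and example values below are hypothetical illustrations, not the official scoring pipeline.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    participant: str
    average_score: float     # mean score across all dataset questions
    best_run_seconds: float  # overall time of the participant's top-scoring run

def rank(entries: list[Entry]) -> list[Entry]:
    # Higher average score ranks first; on a tie, the shorter overall
    # time of the top-scoring run takes precedence.
    return sorted(entries, key=lambda e: (-e.average_score, e.best_run_seconds))

standings = rank([
    Entry("alice", 0.92, 310.0),
    Entry("bob",   0.92, 275.0),  # ties alice on score, faster run -> ranked higher
    Entry("carol", 0.88, 120.0),
])
print([e.participant for e in standings])  # ['bob', 'alice', 'carol']
```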

Prohibition on Hardcoded Answers

The submitted AI Agent code or prompts must not contain pre-computed or manually embedded answers to any challenge questions. The AI Agent must derive all answers solely from the provided datasets.

Additional Eligibility Requirements

Verification Process

Top performers may be subject to a verification process to ensure their AI Agent complies with all competition rules. This may include a code review and demonstration of the AI Agent's functionality.

Submission Deadline

All submissions must be received by the specified deadline. Late submissions may be included on the leaderboard but will not be eligible for prizes.

Code Authenticity

Participants must be the original authors of their submitted code or properly attribute and license any third-party code used in their submission.

Rule Changes and Interpretations

Rule Updates

The competition organizers reserve the right to update or clarify these rules during the competition. Any changes will be announced on this page and communicated to all registered participants.

Final Interpretation

The competition organizers' interpretation of these rules is final in the event of any disputes or questions regarding eligibility, scoring, or prize distribution.

Enhanced Evaluation Method

To ensure more robust and fair evaluation, we've enhanced our testing methodology:

Multi-Instance Testing

For each of the 5 dataset categories, we maintain multiple dataset instances with the same patterns but different specific data.

  • During evaluation, the system randomly selects 3 dataset instances for each category
  • Your agent is evaluated on all 3 instances separately
  • Scores are averaged across the 3 instances to determine the final score for each dataset category (see the sketch below)
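
The per-category averaging can be sketched in a few lines. This is a hypothetical illustration only: score_fn, the instance pools, and the overall aggregation across categories are assumptions standing in for the real evaluation harness, which is not specified by these rules.

```python
import random
from statistics import mean

INSTANCES_PER_CATEGORY = 3

def evaluate_category(agent, instance_pool, score_fn, rng=random):
    # Randomly select 3 instances of this category, score the agent on
    # each instance separately, then average to get the category's final score.
    chosen = rng.sample(instance_pool, INSTANCES_PER_CATEGORY)
    return mean(score_fn(agent, instance) for instance in chosen)

def overall_score(agent, category_pools, score_fn):
    # One plausible aggregation (an assumption, not stated in the rules):
    # average the per-category scores across all 5 dataset categories.
    return mean(evaluate_category(agent, pool, score_fn)
                for pool in category_pools.values())
```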

This multi-instance testing approach provides several benefits:

  • More robust evaluation that tests generalization ability
  • Less overfitting to specific dataset examples
  • Reduced impact of random variations in performance
  • More reliable and fair rankings