AIAgentChallenge: Official Competition Rules
Highest Average Score
Participants are ranked by the average score their AI Agent achieves across all provided dataset questions; those with the highest averages are deemed the top performers (first place, top three, top five, as applicable).
Tiebreaker
If two or more participants tie on average score, the participant whose AI Agent completed its top-scoring run in the shortest overall time takes precedence in the final standings.
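For illustration, here is a minimal sketch of this ranking logic, assuming a simple per-submission record; the `Submission` fields and the `rank` function are hypothetical, not the official scoring pipeline:

```python
from dataclasses import dataclass

@dataclass
class Submission:
    participant: str
    average_score: float      # mean score across all dataset questions
    run_time_seconds: float   # overall time of the top-scoring run

def rank(submissions: list[Submission]) -> list[Submission]:
    # Primary key: highest average score; tiebreaker: shortest run time.
    return sorted(submissions, key=lambda s: (-s.average_score, s.run_time_seconds))
```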
Prohibition on Hardcoded Answers
The submitted AI Agent code or prompts must not contain pre-computed or manually embedded answers to any challenge questions. The AI Agent must derive all answers solely from the provided datasets.
Additional Eligibility Requirements
Verification Process
Top performers may be subject to a verification process to ensure their AI Agent complies with all competition rules. This may include a code review and demonstration of the AI Agent's functionality.
Submission Deadline
All submissions must be received by the specified deadline. Late submissions may be included on the leaderboard but will not be eligible for prizes.
Code Authenticity
Participants must be the original authors of their submitted code or properly attribute and license any third-party code used in their submission.
Rule Changes and Interpretations
Rule Updates
The competition organizers reserve the right to update or clarify these rules during the competition. Any changes will be announced on this page and communicated to all registered participants.
Final Interpretation
The competition organizers' interpretation of these rules is final in the event of any disputes or questions regarding eligibility, scoring, or prize distribution.
Enhanced Evaluation Method
To ensure more robust and fair evaluation, we've enhanced our testing methodology:
Multi-Instance Testing
For each of the 5 dataset categories, we maintain multiple dataset instances with the same patterns but different specific data.
- During evaluation, the system randomly selects 3 dataset instances for each category
- Your agent is evaluated on all 3 instances separately
- Scores are averaged across the 3 instances to determine the final score for each category (see the sketch after this list)
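As a rough illustration, here is a minimal sketch of the selection-and-averaging flow, assuming a pool of candidate instances per category; the names (`CATEGORIES`, `instance_pool`, `score_agent_on_instance`) are hypothetical and do not reflect the actual evaluation harness:

```python
import random
from statistics import mean

CATEGORIES = ["category_1", "category_2", "category_3", "category_4", "category_5"]
INSTANCES_PER_CATEGORY = 3  # instances randomly selected per category

def score_agent_on_instance(agent, instance) -> float:
    """Placeholder for the real (unpublished) per-instance scoring call."""
    raise NotImplementedError

def evaluate(agent, instance_pool: dict[str, list]) -> dict[str, float]:
    """Return each category's final score: the mean over 3 randomly chosen instances."""
    results = {}
    for category in CATEGORIES:
        # Randomly select 3 dataset instances (same patterns, different specific data).
        selected = random.sample(instance_pool[category], INSTANCES_PER_CATEGORY)
        # Evaluate the agent on each instance separately, then average the scores.
        results[category] = mean(score_agent_on_instance(agent, inst) for inst in selected)
    return results
```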
This multi-instance testing approach provides several benefits:
- More robust evaluation that tests generalization ability
- Protection against overfitting to specific dataset examples
- Reduced impact of random variation in performance
- More reliable and fair rankings