EXACT 2026

The 2nd International XAI Challenge for Transparent Educational Question-Answering

IEEE IJCNN 2026 Competition

Present and publish your work at CSoNet 2026



Introduction

Large Language Models (LLMs) have demonstrated impressive abilities in question-answering systems, including those applied in educational contexts. However, these models often generate only brief, single-answer responses without providing any reasoning or explanatory detail. In educational environments, this lack of transparency poses significant challenges:

Quality of Answers

When an answer is incorrect, we cannot trace or understand the source of the error. When it is correct, we have no visibility into how it was derived or whether it was properly verified.

Complex Reasoning

Educational domains often involve intricate rules, policies, and multi-step logical reasoning, which can be challenging for purely data-driven LLMs to handle reliably.

One promising direction within Explainable AI (XAI) is Symbolic Reasoning, where a Symbolic Engine, either standalone or integrated with an LLM, makes the reasoning process explicit. However, many other approaches can also achieve transparency. This challenge invites any method that enhances both the accuracy and interpretability of educational Question Answering (QA) systems, making them more suitable for verifiable use in learning environments.

Example Queries

The following examples illustrate the types of educational queries this challenge addresses, along with expected transparent responses:

Query: This semester, I scored 8 points on the final exam for the DSA course. However, I was absent for the lab exam. Can I still get a B in this course?
Expected Response: No. Because you missed the lab exam, you received a score of 0 for lab work. According to Regulation #13 of X University, a student with 0 lab points cannot pass the course.

Query: Calculate the equivalent resistance of the following circuit, given that each resistor has a resistance of r.
Expected Response: Since the resistors are connected together at both ends, the circuit can be redrawn to show that the three resistors are connected in parallel. Therefore, the equivalent resistance is: R = r / 3
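
The second response follows from the parallel-combination rule 1/R = 1/r + 1/r + 1/r = 3/r. Below is a quick numeric sanity check of that arithmetic (our own illustration, not part of the challenge materials):

def parallel(*resistances: float) -> float:
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)

r = 6.0  # ohms; arbitrary test value
print(parallel(r, r, r), r / 3)  # both print 2.0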

Challenge Objectives

The primary goal of this challenge is to build educational QA systems that not only produce accurate answers but also provide clear, verifiable reasoning for how those answers were derived. Specifically, we seek to:

  • Encourage (but not require) the use of symbolic reasoning tools such as Z3, custom solvers, or other logic-based engines alongside LLMs
  • Extend XAI research into STEM domains such as physics (electric circuits)
  • Provide benchmark datasets and evaluation frameworks to support future developments in explainable AI for education

What You'll Build

Participating teams will develop systems that:

  • Provide correct final answers to educational queries
  • Generate natural language explanations that justify each answer
  • Optionally provide additional supporting evidence, such as First-Order Logic (FOL) derivations, Chain-of-Thought (CoT) reasoning, premise lists, or other structured proofs, to strengthen the system's reasoning depth
  • Use any approach, including symbolic reasoning, neurosymbolic methods, fine-tuned LLMs, or any combination, as long as the system can explain how it arrived at each answer

Challenge Chairs

The quality and fairness of this challenge are ensured by an international committee of professors and experts.

Competition Rules

Who Can Participate

This competition is open to everyone: high school students, university students, working professionals, and researchers worldwide. There is no restriction on age, nationality, or affiliation, except that members of the URA Research Group (the organizing team) are not eligible to participate.

Rules

All participating teams must adhere to the following rules throughout the competition:

DO

Provide Explainable Answers

Every generated answer must be accompanied by a natural language explanation that justifies how the answer was derived. The explanation should be concise, interpretable, and verifiable.

Encouraged: Use a Symbolic Engine

Teams are encouraged to incorporate symbolic reasoning (e.g., Z3 Solver or a custom-built engine) to verify and explain answers. However, this is not mandatory. Any approach that produces explainable results is accepted.

Use Open-Source LLMs

All LLMs used in the system must be open-source and have 8 billion parameters or fewer. This applies to any LLM component, whether used for answer generation, reasoning, or Natural Language to Logic conversion.

DO NOT

Use Closed-Source Models

The use of commercial or closed-source LLMs (e.g., GPT, Claude, Gemini) is strictly prohibited. Submissions that rely on closed-source models will be disqualified.

Hide External Data Sources

All external datasets used for fine-tuning LLMs or Symbolic Engines must be fully disclosed. Failure to disclose external data usage will result in disqualification.

Datasets

The official datasets will be released at the kick-off workshop. Two dataset types will be provided, covering logical reasoning in educational regulations and physics problem-solving. The input provided to each team's system depends on the dataset type (see details below). All other fields (FOL, CoT, explanations, etc.) shown in the samples are reference annotations provided in the training data only, which teams can use as templates for building their own reasoning pipelines.

Dataset Type 1: Logic-Based Educational Queries

This dataset contains 464 records with a total of 913 questions designed to evaluate logical reasoning in educational contexts. Topics cover university regulations such as grading policies, course enrollment rules, scholarship criteria, and academic requirements. Question types include Multiple Choice, Yes/No/Uncertain, and open-ended queries. Each record includes a set of premises in both natural language and FOL, along with derived questions, ground-truth answers, and human-written explanations. During evaluation, the system receives the question together with the natural language premises (premises-NL) as input. Teams are free to use the premises in any way (e.g., as prompt context, for FOL conversion, etc.).

{
  "premises-NL": [
    "If a curriculum is well-structured and has exercises, it enhances student engagement.",
    "If a curriculum enhances student engagement and provides access to advanced resources, it enhances critical thinking.",
    "If a faculty prioritizes pedagogical training and curriculum development, the curriculum is well-structured.",
    "The faculty prioritizes pedagogical training and curriculum development.",
    "The curriculum has practical exercises.",
    "The curriculum provides access to advanced resources."
  ],
  "premises-FOL": [
    "ForAll(c, (well_structured(c) ∧ has_exercises(c)) → enhances_engagement(c))",
    "ForAll(c, (enhances_engagement(c) ∧ advanced_resources(c)) → enhances_critical_thinking(c))",
    "..."
  ],
  "questions": [
    "Based on the premises, what can we conclude about the curriculum?\nA. It enhances student engagement but not critical thinking\nB. It enhances critical thinking\nC. It needs more resources to enhance critical thinking\nD. It is well-structured but lacks exercises",
    "Does the combination of faculty priorities and curriculum features lead to enhanced critical thinking?"
  ],
  "answers": ["B", "Yes"],
  "explanation": [
    "Premise 4 and premise 3 confirm the curriculum is well-structured. Premise 5 provides exercises, so premise 1 implies enhanced engagement. Premise 6 adds advanced resources, and premise 2 confirms enhanced critical thinking, supporting option B.",
    "Faculty priorities satisfy premise 3, making the curriculum well-structured. Exercises (premise 5) and premise 1 lead to enhanced engagement, and with advanced resources (premise 6), premise 2 confirms enhanced critical thinking."
  ]
}
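
To make the encouraged symbolic route concrete, here is a minimal sketch (our own illustration, not official starter code) that encodes this sample record with the Z3 solver and checks that the ground-truth conclusion is entailed. The quantified premises are specialized to the single curriculum in question, so each predicate becomes a propositional variable:

from z3 import And, Bool, Implies, Not, Solver, unsat

faculty_priorities = Bool("faculty_priorities")
well_structured = Bool("well_structured")
has_exercises = Bool("has_exercises")
advanced_resources = Bool("advanced_resources")
enhances_engagement = Bool("enhances_engagement")
enhances_critical_thinking = Bool("enhances_critical_thinking")

premises = [
    Implies(And(well_structured, has_exercises), enhances_engagement),  # premise 1
    Implies(And(enhances_engagement, advanced_resources),
            enhances_critical_thinking),                                # premise 2
    Implies(faculty_priorities, well_structured),                       # premise 3
    faculty_priorities,                                                 # premise 4
    has_exercises,                                                      # premise 5
    advanced_resources,                                                 # premise 6
]

# Entailment check: the premises plus the negated conclusion must be unsatisfiable.
solver = Solver()
solver.add(*premises)
solver.add(Not(enhances_critical_thinking))
assert solver.check() == unsat  # "enhances critical thinking" (answer B / Yes) follows

The unsat result mirrors the human-written explanation: premises 3 and 4 establish that the curriculum is well-structured, premises 1 and 5 yield enhanced engagement, and premises 2 and 6 close the argument.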

Dataset Type 2: Physics Problems

This dataset contains 5,520 text-based physics problems focusing on electric circuits and electrostatics. Topics include resistance, voltage, current, power, capacitance, electric fields, and energy calculations. Questions are numerical, requiring multi-step computation. Each problem comes with step-by-step CoT reasoning and a final numerical answer with its unit. During evaluation, the system receives only the question as input. The source materials (textbooks, knowledge references) used to construct this dataset will be announced at the kick-off workshop.

{
  "id": "TD401",
  "question": "Calculate the energy stored in capacitor C when C = 100 μF and U = 30 V.",
  "cot": "Step 1: Identify the given values for capacitance (C) and voltage (U).\nStep 2: Recall the formula for energy: E = 0.5 * C * U^2.\nStep 3: Convert capacitance to Farads: C = 100 μF = 1 × 10^-4 F.\nStep 4: Substitute: E = 0.5 × (1 × 10^-4) × (30)^2.",
  "answer": "45",
  "unit": "J"
}
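
A quick numeric check of this sample (our own illustration; the helper function is ours, not part of the dataset):

def capacitor_energy(capacitance_f: float, voltage_v: float) -> float:
    """Energy in joules: E = 0.5 * C * U^2 (C in farads, U in volts)."""
    return 0.5 * capacitance_f * voltage_v ** 2

energy = capacitor_energy(100e-6, 30.0)
print(f"{energy} J = {energy * 1e3} mJ")  # 0.045 J = 45.0 mJ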

Evaluation Criteria

Submissions are assessed across three dimensions: correctness, explanation quality, and reasoning depth:

  • P1 (Correctness of Answers): generating accurate and precise answers for the given queries.
  • P2 (Quality of Explanation): providing a clear, coherent natural language explanation that justifies the answer.
  • P3 (Depth of Reasoning): demonstrating strong reasoning capabilities through additional supporting evidence, such as FOL derivations, CoT steps, premise identification, or other structured proofs.

Test Format

The official test set will combine both dataset types into a single unified set. For Type 1 queries, the system receives the question along with natural language premises; for Type 2 queries, the system receives the question only. Questions will include multiple-choice, Yes/No/Uncertain, open-ended reasoning, and numerical computation problems. The topic distribution (percentage of each dataset type) will be announced at the kick-off workshop.

Evaluation Process

  • Phase 1 & 2 (Selection): Submissions are first scored automatically against ground-truth answers, then reviewed by the organizing committee for explanation quality.
  • Final Round: Top teams run their systems live on unseen queries. The Challenge Chairs evaluate each team's answers, explanations, and reasoning depth in real time. Teams demonstrating stronger reasoning capabilities will be ranked higher.
  • Final Score: The final score will be computed as a weighted combination of P1, P2, and P3. Specific weights will be published with the official dataset release.
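
For intuition, such a combination can be pictured as a simple weighted sum; the weights in this sketch are placeholders of our own, not the official values:

def final_score(p1: float, p2: float, p3: float,
                weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted combination of the three criteria; scores assumed in [0, 1].

    Placeholder weights only: official values will be published with the
    dataset release.
    """
    w1, w2, w3 = weights
    return w1 * p1 + w2 * p2 + w3 * p3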

Submission Requirements

Each team must submit an API endpoint along with a brief solution description (one page) detailing their approach, the models used, and any datasets used for training. For each query, the API must return the required fields below. Teams are encouraged to also include the optional fields, which demonstrate the depth of the system's reasoning: submissions with richer supporting evidence have an advantage in the evaluation, particularly in the final round, where the Challenge Chairs assess reasoning depth live.

{
  // Required (Mandatory)
  "answer": "B",
  "explanation": "The voltage across R2 is calculated using ...",

  // Optional (Encouraged)
  "fol": "∀x (Resistor(x) → HasVoltage(x, V))",
  "cot": [
    "Step 1: Identify the circuit topology ...",
    "Step 2: Apply Kirchhoff's voltage law ...",
    "Step 3: Solve for the unknown voltage ..."
  ],
  "premises": [
    "Ohm's law: V = IR",
    "KVL: sum of voltages in a loop = 0"
  ],
  "confidence": 0.92
}

Note: answer and explanation are mandatory for every submission. All other fields (fol, cot, premises, confidence) are optional but encouraged, as they contribute to higher scores in the reasoning depth evaluation. The exact submission format will be finalized at the kick-off workshop.
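
For orientation, a compliant endpoint might look like the sketch below, assuming a FastAPI server; the route name, request schema, and solve() placeholder are our own assumptions, since the official format will only be fixed at the kick-off workshop.

from typing import List, Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str
    premises_nl: Optional[List[str]] = None  # supplied for Type 1 queries only

def solve(query: Query) -> dict:
    # Placeholder for the team's pipeline (open-source LLM plus optional symbolic engine).
    return {
        "answer": "B",                                   # required
        "explanation": "Derived from premises 1-6 ...",  # required
        "cot": ["Step 1: ...", "Step 2: ..."],           # optional
        "confidence": 0.92,                              # optional
    }

@app.post("/answer")
def answer(query: Query) -> dict:
    return solve(query)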

Timeline

The competition is structured into the following phases.

1. Team Registration Period (Apr 10 – May 10, 2026)
Teams of up to 5 members register through the official Google Form. Individual participants may request to be matched with a team by the organizers.

2. Kick-off & Training Dataset Release (May 4 – May 9, 2026)
The competition is officially launched with an online workshop. Both training datasets are released alongside submission guidelines, evaluation criteria, and the required API format.

3. Main Competition Phase (May 5 – May 30, 2026)
Teams develop their explainable QA systems using the released training data. Each submission consists of an API endpoint and a one-page solution description. Multiple submissions are allowed; only the latest is evaluated.

4. Phase 1 Evaluation Results (Jun 1 – Jun 2, 2026)
Submissions are automatically scored against ground-truth answers for correctness and explanation quality. Each team receives detailed scores and feedback from the organizing committee.

5. Model Refinement Period (Jun 3 – Jun 4, 2026)
Based on Phase 1 feedback, teams may refine their systems and re-submit updated API endpoints. This is the final opportunity to improve models before the second evaluation round.

6. Phase 2 Evaluation Results (Jun 5 – Jun 7, 2026)
Refined submissions undergo a second evaluation covering correctness, explanation quality, and reasoning depth. Results from both rounds are combined to produce the preliminary rankings.

7. Final Ranking Announcement (Jun 10, 2026)
Preliminary rankings from Phase 1 and Phase 2 are published. The top 10 teams are announced and invited to participate in the Public Test Day.

8. Public Test Day, Solution Presentations & Final Result Release (Jun 15, 2026)
The top 10 teams run their systems live on unseen queries before the Challenge Chairs. Each team presents their approach and reasoning pipeline while the Chairs assess answer quality, explanations, and reasoning depth in real time. Final official results are released at the end of the session.

9. Paper Submission for Top 10 Teams (Jun 30 – Jul 15, 2026)
The top 10 teams are invited to submit a paper describing their methodology, system design, and experimental results to our Special Session at The 15th International Conference on Computational Science and Network Intelligence (CSoNet 2026).

10. On-site Presentation at CSoNet (Nov 16 – 18, 2026)
Selected teams present their work at CSoNet 2026 in Vietnam. The top 5 teams receive cash prizes at this event.

Prizes and Awards

Outstanding teams will be recognized through a combination of cash prizes, publication opportunities, and certificates:

  • Top 5 teams: Cash prize and invitation to present at CSoNet 2026 in Vietnam.
  • Top 10 teams: Invited to submit a paper to our Special Session "Explainable AI for Educational Question-Answering" at The 15th International Conference on Computational Science and Network Intelligence (CSoNet 2026).
  • All teams with valid submissions through Phase 1 and Phase 2: Official certificate of participation issued by the Challenge Chairs and the conference organizers.

Organizers

EXACT 2026 is organized by the URA Research Group at Ho Chi Minh City University of Technology (HCMUT), Vietnam, in collaboration with the University of Naples Parthenope, Italy.

Senior Organizers

  • Prof. Angelo Ciaramella — University of Naples Parthenope, Italy
  • Mr. Nguyen Song Thien Long — Ho Chi Minh City University of Technology (HCMUT), Vietnam

Organizers

  • Ms. Le Thi Xinh — Ho Chi Minh City University of Technology (HCMUT), Vietnam
  • Ms. Vo Thi Nhu Quynh — Ho Chi Minh City University of Technology (HCMUT), Vietnam
  • Mr. Nguyen Huu Nam Phong — Ho Chi Minh City University of Technology (HCMUT), Vietnam
  • Ms. Tran Huynh Mai Thao — Ho Chi Minh City University of Technology (HCMUT), Vietnam

Previous Edition

The first edition of this challenge was held at the International Workshop on Trustworthiness and Reliability in Neuro-Symbolic AI, co-located with IEEE IJCNN 2025. The event attracted 107 participants across 30 teams from multiple countries and produced the first public benchmark for explainable academic-regulation QA.

  • Website: https://sites.google.com/view/trns-ai/challenge
  • Our Findings: Long S. T. Nguyen*, Khang H. N. Vo*, Thu H. A. Nguyen*, Tuan C. Bui, Duc Q. Nguyen, Thanh-Tung Tran, Anh D. Nguyen, Minh L. Nguyen, Fabien Baldacci, Thang H. Bui, Emanuel Di Nardo, Angelo Ciaramella, Son H. Le, Ihsan Ullah, Lorenzo Di Rocco, and Tho T. Quan. Bridging LLMs and Symbolic Reasoning in Educational QA Systems: Insights from the XAI Challenge at IJCNN 2025. Italian Conference on Big Data and Data Science (ITADATA), 2025.

Building on these results, EXACT 2026 broadens the scope from educational regulations into STEM reasoning and introduces structured evaluation criteria for explanations.

Contact

For any inquiries about the competition, team registration, or dataset access, please reach out to us at ura.hcmut@gmail.com.

Once registration is finalized, the organizers will set up a communication platform (e.g., Discord) to facilitate discussions with the teams.