Public Falsification Record

Every claim killed by the three-gate verification pipeline: mathematical counterexample (Gate 1), sealed-sandbox reproduction failure (Gate 2), or adversarial red-team attack (Gate 3). Total: 399.

294 Gate 2/3 pipeline kills · 105 math counterexample kills

Gate 2 / Gate 3 Pipeline Falsifications (294)

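Claims VERIFIED at Gate 2 (sealed-sandbox repro) and subsequently falsified by the Gate 3 adversarial red team (three independent LLM attackers, inverted scoring: a higher score means a more damaging attack). A claim SURVIVES only if all three attackers fail to find a fatal flaw, meaning the average attack score is below 3.5 and no individual score reaches 5.0. Rows marked Gate 2 were killed at sealed-sandbox reproduction and carry no attack scores (shown as -).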
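As a minimal sketch of the survival rule stated above, the following Python pulls the three attacker scores out of a row's Goal / Claim cell and applies both thresholds. The function names, the regular expression, and the choice to compare the unrounded mean against 3.5 are assumptions made for illustration; they are not taken from the pipeline itself.

```python
import re
from statistics import mean

# Thresholds as stated in the survival rule (names are hypothetical).
AVG_THRESHOLD = 3.5      # survive only if the mean attack score is below this
INDIVIDUAL_LIMIT = 5.0   # survive only if every attacker scores below this

# Matches fragments such as "CITATION AUDITOR: 7.5/10" in a table cell.
SCORE_PATTERN = re.compile(r"([A-Z][A-Z ]+?):\s*(\d+(?:\.\d+)?)/10")


def parse_attack_scores(cell):
    """Return {attacker name: score} for every 'NAME: x.y/10' fragment in the cell."""
    return {name.strip(): float(score) for name, score in SCORE_PATTERN.findall(cell)}


def survives(scores):
    """Apply the rule: mean below 3.5 and no individual score at or above 5.0."""
    scores = list(scores)
    return mean(scores) < AVG_THRESHOLD and max(scores) < INDIVIDUAL_LIMIT


if __name__ == "__main__":
    cell = (
        "COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 "
        "· REPLICATION ATTACKER: 9.2/10"
    )
    parsed = parse_attack_scores(cell)
    print(parsed)
    print(round(mean(parsed.values()), 1), survives(parsed.values()))
```

Applied to the first row of the table (attacker scores 9.0, 7.5, and 9.2), the mean rounds to 8.6 and the claim does not survive, which matches the recorded kill. Note that a single attacker score of 5.0 or more is enough to kill a claim even when the average stays below 3.5.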

Task ID	Gate	Claim type	Goal / Claim (truncated) · Attacker scores	Avg attack	Killed (UTC)
760fbc69-65b1-4f…	Gate 3	formula_repro	How does the scaling of model size affect the performance gain from English intermediate-task training in zero-shot cross-lingual transfer, … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.2/10	8.6/10	2026-06-21 01:19
9a6f19a5-f2e6-4c…	Gate 3	formula_repro	To what extent does English intermediate-task training improve cross-lingual reasoning capabilities on multilingual benchmarks compared to d… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.2/10	6.9/10	2026-06-21 01:19
5571973f-50bb-48…	Gate 3	formula_repro	Does intermediate-task training on domain-specific multilingual datasets improve robustness to domain shift in zero-shot cross-lingual trans… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 9.2/10	8.2/10	2026-06-21 01:19
a80c506e-d2f8-4a…	Gate 3	formula_repro	What is the impact of scaling the number of intermediate language-understanding tasks on zero-shot cross-lingual transfer performance for lo… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10	9.1/10	2026-06-21 01:18
2290ba9d-c1c0-46…	Gate 3	formula_repro	What is the impact of intermediate-task training on low-resource languages in the XTREME benchmark when using models pretrained on both Engl… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 9.5/10	6.2/10	2026-06-21 01:18
2af78e0e-7fba-4d…	Gate 3	formula_repro	How does the effectiveness of English intermediate-task training for zero-shot cross-lingual transfer compare to multilingual intermediate-t… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10	9.1/10	2026-06-21 01:18
8544463f-7e19-46…	Gate 3	formula_repro	How does the performance of multilingual intermediate-task training on low-resource languages compare to English intermediate tasks when eva… COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10	4.9/10	2026-06-21 01:18
bd43d6f2-a6c8-44…	Gate 3	formula_repro	How does the performance of intermediate-task training sequences compare to continuous pretraining on a multilingual corpus in zero-shot cro… COUNTEREXAMPLE HUNTER: 6.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 2.5/10	6.1/10	2026-06-21 01:18
345ee2d0-d119-47…	Gate 3	formula_repro	Does multi-task intermediate training on diverse English NLU tasks improve robustness against typological divergence more effectively than s… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 6.5/10	8.0/10	2026-06-20 19:17
13542541-aac3-44…	Gate 3	formula_repro	What is the impact of English intermediate-task training on the alignment stability of multilingual encoders when evaluated on adversarial p… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-20 19:17
d7a818b5-ede1-48…	Gate 3	formula_repro	Does the performance gain from English intermediate-task training on XTREME scale with increasing pretraining model size across diverse low-… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10	8.9/10	2026-06-20 19:17
7f4a6491-9864-42…	Gate 3	formula_repro	How does English intermediate-task training affect zero-shot cross-lingual robustness on XTREME tasks with synthetic code-switching noise co… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10	9.1/10	2026-06-20 19:17
aa533e6e-e763-4c…	Gate 3	formula_repro	How does intermediate-task training on English reasoning datasets affect zero-shot cross-lingual performance on the XCOPA and XNLI subsets o… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 9.2/10	6.1/10	2026-06-20 19:17
48a8684b-3452-40…	Gate 3	formula_repro	Does the order of intermediate-task fine-tuning (sequential vs. concurrent) influence the robustness of multilingual alignment in zero-shot … COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.1/10	2026-06-20 19:16
a76c6aed-2013-42…	Gate 3	formula_repro	How does the choice of English intermediate-task difficulty (e.g., low vs. high complexity) affect zero-shot cross-lingual transfer performa… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 9.2/10	6.7/10	2026-06-20 19:16
62dad032-b551-46…	Gate 3	formula_repro	How does the choice of intermediate task complexity (e.g., easy vs. hard language understanding tasks) affect zero-shot cross-lingual transf… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 3.2/10 · REPLICATION ATTACKER: 9.0/10	7.1/10	2026-06-20 19:15
4ffd234e-3144-46…	Gate 3	formula_repro	Does multilingual intermediate-task training on XTREME-R outperform monolingual English training in few-shot cross-lingual transfer across l… COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10	6.1/10	2026-06-20 19:15
5f20fd9d-3000-49…	Gate 2	unknown	Does the integration of synthetic code-switched data improve the robustness of zero-shot cross-lingual retrieval models against adversarial …	-	2026-06-20 16:59
c29d0b1d-1c55-45…	Gate 2	unknown	How does hybrid batch training for monolingual and cross-lingual objectives impact zero-shot retrieval accuracy on the BEIR benchmark compar…	-	2026-06-20 16:57
a10e30bc-cc3b-42…	Gate 3	formula_repro	How does training on artificially code-switched data affect the robustness of zero-shot cross-lingual retrieval models across low-resource l… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 3.5/10	6.8/10	2026-06-20 13:15
88b977da-978a-4b…	Gate 3	formula_repro	How does the granularity of bilingual lexicons (e.g., word-level vs. phrase-level) impact the effectiveness of artificially code-switched tr… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10	5.8/10	2026-06-20 13:15
9293eb3f-c631-43…	Gate 3	formula_repro	What is the impact of artificially code-switched training data on the robustness of cross-lingual retrieval models evaluated on the PAWS-X d… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 4.2/10 · REPLICATION ATTACKER: 7.5/10	6.7/10	2026-06-20 13:15
108e3723-cd80-4b…	Gate 3	formula_repro	How does the quality of bilingual lexicons impact the performance of zero-shot cross-lingual retrieval models on the BEIR benchmark when eva… COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 6.5/10	7.4/10	2026-06-20 13:14
4cb997d4-121c-4d…	Gate 3	formula_repro	Does training on artificially code-switched data improve zero-shot cross-lingual retrieval recall on the MIRACL benchmark compared to monoli… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10	7.2/10	2026-06-20 13:14
420aa4ee-b924-40…	Gate 3	formula_repro	What is the effect of increasing the amount of artificially code-switched training data on the robustness of zero-shot cross-lingual retriev… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10	6.5/10	2026-06-20 13:14
bb51b78d-f1e5-48…	Gate 3	formula_repro	How does the transfer learning performance of self-supervised speech models pre-trained on Flemish Dutch compare to other low-resource langu… COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10	5.3/10	2026-06-20 07:14
1249d225-3960-40…	Gate 3	formula_repro	How does the noise level in automatically induced bilingual lexicons affect the nDCG@10 and MAP scores of zero-shot cross-lingual retrievers… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10	6.2/10	2026-06-19 19:12
2e7a84f4-8bf1-4d…	Gate 3	formula_repro	To what extent does English intermediate-task training enhance zero-shot reasoning capabilities on multilingual benchmarks like XTREME-R for… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10	7.8/10	2026-06-19 19:12
d41c927e-b869-40…	Gate 3	formula_repro	How does intermediate-task training on non-English source languages compare to English-only intermediate training for zero-shot cross-lingua… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-19 19:12
44a960a9-7f95-41…	Gate 3	formula_repro	How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to models fine-tuned… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10	8.7/10	2026-06-19 13:09
05adb989-b8ea-44…	Gate 3	formula_repro	Does training on artificially code-switched datasets improve the robustness of zero-shot cross-lingual retrievers against query-document lan… COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10	5.7/10	2026-06-19 13:06
da82e7da-f16a-41…	Gate 3	formula_repro	Does the hybrid batch strategy improve zero-shot cross-lingual retrieval robustness on the XTD benchmark compared to standard multilingual c… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-19 13:06
2bb9a5bd-57ad-44…	Gate 3	formula_repro	How does training on artificially code-switched data affect zero-shot cross-lingual performance on the XNLI benchmark compared to standard m… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 8.5/10	7.8/10	2026-06-19 13:06
3b872189-de14-45…	Gate 3	formula_repro	How does training on artificially code-switched data affect the zero-shot retrieval accuracy of multilingual dense retrievers on the Lasers … COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 6.5/10	7.5/10	2026-06-19 13:06
5ff9bb46-9d08-44…	Gate 3	formula_repro	To what extent does training on artificially code-switched data improve zero-shot cross-lingual retrieval robustness on XTREME-R when querie… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10	8.7/10	2026-06-19 13:05
6ef56955-1505-47…	Gate 3	formula_repro	How does the proportion of code-switched tokens in synthetic training data correlate with the accuracy drop of zero-shot cross-lingual ranke… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.0/10	8.8/10	2026-06-19 13:05
73b8a1d9-aee2-48…	Gate 3	formula_repro	How does training on artificially code-switched data affect the robustness of zero-shot cross-lingual rankers against adversarial noise comp… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.5/10	2026-06-19 13:03
07f2d90e-a6ad-40…	Gate 2	unknown	Does training on artificially code-switched data improve zero-shot cross-lingual retrieval performance for low-resource languages not includ…	-	2026-06-19 12:46
7507c92e-3093-47…	Gate 3	formula_repro	Does integrating CausalMixFT during fine-tuning improve the robustness of tabular foundation models against adversarial perturbations in low… COUNTEREXAMPLE HUNTER: 6.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10	5.0/10	2026-06-19 07:00
da92f3e0-4d5b-44…	Gate 3	formula_repro	How do dense RGB-D SLAM systems utilizing 3D Gaussian representations compare to neural implicit methods in terms of memory consumption and … COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10	7.5/10	2026-06-19 07:00
3d9090b2-81c7-4d…	Gate 3	formula_repro	How do vision-language models perform in cross-domain robustness evaluations when tested on perturbed multimodal benchmarks from domains lik… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 8.5/10	6.5/10	2026-06-19 07:00
0ff89691-6471-49…	Gate 3	formula_repro	How does the trade-off between model size and latency compare between OpenPangu-7B-MLA and smaller prosody-exclusive models when deployed on… COUNTEREXAMPLE HUNTER: 8.0/10 · CITATION AUDITOR: 4.5/10 · REPLICATION ATTACKER: 8.5/10	7.0/10	2026-06-19 07:00
86059cb7-4f35-4c…	Gate 3	formula_repro	What is the impact of cross-lingual transfer from English pre-trained speech models versus monolingual Flemish pre-training on phoneme recog… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10	5.3/10	2026-06-19 07:00
0cb45f8d-b6ae-45…	Gate 3	formula_repro	How does the addition of self-supervised pre-training objectives in zero-shot cross-lingual SLU models affect slot-filling accuracy on the M… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 2.5/10	6.5/10	2026-06-19 07:00
36c4669f-6437-43…	Gate 3	formula_repro	What is the effect of varying the size of the monolingual training set on the intent detection performance of zero-shot cross-lingual SLU mo… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 3.0/10	6.3/10	2026-06-19 07:00
d245e830-fc51-49…	Gate 3	formula_repro	What is the impact of varying the code-switching ratio in training data on the retrieval performance degradation when query and document lan… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10	9.0/10	2026-06-19 01:00
a5ab3b78-a7cb-48…	Gate 3	formula_repro	How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to models fine-tuned… COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 6.5/10	6.7/10	2026-06-19 01:00
5e61f043-f854-46…	Gate 3	formula_repro	What is the impact of varying the ratio of code-switched tokens in artificially generated training data on the robustness (measured by accur… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 6.5/10	7.8/10	2026-06-19 00:59
a984d373-612a-42…	Gate 3	formula_repro	How does the performance of zero-shot cross-lingual rankers trained on artificially code-switched data compare to models fine-tuned on multi… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10	7.5/10	2026-06-19 00:59
963c284c-e2a8-44…	Gate 3	formula_repro	How does increasing the proportion of code-switched tokens in the training data affect the robustness of zero-shot cross-lingual retrieval m… COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 8.5/10	5.0/10	2026-06-19 00:59
643ef90b-e1b6-4f…	Gate 3	formula_repro	How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to models fine-tuned… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 7.5/10	6.0/10	2026-06-19 00:59
05f4c33e-153e-4c…	Gate 3	formula_repro	Does scaling the multilingual pre-trained model size improve precision@k in zero-shot cross-lingual retrieval when using the proposed hybrid… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.0/10	9.2/10	2026-06-18 18:59
f988e6e1-5aab-4f…	Gate 3	formula_repro	Can scaling the hybrid batch training method to larger multilingual models (e.g., XLM-R or mT5) further enhance zero-shot cross-lingual retr… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-18 18:59
6a1d78e6-6b7d-44…	Gate 3	formula_repro	How does domain-adaptive fine-tuning of Flemish Dutch self-supervised speech models impact word error rate on CommonVoice compared to cross-… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-18 18:56
52c35878-037b-40…	Gate 3	formula_repro	What is the comparative effect of multi-task intermediate training versus single large-task training on reasoning capabilities within zero-s… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-18 18:54
9b3a269a-5094-47…	Gate 3	formula_repro	Does combining diverse intermediate tasks improve robustness in zero-shot cross-lingual transfer on XTREME-R more effectively than training … COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10	9.0/10	2026-06-18 18:53
880c758a-54df-4a…	Gate 3	formula_repro	How does hybrid batch training impact zero-shot cross-lingual retrieval accuracy on XNLI compared to monolingual fine-tuning across varying … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10	7.0/10	2026-06-18 18:53
faa8fed9-29ac-4e…	Gate 3	formula_repro	Does the synergistic hybrid batch training approach improve cross-lingual retrieval robustness on the MIRACL benchmark under domain shift co… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-18 18:51
fd7ab5fe-aab9-48…	Gate 3	formula_repro	What is the impact of hybrid batch training on the scaling behavior of zero-shot retrieval performance across varying model sizes within the… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 2.0/10	6.5/10	2026-06-18 18:51
2e00ae69-fcc4-4e…	Gate 3	formula_repro	How does hybrid batch training for simultaneous monolingual and cross-lingual retrieval impact zero-shot performance on low-resource languag… COUNTEREXAMPLE HUNTER: 8.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 4.5/10	7.3/10	2026-06-18 18:50
25c348c6-4569-45…	Gate 3	formula_repro	Does the synergistic optimization of monolingual and cross-lingual objectives in hybrid batch training improve retrieval performance on long… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-18 18:48
a9afc122-af0a-4c…	Gate 3	formula_repro	What is the impact of varying the proportion of code-switched tokens in artificially generated training data on the robustness of zero-shot … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10	8.7/10	2026-06-18 12:48
96a5be5a-20a9-4d…	Gate 3	formula_repro	Does training on artificially code-switched data improve the robustness of retrieval models against language mismatch errors in queries and … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10	9.0/10	2026-06-18 12:48
d4730df9-5abd-4e…	Gate 3	formula_repro	What is the impact of varying bilingual lexicon coverage on the zero-shot cross-lingual retrieval performance of code-switched trained model… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-18 12:48
82718235-0464-46…	Gate 3	formula_repro	How does the cross-lingual retrieval accuracy of models trained on artificially code-switched data compare to full multilingual pretraining … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.2/10	9.2/10	2026-06-18 12:48
304f71e8-cfc0-47…	Gate 3	formula_repro	How does the performance of zero-shot cross-lingual retrieval models improve when trained on artificially code-switched data generated from … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10	8.9/10	2026-06-18 12:47
5b801b53-702c-47…	Gate 3	formula_repro	How does the hybrid batch training strategy impact the zero-shot cross-lingual retrieval accuracy of larger multimodal models (e.g., PaLI, B… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-18 12:47
681c606e-0739-4d…	Gate 3	formula_repro	How does the scaling of model size (e.g., small, base, large) interact with the hybrid batch training strategy in terms of zero-shot cross-l… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-18 12:47
fed390d7-0f30-4f…	Gate 3	formula_repro	Can the hybrid batch training strategy be adapted to improve zero-shot cross-lingual retrieval performance in low-resource language settings… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-18 12:46
69ee4e5c-067c-4b…	Gate 3	formula_repro	How does the scaling of intermediate-task dataset size affect the degradation of zero-shot cross-lingual transfer performance on the XTREME … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10	9.1/10	2026-06-18 06:45
a9ba0423-52f5-40…	Gate 3	formula_repro	What is the impact of intermediate-task training on the robustness of zero-shot cross-lingual transfer to low-resource languages within the … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-18 06:45
dc84b96b-753f-46…	Gate 3	formula_repro	Does the choice of multilingual intermediate tasks (e.g., language-agnostic vs. language-specific) impact the robustness of zero-shot cross-… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-18 06:45
bff5aa00-46f0-42…	Gate 3	formula_repro	How does the performance of multilingual intermediate-task training compare to English intermediate-task training on the XTREME-R benchmark,… COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 6.5/10	5.2/10	2026-06-18 06:44
b6aec771-9353-47…	Gate 3	formula_repro	How does the hybrid batch training strategy impact zero-shot retrieval accuracy on low-resource MIRACL language pairs compared to dedicated … COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 6.5/10	5.7/10	2026-06-18 06:44
82a8b49a-ca92-48…	Gate 3	formula_repro	How does fine-tuning Flemish Dutch self-supervised speech models with domain adaptation techniques affect word error rate on the CommonVoice… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 6.5/10	7.5/10	2026-06-18 06:44
16fd9c48-0866-43…	Gate 3	formula_repro	What is the impact of structural causal model fidelity on the downstream classification accuracy of fine-tuned tabular foundation models in … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-18 06:43
27b34f6d-ce74-49…	Gate 3	formula_repro	What is the effect of domain-specific vs. general-domain code-switched data on zero-shot cross-lingual retrieval performance in multilingual… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10	5.0/10	2026-06-18 00:41
706377b3-65ea-44…	Gate 3	formula_repro	How does the robustness of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare across different lang… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10	5.0/10	2026-06-18 00:41
6724432f-b734-46…	Gate 3	formula_repro	How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to those trained on … COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10	7.2/10	2026-06-18 00:41
5eeb7591-2ce2-45…	Gate 3	formula_repro	How does the retrieval accuracy per training token of models trained on artificially code-switched data compare to full multilingual pretrai… COUNTEREXAMPLE HUNTER: 6.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10	5.5/10	2026-06-18 00:41
572586b4-d231-44…	Gate 3	formula_repro	How does the lexical coverage ratio of bilingual dictionaries used for artificial code-switching correlate with zero-shot cross-lingual retr… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.2/10 · REPLICATION ATTACKER: 6.5/10	7.1/10	2026-06-18 00:41
b539628a-b9a7-4a…	Gate 3	formula_repro	How does hybrid batch training impact zero-shot retrieval recall@10 on the MIRACL benchmark for low-resource languages compared to monolingu… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-17 18:40
8d5a627d-354d-4d…	Gate 3	formula_repro	How does the hybrid batch training strategy impact zero-shot retrieval accuracy on unseen low-resource language pairs when evaluated on the … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-17 18:39
16872a52-c7b2-4c…	Gate 3	formula_repro	How does fine-tuning on naturally occurring code-switched corpora (e.g., LINCS or NLPCC) compare to fine-tuning on artificially code-switche… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10	9.0/10	2026-06-17 18:39
a068b0e0-7ac3-4d…	Gate 3	formula_repro	Does the synergistic hybrid batch training strategy improve zero-shot cross-lingual retrieval accuracy for languages with varying typologica… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.0/10	8.8/10	2026-06-17 18:38
e864d5a7-01cf-48…	Gate 3	formula_repro	To what extent do self-supervised speech models pre-trained on Flemish Dutch generalize to low-resource dialects compared to English pre-tra… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10	8.9/10	2026-06-17 18:38
4d287f11-5746-4c…	Gate 3	formula_repro	Does fine-tuning English pre-trained speech models on limited Flemish data yield comparable robustness to noise as models pre-trained exclus… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.5/10	2026-06-17 18:36
52cce802-988b-44…	Gate 3	formula_repro	How does scaling the model size of TSDiff impact its performance on cross-domain time series forecasting benchmarks (e.g., UCR archive) comp… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-17 18:36
69e07f68-b58d-4d…	Gate 3	formula_repro	What is the impact of simultaneous monolingual and cross-lingual objective optimization on the generalization capability of multilingual enc… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-17 12:34
8aad928b-e796-47…	Gate 3	formula_repro	How does hybrid batch training affect zero-shot cross-lingual retrieval accuracy on low-resource language pairs in the MIRACL benchmark comp… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-17 12:34
96ee3e4c-90e2-47…	Gate 3	formula_repro	How does varying the ratio of monolingual to cross-lingual training examples in hybrid batches affect the performance trade-off between NQ a… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.0/10	8.8/10	2026-06-17 12:34
c1c990b6-70f0-44…	Gate 3	formula_repro	What is the impact of simultaneous monolingual, cross-lingual, and multilingual optimization on the retrieval performance of transformer mod… COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	6.3/10	2026-06-17 12:31
efef605f-428e-4d…	Gate 3	formula_repro	How does the synergistic hybrid batch training strategy compare to standard multilingual fine-tuning in terms of zero-shot cross-lingual ret… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-17 12:31
5e54cc94-1461-48…	Gate 3	formula_repro	Can integrating domain-specific monolingual data (e.g., legal, medical) into hybrid batch training improve zero-shot retrieval accuracy on X… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10	9.0/10	2026-06-17 12:28
45ea17e5-1565-49…	Gate 3	formula_repro	Can the model-agnostic nature of SafeCoDe be validated across different multimodal architectures (e.g., LLaVA, Qwen-VL) by comparing their s… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10	8.7/10	2026-06-17 12:28
d4118170-4257-4f…	Gate 3	formula_repro	How does the hybrid batch training strategy compare to language-specific adapter modules in improving zero-shot cross-lingual retrieval accu… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.0/10	9.1/10	2026-06-17 06:27
f36a2df0-3772-49…	Gate 3	formula_repro	What is the impact of varying the degree of artificial code-switching in training data on the robustness of zero-shot cross-lingual retrieva… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10	9.1/10	2026-06-17 06:27
8acc7c8a-b815-43…	Gate 3	formula_repro	Does the hybrid batch training strategy improve zero-shot cross-lingual retrieval performance on non-English language pairs in XM3600 compar… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.5/10	2026-06-17 06:26
479ce4ad-ca7a-40…	Gate 3	formula_repro	Can intermediate-task training on English reasoning datasets mitigate cross-lingual performance degradation in low-resource languages in the… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-17 06:26
cf208549-5033-41…	Gate 3	formula_repro	Does multilingual intermediate-task training improve zero-shot transfer accuracy on XTREME-R domain-specific subsets compared to English-onl… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 6.5/10	8.2/10	2026-06-17 06:26
5ad5a3a9-7cbb-4f…	Gate 3	formula_repro	What is the impact of varying the size and linguistic diversity of the English intermediate-task corpus on the degradation of zero-shot tran… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 7.0/10	8.4/10	2026-06-17 06:26
c64ed49c-d8b1-42…	Gate 3	formula_repro	How does cross-lingual query generation augmentation impact the adversarial robustness of dense retrieval models against paraphrase attacks … COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 7.5/10	8.1/10	2026-06-17 06:26
9cbec7df-52ae-46…	Gate 3	formula_repro	Does pretraining zero-shot cross-lingual retrieval models on artificially code-switched data improve robustness to language divergence in qu… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10	9.1/10	2026-06-17 00:25
791b2ca6-fdad-44…	Gate 3	formula_repro	Does the hybrid batch training strategy improve zero-shot cross-lingual retrieval robustness in low-resource language settings for multimoda… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-17 00:25
c3b1fbef-bb8a-4c…	Gate 3	formula_repro	Does the hybrid batch training strategy improve retrieval performance on the XOR benchmark compared to models optimized solely for cross-lin… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-17 00:25
1003c363-131a-43…	Gate 3	formula_repro	Does training on artificially code-switched data improve zero-shot cross-lingual retrieval performance on the MLQA benchmark compared to sta… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.2/10	6.4/10	2026-06-17 00:25
0195f05e-2ecb-43…	Gate 3	formula_repro	Does the hybrid batch training strategy proposed for information retrieval improve multimodal alignment accuracy on zero-shot cross-lingual … COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10	5.7/10	2026-06-17 00:25
54b44e18-53d0-4a…	Gate 3	formula_repro	Does intermediate-task training on English reasoning datasets improve zero-shot cross-lingual performance on the XCOPA and XNLI subsets of X… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10	5.0/10	2026-06-17 00:25
1fba2b0b-14c2-4f…	Gate 3	formula_repro	Can multilingual intermediate-task training outperform English-only intermediate training for zero-shot transfer on domain-specific subsets … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-17 00:24
f9e78b9c-2598-43…	Gate 3	formula_repro	How does the size of the English intermediate-task corpus affect the degradation of zero-shot transfer accuracy on low-resource languages wi… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10	9.1/10	2026-06-17 00:24
f9cfb858-1284-49…	Gate 3	formula_repro	How does the performance of cross-lingual dense retrieval systems using query-augmented passage representations compare to those using multi… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.0/10	9.1/10	2026-06-17 00:24
8416a433-960a-43…	Gate 3	formula_repro	What is the impact of scaling the size of the synthetic dataset generated by CausalMixFT on the fine-tuning performance of tabular foundatio… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	8.8/10	2026-06-17 00:24
36871c14-b8df-4a…	Gate 3	formula_repro	How does cross-lingual query generation augmentation affect the adversarial robustness of dense retrieval models against paraphrase attacks … COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10	7.5/10	2026-06-16 18:21
006e8e4a-a201-4b…	Gate 3	formula_repro	How does the performance gap between high-resource and low-resource languages in cross-lingual retrieval models change when using different … COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10	8.2/10	2026-06-16 18:21
b6d14fcd-ba14-42…	Gate 3	formula_repro	How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to models fine-tuned… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.2/10	5.7/10	2026-06-16 18:21
ef1bc3a9-c7fa-44…	Gate 3	formula_repro	What is the impact of varying the proportion of code-switched terms in training data on the robustness of zero-shot cross-lingual retrieval … COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10	6.2/10	2026-06-16 18:21
b39d0c42-f3a9-4c…	Gate 3	formula_repro	How does the performance of cross-lingual query generation compare to multilingual contrastive learning (e.g., XLM-R, LASER) on the BEIR ben… COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10	5.7/10	2026-06-16 18:21
33238242-5dc0-45…	Gate 2	unknown	How does the cross-lingual transfer performance of mE5 compare to other multilingual models like XLM-R or mBERT when pre-trained on monoling…	-	2026-06-16 17:19
2556595d-7efa-45…	Gate 3	formula_repro	How does the performance of multilingual dense retrieval models compare on WebFAQ when trained with synthetic data augmentation versus human… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10	8.5/10	2026-06-16 12:21
fafdde58-463a-48…	Gate 3	formula_repro	How does the performance of Targeted Lexical Injection (TLI) with early-layer LoRA fine-tuning compare to full-parameter fine-tuning on the … COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10	8.2/10	2026-06-16 12:19
497fe53f-999a-48…	Gate 3	formula_repro	How does the pass@1 degradation of CodeT5 compare to JaCoText on MBPP Pro when subjected to semantic-preserving docstring perturbations vers… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10	7.8/10	2026-06-16 12:19
d3826925-d702-4e…	Gate 3	formula_repro	Can TLI early-layer LoRA fine-tuning improve cross-domain alignment in Lugha-Llama for low-resource Bantu languages, as evaluated by mAP sco… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 4.2/10 · REPLICATION ATTACKER: 7.5/10	6.7/10	2026-06-16 12:19
d09dcdad-46aa-44…	Gate 3	formula_repro	What is the effect of Targeted Lexical Injection on cross-lingual alignment quality for Lugha-Llama when evaluated on semantic textual simil… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 6.5/10	7.5/10	2026-06-16 12:19
c90a313e-3bcf-45…	Gate 3	formula_repro	How does early-layer LoRA with Targeted Lexical Injection impact zero-shot cross-lingual transfer accuracy on the XNLI benchmark for low-res… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 4.5/10	7.2/10	2026-06-16 12:19
647a30e8-8ab0-47…	Gate 3	formula_repro	To what extent does the depth of early-layer LoRA fine-tuning in TLI affect cross-lingual lexical alignment, as measured by LAS scores acros… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10	7.8/10	2026-06-16 06:19
a8f56d30-b343-4a…	Gate 3	formula_repro	How do context-aware conversational models and sequence labeling approaches differ in zero-shot cross-lingual transfer accuracy for hate spe… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10	5.0/10	2026-06-16 06:18
a63a1ae2-af59-40…	Gate 3	formula_repro	How does the scalability of CausalMixFT compare to other data augmentation methods (e.g., SMOTE, GAN-based augmentation) when fine-tuning ta… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10	9.1/10	2026-06-16 06:18
75b0a51f-2504-44…	Gate 3	formula_repro	Can SCM-based synthetic augmentation reduce the validation data requirements for early stopping in fine-tuning, as measured by the stability… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-16 06:18
3dd55559-98c7-47…	Gate 3	formula_repro	Do parameter-efficient fine-tuning methods like LoRA maintain instance segmentation performance on COCO when applied to other transformer ba… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 0.0/10	5.5/10	2026-06-16 06:18
b451d079-8c4b-4d…	Gate 3	formula_repro	How does the integration of CausalMixFT-generated synthetic data affect the fine-tuning convergence speed and validation accuracy of tabular… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10	6.2/10	2026-06-16 06:18
069ce81a-1531-45…	Gate 3	formula_repro	Does CausalMixFT outperform diffusion-based data augmentation (e.g., DiffAugment) in terms of robustness to covariate shift when fine-tuning… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.0/10	9.1/10	2026-06-16 06:17
a97c638e-389e-47…	Gate 3	formula_repro	How does integrating causal structure into TabPFN's synthetic data generation affect its performance on downstream task accuracy across diff… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10	8.7/10	2026-06-16 06:17
d7b2e2bf-4375-41…	Gate 2	unknown	How do TimeGAN and VAE-generated synthetic financial time series compare in terms of robustness when used to evaluate the temporal reasoning…	-	2026-06-16 01:04
e0caf16c-fb4c-48…	Gate 3	formula_repro	How does varying the depth of LoRA adapter injection in Lugha-Llama affect cross-lingual alignment accuracy on low-resource Swahili-English … COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10	5.3/10	2026-06-16 00:15
bc889895-c22b-44…	Gate 3	formula_repro	To what extent does the combination of SFT and DPO degrade the zero-shot reasoning capabilities of OPT-350M on the Big-Bench Hard suite rela… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10	6.2/10	2026-06-16 00:14
fcc53574-9cfa-44…	Gate 3	formula_repro	How does the reasoning accuracy of multimodal large language models compare to diffusion-based trajectory policies in dynamic task planning … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10	9.0/10	2026-06-16 00:14
852ac52e-42d2-4e…	Gate 3	formula_repro	How does the hybrid batch training strategy impact zero-shot cross-lingual retrieval accuracy on low-resource languages within the XQuAD ben… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-16 00:14
f8fc038d-df35-4b…	Gate 3	formula_repro	How does the scaling of synthetic data diversity in tabular foundation model pretraining affect accuracy degradation under distributional sh… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 6.2/10 · REPLICATION ATTACKER: 8.5/10	7.9/10	2026-06-16 00:14
f1cb0512-e3b3-44…	Gate 3	formula_repro	To what extent does incorporating causal priors via CausalMixFT improve out-of-distribution (OOD) robustness in tabular foundation models, a… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10	8.5/10	2026-06-16 00:13
4efb4ac7-2558-4e…	Gate 3	formula_repro	How does the cross-lingual query generation approach compare to cross-lingual passage generation in terms of enhancing the alignment capabil… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.0/10	8.7/10	2026-06-15 18:13
e4b97bd1-f023-4b…	Gate 3	formula_repro	What is the correlation between training data volume in WebFAQ 2.0 and zero-shot cross-lingual retrieval performance gaps across the 75 supp… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10	6.8/10	2026-06-15 18:11
b68d34da-d3d1-43…	Gate 3	formula_repro	Can synergistic optimization of monolingual and cross-lingual objectives reduce performance degradation on the XTREME retrieval benchmark fo… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 1.5/10	6.3/10	2026-06-15 18:10
de04d7fd-23ac-49…	Gate 3	formula_repro	Does the hybrid batch training strategy improve zero-shot cross-lingual retrieval performance on downstream datasets like MIRACL or XNLI whe… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 7.5/10	8.5/10	2026-06-15 18:08
82ef2513-1ed2-43…	Gate 3	formula_repro	Does early-layer LoRA adaptation for lexical alignment in Lugha-Llama maintain zero-shot translation accuracy on morphologically rich Bantu … COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10	8.8/10	2026-06-15 18:08
79f75ace-7cdb-4f…	Gate 3	formula_repro	How does the alignment of synthetic financial data generated by GANs versus VAEs influence the downstream performance of multimodal models i… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-15 18:07
be269da2-4b74-48…	Gate 3	formula_repro	How does the noise level in automatically extracted bilingual lexicons impact the zero-shot cross-lingual retrieval accuracy of code-switche… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.5/10 · REPLICATION ATTACKER: 6.5/10	5.2/10	2026-06-15 18:07
1ce91234-09d8-44…	Gate 3	formula_repro	How does early-layer LoRA fine-tuning for lexical alignment in Lugha-Llama compare to full-parameter fine-tuning on zero-shot cross-lingual … COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.5/10	2026-06-15 18:07
d78d2f8b-f3d8-45…	Gate 3	formula_repro	How does early-layer LoRA fine-tuning for lexical alignment in Lugha-Llama compare to full-parameter fine-tuning on cross-lingual retrieval … COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10	5.3/10	2026-06-15 12:06
fb1c1363-5e72-48…	Gate 3	formula_repro	Does early-layer LoRA fine-tuning improve cross-lingual lexical alignment more effectively than full-model fine-tuning for low-resource Afri… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 7.5/10	7.4/10	2026-06-15 12:05
9f2d0d5b-7064-49…	Gate 3	formula_repro	How does fine-tuning dense retrieval models on WebFAQ's 47 million non-English pairs impact zero-shot cross-lingual transfer accuracy on the… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 3.5/10	7.2/10	2026-06-15 12:05
c64e666b-968b-44…	Gate 3	formula_repro	How does training on artificially code-switched data compare to translate-train methods in improving zero-shot cross-lingual retrieval accur… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10	6.5/10	2026-06-15 12:05
befe8d82-5024-48…	Gate 3	formula_repro	How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to multilingual pret… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 6.5/10	6.2/10	2026-06-15 12:05
05a11663-8824-41…	Gate 3	formula_repro	How does hybrid batch training for simultaneous monolingual and cross-lingual optimization impact zero-shot retrieval accuracy on out-of-dom… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 9.5/10	6.2/10	2026-06-15 12:05
e10a1785-aeeb-42…	Gate 3	formula_repro	Does training on artificially code-switched data improve cross-lingual retrieval precision compared to monolingual training when evaluated o… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.1/10	2026-06-15 12:05
28d7ecd5-529b-47…	Gate 3	formula_repro	Does training on artificially code-switched data improve cross-lingual robustness on the XQuAD benchmark when evaluated against standard mul… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10	8.3/10	2026-06-15 12:05
324f98c4-488c-47…	Gate 3	formula_repro	Does intermediate-task training on domain-specific English corpora improve zero-shot transfer performance on multilingual domain subsets of … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 6.5/10	8.2/10	2026-06-15 12:05
367b042b-53cd-40…	Gate 3	formula_repro	How does the alignment between MIDI symbolic input and audio output in Tacotron-based models compare to that of neural source-filter wavefor… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10	7.8/10	2026-06-15 06:04
853e72b4-662b-4e…	Gate 3	formula_repro	What is the impact of TLI early-layer LoRA fine-tuning on the robustness of Lugha-Llama against adversarial lexical perturbations in low-res… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10	5.7/10	2026-06-15 06:04
e655d427-02d9-48…	Gate 3	formula_repro	How does early-layer LoRA adaptation in Lugha-Llama impact zero-shot cross-lingual retrieval accuracy on noisy Swahili-English datasets comp… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10	8.8/10	2026-06-15 06:04
b435d9ed-a132-4b…	Gate 3	formula_repro	How does early-layer LoRA fine-tuning for lexical injection compare to middle-layer adaptation in improving cross-lingual alignment scores o… COUNTEREXAMPLE HUNTER: 3.5/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 9.5/10	6.7/10	2026-06-15 06:04
c5a6b419-3445-44…	Gate 3	formula_repro	How does the token prioritization strategy in Vcc affect perplexity scores on the PG-19 benchmark compared to sparse attention patterns like… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 8.5/10	8.2/10	2026-06-15 06:04
757b4933-05c3-4b…	Gate 3	formula_repro	How does cross-lingual query generation compare to direct cross-lingual data training in terms of improving passage representation alignment… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-15 06:03
7ee1158c-08c4-4d…	Gate 3	formula_repro	Does augmenting passage representations with generated queries reduce the latency-throughput trade-off in cross-lingual dense retrieval syst… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10	8.7/10	2026-06-15 06:03
f149150d-181d-4e…	Gate 3	formula_repro	How does the robustness of zero-shot cross-lingual voice cloning in flow-matching TTS models vary when evaluated on noisy or adversarial inp… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.5/10	2026-06-15 06:03
67bf0cbf-613c-4e…	Gate 3	formula_repro	How does the combined SFT+DPO alignment strategy impact the reasoning accuracy of OPT-350M on complex multilingual queries relative to stand… COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	6.2/10	2026-06-15 06:02
780db65d-2d7d-42…	Gate 3	formula_repro	To what extent does increasing the scale of the base language model mitigate the degradation in helpfulness scores observed in OPT-350M afte… COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10	4.7/10	2026-06-15 06:02
9013c53e-30cb-47…	Gate 3	formula_repro	How does early-layer LoRA adaptation for lexical alignment in Lugha-Llama compare to full fine-tuning in zero-shot cross-lingual transfer ac… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10	6.2/10	2026-06-15 00:02
ae131cd2-d64a-42…	Gate 3	formula_repro	Does the latent cross-lingual alignment achieved via Targeted Lexical Injection in Lugha-Llama generalize to zero-shot machine translation p… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10	5.3/10	2026-06-15 00:01
8d6c501f-4675-4a…	Gate 3	formula_repro	Do auxiliary objectives with factorized latent dynamics improve sample efficiency in small-scale Video-JEPA training relative to standard jo… COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 6.5/10	4.6/10	2026-06-15 00:01
9ade25b1-b2f8-40…	Gate 3	formula_repro	What is the effect of factorized latent dynamics auxiliary objectives on the transfer learning performance of Video-JEPA when evaluated on d… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.2/10 · REPLICATION ATTACKER: 6.5/10	7.1/10	2026-06-15 00:01
afe7316d-46f8-46…	Gate 3	formula_repro	How does CLIP-TD's zero-shot transfer accuracy on domain-shifted vision-language tasks compare to standard CLIP fine-tuning methods? COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 8.5/10	7.5/10	2026-06-14 18:01
db269ac8-afd2-4c…	Gate 3	formula_repro	How does the scaling of self-supervised pretraining data size affect the performance of few-shot meta-learners on language model benchmarks … COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10	5.0/10	2026-06-14 18:01
72958e36-47e4-4e…	Gate 3	formula_repro	What is the impact of integrating motion-image diffusion priors on the robustness of vision-language-action models against adversarial pertu… COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 3.2/10 · REPLICATION ATTACKER: 8.5/10	6.3/10	2026-06-14 18:01
96f30ae4-f497-45…	Gate 3	formula_repro	How does the cross-lingual voice cloning performance of flow-matching TTS models compare to diffusion-based TTS models when evaluated on uns… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10	6.2/10	2026-06-14 18:01
cad169a5-a675-41…	Gate 3	formula_repro	How does the performance of Targeted Lexical Injection (TLI) compare to full fine-tuning and adapter-based methods on the XTREME-R benchmark… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-14 18:01
cf8d5c87-fe0e-4d…	Gate 3	formula_repro	How does hybrid batch training affect the zero-shot cross-lingual retrieval accuracy of mBERT on low-resource language pairs compared to mon… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.0/10	9.2/10	2026-06-14 17:59
463bd35f-feb7-45…	Gate 3	formula_repro	Does synergistic optimization of monolingual and cross-lingual objectives improve generalization to unseen language pairs in the BEIR zero-s… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 2.5/10	7.0/10	2026-06-14 17:58
964ed2b1-5e26-41…	Gate 3	formula_repro	How does hybrid batch training affect zero-shot retrieval accuracy on low-resource languages in the XTREME benchmark compared to dedicated m… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.0/10	9.2/10	2026-06-14 17:58
a6b4186a-a702-48…	Gate 3	formula_repro	What is the comparative robustness of CausalMixFT-generated synthetic data against other data augmentation methods (e.g., GAN-based or diffu… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10	5.7/10	2026-06-14 11:57
42f5bcad-eb35-4b…	Gate 3	formula_repro	To what extent does CausalMixFT fine-tuning improve the generalization accuracy of tabular foundation models under data scarcity compared to… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10	7.8/10	2026-06-14 11:57
d9c74c0f-cb53-41…	Gate 3	formula_repro	How does the F1-score of multilingual transformer models compare to monolingual models when evaluated on code-mixed hate speech datasets wit… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.5/10	2026-06-14 11:57
5ec1afab-3eea-4c…	Gate 3	formula_repro	How does early-layer LoRA lexical injection compare to middle-layer adaptation in improving zero-shot cross-lingual retrieval accuracy for S… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-14 11:56
4e42da41-1273-43…	Gate 3	formula_repro	To what extent does Targeted Lexical Injection improve cross-lingual alignment scores on the XCOPA dataset for underrepresented Bantu langua… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10	6.8/10	2026-06-14 11:56
5073185f-b8b7-47…	Gate 3	formula_repro	What is the impact of context window size on the retrieval-augmented generation performance of quantized LoRA-adapted models when evaluating… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.5/10	2026-06-14 11:56
e486f159-c9b9-4f…	Gate 3	formula_repro	How does the alignment of multimodal embeddings (e.g., text and audio) in MUST-RAG affect the consistency and robustness of generated answer… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 9.5/10	6.8/10	2026-06-14 11:55
1fafcd2c-2be3-43…	Gate 3	formula_repro	How does the fidelity of structural causal models used for data augmentation impact the few-shot classification accuracy of fine-tuned tabul… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 3.0/10	6.5/10	2026-06-14 11:55
e404ac75-ed39-43…	Gate 3	formula_repro	Can targeted lexical injection in Lugha-Llama achieve comparable zero-shot cross-lingual performance to MMPLMs like WMT21fb on clinical doma… COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 8.5/10	6.0/10	2026-06-14 11:55
7b8d797a-9b95-44…	Gate 3	formula_repro	How does the use of causal data augmentation techniques like CausalMixFT compare to traditional data augmentation methods (e.g., SMOTE, GAN-… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 8.5/10	8.9/10	2026-06-14 05:55
cf1474bf-5527-47…	Gate 3	formula_repro	What is the accuracy degradation of generalized zero-shot learning models under norm-bounded perturbations across unseen classes? COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-14 05:54
a7211fa0-6c1a-43…	Gate 3	formula_repro	How does fine-tuning dense retrieval models on native multilingual WebFAQ data impact zero-shot cross-lingual retrieval accuracy on XQuAD co… COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 8.5/10	6.0/10	2026-06-14 05:54
99730361-463c-4d…	Gate 3	formula_repro	Does early-layer LoRA fine-tuning improve zero-shot cross-lingual natural language inference accuracy for low-resource African languages com… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10	9.0/10	2026-06-14 05:54
fdb047e5-d021-45…	Gate 3	formula_repro	How does the performance of dense retrieval models trained on WebFAQ compare to those trained on Wikipedia-based datasets like Natural Quest… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-14 05:53
3be16ffd-6b7d-4f…	Gate 3	formula_repro	How does the ratio of synthetic to real pretraining data impact the few-shot classification accuracy of multimodal video-language models on … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-14 05:53
038c8225-74bd-49…	Gate 3	formula_repro	How does contrastive pretraining objective selection impact cross-lingual retrieval accuracy for low-resource language pairs in the XTREME b… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10	7.0/10	2026-06-14 05:53
d1a71185-38af-4c…	Gate 3	formula_repro	Does hybrid batch training improve cross-domain generalization for multilingual retrieval models on unseen topics in low-resource languages … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-14 05:53
70f05cea-fe5d-40…	Gate 3	formula_repro	What is the impact of hybrid batch training on the scaling behavior of zero-shot retrieval accuracy when extending from low-resource to high… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10	5.0/10	2026-06-14 05:53
86a0376b-0fad-42…	Gate 3	formula_repro	What is the impact of mixed-precision inference (e.g., FP16 vs. BF16) on the efficiency-accuracy trade-off for long-context models like Long… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10	9.0/10	2026-06-14 05:53
d7973fbb-05f8-47…	Gate 3	formula_repro	How does the zero-shot cross-lingual retrieval accuracy of a multilingual encoder pre-trained on WebFAQ's 47M non-English QA pairs compare t… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 9.5/10	6.8/10	2026-06-13 23:53
3378e994-9dc5-4c…	Gate 3	formula_repro	Does scaling the proportion of non-English WebFAQ fine-tuning data improve retrieval latency and accuracy trade-offs for cross-lingual tasks… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-13 23:52
36a277bf-5c26-40…	Gate 3	formula_repro	What is the impact of incorporating visual modality into self-supervised learning for speech representations on the robustness of neural sou… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 4.5/10 · REPLICATION ATTACKER: 6.5/10	6.7/10	2026-06-13 23:52
809b945c-726a-44…	Gate 3	formula_repro	What is the impact of mixed-dataset pretraining versus single-dataset pretraining on the robustness of Video-JEPA representations to tempora… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10	8.8/10	2026-06-13 23:52
da64baea-9f76-40…	Gate 3	formula_repro	What is the impact of varying the ratio of synthetic to real data in CausalMixFT on the fine-tuning performance of tabular foundation models… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-13 23:51
b83881e6-6035-48…	Gate 3	formula_repro	What is the impact of causal data augmentation proportions on the sample efficiency and convergence speed of fine-tuning tabular foundation … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-13 23:51
c90f70ac-13c1-4c…	Gate 3	formula_repro	What is the correlation between the fidelity of synthetic tabular samples generated via SCMs and the downstream fine-tuning performance of f… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 7.0/10	8.4/10	2026-06-13 23:51
0b01bfe2-6818-4c…	Gate 3	formula_repro	Does integrating causal structure into synthetic data generation improve the robustness of TabPFN against feature permutation compared to st… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 8.5/10	8.5/10	2026-06-13 23:51
b744a045-e120-4e…	Gate 3	formula_repro	What is the impact of varying the proportion of causal synthetic data during fine-tuning on the robustness of tabular foundation models acro… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10	8.7/10	2026-06-13 23:51
1ab47328-816a-4c…	Gate 3	formula_repro	How does the robustness of dense retrievers pretrained on WebFAQ compare to those trained on monolingual datasets when evaluated on adversar… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	8.8/10	2026-06-13 23:50
e3b701fc-8dd1-45…	Gate 3	formula_repro	How does the performance of Video-JEPA models with factorized latent dynamics compare to non-factorized variants when evaluated on the Somet… COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 8.5/10	5.0/10	2026-06-13 17:50
1613d088-356e-46…	Gate 3	formula_repro	What is the impact of mixed-dataset pretraining (UCF-101 + Something-Something V2 + ImageNet-100) on the accuracy of Video-JEPA models with … COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 8.5/10	6.2/10	2026-06-13 17:50
51b48b75-c13e-4d…	Gate 3	formula_repro	Does the robustness gained from Targeted Lexical Injection in Lugha-Llama generalize to code-switched social media text as measured by F1 sc… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10	7.8/10	2026-06-13 17:50
fc099746-cbba-4e…	Gate 3	formula_repro	Does fine-tuning tabular foundation models with Structural Causal Model-based synthetic data improve generalization accuracy more than stand… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 8.5/10	6.5/10	2026-06-13 17:50
2003206f-03e3-4e…	Gate 3	formula_repro	Does combining ImageNet-100 with video datasets improve the domain robustness of self-supervised Video-JEPA representations on heterogeneous… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10	5.3/10	2026-06-13 17:50
a836297c-8d52-40…	Gate 3	formula_repro	What is the impact of varying the rank of LoRA matrices on cross-lingual alignment for Turkic languages when fine-tuned on early layers, eva… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10	5.3/10	2026-06-13 17:50
469e9bad-7f2e-41…	Gate 3	formula_repro	How does the generalization performance of CausalMixFT compare to other data augmentation methods (e.g., Mixup, SMOTE) when fine-tuning tabu… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10	8.9/10	2026-06-13 17:50
d74469d0-67aa-44…	Gate 3	formula_repro	How does fine-tuning dense retrieval models on WebFAQ's 47 million non-English pairs impact zero-shot cross-lingual transfer accuracy on the… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-13 17:50
a76c6c0e-24bb-4e…	Gate 3	formula_repro	What is the comparative robustness of early-layer LoRA versus full-parameter fine-tuning for Lugha-Llama on cross-lingual natural language i… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-13 17:50
8df4041f-79f6-45…	Gate 3	formula_repro	How does the incorporation of auxiliary objectives in Video-JEPA models impact the robustness of learned representations when evaluated on o… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10	6.7/10	2026-06-13 11:49
f33863b8-6dc3-46…	Gate 3	formula_repro	How does factorized latent dynamics in Video-JEPA compare to standard JEPA in cross-domain transfer accuracy from synthetic to real-world vi… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-13 11:49
063ca2b8-24d2-46…	Gate 3	formula_repro	What is the impact of varying the number of LoRA layers on cross-lingual lexical alignment in Lugha-Llama when benchmarked against the FLORE… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-13 11:49
9dadb434-2322-46…	Gate 3	formula_repro	How do bitwise neural networks with stochastic inference techniques perform in comparison to full-precision networks with Monte Carlo dropou… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.5/10	2026-06-13 11:49
bebb97c4-3ac3-46…	Gate 3	formula_repro	What is the effect of the SFT+DPO alignment strategy on the helpfulness retention rate of OPT-350M when evaluated on the Anthropic Helpful-H… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-13 11:48
8a29c57f-aef4-47…	Gate 3	formula_repro	How does retrieval-augmented revision compare to adversarial training in improving Big-Vul detection accuracy for Llama-3.1-8B without requi… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10	7.7/10	2026-06-13 11:48
dbd73bd9-a579-40…	Gate 3	formula_repro	How does fine-tuning dense retrieval models on the non-English subset of WebFAQ impact cross-lingual zero-shot performance on TyDi QA compar… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10	5.0/10	2026-06-13 11:48
e890a0c6-75f2-41…	Gate 3	formula_repro	What is the impact of fine-tuning WebFAQ-pretrained dense retrieval models on downstream cross-lingual NLI tasks, as measured by XNLI accura… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-13 11:48
6910a60a-f0a2-40…	Gate 3	formula_repro	Do auxiliary factorized objectives in Video-JEPA improve few-shot learning performance on fine-grained video benchmarks relative to non-fact… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 9.5/10	8.1/10	2026-06-13 11:48
31ae88ad-ede0-43…	Gate 3	formula_repro	To what extent does Direct Preference Optimization enhance the robustness of counter-speech models against adversarial hate speech inputs co… COUNTEREXAMPLE HUNTER: 4.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	7.7/10	2026-06-13 05:48
d82913b1-e2c4-40…	Gate 3	formula_repro	How does retrieval diversity in music-specific RAG frameworks impact answer robustness against adversarial perturbations compared to general… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 8.5/10	9.0/10	2026-06-13 05:48
3cb37eff-ef87-4e…	Gate 3	formula_repro	How does the multimodal capture component in Expert Mind affect VQA accuracy on domain-specific datasets compared to text-only RAG baselines… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-13 05:47
e447cada-12a9-4f…	Gate 3	formula_repro	What is the comparative effect of graph sparsity versus density on the F1-score performance of retrieval-augmented generation models in zero… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 5.5/10 · REPLICATION ATTACKER: 9.5/10	8.0/10	2026-06-13 05:47
fd93dc1c-d547-4d…	Gate 3	formula_repro	How does the MRR of cross-lingual dense retrieval models degrade on WebFAQ low-resource language families compared to high-resource ones whe… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10	7.8/10	2026-06-13 05:47
840b7dfc-7587-47…	Gate 3	formula_repro	What is the impact of scaling the multilingual dense retriever model size (e.g., small vs. large) on retrieval performance across low-resour… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10	6.8/10	2026-06-13 05:47
223a5aad-7c31-46…	Gate 3	formula_repro	To what extent does training on artificially code-switched data improve cross-lingual retrieval robustness for low-resource languages compar… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-13 05:47
cf7bfdf5-1e02-40…	Gate 3	formula_repro	Does training dense retrievers on WebFAQ 2.0's bilingual aligned pairs improve zero-shot question answering accuracy on multilingual benchma… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-13 05:46
6811def3-b807-4f…	Gate 3	formula_repro	How does fine-tuning dense retrieval models on WebFAQ's non-English subsets impact zero-shot cross-lingual retrieval accuracy on the XTREME … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 7.5/10	8.7/10	2026-06-13 05:46
80d91e4a-f8da-4d…	Gate 3	formula_repro	What is the impact of injecting LoRA adapters exclusively into attention mechanisms versus feed-forward networks in Llama-3.2-3B on the late… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 9.5/10	7.5/10	2026-06-12 21:25
1b596d33-8278-4f…	Gate 3	formula_repro	What is the impact of fine-tuning CodeT5 with adversarial training on its semantic consistency and robustness accuracy in generalized zero-s… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-12 21:23
04a7cbf6-efb9-4b…	Gate 3	formula_repro	How does CausalMixFT compare to other data augmentation techniques (e.g., SMOTE, MixUp) in terms of fine-tuning robustness on tabular datase… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10	8.5/10	2026-06-12 13:50
5e879d86-d825-41…	Gate 3	formula_repro	How does the ratio of synthetic-to-real data in CausalMixFT affect the F1 score variance of tabular foundation models on TabFact across mult… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10	7.8/10	2026-06-12 13:50
d96ded43-64c0-44…	Gate 3	formula_repro	How does evidential deep learning with non-negative evidence constraints affect cross-modal retrieval accuracy on CLIP and ALBEF compared to… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10	8.2/10	2026-06-12 13:50
6330a381-0e16-4e…	Gate 3	formula_repro	How does the data augmentation strategy used in scTab compare in effectiveness to other state-of-the-art data augmentation techniques when a… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 8.5/10	7.2/10	2026-06-12 07:43
63170d1a-bec2-45…	Gate 3	formula_repro	To what extent does the causal structure complexity (e.g., number of confounders or mediators) in the SCM used for CausalMixFT affect the ge… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 4.2/10 · REPLICATION ATTACKER: 7.5/10	6.7/10	2026-06-12 07:43
475d3f67-fd79-47…	Gate 3	formula_repro	How does the generalization of scaled tabular models trained on Criteo data perform on unseen high-cardinality categorical features in other… COUNTEREXAMPLE HUNTER: 7.3/10 · CITATION AUDITOR: 4.5/10 · REPLICATION ATTACKER: 6.5/10	6.1/10	2026-06-12 07:43
790e88e1-86e1-4d…	Gate 3	formula_repro	How does the domain gap between synthetic and real-world video data affect the zero-shot accuracy of CLIP-based video encoders in gesture re… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 1.0/10	6.3/10	2026-06-12 07:43
b22d1b2d-fd2b-41…	Gate 2	unknown	How do TabPFN, CTGAN, and CausalMixFT perform in cross-domain tabular data generation tasks when evaluated on both synthetic and real-world …	-	2026-06-12 05:07
7e2cde64-adf0-4b…	Gate 3	formula_repro	Can causal synthetic data generation improve the robustness of tabular foundation models against distribution shifts in cross-domain evaluat… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 7.5/10	8.7/10	2026-06-12 01:36
bc4f6f71-a74f-4b…	Gate 3	formula_repro	To what extent does the choice of Structural Causal Model (SCM) backbone (e.g., linear vs. nonlinear) in CausalMixFT affect few-shot accurac… COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10	5.7/10	2026-06-12 01:35
41ba449b-d600-44…	Gate 3	formula_repro	How does the CMAL framework's image-text alignment performance on COCO and Flickr30K compare to CLIP and ALBEF in terms of Recall@1 and NDCG… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-12 01:35
aa621143-3436-4a…	Gate 3	formula_repro	Does the scaling behavior of XSimGCL's contrastive loss formulation yield superior convergence rates compared to LightGCL when trained on de… COUNTEREXAMPLE HUNTER: 10.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 10.0/10	9.8/10	2026-06-12 01:34
7958ccbd-1a8f-47…	Gate 3	formula_repro	What is the impact of the novel web-crawled data collection strategy in WebFAQ 2.0 on the domain generalization capabilities of multilingual… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-11 19:28
9f94de5f-bbb3-45…	Gate 3	formula_repro	What is the impact of varying the ratio of synthetic-to-real samples in CausalMixFT on the calibration error and generalization performance … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10	9.1/10	2026-06-11 19:28
aa6a06f7-1784-40…	Gate 3	formula_repro	What is the effect of curriculum learning strategies on the accuracy of large multimodal models evaluated on the MedQA benchmark? COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-11 19:28
8018fed0-9d06-45…	Gate 3	formula_repro	How does curriculum-based multi-task learning impact the inference latency of large multimodal models on sparse medical image-text pairs? COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-11 19:25
3c14268f-b85e-4f…	Gate 3	formula_repro	What is the comparative memory footprint and inference latency of multi-task trained vision-language models versus single-task baselines on … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 0.0/10	6.2/10	2026-06-11 19:25
a324f8d3-15a0-49…	Gate 3	formula_repro	How does the stochastic inference technique in bitwise neural networks compare to other ensemble methods (e.g., snapshot ensembles, Monte Ca… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.5/10	2026-06-11 19:25
3bb4313b-0a73-4d…	Gate 3	formula_repro	To what extent does training dense retrievers on the bilingual aligned QA pairs in WebFAQ 2.0 improve alignment metrics and retrieval robust… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 7.5/10	8.7/10	2026-06-11 13:19
46e484e7-8529-4e…	Gate 3	formula_repro	To what extent does the inclusion of 47 million non-English WebFAQ pairs improve the robustness of multilingual encoders against domain shif… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 9.5/10	6.2/10	2026-06-11 13:18
72716d26-4a67-4e…	Gate 3	formula_repro	How do multilingual dense retrievers trained on SWIM-IR perform on low-resource languages in BEIR compared to models trained on natural mult… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10	6.3/10	2026-06-11 13:18
a9865db0-e4a2-4f…	Gate 3	formula_repro	How do different alignment strategies in multimodal models impact inference throughput in low-resource settings when evaluated on BRATS with… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-11 13:17
e733595e-23e1-4e…	Gate 3	formula_repro	What is the comparative robustness of multimodal reasoning in language models with different alignment strategies when applied to cross-doma… COUNTEREXAMPLE HUNTER: 8.2/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10	8.7/10	2026-06-11 13:17
6124f4e5-dc19-47…	Gate 3	formula_repro	To what extent does layer-wise KV cache reconstruction in methods like ReST-KV artificially inflate needle-in-a-haystack scores relative to … COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-11 07:12
30fba3f7-6edd-44…	Gate 3	formula_repro	Reproducibility meta-analysis: 3 independent publications report divergent Qwen2.5 performance on Docvqa with a 80.3 percentage-point spread… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.5/10	2026-06-11 07:11
0595bd4f-0470-40…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10	7.5/10	2026-06-11 01:23
50a4f525-1f3f-43…	Gate 3	formula_repro	What is the performance degradation of Unified-IO 2 on the VQA-v2 dataset when audio modalities are introduced as distractors versus text-on… COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 5.0/10 · REPLICATION ATTACKER: 7.5/10	6.7/10	2026-06-11 01:22
f82ac2f4-1a92-4e…	Gate 3	formula_repro	How does GRACE's quantization-aware training scale with model size, and how does it affect performance on the MME and MM1K benchmarks when a… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 2.0/10	5.9/10	2026-06-11 01:22
cc2d0e37-a950-4a…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 3.5/10	7.0/10	2026-06-10 19:35
673590a7-25e9-41…	Gate 3	formula_repro	How does Qwen3's performance on GPQA Diamond compare to other frontier models when evaluated under chain-of-thought prompting versus standar… COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10	6.2/10	2026-06-10 19:35
b472d355-87a8-45…	Gate 3	formula_repro	How do language models compare to human experts on professional knowledge and science benchmarks v19 COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10	8.7/10	2026-06-10 19:34
b5058ffc-3f4d-46…	Gate 3	formula_repro	What is the impact of million-token context windows on multimodal reasoning accuracy in Gemini 1.5 Pro versus prior versions? COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10	8.8/10	2026-06-10 19:34
09acaf30-ab81-49…	Gate 3	formula_repro	To what extent does chain-of-thought prompting mitigate performance degradation in long-horizon reasoning tasks for LLMs evaluated on the Bi… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-10 19:34
031cd03f-2fbe-4d…	Gate 3	formula_repro	What are the benchmark performance scores of GLM-4.5-Air on reasoning mathematics coding and language understanding tasks COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10	5.8/10	2026-06-10 19:34
99e0cc2f-ae34-40…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 8.5/10	8.9/10	2026-06-10 16:52
73ec2b2b-e67b-47…	Gate 3	formula_repro	What is the cross-domain generalization capability of OpenPangu-7B-MLA on empathetic speech understanding tasks when evaluated on MMSU and o… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-10 16:51
dd13f070-1013-42…	Gate 3	formula_repro	How does the performance of self-supervised foundation models on tabular data classification compare to standard normalization techniques wh… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10	9.0/10	2026-06-10 16:51
2483aaac-7f84-4c…	Gate 3	formula_repro	To what extent does fine-tuning on adversarial multi-hop QA examples improve the robustness of RAG systems against distractor contexts compa… COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.5/10	2026-06-10 16:51
3cbe8120-1209-45…	Gate 3	formula_repro	How does fine-tuning on AdvRACE affect the cross-lingual robustness of MRC models when evaluated on adversarial perturbations in non-English… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-10 16:51
4a909146-446f-4d…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 2.5/10	6.3/10	2026-06-10 10:48
426ccfd7-06e6-40…	Gate 3	formula_repro	How does the integration of non-lexical vocal cues in multimodal language models like OpenPangu-7B-MLA affect downstream task performance on… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10	6.5/10	2026-06-10 10:47
294a5d5b-f300-40…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10	5.7/10	2026-06-10 08:45
18e28019-37fb-4c…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.2/10	5.6/10	2026-06-10 08:45
a80b4a8e-8700-4c…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10	8.9/10	2026-06-10 08:44
e26d33b4-a5b3-48…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 3.0/10	6.2/10	2026-06-10 08:44
30bd9c9a-90c8-4e…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10	8.9/10	2026-06-10 08:43
3b783c5d-ec77-4e…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10	5.9/10	2026-06-10 08:42
388b9655-1a81-4e…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 0.0/10	5.8/10	2026-06-10 08:42
42863d1d-2f6a-41…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10	8.7/10	2026-06-10 08:42
0e47786d-3f42-43…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10	8.2/10	2026-06-10 08:41
3904006d-6cfc-42…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10	8.7/10	2026-06-10 08:41
845a22c0-61ad-4e…	Gate 3	arithmetic_repro	- COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10	8.7/10	2026-06-10 08:41
42a5d013-2da3-4d…	Gate 3	unknown	- COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.2/10	2026-06-10 08:36
8520660f-c1c4-4c…	Gate 3	unknown	- COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	6.3/10	2026-06-10 08:36
fa1dffe8-f9a9-4f…	Gate 3	formula_repro	How does the F1-score of diffusion-based tabular generative models compare to CTGAN when augmenting data for training LLMs on imbalanced tex… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-10 08:35
ee851b65-000d-44…	Gate 3	formula_repro	What is the impact of varying the pretraining dataset size and diversity on the cross-domain generalization capabilities of tabular foundati… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-10 08:35
11c29061-cf3e-4b…	Gate 3	formula_repro	Does scaling the size of domain-specific training data for RAG models improve alignment with human evaluators when measured by RAGalyst's me… COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10	7.8/10	2026-06-10 08:35
9f6b0926-918c-40…	Gate 3	formula_repro	How does the scaling of unlabeled video-audio pretraining data affect the few-shot adaptation accuracy of latent action models on the RoboBe… COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10	9.3/10	2026-06-10 08:35
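
The survival rule stated in the introduction to this table (average attack score below 3.5, no individual score at or above 5.0) can be sketched as follows. This is a minimal illustration, assuming the Avg attack column is the unweighted mean of the three attacker scores rounded to one decimal, which is consistent with the rows above. The function and constant names are hypothetical and are not taken from the pipeline's code.

```python
from statistics import mean

# Thresholds as stated in the Gate 3 description.
AVG_THRESHOLD = 3.5         # a claim survives only if the average is below this
INDIVIDUAL_THRESHOLD = 5.0  # and only if no single attacker reaches this


def avg_attack(scores):
    """Assumed definition: unweighted mean of the attacker scores, one decimal."""
    return round(mean(scores), 1)


def survives(scores):
    """True only when both survival conditions hold."""
    return mean(scores) < AVG_THRESHOLD and max(scores) < INDIVIDUAL_THRESHOLD


# Scores from the 2003206f-03e3-4e… row: hunter, citation auditor, replication attacker.
row = [7.5, 0.0, 8.5]
print(avg_attack(row), survives(row))  # prints: 5.3 False
```

Note that a single high score is enough to kill a claim: scores of 0.0, 0.0, and 5.0 average well under 3.5 but still fail the individual-score condition.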

Math Counterexample Kills (105 total, showing 100)

Conjectures generated by the autonomous math research pipeline and killed at Gate 1 when a numerical counterexample was found. These never reach the Lean 4 proof stage.
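
A Gate 1 kill of this kind amounts to a brute-force search for the first input that violates the stated claim. The sketch below applies that idea to the Geometric Sum Identity entry in the table (01cae6580a0347ec…), reading "the first n odd powers of 3" as 3^1 + 3^3 + ... + 3^(2n-1). That reading, the search limit, and the function names are assumptions made for illustration; the pipeline's actual checker is not shown in this record.

```python
def sum_odd_powers_of_3(n):
    """3^1 + 3^3 + ... + 3^(2n-1), under the assumed reading of the claim."""
    return sum(3 ** (2 * k - 1) for k in range(1, n + 1))


def first_counterexample(limit=50):
    """Return the smallest n in [1, limit] where the claimed formula fails."""
    for n in range(1, limit + 1):
        # Claimed: sum == (3^(2n) - 1) / 8. Compare 8 * sum to the
        # numerator so the check stays in exact integer arithmetic.
        if 8 * sum_odd_powers_of_3(n) != 3 ** (2 * n) - 1:
            return n
    return None


print(first_counterexample())  # prints: 1
```

Under this reading the claim already fails at n = 1, where the sum is 3 but the formula gives (9 - 1) / 8 = 1. The standard geometric series formula gives 3(9^n - 1) / 8 for that sum.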

Conjecture ID	Problem	Statement (falsified)	Killed (UTC)
843d975d77414a55…	Ramsey R(5,5) — upper bound improvement	In any 2-coloring of the edges of K_43 that contains no monochromatic K_5, there exists no vertex v such that the red degree of v is exactly 21 AND the red neighborhood of v induces a subgraph containing a red triangle. …	2026-06-21 01:49
e7d599128e7d45b9…	Twin prime density — Hardy-Littlewood conjecture v	For all integers x >= 100, the absolute difference between the actual count of twin prime pairs up to x and the Hardy-Littlewood prediction (2C2x/ln(x)^2) is strictly bounded by the square root of the prediction itself…	2026-06-20 21:43
63dbbefc6b334b33…	Twin prime conjecture — density analysis	For every integer N >= 10,000, let T(N) be the count of twin prime pairs (p, p+2) with p <= N. Let S_odd(N) be the sum of the smaller primes p in these pairs where p ends in the digit 3 or 9, and S_even(N) be the sum whe…	2026-06-20 17:35
342d5e78b226469c…	Fibonacci primes — density conjecture	For every index n > 4 such that the Fibonacci number F_n is prime, the index n itself must be a prime number that can be expressed as the sum of two squares (i.e., n is a Pythagorean prime or n=2). Consequently, no Fibon…	2026-06-20 09:23
cd9bb9306d6a4edf…	Fibonacci primes — density conjecture	For all integers n > 4 such that the n-th Fibonacci number F_n is prime, the index n must satisfy n ≡ 1 or 2 (mod 5). Furthermore, if n ≡ 2 (mod 5), then n must be exactly 3. Consequently, for all Fibonacci primes with i…	2026-06-20 09:23
01cae6580a0347ec…	Geometric Sum Identity	The sum of the first n odd powers of 3 is given by the closed-form formula (3^(2n) - 1) / 8.	2026-06-20 03:58
75ca28388123479e…	Gauss Sum Identity	For any natural number n, the sum of integers from 0 to n multiplied by 2 equals n times (n+1). Specifically verified for n=100.	2026-06-19 18:21
afd87b814d86453c…	Square Minus Square Factoring	For any natural number n less than 100, the square of n is either even or odd.	2026-06-19 18:20
0b56f01b498740fa…	Square Minus Square Factoring	For every natural number n less than 100, the square of n is either even or odd.	2026-06-19 18:20
7a77efd70f1a4118…	Quadratic Residue mod 3	For every natural number n less than or equal to 100, the square of n modulo 3 is either 0 or 1.	2026-06-19 14:15
a7f5bd3a2fcc4f22…	Primes of form n^2+1 — density and distribution	For the sequence of primes of the form p = n^2 + 1, let S(x) be the set of such primes less than or equal to x. Define the 'quadratic gap ratio' for a prime p = n^2 + 1 (where n > 1) as R(p) = (p_next - p) / (2n), where …	2026-06-19 01:53
b94993ff3d514132…	Ramsey R(5,5) — upper bound improvement	In any 2-coloring of the edges of K_43 (the current lower bound for R(5,5)) that contains no monochromatic K_5, the maximum number of monochromatic K_4 subgraphs is exactly 204. Furthermore, any such extremal coloring mu…	2026-06-18 17:32
68b38389e7ec452c…	Catalan's conjecture (Mihailescu) — Lean4 formal p	For any integer n > 1, if n is a perfect power (n = x^a with x > 1, a > 1), then the distance to the nearest other perfect power m (m != n, m = y^b with y > 1, b > 1) satisfies |n - m| > sqrt(n) * (ln(n))^0.8, with the s…	2026-06-17 20:27
6516345988494423…	Catalan's conjecture (Mihailescu) — Lean4 formal p	For any integer n > 8 that is a perfect power (i.e., n = x^a with x, a > 1), the open interval (n, n + n^(5/6)) contains no other perfect powers. This conjecture asserts that for perfect powers greater than 8, the gap to…	2026-06-17 20:26
2849c8ac1ec74318…	Geometric Sum Identity	The sum of the first 101 powers of 2 (from 2^0 to 2^100) equals 2^101 - 1.	2026-06-17 20:26
afc717278ab246f9…	Sum of Odd Numbers Identity	The sum of the first 42 odd positive integers equals 42 squared.	2026-06-17 16:11
42ae3e2e624a4a21…	Square Minus Square Factoring	For any natural number n less than 100, the square of n is either even or odd.	2026-06-17 12:05
df9723f85369422d…	Square Minus Square Factoring	For every natural number n less than 100, the square of n is either even or odd (specifically, n squared modulo 2 is either 0 or 1).	2026-06-17 12:05
849d0bcdc5b04211…	Quadratic Residue mod 4	For every natural number n less than 100, the square of n modulo 4 is either 0 or 1.	2026-06-17 08:01
3ab4d13b11594410…	OEIS A001065 — perfect number conjecture	For any even perfect number n > 6, let p be the largest prime factor of n (which is also the Mersenne prime exponent's base, i.e., n = 2^(p-1)*(2^p - 1)). The sum of the proper divisors of the Mersenne prime component (2…	2026-06-16 23:52
39276ea98e4f49b0…	Primes of form n^2+1 — density conjecture	For every integer N >= 2, let P_N be the set of primes of the form k^2+1 less than or equal to N. Let M_N be the maximum gap between consecutive elements in the sorted sequence P_N (defining the first gap as p_1 - 0). Th…	2026-06-16 07:15
816e34ad26774d21…	Twin prime density — Hardy-Littlewood conjecture v	The ratio of the actual count of twin prime pairs up to x to the Hardy-Littlewood prediction (2C2x/ln(x)^2) exhibits a systematic negative bias that decays according to a specific logarithmic correction term. Specifica…	2026-06-16 03:05
1f35272d16e64f29…	Twin prime conjecture — density analysis	For any integer N >= 100, let T(N) be the count of twin prime pairs (p, p+2) with p <= N. Let S_3(N) be the count of such pairs where the smaller prime p satisfies p mod 3 = 1. The conjecture states that the deviation of…	2026-06-15 22:59
6625e04ce42645a3…	Fibonacci primes — density conjecture	For all integers n > 3, if the n-th Fibonacci number F_n is prime, then n must be a prime number p such that p ≡ 1 (mod 4) or p = 3. In other words, there are no Fibonacci primes with prime indices p where p ≡ 3 (mod 4) …	2026-06-15 14:49
6168a219a5854c3f…	Collatz conjecture — structural pattern search	For any integer n > 1, let S(n) be the set of odd integers encountered in the Collatz trajectory of n before reaching 1. Define the 'Odd-Step Parity Signature' P(n) as the sum of the indices (0-based) of all odd elements…	2026-06-15 06:32
d457ed3a611f478f…	Collatz conjecture — structural pattern search	For any integer n > 1, let S(n) be the set of odd integers encountered in the Collatz trajectory of n before reaching 1 (excluding the final 1). The conjecture states that the sum of the reciprocals of the elements in S(…	2026-06-15 06:31
b76082e436db434d…	Goldbach conjecture — computational extension	For every even integer n >= 10,000, there exists a Goldbach partition n = p + q (where p and q are prime) such that both p and q lie within the interval [n/2 - sqrt(n), n/2 + sqrt(n)] AND at least one of the primes p or …	2026-06-15 02:26
1ffdfdaa6b324dfe…	Primes of form n^2+1 — density and distribution	For the sequence of integers n where n^2+1 is prime, let the gaps be defined as g_k = n_{k+1} - n_k. The conjecture states that for all k >= 2, the gap g_k is strictly less than 2.5 * sqrt(n_k) * ln(ln(n_k)). This refine…	2026-06-15 02:25
bb34d8f54e2641fb…	Ramsey multiplicity K_4 — minimum number of monoch	In any 2-coloring of the edges of K_18 that achieves the global minimum number of monochromatic K_4 subgraphs, the resulting color classes (graphs) must be isomorphic to each other. Furthermore, each color class must hav…	2026-06-14 13:39
d1b012cc38cd4dfd…	Fibonacci primes — density conjecture	For every integer n >= 5, if the nth Fibonacci number F_n is prime, then n must be a prime number p such that either p = 5 or p is congruent to 1 or 9 modulo 20. In other words, no Fibonacci prime exists at a prime index…	2026-06-14 01:01
2516b894d67544f4…	Catalan's conjecture (Mihailescu) — Lean4 formal p	For any integer n > 1, if there exist two distinct perfect powers P1 = x^a and P2 = y^b (with x,y,a,b > 1) such that P1 < n < P2 and the gap G = P2 - P1 satisfies G < n^(1/3), then n must be equal to 26. Specifically, 26…	2026-06-13 20:43
e69a3d74fbc1457a…	Primes of form n^2+1 — density and distribution	For any integer N >= 100, let S_N be the set of primes of the form k^2+1 less than or equal to N. Let gaps_N be the sorted list of differences between consecutive elements in S_N. The conjecture states that the standard …	2026-06-13 12:28
fc8a413e3bb340b7…	Twin prime density — Hardy-Littlewood conjecture v	For all integers k >= 3, let T_k be the k-th twin prime pair (p_k, p_k+2). The fractional part of the square root of the smaller prime, {sqrt(p_k)}, is strictly less than 0.95, with the sole exception of the first twin p…	2026-06-12 23:58
ec12513113104894…	Twin prime conjecture — density analysis	For any integer N >= 100, let T(N) be the count of twin prime pairs (p, p+2) with p <= N. Let C2 be the twin prime constant (approx 0.66016). The conjecture states that the normalized residual R(N) = (T(N) - 2C2N/(ln N…	2026-06-12 19:52
70cdbcc71ee04a64…	Fibonacci primes — density conjecture	For all indices n > 4 such that the Fibonacci number F_n is prime, the index n must be a prime number p satisfying the condition that 5 is a quadratic non-residue modulo p (i.e., the Legendre symbol (5/p) = -1), with the…	2026-06-12 15:06
07ee02bc74414051…	Square Minus Square Factoring	For every natural number n less than 100, the square of n minus the square of (100 - n) equals 200 times n minus 10000.	2026-06-12 02:26
58fbbd5840c74b8d…	Square Minus Square Factoring	For any natural number n less than 100, the square of n modulo 2 is either 0 or 1.	2026-06-12 02:25
95f128c744214fc7…	OEIS A001065 — perfect number conjecture	For every even perfect number n > 6, the sum of the proper divisors of n that are congruent to 1 modulo 4 is strictly greater than the sum of the proper divisors congruent to 3 modulo 4. Specifically, if S_1(n) = sum{d |…	2026-06-11 13:47
28a8b2a801ae431f…	Geometric Sum Identity	The sum of three consecutive geometric terms (base=2) equals 14.	2026-06-11 09:38
ee01e3b5cc8d48e1…	Geometric Sum Identity	The sum of powers of 2 from 2^0 to 2^15 equals 2^16 - 1.	2026-06-11 09:38
f6283ecc58b94fbe…	Square Minus Square Factoring	For every natural number n less than 100, the square of n is congruent to either 0 or 1 modulo 2.	2026-06-11 04:38
1ef4058fa9054186…	Square Minus Square Factoring	For every natural number n less than 100, the square of n is either even or odd.	2026-06-11 04:37
415722d528c54fb7…	Square Minus Square Factoring	For any natural number n less than 100, the square of n is congruent to either 0 or 1 modulo 2.	2026-06-11 04:37
3c69744486e44d43…	Quadratic Residue mod 3	For every natural number n less than 100, the square of n modulo 3 is either 0 or 1.	2026-06-11 04:36
126dd256f8354b53…	Sum of Odd Numbers Identity	The sum of the first 100 odd positive integers equals 10,000.	2026-06-10 22:08
3f7781c61e534635…	Square Minus Square Factoring	For every natural number n less than 100, the square of n modulo 2 is either 0 or 1.	2026-06-10 18:05
8f940771d9454664…	Square Minus Square Factoring	For every natural number n from 0 to 99, the square of n modulo 2 is either 0 or 1.	2026-06-10 18:05
b33154755b914337…	Square Minus Square Factoring	For every natural number n less than 100, the square of n is congruent to either 0 or 1 modulo 2.	2026-06-10 18:05
208383baf95c4350…	Sum of Odd Numbers Identity	The sum of the first 100 odd positive integers equals 100 squared.	2026-06-10 09:18
5e66db8a98eb4f41…	Sum of Odd Numbers Identity	The sum of the first 150 odd positive integers equals 150 squared.	2026-06-10 09:18
7f248d0d38c24d74…	Goldbach conjecture — computational extension	For every even integer n >= 100, there exists a Goldbach partition n = p + q (with p <= q) such that the prime p satisfies p > n/2 - sqrt(n) * (ln ln n)^2, AND p is a quadratic residue modulo the smallest prime factor of…	2026-06-10 07:28
c37b8d2825f34f46…	Primes of form n^2+1 — density and distribution	Let S(x) be the set of integers n in [1, x] such that n^2 + 1 is prime. For any two consecutive elements a, b in S(x) (with a < b), the gap g = b - a satisfies g < 2.5 * sqrt(a) * ln(a) for all x >= 1000. This conjecture…	2026-06-10 07:25
007bb20f4be4478b…	Ramsey multiplicity K_4 — minimum number of monoch	In any 2-coloring of the edges of K_18 that achieves the global minimum number of monochromatic K_4 subgraphs, the resulting color classes (graphs) must both be isomorphic to the Turán graph T(18, 3). Consequently, the m…	2026-06-10 01:47
f600ae4401434fed…	Fibonacci primes — density conjecture	For every Fibonacci prime F_p with prime index p > 3, the quantity (F_p - 1) / p is never an integer. In other words, no Fibonacci prime (beyond F_3=2 and F_4=3, though 4 is not prime index, specifically checking p=5, 7,…	2026-06-09 13:24
e0243ef726e64075…	Collatz conjecture — structural pattern search	For any integer n > 1, let S(n) be the set of distinct values visited in the Collatz trajectory of n before reaching 1. Let M(n) be the maximum element in S(n). The conjecture states that the ratio of the count of odd nu…	2026-06-09 00:53
45b6484654eb41d2…	Goldbach conjecture — computational extension	For every even integer n > 6, there exists a Goldbach partition n = p + q (with p <= q) such that the smaller prime p satisfies p > sqrt(n) and the product p*q is congruent to 1 modulo 24.	2026-06-09 00:52
b171ab227ec34a92…	Primes of form n^2+1 — density and distribution	Let P be the set of primes of the form n^2+1. For any x >= 10, let S(x) be the sum of the reciprocals of the square roots of the generators n for all such primes p = n^2+1 <= x. The conjecture states that S(x) is strictl…	2026-06-08 20:48
afae0570267f4f10…	Twin prime density — Hardy-Littlewood conjecture v	For all integers x >= 10,000, the relative error between the actual count of twin prime pairs up to x and the Hardy-Littlewood prediction (2C2x/ln(x)^2) is strictly bounded by the function 1.8 / ln(x). Specifically, |p…	2026-06-08 07:37
b77b197f38ef42b4…	Ramsey R(4,6) — computational bounds	In any 2-coloring of the edges of K_35 that avoids a red K_4 and a blue K_6 (if such a coloring exists), the maximum degree of any vertex in the red subgraph must be strictly less than 12. That is, Δ(Red) ≤ 11.	2026-06-08 07:37
b9da4d4e215345c4…	Fibonacci primes — density conjecture	For all integers n > 4, if the nth Fibonacci number F_n is prime, then n is either prime itself or n=4. Furthermore, for every prime index p > 3 such that F_p is composite, F_p possesses at least one prime factor q such …	2026-06-07 23:17
24d0c43564104d67…	Goldbach conjecture — extend computational verific	For every even integer n > 10,000, there exists a Goldbach partition n = p + q (where p and q are primes) such that both p and q are 'isolated' within a window of size W(n) = floor(0.8 * ln(n) * ln(ln(n))). Specifically,…	2026-06-07 15:00
ab2f6f8062e94761…	OEIS A001065 — perfect number conjecture	For every even perfect number n > 6, the sum of the squares of its proper divisors is strictly congruent to 1 modulo the square of its associated Mersenne prime exponent. Specifically, if n = 2^(p-1)(2^p - 1) where p and…	2026-06-07 14:59
bc6085714d2046c4…	Primes of form n^2+1 — density and distribution	For the sequence of primes of the form p = n^2 + 1, let n_k be the k-th positive integer such that n_k^2 + 1 is prime. The conjecture states that for all k >= 2, the gap between consecutive bases n_k and n_{k-1} satisfie…	2026-06-07 06:31
9f7d6c51e37b4088…	Twin prime density — Hardy-Littlewood conjecture v	For all x >= 1000, the actual count of twin prime pairs up to x strictly exceeds the standard Hardy-Littlewood prediction (2C2x/ln(x)^2) but remains bounded above by the prediction augmented with a specific second-orde…	2026-06-06 18:04
845aaff7aef64a01…	Fibonacci primes — density conjecture	For every integer n >= 3, if the Fibonacci number F_n is prime, then n must be a prime number, AND the index n satisfies the property that 2n+1 is either a prime number or a semiprime (product of exactly two primes, not …	2026-06-06 09:39
1028c83002d64c6f…	Catalan's conjecture (Mihailescu) — Lean4 formal p	For any integer n > 1, if n is a perfect power (n = x^a with x, a > 1) and the next consecutive perfect power m (m = y^b with y, b > 1, m > n) satisfies m - n = 1, then n must be 8. Furthermore, for any perfect power n >…	2026-06-06 05:32
59707dec0c84466f…	OEIS A001065 — perfect number conjecture	For every even perfect number n > 6, the sum of the binary digits of (n/2) is strictly less than the number of distinct prime factors of (n-1).	2026-06-06 01:23
e1ea6af8e40f4d3a…	Collatz conjecture — structural pattern search	For any integer n > 1, let S(n) be the set of odd numbers encountered in the Collatz trajectory of n before reaching 1. Let m = min(S(n)). Then the total stopping time (number of steps to reach 1) is strictly less than m…	2026-06-05 21:15
52c749523bfe490f…	Primes of form n^2+1 — density conjecture	For any integer n >= 2, let S_n be the set of primes of the form k^2+1 where k <= n. Let M_n be the maximum gap between consecutive elements in the sorted sequence S_n (defining the first gap as p_1 - 2). Then, M_n is st…	2026-06-05 16:24
b01e6c25195044a2…	Primes of form n^2+1 — density conjecture	For every integer n >= 1, the count of primes of the form k^2 + 1 with k <= n (denoted P(n)) satisfies the inequality P(n) >= floor(1.2 * sqrt(n) / ln(n)). Furthermore, for any n >= 100 where P(n) > 0, the gap between co…	2026-06-05 16:22
20a77f3e6ff34241…	Fibonacci primes — density conjecture	For every Fibonacci prime F_p with index p > 5, the integer part of the square root of the index p, denoted as floor(sqrt(p)), is always a prime number.	2026-06-04 23:37
d177586b3b3d4762…	Primes of form n^2+1 — density and distribution	Let P_N be the set of primes of the form n^2+1 for 1 <= n <= N. Let A_N be the count of such primes where the generator n is itself a prime number. The conjecture states that for all N >= 1000, the ratio of the density o…	2026-06-04 10:33
498269cc76514396…	Twin prime density — Hardy-Littlewood conjecture v	The normalized error term of the twin prime count, defined as E(x) = (pi_2(x) * ln(x)^2) / (2 * C2 * x) - 1, exhibits a persistent negative bias for all x in the range [10^4, 10^8]. Specifically, the conjecture states th…	2026-06-03 22:07
b81ab2bf8a3742e6…	OEIS A001065 — perfect number conjecture	For any even perfect number n > 6, let p be the unique Mersenne prime such that n = 2^(p-1)*(2^p - 1). The sum of the divisors of the exponent (p-1), denoted sigma(p-1), is strictly less than the square root of the Merse…	2026-06-03 07:23
af2e36aa3e2c473a…	Primes of form n^2+1 — density and distribution	For the sequence of primes of the form n^2+1, let p_k be the k-th such prime. The conjecture states that for all k >= 2, the gap between consecutive primes p_k and p_{k-1} satisfies: p_k - p_{k-1} < 2 * sqrt(p_k) * (ln(p…	2026-06-03 02:25
3c752084d9a043ca…	Primes of form n^2+1 — density conjecture	For every integer n >= 2, let S_n be the set of primes of the form k^2+1 with k <= n. Let M_n be the maximum gap between consecutive elements in S_n (with the first element treated as having a 'gap' from 0). Then M_n < 4…	2026-06-02 22:15
7c85eafa9f3a4c9b…	Twin prime conjecture — density analysis	For every integer N >= 100, let T(N) be the count of twin prime pairs (p, p+2) with p <= N, and let S(N) be the sum of the reciprocals of the smaller primes in these pairs (i.e., sum(1/p) for all such p). The conjecture …	2026-06-02 11:16
17b23802a7a14aa9…	Cap set problem — F_3^n maximum	Conjecture: For n=6, the maximum size of a cap set in F_3^6 is exactly 112, and this maximum is uniquely achieved (up to affine equivalence) by the set of vectors with weight congruent to 1 modulo 3 in the specific coord…	2026-06-02 04:47
1bc7acdee264452e…	Catalan's conjecture (Mihailescu) — Lean4 formal p	For any integer n > 1, if n is a perfect power (n = x^a with x, a > 1), then the interval (n, n + n^(2/3)] contains no other perfect powers, except for the specific case where n = 8 (2^3), in which case the interval (8, …	2026-06-02 04:44
3f02ace5c31a4891…	Goldbach conjecture — computational extension	For every even integer n > 100, there exists a Goldbach partition n = p + q (with p <= q) such that the prime p lies within the interval [n/2 - sqrt(n), n/2]. Furthermore, the smallest such prime p satisfies the stronger…	2026-06-01 21:26
2d0c51fde717499c…	Primes of form n^2+1 — density and distribution	For all integers n >= 2, the gap between consecutive primes of the form k^2+1 is strictly less than 4 * sqrt(p_m) * ln(p_m), where p_m is the smaller prime in the pair. Furthermore, the ratio of the actual gap to this bo…	2026-06-01 17:21
2229477e7b1a459e…	Primes of form n^2+1 — density and distribution	For the sequence of primes of the form n^2+1, let p_k = n_k^2+1 be the k-th such prime. The conjecture states that for all k >= 2, the gap between consecutive bases n_k and n_{k-1} satisfies: n_k - n_{k-1} < 2 * sqrt(n_{…	2026-06-01 17:18
3032040e036b4ec0…	Primes of form n^2+1 — density conjecture	For every integer n >= 100, the number of primes of the form k^2+1 with k <= n is strictly greater than the number of primes of the form k^2+1 with k <= n/2 multiplied by the factor (1.3 * sqrt(n) / ln(n)). This conjectu…	2026-06-01 13:41
e2f7b4d3db414cd8…	Twin prime conjecture — density analysis	For every integer N >= 100, let T(N) be the count of twin prime pairs (p, p+2) with p <= N. The ratio of the actual count T(N) to the Hardy-Littlewood estimate E(N) = 2 * C_2 * N / (ln N)^2 (where C_2 is the twin prime c…	2026-06-01 01:14
6d026f496bb045ec…	Fibonacci primes — density conjecture	For every integer n > 4, if the n-th Fibonacci number F_n is prime, then n must be a prime number p such that 5 is a quadratic non-residue modulo p (i.e., the Legendre symbol (5/p) = -1). This implies that all Fibonacci …	2026-05-31 21:04
1b3e58ad75384c33…	Fibonacci primes — density conjecture	For all integers n > 6, if the nth Fibonacci number F_n is prime, then n must be a prime number that can be expressed as the sum of two squares (i.e., n is 2, or n is a prime congruent to 1 modulo 4). This implies that n…	2026-05-31 21:03
a29125de7d584756…	Catalan's conjecture (Mihailescu) — Lean4 formal p	For any integer n > 1 that is not 8, if n is a perfect power (n = x^a with x>1, a>1), then the smallest perfect power m > n (where m = y^b with y>1, b>1) satisfies the gap inequality m - n > n^0.55. The only exception to…	2026-05-31 16:59
47c3e171d2ff4226…	OEIS A001065 — perfect number conjecture	For any even perfect number n > 6, let m = n/6. The sum of the proper divisors of m (denoted s(m)) is strictly greater than the square of the number of distinct prime factors of m (denoted omega(m)^2).	2026-05-31 12:56
9c24c2404c5b4ea2…	Goldbach conjecture — computational extension	The sum of two primes representing an even number n > 2 has its maximal prime difference bounded by n^(0.51), where the exponent 0.51 is strictly between 0.5 and 1. This refines the trivial bound of n-3 by showing the di…	2026-05-31 08:39
5a54ab3135de44a4…	Primes of form n^2+1 — density and distribution	For integers n >= 2, let P(n) be the set of primes of the form k^2+1 less than or equal to n. Let G(n) be the maximum gap between consecutive elements in P(n) (with the first gap defined as p_1 - 2). The conjecture state…	2026-05-31 05:55
815317887cf646b1…	Primes of form n^2+1 — density and distribution	For any integer N >= 100, let S_N be the set of primes p <= N such that p = k^2 + 1 for some integer k. Let M_N be the maximum gap between consecutive elements in the sorted sequence S_N (with the first gap defined as th…	2026-05-31 05:55
3e68141113f44ca9…	Primes of form n^2+1 — density conjecture	For every integer n >= 1, the number of primes of the form k^2 + 1 with k <= n is strictly less than 2 * sqrt(n). Furthermore, the ratio of this count to sqrt(n) never exceeds 1.8 for any n >= 100.	2026-05-31 01:48
21e6bfada240446d…	Primes of form n^2+1 — density conjecture	For every integer n >= 2, the number of primes of the form k^2 + 1 with k <= n is strictly greater than the number of integers k <= n such that k^2 + 1 is a product of exactly two distinct primes, both of which are congr…	2026-05-31 01:48
3f50b59f69c24ee3…	Twin prime density — Hardy-Littlewood conjecture v	For all integers x >= 10,000, the cumulative count of twin prime pairs pi_2(x) strictly exceeds the first-order Hardy-Littlewood approximation L_1(x) = 2C_2 x / (ln x)^2, but remains bounded above by a second-order co…	2026-05-30 17:37
84e7e3b311544ebb…	Cap set problem F_3^6 — verify maximum size = 112	The maximum cap set size in F_3^6 is exactly 112, and this bound is achieved only by the canonical construction S_3^6 ⊂ F_3^6	2026-05-30 04:42
809a9ab0175448e8…	Fibonacci primes — density conjecture	For all integers n >= 3, if the nth Fibonacci number F_n is prime, then the index n must be a prime number p such that p is not a Wieferich prime base 2 (i.e., 2^(p-1) is not congruent to 1 modulo p^2). Furthermore, for …	2026-05-30 04:30
fe5aa22c047044f1…	Cap set problem — F_3^n maximum	The maximum size of a cap set in F_3^n for n ≤ 8 is bounded above by ⌊2.2^n⌋, and for n = 6, 7, 8 the values are exactly 124, 353, and 994 respectively	2026-05-30 01:15
23f6590eb1bc458c…	Cap set problem — F_3^n maximum	The maximum size of a cap set in F_3^n for n=6 is exactly 112, and this value is achieved by a specific construction based on Edel's bound.	2026-05-30 01:13
028910cf4158418c…	Primes of form n^2+1 — density conjecture	The count of primes of the form n^2+1 up to a given bound is asymptotically equal to 2CLi(x) where C is a constant approximately 0.685 and Li(x) is the logarithmic integral, with the constant C being related to the pro…	2026-05-29 19:18
aac0f88db762449e…	Ramsey multiplicity K_4 — minimum number of monoch	In any 2-coloring of K_18, the minimum number of monochromatic K_4 is exactly 18, and this minimum is achieved only by colorings where the graph of one color forms a specific structured graph related to the Turán graph T…	2026-05-29 16:36
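
As an illustration of what a mathematical counterexample kill can look like, the following minimal sketch brute-forces two claims from the table above whose statements appear in full: the Fibonacci prime claim in row 20a77f3e6ff34241… (floor(sqrt(p)) is prime for every Fibonacci prime index p > 5) and the Goldbach claim in row 45b6484654eb41d2… (a partition with p > sqrt(n) and p*q congruent to 1 modulo 24 exists for every even n > 6). The function names and search limits are hypothetical choices made for this example. The record does not say how the pipeline searched or which counterexamples it found, so this should not be read as the pipeline's own method.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for the small ranges used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def fib(n: int) -> int:
    """n-th Fibonacci number with F_1 = F_2 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def first_fibonacci_counterexample(limit: int = 60):
    """Smallest index n with 5 < n <= limit such that F_n is prime
    but floor(sqrt(n)) is not prime; None if no such index exists."""
    for n in range(6, limit + 1):
        if is_prime(fib(n)) and not is_prime(isqrt(n)):
            return n
    return None


def first_goldbach_counterexample(limit: int = 200):
    """Smallest even n with 6 < n <= limit that has no partition n = p + q
    (p <= q, both prime) with p > sqrt(n) and p*q % 24 == 1; None if none."""
    for n in range(8, limit + 1, 2):
        has_partition = any(
            is_prime(p)
            and is_prime(n - p)
            and p * p > n               # p > sqrt(n), kept in integer arithmetic
            and (p * (n - p)) % 24 == 1
            for p in range(2, n // 2 + 1)
        )
        if not has_partition:
            return n
    return None


if __name__ == "__main__":
    print(first_fibonacci_counterexample())  # 17: F_17 = 1597 is prime, floor(sqrt(17)) = 4
    print(first_goldbach_counterexample())   # 8: the only partition is 3 + 5, and 15 % 24 != 1
```

A single failing instance is enough to falsify a universally quantified claim, which is why both searches return at the first hit instead of enumerating every failure.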