
Public Falsification Record

Every claim killed by the three-gate verification pipeline, whether by mathematical counterexample (Gate 1), sealed-sandbox reproduction failure (Gate 2), or adversarial red-team attack (Gate 3). Total: 399.

294 Gate 2/3 pipeline kills  ·  105 math counterexample kills

Gate 2 / Gate 3 Pipeline Falsifications (294)

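Claims VERIFIED at Gate 2 (sealed-sandbox reproduction) and subsequently falsified by the Gate 3 adversarial red team (three independent LLM attackers, inverted scoring: a higher score means a more damaging attack). A claim SURVIVES only if all three attackers fail to find a fatal flaw (average attack score < 3.5 and no individual score ≥ 5.0). Rows marked Gate 2 were killed at sealed-sandbox reproduction and carry no attack scores.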
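The survival rule above can be sketched in a few lines of Python. This is only an illustration, not the pipeline's actual code: the function names `parse_scores` and `survives` are hypothetical, and treating the two thresholds as a joint (AND) condition is an assumption based on the parenthetical in the rule. The sample row is the attacker-score line of task 8544463f from the table below.

```python
import re
from statistics import mean

# Thresholds quoted from the survival rule; everything else is illustrative.
AVG_THRESHOLD = 3.5         # average attack score must stay below this
INDIVIDUAL_THRESHOLD = 5.0  # no single attacker may reach this


def parse_scores(line: str) -> dict[str, float]:
    """Parse 'NAME: 9.0/10 · NAME: 7.5/10 · ...' into {attacker: score}."""
    pairs = re.findall(r"([A-Z][A-Z ]*[A-Z]): (\d+(?:\.\d+)?)/10", line)
    return {name: float(score) for name, score in pairs}


def survives(scores: dict[str, float]) -> bool:
    """Assumed reading: both conditions must hold for a claim to survive."""
    values = list(scores.values())
    return mean(values) < AVG_THRESHOLD and max(values) < INDIVIDUAL_THRESHOLD


row = ("COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 0.0/10 · "
       "REPLICATION ATTACKER: 7.5/10")
scores = parse_scores(row)
average = round(mean(list(scores.values())), 1)
print(average, survives(scores))  # prints "4.9 False"
```

Under this reading, a single attacker scoring 0.0 does not rescue a claim: the two remaining scores push the average to 4.9 and each exceeds the 5.0 individual limit, so the claim is killed.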

Task ID · Gate · Claim type · Goal / Claim · Avg attack · Killed (UTC)
760fbc69-65b1-4f… Gate 3 formula_repro How does the scaling of model size affect the performance gain from English intermediate-task training in zero-shot cross-lingual transfer, …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.2/10
8.6/10 2026-06-21 01:19
9a6f19a5-f2e6-4c… Gate 3 formula_repro To what extent does English intermediate-task training improve cross-lingual reasoning capabilities on multilingual benchmarks compared to d…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.2/10
6.9/10 2026-06-21 01:19
5571973f-50bb-48… Gate 3 formula_repro Does intermediate-task training on domain-specific multilingual datasets improve robustness to domain shift in zero-shot cross-lingual trans…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 9.2/10
8.2/10 2026-06-21 01:19
a80c506e-d2f8-4a… Gate 3 formula_repro What is the impact of scaling the number of intermediate language-understanding tasks on zero-shot cross-lingual transfer performance for lo…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10
9.1/10 2026-06-21 01:18
2290ba9d-c1c0-46… Gate 3 formula_repro What is the impact of intermediate-task training on low-resource languages in the XTREME benchmark when using models pretrained on both Engl…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 9.5/10
6.2/10 2026-06-21 01:18
2af78e0e-7fba-4d… Gate 3 formula_repro How does the effectiveness of English intermediate-task training for zero-shot cross-lingual transfer compare to multilingual intermediate-t…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10
9.1/10 2026-06-21 01:18
8544463f-7e19-46… Gate 3 formula_repro How does the performance of multilingual intermediate-task training on low-resource languages compare to English intermediate tasks when eva…
COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10
4.9/10 2026-06-21 01:18
bd43d6f2-a6c8-44… Gate 3 formula_repro How does the performance of intermediate-task training sequences compare to continuous pretraining on a multilingual corpus in zero-shot cro…
COUNTEREXAMPLE HUNTER: 6.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 2.5/10
6.1/10 2026-06-21 01:18
345ee2d0-d119-47… Gate 3 formula_repro Does multi-task intermediate training on diverse English NLU tasks improve robustness against typological divergence more effectively than s…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 6.5/10
8.0/10 2026-06-20 19:17
13542541-aac3-44… Gate 3 formula_repro What is the impact of English intermediate-task training on the alignment stability of multilingual encoders when evaluated on adversarial p…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-20 19:17
d7a818b5-ede1-48… Gate 3 formula_repro Does the performance gain from English intermediate-task training on XTREME scale with increasing pretraining model size across diverse low-…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10
8.9/10 2026-06-20 19:17
7f4a6491-9864-42… Gate 3 formula_repro How does English intermediate-task training affect zero-shot cross-lingual robustness on XTREME tasks with synthetic code-switching noise co…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10
9.1/10 2026-06-20 19:17
aa533e6e-e763-4c… Gate 3 formula_repro How does intermediate-task training on English reasoning datasets affect zero-shot cross-lingual performance on the XCOPA and XNLI subsets o…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 9.2/10
6.1/10 2026-06-20 19:17
48a8684b-3452-40… Gate 3 formula_repro Does the order of intermediate-task fine-tuning (sequential vs. concurrent) influence the robustness of multilingual alignment in zero-shot …
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.1/10 2026-06-20 19:16
a76c6aed-2013-42… Gate 3 formula_repro How does the choice of English intermediate-task difficulty (e.g., low vs. high complexity) affect zero-shot cross-lingual transfer performa…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 9.2/10
6.7/10 2026-06-20 19:16
62dad032-b551-46… Gate 3 formula_repro How does the choice of intermediate task complexity (e.g., easy vs. hard language understanding tasks) affect zero-shot cross-lingual transf…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 3.2/10 · REPLICATION ATTACKER: 9.0/10
7.1/10 2026-06-20 19:15
4ffd234e-3144-46… Gate 3 formula_repro Does multilingual intermediate-task training on XTREME-R outperform monolingual English training in few-shot cross-lingual transfer across l…
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10
6.1/10 2026-06-20 19:15
5f20fd9d-3000-49… Gate 2 unknown Does the integration of synthetic code-switched data improve the robustness of zero-shot cross-lingual retrieval models against adversarial … - 2026-06-20 16:59
c29d0b1d-1c55-45… Gate 2 unknown How does hybrid batch training for monolingual and cross-lingual objectives impact zero-shot retrieval accuracy on the BEIR benchmark compar… - 2026-06-20 16:57
a10e30bc-cc3b-42… Gate 3 formula_repro How does training on artificially code-switched data affect the robustness of zero-shot cross-lingual retrieval models across low-resource l…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 3.5/10
6.8/10 2026-06-20 13:15
88b977da-978a-4b… Gate 3 formula_repro How does the granularity of bilingual lexicons (e.g., word-level vs. phrase-level) impact the effectiveness of artificially code-switched tr…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10
5.8/10 2026-06-20 13:15
9293eb3f-c631-43… Gate 3 formula_repro What is the impact of artificially code-switched training data on the robustness of cross-lingual retrieval models evaluated on the PAWS-X d…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 4.2/10 · REPLICATION ATTACKER: 7.5/10
6.7/10 2026-06-20 13:15
108e3723-cd80-4b… Gate 3 formula_repro How does the quality of bilingual lexicons impact the performance of zero-shot cross-lingual retrieval models on the BEIR benchmark when eva…
COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 6.5/10
7.4/10 2026-06-20 13:14
4cb997d4-121c-4d… Gate 3 formula_repro Does training on artificially code-switched data improve zero-shot cross-lingual retrieval recall on the MIRACL benchmark compared to monoli…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10
7.2/10 2026-06-20 13:14
420aa4ee-b924-40… Gate 3 formula_repro What is the effect of increasing the amount of artificially code-switched training data on the robustness of zero-shot cross-lingual retriev…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10
6.5/10 2026-06-20 13:14
bb51b78d-f1e5-48… Gate 3 formula_repro How does the transfer learning performance of self-supervised speech models pre-trained on Flemish Dutch compare to other low-resource langu…
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10
5.3/10 2026-06-20 07:14
1249d225-3960-40… Gate 3 formula_repro How does the noise level in automatically induced bilingual lexicons affect the nDCG@10 and MAP scores of zero-shot cross-lingual retrievers…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10
6.2/10 2026-06-19 19:12
2e7a84f4-8bf1-4d… Gate 3 formula_repro To what extent does English intermediate-task training enhance zero-shot reasoning capabilities on multilingual benchmarks like XTREME-R for…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10
7.8/10 2026-06-19 19:12
d41c927e-b869-40… Gate 3 formula_repro How does intermediate-task training on non-English source languages compare to English-only intermediate training for zero-shot cross-lingua…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-19 19:12
44a960a9-7f95-41… Gate 3 formula_repro How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to models fine-tuned…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10
8.7/10 2026-06-19 13:09
05adb989-b8ea-44… Gate 3 formula_repro Does training on artificially code-switched datasets improve the robustness of zero-shot cross-lingual retrievers against query-document lan…
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10
5.7/10 2026-06-19 13:06
da82e7da-f16a-41… Gate 3 formula_repro Does the hybrid batch strategy improve zero-shot cross-lingual retrieval robustness on the XTD benchmark compared to standard multilingual c…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-19 13:06
2bb9a5bd-57ad-44… Gate 3 formula_repro How does training on artificially code-switched data affect zero-shot cross-lingual performance on the XNLI benchmark compared to standard m…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 8.5/10
7.8/10 2026-06-19 13:06
3b872189-de14-45… Gate 3 formula_repro How does training on artificially code-switched data affect the zero-shot retrieval accuracy of multilingual dense retrievers on the Lasers …
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 6.5/10
7.5/10 2026-06-19 13:06
5ff9bb46-9d08-44… Gate 3 formula_repro To what extent does training on artificially code-switched data improve zero-shot cross-lingual retrieval robustness on XTREME-R when querie…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10
8.7/10 2026-06-19 13:05
6ef56955-1505-47… Gate 3 formula_repro How does the proportion of code-switched tokens in synthetic training data correlate with the accuracy drop of zero-shot cross-lingual ranke…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.0/10
8.8/10 2026-06-19 13:05
73b8a1d9-aee2-48… Gate 3 formula_repro How does training on artificially code-switched data affect the robustness of zero-shot cross-lingual rankers against adversarial noise comp…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.5/10 2026-06-19 13:03
07f2d90e-a6ad-40… Gate 2 unknown Does training on artificially code-switched data improve zero-shot cross-lingual retrieval performance for low-resource languages not includ… - 2026-06-19 12:46
7507c92e-3093-47… Gate 3 formula_repro Does integrating CausalMixFT during fine-tuning improve the robustness of tabular foundation models against adversarial perturbations in low…
COUNTEREXAMPLE HUNTER: 6.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10
5.0/10 2026-06-19 07:00
da92f3e0-4d5b-44… Gate 3 formula_repro How do dense RGB-D SLAM systems utilizing 3D Gaussian representations compare to neural implicit methods in terms of memory consumption and …
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10
7.5/10 2026-06-19 07:00
3d9090b2-81c7-4d… Gate 3 formula_repro How do vision-language models perform in cross-domain robustness evaluations when tested on perturbed multimodal benchmarks from domains lik…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 8.5/10
6.5/10 2026-06-19 07:00
0ff89691-6471-49… Gate 3 formula_repro How does the trade-off between model size and latency compare between OpenPangu-7B-MLA and smaller prosody-exclusive models when deployed on…
COUNTEREXAMPLE HUNTER: 8.0/10 · CITATION AUDITOR: 4.5/10 · REPLICATION ATTACKER: 8.5/10
7.0/10 2026-06-19 07:00
86059cb7-4f35-4c… Gate 3 formula_repro What is the impact of cross-lingual transfer from English pre-trained speech models versus monolingual Flemish pre-training on phoneme recog…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10
5.3/10 2026-06-19 07:00
0cb45f8d-b6ae-45… Gate 3 formula_repro How does the addition of self-supervised pre-training objectives in zero-shot cross-lingual SLU models affect slot-filling accuracy on the M…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 2.5/10
6.5/10 2026-06-19 07:00
36c4669f-6437-43… Gate 3 formula_repro What is the effect of varying the size of the monolingual training set on the intent detection performance of zero-shot cross-lingual SLU mo…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 3.0/10
6.3/10 2026-06-19 07:00
d245e830-fc51-49… Gate 3 formula_repro What is the impact of varying the code-switching ratio in training data on the retrieval performance degradation when query and document lan…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10
9.0/10 2026-06-19 01:00
a5ab3b78-a7cb-48… Gate 3 formula_repro How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to models fine-tuned…
COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 6.5/10
6.7/10 2026-06-19 01:00
5e61f043-f854-46… Gate 3 formula_repro What is the impact of varying the ratio of code-switched tokens in artificially generated training data on the robustness (measured by accur…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 6.5/10
7.8/10 2026-06-19 00:59
a984d373-612a-42… Gate 3 formula_repro How does the performance of zero-shot cross-lingual rankers trained on artificially code-switched data compare to models fine-tuned on multi…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10
7.5/10 2026-06-19 00:59
963c284c-e2a8-44… Gate 3 formula_repro How does increasing the proportion of code-switched tokens in the training data affect the robustness of zero-shot cross-lingual retrieval m…
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 8.5/10
5.0/10 2026-06-19 00:59
643ef90b-e1b6-4f… Gate 3 formula_repro How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to models fine-tuned…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 7.5/10
6.0/10 2026-06-19 00:59
05f4c33e-153e-4c… Gate 3 formula_repro Does scaling the multilingual pre-trained model size improve precision@k in zero-shot cross-lingual retrieval when using the proposed hybrid…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.0/10
9.2/10 2026-06-18 18:59
f988e6e1-5aab-4f… Gate 3 formula_repro Can scaling the hybrid batch training method to larger multilingual models (e.g., XLM-R or mT5) further enhance zero-shot cross-lingual retr…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-18 18:59
6a1d78e6-6b7d-44… Gate 3 formula_repro How does domain-adaptive fine-tuning of Flemish Dutch self-supervised speech models impact word error rate on CommonVoice compared to cross-…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-18 18:56
52c35878-037b-40… Gate 3 formula_repro What is the comparative effect of multi-task intermediate training versus single large-task training on reasoning capabilities within zero-s…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-18 18:54
9b3a269a-5094-47… Gate 3 formula_repro Does combining diverse intermediate tasks improve robustness in zero-shot cross-lingual transfer on XTREME-R more effectively than training …
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10
9.0/10 2026-06-18 18:53
880c758a-54df-4a… Gate 3 formula_repro How does hybrid batch training impact zero-shot cross-lingual retrieval accuracy on XNLI compared to monolingual fine-tuning across varying …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10
7.0/10 2026-06-18 18:53
faa8fed9-29ac-4e… Gate 3 formula_repro Does the synergistic hybrid batch training approach improve cross-lingual retrieval robustness on the MIRACL benchmark under domain shift co…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-18 18:51
fd7ab5fe-aab9-48… Gate 3 formula_repro What is the impact of hybrid batch training on the scaling behavior of zero-shot retrieval performance across varying model sizes within the…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 2.0/10
6.5/10 2026-06-18 18:51
2e00ae69-fcc4-4e… Gate 3 formula_repro How does hybrid batch training for simultaneous monolingual and cross-lingual retrieval impact zero-shot performance on low-resource languag…
COUNTEREXAMPLE HUNTER: 8.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 4.5/10
7.3/10 2026-06-18 18:50
25c348c6-4569-45… Gate 3 formula_repro Does the synergistic optimization of monolingual and cross-lingual objectives in hybrid batch training improve retrieval performance on long…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-18 18:48
a9afc122-af0a-4c… Gate 3 formula_repro What is the impact of varying the proportion of code-switched tokens in artificially generated training data on the robustness of zero-shot …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10
8.7/10 2026-06-18 12:48
96a5be5a-20a9-4d… Gate 3 formula_repro Does training on artificially code-switched data improve the robustness of retrieval models against language mismatch errors in queries and …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10
9.0/10 2026-06-18 12:48
d4730df9-5abd-4e… Gate 3 formula_repro What is the impact of varying bilingual lexicon coverage on the zero-shot cross-lingual retrieval performance of code-switched trained model…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-18 12:48
82718235-0464-46… Gate 3 formula_repro How does the cross-lingual retrieval accuracy of models trained on artificially code-switched data compare to full multilingual pretraining …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.2/10
9.2/10 2026-06-18 12:48
304f71e8-cfc0-47… Gate 3 formula_repro How does the performance of zero-shot cross-lingual retrieval models improve when trained on artificially code-switched data generated from …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10
8.9/10 2026-06-18 12:47
5b801b53-702c-47… Gate 3 formula_repro How does the hybrid batch training strategy impact the zero-shot cross-lingual retrieval accuracy of larger multimodal models (e.g., PaLI, B…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-18 12:47
681c606e-0739-4d… Gate 3 formula_repro How does the scaling of model size (e.g., small, base, large) interact with the hybrid batch training strategy in terms of zero-shot cross-l…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-18 12:47
fed390d7-0f30-4f… Gate 3 formula_repro Can the hybrid batch training strategy be adapted to improve zero-shot cross-lingual retrieval performance in low-resource language settings…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-18 12:46
69ee4e5c-067c-4b… Gate 3 formula_repro How does the scaling of intermediate-task dataset size affect the degradation of zero-shot cross-lingual transfer performance on the XTREME …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10
9.1/10 2026-06-18 06:45
a9ba0423-52f5-40… Gate 3 formula_repro What is the impact of intermediate-task training on the robustness of zero-shot cross-lingual transfer to low-resource languages within the …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-18 06:45
dc84b96b-753f-46… Gate 3 formula_repro Does the choice of multilingual intermediate tasks (e.g., language-agnostic vs. language-specific) impact the robustness of zero-shot cross-…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-18 06:45
bff5aa00-46f0-42… Gate 3 formula_repro How does the performance of multilingual intermediate-task training compare to English intermediate-task training on the XTREME-R benchmark,…
COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 6.5/10
5.2/10 2026-06-18 06:44
b6aec771-9353-47… Gate 3 formula_repro How does the hybrid batch training strategy impact zero-shot retrieval accuracy on low-resource MIRACL language pairs compared to dedicated …
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 6.5/10
5.7/10 2026-06-18 06:44
82a8b49a-ca92-48… Gate 3 formula_repro How does fine-tuning Flemish Dutch self-supervised speech models with domain adaptation techniques affect word error rate on the CommonVoice…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 6.5/10
7.5/10 2026-06-18 06:44
16fd9c48-0866-43… Gate 3 formula_repro What is the impact of structural causal model fidelity on the downstream classification accuracy of fine-tuned tabular foundation models in …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-18 06:43
27b34f6d-ce74-49… Gate 3 formula_repro What is the effect of domain-specific vs. general-domain code-switched data on zero-shot cross-lingual retrieval performance in multilingual…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10
5.0/10 2026-06-18 00:41
706377b3-65ea-44… Gate 3 formula_repro How does the robustness of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare across different lang…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10
5.0/10 2026-06-18 00:41
6724432f-b734-46… Gate 3 formula_repro How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to those trained on …
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10
7.2/10 2026-06-18 00:41
5eeb7591-2ce2-45… Gate 3 formula_repro How does the retrieval accuracy per training token of models trained on artificially code-switched data compare to full multilingual pretrai…
COUNTEREXAMPLE HUNTER: 6.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10
5.5/10 2026-06-18 00:41
572586b4-d231-44… Gate 3 formula_repro How does the lexical coverage ratio of bilingual dictionaries used for artificial code-switching correlate with zero-shot cross-lingual retr…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.2/10 · REPLICATION ATTACKER: 6.5/10
7.1/10 2026-06-18 00:41
b539628a-b9a7-4a… Gate 3 formula_repro How does hybrid batch training impact zero-shot retrieval recall@10 on the MIRACL benchmark for low-resource languages compared to monolingu…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-17 18:40
8d5a627d-354d-4d… Gate 3 formula_repro How does the hybrid batch training strategy impact zero-shot retrieval accuracy on unseen low-resource language pairs when evaluated on the …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-17 18:39
16872a52-c7b2-4c… Gate 3 formula_repro How does fine-tuning on naturally occurring code-switched corpora (e.g., LINCS or NLPCC) compare to fine-tuning on artificially code-switche…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10
9.0/10 2026-06-17 18:39
a068b0e0-7ac3-4d… Gate 3 formula_repro Does the synergistic hybrid batch training strategy improve zero-shot cross-lingual retrieval accuracy for languages with varying typologica…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.0/10
8.8/10 2026-06-17 18:38
e864d5a7-01cf-48… Gate 3 formula_repro To what extent do self-supervised speech models pre-trained on Flemish Dutch generalize to low-resource dialects compared to English pre-tra…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10
8.9/10 2026-06-17 18:38
4d287f11-5746-4c… Gate 3 formula_repro Does fine-tuning English pre-trained speech models on limited Flemish data yield comparable robustness to noise as models pre-trained exclus…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.5/10 2026-06-17 18:36
52cce802-988b-44… Gate 3 formula_repro How does scaling the model size of TSDiff impact its performance on cross-domain time series forecasting benchmarks (e.g., UCR archive) comp…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-17 18:36
69e07f68-b58d-4d… Gate 3 formula_repro What is the impact of simultaneous monolingual and cross-lingual objective optimization on the generalization capability of multilingual enc…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-17 12:34
8aad928b-e796-47… Gate 3 formula_repro How does hybrid batch training affect zero-shot cross-lingual retrieval accuracy on low-resource language pairs in the MIRACL benchmark comp…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-17 12:34
96ee3e4c-90e2-47… Gate 3 formula_repro How does varying the ratio of monolingual to cross-lingual training examples in hybrid batches affect the performance trade-off between NQ a…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.0/10
8.8/10 2026-06-17 12:34
c1c990b6-70f0-44… Gate 3 formula_repro What is the impact of simultaneous monolingual, cross-lingual, and multilingual optimization on the retrieval performance of transformer mod…
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
6.3/10 2026-06-17 12:31
efef605f-428e-4d… Gate 3 formula_repro How does the synergistic hybrid batch training strategy compare to standard multilingual fine-tuning in terms of zero-shot cross-lingual ret…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-17 12:31
5e54cc94-1461-48… Gate 3 formula_repro Can integrating domain-specific monolingual data (e.g., legal, medical) into hybrid batch training improve zero-shot retrieval accuracy on X…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10
9.0/10 2026-06-17 12:28
45ea17e5-1565-49… Gate 3 formula_repro Can the model-agnostic nature of SafeCoDe be validated across different multimodal architectures (e.g., LLaVA, Qwen-VL) by comparing their s…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10
8.7/10 2026-06-17 12:28
d4118170-4257-4f… Gate 3 formula_repro How does the hybrid batch training strategy compare to language-specific adapter modules in improving zero-shot cross-lingual retrieval accu…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.0/10
9.1/10 2026-06-17 06:27
f36a2df0-3772-49… Gate 3 formula_repro What is the impact of varying the degree of artificial code-switching in training data on the robustness of zero-shot cross-lingual retrieva…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10
9.1/10 2026-06-17 06:27
8acc7c8a-b815-43… Gate 3 formula_repro Does the hybrid batch training strategy improve zero-shot cross-lingual retrieval performance on non-English language pairs in XM3600 compar…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.5/10 2026-06-17 06:26
479ce4ad-ca7a-40… Gate 3 formula_repro Can intermediate-task training on English reasoning datasets mitigate cross-lingual performance degradation in low-resource languages in the…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-17 06:26
cf208549-5033-41… Gate 3 formula_repro Does multilingual intermediate-task training improve zero-shot transfer accuracy on XTREME-R domain-specific subsets compared to English-onl…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 6.5/10
8.2/10 2026-06-17 06:26
5ad5a3a9-7cbb-4f… Gate 3 formula_repro What is the impact of varying the size and linguistic diversity of the English intermediate-task corpus on the degradation of zero-shot tran…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 7.0/10
8.4/10 2026-06-17 06:26
c64ed49c-d8b1-42… Gate 3 formula_repro How does cross-lingual query generation augmentation impact the adversarial robustness of dense retrieval models against paraphrase attacks …
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 7.5/10
8.1/10 2026-06-17 06:26
9cbec7df-52ae-46… Gate 3 formula_repro Does pretraining zero-shot cross-lingual retrieval models on artificially code-switched data improve robustness to language divergence in qu…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10
9.1/10 2026-06-17 00:25
791b2ca6-fdad-44… Gate 3 formula_repro Does the hybrid batch training strategy improve zero-shot cross-lingual retrieval robustness in low-resource language settings for multimoda…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-17 00:25
c3b1fbef-bb8a-4c… Gate 3 formula_repro Does the hybrid batch training strategy improve retrieval performance on the XOR benchmark compared to models optimized solely for cross-lin…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-17 00:25
1003c363-131a-43… Gate 3 formula_repro Does training on artificially code-switched data improve zero-shot cross-lingual retrieval performance on the MLQA benchmark compared to sta…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.2/10
6.4/10 2026-06-17 00:25
0195f05e-2ecb-43… Gate 3 formula_repro Does the hybrid batch training strategy proposed for information retrieval improve multimodal alignment accuracy on zero-shot cross-lingual …
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10
5.7/10 2026-06-17 00:25
54b44e18-53d0-4a… Gate 3 formula_repro Does intermediate-task training on English reasoning datasets improve zero-shot cross-lingual performance on the XCOPA and XNLI subsets of X…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10
5.0/10 2026-06-17 00:25
1fba2b0b-14c2-4f… Gate 3 formula_repro Can multilingual intermediate-task training outperform English-only intermediate training for zero-shot transfer on domain-specific subsets …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-17 00:24
f9e78b9c-2598-43… Gate 3 formula_repro How does the size of the English intermediate-task corpus affect the degradation of zero-shot transfer accuracy on low-resource languages wi…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10
9.1/10 2026-06-17 00:24
f9cfb858-1284-49… Gate 3 formula_repro How does the performance of cross-lingual dense retrieval systems using query-augmented passage representations compare to those using multi…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.0/10
9.1/10 2026-06-17 00:24
8416a433-960a-43… Gate 3 formula_repro What is the impact of scaling the size of the synthetic dataset generated by CausalMixFT on the fine-tuning performance of tabular foundatio…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
8.8/10 2026-06-17 00:24
36871c14-b8df-4a… Gate 3 formula_repro How does cross-lingual query generation augmentation affect the adversarial robustness of dense retrieval models against paraphrase attacks …
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10
7.5/10 2026-06-16 18:21
006e8e4a-a201-4b… Gate 3 formula_repro How does the performance gap between high-resource and low-resource languages in cross-lingual retrieval models change when using different …
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10
8.2/10 2026-06-16 18:21
b6d14fcd-ba14-42… Gate 3 formula_repro How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to models fine-tuned…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.2/10
5.7/10 2026-06-16 18:21
ef1bc3a9-c7fa-44… Gate 3 formula_repro What is the impact of varying the proportion of code-switched terms in training data on the robustness of zero-shot cross-lingual retrieval …
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10
6.2/10 2026-06-16 18:21
b39d0c42-f3a9-4c… Gate 3 formula_repro How does the performance of cross-lingual query generation compare to multilingual contrastive learning (e.g., XLM-R, LASER) on the BEIR ben…
COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10
5.7/10 2026-06-16 18:21
33238242-5dc0-45… Gate 2 unknown How does the cross-lingual transfer performance of mE5 compare to other multilingual models like XLM-R or mBERT when pre-trained on monoling… - 2026-06-16 17:19
2556595d-7efa-45… Gate 3 formula_repro How does the performance of multilingual dense retrieval models compare on WebFAQ when trained with synthetic data augmentation versus human…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10
8.5/10 2026-06-16 12:21
fafdde58-463a-48… Gate 3 formula_repro How does the performance of Targeted Lexical Injection (TLI) with early-layer LoRA fine-tuning compare to full-parameter fine-tuning on the …
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10
8.2/10 2026-06-16 12:19
497fe53f-999a-48… Gate 3 formula_repro How does the pass@1 degradation of CodeT5 compare to JaCoText on MBPP Pro when subjected to semantic-preserving docstring perturbations vers…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10
7.8/10 2026-06-16 12:19
d3826925-d702-4e… Gate 3 formula_repro Can TLI early-layer LoRA fine-tuning improve cross-domain alignment in Lugha-Llama for low-resource Bantu languages, as evaluated by mAP sco…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 4.2/10 · REPLICATION ATTACKER: 7.5/10
6.7/10 2026-06-16 12:19
d09dcdad-46aa-44… Gate 3 formula_repro What is the effect of Targeted Lexical Injection on cross-lingual alignment quality for Lugha-Llama when evaluated on semantic textual simil…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 6.5/10
7.5/10 2026-06-16 12:19
c90a313e-3bcf-45… Gate 3 formula_repro How does early-layer LoRA with Targeted Lexical Injection impact zero-shot cross-lingual transfer accuracy on the XNLI benchmark for low-res…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 4.5/10
7.2/10 2026-06-16 12:19
647a30e8-8ab0-47… Gate 3 formula_repro To what extent does the depth of early-layer LoRA fine-tuning in TLI affect cross-lingual lexical alignment, as measured by LAS scores acros…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10
7.8/10 2026-06-16 06:19
a8f56d30-b343-4a… Gate 3 formula_repro How do context-aware conversational models and sequence labeling approaches differ in zero-shot cross-lingual transfer accuracy for hate spe…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10
5.0/10 2026-06-16 06:18
a63a1ae2-af59-40… Gate 3 formula_repro How does the scalability of CausalMixFT compare to other data augmentation methods (e.g., SMOTE, GAN-based augmentation) when fine-tuning ta…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10
9.1/10 2026-06-16 06:18
75b0a51f-2504-44… Gate 3 formula_repro Can SCM-based synthetic augmentation reduce the validation data requirements for early stopping in fine-tuning, as measured by the stability…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-16 06:18
3dd55559-98c7-47… Gate 3 formula_repro Do parameter-efficient fine-tuning methods like LoRA maintain instance segmentation performance on COCO when applied to other transformer ba…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 0.0/10
5.5/10 2026-06-16 06:18
b451d079-8c4b-4d… Gate 3 formula_repro How does the integration of CausalMixFT-generated synthetic data affect the fine-tuning convergence speed and validation accuracy of tabular…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10
6.2/10 2026-06-16 06:18
069ce81a-1531-45… Gate 3 formula_repro Does CausalMixFT outperform diffusion-based data augmentation (e.g., DiffAugment) in terms of robustness to covariate shift when fine-tuning…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.0/10
9.1/10 2026-06-16 06:17
a97c638e-389e-47… Gate 3 formula_repro How does integrating causal structure into TabPFN's synthetic data generation affect its performance on downstream task accuracy across diff…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10
8.7/10 2026-06-16 06:17
d7b2e2bf-4375-41… Gate 2 unknown How do TimeGAN and VAE-generated synthetic financial time series compare in terms of robustness when used to evaluate the temporal reasoning… - 2026-06-16 01:04
e0caf16c-fb4c-48… Gate 3 formula_repro How does varying the depth of LoRA adapter injection in Lugha-Llama affect cross-lingual alignment accuracy on low-resource Swahili-English …
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10
5.3/10 2026-06-16 00:15
bc889895-c22b-44… Gate 3 formula_repro To what extent does the combination of SFT and DPO degrade the zero-shot reasoning capabilities of OPT-350M on the Big-Bench Hard suite rela…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10
6.2/10 2026-06-16 00:14
fcc53574-9cfa-44… Gate 3 formula_repro How does the reasoning accuracy of multimodal large language models compare to diffusion-based trajectory policies in dynamic task planning …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10
9.0/10 2026-06-16 00:14
852ac52e-42d2-4e… Gate 3 formula_repro How does the hybrid batch training strategy impact zero-shot cross-lingual retrieval accuracy on low-resource languages within the XQuAD ben…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-16 00:14
f8fc038d-df35-4b… Gate 3 formula_repro How does the scaling of synthetic data diversity in tabular foundation model pretraining affect accuracy degradation under distributional sh…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 6.2/10 · REPLICATION ATTACKER: 8.5/10
7.9/10 2026-06-16 00:14
f1cb0512-e3b3-44… Gate 3 formula_repro To what extent does incorporating causal priors via CausalMixFT improve out-of-distribution (OOD) robustness in tabular foundation models, a…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10
8.5/10 2026-06-16 00:13
4efb4ac7-2558-4e… Gate 3 formula_repro How does the cross-lingual query generation approach compare to cross-lingual passage generation in terms of enhancing the alignment capabil…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.0/10
8.7/10 2026-06-15 18:13
e4b97bd1-f023-4b… Gate 3 formula_repro What is the correlation between training data volume in WebFAQ 2.0 and zero-shot cross-lingual retrieval performance gaps across the 75 supp…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10
6.8/10 2026-06-15 18:11
b68d34da-d3d1-43… Gate 3 formula_repro Can synergistic optimization of monolingual and cross-lingual objectives reduce performance degradation on the XTREME retrieval benchmark fo…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 1.5/10
6.3/10 2026-06-15 18:10
de04d7fd-23ac-49… Gate 3 formula_repro Does the hybrid batch training strategy improve zero-shot cross-lingual retrieval performance on downstream datasets like MIRACL or XNLI whe…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 7.5/10
8.5/10 2026-06-15 18:08
82ef2513-1ed2-43… Gate 3 formula_repro Does early-layer LoRA adaptation for lexical alignment in Lugha-Llama maintain zero-shot translation accuracy on morphologically rich Bantu …
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10
8.8/10 2026-06-15 18:08
79f75ace-7cdb-4f… Gate 3 formula_repro How does the alignment of synthetic financial data generated by GANs versus VAEs influence the downstream performance of multimodal models i…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-15 18:07
be269da2-4b74-48… Gate 3 formula_repro How does the noise level in automatically extracted bilingual lexicons impact the zero-shot cross-lingual retrieval accuracy of code-switche…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.5/10 · REPLICATION ATTACKER: 6.5/10
5.2/10 2026-06-15 18:07
1ce91234-09d8-44… Gate 3 formula_repro How does early-layer LoRA fine-tuning for lexical alignment in Lugha-Llama compare to full-parameter fine-tuning on zero-shot cross-lingual …
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.5/10 2026-06-15 18:07
d78d2f8b-f3d8-45… Gate 3 formula_repro How does early-layer LoRA fine-tuning for lexical alignment in Lugha-Llama compare to full-parameter fine-tuning on cross-lingual retrieval …
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10
5.3/10 2026-06-15 12:06
fb1c1363-5e72-48… Gate 3 formula_repro Does early-layer LoRA fine-tuning improve cross-lingual lexical alignment more effectively than full-model fine-tuning for low-resource Afri…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 7.5/10
7.4/10 2026-06-15 12:05
9f2d0d5b-7064-49… Gate 3 formula_repro How does fine-tuning dense retrieval models on WebFAQ's 47 million non-English pairs impact zero-shot cross-lingual transfer accuracy on the…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 3.5/10
7.2/10 2026-06-15 12:05
c64e666b-968b-44… Gate 3 formula_repro How does training on artificially code-switched data compare to translate-train methods in improving zero-shot cross-lingual retrieval accur…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10
6.5/10 2026-06-15 12:05
befe8d82-5024-48… Gate 3 formula_repro How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to multilingual pret…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 6.5/10
6.2/10 2026-06-15 12:05
05a11663-8824-41… Gate 3 formula_repro How does hybrid batch training for simultaneous monolingual and cross-lingual optimization impact zero-shot retrieval accuracy on out-of-dom…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 9.5/10
6.2/10 2026-06-15 12:05
e10a1785-aeeb-42… Gate 3 formula_repro Does training on artificially code-switched data improve cross-lingual retrieval precision compared to monolingual training when evaluated o…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.1/10 2026-06-15 12:05
28d7ecd5-529b-47… Gate 3 formula_repro Does training on artificially code-switched data improve cross-lingual robustness on the XQuAD benchmark when evaluated against standard mul…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10
8.3/10 2026-06-15 12:05
324f98c4-488c-47… Gate 3 formula_repro Does intermediate-task training on domain-specific English corpora improve zero-shot transfer performance on multilingual domain subsets of …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 6.5/10
8.2/10 2026-06-15 12:05
367b042b-53cd-40… Gate 3 formula_repro How does the alignment between MIDI symbolic input and audio output in Tacotron-based models compare to that of neural source-filter wavefor…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10
7.8/10 2026-06-15 06:04
853e72b4-662b-4e… Gate 3 formula_repro What is the impact of TLI early-layer LoRA fine-tuning on the robustness of Lugha-Llama against adversarial lexical perturbations in low-res…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10
5.7/10 2026-06-15 06:04
e655d427-02d9-48… Gate 3 formula_repro How does early-layer LoRA adaptation in Lugha-Llama impact zero-shot cross-lingual retrieval accuracy on noisy Swahili-English datasets comp…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10
8.8/10 2026-06-15 06:04
b435d9ed-a132-4b… Gate 3 formula_repro How does early-layer LoRA fine-tuning for lexical injection compare to middle-layer adaptation in improving cross-lingual alignment scores o…
COUNTEREXAMPLE HUNTER: 3.5/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 9.5/10
6.7/10 2026-06-15 06:04
c5a6b419-3445-44… Gate 3 formula_repro How does the token prioritization strategy in Vcc affect perplexity scores on the PG-19 benchmark compared to sparse attention patterns like…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 8.5/10
8.2/10 2026-06-15 06:04
757b4933-05c3-4b… Gate 3 formula_repro How does cross-lingual query generation compare to direct cross-lingual data training in terms of improving passage representation alignment…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-15 06:03
7ee1158c-08c4-4d… Gate 3 formula_repro Does augmenting passage representations with generated queries reduce the latency-throughput trade-off in cross-lingual dense retrieval syst…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10
8.7/10 2026-06-15 06:03
f149150d-181d-4e… Gate 3 formula_repro How does the robustness of zero-shot cross-lingual voice cloning in flow-matching TTS models vary when evaluated on noisy or adversarial inp…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.5/10 2026-06-15 06:03
67bf0cbf-613c-4e… Gate 3 formula_repro How does the combined SFT+DPO alignment strategy impact the reasoning accuracy of OPT-350M on complex multilingual queries relative to stand…
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
6.2/10 2026-06-15 06:02
780db65d-2d7d-42… Gate 3 formula_repro To what extent does increasing the scale of the base language model mitigate the degradation in helpfulness scores observed in OPT-350M afte…
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10
4.7/10 2026-06-15 06:02
9013c53e-30cb-47… Gate 3 formula_repro How does early-layer LoRA adaptation for lexical alignment in Lugha-Llama compare to full fine-tuning in zero-shot cross-lingual transfer ac…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10
6.2/10 2026-06-15 00:02
ae131cd2-d64a-42… Gate 3 formula_repro Does the latent cross-lingual alignment achieved via Targeted Lexical Injection in Lugha-Llama generalize to zero-shot machine translation p…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10
5.3/10 2026-06-15 00:01
8d6c501f-4675-4a… Gate 3 formula_repro Do auxiliary objectives with factorized latent dynamics improve sample efficiency in small-scale Video-JEPA training relative to standard jo…
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 6.5/10
4.6/10 2026-06-15 00:01
9ade25b1-b2f8-40… Gate 3 formula_repro What is the effect of factorized latent dynamics auxiliary objectives on the transfer learning performance of Video-JEPA when evaluated on d…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.2/10 · REPLICATION ATTACKER: 6.5/10
7.1/10 2026-06-15 00:01
afe7316d-46f8-46… Gate 3 formula_repro How does CLIP-TD's zero-shot transfer accuracy on domain-shifted vision-language tasks compare to standard CLIP fine-tuning methods?
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 8.5/10
7.5/10 2026-06-14 18:01
db269ac8-afd2-4c… Gate 3 formula_repro How does the scaling of self-supervised pretraining data size affect the performance of few-shot meta-learners on language model benchmarks …
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10
5.0/10 2026-06-14 18:01
72958e36-47e4-4e… Gate 3 formula_repro What is the impact of integrating motion-image diffusion priors on the robustness of vision-language-action models against adversarial pertu…
COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 3.2/10 · REPLICATION ATTACKER: 8.5/10
6.3/10 2026-06-14 18:01
96f30ae4-f497-45… Gate 3 formula_repro How does the cross-lingual voice cloning performance of flow-matching TTS models compare to diffusion-based TTS models when evaluated on uns…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10
6.2/10 2026-06-14 18:01
cad169a5-a675-41… Gate 3 formula_repro How does the performance of Targeted Lexical Injection (TLI) compare to full fine-tuning and adapter-based methods on the XTREME-R benchmark…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-14 18:01
cf8d5c87-fe0e-4d… Gate 3 formula_repro How does hybrid batch training affect the zero-shot cross-lingual retrieval accuracy of mBERT on low-resource language pairs compared to mon…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.0/10
9.2/10 2026-06-14 17:59
463bd35f-feb7-45… Gate 3 formula_repro Does synergistic optimization of monolingual and cross-lingual objectives improve generalization to unseen language pairs in the BEIR zero-s…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 2.5/10
7.0/10 2026-06-14 17:58
964ed2b1-5e26-41… Gate 3 formula_repro How does hybrid batch training affect zero-shot retrieval accuracy on low-resource languages in the XTREME benchmark compared to dedicated m…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.0/10
9.2/10 2026-06-14 17:58
a6b4186a-a702-48… Gate 3 formula_repro What is the comparative robustness of CausalMixFT-generated synthetic data against other data augmentation methods (e.g., GAN-based or diffu…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10
5.7/10 2026-06-14 11:57
42f5bcad-eb35-4b… Gate 3 formula_repro To what extent does CausalMixFT fine-tuning improve the generalization accuracy of tabular foundation models under data scarcity compared to…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10
7.8/10 2026-06-14 11:57
d9c74c0f-cb53-41… Gate 3 formula_repro How does the F1-score of multilingual transformer models compare to monolingual models when evaluated on code-mixed hate speech datasets wit…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.5/10 2026-06-14 11:57
5ec1afab-3eea-4c… Gate 3 formula_repro How does early-layer LoRA lexical injection compare to middle-layer adaptation in improving zero-shot cross-lingual retrieval accuracy for S…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-14 11:56
4e42da41-1273-43… Gate 3 formula_repro To what extent does Targeted Lexical Injection improve cross-lingual alignment scores on the XCOPA dataset for underrepresented Bantu langua…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10
6.8/10 2026-06-14 11:56
5073185f-b8b7-47… Gate 3 formula_repro What is the impact of context window size on the retrieval-augmented generation performance of quantized LoRA-adapted models when evaluating…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.5/10 2026-06-14 11:56
e486f159-c9b9-4f… Gate 3 formula_repro How does the alignment of multimodal embeddings (e.g., text and audio) in MUST-RAG affect the consistency and robustness of generated answer…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 9.5/10
6.8/10 2026-06-14 11:55
1fafcd2c-2be3-43… Gate 3 formula_repro How does the fidelity of structural causal models used for data augmentation impact the few-shot classification accuracy of fine-tuned tabul…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 3.0/10
6.5/10 2026-06-14 11:55
e404ac75-ed39-43… Gate 3 formula_repro Can targeted lexical injection in Lugha-Llama achieve comparable zero-shot cross-lingual performance to MMPLMs like WMT21fb on clinical doma…
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 8.5/10
6.0/10 2026-06-14 11:55
7b8d797a-9b95-44… Gate 3 formula_repro How does the use of causal data augmentation techniques like CausalMixFT compare to traditional data augmentation methods (e.g., SMOTE, GAN-…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 8.5/10
8.9/10 2026-06-14 05:55
cf1474bf-5527-47… Gate 3 formula_repro What is the accuracy degradation of generalized zero-shot learning models under norm-bounded perturbations across unseen classes?
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-14 05:54
a7211fa0-6c1a-43… Gate 3 formula_repro How does fine-tuning dense retrieval models on native multilingual WebFAQ data impact zero-shot cross-lingual retrieval accuracy on XQuAD co…
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 8.5/10
6.0/10 2026-06-14 05:54
99730361-463c-4d… Gate 3 formula_repro Does early-layer LoRA fine-tuning improve zero-shot cross-lingual natural language inference accuracy for low-resource African languages com…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10
9.0/10 2026-06-14 05:54
fdb047e5-d021-45… Gate 3 formula_repro How does the performance of dense retrieval models trained on WebFAQ compare to those trained on Wikipedia-based datasets like Natural Quest…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-14 05:53
3be16ffd-6b7d-4f… Gate 3 formula_repro How does the ratio of synthetic to real pretraining data impact the few-shot classification accuracy of multimodal video-language models on …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-14 05:53
038c8225-74bd-49… Gate 3 formula_repro How does contrastive pretraining objective selection impact cross-lingual retrieval accuracy for low-resource language pairs in the XTREME b…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10
7.0/10 2026-06-14 05:53
d1a71185-38af-4c… Gate 3 formula_repro Does hybrid batch training improve cross-domain generalization for multilingual retrieval models on unseen topics in low-resource languages …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-14 05:53
70f05cea-fe5d-40… Gate 3 formula_repro What is the impact of hybrid batch training on the scaling behavior of zero-shot retrieval accuracy when extending from low-resource to high…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10
5.0/10 2026-06-14 05:53
86a0376b-0fad-42… Gate 3 formula_repro What is the impact of mixed-precision inference (e.g., FP16 vs. BF16) on the efficiency-accuracy trade-off for long-context models like Long…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10
9.0/10 2026-06-14 05:53
d7973fbb-05f8-47… Gate 3 formula_repro How does the zero-shot cross-lingual retrieval accuracy of a multilingual encoder pre-trained on WebFAQ's 47M non-English QA pairs compare t…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 9.5/10
6.8/10 2026-06-13 23:53
3378e994-9dc5-4c… Gate 3 formula_repro Does scaling the proportion of non-English WebFAQ fine-tuning data improve retrieval latency and accuracy trade-offs for cross-lingual tasks…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-13 23:52
36a277bf-5c26-40… Gate 3 formula_repro What is the impact of incorporating visual modality into self-supervised learning for speech representations on the robustness of neural sou…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 4.5/10 · REPLICATION ATTACKER: 6.5/10
6.7/10 2026-06-13 23:52
809b945c-726a-44… Gate 3 formula_repro What is the impact of mixed-dataset pretraining versus single-dataset pretraining on the robustness of Video-JEPA representations to tempora…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10
8.8/10 2026-06-13 23:52
da64baea-9f76-40… Gate 3 formula_repro What is the impact of varying the ratio of synthetic to real data in CausalMixFT on the fine-tuning performance of tabular foundation models…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-13 23:51
b83881e6-6035-48… Gate 3 formula_repro What is the impact of causal data augmentation proportions on the sample efficiency and convergence speed of fine-tuning tabular foundation …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-13 23:51
c90f70ac-13c1-4c… Gate 3 formula_repro What is the correlation between the fidelity of synthetic tabular samples generated via SCMs and the downstream fine-tuning performance of f…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 7.0/10
8.4/10 2026-06-13 23:51
0b01bfe2-6818-4c… Gate 3 formula_repro Does integrating causal structure into synthetic data generation improve the robustness of TabPFN against feature permutation compared to st…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 8.5/10
8.5/10 2026-06-13 23:51
b744a045-e120-4e… Gate 3 formula_repro What is the impact of varying the proportion of causal synthetic data during fine-tuning on the robustness of tabular foundation models acro…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10
8.7/10 2026-06-13 23:51
1ab47328-816a-4c… Gate 3 formula_repro How does the robustness of dense retrievers pretrained on WebFAQ compare to those trained on monolingual datasets when evaluated on adversar…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
8.8/10 2026-06-13 23:50
e3b701fc-8dd1-45… Gate 3 formula_repro How does the performance of Video-JEPA models with factorized latent dynamics compare to non-factorized variants when evaluated on the Somet…
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 8.5/10
5.0/10 2026-06-13 17:50
1613d088-356e-46… Gate 3 formula_repro What is the impact of mixed-dataset pretraining (UCF-101 + Something-Something V2 + ImageNet-100) on the accuracy of Video-JEPA models with …
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 8.5/10
6.2/10 2026-06-13 17:50
51b48b75-c13e-4d… Gate 3 formula_repro Does the robustness gained from Targeted Lexical Injection in Lugha-Llama generalize to code-switched social media text as measured by F1 sc…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10
7.8/10 2026-06-13 17:50
fc099746-cbba-4e… Gate 3 formula_repro Does fine-tuning tabular foundation models with Structural Causal Model-based synthetic data improve generalization accuracy more than stand…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 8.5/10
6.5/10 2026-06-13 17:50
2003206f-03e3-4e… Gate 3 formula_repro Does combining ImageNet-100 with video datasets improve the domain robustness of self-supervised Video-JEPA representations on heterogeneous…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10
5.3/10 2026-06-13 17:50
a836297c-8d52-40… Gate 3 formula_repro What is the impact of varying the rank of LoRA matrices on cross-lingual alignment for Turkic languages when fine-tuned on early layers, eva…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10
5.3/10 2026-06-13 17:50
469e9bad-7f2e-41… Gate 3 formula_repro How does the generalization performance of CausalMixFT compare to other data augmentation methods (e.g., Mixup, SMOTE) when fine-tuning tabu…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10
8.9/10 2026-06-13 17:50
d74469d0-67aa-44… Gate 3 formula_repro How does fine-tuning dense retrieval models on WebFAQ's 47 million non-English pairs impact zero-shot cross-lingual transfer accuracy on the…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-13 17:50
a76c6c0e-24bb-4e… Gate 3 formula_repro What is the comparative robustness of early-layer LoRA versus full-parameter fine-tuning for Lugha-Llama on cross-lingual natural language i…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-13 17:50
8df4041f-79f6-45… Gate 3 formula_repro How does the incorporation of auxiliary objectives in Video-JEPA models impact the robustness of learned representations when evaluated on o…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10
6.7/10 2026-06-13 11:49
f33863b8-6dc3-46… Gate 3 formula_repro How does factorized latent dynamics in Video-JEPA compare to standard JEPA in cross-domain transfer accuracy from synthetic to real-world vi…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-13 11:49
063ca2b8-24d2-46… Gate 3 formula_repro What is the impact of varying the number of LoRA layers on cross-lingual lexical alignment in Lugha-Llama when benchmarked against the FLORE…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-13 11:49
9dadb434-2322-46… Gate 3 formula_repro How do bitwise neural networks with stochastic inference techniques perform in comparison to full-precision networks with Monte Carlo dropou…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.5/10 2026-06-13 11:49
bebb97c4-3ac3-46… Gate 3 formula_repro What is the effect of the SFT+DPO alignment strategy on the helpfulness retention rate of OPT-350M when evaluated on the Anthropic Helpful-H…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-13 11:48
8a29c57f-aef4-47… Gate 3 formula_repro How does retrieval-augmented revision compare to adversarial training in improving Big-Vul detection accuracy for Llama-3.1-8B without requi…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10
7.7/10 2026-06-13 11:48
dbd73bd9-a579-40… Gate 3 formula_repro How does fine-tuning dense retrieval models on the non-English subset of WebFAQ impact cross-lingual zero-shot performance on TyDi QA compar…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10
5.0/10 2026-06-13 11:48
e890a0c6-75f2-41… Gate 3 formula_repro What is the impact of fine-tuning WebFAQ-pretrained dense retrieval models on downstream cross-lingual NLI tasks, as measured by XNLI accura…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-13 11:48
6910a60a-f0a2-40… Gate 3 formula_repro Do auxiliary factorized objectives in Video-JEPA improve few-shot learning performance on fine-grained video benchmarks relative to non-fact…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 9.5/10
8.1/10 2026-06-13 11:48
31ae88ad-ede0-43… Gate 3 formula_repro To what extent does Direct Preference Optimization enhance the robustness of counter-speech models against adversarial hate speech inputs co…
COUNTEREXAMPLE HUNTER: 4.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
7.7/10 2026-06-13 05:48
d82913b1-e2c4-40… Gate 3 formula_repro How does retrieval diversity in music-specific RAG frameworks impact answer robustness against adversarial perturbations compared to general…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 8.5/10
9.0/10 2026-06-13 05:48
3cb37eff-ef87-4e… Gate 3 formula_repro How does the multimodal capture component in Expert Mind affect VQA accuracy on domain-specific datasets compared to text-only RAG baselines…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-13 05:47
e447cada-12a9-4f… Gate 3 formula_repro What is the comparative effect of graph sparsity versus density on the F1-score performance of retrieval-augmented generation models in zero…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 5.5/10 · REPLICATION ATTACKER: 9.5/10
8.0/10 2026-06-13 05:47
fd93dc1c-d547-4d… Gate 3 formula_repro How does the MRR of cross-lingual dense retrieval models degrade on WebFAQ low-resource language families compared to high-resource ones whe…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10
7.8/10 2026-06-13 05:47
840b7dfc-7587-47… Gate 3 formula_repro What is the impact of scaling the multilingual dense retriever model size (e.g., small vs. large) on retrieval performance across low-resour…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10
6.8/10 2026-06-13 05:47
223a5aad-7c31-46… Gate 3 formula_repro To what extent does training on artificially code-switched data improve cross-lingual retrieval robustness for low-resource languages compar…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-13 05:47
cf7bfdf5-1e02-40… Gate 3 formula_repro Does training dense retrievers on WebFAQ 2.0's bilingual aligned pairs improve zero-shot question answering accuracy on multilingual benchma…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-13 05:46
6811def3-b807-4f… Gate 3 formula_repro How does fine-tuning dense retrieval models on WebFAQ's non-English subsets impact zero-shot cross-lingual retrieval accuracy on the XTREME …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 7.5/10
8.7/10 2026-06-13 05:46
80d91e4a-f8da-4d… Gate 3 formula_repro What is the impact of injecting LoRA adapters exclusively into attention mechanisms versus feed-forward networks in Llama-3.2-3B on the late…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 9.5/10
7.5/10 2026-06-12 21:25
1b596d33-8278-4f… Gate 3 formula_repro What is the impact of fine-tuning CodeT5 with adversarial training on its semantic consistency and robustness accuracy in generalized zero-s…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-12 21:23
04a7cbf6-efb9-4b… Gate 3 formula_repro How does CausalMixFT compare to other data augmentation techniques (e.g., SMOTE, MixUp) in terms of fine-tuning robustness on tabular datase…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10
8.5/10 2026-06-12 13:50
5e879d86-d825-41… Gate 3 formula_repro How does the ratio of synthetic-to-real data in CausalMixFT affect the F1 score variance of tabular foundation models on TabFact across mult…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10
7.8/10 2026-06-12 13:50
d96ded43-64c0-44… Gate 3 formula_repro How does evidential deep learning with non-negative evidence constraints affect cross-modal retrieval accuracy on CLIP and ALBEF compared to…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10
8.2/10 2026-06-12 13:50
6330a381-0e16-4e… Gate 3 formula_repro How does the data augmentation strategy used in scTab compare in effectiveness to other state-of-the-art data augmentation techniques when a…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 8.5/10
7.2/10 2026-06-12 07:43
63170d1a-bec2-45… Gate 3 formula_repro To what extent does the causal structure complexity (e.g., number of confounders or mediators) in the SCM used for CausalMixFT affect the ge…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 4.2/10 · REPLICATION ATTACKER: 7.5/10
6.7/10 2026-06-12 07:43
475d3f67-fd79-47… Gate 3 formula_repro How does the generalization of scaled tabular models trained on Criteo data perform on unseen high-cardinality categorical features in other…
COUNTEREXAMPLE HUNTER: 7.3/10 · CITATION AUDITOR: 4.5/10 · REPLICATION ATTACKER: 6.5/10
6.1/10 2026-06-12 07:43
790e88e1-86e1-4d… Gate 3 formula_repro How does the domain gap between synthetic and real-world video data affect the zero-shot accuracy of CLIP-based video encoders in gesture re…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 1.0/10
6.3/10 2026-06-12 07:43
b22d1b2d-fd2b-41… Gate 2 unknown How do TabPFN, CTGAN, and CausalMixFT perform in cross-domain tabular data generation tasks when evaluated on both synthetic and real-world … - 2026-06-12 05:07
7e2cde64-adf0-4b… Gate 3 formula_repro Can causal synthetic data generation improve the robustness of tabular foundation models against distribution shifts in cross-domain evaluat…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 7.5/10
8.7/10 2026-06-12 01:36
bc4f6f71-a74f-4b… Gate 3 formula_repro To what extent does the choice of Structural Causal Model (SCM) backbone (e.g., linear vs. nonlinear) in CausalMixFT affect few-shot accurac…
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10
5.7/10 2026-06-12 01:35
41ba449b-d600-44… Gate 3 formula_repro How does the CMAL framework's image-text alignment performance on COCO and Flickr30K compare to CLIP and ALBEF in terms of Recall@1 and NDCG…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-12 01:35
aa621143-3436-4a… Gate 3 formula_repro Does the scaling behavior of XSimGCL's contrastive loss formulation yield superior convergence rates compared to LightGCL when trained on de…
COUNTEREXAMPLE HUNTER: 10.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 10.0/10
9.8/10 2026-06-12 01:34
7958ccbd-1a8f-47… Gate 3 formula_repro What is the impact of the novel web-crawled data collection strategy in WebFAQ 2.0 on the domain generalization capabilities of multilingual…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-11 19:28
9f94de5f-bbb3-45… Gate 3 formula_repro What is the impact of varying the ratio of synthetic-to-real samples in CausalMixFT on the calibration error and generalization performance …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10
9.1/10 2026-06-11 19:28
aa6a06f7-1784-40… Gate 3 formula_repro What is the effect of curriculum learning strategies on the accuracy of large multimodal models evaluated on the MedQA benchmark?
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-11 19:28
8018fed0-9d06-45… Gate 3 formula_repro How does curriculum-based multi-task learning impact the inference latency of large multimodal models on sparse medical image-text pairs?
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-11 19:25
3c14268f-b85e-4f… Gate 3 formula_repro What is the comparative memory footprint and inference latency of multi-task trained vision-language models versus single-task baselines on …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 0.0/10
6.2/10 2026-06-11 19:25
a324f8d3-15a0-49… Gate 3 formula_repro How does the stochastic inference technique in bitwise neural networks compare to other ensemble methods (e.g., snapshot ensembles, Monte Ca…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.5/10 2026-06-11 19:25
3bb4313b-0a73-4d… Gate 3 formula_repro To what extent does training dense retrievers on the bilingual aligned QA pairs in WebFAQ 2.0 improve alignment metrics and retrieval robust…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 7.5/10
8.7/10 2026-06-11 13:19
46e484e7-8529-4e… Gate 3 formula_repro To what extent does the inclusion of 47 million non-English WebFAQ pairs improve the robustness of multilingual encoders against domain shif…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 9.5/10
6.2/10 2026-06-11 13:18
72716d26-4a67-4e… Gate 3 formula_repro How do multilingual dense retrievers trained on SWIM-IR perform on low-resource languages in BEIR compared to models trained on natural mult…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10
6.3/10 2026-06-11 13:18
a9865db0-e4a2-4f… Gate 3 formula_repro How do different alignment strategies in multimodal models impact inference throughput in low-resource settings when evaluated on BRATS with…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-11 13:17
e733595e-23e1-4e… Gate 3 formula_repro What is the comparative robustness of multimodal reasoning in language models with different alignment strategies when applied to cross-doma…
COUNTEREXAMPLE HUNTER: 8.2/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10
8.7/10 2026-06-11 13:17
6124f4e5-dc19-47… Gate 3 formula_repro To what extent does layer-wise KV cache reconstruction in methods like ReST-KV artificially inflate needle-in-a-haystack scores relative to …
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-11 07:12
30fba3f7-6edd-44… Gate 3 formula_repro Reproducibility meta-analysis: 3 independent publications report divergent Qwen2.5 performance on DocVQA with an 80.3 percentage-point spread…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.5/10 2026-06-11 07:11
0595bd4f-0470-40… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10
7.5/10 2026-06-11 01:23
50a4f525-1f3f-43… Gate 3 formula_repro What is the performance degradation of Unified-IO 2 on the VQA-v2 dataset when audio modalities are introduced as distractors versus text-on…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 5.0/10 · REPLICATION ATTACKER: 7.5/10
6.7/10 2026-06-11 01:22
f82ac2f4-1a92-4e… Gate 3 formula_repro How does GRACE's quantization-aware training scale with model size, and how does it affect performance on the MME and MM1K benchmarks when a…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 2.0/10
5.9/10 2026-06-11 01:22
cc2d0e37-a950-4a… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 3.5/10
7.0/10 2026-06-10 19:35
673590a7-25e9-41… Gate 3 formula_repro How does Qwen3's performance on GPQA Diamond compare to other frontier models when evaluated under chain-of-thought prompting versus standar…
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10
6.2/10 2026-06-10 19:35
b472d355-87a8-45… Gate 3 formula_repro How do language models compare to human experts on professional knowledge and science benchmarks v19
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10
8.7/10 2026-06-10 19:34
b5058ffc-3f4d-46… Gate 3 formula_repro What is the impact of million-token context windows on multimodal reasoning accuracy in Gemini 1.5 Pro versus prior versions?
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10
8.8/10 2026-06-10 19:34
09acaf30-ab81-49… Gate 3 formula_repro To what extent does chain-of-thought prompting mitigate performance degradation in long-horizon reasoning tasks for LLMs evaluated on the Bi…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-10 19:34
031cd03f-2fbe-4d… Gate 3 formula_repro What are the benchmark performance scores of GLM-4.5-Air on reasoning, mathematics, coding, and language understanding tasks?
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10
5.8/10 2026-06-10 19:34
99e0cc2f-ae34-40… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 8.5/10
8.9/10 2026-06-10 16:52
73ec2b2b-e67b-47… Gate 3 formula_repro What is the cross-domain generalization capability of OpenPangu-7B-MLA on empathetic speech understanding tasks when evaluated on MMSU and o…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-10 16:51
dd13f070-1013-42… Gate 3 formula_repro How does the performance of self-supervised foundation models on tabular data classification compare to standard normalization techniques wh…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10
9.0/10 2026-06-10 16:51
2483aaac-7f84-4c… Gate 3 formula_repro To what extent does fine-tuning on adversarial multi-hop QA examples improve the robustness of RAG systems against distractor contexts compa…
COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.5/10 2026-06-10 16:51
3cbe8120-1209-45… Gate 3 formula_repro How does fine-tuning on AdvRACE affect the cross-lingual robustness of MRC models when evaluated on adversarial perturbations in non-English…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-10 16:51
4a909146-446f-4d… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 2.5/10
6.3/10 2026-06-10 10:48
426ccfd7-06e6-40… Gate 3 formula_repro How does the integration of non-lexical vocal cues in multimodal language models like OpenPangu-7B-MLA affect downstream task performance on…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10
6.5/10 2026-06-10 10:47
294a5d5b-f300-40… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10
5.7/10 2026-06-10 08:45
18e28019-37fb-4c… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.2/10
5.6/10 2026-06-10 08:45
a80b4a8e-8700-4c… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10
8.9/10 2026-06-10 08:44
e26d33b4-a5b3-48… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 3.0/10
6.2/10 2026-06-10 08:44
30bd9c9a-90c8-4e… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10
8.9/10 2026-06-10 08:43
3b783c5d-ec77-4e… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10
5.9/10 2026-06-10 08:42
388b9655-1a81-4e… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 0.0/10
5.8/10 2026-06-10 08:42
42863d1d-2f6a-41… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10
8.7/10 2026-06-10 08:42
0e47786d-3f42-43… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10
8.2/10 2026-06-10 08:41
3904006d-6cfc-42… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10
8.7/10 2026-06-10 08:41
845a22c0-61ad-4e… Gate 3 arithmetic_repro -
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10
8.7/10 2026-06-10 08:41
42a5d013-2da3-4d… Gate 3 unknown -
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.2/10 2026-06-10 08:36
8520660f-c1c4-4c… Gate 3 unknown -
COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
6.3/10 2026-06-10 08:36
fa1dffe8-f9a9-4f… Gate 3 formula_repro How does the F1-score of diffusion-based tabular generative models compare to CTGAN when augmenting data for training LLMs on imbalanced tex…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-10 08:35
ee851b65-000d-44… Gate 3 formula_repro What is the impact of varying the pretraining dataset size and diversity on the cross-domain generalization capabilities of tabular foundati…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-10 08:35
11c29061-cf3e-4b… Gate 3 formula_repro Does scaling the size of domain-specific training data for RAG models improve alignment with human evaluators when measured by RAGalyst's me…
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10
7.8/10 2026-06-10 08:35
9f6b0926-918c-40… Gate 3 formula_repro How does the scaling of unlabeled video-audio pretraining data affect the few-shot adaptation accuracy of latent action models on the RoboBe…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
9.3/10 2026-06-10 08:35
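
The "Avg attack" values in the rows above are consistent with the arithmetic mean of the three attacker scores, rounded to one decimal. A minimal sketch of that calculation and of the survival rule quoted in this record's introduction (average below 3.5 and no individual score at or above 5.0) might look like the following. The function and constant names are hypothetical, and the pipeline's actual implementation is not shown on this page.

```python
from statistics import mean

# Thresholds as quoted in the record's introduction; names are illustrative only.
AVG_THRESHOLD = 3.5
INDIVIDUAL_THRESHOLD = 5.0


def avg_attack(scores):
    """Arithmetic mean of the attacker scores, rounded to one decimal as displayed."""
    return round(mean(scores), 1)


def survives(scores):
    """True only if the mean is below 3.5 and no single score reaches 5.0."""
    return mean(scores) < AVG_THRESHOLD and max(scores) < INDIVIDUAL_THRESHOLD


# Scores from row 2003206f-03e3-4e…:
# COUNTEREXAMPLE HUNTER 7.5, CITATION AUDITOR 0.0, REPLICATION ATTACKER 8.5
row = [7.5, 0.0, 8.5]
print(avg_attack(row), survives(row))
```

Under this reading, a single attacker score of 5.0 or higher kills a claim even when the other two attackers score 0.0, which is why rows with a 0.0 entry still appear in this list.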

Math Counterexample Kills (105 total, showing 100)

Conjectures generated by the autonomous math research pipeline and killed at Gate 1 when a numerical counterexample was found. These never reach the Lean 4 proof stage.
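
As an illustration of what a numerical counterexample search can look like, the sketch below tests the "Geometric Sum Identity" entry listed in the table (the sum of the first n odd powers of 3 equals (3^(2n) - 1) / 8). It is a hypothetical example, not the pipeline's code: the helper names are assumptions, and "first n odd powers of 3" is read here as 3^1 + 3^3 + ... + 3^(2n-1).

```python
def first_counterexample(claim, candidates):
    """Return the first candidate for which claim(candidate) is False, else None."""
    for n in candidates:
        if not claim(n):
            return n
    return None


def geometric_sum_claim(n):
    # Claim under test: 3^1 + 3^3 + ... + 3^(2n-1) == (3^(2n) - 1) / 8.
    # Both sides are multiplied by 8 so the check stays in exact integer arithmetic.
    lhs = sum(3 ** (2 * k - 1) for k in range(1, n + 1))
    return 8 * lhs == 3 ** (2 * n) - 1


print(first_counterexample(geometric_sum_claim, range(1, 101)))
```

With this reading the claim already fails at n = 1, where the sum is 3 and the formula gives 1, so a single small case is enough to kill the conjecture before any proof attempt.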

Conjecture ID | Problem | Statement (falsified) | Killed (UTC)
843d975d77414a55… Ramsey R(5,5) — upper bound improvement In any 2-coloring of the edges of K_43 that contains no monochromatic K_5, there exists no vertex v such that the red degree of v is exactly 21 AND the red neighborhood of v induces a subgraph containing a red triangle. … 2026-06-21 01:49
e7d599128e7d45b9… Twin prime density — Hardy-Littlewood conjecture v For all integers x >= 100, the absolute difference between the actual count of twin prime pairs up to x and the Hardy-Littlewood prediction (2*C2*x/ln(x)^2) is strictly bounded by the square root of the prediction itself… 2026-06-20 21:43
63dbbefc6b334b33… Twin prime conjecture — density analysis For every integer N >= 10,000, let T(N) be the count of twin prime pairs (p, p+2) with p <= N. Let S_odd(N) be the sum of the smaller primes p in these pairs where p ends in the digit 3 or 9, and S_even(N) be the sum whe… 2026-06-20 17:35
342d5e78b226469c… Fibonacci primes — density conjecture For every index n > 4 such that the Fibonacci number F_n is prime, the index n itself must be a prime number that can be expressed as the sum of two squares (i.e., n is a Pythagorean prime or n=2). Consequently, no Fibon… 2026-06-20 09:23
cd9bb9306d6a4edf… Fibonacci primes — density conjecture For all integers n > 4 such that the n-th Fibonacci number F_n is prime, the index n must satisfy n ≡ 1 or 2 (mod 5). Furthermore, if n ≡ 2 (mod 5), then n must be exactly 3. Consequently, for all Fibonacci primes with i… 2026-06-20 09:23
01cae6580a0347ec… Geometric Sum Identity The sum of the first n odd powers of 3 is given by the closed-form formula (3^(2n) - 1) / 8. 2026-06-20 03:58
75ca28388123479e… Gauss Sum Identity For any natural number n, the sum of integers from 0 to n multiplied by 2 equals n times (n+1). Specifically verified for n=100. 2026-06-19 18:21
afd87b814d86453c… Square Minus Square Factoring For any natural number n less than 100, the square of n is either even or odd. 2026-06-19 18:20
0b56f01b498740fa… Square Minus Square Factoring For every natural number n less than 100, the square of n is either even or odd. 2026-06-19 18:20
7a77efd70f1a4118… Quadratic Residue mod 3 For every natural number n less than or equal to 100, the square of n modulo 3 is either 0 or 1. 2026-06-19 14:15
a7f5bd3a2fcc4f22… Primes of form n^2+1 — density and distribution For the sequence of primes of the form p = n^2 + 1, let S(x) be the set of such primes less than or equal to x. Define the 'quadratic gap ratio' for a prime p = n^2 + 1 (where n > 1) as R(p) = (p_next - p) / (2n), where … 2026-06-19 01:53
b94993ff3d514132… Ramsey R(5,5) — upper bound improvement In any 2-coloring of the edges of K_43 (the current lower bound for R(5,5)) that contains no monochromatic K_5, the maximum number of monochromatic K_4 subgraphs is exactly 204. Furthermore, any such extremal coloring mu… 2026-06-18 17:32
68b38389e7ec452c… Catalan's conjecture (Mihailescu) — Lean4 formal p For any integer n > 1, if n is a perfect power (n = x^a with x > 1, a > 1), then the distance to the nearest other perfect power m (m != n, m = y^b with y > 1, b > 1) satisfies |n - m| > sqrt(n) * (ln(n))^0.8, with the s… 2026-06-17 20:27
6516345988494423… Catalan's conjecture (Mihailescu) — Lean4 formal p For any integer n > 8 that is a perfect power (i.e., n = x^a with x, a > 1), the open interval (n, n + n^(5/6)) contains no other perfect powers. This conjecture asserts that for perfect powers greater than 8, the gap to… 2026-06-17 20:26
2849c8ac1ec74318… Geometric Sum Identity The sum of the first 101 powers of 2 (from 2^0 to 2^100) equals 2^101 - 1. 2026-06-17 20:26
afc717278ab246f9… Sum of Odd Numbers Identity The sum of the first 42 odd positive integers equals 42 squared. 2026-06-17 16:11
42ae3e2e624a4a21… Square Minus Square Factoring For any natural number n less than 100, the square of n is either even or odd. 2026-06-17 12:05
df9723f85369422d… Square Minus Square Factoring For every natural number n less than 100, the square of n is either even or odd (specifically, n squared modulo 2 is either 0 or 1). 2026-06-17 12:05
849d0bcdc5b04211… Quadratic Residue mod 4 For every natural number n less than 100, the square of n modulo 4 is either 0 or 1. 2026-06-17 08:01
3ab4d13b11594410… OEIS A001065 — perfect number conjecture For any even perfect number n > 6, let p be the largest prime factor of n (which is also the Mersenne prime exponent's base, i.e., n = 2^(p-1)*(2^p - 1)). The sum of the proper divisors of the Mersenne prime component (2… 2026-06-16 23:52
39276ea98e4f49b0… Primes of form n^2+1 — density conjecture For every integer N >= 2, let P_N be the set of primes of the form k^2+1 less than or equal to N. Let M_N be the maximum gap between consecutive elements in the sorted sequence P_N (defining the first gap as p_1 - 0). Th… 2026-06-16 07:15
816e34ad26774d21… Twin prime density — Hardy-Littlewood conjecture v The ratio of the actual count of twin prime pairs up to x to the Hardy-Littlewood prediction (2*C2*x/ln(x)^2) exhibits a systematic negative bias that decays according to a specific logarithmic correction term. Specifica… 2026-06-16 03:05
1f35272d16e64f29… Twin prime conjecture — density analysis For any integer N >= 100, let T(N) be the count of twin prime pairs (p, p+2) with p <= N. Let S_3(N) be the count of such pairs where the smaller prime p satisfies p mod 3 = 1. The conjecture states that the deviation of… 2026-06-15 22:59
6625e04ce42645a3… Fibonacci primes — density conjecture For all integers n > 3, if the n-th Fibonacci number F_n is prime, then n must be a prime number p such that p ≡ 1 (mod 4) or p = 3. In other words, there are no Fibonacci primes with prime indices p where p ≡ 3 (mod 4) … 2026-06-15 14:49
6168a219a5854c3f… Collatz conjecture — structural pattern search For any integer n > 1, let S(n) be the set of odd integers encountered in the Collatz trajectory of n before reaching 1. Define the 'Odd-Step Parity Signature' P(n) as the sum of the indices (0-based) of all odd elements… 2026-06-15 06:32
d457ed3a611f478f… Collatz conjecture — structural pattern search For any integer n > 1, let S(n) be the set of odd integers encountered in the Collatz trajectory of n before reaching 1 (excluding the final 1). The conjecture states that the sum of the reciprocals of the elements in S(… 2026-06-15 06:31
b76082e436db434d… Goldbach conjecture — computational extension For every even integer n >= 10,000, there exists a Goldbach partition n = p + q (where p and q are prime) such that both p and q lie within the interval [n/2 - sqrt(n), n/2 + sqrt(n)] AND at least one of the primes p or … 2026-06-15 02:26
1ffdfdaa6b324dfe… Primes of form n^2+1 — density and distribution For the sequence of integers n where n^2+1 is prime, let the gaps be defined as g_k = n_{k+1} - n_k. The conjecture states that for all k >= 2, the gap g_k is strictly less than 2.5 * sqrt(n_k) * ln(ln(n_k)). This refine… 2026-06-15 02:25
bb34d8f54e2641fb… Ramsey multiplicity K_4 — minimum number of monoch In any 2-coloring of the edges of K_18 that achieves the global minimum number of monochromatic K_4 subgraphs, the resulting color classes (graphs) must be isomorphic to each other. Furthermore, each color class must hav… 2026-06-14 13:39
d1b012cc38cd4dfd… Fibonacci primes — density conjecture For every integer n >= 5, if the nth Fibonacci number F_n is prime, then n must be a prime number p such that either p = 5 or p is congruent to 1 or 9 modulo 20. In other words, no Fibonacci prime exists at a prime index… 2026-06-14 01:01
2516b894d67544f4… Catalan's conjecture (Mihailescu) — Lean4 formal p For any integer n > 1, if there exist two distinct perfect powers P1 = x^a and P2 = y^b (with x,y,a,b > 1) such that P1 < n < P2 and the gap G = P2 - P1 satisfies G < n^(1/3), then n must be equal to 26. Specifically, 26… 2026-06-13 20:43
e69a3d74fbc1457a… Primes of form n^2+1 — density and distribution For any integer N >= 100, let S_N be the set of primes of the form k^2+1 less than or equal to N. Let gaps_N be the sorted list of differences between consecutive elements in S_N. The conjecture states that the standard … 2026-06-13 12:28
fc8a413e3bb340b7… Twin prime density — Hardy-Littlewood conjecture v For all integers k >= 3, let T_k be the k-th twin prime pair (p_k, p_k+2). The fractional part of the square root of the smaller prime, {sqrt(p_k)}, is strictly less than 0.95, with the sole exception of the first twin p… 2026-06-12 23:58
ec12513113104894… Twin prime conjecture — density analysis For any integer N >= 100, let T(N) be the count of twin prime pairs (p, p+2) with p <= N. Let C2 be the twin prime constant (approx 0.66016). The conjecture states that the normalized residual R(N) = (T(N) - 2*C2*N/(ln N… 2026-06-12 19:52
70cdbcc71ee04a64… Fibonacci primes — density conjecture For all indices n > 4 such that the Fibonacci number F_n is prime, the index n must be a prime number p satisfying the condition that 5 is a quadratic non-residue modulo p (i.e., the Legendre symbol (5/p) = -1), with the… 2026-06-12 15:06
07ee02bc74414051… Square Minus Square Factoring For every natural number n less than 100, the square of n minus the square of (100 - n) equals 200 times n minus 10000. 2026-06-12 02:26
58fbbd5840c74b8d… Square Minus Square Factoring For any natural number n less than 100, the square of n modulo 2 is either 0 or 1. 2026-06-12 02:25
95f128c744214fc7… OEIS A001065 — perfect number conjecture For every even perfect number n > 6, the sum of the proper divisors of n that are congruent to 1 modulo 4 is strictly greater than the sum of the proper divisors congruent to 3 modulo 4. Specifically, if S_1(n) = sum{d |… 2026-06-11 13:47
28a8b2a801ae431f… Geometric Sum Identity The sum of three consecutive geometric terms (base=2) equals 14. 2026-06-11 09:38
ee01e3b5cc8d48e1… Geometric Sum Identity The sum of powers of 2 from 2^0 to 2^15 equals 2^16 - 1. 2026-06-11 09:38
f6283ecc58b94fbe… Square Minus Square Factoring For every natural number n less than 100, the square of n is congruent to either 0 or 1 modulo 2. 2026-06-11 04:38
1ef4058fa9054186… Square Minus Square Factoring For every natural number n less than 100, the square of n is either even or odd. 2026-06-11 04:37
415722d528c54fb7… Square Minus Square Factoring For any natural number n less than 100, the square of n is congruent to either 0 or 1 modulo 2. 2026-06-11 04:37
3c69744486e44d43… Quadratic Residue mod 3 For every natural number n less than 100, the square of n modulo 3 is either 0 or 1. 2026-06-11 04:36
126dd256f8354b53… Sum of Odd Numbers Identity The sum of the first 100 odd positive integers equals 10,000. 2026-06-10 22:08
3f7781c61e534635… Square Minus Square Factoring For every natural number n less than 100, the square of n modulo 2 is either 0 or 1. 2026-06-10 18:05
8f940771d9454664… Square Minus Square Factoring For every natural number n from 0 to 99, the square of n modulo 2 is either 0 or 1. 2026-06-10 18:05
b33154755b914337… Square Minus Square Factoring For every natural number n less than 100, the square of n is congruent to either 0 or 1 modulo 2. 2026-06-10 18:05
208383baf95c4350… Sum of Odd Numbers Identity The sum of the first 100 odd positive integers equals 100 squared. 2026-06-10 09:18
5e66db8a98eb4f41… Sum of Odd Numbers Identity The sum of the first 150 odd positive integers equals 150 squared. 2026-06-10 09:18
7f248d0d38c24d74… Goldbach conjecture — computational extension For every even integer n >= 100, there exists a Goldbach partition n = p + q (with p <= q) such that the prime p satisfies p > n/2 - sqrt(n) * (ln ln n)^2, AND p is a quadratic residue modulo the smallest prime factor of… 2026-06-10 07:28
c37b8d2825f34f46… Primes of form n^2+1 — density and distribution Let S(x) be the set of integers n in [1, x] such that n^2 + 1 is prime. For any two consecutive elements a, b in S(x) (with a < b), the gap g = b - a satisfies g < 2.5 * sqrt(a) * ln(a) for all x >= 1000. This conjecture… 2026-06-10 07:25
007bb20f4be4478b… Ramsey multiplicity K_4 — minimum number of monoch In any 2-coloring of the edges of K_18 that achieves the global minimum number of monochromatic K_4 subgraphs, the resulting color classes (graphs) must both be isomorphic to the Turán graph T(18, 3). Consequently, the m… 2026-06-10 01:47
f600ae4401434fed… Fibonacci primes — density conjecture For every Fibonacci prime F_p with prime index p > 3, the quantity (F_p - 1) / p is never an integer. In other words, no Fibonacci prime (beyond F_3=2 and F_4=3, though 4 is not prime index, specifically checking p=5, 7,… 2026-06-09 13:24
e0243ef726e64075… Collatz conjecture — structural pattern search For any integer n > 1, let S(n) be the set of distinct values visited in the Collatz trajectory of n before reaching 1. Let M(n) be the maximum element in S(n). The conjecture states that the ratio of the count of odd nu… 2026-06-09 00:53
45b6484654eb41d2… Goldbach conjecture — computational extension For every even integer n > 6, there exists a Goldbach partition n = p + q (with p <= q) such that the smaller prime p satisfies p > sqrt(n) and the product p*q is congruent to 1 modulo 24. 2026-06-09 00:52
b171ab227ec34a92… Primes of form n^2+1 — density and distribution Let P be the set of primes of the form n^2+1. For any x >= 10, let S(x) be the sum of the reciprocals of the square roots of the generators n for all such primes p = n^2+1 <= x. The conjecture states that S(x) is strictl… 2026-06-08 20:48
afae0570267f4f10… Twin prime density — Hardy-Littlewood conjecture v For all integers x >= 10,000, the relative error between the actual count of twin prime pairs up to x and the Hardy-Littlewood prediction (2*C2*x/ln(x)^2) is strictly bounded by the function 1.8 / ln(x). Specifically, |p… 2026-06-08 07:37
b77b197f38ef42b4… Ramsey R(4,6) — computational bounds In any 2-coloring of the edges of K_35 that avoids a red K_4 and a blue K_6 (if such a coloring exists), the maximum degree of any vertex in the red subgraph must be strictly less than 12. That is, Δ(Red) ≤ 11. 2026-06-08 07:37
b9da4d4e215345c4… Fibonacci primes — density conjecture For all integers n > 4, if the nth Fibonacci number F_n is prime, then n is either prime itself or n=4. Furthermore, for every prime index p > 3 such that F_p is composite, F_p possesses at least one prime factor q such … 2026-06-07 23:17
24d0c43564104d67… Goldbach conjecture — extend computational verific For every even integer n > 10,000, there exists a Goldbach partition n = p + q (where p and q are primes) such that both p and q are 'isolated' within a window of size W(n) = floor(0.8 * ln(n) * ln(ln(n))). Specifically,… 2026-06-07 15:00
ab2f6f8062e94761… OEIS A001065 — perfect number conjecture For every even perfect number n > 6, the sum of the squares of its proper divisors is strictly congruent to 1 modulo the square of its associated Mersenne prime exponent. Specifically, if n = 2^(p-1)(2^p - 1) where p and… 2026-06-07 14:59
bc6085714d2046c4… Primes of form n^2+1 — density and distribution For the sequence of primes of the form p = n^2 + 1, let n_k be the k-th positive integer such that n_k^2 + 1 is prime. The conjecture states that for all k >= 2, the gap between consecutive bases n_k and n_{k-1} satisfie… 2026-06-07 06:31
9f7d6c51e37b4088… Twin prime density — Hardy-Littlewood conjecture v For all x >= 1000, the actual count of twin prime pairs up to x strictly exceeds the standard Hardy-Littlewood prediction (2*C2*x/ln(x)^2) but remains bounded above by the prediction augmented with a specific second-orde… 2026-06-06 18:04
845aaff7aef64a01… Fibonacci primes — density conjecture For every integer n >= 3, if the Fibonacci number F_n is prime, then n must be a prime number, AND the index n satisfies the property that 2n+1 is either a prime number or a semiprime (product of exactly two primes, not … 2026-06-06 09:39
1028c83002d64c6f… Catalan's conjecture (Mihailescu) — Lean4 formal p For any integer n > 1, if n is a perfect power (n = x^a with x, a > 1) and the next consecutive perfect power m (m = y^b with y, b > 1, m > n) satisfies m - n = 1, then n must be 8. Furthermore, for any perfect power n >… 2026-06-06 05:32
59707dec0c84466f… OEIS A001065 — perfect number conjecture For every even perfect number n > 6, the sum of the binary digits of (n/2) is strictly less than the number of distinct prime factors of (n-1). 2026-06-06 01:23
e1ea6af8e40f4d3a… Collatz conjecture — structural pattern search For any integer n > 1, let S(n) be the set of odd numbers encountered in the Collatz trajectory of n before reaching 1. Let m = min(S(n)). Then the total stopping time (number of steps to reach 1) is strictly less than m… 2026-06-05 21:15
52c749523bfe490f… Primes of form n^2+1 — density conjecture For any integer n >= 2, let S_n be the set of primes of the form k^2+1 where k <= n. Let M_n be the maximum gap between consecutive elements in the sorted sequence S_n (defining the first gap as p_1 - 2). Then, M_n is st… 2026-06-05 16:24
b01e6c25195044a2… Primes of form n^2+1 — density conjecture For every integer n >= 1, the count of primes of the form k^2 + 1 with k <= n (denoted P(n)) satisfies the inequality P(n) >= floor(1.2 * sqrt(n) / ln(n)). Furthermore, for any n >= 100 where P(n) > 0, the gap between co… 2026-06-05 16:22
20a77f3e6ff34241… Fibonacci primes — density conjecture For every Fibonacci prime F_p with index p > 5, the integer part of the square root of the index p, denoted as floor(sqrt(p)), is always a prime number. 2026-06-04 23:37
d177586b3b3d4762… Primes of form n^2+1 — density and distribution Let P_N be the set of primes of the form n^2+1 for 1 <= n <= N. Let A_N be the count of such primes where the generator n is itself a prime number. The conjecture states that for all N >= 1000, the ratio of the density o… 2026-06-04 10:33
498269cc76514396… Twin prime density — Hardy-Littlewood conjecture v The normalized error term of the twin prime count, defined as E(x) = (pi_2(x) * ln(x)^2) / (2 * C2 * x) - 1, exhibits a persistent negative bias for all x in the range [10^4, 10^8]. Specifically, the conjecture states th… 2026-06-03 22:07
b81ab2bf8a3742e6… OEIS A001065 — perfect number conjecture For any even perfect number n > 6, let p be the unique Mersenne prime such that n = 2^(p-1)*(2^p - 1). The sum of the divisors of the exponent (p-1), denoted sigma(p-1), is strictly less than the square root of the Merse… 2026-06-03 07:23
af2e36aa3e2c473a… Primes of form n^2+1 — density and distribution For the sequence of primes of the form n^2+1, let p_k be the k-th such prime. The conjecture states that for all k >= 2, the gap between consecutive primes p_k and p_{k-1} satisfies: p_k - p_{k-1} < 2 * sqrt(p_k) * (ln(p… 2026-06-03 02:25
3c752084d9a043ca… Primes of form n^2+1 — density conjecture For every integer n >= 2, let S_n be the set of primes of the form k^2+1 with k <= n. Let M_n be the maximum gap between consecutive elements in S_n (with the first element treated as having a 'gap' from 0). Then M_n < 4… 2026-06-02 22:15
7c85eafa9f3a4c9b… Twin prime conjecture — density analysis For every integer N >= 100, let T(N) be the count of twin prime pairs (p, p+2) with p <= N, and let S(N) be the sum of the reciprocals of the smaller primes in these pairs (i.e., sum(1/p) for all such p). The conjecture … 2026-06-02 11:16
17b23802a7a14aa9… Cap set problem — F_3^n maximum Conjecture: For n=6, the maximum size of a cap set in F_3^6 is exactly 112, and this maximum is uniquely achieved (up to affine equivalence) by the set of vectors with weight congruent to 1 modulo 3 in the specific coord… 2026-06-02 04:47
1bc7acdee264452e… Catalan's conjecture (Mihailescu) — Lean4 formal p For any integer n > 1, if n is a perfect power (n = x^a with x, a > 1), then the interval (n, n + n^(2/3)] contains no other perfect powers, except for the specific case where n = 8 (2^3), in which case the interval (8, … 2026-06-02 04:44
3f02ace5c31a4891… Goldbach conjecture — computational extension For every even integer n > 100, there exists a Goldbach partition n = p + q (with p <= q) such that the prime p lies within the interval [n/2 - sqrt(n), n/2]. Furthermore, the smallest such prime p satisfies the stronger… 2026-06-01 21:26
2d0c51fde717499c… Primes of form n^2+1 — density and distribution For all integers n >= 2, the gap between consecutive primes of the form k^2+1 is strictly less than 4 * sqrt(p_m) * ln(p_m), where p_m is the smaller prime in the pair. Furthermore, the ratio of the actual gap to this bo… 2026-06-01 17:21
2229477e7b1a459e… Primes of form n^2+1 — density and distribution For the sequence of primes of the form n^2+1, let p_k = n_k^2+1 be the k-th such prime. The conjecture states that for all k >= 2, the gap between consecutive bases n_k and n_{k-1} satisfies: n_k - n_{k-1} < 2 * sqrt(n_{… 2026-06-01 17:18
3032040e036b4ec0… Primes of form n^2+1 — density conjecture For every integer n >= 100, the number of primes of the form k^2+1 with k <= n is strictly greater than the number of primes of the form k^2+1 with k <= n/2 multiplied by the factor (1.3 * sqrt(n) / ln(n)). This conjectu… 2026-06-01 13:41
e2f7b4d3db414cd8… Twin prime conjecture — density analysis For every integer N >= 100, let T(N) be the count of twin prime pairs (p, p+2) with p <= N. The ratio of the actual count T(N) to the Hardy-Littlewood estimate E(N) = 2 * C_2 * N / (ln N)^2 (where C_2 is the twin prime c… 2026-06-01 01:14
6d026f496bb045ec… Fibonacci primes — density conjecture For every integer n > 4, if the n-th Fibonacci number F_n is prime, then n must be a prime number p such that 5 is a quadratic non-residue modulo p (i.e., the Legendre symbol (5/p) = -1). This implies that all Fibonacci … 2026-05-31 21:04
1b3e58ad75384c33… Fibonacci primes — density conjecture For all integers n > 6, if the nth Fibonacci number F_n is prime, then n must be a prime number that can be expressed as the sum of two squares (i.e., n is 2, or n is a prime congruent to 1 modulo 4). This implies that n… 2026-05-31 21:03
a29125de7d584756… Catalan's conjecture (Mihailescu) — Lean4 formal p For any integer n > 1 that is not 8, if n is a perfect power (n = x^a with x>1, a>1), then the smallest perfect power m > n (where m = y^b with y>1, b>1) satisfies the gap inequality m - n > n^0.55. The only exception to… 2026-05-31 16:59
47c3e171d2ff4226… OEIS A001065 — perfect number conjecture For any even perfect number n > 6, let m = n/6. The sum of the proper divisors of m (denoted s(m)) is strictly greater than the square of the number of distinct prime factors of m (denoted omega(m)^2). 2026-05-31 12:56
9c24c2404c5b4ea2… Goldbach conjecture — computational extension The sum of two primes representing an even number n > 2 has its maximal prime difference bounded by n^(0.51), where the exponent 0.51 is strictly between 0.5 and 1. This refines the trivial bound of n-3 by showing the di… 2026-05-31 08:39
5a54ab3135de44a4… Primes of form n^2+1 — density and distribution For integers n >= 2, let P(n) be the set of primes of the form k^2+1 less than or equal to n. Let G(n) be the maximum gap between consecutive elements in P(n) (with the first gap defined as p_1 - 2). The conjecture state… 2026-05-31 05:55
815317887cf646b1… Primes of form n^2+1 — density and distribution For any integer N >= 100, let S_N be the set of primes p <= N such that p = k^2 + 1 for some integer k. Let M_N be the maximum gap between consecutive elements in the sorted sequence S_N (with the first gap defined as th… 2026-05-31 05:55
3e68141113f44ca9… Primes of form n^2+1 — density conjecture For every integer n >= 1, the number of primes of the form k^2 + 1 with k <= n is strictly less than 2 * sqrt(n). Furthermore, the ratio of this count to sqrt(n) never exceeds 1.8 for any n >= 100. 2026-05-31 01:48
21e6bfada240446d… Primes of form n^2+1 — density conjecture For every integer n >= 2, the number of primes of the form k^2 + 1 with k <= n is strictly greater than the number of integers k <= n such that k^2 + 1 is a product of exactly two distinct primes, both of which are congr… 2026-05-31 01:48
3f50b59f69c24ee3… Twin prime density — Hardy-Littlewood conjecture v For all integers x >= 10,000, the cumulative count of twin prime pairs pi_2(x) strictly exceeds the first-order Hardy-Littlewood approximation L_1(x) = 2*C_2 * x / (ln x)^2, but remains bounded above by a second-order co… 2026-05-30 17:37
84e7e3b311544ebb… Cap set problem F_3^6 — verify maximum size = 112 The maximum cap set size in F_3^6 is exactly 112, and this bound is achieved only by the canonical construction S_3^6 ⊂ F_3^6 2026-05-30 04:42
809a9ab0175448e8… Fibonacci primes — density conjecture For all integers n >= 3, if the nth Fibonacci number F_n is prime, then the index n must be a prime number p such that p is not a Wieferich prime base 2 (i.e., 2^(p-1) is not congruent to 1 modulo p^2). Furthermore, for … 2026-05-30 04:30
fe5aa22c047044f1… Cap set problem — F_3^n maximum The maximum size of a cap set in F_3^n for n ≤ 8 is bounded above by ⌊2.2^n⌋, and for n = 6, 7, 8 the values are exactly 124, 353, and 994 respectively 2026-05-30 01:15
23f6590eb1bc458c… Cap set problem — F_3^n maximum The maximum size of a cap set in F_3^n for n=6 is exactly 112, and this value is achieved by a specific construction based on Edel's bound. 2026-05-30 01:13
028910cf4158418c… Primes of form n^2+1 — density conjecture The count of primes of the form n^2+1 up to a given bound is asymptotically equal to 2*C*Li(x) where C is a constant approximately 0.685 and Li(x) is the logarithmic integral, with the constant C being related to the pro… 2026-05-29 19:18
aac0f88db762449e… Ramsey multiplicity K_4 — minimum number of monoch In any 2-coloring of K_18, the minimum number of monochromatic K_4 is exactly 18, and this minimum is achieved only by colorings where the graph of one color forms a specific structured graph related to the Turán graph T… 2026-05-29 16:36