Public Falsification Record
Gate 2 / Gate 3 Pipeline Falsifications (294)
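Claims falsified by the pipeline. A claim is either killed at Gate 2 (sealed-sandbox reproduction), or VERIFIED at Gate 2 and then falsified by the Gate 3 adversarial red-team (three independent LLM attackers, inverted scoring: a higher attack score means a stronger case against the claim). A claim SURVIVES Gate 3 only if all three attackers fail to find a fatal flaw, that is, the average attack score is below 3.5 and no individual score is 5.0 or higher.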
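The Gate 3 survival rule is a two-clause threshold test. The Python sketch below encodes it. It assumes the average is taken over the three raw attacker scores (the table displays it rounded to one decimal) and that both thresholds apply exactly as stated. The function names `survives` and `reported_avg` are illustrative only and are not part of the pipeline.

```python
from statistics import mean

# Thresholds as stated in the survival rule of this record.
AVG_THRESHOLD = 3.5   # average attack score must be strictly below this
MAX_INDIVIDUAL = 5.0  # every individual attack score must be strictly below this


def survives(scores: dict[str, float]) -> bool:
    """Return True if a claim survives the Gate 3 red-team.

    `scores` maps attacker name -> attack score on a 0-10 scale. Scoring is
    inverted: a higher score means a stronger attack on the claim.
    """
    if len(scores) != 3:
        raise ValueError("expected exactly three attacker scores")
    avg = mean(scores.values())
    return avg < AVG_THRESHOLD and all(s < MAX_INDIVIDUAL for s in scores.values())


def reported_avg(scores: dict[str, float]) -> float:
    """Average attack score rounded to one decimal, as displayed in the table."""
    return round(mean(scores.values()), 1)


if __name__ == "__main__":
    # Scores copied from row 8544463f-7e19-46… of the table.
    row = {
        "COUNTEREXAMPLE HUNTER": 7.2,
        "CITATION AUDITOR": 0.0,
        "REPLICATION ATTACKER": 7.5,
    }
    print(reported_avg(row), survives(row))  # 4.9 False

    # Hypothetical score sets, for illustration only.
    print(survives({"A": 2.0, "B": 3.0, "C": 4.0}))  # True: avg 3.0, max 4.0
    print(survives({"A": 0.0, "B": 0.0, "C": 6.0}))  # False: avg 2.0, but one score >= 5.0
```

The last hypothetical case shows why the second clause matters. An average of 2.0 would pass on its own, but a single attacker scoring 6.0 still kills the claim.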
| Task ID | Gate | Claim type | Goal / Claim | Avg attack | Killed (UTC) |
|---|---|---|---|---|---|
| 760fbc69-65b1-4f… | Gate 3 | formula_repro | How does the scaling of model size affect the performance gain from English intermediate-task training in zero-shot cross-lingual transfer, …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.2/10 | 8.6/10 | 2026-06-21 01:19 |
| 9a6f19a5-f2e6-4c… | Gate 3 | formula_repro | To what extent does English intermediate-task training improve cross-lingual reasoning capabilities on multilingual benchmarks compared to d…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.2/10 | 6.9/10 | 2026-06-21 01:19 |
| 5571973f-50bb-48… | Gate 3 | formula_repro | Does intermediate-task training on domain-specific multilingual datasets improve robustness to domain shift in zero-shot cross-lingual trans…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 9.2/10 | 8.2/10 | 2026-06-21 01:19 |
| a80c506e-d2f8-4a… | Gate 3 | formula_repro | What is the impact of scaling the number of intermediate language-understanding tasks on zero-shot cross-lingual transfer performance for lo…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10 | 9.1/10 | 2026-06-21 01:18 |
| 2290ba9d-c1c0-46… | Gate 3 | formula_repro | What is the impact of intermediate-task training on low-resource languages in the XTREME benchmark when using models pretrained on both Engl…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 9.5/10 | 6.2/10 | 2026-06-21 01:18 |
| 2af78e0e-7fba-4d… | Gate 3 | formula_repro | How does the effectiveness of English intermediate-task training for zero-shot cross-lingual transfer compare to multilingual intermediate-t…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10 | 9.1/10 | 2026-06-21 01:18 |
| 8544463f-7e19-46… | Gate 3 | formula_repro | How does the performance of multilingual intermediate-task training on low-resource languages compare to English intermediate tasks when eva…<br>COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10 | 4.9/10 | 2026-06-21 01:18 |
| bd43d6f2-a6c8-44… | Gate 3 | formula_repro | How does the performance of intermediate-task training sequences compare to continuous pretraining on a multilingual corpus in zero-shot cro…<br>COUNTEREXAMPLE HUNTER: 6.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 2.5/10 | 6.1/10 | 2026-06-21 01:18 |
| 345ee2d0-d119-47… | Gate 3 | formula_repro | Does multi-task intermediate training on diverse English NLU tasks improve robustness against typological divergence more effectively than s…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 6.5/10 | 8.0/10 | 2026-06-20 19:17 |
| 13542541-aac3-44… | Gate 3 | formula_repro | What is the impact of English intermediate-task training on the alignment stability of multilingual encoders when evaluated on adversarial p…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-20 19:17 |
| d7a818b5-ede1-48… | Gate 3 | formula_repro | Does the performance gain from English intermediate-task training on XTREME scale with increasing pretraining model size across diverse low-…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10 | 8.9/10 | 2026-06-20 19:17 |
| 7f4a6491-9864-42… | Gate 3 | formula_repro | How does English intermediate-task training affect zero-shot cross-lingual robustness on XTREME tasks with synthetic code-switching noise co…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10 | 9.1/10 | 2026-06-20 19:17 |
| aa533e6e-e763-4c… | Gate 3 | formula_repro | How does intermediate-task training on English reasoning datasets affect zero-shot cross-lingual performance on the XCOPA and XNLI subsets o…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 9.2/10 | 6.1/10 | 2026-06-20 19:17 |
| 48a8684b-3452-40… | Gate 3 | formula_repro | Does the order of intermediate-task fine-tuning (sequential vs. concurrent) influence the robustness of multilingual alignment in zero-shot …<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.1/10 | 2026-06-20 19:16 |
| a76c6aed-2013-42… | Gate 3 | formula_repro | How does the choice of English intermediate-task difficulty (e.g., low vs. high complexity) affect zero-shot cross-lingual transfer performa…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 9.2/10 | 6.7/10 | 2026-06-20 19:16 |
| 62dad032-b551-46… | Gate 3 | formula_repro | How does the choice of intermediate task complexity (e.g., easy vs. hard language understanding tasks) affect zero-shot cross-lingual transf…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 3.2/10 · REPLICATION ATTACKER: 9.0/10 | 7.1/10 | 2026-06-20 19:15 |
| 4ffd234e-3144-46… | Gate 3 | formula_repro | Does multilingual intermediate-task training on XTREME-R outperform monolingual English training in few-shot cross-lingual transfer across l…<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10 | 6.1/10 | 2026-06-20 19:15 |
| 5f20fd9d-3000-49… | Gate 2 | unknown | Does the integration of synthetic code-switched data improve the robustness of zero-shot cross-lingual retrieval models against adversarial … | - | 2026-06-20 16:59 |
| c29d0b1d-1c55-45… | Gate 2 | unknown | How does hybrid batch training for monolingual and cross-lingual objectives impact zero-shot retrieval accuracy on the BEIR benchmark compar… | - | 2026-06-20 16:57 |
| a10e30bc-cc3b-42… | Gate 3 | formula_repro | How does training on artificially code-switched data affect the robustness of zero-shot cross-lingual retrieval models across low-resource l…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 3.5/10 | 6.8/10 | 2026-06-20 13:15 |
| 88b977da-978a-4b… | Gate 3 | formula_repro | How does the granularity of bilingual lexicons (e.g., word-level vs. phrase-level) impact the effectiveness of artificially code-switched tr…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10 | 5.8/10 | 2026-06-20 13:15 |
| 9293eb3f-c631-43… | Gate 3 | formula_repro | What is the impact of artificially code-switched training data on the robustness of cross-lingual retrieval models evaluated on the PAWS-X d…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 4.2/10 · REPLICATION ATTACKER: 7.5/10 | 6.7/10 | 2026-06-20 13:15 |
| 108e3723-cd80-4b… | Gate 3 | formula_repro | How does the quality of bilingual lexicons impact the performance of zero-shot cross-lingual retrieval models on the BEIR benchmark when eva…<br>COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 6.5/10 | 7.4/10 | 2026-06-20 13:14 |
| 4cb997d4-121c-4d… | Gate 3 | formula_repro | Does training on artificially code-switched data improve zero-shot cross-lingual retrieval recall on the MIRACL benchmark compared to monoli…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10 | 7.2/10 | 2026-06-20 13:14 |
| 420aa4ee-b924-40… | Gate 3 | formula_repro | What is the effect of increasing the amount of artificially code-switched training data on the robustness of zero-shot cross-lingual retriev…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10 | 6.5/10 | 2026-06-20 13:14 |
| bb51b78d-f1e5-48… | Gate 3 | formula_repro | How does the transfer learning performance of self-supervised speech models pre-trained on Flemish Dutch compare to other low-resource langu…<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10 | 5.3/10 | 2026-06-20 07:14 |
| 1249d225-3960-40… | Gate 3 | formula_repro | How does the noise level in automatically induced bilingual lexicons affect the nDCG@10 and MAP scores of zero-shot cross-lingual retrievers…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10 | 6.2/10 | 2026-06-19 19:12 |
| 2e7a84f4-8bf1-4d… | Gate 3 | formula_repro | To what extent does English intermediate-task training enhance zero-shot reasoning capabilities on multilingual benchmarks like XTREME-R for…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10 | 7.8/10 | 2026-06-19 19:12 |
| d41c927e-b869-40… | Gate 3 | formula_repro | How does intermediate-task training on non-English source languages compare to English-only intermediate training for zero-shot cross-lingua…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-19 19:12 |
| 44a960a9-7f95-41… | Gate 3 | formula_repro | How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to models fine-tuned…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10 | 8.7/10 | 2026-06-19 13:09 |
| 05adb989-b8ea-44… | Gate 3 | formula_repro | Does training on artificially code-switched datasets improve the robustness of zero-shot cross-lingual retrievers against query-document lan…<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10 | 5.7/10 | 2026-06-19 13:06 |
| da82e7da-f16a-41… | Gate 3 | formula_repro | Does the hybrid batch strategy improve zero-shot cross-lingual retrieval robustness on the XTD benchmark compared to standard multilingual c…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-19 13:06 |
| 2bb9a5bd-57ad-44… | Gate 3 | formula_repro | How does training on artificially code-switched data affect zero-shot cross-lingual performance on the XNLI benchmark compared to standard m…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 8.5/10 | 7.8/10 | 2026-06-19 13:06 |
| 3b872189-de14-45… | Gate 3 | formula_repro | How does training on artificially code-switched data affect the zero-shot retrieval accuracy of multilingual dense retrievers on the Lasers …<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 6.5/10 | 7.5/10 | 2026-06-19 13:06 |
| 5ff9bb46-9d08-44… | Gate 3 | formula_repro | To what extent does training on artificially code-switched data improve zero-shot cross-lingual retrieval robustness on XTREME-R when querie…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10 | 8.7/10 | 2026-06-19 13:05 |
| 6ef56955-1505-47… | Gate 3 | formula_repro | How does the proportion of code-switched tokens in synthetic training data correlate with the accuracy drop of zero-shot cross-lingual ranke…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.0/10 | 8.8/10 | 2026-06-19 13:05 |
| 73b8a1d9-aee2-48… | Gate 3 | formula_repro | How does training on artificially code-switched data affect the robustness of zero-shot cross-lingual rankers against adversarial noise comp…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.5/10 | 2026-06-19 13:03 |
| 07f2d90e-a6ad-40… | Gate 2 | unknown | Does training on artificially code-switched data improve zero-shot cross-lingual retrieval performance for low-resource languages not includ… | - | 2026-06-19 12:46 |
| 7507c92e-3093-47… | Gate 3 | formula_repro | Does integrating CausalMixFT during fine-tuning improve the robustness of tabular foundation models against adversarial perturbations in low…<br>COUNTEREXAMPLE HUNTER: 6.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10 | 5.0/10 | 2026-06-19 07:00 |
| da92f3e0-4d5b-44… | Gate 3 | formula_repro | How do dense RGB-D SLAM systems utilizing 3D Gaussian representations compare to neural implicit methods in terms of memory consumption and …<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10 | 7.5/10 | 2026-06-19 07:00 |
| 3d9090b2-81c7-4d… | Gate 3 | formula_repro | How do vision-language models perform in cross-domain robustness evaluations when tested on perturbed multimodal benchmarks from domains lik…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 8.5/10 | 6.5/10 | 2026-06-19 07:00 |
| 0ff89691-6471-49… | Gate 3 | formula_repro | How does the trade-off between model size and latency compare between OpenPangu-7B-MLA and smaller prosody-exclusive models when deployed on…<br>COUNTEREXAMPLE HUNTER: 8.0/10 · CITATION AUDITOR: 4.5/10 · REPLICATION ATTACKER: 8.5/10 | 7.0/10 | 2026-06-19 07:00 |
| 86059cb7-4f35-4c… | Gate 3 | formula_repro | What is the impact of cross-lingual transfer from English pre-trained speech models versus monolingual Flemish pre-training on phoneme recog…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10 | 5.3/10 | 2026-06-19 07:00 |
| 0cb45f8d-b6ae-45… | Gate 3 | formula_repro | How does the addition of self-supervised pre-training objectives in zero-shot cross-lingual SLU models affect slot-filling accuracy on the M…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 2.5/10 | 6.5/10 | 2026-06-19 07:00 |
| 36c4669f-6437-43… | Gate 3 | formula_repro | What is the effect of varying the size of the monolingual training set on the intent detection performance of zero-shot cross-lingual SLU mo…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 3.0/10 | 6.3/10 | 2026-06-19 07:00 |
| d245e830-fc51-49… | Gate 3 | formula_repro | What is the impact of varying the code-switching ratio in training data on the retrieval performance degradation when query and document lan…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.0/10 | 2026-06-19 01:00 |
| a5ab3b78-a7cb-48… | Gate 3 | formula_repro | How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to models fine-tuned…<br>COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 6.5/10 | 6.7/10 | 2026-06-19 01:00 |
| 5e61f043-f854-46… | Gate 3 | formula_repro | What is the impact of varying the ratio of code-switched tokens in artificially generated training data on the robustness (measured by accur…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 6.5/10 | 7.8/10 | 2026-06-19 00:59 |
| a984d373-612a-42… | Gate 3 | formula_repro | How does the performance of zero-shot cross-lingual rankers trained on artificially code-switched data compare to models fine-tuned on multi…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10 | 7.5/10 | 2026-06-19 00:59 |
| 963c284c-e2a8-44… | Gate 3 | formula_repro | How does increasing the proportion of code-switched tokens in the training data affect the robustness of zero-shot cross-lingual retrieval m…<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 8.5/10 | 5.0/10 | 2026-06-19 00:59 |
| 643ef90b-e1b6-4f… | Gate 3 | formula_repro | How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to models fine-tuned…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 7.5/10 | 6.0/10 | 2026-06-19 00:59 |
| 05f4c33e-153e-4c… | Gate 3 | formula_repro | Does scaling the multilingual pre-trained model size improve precision@k in zero-shot cross-lingual retrieval when using the proposed hybrid…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.0/10 | 9.2/10 | 2026-06-18 18:59 |
| f988e6e1-5aab-4f… | Gate 3 | formula_repro | Can scaling the hybrid batch training method to larger multilingual models (e.g., XLM-R or mT5) further enhance zero-shot cross-lingual retr…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-18 18:59 |
| 6a1d78e6-6b7d-44… | Gate 3 | formula_repro | How does domain-adaptive fine-tuning of Flemish Dutch self-supervised speech models impact word error rate on CommonVoice compared to cross-…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-18 18:56 |
| 52c35878-037b-40… | Gate 3 | formula_repro | What is the comparative effect of multi-task intermediate training versus single large-task training on reasoning capabilities within zero-s…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-18 18:54 |
| 9b3a269a-5094-47… | Gate 3 | formula_repro | Does combining diverse intermediate tasks improve robustness in zero-shot cross-lingual transfer on XTREME-R more effectively than training …<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10 | 9.0/10 | 2026-06-18 18:53 |
| 880c758a-54df-4a… | Gate 3 | formula_repro | How does hybrid batch training impact zero-shot cross-lingual retrieval accuracy on XNLI compared to monolingual fine-tuning across varying …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10 | 7.0/10 | 2026-06-18 18:53 |
| faa8fed9-29ac-4e… | Gate 3 | formula_repro | Does the synergistic hybrid batch training approach improve cross-lingual retrieval robustness on the MIRACL benchmark under domain shift co…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-18 18:51 |
| fd7ab5fe-aab9-48… | Gate 3 | formula_repro | What is the impact of hybrid batch training on the scaling behavior of zero-shot retrieval performance across varying model sizes within the…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 2.0/10 | 6.5/10 | 2026-06-18 18:51 |
| 2e00ae69-fcc4-4e… | Gate 3 | formula_repro | How does hybrid batch training for simultaneous monolingual and cross-lingual retrieval impact zero-shot performance on low-resource languag…<br>COUNTEREXAMPLE HUNTER: 8.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 4.5/10 | 7.3/10 | 2026-06-18 18:50 |
| 25c348c6-4569-45… | Gate 3 | formula_repro | Does the synergistic optimization of monolingual and cross-lingual objectives in hybrid batch training improve retrieval performance on long…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-18 18:48 |
| a9afc122-af0a-4c… | Gate 3 | formula_repro | What is the impact of varying the proportion of code-switched tokens in artificially generated training data on the robustness of zero-shot …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10 | 8.7/10 | 2026-06-18 12:48 |
| 96a5be5a-20a9-4d… | Gate 3 | formula_repro | Does training on artificially code-switched data improve the robustness of retrieval models against language mismatch errors in queries and …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.0/10 | 2026-06-18 12:48 |
| d4730df9-5abd-4e… | Gate 3 | formula_repro | What is the impact of varying bilingual lexicon coverage on the zero-shot cross-lingual retrieval performance of code-switched trained model…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-18 12:48 |
| 82718235-0464-46… | Gate 3 | formula_repro | How does the cross-lingual retrieval accuracy of models trained on artificially code-switched data compare to full multilingual pretraining …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.2/10 | 9.2/10 | 2026-06-18 12:48 |
| 304f71e8-cfc0-47… | Gate 3 | formula_repro | How does the performance of zero-shot cross-lingual retrieval models improve when trained on artificially code-switched data generated from …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10 | 8.9/10 | 2026-06-18 12:47 |
| 5b801b53-702c-47… | Gate 3 | formula_repro | How does the hybrid batch training strategy impact the zero-shot cross-lingual retrieval accuracy of larger multimodal models (e.g., PaLI, B…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-18 12:47 |
| 681c606e-0739-4d… | Gate 3 | formula_repro | How does the scaling of model size (e.g., small, base, large) interact with the hybrid batch training strategy in terms of zero-shot cross-l…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-18 12:47 |
| fed390d7-0f30-4f… | Gate 3 | formula_repro | Can the hybrid batch training strategy be adapted to improve zero-shot cross-lingual retrieval performance in low-resource language settings…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-18 12:46 |
| 69ee4e5c-067c-4b… | Gate 3 | formula_repro | How does the scaling of intermediate-task dataset size affect the degradation of zero-shot cross-lingual transfer performance on the XTREME …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10 | 9.1/10 | 2026-06-18 06:45 |
| a9ba0423-52f5-40… | Gate 3 | formula_repro | What is the impact of intermediate-task training on the robustness of zero-shot cross-lingual transfer to low-resource languages within the …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-18 06:45 |
| dc84b96b-753f-46… | Gate 3 | formula_repro | Does the choice of multilingual intermediate tasks (e.g., language-agnostic vs. language-specific) impact the robustness of zero-shot cross-…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-18 06:45 |
| bff5aa00-46f0-42… | Gate 3 | formula_repro | How does the performance of multilingual intermediate-task training compare to English intermediate-task training on the XTREME-R benchmark,…<br>COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 6.5/10 | 5.2/10 | 2026-06-18 06:44 |
| b6aec771-9353-47… | Gate 3 | formula_repro | How does the hybrid batch training strategy impact zero-shot retrieval accuracy on low-resource MIRACL language pairs compared to dedicated …<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 6.5/10 | 5.7/10 | 2026-06-18 06:44 |
| 82a8b49a-ca92-48… | Gate 3 | formula_repro | How does fine-tuning Flemish Dutch self-supervised speech models with domain adaptation techniques affect word error rate on the CommonVoice…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 6.5/10 | 7.5/10 | 2026-06-18 06:44 |
| 16fd9c48-0866-43… | Gate 3 | formula_repro | What is the impact of structural causal model fidelity on the downstream classification accuracy of fine-tuned tabular foundation models in …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-18 06:43 |
| 27b34f6d-ce74-49… | Gate 3 | formula_repro | What is the effect of domain-specific vs. general-domain code-switched data on zero-shot cross-lingual retrieval performance in multilingual…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10 | 5.0/10 | 2026-06-18 00:41 |
| 706377b3-65ea-44… | Gate 3 | formula_repro | How does the robustness of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare across different lang…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10 | 5.0/10 | 2026-06-18 00:41 |
| 6724432f-b734-46… | Gate 3 | formula_repro | How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to those trained on …<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10 | 7.2/10 | 2026-06-18 00:41 |
| 5eeb7591-2ce2-45… | Gate 3 | formula_repro | How does the retrieval accuracy per training token of models trained on artificially code-switched data compare to full multilingual pretrai…<br>COUNTEREXAMPLE HUNTER: 6.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10 | 5.5/10 | 2026-06-18 00:41 |
| 572586b4-d231-44… | Gate 3 | formula_repro | How does the lexical coverage ratio of bilingual dictionaries used for artificial code-switching correlate with zero-shot cross-lingual retr…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.2/10 · REPLICATION ATTACKER: 6.5/10 | 7.1/10 | 2026-06-18 00:41 |
| b539628a-b9a7-4a… | Gate 3 | formula_repro | How does hybrid batch training impact zero-shot retrieval recall@10 on the MIRACL benchmark for low-resource languages compared to monolingu…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-17 18:40 |
| 8d5a627d-354d-4d… | Gate 3 | formula_repro | How does the hybrid batch training strategy impact zero-shot retrieval accuracy on unseen low-resource language pairs when evaluated on the …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-17 18:39 |
| 16872a52-c7b2-4c… | Gate 3 | formula_repro | How does fine-tuning on naturally occurring code-switched corpora (e.g., LINCS or NLPCC) compare to fine-tuning on artificially code-switche…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.0/10 | 2026-06-17 18:39 |
| a068b0e0-7ac3-4d… | Gate 3 | formula_repro | Does the synergistic hybrid batch training strategy improve zero-shot cross-lingual retrieval accuracy for languages with varying typologica…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.0/10 | 8.8/10 | 2026-06-17 18:38 |
| e864d5a7-01cf-48… | Gate 3 | formula_repro | To what extent do self-supervised speech models pre-trained on Flemish Dutch generalize to low-resource dialects compared to English pre-tra…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10 | 8.9/10 | 2026-06-17 18:38 |
| 4d287f11-5746-4c… | Gate 3 | formula_repro | Does fine-tuning English pre-trained speech models on limited Flemish data yield comparable robustness to noise as models pre-trained exclus…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.5/10 | 2026-06-17 18:36 |
| 52cce802-988b-44… | Gate 3 | formula_repro | How does scaling the model size of TSDiff impact its performance on cross-domain time series forecasting benchmarks (e.g., UCR archive) comp…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-17 18:36 |
| 69e07f68-b58d-4d… | Gate 3 | formula_repro | What is the impact of simultaneous monolingual and cross-lingual objective optimization on the generalization capability of multilingual enc…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-17 12:34 |
| 8aad928b-e796-47… | Gate 3 | formula_repro | How does hybrid batch training affect zero-shot cross-lingual retrieval accuracy on low-resource language pairs in the MIRACL benchmark comp…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-17 12:34 |
| 96ee3e4c-90e2-47… | Gate 3 | formula_repro | How does varying the ratio of monolingual to cross-lingual training examples in hybrid batches affect the performance trade-off between NQ a…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.0/10 | 8.8/10 | 2026-06-17 12:34 |
| c1c990b6-70f0-44… | Gate 3 | formula_repro | What is the impact of simultaneous monolingual, cross-lingual, and multilingual optimization on the retrieval performance of transformer mod…<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 6.3/10 | 2026-06-17 12:31 |
| efef605f-428e-4d… | Gate 3 | formula_repro | How does the synergistic hybrid batch training strategy compare to standard multilingual fine-tuning in terms of zero-shot cross-lingual ret…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-17 12:31 |
| 5e54cc94-1461-48… | Gate 3 | formula_repro | Can integrating domain-specific monolingual data (e.g., legal, medical) into hybrid batch training improve zero-shot retrieval accuracy on X…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.0/10 | 2026-06-17 12:28 |
| 45ea17e5-1565-49… | Gate 3 | formula_repro | Can the model-agnostic nature of SafeCoDe be validated across different multimodal architectures (e.g., LLaVA, Qwen-VL) by comparing their s…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10 | 8.7/10 | 2026-06-17 12:28 |
| d4118170-4257-4f… | Gate 3 | formula_repro | How does the hybrid batch training strategy compare to language-specific adapter modules in improving zero-shot cross-lingual retrieval accu…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.0/10 | 9.1/10 | 2026-06-17 06:27 |
| f36a2df0-3772-49… | Gate 3 | formula_repro | What is the impact of varying the degree of artificial code-switching in training data on the robustness of zero-shot cross-lingual retrieva…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10 | 9.1/10 | 2026-06-17 06:27 |
| 8acc7c8a-b815-43… | Gate 3 | formula_repro | Does the hybrid batch training strategy improve zero-shot cross-lingual retrieval performance on non-English language pairs in XM3600 compar…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.5/10 | 2026-06-17 06:26 |
| 479ce4ad-ca7a-40… | Gate 3 | formula_repro | Can intermediate-task training on English reasoning datasets mitigate cross-lingual performance degradation in low-resource languages in the…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-17 06:26 |
| cf208549-5033-41… | Gate 3 | formula_repro | Does multilingual intermediate-task training improve zero-shot transfer accuracy on XTREME-R domain-specific subsets compared to English-onl…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 6.5/10 | 8.2/10 | 2026-06-17 06:26 |
| 5ad5a3a9-7cbb-4f… | Gate 3 | formula_repro | What is the impact of varying the size and linguistic diversity of the English intermediate-task corpus on the degradation of zero-shot tran…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 7.0/10 | 8.4/10 | 2026-06-17 06:26 |
| c64ed49c-d8b1-42… | Gate 3 | formula_repro | How does cross-lingual query generation augmentation impact the adversarial robustness of dense retrieval models against paraphrase attacks …<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 7.5/10 | 8.1/10 | 2026-06-17 06:26 |
| 9cbec7df-52ae-46… | Gate 3 | formula_repro | Does pretraining zero-shot cross-lingual retrieval models on artificially code-switched data improve robustness to language divergence in qu…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10 | 9.1/10 | 2026-06-17 00:25 |
| 791b2ca6-fdad-44… | Gate 3 | formula_repro | Does the hybrid batch training strategy improve zero-shot cross-lingual retrieval robustness in low-resource language settings for multimoda…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-17 00:25 |
| c3b1fbef-bb8a-4c… | Gate 3 | formula_repro |
Does the hybrid batch training strategy improve retrieval performance on the XOR benchmark compared to models optimized solely for cross-lin…
COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10
|
9.3/10 | 2026-06-17 00:25 |
| 1003c363-131a-43… | Gate 3 | formula_repro |
Does training on artificially code-switched data improve zero-shot cross-lingual retrieval performance on the MLQA benchmark compared to sta…
COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.2/10
|
6.4/10 | 2026-06-17 00:25 |
| 0195f05e-2ecb-43… | Gate 3 | formula_repro |
Does the hybrid batch training strategy proposed for information retrieval improve multimodal alignment accuracy on zero-shot cross-lingual …
COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10
|
5.7/10 | 2026-06-17 00:25 |
| 54b44e18-53d0-4a… | Gate 3 | formula_repro | Does intermediate-task training on English reasoning datasets improve zero-shot cross-lingual performance on the XCOPA and XNLI subsets of X…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10 | 5.0/10 | 2026-06-17 00:25 |
| 1fba2b0b-14c2-4f… | Gate 3 | formula_repro | Can multilingual intermediate-task training outperform English-only intermediate training for zero-shot transfer on domain-specific subsets …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-17 00:24 |
| f9e78b9c-2598-43… | Gate 3 | formula_repro | How does the size of the English intermediate-task corpus affect the degradation of zero-shot transfer accuracy on low-resource languages wi…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10 | 9.1/10 | 2026-06-17 00:24 |
| f9cfb858-1284-49… | Gate 3 | formula_repro | How does the performance of cross-lingual dense retrieval systems using query-augmented passage representations compare to those using multi…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.0/10 | 9.1/10 | 2026-06-17 00:24 |
| 8416a433-960a-43… | Gate 3 | formula_repro | What is the impact of scaling the size of the synthetic dataset generated by CausalMixFT on the fine-tuning performance of tabular foundatio…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 8.8/10 | 2026-06-17 00:24 |
| 36871c14-b8df-4a… | Gate 3 | formula_repro | How does cross-lingual query generation augmentation affect the adversarial robustness of dense retrieval models against paraphrase attacks …<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10 | 7.5/10 | 2026-06-16 18:21 |
| 006e8e4a-a201-4b… | Gate 3 | formula_repro | How does the performance gap between high-resource and low-resource languages in cross-lingual retrieval models change when using different …<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10 | 8.2/10 | 2026-06-16 18:21 |
| b6d14fcd-ba14-42… | Gate 3 | formula_repro | How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to models fine-tuned…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.2/10 | 5.7/10 | 2026-06-16 18:21 |
| ef1bc3a9-c7fa-44… | Gate 3 | formula_repro | What is the impact of varying the proportion of code-switched terms in training data on the robustness of zero-shot cross-lingual retrieval …<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10 | 6.2/10 | 2026-06-16 18:21 |
| b39d0c42-f3a9-4c… | Gate 3 | formula_repro | How does the performance of cross-lingual query generation compare to multilingual contrastive learning (e.g., XLM-R, LASER) on the BEIR ben…<br>COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10 | 5.7/10 | 2026-06-16 18:21 |
| 33238242-5dc0-45… | Gate 2 | unknown | How does the cross-lingual transfer performance of mE5 compare to other multilingual models like XLM-R or mBERT when pre-trained on monoling… | - | 2026-06-16 17:19 |
| 2556595d-7efa-45… | Gate 3 | formula_repro | How does the performance of multilingual dense retrieval models compare on WebFAQ when trained with synthetic data augmentation versus human…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10 | 8.5/10 | 2026-06-16 12:21 |
| fafdde58-463a-48… | Gate 3 | formula_repro | How does the performance of Targeted Lexical Injection (TLI) with early-layer LoRA fine-tuning compare to full-parameter fine-tuning on the …<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10 | 8.2/10 | 2026-06-16 12:19 |
| 497fe53f-999a-48… | Gate 3 | formula_repro | How does the pass@1 degradation of CodeT5 compare to JaCoText on MBPP Pro when subjected to semantic-preserving docstring perturbations vers…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10 | 7.8/10 | 2026-06-16 12:19 |
| d3826925-d702-4e… | Gate 3 | formula_repro | Can TLI early-layer LoRA fine-tuning improve cross-domain alignment in Lugha-Llama for low-resource Bantu languages, as evaluated by mAP sco…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 4.2/10 · REPLICATION ATTACKER: 7.5/10 | 6.7/10 | 2026-06-16 12:19 |
| d09dcdad-46aa-44… | Gate 3 | formula_repro | What is the effect of Targeted Lexical Injection on cross-lingual alignment quality for Lugha-Llama when evaluated on semantic textual simil…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 6.5/10 | 7.5/10 | 2026-06-16 12:19 |
| c90a313e-3bcf-45… | Gate 3 | formula_repro | How does early-layer LoRA with Targeted Lexical Injection impact zero-shot cross-lingual transfer accuracy on the XNLI benchmark for low-res…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 4.5/10 | 7.2/10 | 2026-06-16 12:19 |
| 647a30e8-8ab0-47… | Gate 3 | formula_repro | To what extent does the depth of early-layer LoRA fine-tuning in TLI affect cross-lingual lexical alignment, as measured by LAS scores acros…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10 | 7.8/10 | 2026-06-16 06:19 |
| a8f56d30-b343-4a… | Gate 3 | formula_repro | How do context-aware conversational models and sequence labeling approaches differ in zero-shot cross-lingual transfer accuracy for hate spe…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10 | 5.0/10 | 2026-06-16 06:18 |
| a63a1ae2-af59-40… | Gate 3 | formula_repro | How does the scalability of CausalMixFT compare to other data augmentation methods (e.g., SMOTE, GAN-based augmentation) when fine-tuning ta…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10 | 9.1/10 | 2026-06-16 06:18 |
| 75b0a51f-2504-44… | Gate 3 | formula_repro | Can SCM-based synthetic augmentation reduce the validation data requirements for early stopping in fine-tuning, as measured by the stability…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-16 06:18 |
| 3dd55559-98c7-47… | Gate 3 | formula_repro | Do parameter-efficient fine-tuning methods like LoRA maintain instance segmentation performance on COCO when applied to other transformer ba…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 0.0/10 | 5.5/10 | 2026-06-16 06:18 |
| b451d079-8c4b-4d… | Gate 3 | formula_repro | How does the integration of CausalMixFT-generated synthetic data affect the fine-tuning convergence speed and validation accuracy of tabular…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10 | 6.2/10 | 2026-06-16 06:18 |
| 069ce81a-1531-45… | Gate 3 | formula_repro | Does CausalMixFT outperform diffusion-based data augmentation (e.g., DiffAugment) in terms of robustness to covariate shift when fine-tuning…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.0/10 | 9.1/10 | 2026-06-16 06:17 |
| a97c638e-389e-47… | Gate 3 | formula_repro | How does integrating causal structure into TabPFN's synthetic data generation affect its performance on downstream task accuracy across diff…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10 | 8.7/10 | 2026-06-16 06:17 |
| d7b2e2bf-4375-41… | Gate 2 | unknown | How do TimeGAN and VAE-generated synthetic financial time series compare in terms of robustness when used to evaluate the temporal reasoning… | - | 2026-06-16 01:04 |
| e0caf16c-fb4c-48… | Gate 3 | formula_repro | How does varying the depth of LoRA adapter injection in Lugha-Llama affect cross-lingual alignment accuracy on low-resource Swahili-English …<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10 | 5.3/10 | 2026-06-16 00:15 |
| bc889895-c22b-44… | Gate 3 | formula_repro | To what extent does the combination of SFT and DPO degrade the zero-shot reasoning capabilities of OPT-350M on the Big-Bench Hard suite rela…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10 | 6.2/10 | 2026-06-16 00:14 |
| fcc53574-9cfa-44… | Gate 3 | formula_repro | How does the reasoning accuracy of multimodal large language models compare to diffusion-based trajectory policies in dynamic task planning …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.0/10 | 2026-06-16 00:14 |
| 852ac52e-42d2-4e… | Gate 3 | formula_repro | How does the hybrid batch training strategy impact zero-shot cross-lingual retrieval accuracy on low-resource languages within the XQuAD ben…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-16 00:14 |
| f8fc038d-df35-4b… | Gate 3 | formula_repro | How does the scaling of synthetic data diversity in tabular foundation model pretraining affect accuracy degradation under distributional sh…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 6.2/10 · REPLICATION ATTACKER: 8.5/10 | 7.9/10 | 2026-06-16 00:14 |
| f1cb0512-e3b3-44… | Gate 3 | formula_repro | To what extent does incorporating causal priors via CausalMixFT improve out-of-distribution (OOD) robustness in tabular foundation models, a…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10 | 8.5/10 | 2026-06-16 00:13 |
| 4efb4ac7-2558-4e… | Gate 3 | formula_repro | How does the cross-lingual query generation approach compare to cross-lingual passage generation in terms of enhancing the alignment capabil…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.0/10 | 8.7/10 | 2026-06-15 18:13 |
| e4b97bd1-f023-4b… | Gate 3 | formula_repro | What is the correlation between training data volume in WebFAQ 2.0 and zero-shot cross-lingual retrieval performance gaps across the 75 supp…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10 | 6.8/10 | 2026-06-15 18:11 |
| b68d34da-d3d1-43… | Gate 3 | formula_repro | Can synergistic optimization of monolingual and cross-lingual objectives reduce performance degradation on the XTREME retrieval benchmark fo…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 1.5/10 | 6.3/10 | 2026-06-15 18:10 |
| de04d7fd-23ac-49… | Gate 3 | formula_repro | Does the hybrid batch training strategy improve zero-shot cross-lingual retrieval performance on downstream datasets like MIRACL or XNLI whe…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 7.5/10 | 8.5/10 | 2026-06-15 18:08 |
| 82ef2513-1ed2-43… | Gate 3 | formula_repro | Does early-layer LoRA adaptation for lexical alignment in Lugha-Llama maintain zero-shot translation accuracy on morphologically rich Bantu …<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10 | 8.8/10 | 2026-06-15 18:08 |
| 79f75ace-7cdb-4f… | Gate 3 | formula_repro | How does the alignment of synthetic financial data generated by GANs versus VAEs influence the downstream performance of multimodal models i…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-15 18:07 |
| be269da2-4b74-48… | Gate 3 | formula_repro | How does the noise level in automatically extracted bilingual lexicons impact the zero-shot cross-lingual retrieval accuracy of code-switche…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.5/10 · REPLICATION ATTACKER: 6.5/10 | 5.2/10 | 2026-06-15 18:07 |
| 1ce91234-09d8-44… | Gate 3 | formula_repro | How does early-layer LoRA fine-tuning for lexical alignment in Lugha-Llama compare to full-parameter fine-tuning on zero-shot cross-lingual …<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.5/10 | 2026-06-15 18:07 |
| d78d2f8b-f3d8-45… | Gate 3 | formula_repro | How does early-layer LoRA fine-tuning for lexical alignment in Lugha-Llama compare to full-parameter fine-tuning on cross-lingual retrieval …<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10 | 5.3/10 | 2026-06-15 12:06 |
| fb1c1363-5e72-48… | Gate 3 | formula_repro | Does early-layer LoRA fine-tuning improve cross-lingual lexical alignment more effectively than full-model fine-tuning for low-resource Afri…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 7.5/10 | 7.4/10 | 2026-06-15 12:05 |
| 9f2d0d5b-7064-49… | Gate 3 | formula_repro | How does fine-tuning dense retrieval models on WebFAQ's 47 million non-English pairs impact zero-shot cross-lingual transfer accuracy on the…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 3.5/10 | 7.2/10 | 2026-06-15 12:05 |
| c64e666b-968b-44… | Gate 3 | formula_repro | How does training on artificially code-switched data compare to translate-train methods in improving zero-shot cross-lingual retrieval accur…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10 | 6.5/10 | 2026-06-15 12:05 |
| befe8d82-5024-48… | Gate 3 | formula_repro | How does the performance of zero-shot cross-lingual retrieval models trained on artificially code-switched data compare to multilingual pret…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 6.5/10 | 6.2/10 | 2026-06-15 12:05 |
| 05a11663-8824-41… | Gate 3 | formula_repro | How does hybrid batch training for simultaneous monolingual and cross-lingual optimization impact zero-shot retrieval accuracy on out-of-dom…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 9.5/10 | 6.2/10 | 2026-06-15 12:05 |
| e10a1785-aeeb-42… | Gate 3 | formula_repro | Does training on artificially code-switched data improve cross-lingual retrieval precision compared to monolingual training when evaluated o…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.1/10 | 2026-06-15 12:05 |
| 28d7ecd5-529b-47… | Gate 3 | formula_repro | Does training on artificially code-switched data improve cross-lingual robustness on the XQuAD benchmark when evaluated against standard mul…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10 | 8.3/10 | 2026-06-15 12:05 |
| 324f98c4-488c-47… | Gate 3 | formula_repro | Does intermediate-task training on domain-specific English corpora improve zero-shot transfer performance on multilingual domain subsets of …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 6.5/10 | 8.2/10 | 2026-06-15 12:05 |
| 367b042b-53cd-40… | Gate 3 | formula_repro | How does the alignment between MIDI symbolic input and audio output in Tacotron-based models compare to that of neural source-filter wavefor…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10 | 7.8/10 | 2026-06-15 06:04 |
| 853e72b4-662b-4e… | Gate 3 | formula_repro | What is the impact of TLI early-layer LoRA fine-tuning on the robustness of Lugha-Llama against adversarial lexical perturbations in low-res…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10 | 5.7/10 | 2026-06-15 06:04 |
| e655d427-02d9-48… | Gate 3 | formula_repro | How does early-layer LoRA adaptation in Lugha-Llama impact zero-shot cross-lingual retrieval accuracy on noisy Swahili-English datasets comp…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10 | 8.8/10 | 2026-06-15 06:04 |
| b435d9ed-a132-4b… | Gate 3 | formula_repro | How does early-layer LoRA fine-tuning for lexical injection compare to middle-layer adaptation in improving cross-lingual alignment scores o…<br>COUNTEREXAMPLE HUNTER: 3.5/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 9.5/10 | 6.7/10 | 2026-06-15 06:04 |
| c5a6b419-3445-44… | Gate 3 | formula_repro | How does the token prioritization strategy in Vcc affect perplexity scores on the PG-19 benchmark compared to sparse attention patterns like…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 8.5/10 | 8.2/10 | 2026-06-15 06:04 |
| 757b4933-05c3-4b… | Gate 3 | formula_repro | How does cross-lingual query generation compare to direct cross-lingual data training in terms of improving passage representation alignment…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-15 06:03 |
| 7ee1158c-08c4-4d… | Gate 3 | formula_repro | Does augmenting passage representations with generated queries reduce the latency-throughput trade-off in cross-lingual dense retrieval syst…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10 | 8.7/10 | 2026-06-15 06:03 |
| f149150d-181d-4e… | Gate 3 | formula_repro | How does the robustness of zero-shot cross-lingual voice cloning in flow-matching TTS models vary when evaluated on noisy or adversarial inp…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.5/10 | 2026-06-15 06:03 |
| 67bf0cbf-613c-4e… | Gate 3 | formula_repro | How does the combined SFT+DPO alignment strategy impact the reasoning accuracy of OPT-350M on complex multilingual queries relative to stand…<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 6.2/10 | 2026-06-15 06:02 |
| 780db65d-2d7d-42… | Gate 3 | formula_repro | To what extent does increasing the scale of the base language model mitigate the degradation in helpfulness scores observed in OPT-350M afte…<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10 | 4.7/10 | 2026-06-15 06:02 |
| 9013c53e-30cb-47… | Gate 3 | formula_repro | How does early-layer LoRA adaptation for lexical alignment in Lugha-Llama compare to full fine-tuning in zero-shot cross-lingual transfer ac…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10 | 6.2/10 | 2026-06-15 00:02 |
| ae131cd2-d64a-42… | Gate 3 | formula_repro | Does the latent cross-lingual alignment achieved via Targeted Lexical Injection in Lugha-Llama generalize to zero-shot machine translation p…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10 | 5.3/10 | 2026-06-15 00:01 |
| 8d6c501f-4675-4a… | Gate 3 | formula_repro | Do auxiliary objectives with factorized latent dynamics improve sample efficiency in small-scale Video-JEPA training relative to standard jo…<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 6.5/10 | 4.6/10 | 2026-06-15 00:01 |
| 9ade25b1-b2f8-40… | Gate 3 | formula_repro | What is the effect of factorized latent dynamics auxiliary objectives on the transfer learning performance of Video-JEPA when evaluated on d…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.2/10 · REPLICATION ATTACKER: 6.5/10 | 7.1/10 | 2026-06-15 00:01 |
| afe7316d-46f8-46… | Gate 3 | formula_repro | How does CLIP-TD's zero-shot transfer accuracy on domain-shifted vision-language tasks compare to standard CLIP fine-tuning methods?<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 8.5/10 | 7.5/10 | 2026-06-14 18:01 |
| db269ac8-afd2-4c… | Gate 3 | formula_repro | How does the scaling of self-supervised pretraining data size affect the performance of few-shot meta-learners on language model benchmarks …<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10 | 5.0/10 | 2026-06-14 18:01 |
| 72958e36-47e4-4e… | Gate 3 | formula_repro | What is the impact of integrating motion-image diffusion priors on the robustness of vision-language-action models against adversarial pertu…<br>COUNTEREXAMPLE HUNTER: 7.2/10 · CITATION AUDITOR: 3.2/10 · REPLICATION ATTACKER: 8.5/10 | 6.3/10 | 2026-06-14 18:01 |
| 96f30ae4-f497-45… | Gate 3 | formula_repro | How does the cross-lingual voice cloning performance of flow-matching TTS models compare to diffusion-based TTS models when evaluated on uns…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10 | 6.2/10 | 2026-06-14 18:01 |
| cad169a5-a675-41… | Gate 3 | formula_repro | How does the performance of Targeted Lexical Injection (TLI) compare to full fine-tuning and adapter-based methods on the XTREME-R benchmark…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-14 18:01 |
| cf8d5c87-fe0e-4d… | Gate 3 | formula_repro | How does hybrid batch training affect the zero-shot cross-lingual retrieval accuracy of mBERT on low-resource language pairs compared to mon…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.0/10 | 9.2/10 | 2026-06-14 17:59 |
| 463bd35f-feb7-45… | Gate 3 | formula_repro | Does synergistic optimization of monolingual and cross-lingual objectives improve generalization to unseen language pairs in the BEIR zero-s…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 2.5/10 | 7.0/10 | 2026-06-14 17:58 |
| 964ed2b1-5e26-41… | Gate 3 | formula_repro | How does hybrid batch training affect zero-shot retrieval accuracy on low-resource languages in the XTREME benchmark compared to dedicated m…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.0/10 | 9.2/10 | 2026-06-14 17:58 |
| a6b4186a-a702-48… | Gate 3 | formula_repro | What is the comparative robustness of CausalMixFT-generated synthetic data against other data augmentation methods (e.g., GAN-based or diffu…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10 | 5.7/10 | 2026-06-14 11:57 |
| 42f5bcad-eb35-4b… | Gate 3 | formula_repro | To what extent does CausalMixFT fine-tuning improve the generalization accuracy of tabular foundation models under data scarcity compared to…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10 | 7.8/10 | 2026-06-14 11:57 |
| d9c74c0f-cb53-41… | Gate 3 | formula_repro | How does the F1-score of multilingual transformer models compare to monolingual models when evaluated on code-mixed hate speech datasets wit…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.5/10 | 2026-06-14 11:57 |
| 5ec1afab-3eea-4c… | Gate 3 | formula_repro | How does early-layer LoRA lexical injection compare to middle-layer adaptation in improving zero-shot cross-lingual retrieval accuracy for S…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-14 11:56 |
| 4e42da41-1273-43… | Gate 3 | formula_repro | To what extent does Targeted Lexical Injection improve cross-lingual alignment scores on the XCOPA dataset for underrepresented Bantu langua…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10 | 6.8/10 | 2026-06-14 11:56 |
| 5073185f-b8b7-47… | Gate 3 | formula_repro | What is the impact of context window size on the retrieval-augmented generation performance of quantized LoRA-adapted models when evaluating…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.5/10 | 2026-06-14 11:56 |
| e486f159-c9b9-4f… | Gate 3 | formula_repro | How does the alignment of multimodal embeddings (e.g., text and audio) in MUST-RAG affect the consistency and robustness of generated answer…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 9.5/10 | 6.8/10 | 2026-06-14 11:55 |
| 1fafcd2c-2be3-43… | Gate 3 | formula_repro | How does the fidelity of structural causal models used for data augmentation impact the few-shot classification accuracy of fine-tuned tabul…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 3.0/10 | 6.5/10 | 2026-06-14 11:55 |
| e404ac75-ed39-43… | Gate 3 | formula_repro | Can targeted lexical injection in Lugha-Llama achieve comparable zero-shot cross-lingual performance to MMPLMs like WMT21fb on clinical doma…<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 8.5/10 | 6.0/10 | 2026-06-14 11:55 |
| 7b8d797a-9b95-44… | Gate 3 | formula_repro | How does the use of causal data augmentation techniques like CausalMixFT compare to traditional data augmentation methods (e.g., SMOTE, GAN-…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 8.5/10 | 8.9/10 | 2026-06-14 05:55 |
| cf1474bf-5527-47… | Gate 3 | formula_repro | What is the accuracy degradation of generalized zero-shot learning models under norm-bounded perturbations across unseen classes?<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-14 05:54 |
| a7211fa0-6c1a-43… | Gate 3 | formula_repro | How does fine-tuning dense retrieval models on native multilingual WebFAQ data impact zero-shot cross-lingual retrieval accuracy on XQuAD co…<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 8.5/10 | 6.0/10 | 2026-06-14 05:54 |
| 99730361-463c-4d… | Gate 3 | formula_repro | Does early-layer LoRA fine-tuning improve zero-shot cross-lingual natural language inference accuracy for low-resource African languages com…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.0/10 | 2026-06-14 05:54 |
| fdb047e5-d021-45… | Gate 3 | formula_repro | How does the performance of dense retrieval models trained on WebFAQ compare to those trained on Wikipedia-based datasets like Natural Quest…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-14 05:53 |
| 3be16ffd-6b7d-4f… | Gate 3 | formula_repro | How does the ratio of synthetic to real pretraining data impact the few-shot classification accuracy of multimodal video-language models on …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-14 05:53 |
| 038c8225-74bd-49… | Gate 3 | formula_repro | How does contrastive pretraining objective selection impact cross-lingual retrieval accuracy for low-resource language pairs in the XTREME b…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10 | 7.0/10 | 2026-06-14 05:53 |
| d1a71185-38af-4c… | Gate 3 | formula_repro | Does hybrid batch training improve cross-domain generalization for multilingual retrieval models on unseen topics in low-resource languages …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-14 05:53 |
| 70f05cea-fe5d-40… | Gate 3 | formula_repro | What is the impact of hybrid batch training on the scaling behavior of zero-shot retrieval accuracy when extending from low-resource to high…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10 | 5.0/10 | 2026-06-14 05:53 |
| 86a0376b-0fad-42… | Gate 3 | formula_repro | What is the impact of mixed-precision inference (e.g., FP16 vs. BF16) on the efficiency-accuracy trade-off for long-context models like Long…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.0/10 | 2026-06-14 05:53 |
| d7973fbb-05f8-47… | Gate 3 | formula_repro | How does the zero-shot cross-lingual retrieval accuracy of a multilingual encoder pre-trained on WebFAQ's 47M non-English QA pairs compare t…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.0/10 · REPLICATION ATTACKER: 9.5/10 | 6.8/10 | 2026-06-13 23:53 |
| 3378e994-9dc5-4c… | Gate 3 | formula_repro | Does scaling the proportion of non-English WebFAQ fine-tuning data improve retrieval latency and accuracy trade-offs for cross-lingual tasks…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-13 23:52 |
| 36a277bf-5c26-40… | Gate 3 | formula_repro | What is the impact of incorporating visual modality into self-supervised learning for speech representations on the robustness of neural sou…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 4.5/10 · REPLICATION ATTACKER: 6.5/10 | 6.7/10 | 2026-06-13 23:52 |
| 809b945c-726a-44… | Gate 3 | formula_repro | What is the impact of mixed-dataset pretraining versus single-dataset pretraining on the robustness of Video-JEPA representations to tempora…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10 | 8.8/10 | 2026-06-13 23:52 |
| da64baea-9f76-40… | Gate 3 | formula_repro | What is the impact of varying the ratio of synthetic to real data in CausalMixFT on the fine-tuning performance of tabular foundation models…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-13 23:51 |
| b83881e6-6035-48… | Gate 3 | formula_repro | What is the impact of causal data augmentation proportions on the sample efficiency and convergence speed of fine-tuning tabular foundation …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-13 23:51 |
| c90f70ac-13c1-4c… | Gate 3 | formula_repro | What is the correlation between the fidelity of synthetic tabular samples generated via SCMs and the downstream fine-tuning performance of f…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 7.0/10 | 8.4/10 | 2026-06-13 23:51 |
| 0b01bfe2-6818-4c… | Gate 3 | formula_repro | Does integrating causal structure into synthetic data generation improve the robustness of TabPFN against feature permutation compared to st…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 8.5/10 | 8.5/10 | 2026-06-13 23:51 |
| b744a045-e120-4e… | Gate 3 | formula_repro | What is the impact of varying the proportion of causal synthetic data during fine-tuning on the robustness of tabular foundation models acro…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10 | 8.7/10 | 2026-06-13 23:51 |
| 1ab47328-816a-4c… | Gate 3 | formula_repro | How does the robustness of dense retrievers pretrained on WebFAQ compare to those trained on monolingual datasets when evaluated on adversar…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 8.8/10 | 2026-06-13 23:50 |
| e3b701fc-8dd1-45… | Gate 3 | formula_repro | How does the performance of Video-JEPA models with factorized latent dynamics compare to non-factorized variants when evaluated on the Somet…<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 8.5/10 | 5.0/10 | 2026-06-13 17:50 |
| 1613d088-356e-46… | Gate 3 | formula_repro | What is the impact of mixed-dataset pretraining (UCF-101 + Something-Something V2 + ImageNet-100) on the accuracy of Video-JEPA models with …<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 8.5/10 | 6.2/10 | 2026-06-13 17:50 |
| 51b48b75-c13e-4d… | Gate 3 | formula_repro | Does the robustness gained from Targeted Lexical Injection in Lugha-Llama generalize to code-switched social media text as measured by F1 sc…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10 | 7.8/10 | 2026-06-13 17:50 |
| fc099746-cbba-4e… | Gate 3 | formula_repro | Does fine-tuning tabular foundation models with Structural Causal Model-based synthetic data improve generalization accuracy more than stand…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 8.5/10 | 6.5/10 | 2026-06-13 17:50 |
| 2003206f-03e3-4e… | Gate 3 | formula_repro | Does combining ImageNet-100 with video datasets improve the domain robustness of self-supervised Video-JEPA representations on heterogeneous…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10 | 5.3/10 | 2026-06-13 17:50 |
| a836297c-8d52-40… | Gate 3 | formula_repro | What is the impact of varying the rank of LoRA matrices on cross-lingual alignment for Turkic languages when fine-tuned on early layers, eva…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10 | 5.3/10 | 2026-06-13 17:50 |
| 469e9bad-7f2e-41… | Gate 3 | formula_repro | How does the generalization performance of CausalMixFT compare to other data augmentation methods (e.g., Mixup, SMOTE) when fine-tuning tabu…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10 | 8.9/10 | 2026-06-13 17:50 |
| d74469d0-67aa-44… | Gate 3 | formula_repro | How does fine-tuning dense retrieval models on WebFAQ's 47 million non-English pairs impact zero-shot cross-lingual transfer accuracy on the…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-13 17:50 |
| a76c6c0e-24bb-4e… | Gate 3 | formula_repro | What is the comparative robustness of early-layer LoRA versus full-parameter fine-tuning for Lugha-Llama on cross-lingual natural language i…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-13 17:50 |
| 8df4041f-79f6-45… | Gate 3 | formula_repro | How does the incorporation of auxiliary objectives in Video-JEPA models impact the robustness of learned representations when evaluated on o…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10 | 6.7/10 | 2026-06-13 11:49 |
| f33863b8-6dc3-46… | Gate 3 | formula_repro | How does factorized latent dynamics in Video-JEPA compare to standard JEPA in cross-domain transfer accuracy from synthetic to real-world vi…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-13 11:49 |
| 063ca2b8-24d2-46… | Gate 3 | formula_repro | What is the impact of varying the number of LoRA layers on cross-lingual lexical alignment in Lugha-Llama when benchmarked against the FLORE…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-13 11:49 |
| 9dadb434-2322-46… | Gate 3 | formula_repro | How do bitwise neural networks with stochastic inference techniques perform in comparison to full-precision networks with Monte Carlo dropou…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.5/10 | 2026-06-13 11:49 |
| bebb97c4-3ac3-46… | Gate 3 | formula_repro | What is the effect of the SFT+DPO alignment strategy on the helpfulness retention rate of OPT-350M when evaluated on the Anthropic Helpful-H…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-13 11:48 |
| 8a29c57f-aef4-47… | Gate 3 | formula_repro | How does retrieval-augmented revision compare to adversarial training in improving Big-Vul detection accuracy for Llama-3.1-8B without requi…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10 | 7.7/10 | 2026-06-13 11:48 |
| dbd73bd9-a579-40… | Gate 3 | formula_repro | How does fine-tuning dense retrieval models on the non-English subset of WebFAQ impact cross-lingual zero-shot performance on TyDi QA compar…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 7.5/10 | 5.0/10 | 2026-06-13 11:48 |
| e890a0c6-75f2-41… | Gate 3 | formula_repro | What is the impact of fine-tuning WebFAQ-pretrained dense retrieval models on downstream cross-lingual NLI tasks, as measured by XNLI accura…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-13 11:48 |
| 6910a60a-f0a2-40… | Gate 3 | formula_repro | Do auxiliary factorized objectives in Video-JEPA improve few-shot learning performance on fine-grained video benchmarks relative to non-fact…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 9.5/10 | 8.1/10 | 2026-06-13 11:48 |
| 31ae88ad-ede0-43… | Gate 3 | formula_repro | To what extent does Direct Preference Optimization enhance the robustness of counter-speech models against adversarial hate speech inputs co…<br>COUNTEREXAMPLE HUNTER: 4.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 7.7/10 | 2026-06-13 05:48 |
| d82913b1-e2c4-40… | Gate 3 | formula_repro | How does retrieval diversity in music-specific RAG frameworks impact answer robustness against adversarial perturbations compared to general…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 8.5/10 | 9.0/10 | 2026-06-13 05:48 |
| 3cb37eff-ef87-4e… | Gate 3 | formula_repro | How does the multimodal capture component in Expert Mind affect VQA accuracy on domain-specific datasets compared to text-only RAG baselines…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-13 05:47 |
| e447cada-12a9-4f… | Gate 3 | formula_repro | What is the comparative effect of graph sparsity versus density on the F1-score performance of retrieval-augmented generation models in zero…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 5.5/10 · REPLICATION ATTACKER: 9.5/10 | 8.0/10 | 2026-06-13 05:47 |
| fd93dc1c-d547-4d… | Gate 3 | formula_repro | How does the MRR of cross-lingual dense retrieval models degrade on WebFAQ low-resource language families compared to high-resource ones whe…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10 | 7.8/10 | 2026-06-13 05:47 |
| 840b7dfc-7587-47… | Gate 3 | formula_repro | What is the impact of scaling the multilingual dense retriever model size (e.g., small vs. large) on retrieval performance across low-resour…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 9.5/10 | 6.8/10 | 2026-06-13 05:47 |
| 223a5aad-7c31-46… | Gate 3 | formula_repro | To what extent does training on artificially code-switched data improve cross-lingual retrieval robustness for low-resource languages compar…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-13 05:47 |
| cf7bfdf5-1e02-40… | Gate 3 | formula_repro | Does training dense retrievers on WebFAQ 2.0's bilingual aligned pairs improve zero-shot question answering accuracy on multilingual benchma…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-13 05:46 |
| 6811def3-b807-4f… | Gate 3 | formula_repro | How does fine-tuning dense retrieval models on WebFAQ's non-English subsets impact zero-shot cross-lingual retrieval accuracy on the XTREME …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 7.5/10 | 8.7/10 | 2026-06-13 05:46 |
| 80d91e4a-f8da-4d… | Gate 3 | formula_repro | What is the impact of injecting LoRA adapters exclusively into attention mechanisms versus feed-forward networks in Llama-3.2-3B on the late…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 9.5/10 | 7.5/10 | 2026-06-12 21:25 |
| 1b596d33-8278-4f… | Gate 3 | formula_repro | What is the impact of fine-tuning CodeT5 with adversarial training on its semantic consistency and robustness accuracy in generalized zero-s…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-12 21:23 |
| 04a7cbf6-efb9-4b… | Gate 3 | formula_repro | How does CausalMixFT compare to other data augmentation techniques (e.g., SMOTE, MixUp) in terms of fine-tuning robustness on tabular datase…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10 | 8.5/10 | 2026-06-12 13:50 |
| 5e879d86-d825-41… | Gate 3 | formula_repro | How does the ratio of synthetic-to-real data in CausalMixFT affect the F1 score variance of tabular foundation models on TabFact across mult…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 7.5/10 | 7.8/10 | 2026-06-12 13:50 |
| d96ded43-64c0-44… | Gate 3 | formula_repro | How does evidential deep learning with non-negative evidence constraints affect cross-modal retrieval accuracy on CLIP and ALBEF compared to…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 8.5/10 | 8.2/10 | 2026-06-12 13:50 |
| 6330a381-0e16-4e… | Gate 3 | formula_repro | How does the data augmentation strategy used in scTab compare in effectiveness to other state-of-the-art data augmentation techniques when a…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 8.5/10 | 7.2/10 | 2026-06-12 07:43 |
| 63170d1a-bec2-45… | Gate 3 | formula_repro | To what extent does the causal structure complexity (e.g., number of confounders or mediators) in the SCM used for CausalMixFT affect the ge…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 4.2/10 · REPLICATION ATTACKER: 7.5/10 | 6.7/10 | 2026-06-12 07:43 |
| 475d3f67-fd79-47… | Gate 3 | formula_repro | How does the generalization of scaled tabular models trained on Criteo data perform on unseen high-cardinality categorical features in other…<br>COUNTEREXAMPLE HUNTER: 7.3/10 · CITATION AUDITOR: 4.5/10 · REPLICATION ATTACKER: 6.5/10 | 6.1/10 | 2026-06-12 07:43 |
| 790e88e1-86e1-4d… | Gate 3 | formula_repro | How does the domain gap between synthetic and real-world video data affect the zero-shot accuracy of CLIP-based video encoders in gesture re…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 1.0/10 | 6.3/10 | 2026-06-12 07:43 |
| b22d1b2d-fd2b-41… | Gate 2 | unknown | How do TabPFN, CTGAN, and CausalMixFT perform in cross-domain tabular data generation tasks when evaluated on both synthetic and real-world … | - | 2026-06-12 05:07 |
| 7e2cde64-adf0-4b… | Gate 3 | formula_repro | Can causal synthetic data generation improve the robustness of tabular foundation models against distribution shifts in cross-domain evaluat…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 7.5/10 | 8.7/10 | 2026-06-12 01:36 |
| bc4f6f71-a74f-4b… | Gate 3 | formula_repro | To what extent does the choice of Structural Causal Model (SCM) backbone (e.g., linear vs. nonlinear) in CausalMixFT affect few-shot accurac…<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10 | 5.7/10 | 2026-06-12 01:35 |
| 41ba449b-d600-44… | Gate 3 | formula_repro | How does the CMAL framework's image-text alignment performance on COCO and Flickr30K compare to CLIP and ALBEF in terms of Recall@1 and NDCG…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-12 01:35 |
| aa621143-3436-4a… | Gate 3 | formula_repro | Does the scaling behavior of XSimGCL's contrastive loss formulation yield superior convergence rates compared to LightGCL when trained on de…<br>COUNTEREXAMPLE HUNTER: 10.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 10.0/10 | 9.8/10 | 2026-06-12 01:34 |
| 7958ccbd-1a8f-47… | Gate 3 | formula_repro | What is the impact of the novel web-crawled data collection strategy in WebFAQ 2.0 on the domain generalization capabilities of multilingual…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-11 19:28 |
| 9f94de5f-bbb3-45… | Gate 3 | formula_repro | What is the impact of varying the ratio of synthetic-to-real samples in CausalMixFT on the calibration error and generalization performance …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.2/10 | 9.1/10 | 2026-06-11 19:28 |
| aa6a06f7-1784-40… | Gate 3 | formula_repro | What is the effect of curriculum learning strategies on the accuracy of large multimodal models evaluated on the MedQA benchmark?<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-11 19:28 |
| 8018fed0-9d06-45… | Gate 3 | formula_repro | How does curriculum-based multi-task learning impact the inference latency of large multimodal models on sparse medical image-text pairs?<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-11 19:25 |
| 3c14268f-b85e-4f… | Gate 3 | formula_repro | What is the comparative memory footprint and inference latency of multi-task trained vision-language models versus single-task baselines on …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 0.0/10 | 6.2/10 | 2026-06-11 19:25 |
| a324f8d3-15a0-49… | Gate 3 | formula_repro | How does the stochastic inference technique in bitwise neural networks compare to other ensemble methods (e.g., snapshot ensembles, Monte Ca…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.5/10 | 2026-06-11 19:25 |
| 3bb4313b-0a73-4d… | Gate 3 | formula_repro | To what extent does training dense retrievers on the bilingual aligned QA pairs in WebFAQ 2.0 improve alignment metrics and retrieval robust…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 7.5/10 | 8.7/10 | 2026-06-11 13:19 |
| 46e484e7-8529-4e… | Gate 3 | formula_repro | To what extent does the inclusion of 47 million non-English WebFAQ pairs improve the robustness of multilingual encoders against domain shif…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 9.5/10 | 6.2/10 | 2026-06-11 13:18 |
| 72716d26-4a67-4e… | Gate 3 | formula_repro | How do multilingual dense retrievers trained on SWIM-IR perform on low-resource languages in BEIR compared to models trained on natural mult…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 2.5/10 · REPLICATION ATTACKER: 7.5/10 | 6.3/10 | 2026-06-11 13:18 |
| a9865db0-e4a2-4f… | Gate 3 | formula_repro | How do different alignment strategies in multimodal models impact inference throughput in low-resource settings when evaluated on BRATS with…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-11 13:17 |
| e733595e-23e1-4e… | Gate 3 | formula_repro | What is the comparative robustness of multimodal reasoning in language models with different alignment strategies when applied to cross-doma…<br>COUNTEREXAMPLE HUNTER: 8.2/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10 | 8.7/10 | 2026-06-11 13:17 |
| 6124f4e5-dc19-47… | Gate 3 | formula_repro | To what extent does layer-wise KV cache reconstruction in methods like ReST-KV artificially inflate needle-in-a-haystack scores relative to …<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-11 07:12 |
| 30fba3f7-6edd-44… | Gate 3 | formula_repro | Reproducibility meta-analysis: 3 independent publications report divergent Qwen2.5 performance on Docvqa with an 80.3 percentage-point spread…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.5/10 | 2026-06-11 07:11 |
| 0595bd4f-0470-40… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 7.5/10 | 7.5/10 | 2026-06-11 01:23 |
| 50a4f525-1f3f-43… | Gate 3 | formula_repro | What is the performance degradation of Unified-IO 2 on the VQA-v2 dataset when audio modalities are introduced as distractors versus text-on…<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 5.0/10 · REPLICATION ATTACKER: 7.5/10 | 6.7/10 | 2026-06-11 01:22 |
| f82ac2f4-1a92-4e… | Gate 3 | formula_repro | How does GRACE's quantization-aware training scale with model size, and how does it affect performance on the MME and MM1K benchmarks when a…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.2/10 · REPLICATION ATTACKER: 2.0/10 | 5.9/10 | 2026-06-11 01:22 |
| cc2d0e37-a950-4a… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 3.5/10 | 7.0/10 | 2026-06-10 19:35 |
| 673590a7-25e9-41… | Gate 3 | formula_repro | How does Qwen3's performance on GPQA Diamond compare to other frontier models when evaluated under chain-of-thought prompting versus standar…<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 9.5/10 | 6.2/10 | 2026-06-10 19:35 |
| b472d355-87a8-45… | Gate 3 | formula_repro | How do language models compare to human experts on professional knowledge and science benchmarks v19<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.5/10 | 8.7/10 | 2026-06-10 19:34 |
| b5058ffc-3f4d-46… | Gate 3 | formula_repro | What is the impact of million-token context windows on multimodal reasoning accuracy in Gemini 1.5 Pro versus prior versions?<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10 | 8.8/10 | 2026-06-10 19:34 |
| 09acaf30-ab81-49… | Gate 3 | formula_repro | To what extent does chain-of-thought prompting mitigate performance degradation in long-horizon reasoning tasks for LLMs evaluated on the Bi…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-10 19:34 |
| 031cd03f-2fbe-4d… | Gate 3 | formula_repro | What are the benchmark performance scores of GLM-4.5-Air on reasoning, mathematics, coding, and language understanding tasks?<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 0.0/10 · REPLICATION ATTACKER: 8.5/10 | 5.8/10 | 2026-06-10 19:34 |
| 99e0cc2f-ae34-40… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.2/10 · REPLICATION ATTACKER: 8.5/10 | 8.9/10 | 2026-06-10 16:52 |
| 73ec2b2b-e67b-47… | Gate 3 | formula_repro | What is the cross-domain generalization capability of OpenPangu-7B-MLA on empathetic speech understanding tasks when evaluated on MMSU and o…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-10 16:51 |
| dd13f070-1013-42… | Gate 3 | formula_repro | How does the performance of self-supervised foundation models on tabular data classification compare to standard normalization techniques wh…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.0/10 | 2026-06-10 16:51 |
| 2483aaac-7f84-4c… | Gate 3 | formula_repro | To what extent does fine-tuning on adversarial multi-hop QA examples improve the robustness of RAG systems against distractor contexts compa…<br>COUNTEREXAMPLE HUNTER: 9.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.5/10 | 2026-06-10 16:51 |
| 3cbe8120-1209-45… | Gate 3 | formula_repro | How does fine-tuning on AdvRACE affect the cross-lingual robustness of MRC models when evaluated on adversarial perturbations in non-English…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-10 16:51 |
| 4a909146-446f-4d… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 2.5/10 | 6.3/10 | 2026-06-10 10:48 |
| 426ccfd7-06e6-40… | Gate 3 | formula_repro | How does the integration of non-lexical vocal cues in multimodal language models like OpenPangu-7B-MLA affect downstream task performance on…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 3.5/10 · REPLICATION ATTACKER: 7.5/10 | 6.5/10 | 2026-06-10 10:47 |
| 294a5d5b-f300-40… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10 | 5.7/10 | 2026-06-10 08:45 |
| 18e28019-37fb-4c… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 9.2/10 | 5.6/10 | 2026-06-10 08:45 |
| a80b4a8e-8700-4c… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10 | 8.9/10 | 2026-06-10 08:44 |
| e26d33b4-a5b3-48… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 6.5/10 · REPLICATION ATTACKER: 3.0/10 | 6.2/10 | 2026-06-10 08:44 |
| 30bd9c9a-90c8-4e… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10 | 8.9/10 | 2026-06-10 08:43 |
| 3b783c5d-ec77-4e… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 9.2/10 | 5.9/10 | 2026-06-10 08:42 |
| 388b9655-1a81-4e… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 0.0/10 | 5.8/10 | 2026-06-10 08:42 |
| 42863d1d-2f6a-41… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10 | 8.7/10 | 2026-06-10 08:42 |
| 0e47786d-3f42-43… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 7.5/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10 | 8.2/10 | 2026-06-10 08:41 |
| 3904006d-6cfc-42… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10 | 8.7/10 | 2026-06-10 08:41 |
| 845a22c0-61ad-4e… | Gate 3 | arithmetic_repro | -<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 8.5/10 · REPLICATION ATTACKER: 8.5/10 | 8.7/10 | 2026-06-10 08:41 |
| 42a5d013-2da3-4d… | Gate 3 | unknown | -<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.2/10 | 2026-06-10 08:36 |
| 8520660f-c1c4-4c… | Gate 3 | unknown | -<br>COUNTEREXAMPLE HUNTER: 0.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 6.3/10 | 2026-06-10 08:36 |
| fa1dffe8-f9a9-4f… | Gate 3 | formula_repro | How does the F1-score of diffusion-based tabular generative models compare to CTGAN when augmenting data for training LLMs on imbalanced tex…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-10 08:35 |
| ee851b65-000d-44… | Gate 3 | formula_repro | What is the impact of varying the pretraining dataset size and diversity on the cross-domain generalization capabilities of tabular foundati…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-10 08:35 |
| 11c29061-cf3e-4b… | Gate 3 | formula_repro | Does scaling the size of domain-specific training data for RAG models improve alignment with human evaluators when measured by RAGalyst's me…<br>COUNTEREXAMPLE HUNTER: 8.5/10 · CITATION AUDITOR: 7.5/10 · REPLICATION ATTACKER: 7.5/10 | 7.8/10 | 2026-06-10 08:35 |
| 9f6b0926-918c-40… | Gate 3 | formula_repro | How does the scaling of unlabeled video-audio pretraining data affect the few-shot adaptation accuracy of latent action models on the RoboBe…<br>COUNTEREXAMPLE HUNTER: 9.0/10 · CITATION AUDITOR: 9.5/10 · REPLICATION ATTACKER: 9.5/10 | 9.3/10 | 2026-06-10 08:35 |
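
In every Gate 3 row above, the avg attack column equals the arithmetic mean of the three attacker scores rounded to one decimal place; for example, row c90f70ac gives (9.0 + 9.2 + 7.0) / 3 = 8.4. The survival rule stated at the top of this record can be sketched as follows. This is a minimal sketch, not the pipeline's code: it assumes the unrounded mean is compared against the 3.5 threshold, and the function and constant names are hypothetical.

```python
from statistics import mean

# Thresholds taken from the Gate 3 rule stated at the top of this record.
AVG_KILL_THRESHOLD = 3.5         # an average attack score at or above this kills the claim
INDIVIDUAL_KILL_THRESHOLD = 5.0  # any single attacker score at or above this kills the claim


def gate3_verdict(scores: dict[str, float]) -> tuple[float, bool]:
    """Return (average attack score rounded to 0.1, whether the claim survives).

    Assumption (hypothetical): survival is judged on the unrounded mean; the
    table displays the mean rounded to one decimal place.
    """
    avg = mean(scores.values())
    survives = avg < AVG_KILL_THRESHOLD and all(
        s < INDIVIDUAL_KILL_THRESHOLD for s in scores.values()
    )
    return round(avg, 1), survives


# Scores from row c90f70ac-13c1-4c… in the table above.
row = {
    "COUNTEREXAMPLE HUNTER": 9.0,
    "CITATION AUDITOR": 9.2,
    "REPLICATION ATTACKER": 7.0,
}
print(gate3_verdict(row))  # (8.4, False)
```

Both conditions must hold for survival, so a claim with a low average can still be killed by one high-scoring attacker.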
Math Counterexample Kills (105 total, showing 100)
Conjectures generated by the autonomous math research pipeline and killed at Gate 1 when a numerical counterexample was found. These never reach the Lean 4 proof stage.
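
The search strategy behind these Gate 1 kills is not described in this record, so the following is only a minimal sketch that assumes a brute-force scan over small integers; all helper names are hypothetical. It checks two of the falsified statements listed below. For the Geometric Sum Identity entry (01cae658…), the exact closed form is 3^1 + 3^3 + … + 3^(2n-1) = 3(3^(2n) - 1)/8, so the stated formula is low by a factor of 3 and already fails at n = 1 (3 versus 1). For the Fibonacci-prime entry 6625e04c…, F_4 = 3 is prime while 4 is not, so n = 4 is already a counterexample; F_7 = 13 with 7 ≡ 3 (mod 4) is another.

```python
from typing import Callable, Iterable, Optional


def first_counterexample(claim: Callable[[int], bool], domain: Iterable[int]) -> Optional[int]:
    """Return the first n in `domain` for which `claim(n)` is False, else None."""
    for n in domain:
        if not claim(n):
            return n
    return None


def is_prime(m: int) -> bool:
    """Trial-division primality test; adequate only for small values."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def fib(n: int) -> int:
    """n-th Fibonacci number with F_1 = F_2 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# Geometric Sum Identity (01cae658…): sum of the first n odd powers of 3 == (3^(2n) - 1) / 8.
def odd_powers_of_3_claim(n: int) -> bool:
    lhs = sum(3 ** (2 * k - 1) for k in range(1, n + 1))
    return 8 * lhs == 3 ** (2 * n) - 1  # compare 8*lhs to avoid integer division


# Fibonacci primes (6625e04c…): for n > 3, F_n prime implies n is a prime p
# with p ≡ 1 (mod 4) or p = 3.
def fib_prime_index_claim(n: int) -> bool:
    if not is_prime(fib(n)):
        return True  # premise is false, so the implication holds vacuously
    return is_prime(n) and (n % 4 == 1 or n == 3)


print(first_counterexample(odd_powers_of_3_claim, range(1, 50)))   # 1
print(first_counterexample(fib_prime_index_claim, range(4, 100)))  # 4
```

Because the scan stops at the first failure, it is cheap when a counterexample is small, as in both entries here; a claim that is actually true would force a full scan of the domain, where the trial-division test becomes the bottleneck.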
| Conjecture ID | Problem | Statement (falsified) | Killed (UTC) |
|---|---|---|---|
| 843d975d77414a55… | Ramsey R(5,5) — upper bound improvement | In any 2-coloring of the edges of K_43 that contains no monochromatic K_5, there exists no vertex v such that the red degree of v is exactly 21 AND the red neighborhood of v induces a subgraph containing a red triangle. … | 2026-06-21 01:49 |
| e7d599128e7d45b9… | Twin prime density — Hardy-Littlewood conjecture v | For all integers x >= 100, the absolute difference between the actual count of twin prime pairs up to x and the Hardy-Littlewood prediction (2*C2*x/ln(x)^2) is strictly bounded by the square root of the prediction itself… | 2026-06-20 21:43 |
| 63dbbefc6b334b33… | Twin prime conjecture — density analysis | For every integer N >= 10,000, let T(N) be the count of twin prime pairs (p, p+2) with p <= N. Let S_odd(N) be the sum of the smaller primes p in these pairs where p ends in the digit 3 or 9, and S_even(N) be the sum whe… | 2026-06-20 17:35 |
| 342d5e78b226469c… | Fibonacci primes — density conjecture | For every index n > 4 such that the Fibonacci number F_n is prime, the index n itself must be a prime number that can be expressed as the sum of two squares (i.e., n is a Pythagorean prime or n=2). Consequently, no Fibon… | 2026-06-20 09:23 |
| cd9bb9306d6a4edf… | Fibonacci primes — density conjecture | For all integers n > 4 such that the n-th Fibonacci number F_n is prime, the index n must satisfy n ≡ 1 or 2 (mod 5). Furthermore, if n ≡ 2 (mod 5), then n must be exactly 3. Consequently, for all Fibonacci primes with i… | 2026-06-20 09:23 |
| 01cae6580a0347ec… | Geometric Sum Identity | The sum of the first n odd powers of 3 is given by the closed-form formula (3^(2n) - 1) / 8. | 2026-06-20 03:58 |
| 75ca28388123479e… | Gauss Sum Identity | For any natural number n, the sum of integers from 0 to n multiplied by 2 equals n times (n+1). Specifically verified for n=100. | 2026-06-19 18:21 |
| afd87b814d86453c… | Square Minus Square Factoring | For any natural number n less than 100, the square of n is either even or odd. | 2026-06-19 18:20 |
| 0b56f01b498740fa… | Square Minus Square Factoring | For every natural number n less than 100, the square of n is either even or odd. | 2026-06-19 18:20 |
| 7a77efd70f1a4118… | Quadratic Residue mod 3 | For every natural number n less than or equal to 100, the square of n modulo 3 is either 0 or 1. | 2026-06-19 14:15 |
| a7f5bd3a2fcc4f22… | Primes of form n^2+1 — density and distribution | For the sequence of primes of the form p = n^2 + 1, let S(x) be the set of such primes less than or equal to x. Define the 'quadratic gap ratio' for a prime p = n^2 + 1 (where n > 1) as R(p) = (p_next - p) / (2n), where … | 2026-06-19 01:53 |
| b94993ff3d514132… | Ramsey R(5,5) — upper bound improvement | In any 2-coloring of the edges of K_43 (the current lower bound for R(5,5)) that contains no monochromatic K_5, the maximum number of monochromatic K_4 subgraphs is exactly 204. Furthermore, any such extremal coloring mu… | 2026-06-18 17:32 |
| 68b38389e7ec452c… | Catalan's conjecture (Mihailescu) — Lean4 formal p | For any integer n > 1, if n is a perfect power (n = x^a with x > 1, a > 1), then the distance to the nearest other perfect power m (m != n, m = y^b with y > 1, b > 1) satisfies |n - m| > sqrt(n) * (ln(n))^0.8, with the s… | 2026-06-17 20:27 |
| 6516345988494423… | Catalan's conjecture (Mihailescu) — Lean4 formal p | For any integer n > 8 that is a perfect power (i.e., n = x^a with x, a > 1), the open interval (n, n + n^(5/6)) contains no other perfect powers. This conjecture asserts that for perfect powers greater than 8, the gap to… | 2026-06-17 20:26 |
| 2849c8ac1ec74318… | Geometric Sum Identity | The sum of the first 101 powers of 2 (from 2^0 to 2^100) equals 2^101 - 1. | 2026-06-17 20:26 |
| afc717278ab246f9… | Sum of Odd Numbers Identity | The sum of the first 42 odd positive integers equals 42 squared. | 2026-06-17 16:11 |
| 42ae3e2e624a4a21… | Square Minus Square Factoring | For any natural number n less than 100, the square of n is either even or odd. | 2026-06-17 12:05 |
| df9723f85369422d… | Square Minus Square Factoring | For every natural number n less than 100, the square of n is either even or odd (specifically, n squared modulo 2 is either 0 or 1). | 2026-06-17 12:05 |
| 849d0bcdc5b04211… | Quadratic Residue mod 4 | For every natural number n less than 100, the square of n modulo 4 is either 0 or 1. | 2026-06-17 08:01 |
| 3ab4d13b11594410… | OEIS A001065 — perfect number conjecture | For any even perfect number n > 6, let p be the largest prime factor of n (which is also the Mersenne prime exponent's base, i.e., n = 2^(p-1)*(2^p - 1)). The sum of the proper divisors of the Mersenne prime component (2… | 2026-06-16 23:52 |
| 39276ea98e4f49b0… | Primes of form n^2+1 — density conjecture | For every integer N >= 2, let P_N be the set of primes of the form k^2+1 less than or equal to N. Let M_N be the maximum gap between consecutive elements in the sorted sequence P_N (defining the first gap as p_1 - 0). Th… | 2026-06-16 07:15 |
| 816e34ad26774d21… | Twin prime density — Hardy-Littlewood conjecture v | The ratio of the actual count of twin prime pairs up to x to the Hardy-Littlewood prediction (2*C2*x/ln(x)^2) exhibits a systematic negative bias that decays according to a specific logarithmic correction term. Specifica… | 2026-06-16 03:05 |
| 1f35272d16e64f29… | Twin prime conjecture — density analysis | For any integer N >= 100, let T(N) be the count of twin prime pairs (p, p+2) with p <= N. Let S_3(N) be the count of such pairs where the smaller prime p satisfies p mod 3 = 1. The conjecture states that the deviation of… | 2026-06-15 22:59 |
| 6625e04ce42645a3… | Fibonacci primes — density conjecture | For all integers n > 3, if the n-th Fibonacci number F_n is prime, then n must be a prime number p such that p ≡ 1 (mod 4) or p = 3. In other words, there are no Fibonacci primes with prime indices p where p ≡ 3 (mod 4) … | 2026-06-15 14:49 |
| 6168a219a5854c3f… | Collatz conjecture — structural pattern search | For any integer n > 1, let S(n) be the set of odd integers encountered in the Collatz trajectory of n before reaching 1. Define the 'Odd-Step Parity Signature' P(n) as the sum of the indices (0-based) of all odd elements… | 2026-06-15 06:32 |
| d457ed3a611f478f… | Collatz conjecture — structural pattern search | For any integer n > 1, let S(n) be the set of odd integers encountered in the Collatz trajectory of n before reaching 1 (excluding the final 1). The conjecture states that the sum of the reciprocals of the elements in S(… | 2026-06-15 06:31 |
| b76082e436db434d… | Goldbach conjecture — computational extension | For every even integer n >= 10,000, there exists a Goldbach partition n = p + q (where p and q are prime) such that both p and q lie within the interval [n/2 - sqrt(n), n/2 + sqrt(n)] AND at least one of the primes p or … | 2026-06-15 02:26 |
| 1ffdfdaa6b324dfe… | Primes of form n^2+1 — density and distribution | For the sequence of integers n where n^2+1 is prime, let the gaps be defined as g_k = n_{k+1} - n_k. The conjecture states that for all k >= 2, the gap g_k is strictly less than 2.5 * sqrt(n_k) * ln(ln(n_k)). This refine… | 2026-06-15 02:25 |
| bb34d8f54e2641fb… | Ramsey multiplicity K_4 — minimum number of monoch | In any 2-coloring of the edges of K_18 that achieves the global minimum number of monochromatic K_4 subgraphs, the resulting color classes (graphs) must be isomorphic to each other. Furthermore, each color class must hav… | 2026-06-14 13:39 |
| d1b012cc38cd4dfd… | Fibonacci primes — density conjecture | For every integer n >= 5, if the nth Fibonacci number F_n is prime, then n must be a prime number p such that either p = 5 or p is congruent to 1 or 9 modulo 20. In other words, no Fibonacci prime exists at a prime index… | 2026-06-14 01:01 |
| 2516b894d67544f4… | Catalan's conjecture (Mihailescu) — Lean4 formal p | For any integer n > 1, if there exist two distinct perfect powers P1 = x^a and P2 = y^b (with x,y,a,b > 1) such that P1 < n < P2 and the gap G = P2 - P1 satisfies G < n^(1/3), then n must be equal to 26. Specifically, 26… | 2026-06-13 20:43 |
| e69a3d74fbc1457a… | Primes of form n^2+1 — density and distribution | For any integer N >= 100, let S_N be the set of primes of the form k^2+1 less than or equal to N. Let gaps_N be the sorted list of differences between consecutive elements in S_N. The conjecture states that the standard … | 2026-06-13 12:28 |
| fc8a413e3bb340b7… | Twin prime density — Hardy-Littlewood conjecture v | For all integers k >= 3, let T_k be the k-th twin prime pair (p_k, p_k+2). The fractional part of the square root of the smaller prime, {sqrt(p_k)}, is strictly less than 0.95, with the sole exception of the first twin p… | 2026-06-12 23:58 |
| ec12513113104894… | Twin prime conjecture — density analysis | For any integer N >= 100, let T(N) be the count of twin prime pairs (p, p+2) with p <= N. Let C2 be the twin prime constant (approx 0.66016). The conjecture states that the normalized residual R(N) = (T(N) - 2*C2*N/(ln N… | 2026-06-12 19:52 |
| 70cdbcc71ee04a64… | Fibonacci primes — density conjecture | For all indices n > 4 such that the Fibonacci number F_n is prime, the index n must be a prime number p satisfying the condition that 5 is a quadratic non-residue modulo p (i.e., the Legendre symbol (5/p) = -1), with the… | 2026-06-12 15:06 |
| 07ee02bc74414051… | Square Minus Square Factoring | For every natural number n less than 100, the square of n minus the square of (100 - n) equals 200 times n minus 10000. | 2026-06-12 02:26 |
| 58fbbd5840c74b8d… | Square Minus Square Factoring | For any natural number n less than 100, the square of n modulo 2 is either 0 or 1. | 2026-06-12 02:25 |
| 95f128c744214fc7… | OEIS A001065 — perfect number conjecture | For every even perfect number n > 6, the sum of the proper divisors of n that are congruent to 1 modulo 4 is strictly greater than the sum of the proper divisors congruent to 3 modulo 4. Specifically, if S_1(n) = sum{d |… | 2026-06-11 13:47 |
| 28a8b2a801ae431f… | Geometric Sum Identity | The sum of three consecutive geometric terms (base=2) equals 14. | 2026-06-11 09:38 |
| ee01e3b5cc8d48e1… | Geometric Sum Identity | The sum of powers of 2 from 2^0 to 2^15 equals 2^16 - 1. | 2026-06-11 09:38 |
| f6283ecc58b94fbe… | Square Minus Square Factoring | For every natural number n less than 100, the square of n is congruent to either 0 or 1 modulo 2. | 2026-06-11 04:38 |
| 1ef4058fa9054186… | Square Minus Square Factoring | For every natural number n less than 100, the square of n is either even or odd. | 2026-06-11 04:37 |
| 415722d528c54fb7… | Square Minus Square Factoring | For any natural number n less than 100, the square of n is congruent to either 0 or 1 modulo 2. | 2026-06-11 04:37 |
| 3c69744486e44d43… | Quadratic Residue mod 3 | For every natural number n less than 100, the square of n modulo 3 is either 0 or 1. | 2026-06-11 04:36 |
| 126dd256f8354b53… | Sum of Odd Numbers Identity | The sum of the first 100 odd positive integers equals 10,000. | 2026-06-10 22:08 |
| 3f7781c61e534635… | Square Minus Square Factoring | For every natural number n less than 100, the square of n modulo 2 is either 0 or 1. | 2026-06-10 18:05 |
| 8f940771d9454664… | Square Minus Square Factoring | For every natural number n from 0 to 99, the square of n modulo 2 is either 0 or 1. | 2026-06-10 18:05 |
| b33154755b914337… | Square Minus Square Factoring | For every natural number n less than 100, the square of n is congruent to either 0 or 1 modulo 2. | 2026-06-10 18:05 |
| 208383baf95c4350… | Sum of Odd Numbers Identity | The sum of the first 100 odd positive integers equals 100 squared. | 2026-06-10 09:18 |
| 5e66db8a98eb4f41… | Sum of Odd Numbers Identity | The sum of the first 150 odd positive integers equals 150 squared. | 2026-06-10 09:18 |
| 7f248d0d38c24d74… | Goldbach conjecture — computational extension | For every even integer n >= 100, there exists a Goldbach partition n = p + q (with p <= q) such that the prime p satisfies p > n/2 - sqrt(n) * (ln ln n)^2, AND p is a quadratic residue modulo the smallest prime factor of… | 2026-06-10 07:28 |
| c37b8d2825f34f46… | Primes of form n^2+1 — density and distribution | Let S(x) be the set of integers n in [1, x] such that n^2 + 1 is prime. For any two consecutive elements a, b in S(x) (with a < b), the gap g = b - a satisfies g < 2.5 * sqrt(a) * ln(a) for all x >= 1000. This conjecture… | 2026-06-10 07:25 |
| 007bb20f4be4478b… | Ramsey multiplicity K_4 — minimum number of monoch | In any 2-coloring of the edges of K_18 that achieves the global minimum number of monochromatic K_4 subgraphs, the resulting color classes (graphs) must both be isomorphic to the Turán graph T(18, 3). Consequently, the m… | 2026-06-10 01:47 |
| f600ae4401434fed… | Fibonacci primes — density conjecture | For every Fibonacci prime F_p with prime index p > 3, the quantity (F_p - 1) / p is never an integer. In other words, no Fibonacci prime (beyond F_3=2 and F_4=3, though 4 is not a prime index, specifically checking p=5, 7,… | 2026-06-09 13:24 |
| e0243ef726e64075… | Collatz conjecture — structural pattern search | For any integer n > 1, let S(n) be the set of distinct values visited in the Collatz trajectory of n before reaching 1. Let M(n) be the maximum element in S(n). The conjecture states that the ratio of the count of odd nu… | 2026-06-09 00:53 |
| 45b6484654eb41d2… | Goldbach conjecture — computational extension | For every even integer n > 6, there exists a Goldbach partition n = p + q (with p <= q) such that the smaller prime p satisfies p > sqrt(n) and the product p*q is congruent to 1 modulo 24. | 2026-06-09 00:52 |
| b171ab227ec34a92… | Primes of form n^2+1 — density and distribution | Let P be the set of primes of the form n^2+1. For any x >= 10, let S(x) be the sum of the reciprocals of the square roots of the generators n for all such primes p = n^2+1 <= x. The conjecture states that S(x) is strictl… | 2026-06-08 20:48 |
| afae0570267f4f10… | Twin prime density — Hardy-Littlewood conjecture v | For all integers x >= 10,000, the relative error between the actual count of twin prime pairs up to x and the Hardy-Littlewood prediction (2*C2*x/ln(x)^2) is strictly bounded by the function 1.8 / ln(x). Specifically, |p… | 2026-06-08 07:37 |
| b77b197f38ef42b4… | Ramsey R(4,6) — computational bounds | In any 2-coloring of the edges of K_35 that avoids a red K_4 and a blue K_6 (if such a coloring exists), the maximum degree of any vertex in the red subgraph must be strictly less than 12. That is, Δ(Red) ≤ 11. | 2026-06-08 07:37 |
| b9da4d4e215345c4… | Fibonacci primes — density conjecture | For all integers n > 4, if the nth Fibonacci number F_n is prime, then n is either prime itself or n=4. Furthermore, for every prime index p > 3 such that F_p is composite, F_p possesses at least one prime factor q such … | 2026-06-07 23:17 |
| 24d0c43564104d67… | Goldbach conjecture — extend computational verific | For every even integer n > 10,000, there exists a Goldbach partition n = p + q (where p and q are primes) such that both p and q are 'isolated' within a window of size W(n) = floor(0.8 * ln(n) * ln(ln(n))). Specifically,… | 2026-06-07 15:00 |
| ab2f6f8062e94761… | OEIS A001065 — perfect number conjecture | For every even perfect number n > 6, the sum of the squares of its proper divisors is strictly congruent to 1 modulo the square of its associated Mersenne prime exponent. Specifically, if n = 2^(p-1)(2^p - 1) where p and… | 2026-06-07 14:59 |
| bc6085714d2046c4… | Primes of form n^2+1 — density and distribution | For the sequence of primes of the form p = n^2 + 1, let n_k be the k-th positive integer such that n_k^2 + 1 is prime. The conjecture states that for all k >= 2, the gap between consecutive bases n_k and n_{k-1} satisfie… | 2026-06-07 06:31 |
| 9f7d6c51e37b4088… | Twin prime density — Hardy-Littlewood conjecture v | For all x >= 1000, the actual count of twin prime pairs up to x strictly exceeds the standard Hardy-Littlewood prediction (2*C2*x/ln(x)^2) but remains bounded above by the prediction augmented with a specific second-orde… | 2026-06-06 18:04 |
| 845aaff7aef64a01… | Fibonacci primes — density conjecture | For every integer n >= 3, if the Fibonacci number F_n is prime, then n must be a prime number, AND the index n satisfies the property that 2n+1 is either a prime number or a semiprime (product of exactly two primes, not … | 2026-06-06 09:39 |
| 1028c83002d64c6f… | Catalan's conjecture (Mihailescu) — Lean4 formal p | For any integer n > 1, if n is a perfect power (n = x^a with x, a > 1) and the next consecutive perfect power m (m = y^b with y, b > 1, m > n) satisfies m - n = 1, then n must be 8. Furthermore, for any perfect power n >… | 2026-06-06 05:32 |
| 59707dec0c84466f… | OEIS A001065 — perfect number conjecture | For every even perfect number n > 6, the sum of the binary digits of (n/2) is strictly less than the number of distinct prime factors of (n-1). | 2026-06-06 01:23 |
| e1ea6af8e40f4d3a… | Collatz conjecture — structural pattern search | For any integer n > 1, let S(n) be the set of odd numbers encountered in the Collatz trajectory of n before reaching 1. Let m = min(S(n)). Then the total stopping time (number of steps to reach 1) is strictly less than m… | 2026-06-05 21:15 |
| 52c749523bfe490f… | Primes of form n^2+1 — density conjecture | For any integer n >= 2, let S_n be the set of primes of the form k^2+1 where k <= n. Let M_n be the maximum gap between consecutive elements in the sorted sequence S_n (defining the first gap as p_1 - 2). Then, M_n is st… | 2026-06-05 16:24 |
| b01e6c25195044a2… | Primes of form n^2+1 — density conjecture | For every integer n >= 1, the count of primes of the form k^2 + 1 with k <= n (denoted P(n)) satisfies the inequality P(n) >= floor(1.2 * sqrt(n) / ln(n)). Furthermore, for any n >= 100 where P(n) > 0, the gap between co… | 2026-06-05 16:22 |
| 20a77f3e6ff34241… | Fibonacci primes — density conjecture | For every Fibonacci prime F_p with index p > 5, the integer part of the square root of the index p, denoted as floor(sqrt(p)), is always a prime number. | 2026-06-04 23:37 |
| d177586b3b3d4762… | Primes of form n^2+1 — density and distribution | Let P_N be the set of primes of the form n^2+1 for 1 <= n <= N. Let A_N be the count of such primes where the generator n is itself a prime number. The conjecture states that for all N >= 1000, the ratio of the density o… | 2026-06-04 10:33 |
| 498269cc76514396… | Twin prime density — Hardy-Littlewood conjecture v | The normalized error term of the twin prime count, defined as E(x) = (pi_2(x) * ln(x)^2) / (2 * C2 * x) - 1, exhibits a persistent negative bias for all x in the range [10^4, 10^8]. Specifically, the conjecture states th… | 2026-06-03 22:07 |
| b81ab2bf8a3742e6… | OEIS A001065 — perfect number conjecture | For any even perfect number n > 6, let p be the unique Mersenne prime such that n = 2^(p-1)*(2^p - 1). The sum of the divisors of the exponent (p-1), denoted sigma(p-1), is strictly less than the square root of the Merse… | 2026-06-03 07:23 |
| af2e36aa3e2c473a… | Primes of form n^2+1 — density and distribution | For the sequence of primes of the form n^2+1, let p_k be the k-th such prime. The conjecture states that for all k >= 2, the gap between consecutive primes p_k and p_{k-1} satisfies: p_k - p_{k-1} < 2 * sqrt(p_k) * (ln(p… | 2026-06-03 02:25 |
| 3c752084d9a043ca… | Primes of form n^2+1 — density conjecture | For every integer n >= 2, let S_n be the set of primes of the form k^2+1 with k <= n. Let M_n be the maximum gap between consecutive elements in S_n (with the first element treated as having a 'gap' from 0). Then M_n < 4… | 2026-06-02 22:15 |
| 7c85eafa9f3a4c9b… | Twin prime conjecture — density analysis | For every integer N >= 100, let T(N) be the count of twin prime pairs (p, p+2) with p <= N, and let S(N) be the sum of the reciprocals of the smaller primes in these pairs (i.e., sum(1/p) for all such p). The conjecture … | 2026-06-02 11:16 |
| 17b23802a7a14aa9… | Cap set problem — F_3^n maximum | Conjecture: For n=6, the maximum size of a cap set in F_3^6 is exactly 112, and this maximum is uniquely achieved (up to affine equivalence) by the set of vectors with weight congruent to 1 modulo 3 in the specific coord… | 2026-06-02 04:47 |
| 1bc7acdee264452e… | Catalan's conjecture (Mihailescu) — Lean4 formal p | For any integer n > 1, if n is a perfect power (n = x^a with x, a > 1), then the interval (n, n + n^(2/3)] contains no other perfect powers, except for the specific case where n = 8 (2^3), in which case the interval (8, … | 2026-06-02 04:44 |
| 3f02ace5c31a4891… | Goldbach conjecture — computational extension | For every even integer n > 100, there exists a Goldbach partition n = p + q (with p <= q) such that the prime p lies within the interval [n/2 - sqrt(n), n/2]. Furthermore, the smallest such prime p satisfies the stronger… | 2026-06-01 21:26 |
| 2d0c51fde717499c… | Primes of form n^2+1 — density and distribution | For all integers n >= 2, the gap between consecutive primes of the form k^2+1 is strictly less than 4 * sqrt(p_m) * ln(p_m), where p_m is the smaller prime in the pair. Furthermore, the ratio of the actual gap to this bo… | 2026-06-01 17:21 |
| 2229477e7b1a459e… | Primes of form n^2+1 — density and distribution | For the sequence of primes of the form n^2+1, let p_k = n_k^2+1 be the k-th such prime. The conjecture states that for all k >= 2, the gap between consecutive bases n_k and n_{k-1} satisfies: n_k - n_{k-1} < 2 * sqrt(n_{… | 2026-06-01 17:18 |
| 3032040e036b4ec0… | Primes of form n^2+1 — density conjecture | For every integer n >= 100, the number of primes of the form k^2+1 with k <= n is strictly greater than the number of primes of the form k^2+1 with k <= n/2 multiplied by the factor (1.3 * sqrt(n) / ln(n)). This conjectu… | 2026-06-01 13:41 |
| e2f7b4d3db414cd8… | Twin prime conjecture — density analysis | For every integer N >= 100, let T(N) be the count of twin prime pairs (p, p+2) with p <= N. The ratio of the actual count T(N) to the Hardy-Littlewood estimate E(N) = 2 * C_2 * N / (ln N)^2 (where C_2 is the twin prime c… | 2026-06-01 01:14 |
| 6d026f496bb045ec… | Fibonacci primes — density conjecture | For every integer n > 4, if the n-th Fibonacci number F_n is prime, then n must be a prime number p such that 5 is a quadratic non-residue modulo p (i.e., the Legendre symbol (5/p) = -1). This implies that all Fibonacci … | 2026-05-31 21:04 |
| 1b3e58ad75384c33… | Fibonacci primes — density conjecture | For all integers n > 6, if the nth Fibonacci number F_n is prime, then n must be a prime number that can be expressed as the sum of two squares (i.e., n is 2, or n is a prime congruent to 1 modulo 4). This implies that n… | 2026-05-31 21:03 |
| a29125de7d584756… | Catalan's conjecture (Mihailescu) — Lean4 formal p | For any integer n > 1 that is not 8, if n is a perfect power (n = x^a with x>1, a>1), then the smallest perfect power m > n (where m = y^b with y>1, b>1) satisfies the gap inequality m - n > n^0.55. The only exception to… | 2026-05-31 16:59 |
| 47c3e171d2ff4226… | OEIS A001065 — perfect number conjecture | For any even perfect number n > 6, let m = n/6. The sum of the proper divisors of m (denoted s(m)) is strictly greater than the square of the number of distinct prime factors of m (denoted omega(m)^2). | 2026-05-31 12:56 |
| 9c24c2404c5b4ea2… | Goldbach conjecture — computational extension | The sum of two primes representing an even number n > 2 has its maximal prime difference bounded by n^(0.51), where the exponent 0.51 is strictly between 0.5 and 1. This refines the trivial bound of n-3 by showing the di… | 2026-05-31 08:39 |
| 5a54ab3135de44a4… | Primes of form n^2+1 — density and distribution | For integers n >= 2, let P(n) be the set of primes of the form k^2+1 less than or equal to n. Let G(n) be the maximum gap between consecutive elements in P(n) (with the first gap defined as p_1 - 2). The conjecture state… | 2026-05-31 05:55 |
| 815317887cf646b1… | Primes of form n^2+1 — density and distribution | For any integer N >= 100, let S_N be the set of primes p <= N such that p = k^2 + 1 for some integer k. Let M_N be the maximum gap between consecutive elements in the sorted sequence S_N (with the first gap defined as th… | 2026-05-31 05:55 |
| 3e68141113f44ca9… | Primes of form n^2+1 — density conjecture | For every integer n >= 1, the number of primes of the form k^2 + 1 with k <= n is strictly less than 2 * sqrt(n). Furthermore, the ratio of this count to sqrt(n) never exceeds 1.8 for any n >= 100. | 2026-05-31 01:48 |
| 21e6bfada240446d… | Primes of form n^2+1 — density conjecture | For every integer n >= 2, the number of primes of the form k^2 + 1 with k <= n is strictly greater than the number of integers k <= n such that k^2 + 1 is a product of exactly two distinct primes, both of which are congr… | 2026-05-31 01:48 |
| 3f50b59f69c24ee3… | Twin prime density — Hardy-Littlewood conjecture v | For all integers x >= 10,000, the cumulative count of twin prime pairs pi_2(x) strictly exceeds the first-order Hardy-Littlewood approximation L_1(x) = 2*C_2 * x / (ln x)^2, but remains bounded above by a second-order co… | 2026-05-30 17:37 |
| 84e7e3b311544ebb… | Cap set problem F_3^6 — verify maximum size = 112 | The maximum cap set size in F_3^6 is exactly 112, and this bound is achieved only by the canonical construction S_3^6 ⊂ F_3^6. | 2026-05-30 04:42 |
| 809a9ab0175448e8… | Fibonacci primes — density conjecture | For all integers n >= 3, if the nth Fibonacci number F_n is prime, then the index n must be a prime number p such that p is not a Wieferich prime base 2 (i.e., 2^(p-1) is not congruent to 1 modulo p^2). Furthermore, for … | 2026-05-30 04:30 |
| fe5aa22c047044f1… | Cap set problem — F_3^n maximum | The maximum size of a cap set in F_3^n for n ≤ 8 is bounded above by ⌊2.2^n⌋, and for n = 6, 7, 8 the values are exactly 124, 353, and 994, respectively. | 2026-05-30 01:15 |
| 23f6590eb1bc458c… | Cap set problem — F_3^n maximum | The maximum size of a cap set in F_3^n for n=6 is exactly 112, and this value is achieved by a specific construction based on Edel's bound. | 2026-05-30 01:13 |
| 028910cf4158418c… | Primes of form n^2+1 — density conjecture | The count of primes of the form n^2+1 up to a given bound is asymptotically equal to 2*C*Li(x) where C is a constant approximately 0.685 and Li(x) is the logarithmic integral, with the constant C being related to the pro… | 2026-05-29 19:18 |
| aac0f88db762449e… | Ramsey multiplicity K_4 — minimum number of monoch | In any 2-coloring of K_18, the minimum number of monochromatic K_4 is exactly 18, and this minimum is achieved only by colorings where the graph of one color forms a specific structured graph related to the Turán graph T… | 2026-05-29 16:36 |