#arxiv

[2504.01830] Is Lorentz invariance violation found?

arxiv.org/abs/2504.01830

> ...Very recently, the Carpet collaboration has completed the full data analysis, reporting further support for their previously detected photon now at ${\cal E} = 300^{+ 43}_{- 38} \, {\rm TeV}$, which manifestly clashes with conventional physics
...
> If confirmed by future observations our finding would represent the first positive result in quantum gravity phenomenology.

@physics
#Physics #Relativity #arXiv

DeepSeek: Inference-Time Scaling for Generalist Reward Modeling

arxiv.org/abs/2504.02495

arXiv.org: Inference-Time Scaling for Generalist Reward Modeling
Reinforcement learning (RL) has been widely adopted in post-training for large language models (LLMs) at scale. Recently, the incentivization of reasoning capabilities in LLMs from RL indicates that $\textit{proper learning methods could enable effective inference-time scalability}$. A key challenge of RL is to obtain accurate reward signals for LLMs in various domains beyond verifiable questions or artificial rules. In this work, we investigate how to improve reward modeling (RM) with more inference compute for general queries, i.e. the $\textbf{inference-time scalability of generalist RM}$, and further, how to improve the effectiveness of performance-compute scaling with proper learning methods. For the RM approach, we adopt pointwise generative reward modeling (GRM) to enable flexibility for different input types and potential for inference-time scaling. For the learning method, we propose Self-Principled Critique Tuning (SPCT) to foster scalable reward generation behaviors in GRMs through online RL, to generate principles adaptively and critiques accurately, resulting in $\textbf{DeepSeek-GRM}$ models. Furthermore, for effective inference-time scaling, we use parallel sampling to expand compute usage, and introduce a meta RM to guide voting process for better scaling performance. Empirically, we show that SPCT significantly improves the quality and scalability of GRMs, outperforming existing methods and models in various RM benchmarks without severe biases, and could achieve better performance compared to training-time scaling. DeepSeek-GRM still meets challenges in some tasks, which we believe can be addressed by future efforts in generalist reward systems. The models will be released and open-sourced.
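The voting step the abstract mentions (parallel sampling, then a meta RM guiding the vote) can be pictured with a tiny sketch. This is not DeepSeek's code: it assumes each sampled critique assigns one integer score per candidate response, that "voting" means summing those scores across samples, and that the meta RM's ratings are just given numbers. All function names and data are hypothetical.

```python
from typing import List, Sequence


def vote(sample_scores: Sequence[Sequence[int]]) -> List[int]:
    """Sum each candidate response's pointwise score across sampled critiques.

    sample_scores[j][i] is the score that sample j gave to response i
    (hypothetical data layout).
    """
    n_responses = len(sample_scores[0])
    return [sum(sample[i] for sample in sample_scores) for i in range(n_responses)]


def meta_guided_vote(sample_scores: Sequence[Sequence[int]],
                     meta_scores: Sequence[float],
                     k_meta: int) -> List[int]:
    """Keep the k_meta samples rated highest by a (hypothetical) meta RM, then vote."""
    ranked = sorted(range(len(sample_scores)),
                    key=lambda j: meta_scores[j], reverse=True)
    kept = [sample_scores[j] for j in ranked[:k_meta]]
    return vote(kept)


if __name__ == "__main__":
    # Four parallel samples scoring two candidate responses (made-up numbers).
    samples = [[7, 5], [6, 8], [2, 9], [8, 4]]
    meta = [0.9, 0.8, 0.1, 0.7]  # meta RM's rating of each sample
    print(vote(samples))                       # [23, 26]: response 1 wins
    print(meta_guided_vote(samples, meta, 3))  # [21, 17]: low-rated sample dropped, response 0 wins
```

The example is chosen so that filtering out one low-rated sample flips the winner, which is the intuition behind letting a meta RM guide the vote rather than counting every sample equally.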

Banked Memories for Soft SIMT Processors

arxiv.org/abs/2503.24132

arXiv.org: Banked Memories for Soft SIMT Processors
Recent advances in soft GPGPU architectures have shown that a small (<10K LUT), high performance (770 MHz) processor is possible in modern FPGAs. In this paper we architect and evaluate soft SIMT processor banked memories, which can support high bandwidth (up to 16 ports) while maintaining high speed (over 770 MHz). We compare 9 different memory architectures, including simpler multi-port memories, and run a total of 51 benchmarks (different combinations of algorithms, data sizes and processor memories) to develop a comprehensive set of data which will guide the reader in making an informed memory architecture decision for their application. Our benchmarks are comprised of matrix transpositions (memory intensive) and FFTs (split between memory accesses, floating point, and integer computations) to provide a balanced evaluation. We show that the simpler (but more memory block intensive) multi-port memories offer higher performance than the more architecturally complex banked memories for many applications, especially for smaller memories, but the effective footprint cost of the multi-port memories quickly becomes prohibitive as dataset sizes increase. Our banked memory implementation results - high bandwidth, high Fmax, and high density - can be used for other FPGA applications as well, such as HLS (High Level Synthesis).
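One reason banked memories can lose to multi-port ones is bank conflicts. Here is a minimal model, assuming low-order interleaving (bank = address mod number of banks) and one access per bank per cycle; the architectures evaluated in the paper may map addresses differently. A SIMT-wide access then costs as many cycles as the busiest bank, whereas a true 16-port memory serves any 16 addresses at once. Column reads during a row-major matrix transpose are the classic worst case, and padding the row pitch by one word is a common generic fix.

```python
from collections import Counter
from typing import List


def access_cycles(addresses: List[int], num_banks: int) -> int:
    """Cycles needed to serve one SIMT-wide access to a banked memory.

    Assumes low-order interleaving (bank = address % num_banks) and that
    each bank serves one address per cycle, so the busiest bank sets the cost.
    """
    per_bank = Counter(addr % num_banks for addr in addresses)
    return max(per_bank.values())


def strided(num_threads: int, stride: int) -> List[int]:
    """Word addresses touched when thread t accesses address t * stride."""
    return [t * stride for t in range(num_threads)]


if __name__ == "__main__":
    banks = 16
    print(access_cycles(strided(16, 1), banks))   # 1: unit stride, conflict-free
    print(access_cycles(strided(16, 16), banks))  # 16: column of a 16-wide row-major matrix
    print(access_cycles(strided(16, 17), banks))  # 1: same column with the row pitch padded to 17
```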

Today on the #arXiv:

Narayanan et al. 2025, "Thermal Desorption Kinetics, Binding Energies, and Entrapment of Methyl Mercaptan Ices" - arxiv.org/abs/2504.01102

If you were wondering what a protoplanetary disc might be said to smell like.

arXiv.org: Thermal Desorption Kinetics, Binding Energies, and Entrapment of Methyl Mercaptan Ices
Organosulfur species are potential major carriers of sulfur in the interstellar medium, as well as interesting ingredients in prebiotic chemistry. The most fundamental question regarding these species is under which conditions they reside in the gas versus solid phase. Here, we characterize the thermal desorption kinetics, binding energies, and entrapment of the organosulfur methyl mercaptan (CH$_3$SH, or MeSH) in different ice environments, comparing them with those of methanol (CH$_3$OH, or MeOH) ices. The derived multi-layer (pure MeSH-MeSH) and sub-monolayer (layered MeSH-H$_2$O) binding energies are surprisingly similar, corresponding to snow line locations where the disk midplane temperature is ~105 K. In both H$_2$O-dominated and more realistic H$_2$O:CO$_2$-dominated ices, 100% of the MeSH is entrapped, almost exclusively desorbing at the molecular volcano desorption peak, indicating that MeSH is retained at the water snow line if initially mixed with water ice during formation. Additionally, the presence of MeSH in an ice mixture enhances the entrapment of CO$_2$ and MeOH (up to 100%) until the onset of volcano desorption; without MeSH, both desorb at their respective pure desorption temperatures and also co-desorb with water. Compared to MeOH, MeSH binds less well to water, explaining why MeSH escapes during water ice crystallization rather than co-desorbing with water. These results show the larger relative size of MeSH compared to MeOH significantly impacts its ability to bind to water and its entrapment efficiency. Therefore, molecular size plays an important role in the adsorption and retention of S-bearing organics and, in turn, other volatiles in ices.
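Desorption kinetics and binding energies like these are typically extracted from temperature-programmed desorption experiments with the Polanyi-Wigner rate law. The sketch below simulates a first-order desorption peak under a linear heating ramp. The binding energy (5000 K), pre-exponential factor (1e12 s^-1) and heating rate (0.1 K/s) are hypothetical round numbers, not values from the paper, and real analyses fit more detailed models. The peak appears where the exponentially rising rate constant meets the shrinking ice coverage.

```python
import math
from typing import List, Tuple


def simulate_tpd(e_bind_k: float = 5000.0, nu: float = 1e12, beta: float = 0.1,
                 t_start: float = 50.0, t_end: float = 250.0, dt: float = 0.01,
                 theta0: float = 1.0) -> Tuple[List[float], List[float], float]:
    """First-order Polanyi-Wigner desorption under a linear heating ramp.

    d(theta)/dT = -(nu / beta) * theta * exp(-E_b / T), with E_b in kelvin
    (E / k_B), nu in s^-1 and beta in K/s. Returns temperatures, desorption
    rates per kelvin, and the coverage left at t_end.
    """
    temps, rates = [], []
    theta = theta0
    n_steps = int(round((t_end - t_start) / dt))
    for i in range(n_steps):
        t = t_start + i * dt
        k = (nu / beta) * math.exp(-e_bind_k / t)  # rate constant per kelvin of heating
        temps.append(t)
        rates.append(k * theta)
        theta *= math.exp(-k * dt)  # exact first-order decay over one step
    return temps, rates, theta


def peak_temperature(temps: List[float], rates: List[float]) -> float:
    """Temperature at which the simulated desorption rate is largest."""
    return temps[rates.index(max(rates))]


if __name__ == "__main__":
    temps, rates, left = simulate_tpd()
    print(f"peak at {peak_temperature(temps, rates):.1f} K, coverage left {left:.1e}")
```

Raising the hypothetical binding energy shifts the peak to higher temperature, which is why binding energies translate into snow line locations in a disk.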

Who could have predicted this? 🙄 State-of-the-art LLMs score under 5% on the 2025 USA Math Olympiad despite having been trained extensively on past editions:

arxiv.org/abs/2503.21934

arXiv.org: Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
Recent math benchmarks for large language models (LLMs) such as MathArena indicate that state-of-the-art reasoning models achieve impressive performance on mathematical competitions like AIME, with the leading model, o3-mini, achieving scores comparable to top human competitors. However, these benchmarks evaluate models solely based on final numerical answers, neglecting rigorous reasoning and proof generation which are essential for real-world mathematical tasks. To address this, we introduce the first comprehensive evaluation of full-solution reasoning for challenging mathematical problems. Using expert human annotators, we evaluated several state-of-the-art reasoning models on the six problems from the 2025 USAMO within hours of their release. Our results reveal that all tested models struggled significantly, achieving less than 5% on average. Through detailed analysis of reasoning traces, we identify the most common failure modes and find several unwanted artifacts arising from the optimization strategies employed during model training. Overall, our results suggest that current LLMs are inadequate for rigorous mathematical reasoning tasks, highlighting the need for substantial improvements in reasoning and proof generation capabilities.
#ai #AIhype #llm

[2503.24187] NeuRaLaTeX: A machine learning library written in pure LaTeX
arxiv.org/abs/2503.24187

Wait, what? Written in WHAT??

#arXiv #ML #MachineLearning #LaTeX #TeX #AprilFools (maybe)

arXiv.org: NeuRaLaTeX: A machine learning library written in pure LaTeX
In this paper, we introduce NeuRaLaTeX, which we believe to be the first deep learning library written entirely in LaTeX. As part of your LaTeX document you can specify the architecture of a neural network and its loss functions, define how to generate or load training data, and specify training hyperparameters and experiments. When the document is compiled, the LaTeX compiler will generate or load training data, train the network, run experiments, and generate figures. This paper generates a random 100 point spiral dataset, trains a two layer MLP on it, evaluates on a different random spiral dataset, produces plots and tables of results. The paper took 48 hours to compile and the entire source code for NeuRaLaTeX is contained within the source code of the paper. We propose two new metrics: the Written In Latex (WIL) metric measures the proportion of a machine learning library that is written in pure LaTeX, while the Source Code Of Method in Source Code of Paper (SCOMISCOP) metric measures the proportion of a paper's implementation that is contained within the paper source. We are state-of-the-art for both metrics, outperforming the ResNet and Transformer papers, as well as the PyTorch and Tensorflow libraries. Source code, documentation, videos, crypto scams and an invitation to invest in the commercialisation of NeuRaLaTeX are available at https://www.neuralatex.com
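For comparison, here is roughly the computation the paper's LaTeX performs, written as plain Python with only the standard library: a 100-point two-arm spiral dataset and a two-layer MLP trained by full-batch gradient descent. The spiral parametrisation, hidden width, tanh/sigmoid activations, cross-entropy loss and learning rate are all assumptions; the abstract does not specify them.

```python
import math
import random


def make_spirals(n_points=100, noise=0.02, seed=0):
    """Two interleaved spiral arms, n_points in total, labelled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for i in range(n_points):
        label = i % 2
        t = rng.uniform(0.0, 1.0)
        angle = 3.0 * math.pi * t + math.pi * label  # second arm rotated by 180 degrees
        x = t * math.cos(angle) + rng.gauss(0.0, noise)
        y = t * math.sin(angle) + rng.gauss(0.0, noise)
        data.append(((x, y), label))
    return data


class TwoLayerMLP:
    """2 inputs -> hidden tanh layer -> 1 sigmoid output."""

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(self.w1, self.b1)]
        z = sum(wj * hj for wj, hj in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def loss(self, data):
        """Mean binary cross-entropy over the dataset."""
        eps = 1e-12
        total = 0.0
        for x, y in data:
            _, p = self.forward(x)
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        return total / len(data)

    def train_step(self, data, lr):
        """One full-batch gradient-descent step using hand-written backprop."""
        n_hidden = len(self.w2)
        g_w1 = [[0.0, 0.0] for _ in range(n_hidden)]
        g_b1 = [0.0] * n_hidden
        g_w2 = [0.0] * n_hidden
        g_b2 = 0.0
        for x, y in data:
            h, p = self.forward(x)
            dz = p - y  # dLoss/dz for sigmoid + binary cross-entropy
            g_b2 += dz
            for j in range(n_hidden):
                g_w2[j] += dz * h[j]
                da = dz * self.w2[j] * (1.0 - h[j] ** 2)  # back through tanh
                g_w1[j][0] += da * x[0]
                g_w1[j][1] += da * x[1]
                g_b1[j] += da
        n = len(data)
        for j in range(n_hidden):
            self.w2[j] -= lr * g_w2[j] / n
            self.b1[j] -= lr * g_b1[j] / n
            self.w1[j][0] -= lr * g_w1[j][0] / n
            self.w1[j][1] -= lr * g_w1[j][1] / n
        self.b2 -= lr * g_b2 / n


if __name__ == "__main__":
    train, test = make_spirals(seed=0), make_spirals(seed=1)  # separate random spirals
    model = TwoLayerMLP()
    print(f"train loss before: {model.loss(train):.3f}")
    for _ in range(200):
        model.train_step(train, lr=0.2)
    print(f"train loss after: {model.loss(train):.3f}, test loss: {model.loss(test):.3f}")
```

In CPython this finishes in seconds rather than 48 hours, which is rather the joke.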

Call for participation: *SciVQA* Shared Task (sdproc.org/2025/scivqa.html)

@NFDI4DS members Ekaterina Borisova and Georg Rehm are organizing the Scientific Visual Question Answering (SciVQA) shared task, to be held on July 31 or August 1, 2025, in Vienna, Austria, as part of the SDP 2025 Workshop.

Deadline for system submissions: May 16, 2025

#chart
#diagram
#multimodalQA
#visualattributes
#questionanswering
#arXiv
#SciVQA
#SDP2025
#ACL2025
#Vienna
#codabench
#huggingface
#NFDI4DS

sdproc.org: 5th Workshop on Scholarly Document Processing

4/ Finally: "Written in the Stars: How your (pens and) papers decide the fate of the arXiverse". In a perfect parody of the debate over the H0 tension (different experiments measure different values for the expansion rate of the Universe), the authors measure the expansion rate of the number of papers on arXiv, particularly in the "astro" section. They too find a tension, depending on whether they take cosmology, stellar astrophysics, or galactic-scale papers (and so on) as the reference.

arxiv.org/abs/2503.23957
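An "expansion rate" for a paper count can be estimated like any exponential growth rate: fit a straight line to log(count) against year. A minimal sketch, assuming pure exponential growth; the counts below are synthetic, not arXiv statistics, and the paper's own method may differ. Two subfields growing at different rates give different doubling times, which is the kind of "tension" the post describes.

```python
import math
from typing import Sequence


def growth_rate(years: Sequence[float], counts: Sequence[float]) -> float:
    """Least-squares slope r of log(count) = a + r * year (e-folding rate per year)."""
    n = len(years)
    logs = [math.log(c) for c in counts]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


def doubling_time(rate: float) -> float:
    """Years needed for the count to double at a constant exponential rate."""
    return math.log(2) / rate


if __name__ == "__main__":
    years = list(range(2010, 2021))
    # Synthetic counts for two hypothetical subfields (not real arXiv data).
    field_a = [1000.0 * math.exp(0.05 * (y - 2010)) for y in years]
    field_b = [400.0 * math.exp(0.08 * (y - 2010)) for y in years]
    for name, counts in (("A", field_a), ("B", field_b)):
        r = growth_rate(years, counts)
        print(f"field {name}: rate {r:.3f}/yr, doubling time {doubling_time(r):.1f} yr")
```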


3/ Subtler this time: "On the structure of open clusters: geometric vs geomantic". Here, Gaia satellite data "reveal" that the open star clusters around the Solar System are all oriented toward us! Heliocentrism, conspiracy, cosmic feng shui? No: a short disclaimer at the end of the paper explains that measurement uncertainties are larger along the line of sight when distances are measured by parallax.

arxiv.org/abs/2503.22800
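Why parallax errors stretch clusters toward us: with the naive inverse-parallax distance d [pc] = 1000 / parallax [mas], a parallax error sigma becomes a distance error of roughly 1000 * sigma / parallax^2 pc, while sky positions are far more precise. The Monte Carlo sketch below assumes a made-up cluster (500 pc away, 5 pc Gaussian radius, 0.1 mas parallax errors, exact sky directions); it is not Gaia data and not the paper's analysis.

```python
import math
import random
import statistics
from typing import Tuple


def distance_pc(parallax_mas: float) -> float:
    """Naive inverse-parallax distance: d [pc] = 1000 / parallax [mas]."""
    return 1000.0 / parallax_mas


def distance_spread_pc(parallax_mas: float, sigma_mas: float) -> float:
    """First-order propagated distance error: sigma_d = 1000 * sigma / parallax**2."""
    return 1000.0 * sigma_mas / parallax_mas ** 2


def simulate_cluster(n_stars: int = 500, dist_pc: float = 500.0, radius_pc: float = 5.0,
                     sigma_mas: float = 0.1, seed: int = 1) -> Tuple[float, float]:
    """Return the observed spread roughly along the line of sight (x) and across it (y).

    The cluster is a spherical Gaussian blob centred on the x axis. Only the
    parallax is perturbed; sky directions are treated as exact.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_stars):
        x = dist_pc + rng.gauss(0.0, radius_pc)
        y = rng.gauss(0.0, radius_pc)
        z = rng.gauss(0.0, radius_pc)
        true_d = math.sqrt(x * x + y * y + z * z)
        observed_parallax = 1000.0 / true_d + rng.gauss(0.0, sigma_mas)
        scale = distance_pc(observed_parallax) / true_d  # rescale along the same direction
        xs.append(x * scale)
        ys.append(y * scale)
    return statistics.pstdev(xs), statistics.pstdev(ys)


if __name__ == "__main__":
    print(distance_spread_pc(2.0, 0.1))  # 25.0 pc at 500 pc for a 0.1 mas error
    radial, tangential = simulate_cluster()
    print(f"line of sight: {radial:.1f} pc, across: {tangential:.1f} pc")
```

A spherical 5 pc cluster comes out several times longer along the line of sight than across it, i.e. "pointing at us".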