cs.LG @ 2025-06-01: 1952
-
00 05-29 (4) From Chat Logs to Collective Insights: Aggregative Question Answering Von Chat Logs zu Collective Insights: Aggregative Question Answering 从聊天日志到集体透视:聚合问题解答 2505.23765v1 -
01 05-29 Differential Information: An Information-Theoretic Perspective on Preference Optimization Differentialinformation: Eine informationstheoretische Perspektive zur Preference-Optimierung 差别信息:关于首选优化的信息理论观点 2505.23761v1 -
02 05-29 Model Immunization from a Condition Number Perspective Modell Immunisierung aus einem Zustand Anzahl Perspektive 从条件数字角度进行示范免疫 2505.23760v1 -
03 05-29 Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint Puzzlet von Puzzles: Wenn Vision-Language-Modelle keinen Hinweis aufnehmen können 由谜题拼取的谜题: 当视觉语言模型无法使用提示时 2505.23759v1 -
04 05-29 REOrdering Patches Improves Vision Models REOrdering Patches verbessert Vision Modelle 重新排列补丁改进愿景模式 2505.23751v1 -
05 05-29 Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences? Verzerrung der AI Alignment: Optimiert Preference Optimization für Preferences? AI对齐的扭曲:偏好优化是否优化优惠? 2505.23749v1 -
06 05-29 Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Raum-MLLM: Steigerung der MLLM-Kapazitäten in visueller räumlicher Intelligenz 空间-MLLM:增强以视觉为基础的空间情报中的MLLM能力 2505.23747v1 -
07 05-29 To Trust Or Not To Trust Your Vision-Language Model’s Prediction Vertrauen oder nicht Vertrauen in die Vorhersage Ihres Vision-Sprache-Modells 相信或不相信你的视觉语言模型的预测 2505.23745v1 -
08 05-29 On the Convergence Analysis of Muon Zur Konvergenzanalyse von Muon Muon的趋同分析 2505.23737v1 -
09 05-29 EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast EmotionRankCLAP: Bridging Natural Language Speaking Styles und Ordinal Speech Emotion via Rank-N-Contrast 情感-RankCLAP:通过Ran-N-Contrast将自然语言语言语言的口语风格和普通语言的情感联系起来 2505.23732v1 -
10 05-29 Keep Everyone Happy: Online Fair Division of Numerous Items with Few Copies Halten Sie alle glücklich: Online Fair Division von zahlreichen Artikeln mit wenigen Kopien 让人人快乐:许多物品的在线公平分会,只有很少的影印件。 2408.12845v2 -
11 05-29 MuLoCo: Muon is a practical inner optimizer for DiLoCo MuLoCo: Muon ist ein praktischer Innenoptimierer für DiLoCo MuLoCo: Muon 是 DiLoCo 的实用内部优化器 2505.23725v1 -
12 05-29 SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA SC-LoRA: Ausbalancieren effizienter Feinsteuerung und Wissenserhaltung über Subraum-kontrainierte LoRA SC-LORA:通过分空间训练LORA平衡高效微调和知识保护 2505.23724v1 -
13 05-29 ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering ML-Agent: Verstärkung von LLM-Agenten für autonomes Machine Learning Engineering ML-Agent:强化用于自主机器学习工程的LLM智能体 2505.23723v1 -
14 05-29 Understanding and Mitigating Distribution Shifts For Machine Learning Force Fields Verteilungsverschiebungen für maschinelle Lernkräfte verstehen und abmildern 机器学习领域理解和缩小分布变化 2503.08674v2 -
15 05-29 DiffER: Categorical Diffusion for Chemical Retrosynthesis DiffER: Kategorische Diffusion für chemische Retrosynthese DiffER: 化学复制合成的分类扩散 2505.23721v1 -
16 05-29 COBRA: Contextual Bandit Algorithm for Ensuring Truthful Strategic Agents COBRA: Kontextueller Bandit-Algorithmus für die Sicherung wahrheitsgetreuer strategischer Agenten COBRA: 确保策略性智能体如实行事的上下文赌博机算法 2505.23720v1 -
17 05-29 FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control FastTD3: Einfaches, schnelles und fähiges Verstärkungslernen für die humanoide Kontrolle 快速TD3: 人类控制简单、快速和有能力的强化学习 2505.22642v2 -
18 05-29 TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning TiRex: Nullschnelle Vorhersagen über lange und kurze Horizonte mit verbessertem In-Context-Lernen TiRex: 利用强化的内文学习,对长地和短地平线进行零热预测 2505.23719v1 -
19 05-29 Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation Verborgene Repräsentationen von Foundation-Modellen für die Herzfrequenzschätzung aus der Auskultation 用于从听诊中估计心率的基础模型隐藏表征 2505.20745v2 -
20 05-29 Skin Lesion Phenotyping via Nested Multi-modal Contrastive Learning Haut-Lesion-Phenotypisierung über verschachteltes multimodales kontrastives Lernen 通过Nested多模式反竞争学习进行皮肤脱 Le基因分析 2505.23709v1 -
21 05-29 Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better Wissensisolierende Vision-Sprache-Action-Modelle: Schnell trainieren, schnell laufen, besser generalisieren 知识绝知识的愿景-语言-行动模式:快速列车、快速跑车、更普遍化 2505.23705v1 -
22 05-29 (U)NFV: Supervised and Unsupervised Neural Finite Volume Methods for Solving Hyperbolic PDEs (U)NFV: Überwachte und unüberwachte neurale Finite-Volume-Methoden zur Lösung hyperbolischer PDEs (U) NFV: 被监督和不受监督的解决双曲 PDE 的神经有限量方法 2505.23702v1 -
23 05-29 DiCoFlex: Model-agnostic diverse counterfactuals with flexible control DiCoFlex: Modell-agnostische diverse Gegenfakten mit flexibler Steuerung DiCoFlex:具有灵活控制的模型 – – 不可知性多元反事实 2505.23700v1 -
24 05-29 Computational Algebra with Attention: Transformer Oracles for Border Basis Algorithms Computational Algebra mit Achtung: Transformer Oracles für Border Basis Algorithmen 注意的计算代数:边境基准比值的变异甲骨文 2505.23696v1 -
25 05-29 On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures Über die Ausbildungskonvergenz von Transformern für die In-Context-Klassifizierung von Gauß-Mischungen Gaussian混合物内集成分类变异器培训趋同 2410.11778v3 -
26 05-29 From Individual Experience to Collective Evidence: A Reporting-Based Framework for Identifying Systemic Harms Von der individuellen Erfahrung zu kollektiven Beweisen: Ein meldepflichtiger Rahmen für die Identifizierung systemischer Schäden 从个人经验到集体证据:查明系统危害的报告框架 2502.08166v2 -
27 05-29 Mobi-$π$: Mobilizing Your Robot Learning Policy Mobi-$π$: Mobilisierung Ihrer Roboter-Lernpolitik Mobi-$π$:调动机器人学习政策 2505.23692v1 -
28 05-29 Unifying Perspectives: Plausible Counterfactual Explanations on Global, Group-wise, and Local Levels Vereinheitlichende Perspektiven: Plausible gegenfaktische Erklärungen auf globaler, gruppenweiser und lokaler Ebene 统一观点:关于全球、集团和当地雇员的可视反事实解释 2405.17642v2 -
29 05-29 Learning Compositional Functions with Transformers from Easy-to-Hard Data Komponative Funktionen mit Transformern von einfachen Daten lernen 学习从易读数据转换器的学习构成函数 2505.23683v1 -
30 05-29 Understanding Mode Connectivity via Parameter Space Symmetry Mode-Konnektivität über Parameter Raumsymmetrie verstehen 通过参数空间对称法理解模式连通性 2505.23681v1 -
31 05-29 SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem SVRPBench: Ein realistischer Maßstab für stochastisches Fahrzeugrouting-Problem SVRPBench: 蒸汽车辆流出问题的现实基准 2505.21887v2 -
32 05-29 Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds Bayesische Optimierung durch menschliches Feedback: Nahezu optimale Regret-Schranken 基于人类反馈的贝叶斯优化:近似最优的遗憾界 2505.23673v1 -
33 05-29 GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents GSO: Herausfordernde Software-Optimierungsaufgaben zur Bewertung von SWE-Agenten GSO:评估SWE-Agentics的有挑战的软件优化任务 2505.23671v1 -
34 05-29 Maximizing Confidence Alone Improves Reasoning Maximierung des Vertrauens allein verbessert die Vernunft 使信心最大化单独提高合理性 2505.22660v2 -
35 05-29 SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression SLiM: Ein-Schuss-Quantisierung und Sparsamkeit mit Low-Rank-Annäherung für LLM-Gewichtskompression SLiM: LLM 重量压缩的单射量和与低级别近似相近的分数 2410.09615v3 -
36 05-29 LoLA: Low-Rank Linear Attention With Sparse Caching LoLA: Low-Rank Lineare Aufmerksamkeit mit Sparse Caching LoLA: 低兰克线性注意, 以粗糙的缓存 2505.23666v1 -
37 05-29 AMBER: Adaptive Mesh Generation by Iterative Mesh Resolution Prediction AMBER: Adaptive Mesh-Generierung durch iterative Mesh-Auflösungsvorhersage 以迭代网目分辨率预测的适应性代谢代谢 2505.23663v1 -
38 05-29 Bayesian Perspective on Memorization and Reconstruction Bayesische Perspektive auf Erinnerung und Wiederaufbau Bayes人对记忆和重建的看法 2505.23658v1 -
39 05-29 Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation Aktives Layer-Kontrastives Decodieren reduziert Halluzination bei der Generierung von Großsprachenmodellen 大型语言模式生成中活性多语言解层解码减少幻觉 2505.23657v1 -
40 05-29 How does Transformer Learn Implicit Reasoning? Wie lernt Transformer Implizite Vernunft? 变形者如何学习隐含理由? 2505.23653v1 -
41 05-29 Optimization-Free Diffusion Model – A Perturbation Theory Approach Optimierungsfreies Diffusionsmodell – Ein Perturbationstheorie-Ansatz 优化-无优化传播模式 – – 扰动理论方法 2505.23652v1 -
42 05-29 Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation Merge-Friendly Post-Training Quantization für Multi-Target Domain-Anpassung 多目标域适应培训后量化 2505.23651v1 -
43 05-29 Optimal Bounds for Adversarial Constrained Online Convex Optimization Optimale Grenzen für die Online-Konvergenzoptimierung 优化在线电传优化优化 2503.13366v4 -
44 05-29 Continuous Chain of Thought Enables Parallel Exploration and Reasoning Kontinuierliche Gedankenkette ermöglicht parallele Erkundung und Vernunft 连续思考链有助于平行探索和推理 2505.23648v1 -
45 05-29 Are Reasoning Models More Prone to Hallucination? Sind vernünftigere Modelle eher halluzinierend? 理性模型更能让人产生幻觉吗? 2505.23646v1 -
46 05-29 Towards Unified Attribution in Explainable AI, Data-Centric AI, and Mechanistic Interpretability Auf dem Weg zu einer einheitlichen Attribution in erklärbarer KI, datenzentraler KI und mechanistischer Interpretierbarkeit 实现可解释的AI、数据集中AI和机械可解释性的统一归属 2501.18887v3 -
47 05-29 Global optimization of graph acquisition functions for neural architecture search Globale Optimierung von Graphen-Erfassungsfunktionen für die neuronale Architektursuche 全球优化用于神经结构搜索的图图获取功能 2505.23640v1 -
48 05-29 Position: Scaling LLM Agents Requires Asymptotic Analysis with LLM Primitives Position: Skalierung von LLM-Agenten erfordert asymptotische Analyse mit LLM-Primitiven 位置: 缩放 LLM 代理需要用 LLM 原始功能进行抗药性分析 2502.04358v2 -
49 05-29 MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment MCP Safety Training: Lernen, falsch benachbarte MCP-Exploits mit verbesserter Präferenzausrichtung abzulehnen MCP 安全培训:学会利用改进的优惠协调,错误拒绝 MCP 剥削 2505.23634v1 -
50 05-29 Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection Prompting Whisper für verbesserte wörtliche Transkription und End-to-End-Missue-Erkennung 逐字记录和终端至终端杂项探测 2505.23627v1 -
51 05-29 Quartet: Native FP4 Training Can Be Optimal for Large Language Models Quartet: Natives FP4-Training kann für große Sprachmodelle optimal sein Quartet:原生FP4训练对大语言模型可以是最优的 2505.14669v2 -
52 05-29 SPACE: SPike-Aware Consistency Enhancement for Test-Time Adaptation in Spiking Neural Networks SPACE: SPike-Aware Consistency Enhancement für Test-Time-Anpassung in Spiking Neuronal Networks 空间:在Spiking神经网络中加强在测试-时间适应方面的SPike-Aware一致性增强 2504.02298v2 -
53 05-29 Instance-Optimality for Private KL Distribution Estimation Instanz-Optimalität für private KL-Verteilungsabschätzung 私人 KL 分布分布估计的实情- 最佳度 2505.23620v1 -
54 05-29 Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes Few-Shot-Anpassung der Sprach-Deepfake-Erkennung mit Gaußschen Prozessen 基于高斯过程的少样本语音深度伪造检测自适应 2505.23619v1 -
55 05-29 One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory Eine Trajektorie, ein Token: Erdliche Video-Tokenisierung über panoptische Sub-Objekt-Trajektorie 一个轨迹, 一个 Token: 通过泛光子物件轨迹, 固定的视频轨迹 2505.23617v1 -
56 05-29 Causal Machine Learning in IoT-based Engineering Problems: A Tool Comparison in the Case of Household Energy Consumption Kausales maschinelles Lernen in IoT-basierten Engineering-Problemen: Ein Tool-Vergleich im Fall des Haushaltsenergieverbrauchs 以木工工程问题为基础的因果机械学习:家庭能源消费工具比较 2505.12147v2 -
57 05-29 Learning Interpretable Differentiable Logic Networks for Tabular Regression Learning Interpretable Differentiable Logic Networks for Tabular Regression 用于制表递减的可解释可解释逻辑网络 2505.23615v1 -
58 05-29 Inference-time Scaling of Diffusion Models through Classical Search Inferenzzeit Skalierung von Diffusionsmodellen durch klassische Suche 通过古典搜索对传播模型进行传播的推断-时间缩放 2505.23614v1 -
59 05-29 The Generalized Skew Spectrum of Graphs Das generalisierte Skew-Spektrum der Graphen 普通的Skew图象光谱 2505.23609v1 -
60 05-29 Data Model Design for Explainable Machine Learning-based Electricity Applications Datenmodell-Design für erklärbare maschinelle Learning-basierte Stromanwendungen 可解释机器学习用电力应用数据模型设计 2505.23607v1 -
61 05-29 Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model Muddit: Befreiende Generation jenseits von Text-zu-Bild mit einem Unified Discrete Diffusion Model Muddit: 利用统一分解传播模型在文本到图像之外解放一代 2505.23606v1 -
62 05-29 STeCa: Step-level Trajectory Calibration for LLM Agent Learning STeCa: Schritt-Level-Trajektorienkalibrierung für LLM Agent Learning STeCa:LLM代理学习的职级轨迹校准 2502.14276v2 -
63 05-29 On Transferring Transferability: Towards a Theory for Size Generalization Übertragbarkeit: Auf dem Weg zu einer Theorie der Größenverallgemeinerung 关于转让可转让性:走向一个通用规模理论 2505.23599v1 -
64 05-29 LLM Performance for Code Generation on Noisy Tasks LLM-Performance für Code-Generierung bei lauten Aufgaben LLM 噪音任务代码生成的LLM性能 2505.23598v1 -
65 05-29 Multilook Coherent Imaging: Theoretical Guarantees and Algorithms Multilook Coherent Imaging: Theoretische Garantien und Algorithmen 多视相协调成像:理论保障和理算 2505.23594v1 -
66 05-29 Position: Federated Foundation Language Model Post-Training Should Focus on Open-Source Models Position: Federated Foundation Language Model Nachschulung sollte sich auf Open-Source-Modelle konzentrieren 立场:联邦基金会语文示范培训后培训应侧重于开放来源模式 2505.23593v1 -
67 05-29 Accelerated Training of Federated Learning via Second-Order Methods Beschleunigte Ausbildung des Föderierten Lernens über Methoden der zweiten Ordnung 通过二级方法加快联邦学习培训 2505.23588v1 -
68 05-29 PCA for Enhanced Cross-Dataset Generalizability in Breast Ultrasound Tumor Segmentation PCA für verbesserte Cross-Dataset-Verallgemeinerung in der Brust-Ultraschall-Tumor-Segmentierung PCA:用于在乳腺超声肿瘤分割中增强跨数据集泛化能力 2505.23587v1 -
69 05-29 On-Policy RL with Optimal Reward Baseline On-Policy RL mit optimaler Prämienbasis 具有最佳回报基准的 政策性RL 2505.23585v1 -
70 05-29 Improving Time Series Forecasting via Instance-aware Post-hoc Revision Verbesserung der Zeitreihenprognose über Instance-aware Post-hoc-Revision 通过实例感知的事后修正改进时间序列预测 2505.23583v1 -
71 05-29 Wake-Informed 3D Path Planning for Autonomous Underwater Vehicles Using A* and Neural Network Approximations Wake-Informierte 3D-Pfadplanung für autonome Unterwasserfahrzeuge mit A*- und Neuralnetzwerk-Annäherungen 使用A* 和神经网络相近的自动水下车辆的觉醒3D路径规划 2502.01918v2 -
72 05-29 BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model BioReason: Förderung multimodaler biologischer Vernunft innerhalb eines DNA-LLM-Modells BioReason:在DNA-LLM模型中激励多式生物理由 2505.23579v1 -
73 05-29 CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring CoT Red-Handed: Stresstesting Chain-of-Thought-Überwachung COT 红手:压力测试研究链监测 2505.23575v1 -
74 05-29 Maximum Likelihood Learning of Latent Dynamics Without Reconstruction Maximale Wahrscheinlichkeit Lernen von latenten Dynamiken ohne Rekonstruktion 学习没有重建的原始动力学 2505.23569v1 -
75 05-29 DRO: A Python Library for Distributionally Robust Optimization in Machine Learning DRO: Eine Python-Bibliothek für Distributional Robuste Optimierung im maschinellen Lernen DRO: 一个用于在机器学习中进行分配式强力优化的 Python 图书馆 2505.23565v1 -
76 05-29 Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models Segment Policy Optimization: Effektive Segment-Level-Kreditvergabe in RL für große Sprachmodelle 政策优化优化:大语言模式RL中有效的分部一级信用分配 2505.23564v1 -
77 05-29 LEXam: Benchmarking Legal Reasoning on 340 Law Exams LEXam: Benchmarking der rechtlichen Begründung von 340 Rechtsprüfungen LEXam:340项法律考试的法律依据基准 2505.12864v2 -
78 05-29 Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information Qwen Look Again: Leitende Vision-Sprachen-Reasoning-Modelle, um visuelle Informationen erneut zu speichern 再看一遍:指导视觉信息重新阅读的视觉-语言定位依据模式 2505.23558v1 -
79 05-29 Learning Parametric Distributions from Samples and Preferences Parametrische Verteilungen aus Proben und Präferenzen lernen 抽样和优惠制的学习参数分布 2505.23557v1 -
80 05-29 Adaptive Federated LoRA in Heterogeneous Wireless Networks with Independent Sampling Adaptives Federated LoRA in heterogenen drahtlosen Netzwerken mit unabhängiger Probenahme 具有独立抽样调查的多源无线网络中的联邦适应性 2505.23555v1 -
81 05-29 Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters Nachhaltiges CO2-basiertes und wassereffizientes LLM-Scheeduling in Geo-verteilten Cloud-Rechenzentren 地球分布云数据中心的可持续碳软件和水效率高的LLM 2505.23554v1 -
82 05-29 Comparing the Moore-Penrose Pseudoinverse and Gradient Descent for Solving Linear Regression Problems: A Performance Analysis Vergleich der Moore-Penrose Pseudoinverse und Gradient Descent zur Lösung linearer Regressionsprobleme: Eine Leistungsanalyse 将摩尔-彭罗斯-普塞多温和梯底比较以解决线性倒退问题:绩效分析 2505.23552v1 -
83 05-29 Diffusion Sampling Correction via Approximately 10 Parameters Diffusions-Probenahmekorrektur über ca. 10 Parameter 通过大约10个参数校正传播抽样校正 2411.06503v3 -
84 05-29 Fast Large Language Model Collaborative Decoding via Speculation Schnelles Large Language Model Kollaboratives Decodieren über Spekulation 通过投机进行快速大语言合作示范模式 2502.01662v2 -
85 05-29 Domain-Aware Tensor Network Structure Search Domain-Aware Tensor Netzwerkstruktur Suche 域- 软件显示器网络网络结构搜索 2505.23537v1 -
86 05-29 It’s a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data Es ist ein (Blind) Match! Richtung Vision-Sprache Korrespondenz ohne Paralleldaten 这是一个( Blind) 匹配! 向没有平行数据的视觉语言对应函授 2503.24129v2 -
87 05-29 NACHOS: Neural Architecture Search for Hardware Constrained Early Exit Neural Networks NACHOS: Neurale Architektur Suche nach Hardware eingeschränkt Early Exit Neural Networks NACHOS: 早期外出神经网络硬件控制系统神经结构搜索 2401.13330v2 -
88 05-29 Subgraph Gaussian Embedding Contrast for Self-Supervised Graph Representation Learning Subgraph Gaussian Einbettungskontrast für selbstüberwachtes Graphen-Darstellungslernen 自支持图表代表制学习的 Subgraph Gaussian 嵌入式对比对比度 2505.23529v1 -
89 05-29 Comparative assessment of fairness definitions and bias mitigation strategies in machine learning-based diagnosis of Alzheimer’s disease from MR images Vergleichende Bewertung von Fairness-Definitionen und Bias-Minderungsstrategien in der maschinellen Lern-basierten Diagnose der Alzheimer-Krankheit aus MR-Bildern 对利用MR图像对阿尔茨海默氏病进行机器学习诊断的公平定义和减少偏见战略的比较评估 2505.23528v1 -
90 05-29 Normalizing Flows are Capable Models for RL Normalisierende Strömungen sind fähige Modelle für RL 正常流动是RL的能力模型 2505.23527v1 -
91 05-29 Accelerating AllReduce with a Persistent Straggler AllReduce mit einem persistenten Straggler beschleunigen 使用持久性斯特拉格驱动器加速全部拖动 2505.23523v1 -
92 05-29 Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents Von Mäusen und Maschinen: Ein Vergleich des Lernens zwischen Real World Mäusen und RL Agenten 小鼠与机器:真实世界小鼠和RL智能体之间的学习比较 2505.12204v2 -
93 05-29 An AI System for Continuous Knee Osteoarthritis Severity Grading Using Self-Supervised Anomaly Detection with Limited Data Ein KI-System für kontinuierliche Knie-Osteoarthritis Schweregraduierung mittels selbstüberwachter Anomalieerkennung mit begrenzten Daten AI 使用有限数据的自超异常检测系统 2407.11500v2 -
94 05-29 SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning SimBa: Einfachheit Bias für das Skalieren von Parametern im Deep Reinforcement Learning SimBa: 深度强化学习中用于扩展参数规模的简单性偏置 2410.09754v2 -
95 05-29 OmniEarth-Bench: Towards Holistic Evaluation of Earth’s Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data OmniEarth-Bench: Auf dem Weg zu einer ganzheitlichen Bewertung der sechs Sphären und der Wechselwirkungen zwischen der Erde und multimodalen Erddaten Omni地球环境:争取全面评价地球六层和与多模式对地观测地球数据交互作用 2505.23522v1 -
96 05-29 AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity AnkerAchtung: Differenz-Bewusst Sparse Achtung mit Streifen Granularität 锁定目标: 带条形颗粒的差别- 软件分散注意 2505.23520v1 -
97 05-29 Hyperspherical Normalization for Scalable Deep Reinforcement Learning Hypersphärische Normalisierung für skalierbares Deep Reinforcement Learning 可缩放深强化学习超球常规化 2502.15280v2 -
98 05-29 SGD Jittering: A Training Strategy for Robust and Accurate Model-Based Architectures SGD Jittering: Eine Trainingsstrategie für robuste und präzise modellbasierte Architekturen SGD Jittering: 面向稳健且准确的基于模型的架构的训练策略 2410.14667v2 -
99 05-29 Joint Localization and Activation Editing for Low-Resource Fine-Tuning Gemeinsame Lokalisierungs- und Aktivierungsbearbeitung für Low-Resource Fine-Tuning 低资源微调联合定位和启动编辑 2502.01179v4 -
100 05-29 DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration DeepFilterGAN: Ein Full-Band-Real-Time-Speech Enhancement-System mit GAN-basierter stochastischer Regeneration DeepFilterGAN:全频实时语音增强系统,以GAN为基础进行蒸汽再生 2505.23515v1 -
101 05-29 Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds Spektrotemporale Modulation: Effiziente und interpretierbare Feature-Darstellung für die Klassifizierung von Sprach-, Musik- und Umweltgeräuschen 时速变化:演讲、音乐和环境声音的分类化演讲、音乐和环境声音的高效和可解释的地物代表 2505.23509v1 -
102 05-29 Why Machine Learning Models Fail to Fully Capture Epistemic Uncertainty Warum Modelle des maschinellen Lernens die epistemische Unsicherheit nicht vollständig erfassen 机器学习模型为何不能完全捕捉认知不确定性 2505.23506v1 -
103 05-29 Hijacking Large Language Models via Adversarial In-Context Learning Entführen von großen Sprachmodellen über das adversarische In-Context-Lernen 通过对抗性内书学习劫持大语言模式 2311.09948v3 -
104 05-29 Epistemic Errors of Imperfect Multitask Learners When Distributions Shift Epistemische Fehler von unvollkommenen Multitask Learner bei Verteilungsverschiebungen 发行转移时不完美的多任务学习者 2505.23496v1 -
105 05-29 Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking Diagnose und Bewältigung von Pitfalls in KG-RAG-Datensätzen: Zu zuverlässigerem Benchmarking 分析和处理KG-RAG数据集的缺陷:争取更可靠的基准 2505.23495v1 -
106 05-29 Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning Kurzbefehle in audio-visuellen Deepfake-Erkennungsdatensätzen mit unüberwachtem Lernen 在未经监督的学习的视听深假发现数据集中绕过捷径 2412.00175v3 -
107 05-29 A False Discovery Rate Control Method Using a Fully Connected Hidden Markov Random Field for Neuroimaging Data Eine falsche Discovery Rate Control-Methode mit einem vollständig verbundenen versteckten Markov Random Field für Neuroimaging-Daten 假发现率控制方法, 使用完全连接的隐藏 Markov 随机字段来生成 Neuroimage 数据 2505.20688v2 -
108 05-29 Learning to Poison Large Language Models for Downstream Manipulation Große Sprachmodelle für Downstream-Manipulation zu vergiften 学习下游操作毒物大语言模式 2402.13459v3 -
109 05-29 SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training SGD als Freie Energie Minimierung: Ein thermodynamischer Blick auf neurales Netzwerktraining SGD作为自由能源最小化:关于神经网络培训的热动力学观点 2505.23489v1 -
110 05-29 Federated Granger Causality Learning for Interdependent Clients with State Space Representation Föderiertes Granger-Causality-Lernen für interdependente Kunden mit staatlicher Raumdarstellung 为具有国家空间代表制的相互依存客户提供 2501.13890v4 -
111 05-29 TimePoint: Accelerated Time Series Alignment via Self-Supervised Keypoint and Descriptor Learning TimePoint: Beschleunigte Zeitreihenausrichtung über selbstüberwachtes Keypoint- und Descriptor-Lernen 时间点:通过自上调关键点和描述学习加速时间序列调整 2505.23475v1 -
112 05-29 Refining Labeling Functions with Limited Labeled Data Verfeinerung von Beschriftungsfunktionen mit begrenzten beschrifteten Daten 用有限标签数据改进标签功能 2505.23470v1 -
113 05-29 Surveying the space of descriptions of a composite system with machine learning Vermessung des Raumes der Beschreibungen eines Verbundsystems mit maschinellem Lernen 勘查机器学习综合系统说明的空间 2411.18579v2 -
114 05-29 Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models Retrieval Visuelle Kontrastive Dekodierung zu Mitigate-Objekt-Halluzinationen in großen Vision-Sprachen-Modellen 在大型视觉-语言模型中,将检索视觉对抗性脱钩作为稀释物体幻觉的大型视觉-语言模型 2505.20569v2 -
115 05-29 A Tutorial on Meta-Reinforcement Learning Ein Tutorial zum Meta-Reinforcement-Lernen 关于元加强学习的教学材料 2301.08028v4 -
116 05-29 Agentic Knowledgeable Self-awareness Agentisch sachkundiges Selbstbewußtsein 智能体的知识型自我意识 2504.03553v2 -
117 05-29 Pessimism Principle Can Be Effective: Towards a Framework for Zero-Shot Transfer Reinforcement Learning Pessimismus-Prinzip kann wirksam sein: Auf dem Weg zu einem Rahmen für Null-Shot-Transfer-Verstärkungs-Lernen 悲观主义原则可以有效:建立一个零热转移强化学习框架 2505.18447v2 -
118 05-29 LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection LENSLLM: Enthüllen von Feintuning-Dynamik für die LLM-Auswahl LENSLLM: 揭示用于LLM选择的微调动态 2505.03793v2 -
119 05-29 Broadband Ground Motion Synthesis by Diffusion Model with Minimal Condition Broadband Ground Motion Synthese durch Diffusion Modell mit minimalem Zustand 以最小条件传播模型进行宽带地面移动合成 2412.17333v2 -
120 05-29 On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment Globale Konvergenzraten für Föderierten politischen Gradienten unter heterogener Umwelt 关于不同不同环境下联邦政策分级制全球趋同率的全球趋同率 2505.23459v1 -
121 05-29 Diffusion Guidance Is a Controllable Policy Improvement Operator Diffusion Guidance ist ein kontrollierbarer Politikverbesserungs-Betreiber 传播指导是可控制的政策改进操作员 2505.23458v1 -
122 05-29 TabReason: A Reinforcement Learning-Enhanced Reasoning LLM for Explainable Tabular Data Prediction TabReason: Ein durch Reinforcement Learning verbessertes Reasoning-LLM für erklärbare tabellarische Datenvorhersage TabReason: 用于可解释表格数据预测的强化学习增强推理LLM 2505.21807v2 -
123 05-29 Learning Cascade Ranking as One Network Kaskaden-Ranking als ein Netzwerk lernen 学习连级安排 “ 一个网络 “ 网络 2503.09492v2 -
124 05-29 DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation DynaMem: Online-Dynamischer Raum-Semantischer Speicher für mobile Manipulationen in der offenen Welt DynaMem: 用于开放世界移动操纵的在线动态空间-空间内存 2411.04999v2 -
125 05-29 Network Inversion for Uncertainty-Aware Out-of-Distribution Detection Netzwerk-Inversion für unsichere Out-of-Distribution-Erkennung 用于不确定性软件发送外检测的网络转换 2505.23448v1 -
126 05-29 GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning GSQ-Tuning: Group-Shared Exponents Integer im vollständig quantisierten Training für das On-Device-Fine-Tuning von LLMs GSQ-Tuning:面向LLM设备端微调的全量化训练中的组共享指数整数 2502.12913v3 -
127 05-29 SCoTT: Strategic Chain-of-Thought Tasking for Wireless-Aware Robot Navigation in Digital Twins SCoTT: Strategisches Chain-of-Thought-Tasking für Wireless-Aware-Roboternavigation in digitalen Zwillingen SCoTT: 数字孪生中无线感知机器人导航的战略思维链任务 2411.18212v2 -
128 05-29 The Strong, Weak and Benign Goodhart’s law. An independence-free and paradigm-agnostic formalisation The Strong, Weak and Benign Goodharts Gesetz. Eine unabhängigkeitsfreie und paradigmatisch-agnostische Formalisierung 强势、弱弱和本尼·古德哈特法,无独立和无范式、不可知的正规化 2505.23445v1 -
129 05-29 Strategic Classification with Non-Linear Classifiers Strategische Klassifizierung mit nicht linearen Klassifikatoren 战略分类与非链分类法战略分类 2505.23443v1 -
130 05-29 Rethinking Regularization Methods for Knowledge Graph Completion Überdenken von Regularisierungsmethoden für Wissensgraphenvervollständigung 重新思考知识图完成正规化方法 2505.23442v1 -
131 05-29 The challenge of hidden gifts in multi-agent reinforcement learning Die Herausforderung der versteckten Gaben in Multi-Agenten-Verstärkung Lernen 多试剂强化学习中隐藏礼品的挑战 2505.20579v2 -
132 05-29 LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty LoTUS: Großformatige Maschine entlernen mit einem Geschmack von Ungewissheit LoTUS: 大型机器与不确定性的味道脱钩 2503.18314v4 -
133 05-29 Bounded-Abstention Pairwise Learning to Rank Gebundene Abhaltung Pairwise Learning to Rank 学习排名 2505.23437v1 -
134 05-29 Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning Trainieren mit Perturbation, schlussfolgern nach Merging: Ein Zwei-Stufen-Rahmen für kontinuierliches Lernen 接转训练、合并后的推推:持续学习的双阶段框架 2505.22389v2 -
135 05-29 Emergent Risk Awareness in Rational Agents under Resource Constraints Emergent Risk Awareness in Rational Agents unter Ressourcenbeschränkungen 资源限制下对合理代理的新兴风险意识 2505.23436v1 -
136 05-29 Diversity-Aware Policy Optimization for Large Language Model Reasoning Diversity-Aware-Politikoptimierung für groß angelegte Sprachmodell-Reasoning 大语言示范理由的多样性政策优化 2505.23433v1 -
137 05-29 Improved Learning via k-DTW: A Novel Dissimilarity Measure for Curves Verbessertes Lernen über k-DTW: Ein neuartiges Maß an Unähnlichkeit für Kurven 通过 k-DTW改进学习:曲线的新差异措施 2505.23431v1 -
138 05-29 Proper Dataset Valuation by Pointwise Mutual Information Richtiger Datensatz Bewertung durch pointwise Gegenseitige Informationen 按点对点相互信息分列的适当数据集估价 2405.18253v3 -
139 05-29 Understanding and Mitigating Overrefusal in LLMs from an Unveiling Perspective of Safety Decision Boundary Überrefusal in LLMs aus Sicht der Sicherheitsentscheidungsgrenze zu verstehen und zu mildern 从安全决策边界的视角理解和缓解LLM的过度拒绝 2505.18325v2 -
140 05-29 On the Validity of Head Motion Patterns as Generalisable Depression Biomarkers Über die Gültigkeit von Head Motion Patterns als Generalisable Depression Biomarkers 头动模式作为可普遍适用的萧条生物标志物的有效性 2505.23427v1 -
141 05-29 Enhanced DACER Algorithm with High Diffusion Efficiency Verbesserter DACER-Algorithmus mit hoher Diffusionseffizienz DACER 高传播效率增强的DACER 计算法 2505.23426v1 -
142 05-29 Hierarchical Neuro-Symbolic Decision Transformer Hierarchischer neuro-symbolischer Entscheidungstransformator 等级性神经-共制决定变换器 2503.07148v3 -
143 05-29 Risk-aware Direct Preference Optimization under Nested Risk Measure Risikobewusste Direktpräferenzoptimierung unter verschachtelter Risikomaßnahme 内层风险措施下认识到风险的直接最优化 2505.20359v2 -
144 05-29 OTPTO: Joint Product Selection and Inventory Optimization in Fresh E-commerce Front-End Warehouses OTPTO: Gemeinsame Produktauswahl und Bestandsoptimierung in Fresh E-Commerce Front-End Warehouses OTPTO: 在新的电子商务前端仓库中联合产品选择和清单优化 2505.23421v1 -
145 05-29 Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition Probeneffiziente menschliche Bewertung großer Sprachmodelle durch maximalen Diskrepanzwettbewerb 通过最大差异竞争对大语言模式进行抽样有效人力评价 2404.08008v2 -
146 05-29 Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates Robustheitskongruente Adversarial Training für sicheres maschinelles Lernen Modellaktualisierungen 安全机器学习模型更新的强力和共性安全机器学习模型自动培训 2402.17390v2 -
147 05-29 Privacy Amplification by Structured Subsampling for Deep Differentially Private Time Series Forecasting Datenschutzverstärkung durch strukturierte Subsampling für tief differential private Zeitreihen Forecasting 以结构化的分抽样对深相异私人时间序列预测进行隐私放大 2502.02410v2 -
148 05-29 On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists On-Device Collaborative Language Modeling über eine Mischung aus Generalisten und Spezialisten 通过通识主义者和专家混合组合的在线合作语言建模 2409.13931v4 -
149 05-29 KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction KVzip: Query-Agnostic KV Cache-Kompression mit Kontext-Rekonstruktion KVzip: 在背景重建中压缩缓存 2505.23416v1 -
150 05-29 Bidirectional predictive coding Bidirektionale vorausschauende Kodierung 双向预测双向预测编码 2505.23415v1 -
151 05-29 Identification and Optimal Nonlinear Control of Turbojet Engine Using Koopman Eigenfunction Model Identifizierung und optimale nichtlineare Steuerung der Turbojet-Engine mit Koopman Eigenfunktionsmodell 使用 Koopman Eigen功能模型对涡轮喷气发动机进行最佳非线性识别和最佳非线性控制 2505.10438v2 -
152 05-29 Buffer-free Class-Incremental Learning with Out-of-Distribution Detection Pufferfreies Klassen-Inkrementelles Lernen mit Out-of-Distribution Detection 含有扩散外检测检测的无缓缓度免费类级学习 2505.23412v1 -
153 05-29 Video Editing for Audio-Visual Dubbing Videobearbeitung für Audio-Visual-Dubbing 音像视频编辑 2505.23406v1 -
154 05-29 A Refined Analysis of UCBVI Eine raffinierte Analyse von UCBVI UCBVI的精细分析 2502.17370v2 -
155 05-29 Closed-form Solutions: A New Perspective on Solving Differential Equations Closed-form Lösungen: Eine neue Perspektive zur Lösung von Differentialgleichungen 封闭式解决办法:解决差异等量的新视角 2405.14620v3 -
156 05-29 Subgroups Matter for Robust Bias Mitigation Untergruppen Materie für robuste Bias Mitigation 稳健的Biust Bias 减轻风险的分组事项 2505.21363v2 -
157 05-29 Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments Entschlüsselung des Interplays zwischen Übertragungseffekten und Belohnungsautokorrelationen in Switchback-Experimenten 在回转实验中解开结转效应与回转回实验中回调自动关系之间的交互作用 2403.17285v3 -
158 05-29 Dynamic Estimation Loss Control in Variational Quantum Sensing via Online Conformal Inference Dynamische Abschätzungsverlustkontrolle bei der variationalen Quantensensing über Online-Konforme Inferenz 通过在线非正式推断在变化量测量中动态估计损失控制 2505.23389v1 -
159 05-29 BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction BatteryLife: Ein umfassender Datensatz und Benchmark für die Vorhersage der Akkulaufzeit 电池寿命:电池寿命预测综合数据集和基准 2502.18807v4 -
160 05-29 A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers Eine statistische Lernperspektive zur halbdualen Neural Optimal Transport Solvers 半对半对半的神经神经优化运输解决方案的统计学习视角 2502.01310v2 -
161 05-29 Automated Modeling Method for Pathloss Model Discovery Automatisierte Modellierungsmethode für Pathloss Model Discovery 病理模型发现自动建模方法 2505.23383v1 -
162 05-29 Tracking Progress Towards Sustainable Development Goal 6 Using Satellite Imagery Fortschritte auf dem Weg zu einer nachhaltigen Entwicklung verfolgen Ziel 6 Nutzung von Satellitenbildern 利用卫星图像跟踪可持续发展目标6的进展情况 2411.19093v2 -
163 05-29 Meta-Learning Approaches for Speaker-Dependent Voice Fatigue Models Meta-Learning-Ansätze für Sprecher-Abhängige Sprachmüdigkeitsmodelle 依赖说话人的嗓音疲劳模型的元学习方法 2505.23378v1 -
164 05-29 GWQ: Gradient-Aware Weight Quantization for Large Language Models GWQ: Gradient-Aware Weight Quantization für große Sprachmodelle GWQ: 大语言模型的渐变软件重量 2411.00850v4 -
165 05-29 Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective Das Nachdenken über die Auswahlkriterien bei der Stärkung des Lernens für LLM-Reasoning: Eine Kompetenz-Schwierigkeits-Alignment-Perspektive 重新思考在加强学习学习中为LLM 合理性提供强化学习的抽样标准:能力-困难-协调观点 2505.17652v2 -
166 05-29 Dynamic Spectral Backpropagation for Efficient Neural Network Training Dynamische Spektral-Backpropagation für effizientes Neural-Netzwerk-Training 促进高效神经网络培训的动态光谱后方通信 2505.23369v1 -
167 05-29 Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs Graph of Records: Steigerung der retrieval Augmented Generation für Langkontext-Zusammenfassung mit Graphen 记录图图:用图表进行长文本摘要的推进检索增量生成器 2410.11001v2 -
168 05-29 Guarantees of a Preconditioned Subgradient Algorithm for Overparameterized Asymmetric Low-rank Matrix Recovery Garantien eines vorkonditionierten Subgradienten Algorithmus für überparameterisierte asymmetrische Low-rank Matrix Erholung 保证为超参数化的测量性对称低级矩阵恢复提供先决条件的亚梯分算法的保障 2410.16826v2 -
169 05-29 Grower-in-the-Loop Interactive Reinforcement Learning for Greenhouse Climate Control Grower-in-the-Loop Interaktives Verstärkungslernen für Greenhouse Climate Control 种植者在Loop-Loop 互动强化学习促进温室气候控制 2505.23355v1 -
170 05-29 ChatHuman: Chatting about 3D Humans with Tools ChatHuman: Chatten über 3D-Menschen mit Tools 聊天:用工具聊天关于3D人类 2405.04533v2 -
171 05-29 BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change BAH-Datensatz für Ambivalenz/Hesitanzerkennung in Videos für Verhaltensänderungen BAH 行为变化视频中双向/隐私识别 BAH 数据集 2505.19328v2 -
172 05-29 Towards Reward Fairness in RLHF: From a Resource Allocation Perspective Zur Belohnung Fairness in RLHF: Aus Ressourcenzuweisungsperspektive 走向RLHF的奖励公平:从资源分配角度 2505.23349v1 -
173 05-29 Sentinel: Scheduling Live Streams with Proactive Anomaly Detection in Crowdsourced Cloud-Edge Platforms Sentinel: Planung von Livestreams mit proaktiver Anomalieerkennung in Crowdsourced Cloud-Edge-Plattformen 哨兵:将现场流排成日程,在人源云源云源平台上进行主动异常探测 2505.23347v1 -
174 05-29 Graph Positional Autoencoders as Self-supervised Learners Graphische Positionale Autoencoder als selbstüberwachte Lernende 作为自监管学习者进行定位自动校对的图形图 2505.23345v1 -
175 05-29 A Descriptor Is All You Need: Accurate Machine Learning of Nonadiabatic Coupling Vectors Ein Deskriptor ist alles, was Sie brauchen: Genaues maschinelles Lernen von nichtadiabatischen Kupplungsvektoren 描述符是你需要的:非非异相叠合矢量的精确机器学习 2505.23344v1 -
176 05-29 Matryoshka Model Learning for Improved Elastic Student Models Matryoshka Model Learning für verbesserte elastische Studentenmodelle Matryoshka 改进弹性学生模式示范学习模式 2505.23337v1 -
177 05-29 X2Graph for Cancer Subtyping Prediction on Biological Tabular Data X2Graph für Krebs Subtyping Vorhersage auf biologische Tabellendaten 用于对生物表表数据进行癌症子图谱预测的X2Graph 2505.23334v1 -
178 05-29 Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization Feintuning Next-Scale Visual Autoregressive Modelle mit gruppenrelativer Politikoptimierung 采用群体相对政策优化优化的 下尺度视觉自动递减模型 2505.23331v1 -
179 05-29 Error Broadcast and Decorrelation as a Potential Artificial and Natural Learning Mechanism Fehlerübertragung und Decorrelation als potenzieller künstlicher und natürlicher Lernmechanismus 错误 广播和装饰关系作为一种潜在的人工和自然学习机制 2504.11558v2 -
180 05-29 Combinatorial Rising Bandit Kombinatorial Rising Bandit 混合崛起强盗 2412.00798v3 -
181 05-29 Efficient Parameter Estimation for Bayesian Network Classifiers using Hierarchical Linear Smoothing Effiziente Parameterschätzung für Bayesian Network Klassifikatoren mit Hierarchical Linear Glättung Bayesian 网络分类器使用等级线性线性平滑法的高效参数参数估测 2505.23320v1 -
182 05-29 A Straightforward Gradient-Based Approach for High-Tc Superconductor Design: Leveraging Domain Knowledge via Adaptive Constraints Ein einfacher gradient-basierter Ansatz für High-Tc-Supraleiter-Design: Nutzung von Domain-Wissen über adaptive Einschränkungen 高Tc超级导体设计的直向渐进式高超导体设计方法:通过适应性制约因素利用域知识 2403.13627v2 -
183 05-29 Enhancing Marker Scoring Accuracy through Ordinal Confidence Modelling in Educational Assessments Verbesserung der Genauigkeit der Markerbewertung durch ordinelles Vertrauensmodellierung in Bildungsbewertungen 通过在教育评估中建立常规信任模型,加强标标码的准确度 2505.23315v1 -
184 05-29 Adversarial Semantic and Label Perturbation Attack for Pedestrian Attribute Recognition Adversariale Semantische und Label-Störung Angriff für Fußgänger Attribute Anerkennung 对抗性语义和Label干扰攻击,以确认佩德斯特属性 2505.23313v1 -
185 05-29 Rethinking Gradient-Based Methods: Multi-Property Materials Design Beyond Differentiable Targets Rethinking Gradient-Based Methods: Multi-Property Materials Design Beyond Differentiable Targets 重新思考渐进方法:超出可区别目标的多财产材料设计 2410.08562v4 -
186 05-29 Score-based Generative Modeling for Conditional Independence Testing Score-basierte Generative Modellierung für die Prüfung der bedingten Unabhängigkeit 有条件独立测试基于记分率生成模型 2505.23309v1 -
187 05-29 MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction MGE-LDM: Gemeinsame Latente Diffusion für simultane Musikgeneration und Quellenextraktion MGE-LDM:同时制作音乐和来源采掘联合前期传播 2505.23305v1 -
188 05-29 Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models Verstehen und Abmildern von Fehlkalibrierung bei sofortiger Tuning für Vision-Language-Modelle 理解和减缓视觉语言模型快速开票时的误差 2410.02681v4 -
189 05-29 How Does Response Length Affect Long-Form Factuality Wie wirkt sich die Response-Länge auf die Langform-Faktizität aus? 反应时间长度如何影响长期事实质量 2505.23295v1 -
190 05-29 Multi-Modal Framing Analysis of News Multi-Modal Framing Analyse der Nachrichten 新闻多模式结构分析 2503.20960v3 -
191 05-29 Comparative Analysis of the Land Use and Land Cover Changes in Different Governorates of Oman using Spatiotemporal Multi-spectral Satellite Data Vergleichende Analyse der Bodennutzungs- und Bodenbedeckungsänderungen in verschiedenen Gouvernements von Oman unter Verwendung spatiotemporaler multispektraler Satellitendaten 利用斯帕蒂多光谱多谱段卫星数据对阿曼不同省份土地利用和土地覆盖变化的比较分析 2505.23285v1 -
192 05-29 Improving Continual Learning Performance and Efficiency with Auxiliary Classifiers Verbesserung der kontinuierlichen Lernleistung und Effizienz mit Hilfsklassifikatoren 提高持续学习成绩和效率,辅级分级 2403.07404v4 -
193 05-29 Optimal Protocols for Continual Learning via Statistical Physics and Control Theory Optimale Protokolle für kontinuierliches Lernen über statistische Physik und Steuerungstheorie 通过统计物理和控制理论不断学习的最佳最佳协议 2409.18061v3 -
194 05-29 LADA: Scalable Label-Specific CLIP Adapter for Continual Learning LADA: Skalierbarer Label-Spezifischer CLIP Adapter für kontinuierliches Lernen LADA:用于持续学习的可缩放标签特定CLIP适应器 2505.23271v1 -
195 05-29 Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs Entfernt Machine Unlearning wirklich Modellwissen? Ein Rahmen für die Prüfung von Unlearning in LLMs 机器取消学习是否真正删除了模型知识? 审计LLM中取消学习的框架 2505.23270v1 -
196 05-29 Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning Behavior-Regularized Diffusion Policy Optimierung für Offline-Verstärkung Lernen 离线强化学习的传播政策优化 2502.04778v2 -
197 05-29 Efficiently Access Diffusion Fisher: Within the Outer Product Span Space Effizienter Zugriff auf Diffusion Fisher: Innerhalb des Outer Product Span Space 有效获取扩散渔渔场:在外生产品空间内 2505.23264v1 -
198 05-29 Stable Thompson Sampling: Valid Inference via Variance Inflation Stabile Thompson-Probenahme: Gültige Schlussfolgerung durch Varianz-Inflation 稳定汤普森抽样:因通货膨胀差异而得出的有效推论 2505.23260v1 -
199 05-29 BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL BOFormer: Lernen, Multi-Objektive Bayesian Optimierung über nicht-Markovian RL zu lösen BOFormer: 学会通过非马尔科维安RL解决多目标巴耶斯最佳利用 2505.21974v2 -
200 05-29 Skywork Open Reasoner 1 Technical Report Skywork Open Reasoner 1 Technischer Bericht Skywork Open Reasoner 1 技术报告 2505.22312v2 -
201 05-29 Tensor Product Attention Is All You Need Tensor Produkt-Achtung ist alles, was Sie brauchen 色素产品 关注是所有你需要的 2501.06425v4 -
202 05-29 Sparseformer: a Transferable Transformer with Multi-granularity Token Sparsification for Medical Time Series Classification Sparseformer: ein übertragbarer Transformer mit Multigranularitäts-Tokensparsifikation für die Klassifizierung medizinischer Zeitreihen 分散式分析器:医疗时间序列分类的可转让变异器,具有多管质质调分法 2503.15578v2 -
203 05-29 RiverMamba: A State Space Model for Global River Discharge and Flood Forecasting RiverMamba: Ein staatliches Weltraummodell für globale Flussentladung und Hochwasserprognose RiverMamba:全球河流排泄和洪水预报国家空间模型 2505.22535v2 -
204 05-29 Accelerating RLHF Training with Reward Variance Increase Beschleunigung des RLHF-Trainings mit Belohnungsvarianzsteigerung 加快RLHF培训,增加奖励差异 2505.23247v1 -
205 05-29 Measuring Participant Contributions in Decentralized Federated Learning Messung der Teilnehmerbeiträge im dezentralisierten Föderierten Lernen 分权联邦学习中的衡量参与者贡献 2505.23246v1 -
206 05-29 Are You Using Reliable Graph Prompts? Trojan Prompt Attacks on Graph Neural Networks Verwenden Sie zuverlässige Graph-Prompts? Trojanische Prompt-Angriffe auf Graph-Neural-Netzwerke 你用的是可靠图形提示吗? Trojan对图形神经网络的迅速攻击 2410.13974v2 -
207 05-29 Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts Autonome Datenauswahl mit Zero-shot Generative Klassifikatoren für mathematische Texte 具有数学文本零光生成分类器的自动数据选择 2402.07625v6 -
208 05-29 Equivalence of stochastic and deterministic policy gradients Gleichwertigkeit stochastischer und deterministischer politischer Gradienten 政策梯度和确定性政策梯度等同 2505.23244v1 -
209 05-29 Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game Sprachagenten mit Verstärkung Lernen für strategisches Spiel im Werwolf Spiel 在狼人游戏中进行战略游戏强化学习的语文代理 2310.18940v4 -
210 05-29 Joint estimation of smooth graph signals from partial linear measurements Gemeinsame Schätzung glatter Graphensignale aus partiellen linearen Messungen 对部分线性测量得出的平滑图示信号的联合估计 2505.23240v1 -
211 05-29 Learn Singularly Perturbed Solutions via Homotopy Dynamics Singulär perturbed Lösungen über Homotopy Dynamics lernen 通过智多基动力学学习单点受扰动的解决方案 2502.00488v3 -
212 05-29 HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model HiDe-LLaVA: Hierarchische Entkopplung zur kontinuierlichen Instruktionstuning von multimodalen Großsprachenmodellen HiDe-LLaVA:多式大语言模式连续教学制导的等级脱钩 2503.12941v2 -
213 05-29 Graph Random Walk with Feature-Label Space Alignment: A Multi-Label Feature Selection Method Graph Random Walk mit Feature-Label-Raumausrichtung: Eine Multi-Label-Feature-Auswahlmethode 带有地貌标签空间对齐的任意漫步图图 : 多标签特征选择方法 2505.23228v1 -
214 05-29 am-ELO: A Stable Framework for Arena-based LLM Evaluation am-ELO: Ein stabiles Rahmenwerk für Arena-basierte LLM-Evaluierung AM-ELO:基于竞技场的LLM评价稳定框架 2505.03475v2 -
215 05-29 Generalizability vs. Counterfactual Explainability Trade-Off Generalisierbarkeit vs. gegenfaktische Erklärbarkeit Trade-Off 通用与反事实解释 2505.23225v1 -
216 05-29 JANET: Joint Adaptive predictioN-region Estimation for Time-series JANET: Gemeinsame adaptive Vorhersage-Region Schätzung für Zeitreihen JANET: 时间序列联合适应性预测N-区域估算 2407.06390v2 -
217 05-29 A Signed Graph Approach to Understanding and Mitigating Oversmoothing in GNNs Ein signierter Graphansatz zum Verständnis und zur Milderung von Oversmoothing in GNNs 理解和缓解GNN中过度平滑问题的符号图方法 2502.11394v2 -
218 05-29 Daunce: Data Attribution through Uncertainty Estimation Daunce: Datenzuweisung durch Unsicherheitsabschätzung Daunce:通过不确定性估计数据归属 2505.23223v1 -
219 05-29 Trajectory Generator Matching for Time Series Trajektorie Generator passend für Zeitreihen 时间序列匹配轨迹生成器 2505.23215v1 -
220 05-29 Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model Engere Datenschutzprüfung von DP-SGD im Hidden State Threat Model 对隐藏国家威胁模式DP-SGD的更严格隐私审计 2405.14457v3 -
221 05-29 Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces Verbesserung der parallelen Programmleistung mit LLM-Optimierern über Agent-System-Schnittstellen 通过代理-系统接口改进与LLM优化器的平行方案绩效 2410.15625v3 -
222 05-29 On the performance of machine-learning-assisted Monte Carlo in sampling from simple statistical physics models Über die Leistung von Monte Carlo mit maschinellem Lernen bei der Probenahme von einfachen Modellen der statistischen Physik 关于机械学习辅助蒙特卡洛利用简单统计物理模型取样的 2505.22598v2 -
223 05-29 Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM Auf dem Weg zu einer robusten, überlappenden Spracherkennung: Ein Lautsprecher-Bewusst-Progressiver Ansatz mit WavLM 争取强劲的超重叠语音探测:使用WavLM 的演讲者-警示渐进方法 2505.23207v1 -
224 05-29 Disentangled Multi-span Evolutionary Network against Temporal Knowledge Graph Reasoning Disentangled Multi-Span Evolutionary Network gegen Temporal Knowledge Graph Reasoning 对抗时间知识图表推理的多空间演进网络 2505.14020v2 -
225 05-29 Aligning Text to Image in Diffusion Models is Easier Than You Think Text an Bild in Diffusions-Modellen ausrichten ist einfacher, als Sie denken 在传播模型中将文本对齐到图像比您想象的容易 2503.08250v4 -
226 05-29 JAPAN: Joint Adaptive Prediction Areas with Normalising-Flows JAPAN: Gemeinsame adaptive Vorhersagebereiche mit Normalisierungs-Flows JAPAN: 联合适应性预测区与标准化花束 2505.23196v1 -
227 05-29 Less is More: Unlocking Specialization of Time Series Foundation Models via Structured Pruning Weniger ist mehr: Unlocking Spezialisierung von Time Series Foundation Models über strukturiertes Pruning 较少是更多:通过结构式普鲁宁解锁时间序列基础模型的专业化 2505.23195v1 -
228 05-29 Multimodal Inverse Attention Network with Intrinsic Discriminant Feature Exploitation for Fake News Detection Multimodale Inverse Aufmerksamkeit Netzwerk mit Intrinsic Discriminant Feature Exploitation für gefälschte Nachrichten Erkennung 多式反向关注网络,利用内在差异性地貌特征利用假新闻探测 2502.01699v2 -
229 05-29 Beyond Zero Initialization: Investigating the Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics Beyond Zero Initialization: Untersuchung der Auswirkungen von Non-Zero Initialization auf LoRA Fine-Tuning Dynamics 零启动后零启动后:调查非零初始化对LORA微调动力学的影响 2505.23194v1 -
230 05-29 DeepRTE: Pre-trained Attention-based Neural Network for Radiative Tranfer DeepRTE: Pre-trained Aufmerksamkeit-basiertes Neural-Netzwerk für Radiative Tranfer DeepRTE: 培训前的辐射Tranfer神经网络,以关注为主的神经网络 2505.23190v1 -
231 05-29 Plug In and Learn: Federated Intelligence over a Smart Grid of Models Plug In and Learn: Federated Intelligence über ein Smart Grid aus Modellen 插插插和学习:对智能模型网的联邦情报 2302.04363v4 -
232 05-29 Dequantified Diffusion-Schrödinger Bridge for Density Ratio Estimation Dequantifizierte Diffusion-Schrödinger-Brücke für Dichte-Verhältnis-Schätzung 密度比率估计的量化扩散 - Schrödinger桥 2505.05034v3 -
233 05-29 Unsupervisedly Learned Representations: Should the Quest be Over? Unüberwacht gelernte Repräsentationen: Sollte die Suche vorbei sein? 无人监督的派任代表:调查是否应该结束? 2001.07495v6 -
234 05-29 Rethinking Positive Pairs in Contrastive Learning Positive Paare im kontrastistischen Lernen neu denken 在反竞争学习中重新思考正对对 2410.18200v2 -
235 05-29 Improving the Effective Receptive Field of Message-Passing Neural Networks Verbesserung des effektiven Empfangsfeldes von message-passing Neural Networks 改进信息传送神经网络的有效接收领域 2505.23185v1 -
236 05-29 Two Is Better Than One: Rotations Scale LoRAs Zwei ist besser als eins: Rotationsskala LoRAs 二比一好:轮作规模LORAs 2505.23184v1 -
237 05-29 MADCluster: Model-agnostic Anomaly Detection with Self-supervised Clustering Network MADCluster: Modell-agnostische Anomalieerkennung mit selbstüberwachtem Clustering-Netzwerk MADCluster:使用自监管的集群网进行模型-不可知异常探测 2505.16223v2 -
238 05-29 FSL-SAGE: Accelerating Federated Split Learning via Smashed Activation Gradient Estimation FSL-SAGE: Beschleunigung des Federated Split Learning durch Smashed Activation Gradient Abschätzung FSL-SAGE:通过分散的激励加速渐进式估算,加速联邦分化学习 2505.23182v1 -
239 05-29 FreRA: A Frequency-Refined Augmentation for Contrastive Learning on Time Series Classification FreRA: Eine frequenzrefinierte Augmentation für kontrastives Lernen in der Zeitreihenklassifikation FreRA:关于时间序列分类的校对性学习频率改进 2505.23181v1 -
240 05-29 The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning Die Panaceas zur Verbesserung der Zersetzung mit geringem Rank im kommunikativ-effizienten Federated Learning 改善通信-高效联邦学习中低-兰克分解的全景 2505.23176v1 -
241 05-29 Contrastive Learning and Abstract Concepts: The Case of Natural Numbers Kontrastives Lernen und abstrakte Konzepte: Der Fall natürlicher Zahlen 差异学习和抽象概念:自然数字案例 2408.02247v6 -
242 05-29 Pseudo Multi-Source Domain Generalization: Bridging the Gap Between Single and Multi-Source Domain Generalization Pseudo-Multi-Source-Domain-Verallgemeinerung: Die Lücke zwischen Single- und Multi-Source-Domain-Verallgemeinerung überbrücken Pseudo多源多源通用化:缩小单一源和多源通用化之间的差距 2505.23173v1 -
243 05-29 Global Tensor Motion Planning Globale Tensor-Bewegungsplanung 全球时势规划 2411.19393v3 -
244 05-29 Pre-training for Recommendation Unlearning Vorschulung für Empfehlung Unlearning 建议培训前培训 2505.22649v2 -
245 05-29 Best Arm Identification with Possibly Biased Offline Data Best Arm Identification mit möglicherweise Biased Offline Daten 最佳武器标识(可能附带的离线数据) 2505.23165v1 -
246 05-29 Temporal Relation Extraction in Clinical Texts: A Span-based Graph Transformer Approach Temporale Beziehungsextraktion in klinischen Texten: Ein Span-basierter Graph Transformer-Ansatz 临床文本中的时间关系抽取时间关系:基于泛泛面的图形变形器方法 2503.18085v2 -
247 05-29 Implicit Inversion turns CLIP into a Decoder Implizite Inversion macht CLIP zu einem Decoder 隐隐性 Indicide Inversion 将 CLIP 转换为解码器 2505.23161v1 -
248 05-29 Topological Adaptive Least Mean Squares Algorithms over Simplicial Complexes Topologische Adaptive Least Mean Squares Algorithmen über Simplicial Complexes 单纯复形上的拓扑自适应最小均方算法 2505.23160v1 -
249 05-29 Privacy-Aware Joint DNN Model Deployment and Partitioning Optimization for Collaborative Edge Inference Services Privacy-Aware Joint DNN Model Bereitstellung und Partitionierung Optimierung für kollaborative Edge Inferenz Services DNN 联合DNN 合作边缘推断服务示范部署和分离优化优化模式 2502.16091v3 -
250 05-29 Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners Größer, regularisiert, kategorisch: High-Kapacity-Wert-Funktionen sind effiziente Multi-Task-Lerner 大型、正规、分类:高能力价值功能是高效多任务学习者 2505.23150v1 -
251 05-29 FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing FlowAlign: Trajektorie-regularisierte, inversionsfreie Fluss-basierte Bildbearbeitung 流动对等: 轨迹- 重新分类、 转换- 无流动图像编辑 2505.23145v1 -
252 05-29 OmniArch: Building Foundation Model For Scientific Computing OmniArch: Building Foundation Model for Scientific Computing OmniArch:建筑基金会科学计算模型 2402.16014v3 -
253 05-29 Policy Filtration for RLHF to Mitigate Noise in Reward Models Politische Filtration für RLHF zur Mititation von Lärm in Prämienmodellen 将RLHF政策归类为奖励模型中最小噪音的政策 2409.06957v4 -
254 05-29 Learning to Reason under Off-Policy Guidance Unter außerpolitischer Anleitung zur Vernunft lernen 根据非政策指导学习理由 2504.14945v4 -
255 05-29 VERINA: Benchmarking Verifiable Code Generation VERINA: Benchmarking der überprüfbaren Code-Generierung VERINA:可核实代码生成基准 2505.23135v1 -
256 05-29 DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs DOPPLER: Dual-Policy-Lernen für die Gerätezuordnung in asynchronen Datenflussgraphen DOPPLER: 同步数据流图表中设备分配的双政策学习 2505.23131v1 -
257 05-29 Developing Cryptocurrency Trading Strategy Based on Autoencoder-CNN-GANs Algorithms Entwicklung einer Cryptowährungs-Handelsstrategie auf der Grundlage von Autoencoder-CNN-GAN-Algorithmen 制定基于自动编码器-CNN-GANs算法的加密货币交易战略 2412.18202v5 -
258 05-29 Surrogate-Assisted Evolutionary Reinforcement Learning Based on Autoencoder and Hyperbolic Neural Network Surrogate-Assisted Evolutionary Verstärkung Lernen auf der Grundlage von Autoencoder und Hyperbolic Neural Network 基于自动编码器和双曲神经网络的代用辅助辅助进化辅助进化强化学习 2505.19423v2 -
259 05-29 Learning to Incentivize in Repeated Principal-Agent Problems with Adversarial Agent Arrivals Lernen, in wiederholten Hauptagenten-Problemen mit Adversarial Agent Ankunft zu fördern 学习鼓励与抵达时的对冲代理人员重复发生主要问题 2505.23124v1 -
260 05-29 BroadGen: A Framework for Generating Effective and Efficient Advertiser Broad Match Keyphrase Recommendations BroadGen: Ein Framework zur Generierung effektiver und effizienter Advertiser Broad Match Keyphrase-Empfehlungen BroadGen:一个产生有效和高效广告的高效和高效广告大匹配关键词句建议的框架 2505.19164v2 -
261 05-29 CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark CASS: Nvidia zu AMD Transpilation mit Daten, Modellen und Benchmark CASS: Nvidia 到AMD 传输数据、模型和基准 2505.16968v3 -
262 05-29 To Judge or not to Judge: Using LLM Judgements for Advertiser Keyphrase Relevance at eBay Zu richten oder nicht zu richten: LLM-Richtungen für Werbetreibende Keyphrase Relevanz bei eBay verwenden 法官或非法官:在eBay使用LLM判决来作广告 2505.04209v2 -
263 05-29 Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking Decom-Renorm-Merge: Modellzusammenführung auf dem richtigen Raum verbessert Multitasking Decom-Renorm-Merge:正确空间的模型合并改进多重任务 2505.23117v1 -
264 05-29 Learning to Reason from Feedback at Test-Time Von Feedback bei Test-Time zur Vernunft lernen 从测试时的反馈中学习到理由 2502.15771v2 -
265 05-29 CrossLinear: Plug-and-Play Cross-Correlation Embedding for Time Series Forecasting with Exogenous Variables CrossLinear: Plug-and-Play-Cross-Korrelation für Zeitreihenvorhersage mit exogenen Variablen einbetten CrossLinear: 用外源变量预测时间序列的插件和插件交叉校正嵌入 2505.23116v1 -
266 05-29 Instance-dependent Convergence Theory for Diffusion Models Instanz-abhängige Konvergenztheorie für Diffusionsmodelle 扩散模型集成模型理论 2410.13738v2 -
267 05-29 FutureGen: LLM-RAG Approach to Generate the Future Work of Scientific Article FutureGen: LLM-RAG Ansatz zur Generierung der zukünftigen Arbeit des wissenschaftlichen Artikels FutureGen:LLM-RAG 产生科学条款未来工作的方法 2503.16561v2 -
268 05-29 Neural Interpretable PDEs: Harmonizing Fourier Insights with Attention for Scalable and Interpretable Physics Discovery Neural Interpretable PDEs: Harmonisierung Fourier Insights mit Aufmerksamkeit für skalierbare und Interpretierbare Physik Discovery 神经可解释的PDEs:协调Fourier Insights,注意可缩放和可解释的物理发现 2505.23106v1 -
269 05-29 LUMION: Fast Fault Recovery for ML Jobs Using Programmable Optical Fabrics LUMION: Schnelle Fehlerwiederherstellung für ML-Jobs mit programmierbaren optischen Stoffen LUMION: 使用可编程光学制造器快速回收 ML 工作 2505.23105v1 -
270 05-29 Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret Ungefähre Thompson-Probenahme für das Lernen linearer quadratischer Regulatoren mit $O(\sqrt{T})$ Regret 具有 $O(\sqrt{T})$ 遗憾的学习线性二次调节器的近似 Thompson 抽样 2405.19380v2 -
271 05-29 Weight Spectra Induced Efficient Model Adaptation Gewicht Spectra Induzierte effiziente Modellanpassung 引导有效模型适应 2505.23099v1 -
272 05-29 Learning to Search for Vehicle Routing with Multiple Time Windows Lernen, nach Fahrzeug Routing mit mehreren Zeitfenstern zu suchen 学习搜索多时间窗口运行的车辆 2505.23098v1 -
273 05-29 Stochastic Diffusion: A Diffusion Based Model for Stochastic Time Series Forecasting Stochastische Diffusion: Ein diffusionsbasiertes Modell für stochastische Zeitreihen 斯托卡扩散:以传播为基础的斯托卡时间序列预测模型 2406.02827v2 -
274 05-29 Constraints and Variables Reduction for Optimal Power Flow Using Hierarchical Graph Neural Networks with Virtual Node-Splitting Einschränkungen und Variablen-Reduktion für optimalen Stromfluss mittels Hierarchischer Graphen-Neural-Netzwerke mit virtuellem Knoten-Splitting 利用具有虚拟节点切除功能的等级形图形神经网络减少最佳电力流动的制约因素和变数 2411.06268v2 -
275 05-29 MAP: Revisiting Weight Decomposition for Low-Rank Adaptation MAP: Wiederbesuchen der Gewichtsverringerung für Low-Rank-Anpassung MAP: 重新审视低浓度适应的重量分解 2505.23094v1 -
276 05-29 Equivariant Spherical Transformer for Efficient Molecular Modeling Equivarianter Spherical Transformer für effiziente molekulare Modellierung 高效分子建模的等同球质变变变器 2505.23086v1 -
277 05-29 Gradient Boosting Decision Tree with LSTM for Investment Prediction Gradienten Auftrieb Entscheidungsbaum mit LSTM für Investitionsvorhersage 与 LSTM 一起逐步促进投资预测决策树 2505.23084v1 -
278 05-29 Gradient Methods with Online Scaling Part I. Theoretical Foundations Gradient Methoden mit Online-Skalierung Teil I. Theoretische Grundlagen 在线扩展第一部分的渐进方法 理论基础 2505.23081v1 -
279 05-29 Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble Zweite Meinungsfrage: Auf dem Weg zu adaptiver klinischer KI über den Konsens des Expert Model Ensembles 第二意见事项:通过专家示范组共识实现适应性临床AI 2505.23075v1 -
280 05-29 Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts Shortcut-verbundene Experten-Parallelität für die Beschleunigung von Mixture-of-Experts 加速混合专家专家专家平行专家 2404.05019v3 -
281 05-29 Multi-Modal Learning with Bayesian-Oriented Gradient Calibration Multi-Modal-Lernen mit Bayesian-Oriented Gradient Calibration 多模式学习,以巴耶斯为主的梯度校准 2505.23071v1 -
282 05-29 Sparse Linear Bandits with Blocking Constraints Sparse Linear Bandits mit Blockierung Einschränkungen 带有阻塞限制的粗细线条强力 2410.20041v2 -
283 05-29 GrokFormer: Graph Fourier Kolmogorov-Arnold Transformers GrokFormer: Graph Fourier Kolmogorov-Arnold Transformer GrokFormer:图示 Fourier Kolmogorov-Arnold变形器 2411.17296v3 -
284 05-29 Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling Skalierung von Flüssig-Resistenz-Netzwerken für eine effiziente Sequenzmodellierung 增强增强流动性恢复力的流动性能力网络,以建立高效序列建模 2505.21717v2 -
285 05-29 SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models SORSA: Singuläre Werte und Orthonormale Regularisierte Singuläre Vektoren Anpassung großer Sprachmodelle SORSA: 单项价值和正正正的正规化的单项矢量,以适应大语言模式 2409.00055v6 -
286 05-29 M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes M3Bench: Benchmarking Ganzkörper-Bewegungs-Generation für mobile Manipulation in 3D-Szenen M3Bench:3D场景移动操纵基准全体运动生成 2410.06678v3 -
287 05-29 Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems Topologisches Strukturlernen sollte eine Forschungspriorität für LLM-basierte Multi-Agent-Systeme sein 地形结构学习应成为以LLM为基础的多种机构系统的研究重点 2505.22467v2 -
288 05-29 Efficient Quantum Approximate $k$NN Algorithm via Granular-Ball Computing Effiziente Quanten Ungefähre $k$NN-Algorithmus über Granular-Ball Computing 通过颗粒球式计算机计算, 近于 $k$NN 的高效量量量 2505.23066v1 -
289 05-29 Machine Learning Framework for Characterizing Processing-Structure Relationship in Block Copolymer Thin Films Machine Learning Framework zur Charakterisierung von Verarbeitungs-Struktur-Beziehungen in Block Copolymer Thin Films 确定胶合聚合薄薄膜加工-结构关系特征的机械学习框架 2505.23064v1 -
290 05-29 Loss-Guided Model Sharing and Local Learning Correction in Decentralized Federated Learning for Crop Disease Classification Loss-Guided Model Sharing und lokale Lernkorrektur bei dezentralisiertem Föderated Learning für die Klassifizierung von Crop Diseases 关于作物疾病分类的分散化联邦学习中损失指导模式共享和地方学习校正 2505.23063v1 -
291 05-29 Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data Composite Flow passend zum Verstärkungslernen mit Shifted-Dynamics-Daten 与上下动动量数据匹配的强化学习综合流程 2505.23062v1 -
292 05-29 Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design Spekulative Dekodierung trifft auf Quantisierung: Kompatibilitätsbewertung und Hierarchisches Framework Design 投机性下限符合量化:兼容性评价和等级框架设计 2505.22179v2 -
293 05-29 DINGO: Constrained Inference for Diffusion LLMs DINGO: Beschränkte Schlussfolgerung für Diffusion LLMs DINGO: 扩散LLM的约束推理 2505.23061v1 -
294 05-29 Improved Last-Iterate Convergence of Shuffling Gradient Methods for Nonsmooth Convex Optimization Verbesserte Last-Iterate-Konvergenz von Shuffling-Gradientenmethoden für nichtglatte konvexe Optimierung 非光滑凸优化中洗牌梯度方法的改进末次迭代收敛 2505.23056v1 -
295 05-29 CDR-Agent: Intelligent Selection and Execution of Clinical Decision Rules Using Large Language Model Agents CDR-Agent: Intelligente Auswahl und Durchführung klinischer Entscheidungsregeln unter Verwendung von Large Language Model Agents CDR-代理:明智选择和执行使用大语言示范物剂的临床决定规则 2505.23055v1 -
296 05-29 Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network Lernen von suboptimalen Daten in der kontinuierlichen Kontrolle über Auto-Regressive Soft Q-Network 通过自动递减软软QNetwork, 从连续控制中的亚最佳数据中学习 2502.00288v2 -
297 05-29 DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration DenoiseRotator: Verbesserung der Beschneidungsfestigkeit für LLMs durch Bedeutungskonzentration DenoiseRotator:通过重视浓度提高LLMs的稳健力 2505.23049v1 -
298 05-29 ProDiff: Prototype-Guided Diffusion for Minimal Information Trajectory Imputation ProDiff: Prototypen-geführte Diffusion für minimale Information Trajektorie Imputation ProDiff: 用于最小信息轨迹截肢的原型类型辅助扩散 2505.23048v1 -
299 05-29 Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping Nicht konvexe stochastische Optimierung unter schwerfälligen Geräuschen: Optimale Konvergenz ohne gradientes Clipping 在重困噪音下非convex 斯托卡优化: 没有梯度缩放的最佳趋同 2412.19529v4 -
300 05-29 From Theory to Application: Fine-Tuning Large EEG Model with Real-World Stress Data Von der Theorie zur Anwendung: Feintuning-Großes EEG-Modell mit realen Stressdaten 从理论到应用:使用现实世界应激数据精美应用大型电子EEG模型 2505.23042v1 -
301 05-29 TINED: GNNs-to-MLPs by Teacher Injection and Dirichlet Energy Distillation TINED: GNNs-to-MLPs von Lehrerinjektion und Dirichlet Energy Destillation TINED:通过教师注射和稀释能源蒸馏,将GNNs改为MLP 2412.11180v3 -
302 05-29 One Model for One Graph: A New Perspective for Pretraining with Cross-domain Graphs Ein Modell für einen Graphen: Eine neue Perspektive für das Pretraining mit domänenübergreifenden Graphen 一图一模型:带有跨领域图的训练前新视角 2412.00315v2 -
303 05-29 Cross-modal RAG: Sub-dimensional Retrieval-Augmented Text-to-Image Generation Cross-modal RAG: Sub-dimensionale Retrieval-Augmented Text-to-Image Generation 跨模式RAG:次二维检索增强的文本到图像生成 2505.21956v2 -
304 05-29 Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction Case-Based Reasoning verbessert die vorausschauende Kraft von LLMs in der Arzneimittel-Drogen-Interaktion 以个案为依据的理由加强药物-药物相互作用LLMs的预测能力 2505.23034v1 -
305 05-29 Exploring the Limitations of Mamba in COPY and CoT Reasoning Erforschung der Grenzen von Mamba in COPY und CoT Reasoning 探索COPY和COT理由解释中Mamba的局限性 2410.03810v3 -
306 05-29 AntiLeakBench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge AntiLeakBench: Datenkontamination durch automatisches Konstruieren von Benchmarks mit aktualisiertem Real-World-Wissen verhindern 防止泄漏:利用最新现实世界知识自动建立基准,防止数据污染 2412.13670v2 -
307 05-29 Bayesian Neural Scaling Laws Extrapolation with Prior-Fitted Networks Bayesische Neural Scaling-Gesetze Extrapolation mit vormontierten Netzwerken Bayesian神经扩增法与事先确定网络的外推法 2505.23032v1 -
308 05-29 Diverse Prototypical Ensembles Improve Robustness to Subpopulation Shift Unterschiedliche prototypische Ensembles verbessern die Robustheit der Subpopulationsverschiebung 提高亚人口变换能力 2505.23027v1 -
309 05-29 Graph Wave Networks Graphische Wellennetze 图图波网络 2505.20034v2 -
310 05-29 Offline Learning for Combinatorial Multi-armed Bandits Offline-Lernen für kombinatorische Multi-Armed Bandits 多武装混合强盗离线学习 2501.19300v2 -
311 05-29 An Empirical Study of Federated Prompt Learning for Vision Language Model Eine empirische Studie über Federated Prompt Learning for Vision Language Model 联邦快速学习促进愿景语言模式经验研究 2505.23024v1 -
312 05-29 GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning GuardAgent: LLM-Agenten durch einen Guard Agent durch wissensgestützte Vernunft schützen 警卫人员:由警卫人员通过 “ 知识化理由 “ 保护有限责任公司代理 2406.09187v3 -
313 05-29 SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference SCORPIO: Den richtigen Anfragen zur richtigen Zeit für heterogene SLOs in LLM-Schlussfolgerung dienen 在LLM推理中异基因性溶液的适当时间满足正确的要求 2505.23022v1 -
314 05-29 SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models SciHorizon: Benchmarking von KI-für-Science Readiness von wissenschaftlichen Daten zu großen Sprachmodellen SciHorizon:将AI-SciHorizon科学准备程度从科学数据基准确定为大语言模式 2503.13503v3 -
315 05-29 BECAME: BayEsian Continual Learning with Adaptive Model MErging BECAME: BayEsian Continual Learning mit adaptivem Modell-Merging BECAME: 采用适应性示范招生模型的巴伊连续学习 2504.02666v2 -
316 05-29 $K^2$VAE: A Koopman-Kalman Enhanced Variational AutoEncoder for Probabilistic Time Series Forecasting $K^2$VAE: Ein Koopman-Kalman-Verbesserter Variations-AutoEncoder für probabilistische Zeitreihenprognosen 2美元VAE: 概率时间序列预测的Koopman-Kalman增强变异自动编码器 2505.23017v1 -
317 05-29 Hyperbolic-PDE GNN: Spectral Graph Neural Networks in the Perspective of A System of Hyperbolic Partial Differential Equations Hyperbolic-PDE GNN: Spektral Graph Neural Networks in the Perspective of A System of Hyperbolic Partial Differential Equations GNN: 从超曲偏偏部分异差系统的角度看待光谱图形神经网络 2505.23014v1 -
318 05-29 SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting SplitLoRA: Balance Stabilität und Plastizität im kontinuierlichen Lernen durch gradienten Raum Splitting Split LoRA:通过逐步空间分割在持续学习中平衡稳定和可塑性 2505.22370v2 -
319 05-29 Scalable Complexity Control Facilitates Reasoning Ability of LLMs Skalierbare Komplexitätskontrolle fördert die Schlussfolgerungsfähigkeit von LLMs 可扩展的复杂度控制促进LLM的推理能力 2505.23013v1 -
320 05-29 BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models BA-LoRA: Bias-Alleviating Low-Rank Anpassung an Mitigate Katastrophische Vererbung in großen Sprachmodellen BA-LORA:在大语言模型中,对减轻灾害传承的低率适应 2408.04556v5 -
321 05-29 EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge EmergentTTS-Eval: Bewertung von TTS-Modellen auf komplexe Prosodic, Expressivität und sprachliche Herausforderungen mit Model-as-a-Judge 新兴TTS-Eval:利用 “ 模拟即审法官 “ 评估关于复杂立案、表达性和语言挑战的TTS模型 2505.23009v1 -
322 05-29 QLIP: A Dynamic Quadtree Vision Prior Enhances MLLM Performance Without Retraining QLIP: Eine dynamische Quadtree Vision verbessert die MLLM-Performance ohne Umschulung QLIP: 动态的四方愿景,事先提高MLLM业绩,不再培训 2505.23004v1 -
323 05-29 Universal Sequence Preconditioning Universelle Sequenz Vorkonditionierung 通用序列序序预设 2502.06545v2 -
324 05-29 Hybrid Cross-domain Robust Reinforcement Learning Hybrides Cross-Domain Robustes Verstärkungslernen 跨部门加强强化学习 2505.23003v1 -
325 05-29 Improved and Oracle-Efficient Online $\ell_1$-Multicalibration Verbesserte und Oracle-Effizient Online $\ell_1$-Multikalibrierung 改进和 Oracle-Effacient 在线 $\ell_1美元-多边校准 2505.17365v2 -
326 05-29 Dolphin: A Programmable Framework for Scalable Neurosymbolic Learning Dolphin: Ein programmierbares Framework für skalierbares neurosymbolisches Lernen Dolphin: 可缩放的神经元学习程序框架 2410.03348v4 -
327 05-29 A Bayesian Model Selection Criterion for Selecting Pretraining Checkpoints Ein Bayesian Modellauswahl-Kriterium für die Auswahl von Vortrainings-Checkpoints 选择培训前检查站的巴伊西亚示范甄选标准标准 2410.05612v2 -
328 05-29 HydraNet: Momentum-Driven State Space Duality for Multi-Granularity Tennis Tournaments Analysis HydraNet: Momentum-getriebene State Space-Dualität für Multi-Granularity-Tennisturniere Analyse HydraNet: 面向多粒度网球赛事分析的动量驱动状态空间对偶 2505.21882v2 -
329 05-29 Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment Jenseits der Belohnung Hacking: Kausale Belohnungen für großsprachige Modellausrichtung 优胜后加分:大语言模型对齐的因果奖励 2501.09620v2 -
330 05-29 ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning ReinFlow: Feinsteuerungs-Flow Matching-Politik mit Online-Verstärkungs-Lernen ReinFlow: 与在线强化学习匹配流动政策的微调 2505.22094v2 -
331 05-29 Is Attention Required for Transformer Inference? Explore Function-preserving Attention Replacement Ist Achtung für Transformer-Inferenz erforderlich? Erkunden Sie Funktionserhaltende Aufmerksamkeitsersatz 需要注意吗? 探索功能保持注意替换 2505.21535v2 -
332 05-29 LLM Agents for Bargaining with Utility-based Feedback LLM-Agenten für Schnäppchen mit Utility-basiertem Feedback LLM 与基于利用的反馈进行交涉的代理代理 2505.22998v1 -
333 05-29 Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features Theoretische Grundlagen des Deep Copula Klassifikators: Ein generativer Ansatz zur Modellierung abhängiger Merkmale 深 Cocula 分类法理论基础:建模附属地貌的开创性方法 2505.22997v1 -
334 05-29 Walking the Weight Manifold: a Topological Approach to Conditioning Inspired by Neuromodulation Wiege manifold gehen: ein topologischer Ansatz zur Konditionierung Inspiriert durch Neuromodulation 身穿轻重背重力:在神经调节的启发下,从地形学角度处理条件问题 2505.22994v1 -
335 05-29 Number of Clusters in a Dataset: A Regularized K-means Approach Anzahl der Cluster in einem Datensatz: Ein regularisierter K-Mittelansatz 数据集中的组群数量:正规化的K手段方法 2505.22991v1 -
336 05-29 MenTeR: A fully-automated Multi-agenT workflow for end-to-end RF/Analog Circuits Netlist Design MenTeR: Ein vollautomatisierter Multi-AgenT-Workflow für End-to-End-RF/Analog-Schaltungen Netlist Design MenTeR: 终端至终端RF/Analog 电路网络列表设计全自动多元T工作流程 2505.22990v1 -
337 05-29 Effects of Dropout on Performance in Long-range Graph Learning Tasks Auswirkungen des Dropouts auf die Leistungsfähigkeit in großflächigen Graphen-Lernaufgaben 辍学对远程图表学习任务绩效的影响 2502.07364v2 -
338 05-29 Model-Preserving Adaptive Rounding Modellschonende adaptive Rundung 模型保护适应性四舍五入 2505.22988v1 -
339 05-29 Knowledge Distillation for Reservoir-based Classifier: Human Activity Recognition Wissensdestillation für Reservoir-basierte Klassifikator: Menschliche Aktivitätserkennung 以储量为基础的分类法知识蒸馏:人类活动认识 2505.22985v1 -
340 05-29 A Computational Approach to Improving Fairness in K-means Clustering Ein Computational Approach zur Verbesserung der Fairness im K-Mittel-Clustering 改进K类手段分类组合的公平性计算方法 2505.22984v1 -
341 05-29 MedRAX: Medical Reasoning Agent for Chest X-ray MedRAX: Medizinischer Reasoning Agent für Bruströntgen MedraX: 胸前X光医疗理疗代理 2502.02673v2 -
342 05-29 Theoretical guarantees on the best-of-n alignment policy Theoretische Garantien für die optimale Ausrichtungspolitik 关于最佳协调政策理论保障 2401.01879v3 -
343 05-29 Learning coordinated badminton skills for legged manipulators Koordinierte Badminton-Fähigkeiten für Legged Manipulatoren lernen 为腿脚操纵者学习协调的羽毛球技能 2505.22974v1 -
344 05-29 EquiReg: Equivariance Regularized Diffusion for Inverse Problems EquiReg: Äquivarianz Regularisierte Diffusion für Inverse Probleme equireg: 用于反向问题的公平、正规化传播 2505.22973v1 -
345 05-29 Minimal Sufficient Views: A DNN model making predictions with more evidence has higher accuracy Minimal Ausreichende Ansichten: Ein DNN-Modell, das Vorhersagen mit mehr Beweisen macht, hat höhere Genauigkeit 最低限度的充分意见:一个DNN模型,用更多证据作出预测,其准确性更高 2402.01095v2 -
346 05-29 MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming MermaidFlow: Neudefinition der agentischen Workflow-Generierung durch sicherheitsbeschränkte evolutionäre Programmierung 美人鱼:通过受安全限制的进化方案拟订,重新确定干燥性工作流的产生 2505.22967v1 -
347 05-29 Exploring Scaling Laws for EHR Foundation Models Erforschung von Skalierungsgesetzen für EHR-Stiftungsmodelle 探索EHR基金会模式的扩展法律 2505.22964v1 -
348 05-29 INRFlow: Flow Matching for INRs in Ambient Space INRFlow: Flow Passend für INRs im Umgebungsraum INFRFlow: 环境空间IRR的流量匹配 2412.03791v2 -
349 05-29 ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind ToMAP: Training Gegner-Bewusst LLM überzeugt mit Theorie des Geistes ToMAP:培训有思想理论的对抗者软件软件LLM 2505.22961v1 -
350 05-29 Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness Multi-Agenten-Debatte als Test-Time Scaling: Eine systematische Studie der bedingten Wirksamkeit 重新审议作为试验时间尺度的多机构辩论:对有条件有效性的系统研究 2505.22960v1 -
351 05-29 Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View Enthüllen von Umweltauswirkungen von großsprachigen Modellen: Eine funktionale Einheitsansicht 大型语文服务模式的不懈环境影响:职能单位观点 2502.11256v2 -
352 05-29 CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance CodeSteer: Symbolisch-Augmentierte Sprachmodelle über Code/Text Anleitung CodeSteer:通过代码/文本引导的符号增强语言模型 2502.04350v2 -
353 05-29 Understanding Bias Reinforcement in LLM Agents Debate Verständnis der Bias-Verstärkung in LLM-Agenten-Debatte 了解LLLM代理商的强化申请 2503.16814v2 -
354 05-29 Performance Guaranteed Poisoning Attacks in Federated Learning: A Sliding Mode Approach Leistungsgarantie Vergiftung Angriffe im Föderierten Lernen: Ein Schiebemodus Ansatz 联邦学习中保证中毒袭击的绩效:一种脱落模式方法 2505.16403v2 -
355 05-29 CellFlux: Simulating Cellular Morphology Changes via Flow Matching CellFlux: simulierende zelluläre Morphologie-Änderungen durch Flow Matching 细胞通量:通过流动匹配模拟细胞生理变化 2502.09775v2 -
356 05-29 Directed Graph Grammars for Sequence-based Learning Gezielte Graphen-Grammatik für sequenzbasiertes Lernen 以序列为基础的学习方向图表语法 2505.22949v1 -
357 05-28 (3) NegVQA: Can Vision Language Models Understand Negation? NegVQA: Können Visions-Sprachmodelle Negation verstehen? NegVQA:视觉语言模式能理解差吗? 2505.22946v1 -
358 05-28 Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates Können LLMs CLIP täuschen? Benchmarking der adversarialen Kompositionalität vortrainierter multimodaler Repräsentationen über Textaktualisierungen LLM能否欺骗CLIP?通过文本更新对预训练多模态表示的对抗组合性进行基准测试 2505.22943v1 -
359 05-28 Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified? Sind Domain Generalization Benchmarks mit Genauigkeit auf der Zeile falsch angegeben? 域通用基准与误标线的准确性是否一致? 2504.00186v2 -
360 05-28 Generative Social Choice: The Next Generation Generative soziale Wahl: Die nächste Generation 产生社会选择:下一代 2505.22939v1 -
361 05-28 Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models Ist die Lärmkonditionierung notwendig? Eine einheitliche Theorie der Bedingungslosen Graphen-Diffusionsmodelle 是否有必要设定噪音条件? 无条件图形扩散模型的统一理论 2505.22935v1 -
362 05-28 Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging Unraveling LoRA Interferenz: Orthogonale Subräume für robuste Modellzusammenführung 开放 LoRA 干涉度: 用于强力模型合并的正弦形子空间 2505.22934v1 -
363 05-28 K-Paths: Reasoning over Graph Paths for Drug Repurposing and Drug Interaction Prediction K-Paths: Begründung über Graphenpfade für Drogenrepurposing und Drogeninteraktionsvorhersage K-Paths: 以图解路径为依据进行药物再定位和药物相互作用预测 2502.13344v3 -
364 05-28 How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias Wie Transformer lernen Regelmäßige Spracherkennung: Eine theoretische Studie über Trainingsdynamik und Implizite Bias 变换人如何学习常规语言识别:关于培训动态和隐含偏见的理论研究 2505.00926v3 -
365 05-28 Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking Skalierbare Parameter und Speicher Effizientes Vortraining für LLM: Algorithmische Fortschritte und Benchmarking LLM的可缩放参数和记忆高效预修培训:最近的演算进展和基准 2505.22922v1 -
366 05-28 Unlocking Mental Health: Exploring College Students’ Well-being through Smartphone Behaviors Entsperren der psychischen Gesundheit: Erforschen des Wohlbefindens der Studenten durch Smartphone-Verhalten 解锁心理健康:通过智能手机行为探索大学生福祉 2502.08766v2 -
367 05-28 Enhancing Semi-supervised Learning with Zero-shot Pseudolabels Halbbeaufsichtigtes Lernen mit Null-Shot-Pseudo-Labels verbessern 用零弹Pseudo标签加强半监督的学习 2502.12584v2 -
368 05-28 cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning cadrille: Multimodale CAD-Rekonstruktion mit Online-Verstärkung 与在线强化学习相结合的多模式 CAD重建 2505.22914v1 -
369 05-28 Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference Mustafar: Förderung unstrukturierter Sparsamkeit für KV Cache Pruning in LLM Inferenz Mustafar:在LLM推理中促进KV Cache Pruning的无结构平衡 2505.22913v1 -
370 05-28 GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation GraphEval: Ein leichter Graph-basierter LLM-Rahmen für die Idee-Evaluierung 图图Eval:基于轻量图图的理论评估LLM框架 2503.12600v2 -
371 05-28 Ensuring User-side Fairness in Dynamic Recommender Systems Gewährleistung der benutzerseitigen Fairness in dynamischen Recommender-Systemen 确保动态建议系统在用户方面的公平公正 2308.15651v3 -
372 05-28 SP2RINT: Spatially-Decoupled Physics-Inspired Progressive Inverse Optimization for Scalable, PDE-Constrained Meta-Optical Neural Network Training SP2RINT: Spatially-Decoupled Physics-Inspired Progressive Inverse Optimization für skalierbare, PDE-Constrained Meta-Optical Neural Network Training SP2RINT: 空间-减速物理激励-渐进式反向优化,用于可缩放、PDE-受培训的元神经网络培训 2505.18377v2 -
373 05-28 Defining Foundation Models for Computational Science: A Call for Clarity and Rigor Fundamentalmodelle für die Computerwissenschaft definieren: Ein Ruf nach Klarheit und Starrheit 界定计算科学基础模型:要求明确和严格 2505.22904v1 -
374 05-28 Norm-Bounded Low-Rank Adaptation Normgebundene Low-Rank-Anpassung 范数有界的低秩适应 2501.19050v3 -
375 05-28 On the Dynamic Regret of Following the Regularized Leader: Optimism with History Pruning Zum dynamischen Bedauern, dem regularisierten Führer zu folgen: Optimismus mit Geschichtsveredelung 在追赶正规领导人之后的强烈遗憾:对历史的乐观态度 2505.22899v1 -
376 05-28 The Geometry of ReLU Networks through the ReLU Transition Graph Die Geometrie von ReLU-Netzwerken durch den ReLU-Übergangsgraphen 通过 ReLU 过渡图绘制 ReLU 网络的几何图 2505.11692v2 -
377 05-28 Neural Networks as Universal Finite-State Machines: A Constructive Deterministic Finite Automaton Theory Neurale Netzwerke als universelle Finite-State-Maschinen: Eine konstruktive Deterministische Finite-Automaten-Theorie 神经网络作为普遍有限国家机器:具有建设性决定作用的有限自定义理论 2505.11694v2 -
378 05-28 A Combinatorial Theory of Dropout: Subnetworks, Graph Geometry, and Generalization A Combinatorial Theory of Dropout: Subnetzwerke, Graphische Geometrie und Generalisierung 辍学综合理论:子网络、图形几何和一般化 2504.14762v2 -
379 05-28 Smart Surrogate Losses for Contextual Stochastic Linear Optimization with Robust Constraints Intelligente Surrogatverluste für kontextuelle stochastische Linearoptimierung mit robusten Einschränkungen 具有强力限制的内幕斯托卡式线性优化的智能代谢损失 2505.22881v1 -
380 05-28 Signal attenuation enables scalable decentralized multi-agent reinforcement learning over networks Signaldämpfung ermöglicht skalierbares dezentrales Multi-Agenten-Verstärkungslernen über Netzwerke 信号减速使可伸缩的分散式多试剂强化学习超越网络 2505.11461v2 -
381 05-28 CFP-Gen: Combinatorial Functional Protein Generation via Diffusion Language Models CFP-Gen: Kombinatorische funktionelle Proteinerzeugung über Diffusions-Sprachenmodelle CFP-Gen:通过传播语言模式生成混合功能性蛋白质 2505.22869v1 -
382 05-28 Multimodal Survival Modeling in the Age of Foundation Models Multimodale Überlebensmodellierung im Zeitalter der Gründungsmodelle 基金会时代多模式生存模型 2505.07683v2 -
383 05-28 CrossNAS: A Cross-Layer Neural Architecture Search Framework for PIM Systems CrossNAS: Ein Cross-Layer Neural Architecture Search Framework für PIM-Systeme CrossNAS:PIM系统跨行业神经结构搜索框架 2505.22868v1 -
384 05-28 Scaling Offline RL via Efficient and Expressive Shortcut Models Skalierung von Offline-RL über effiziente und Expressive Shortcut-Modelle 通过高效和直表达快捷键模式缩放离线 RL 2505.22866v1 -
385 05-28 Your Data, My Model: Learning Who Really Helps in Federated Learning Ihre Daten, mein Modell: Lernen, die wirklich hilft beim Federated Learning 您的数据, 我的模型: 学习谁真正帮助联邦学习 2409.02064v3 -
386 05-28 Causal-PIK: Causality-based Physical Reasoning with a Physics-Informed Kernel Causal-PIK: Kausalitätsbasierte Physical Reasoning mit einem physikinformierten Kernel Causal-PIK: 基于因果关系的物理推理与物理信息核 2505.22861v1 -
387 05-28 Permissioned LLMs: Enforcing Access Control in Large Language Models Zugelassene LLMs: Erzwingen der Zugriffskontrolle in großen Sprachmodellen 获得许可的LLMM:在大语言模型中实施访问控制 2505.22860v1 -
388 05-28 NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding NGPU-LM: GPU-beschleunigtes N-Gram-Sprachenmodell für Kontext-Biasing in Greedy ASR-Dekodierung NGPU-LM: 加速GPU-加速型N-Gram语语模式,用于在贪婪ASR标记中进行背景切换 2505.22857v1 -
389 05-28 Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning Nutzung des Teilens ungelabelter Daten durch Kernel-Funktionsapproximation im Offline-Verstärkungslernen 在离线强化学习中通过核函数近似利用无标签数据共享 2408.12307v3 -
390 05-28 Point Cloud Synthesis Using Inner Product Transforms Punkt-Cloud-Synthese mit inneren Produkt-Transformationen 使用内产产品变换的点云合成 2410.18987v3 -
391 05-28 RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation RocqStar: Leveraging-ähnliche Retrieval- und Agentiksysteme für die Rocq-Generation RocqStar:利用利用相似度驱动回收系统和干系统来生成Rocq 2505.22846v1 -
392 05-28 Entropy-regularized Gradient Estimators for Approximate Bayesian Inference Entropie-regularisierte Gradienten-Estimatoren für ungefähre Bayesische Schlussfolgerung 用于近近贝耶斯推断的全天正规化梯度测算器 2503.11964v3 -
393 05-28 Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion Jenseits der Permutationssymmetrie der Transformer: Die Rolle der Rotation für die Modellfusion 变异器超越变异对称:变动对模型融合的作用 2502.00264v2 -
394 05-28 Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation Bayesian Attention Mechanism: Ein probabilistisches Framework für die Positionskodierung und Kontextlängen-Extrapolation Bayesian注意机制:定位编码和背景长度外推概率框架 2505.22842v1 -
395 05-28 Kernel-Smoothed Scores for Denoising Diffusion: A Bias-Variance Study Kernelgeglättete Punktzahlen für die Denoisierung der Diffusion: Eine Bias-Varianz-Studie Disoising 扩散的内核悬浮分数:生物量变化研究 2505.22841v1 -
396 05-28 Development and Validation of SXI++ LNM Algorithm for Sepsis Prediction Entwicklung und Validierung von SXI++ LNM-Algorithmus für Sepsis-Vorhersage SXI+++ LNM 测距算法的制定和校验 2505.22840v1 -
397 05-28 How Do Diffusion Models Improve Adversarial Robustness? Wie verbessern Diffusionsmodelle die widrige Robustheit? 传播模型如何改善反逆能力? 2505.22839v1 -
398 05-28 Bridging Distribution Shift and AI Safety: Conceptual and Methodological Synergies Bridging Distribution Shift und KI-Sicherheit: Konzeptionelle und methodische Synergien 搭桥分配转变与AI安全:概念与方法的协同作用 2505.22829v1 -
399 05-28 PGLearn – An Open-Source Learning Toolkit for Optimal Power Flow PGLearn – Ein Open-Source-Learning-Toolkit für optimalen Stromfluss PGLearn – – 最佳电力流动开放源学习工具包 2505.22825v1 -
400 05-28 Comparing Human and AI Rater Effects Using the Many-Facet Rasch Model Vergleich menschlicher und KI-Rater-Effekte mit dem Multi-Facet-Rasch-Modell 使用多面 Rasch 模型比较人类和AI Rater效应 2505.18486v2 -
401 05-28 Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection Hybride Disagreement-Diversity Aktives Lernen für die bioakustische Sound-Erkennung 生物声波声音事件探测发现活动积极学习 2505.20956v2 -
402 05-28 Scalable Differentially Private Bayesian Optimization Skalierbare differentiell private Bayes'sche Optimierung 可扩展的差分隐私贝叶斯优化 2502.06044v2 -
403 05-28 When Collaborative Filtering is not Collaborative: Unfairness of PCA for Recommendations Wenn Kollaborative Filterung nicht kollaborativ ist: Unfairness von PCA für Empfehlungen 当协作过滤不是协作过滤时:常设仲裁院不公平以征求建议 2310.09687v2 -
404 05-28 Preference Learning with Response Time Präferenz-Lernen mit Reaktionszeit 具有响应时间的优先学习 2505.22820v1 -
405 05-28 IMTS is Worth Time $\times$ Channel Patches: Visual Masked Autoencoders for Irregular Multivariate Time Series Prediction IMTS ist Zeit wert $\times$ Channel Patches: Visual Masked Autoencoder für irreguläre Multivariate Time Series Prediction IMTS 是有价值的时间 $\ times$$ 频道补丁: 用于非常规多变时间序列预测的视觉蒙面自动编码器 2505.22815v1 -
406 05-28 Regression and Forecasting of U.S. Stock Returns Based on LSTM Regression und Prognose von US-Aktienrenditen basierend auf LSTM 根据LSTM对美国库存收益的回归和预测 2502.05210v3 -
407 05-28 X-Factor: Quality Is a Dataset-Intrinsic Property X-Factor: Qualität ist eine datensatzintrinsische Eigenschaft X 要素: 质量是一个数据集 - Intrins 属性 2505.22813v1 -
408 05-28 Credit Risk Identification in Supply Chains Using Generative Adversarial Networks Kreditrisikoidentifizierung in Lieferketten mit generativen Adversarial-Netzwerken 利用产生反逆网络的供应链中的信用风险识别 2501.10348v4 -
409 05-28 Highly Efficient and Effective LLMs with Multi-Boolean Architectures Hocheffiziente und effektive LLMs mit Multi-Boolean-Architekturen 采用多布尔架构的高效且有效的LLM 2505.22811v1 -
410 05-28 Distribution free M-estimation Verteilungsfreie M-Schätzung 免费分发 M - 估计 2505.22807v1 -
411 05-28 Anomalies by Synthesis: Anomaly Detection using Generative Diffusion Models for Off-Road Navigation Anomalien durch Synthese: Anomalieerkennung mit generativen Diffusionsmodellen für Off-Road-Navigation 合成反常现象:使用非轨道导航生成扩散模型进行异常检测 2505.22805v1 -
412 05-28 CLUE: Neural Networks Calibration via Learning Uncertainty-Error alignment CLUE: Neurale Netzwerke Kalibrierung über Learning Uncertainty-Error Alignment CLUE:通过学习不确定性-差错对齐校准神经网络 2505.22803v1 -
413 05-28 Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning Instruct-SkillMix: Eine leistungsstarke Pipeline für LLM Instruction Tuning 指令- SkillMix: 用于LLM 指令导导图的强大管道 2408.14774v4 -
414 05-28 SequentialBreak: Large Language Models Can be Fooled by Embedding Jailbreak Prompts into Sequential Prompt Chains SequentialBreak: Große Sprachmodelle können durch Einbetten von Jailbreak Prompts in Sequential Prompt Chains ausgeblendet werden 顺序式布雷克:大语言模型可以通过将破狱线索嵌入顺序式提示链来蒙骗大语言模型 2411.06426v3 -
415 05-28 Efficient Preimage Approximation for Neural Network Certification Effiziente Preimage-Annäherung für die Neural Network Zertifizierung 神经网络认证的高效预感近似率 2505.22798v1 -
416 05-28 DeSocial: Blockchain-based Decentralized Social Networks DeSocial: Dezentrale soziale Netzwerke auf Blockchain-Basis 社会:基于供应链的权力下放社会网络 2505.21388v2 -
417 05-28 The Empirical Mean is Minimax Optimal for Local Glivenko-Cantelli Das Empirische Mittel ist Minimax Optimal für lokale Glivenko-Cantelli 经验均值对局部Glivenko-Cantelli是极小极大最优的 2410.02835v2 -
418 05-28 KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization KVQuant: In Richtung 10 Millionen Kontextlänge LLM-Inferenz mit KV Cache-Quantisierung KVQuant: 努力达到1000万个内长长LLM 与 KV 缓存量推论 2401.18079v6 -
419 05-28 Navigating the Latent Space Dynamics of Neural Models Navigation der latenten Raumdynamik von Neuralmodellen 导航内壳模型的冷层空间动态 2505.22785v1 -
420 05-28 On the definition and importance of interpretability in scientific machine learning Zur Definition und Bedeutung der Deutbarkeit im wissenschaftlichen maschinellen Lernen 关于科学机器学习中可解释性的定义和重要性 2505.13510v2 -
421 05-28 Adaptive Exploration for Multi-Reward Multi-Policy Evaluation Adaptive Exploration für Multi-Reward Multi-Policy-Bewertung 多方奖励多政策评价的适应性探索 2502.02516v2 -
422 05-28 Temporal Convolutional Autoencoder for Interference Mitigation in FMCW Radar Altimeters Temporal Convolutional Autoencoder für Interferenzmilderung in FMCW Radar Höhenmessern FMCC 雷达测高仪中用于减少干扰干扰的时时变自动算器 2505.22783v1 -
423 05-28 Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games Finite-Sample-Konvergenzgrenzen für die Optimierung der Treuhandregion-Politik in Mittelfeld-Spielen 平地运动会中信任区政策优化 2505.22781v1 -
424 05-28 Machine Learning Models Have a Supply Chain Problem Modelle des maschinellen Lernens haben ein Problem mit der Lieferkette 机器学习模式有供应链问题 2505.22778v1 -
425 05-28 GraphNarrator: Generating Textual Explanations for Graph Neural Networks GraphNarrator: Erzeugen von Texterklärungen für Graph Neuronale Netzwerke 图示记录器:生成图形神经网络的文字解释 2410.15268v2 -
426 05-28 The Value of Information in Human-AI Decision-making Der Wert von Informationen in der Mensch-AI-Entscheidungsfindung 信息在人类-大赦国际决策中的价值 2502.06152v4 -
427 05-28 Calibrated Value-Aware Model Learning with Stochastic Environment Models Kalibriertes wertbewusstes Modelllernen mit stochastischen Umweltmodellen 使用存储环境模型校准价值软件模型学习 2505.22772v1 -
428 05-28 Multivariate de Bruijn Graphs: A Symbolic Graph Framework for Time Series Forecasting Multivariate de Bruijn Graphen: Ein symbolisches Graphen-Framework für die Vorhersage von Zeitreihen 布鲁伊图多变量图:时间序列预测符号图框架 2505.22768v1 -
429 05-28 Measuring and Controlling Solution Degeneracy across Task-Trained Recurrent Neural Networks Degenerierung von Mess- und Regellösungen über aufgabenorientierte recurrente Neuralnetzwerke hinweg 跨任务技术经常性神经网络的退化 2410.03972v2 -
430 05-28 Test-time augmentation improves efficiency in conformal prediction Testzeitvergrößerung verbessert die Effizienz in der konformen Vorhersage 提高试验时间的提高提高符合预测的效率 2505.22764v1 -
431 05-28 Generalizable Representation Learning for fMRI-based Neurological Disorder Identification Generalisierbares Repräsentationslernen für die fMRI-basierte neurologische Störungserkennung FMRI基于神经疾病识别的神经疾病学学习 2412.16197v2 -
432 05-28 MIAS-SAM: Medical Image Anomaly Segmentation without thresholding MIAS-SAM: Medizinische Bildanomalie Segmentierung ohne Schwellenbildung MIAS-SAM: 医学形象非典型分割,无阈值 2505.22762v1 -
433 05-28 Non-convex entropic mean-field optimization via Best Response flow Nicht konvexe entropische Mittelfeld-Optimierung über Best Response Flow 通过最佳反应流程优化非convex 电子中位平均场 2505.22760v1 -
434 05-28 FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference FlashFormer: Ganzmodell-Kernel für effiziente Low-Batch-Inferenz FlashFormer: 用于高效低批量推断的全模块内核 2505.22758v1 -
435 05-28 Decomposing Elements of Problem Solving: What “Math” Does RL Teach? Zersetzende Elemente der Problemlösung: Was “Math” lehrt RL? 问题解决的分解要素:RL教什么“马思”? 2505.22756v1 -
436 05-28 Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling Darstellungsdynamiken von Diffusionsmodellen durch Low-Dimensional Modeling verstehen 通过低多样性建模理解通过低多样性建模传播模型的动态 2502.05743v2 -
437 05-28 VideoRAG: Retrieval-Augmented Generation over Video Corpus VideoRAG: Retrieval-Augmented Generation über Video Corpus VideoRAG: 利用视频公司回收的原始一代 2501.05874v3 -
438 05-28 Self-orthogonalizing attractor neural networks emerging from the free energy principle Selbst-orthogonalisierendes Attraktor-Neuralnetzwerk, das aus dem Prinzip der freien Energie entspringt 根据自由能源原则建立的自我调整的吸引人神经网络 2505.22749v1 -
439 05-28 An unsupervised method for MRI recovery: Deep image prior with structured sparsity Eine unüberwachte Methode für die MRT-Wiederherstellung: Tiefenbild vor mit strukturierter Sparsamkeit MRI 恢复的一种不受监督的方法: 结构宽度之前的深图像 2501.01482v3 -
440 05-28 StarBASE-GP: Biologically-Guided Automated Machine Learning for Genotype-to-Phenotype Association Analysis StarBASE-GP: Biologisch geführtes automatisiertes maschinelles Lernen für die Analyse von Genotyp-zu-Phenotyp-Verbindungen StarBASE-GP: 基因型至极型协会分析的生物辅助自动计算机学习 2505.22746v1 -
441 05-28 Information-Computation Gaps in Quantum Learning via Low-Degree Likelihood Informations-Computation Lücken im Quanten-Lernen über Low-Degree Likelihood 通过低贫困风险学习的量子学习中的信息估计差距 2505.22743v1 -
442 05-28 Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing Darstellung Shattering in Transformers: Synthetische Studie mit Wissensbearbeitung 在变形器中代表变形器:带有知识编辑的合成研究 2410.17194v4 -
443 05-28 AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models AutoL2S: Auto-Lang-Short-Reasoning für effiziente große Sprachmodelle 自动L2S:高效大语言模式的自动长期短期理由 2505.22662v1 -
444 05-28 3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model 3DLLM-Mem: Langzeit-Raum-Temporal-Speicher für körpereigenes 3D-Großsprachmodell 3DLLM-Mem:3D大语言模型内嵌成的3D大语言长期空间-时间记忆 2505.22657v1 -
445 05-28 Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents Position: Ungewissheitsquantifizierung braucht eine Neubewertung für großsprachige Modellagenten 位置:大语言示范物剂的不确定性量化需求评估 2505.22655v1 -
446 05-28 Sherlock: Self-Correcting Reasoning in Vision-Language Models Sherlock: Selbstkorrekte Vernunft in Vision-Sprachen-Modellen 夏洛克:视觉语言模型中的自我校正理由 2505.22651v1 -
447 05-28 On Learning Verifiers for Chain-of-Thought Reasoning Über das Lernen von Prüfern für die Ketten-of-Thought-Reasoning 关于研究链理由的学习验证符 2505.22650v1 -
448 05-28 Private Rate-Constrained Optimization with Applications to Fair Learning Private Rate-Constrained Optimization mit Anwendungen für faires Lernen 利用公平学习申请实现优化 2505.22703v1 -
449 05-28 Spectral Survival Analysis Spektrale Überlebensanalyse 光谱生存分析 2505.22641v1 -
450 05-28 SimProcess: High Fidelity Simulation of Noisy ICS Physical Processes SimProcess: Hohe Fidelity-Simulation von lärmigen ICS-Physischen Prozessen 中间过程:高菲力模拟有噪音的ICS物理过程 2505.22638v1 -
451 05-28 Understanding (Un)Reliability of Steering Vectors in Language Models Verständnis (Un)Zuverlässigkeit von Steuerungsvektoren in Sprachmodellen (un) 语言模式指导矢量的可靠性 2505.22637v1 -
452 05-28 Spatial Knowledge Graph-Guided Multimodal Synthesis Raumwissen Graph-geführte multimodale Synthese 空间知识图表辅助多模式合成 2505.22633v1 -
453 05-28 GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks GraphOmni: Ein umfassender und erweiterbarer Benchmark-Rahmen für große Sprachmodelle zu graphtheoretischen Aufgaben 图图Omni:图理学任务大语言模型综合和可扩展基准框架 2504.12764v3 -
454 05-28 SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning SCIZOR: Ein selbstüberwachter Ansatz zur Datenkuration für großflächiges Imitationslernen SCIZOR: 大规模模拟学习数据计算法的自我监督办法 2505.22626v1 -
455 05-28 Principled Out-of-Distribution Generalization via Simplicity Prinzipielle Nicht-Verteilung Verallgemeinerung über Einfachheit 通过简单化普遍化 2505.22622v1 -
456 05-28 The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Der Entropie-Mechanismus des Verstärkten Lernens für sinnvolle Sprachmodelle 理由语言模式强化学习的全英机制 2505.22617v1 -
457 05-28 Bridging Supervised Learning and Reinforcement Learning in Math Reasoning Bridging Supervised Learning und Verstärkung Lernen in Mathe-Reasoning 在数学原因方面的受监督学习和强化学习架桥 2505.18116v2 -
458 05-28 Fully Heteroscedastic Count Regression with Deep Double Poisson Networks Vollständig heteroskedastische Zählregression mit tiefen Doppel-Poisson-Netzwerken 基于深度双 Poisson 网络的完全异方差计数回归 2406.09262v4 -
459 05-28 Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency Abgeschirmte Diffusion: Erzeugen von neuen und vielfältigen Bildern mit Sparse Repellency 盾牌扩散:利用微缩生成新奇和多样化图像 2410.06025v3 -
460 05-28 Solving Inverse Problems with Deep Linear Neural Networks: Global Convergence Guarantees for Gradient Descent with Weight Decay Inverse Probleme mit tiefen linearen neuralen Netzwerken lösen: Globale Konvergenzgarantien für gradienten Abstieg mit Gewichtsverfall 解决深线神经神经网络的反面问题:全球一致保障渐变后裔与体重衰减 2502.15522v2 -
461 05-28 Chest Disease Detection In X-Ray Images Using Deep Learning Classification Method Brusterkrankungen Detektion in Röntgenbildern mit Deep Learning-Klassifikationsmethode 利用深学习分类方法在X射线图像中检测胸前疾病 2505.22609v1 -
462 05-28 AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling AutoElicit: Mit großen Sprachmodellen für vorausschauende Modellierung von Expertenvoraussagen 自动:在预测模拟中使用大语言模型,供专家使用 2411.17284v5 -
463 05-28 One Rank at a Time: Cascading Error Dynamics in Sequential Learning Ein Rang zu einer Zeit: Cascading Error Dynamics in Sequential Learning 一次一排: 序列学习中连带错误动态 2505.22602v1 -
464 05-28 Adjoint Sampling: Highly Scalable Diffusion Samplers via Adjoint Matching Adjoint Sampling: Hoch skalierbare Diffusions-Probenehmer über Adjoint Matching 联合采样:通过联合配配制的高可缩放扩散采样器 2504.11713v3 -
465 05-28 Machine Unlearning under Overparameterization Maschine Unlearning unter Überparameterisierung 超参数化下脱学机 2505.22601v1 -
466 05-28 HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym HDDLGym: Ein Tool zum Studieren multi-agenter Hierarchischer Probleme, definiert in HDDL mit OpenAI Gym HDDLGym: 与 OpenAI Gym 一起研究在HDDL 中界定的多代理等级问题的工具 2505.22597v1 -
467 05-28 SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement SynWorld: 用于改进制剂行动知识的虚拟情景合成 2504.03561v2 -
468 05-28 Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning Self-Error-Instruct: Verallgemeinern von Fehlern für LLMs Mathematische Begründung 自错误教学法: 数学理由LLMs 的错误一般化 2505.22591v1 -
469 05-28 VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use VTool-R1: VLMs lernen mit Bildern zu denken, indem sie mehr über multimodale Werkzeugnutzung lernen VTool-R1:VLMs通过多模式工具使用强化学习学习如何用图像思考 2505.19255v2 -
470 05-28 ReLearn: Unlearning via Learning for Large Language Models ReLearn: Entlernen über Learning for Large Language Models ReLearn:通过学习实现大语言模型的遗忘 2502.11190v3 -
471 05-28 Benignity of loss landscape with weight decay requires both large overparametrization and initialization Die Benignität der Verlustlandschaft mit dem Verfall des Gewichts erfordert sowohl große Überparametrierung als auch Initialisierung 损失景观与体重衰减的尊严要求大规模过度平衡和初始化 2505.22578v1 -
472 05-28 FNOPE: Simulation-based inference on function spaces with Fourier Neural Operators FNOPE: Simulationsbasierte Inferenz auf Funktionsräumen mit Fourier-Neural-Betreibern FNOPE: Fourier神经操作员对功能空间的模拟推推 2505.22573v1 -
473 05-28 PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion PRISM: Videodatensatz-Kondensation mit progressiver Veredelung und Einfügung für Sparse Motion PRISM: 视频数据集浓缩,并逐步精化和插入,用于微缩移动 2505.22564v1 -
474 05-28 Geometric Hyena Networks for Large-scale Equivariant Learning Geometrische Hyänennetze für großmaßstäbliches Äquivalent-Lernen 大规模平等学习的几何Hyena网络 2505.22560v1 -
475 05-28 Preference Adaptive and Sequential Text-to-Image Generation Präferenz Adaptive und sequentielle Text-zu-Bild-Generierung 适应性和顺序性文字到图像生成 2412.10419v2 -
476 05-28 Can Copulas Be Used for Feature Selection? A Machine Learning Study on Diabetes Risk Prediction Kann Copulas für die Feature-Auswahl verwendet werden? Eine maschinelle Studie über Diabetes Risikovorhersage Copulas 能够用来选择特质吗? 糖尿病风险预测的机器学习研究。 2505.22554v1 -
477 05-28 Data-Distill-Net: A Data Distillation Approach Tailored for Reply-based Continual Learning Data-Distill-Net: Ein Datendestillationsansatz, der auf Reply-based Continual Learning zugeschnitten ist Data-Distill-Net:为基于答复的不断学习量身定制的数据蒸馏方法 2505.20135v2 -
478 05-28 DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models DES-LOC: Entsynced Low Communication Adaptive Optimizers for Training Foundation Models DES-LOC:为培训基金会模型提供发光的低通信适应性适应性优化剂 2505.22549v1 -
479 05-28 A Human-Centric Approach to Explainable AI for Personalized Education Ein menschlich-zentraler Ansatz zur erklärbaren KI für die personalisierte Bildung 以人文文化方式解释个人个性化教育的可解释的AI 2505.22541v1 -
480 05-28 Uncertainty Quantification with Proper Scoring Rules: Adjusting Measures to Prediction Tasks Ungewissheitsquantifizierung mit korrekten Bewertungsregeln: Anpassung von Maßnahmen an Vorhersageaufgaben 以适当排序规则对不确定性进行量化:预测任务调整措施 2505.22538v1 -
481 05-28 TabularQGAN: A Quantum Generative Model for Tabular Data TabularQGAN: Ein Quantum Generatives Modell für Tabulardaten 表格QGAN:表格数据量子生成模型 2505.22533v1 -
482 05-28 Prediction of the Most Fire-Sensitive Point in Building Structures with Differentiable Agents for Thermal Simulators Vorhersage des feuerempfindlichsten Punkts in Gebäudestrukturen mit differenzierbaren Agenten für thermische Simulatoren 预测热模拟器使用不同物剂建造结构时最能防火的火敏度点 2502.03424v4 -
483 05-28 Training RL Agents for Multi-Objective Network Defense Tasks Schulung von RL-Agenten für multi-objektive Netzwerkverteidigungsaufgaben 多目标网络防御任务培训RL代理 2505.22531v1 -
484 05-28 Symplectic Generative Networks (SGNs): A Hamiltonian Framework for Invertible Deep Generative Modeling Symplektische Generative Netzwerke (SGNs): Ein Hamiltonsches Framework für invertible Deep Generative Modeling 症状产生网络:一个汉密尔顿框架,用于可垂直产生深层产生模型的建立 2505.22527v1 -
485 05-28 Test-Time Alignment of Discrete Diffusion Models with Sequential Monte Carlo Test-Time Alignment von diskreten Diffusionsmodellen mit Sequential Monte Carlo 使用顺序式蒙特卡洛的分解传播模型的测试时间对齐 2505.22524v1 -
486 05-28 Evaluating Supervised Learning Models for Fraud Detection: A Comparative Study of Classical and Deep Architectures on Imbalanced Transaction Data Bewertung von überwachten Lernmodellen für Betrugserkennung: Eine vergleichende Studie klassischer und tiefer Architekturen zu unausgewogenen Transaktionsdaten 评价受监督的欺诈侦查学习模式:关于不平衡交易数据的经典和深层结构比较研究 2505.22521v1 -
487 05-28 IGNIS: A Neural Network Framework for Robust Parameter Estimation in Archimedean Copulas IGNIS: Ein neurales Netzwerk-Framework für robuste Parameterschätzungen in Archimedischen Copulas IGNIS: Archimedean Copulas 强参数估计神经网络框架 2505.22518v1 -
488 05-28 Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers? Kolmogorov-Arnold Achtung: Ist erlernbare Aufmerksamkeit besser für Vision Transformer? 科尔莫戈罗夫-阿诺尔德关注:对愿景转变者来说,学习关注是否更好? 2503.10632v2 -
489 05-28 Accelerating Optimization via Differentiable Stopping Time Beschleunigung der Optimierung durch differenzierbare Stoppzeit 通过有区别的停止时间加速优化 2505.22509v1 -
490 05-28 Closed-Form Training Dynamics Reveal Learned Features and Linear Structure in Word2Vec-like Models Closed-Form Training Dynamics Reveal Erlernte Funktionen und lineare Struktur in Word2Vec-ähnlichen Modellen 类似Word2Vec 模型中的封闭形式培训动态观测发现特性和线形结构 2502.09863v2 -
491 05-28 Sparsification and Reconstruction from the Perspective of Representation Geometry Sparsifikation und Rekonstruktion aus Sicht der Repräsentationsgeometrie 从代表制角度看分解与重建 2505.22506v1 -
492 05-28 Geometric GNNs for Charged Particle Tracking at GlueX Geometrische GNNs für geladene Partikelverfolgung bei GlueX GNNs 用于凝胶X充电粒子跟踪的几何 GNNs 2505.22504v1 -
493 05-28 Assessing Quantum Advantage for Gaussian Process Regression Bewertung des Quantenvorteils für Gaussian Process Regression 评估高山进程倒退的量度优势 2505.22502v1 -
494 05-28 Novelty Detection in Reinforcement Learning with World Models Neuheitserkennung im Verstärkungslernen mit Weltmodellen 利用世界模式加强学习新颖发现 2310.08731v4 -
495 05-28 ProSpero: Active Learning for Robust Protein Design Beyond Wild-Type Neighborhoods ProSpero: Aktives Lernen für robustes Proteindesign jenseits von Wild-Typ-Nachbarschaften ProSpero:在野生部落邻里以外积极学习巨型蛋白设计 2505.22494v1 -
496 05-28 Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation Entmystifizierung des Paradoxon der wichtigen Probenahme mit einer geschätzten historisch-nachfolgenden Verhaltenspolitik in der Off-Policy-Bewertung 以非政策评价中的估计历史依赖者行为政策来解开重要性抽样反常现象的神秘化 2505.22492v1 -
497 05-28 On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling Über die überraschende Wirksamkeit großer Lernraten unter Standardbreitenskalierung 根据标准宽宽度比例扩大的大型学习率的惊人效果 2505.22491v1 -
498 05-28 Understanding Adversarial Training with Energy-based Models Verständnis von Adversarial Training mit energiebasierten Modellen 与基于能源模式的对等培训的谅解 2505.22486v1 -
499 05-28 Intrinsic User-Centric Interpretability through Global Mixture of Experts Intrinsische Benutzer-Centric-Interpretability durch globale Mischung von Experten 通过全球专家混合解释 2402.02933v4 -
500 05-28 A Closer Look at Multimodal Representation Collapse Ein genauerer Blick auf multimodale Darstellungskollaps 更仔细地审视多模式代表制的崩溃 2505.22483v1 -
501 05-28 Hypothesis Testing in Imaging Inverse Problems Hypothesenprüfung in bildgebenden Inversen Problemen 想象反反问题假设测试 2505.22481v1 -
502 05-28 Position: Don’t Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints Position: Verwenden Sie den CLT nicht in LLM-Evalen mit weniger als ein paar hundert Datenpunkten 位置: 不要在LLM Evals中使用 CLT, 其数据点小于几百个数据点 2503.01747v3 -
503 05-28 Non-Asymptotic Analysis of (Sticky) Track-and-Stop Nicht-asymptotische Analyse von (Sticky) Track-and-Stop (Sticky) Track-and-Stop 的非渐近分析 2505.22475v1 -
504 05-28 Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks Überbrückung von Sprache, Vision und Aktion: Multimodale VAE in Robotermanipulationsaufgaben 架桥语言、愿景和行动:机器人操纵任务中的多式机动性 2404.01932v2 -
505 05-28 Forecasting Multivariate Urban Data via Decomposition and Spatio-Temporal Graph Analysis Voraussichtliche Multivariate Stadtdaten durch Zersetzung und räumlich-Temporale Graphenanalyse 通过分解和时空空间图分析预测多变量城市数据 2505.22474v1 -
506 05-28 Pure Exploration with Infinite Answers Reine Exploration mit unendlichen Antworten 纯探索无无限答案 2505.22473v1 -
507 05-28 CPINN-ABPI: Physics-Informed Neural Networks for Accurate Power Estimation in MPSoCs CPINN-ABPI: Physik-informierte Neuralnetze für genaue Leistungsschätzung in MPSoCs CPINN-ABPI: MPSoCs中精确功率估计物理内建神经网络 2505.22469v1 -
508 05-28 FitCF: A Framework for Automatic Feature Importance-guided Counterfactual Example Generation FitCF: Ein Framework für die automatische Feature-Importanz-geführte kontrafaktische Beispielgenerierung FitCF: 自动地物、重要引导反事实实例生成框架 2501.00777v3 -
509 05-28 Embedding Safety into RL: A New Take on Trust Region Methods Einbettung der Sicherheit in RL: Ein neuer Ansatz für Methoden der Vertrauensregion 将安全嵌入RL:信任区域方法的新做法 2411.02957v3 -
510 05-28 OptiMindTune: A Multi-Agent Framework for Intelligent Hyperparameter Optimization OptiMindTune: Multi-Agenten-Framework für intelligente Hyperparameter-Optimierung OptiMindTune: 智能超参数优化的多机构框架 2505.19205v2 -
511 05-28 Depth-Based Matrix Classification for the HHL Quantum Algorithm Tiefenbasierte Matrix-Klassifikation für den HHL-Quantenalgorithmus HHL 量图算法的深度矩阵分类 2505.22454v1 -
512 05-28 Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO Unüberwachte Nachschulung für Multi-Modal LLM Reasoning via GRPO 通过GRPO对多模态LLM推理进行无监督后训练 2505.22453v1 -
513 05-28 Position: All Current Generative Fidelity and Diversity Metrics are Flawed Position: Alle aktuellen Generativen Fidelity und Diversity Metrics sind fehlerhaft 立场:当前所有生成保真度和多样性指标都存在缺陷 2505.22450v1 -
514 05-28 SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning SOReL und TOReL: Zwei Methoden für vollständiges Offline-Verstärkungslernen SOReL和TOReL: 完全脱线强化学习的两种方法 2505.22442v1 -
515 05-28 Variational Positive-incentive Noise: How Noise Benefits Models Variational Positiv-incentive Noise: Wie Lärm Vorteile Modelle 变化式积极积极激励噪音:如何创造噪音效益模式 2306.07651v2 -
516 05-28 LAMBDA: A Large Model Based Data Agent LAMBDA: Ein großer modellbasierter Datenagent LAMBDA:一个大型模型数据代理 2407.17535v3 -
517 05-28 Data-Driven Antenna Miniaturization: A Knowledge-Based System Integrating Quantum PSO and Predictive Machine Learning Models Datengetriebene Antenne Miniaturisierung: Ein wissensbasiertes System zur Integration von Quanten-PSO und vorausschauenden Machine Learning-Modellen 数据驱动天线微型化:以知识为基础的系统综合量子PSO和可预测性机器学习模型 2505.22440v1 -
518 05-28 Synonymous Variational Inference for Perceptual Image Compression Synonyme Variationsableitung für Wahrnehmungsbildkompression 感知图像压缩的同义同义变异推理 2505.22438v1 -
519 05-28 Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models Ausgelagerte Diffusionsprobenahme: Effiziente hintere Inferenz in latenten Räumen generativer Modelle 外部外包扩散采样:在基因变异模型潜在空间中有效的后继推论 2502.06999v2 -
520 05-28 C-LoRA: Contextual Low-Rank Adaptation for Uncertainty Estimation in Large Language Models C-LoRA: Kontextuelle Low-Rank-Anpassung für Unsicherheitsabschätzungen in großen Sprachmodellen C-LoRA:用于大语言模型中不确定性估计的上下文低秩适应 2505.17773v2 -
521 05-28 AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy AstroVisBench: Ein Code-Benchmark für wissenschaftliches Rechnen und Visualisierung in der Astronomie AstroVisBench:天文学中科学计算和可视化的代码基准 2505.20538v2 -
522 05-28 Scaling Reasoning without Attention Skalierung ohne Aufmerksamkeit 无人注意的调整理由 2505.22425v1 -
523 05-28 STaR-Bets: Sequential Target-Recalculating Bets for Tighter Confidence Intervals STaR-Bets: Sequentielle Target-Rekalkulationswetten für engere Vertrauensintervalle STaR-Bets: 更密切信任间隔的序列目标-计算重新计算保证 2505.22422v1 -
524 05-28 Beyond Verifiable Rewards: Scaling Reinforcement Learning for Language Models to Unverifiable Data Jenseits von überprüfbaren Belohnungen: Skalierung von Verstärkung Lernen für Sprachmodelle zu unüberprüfbaren Daten 超越可核实的奖励:加强语文模式的强化学习,以获得不可核实的数据 2503.19618v2 -
525 05-28 Mitigating Overthinking in Large Reasoning Models via Manifold Steering Überdenken in großen Vernunftmodellen durch Manifold Steering verhindern 通过 Manifold Steering 减轻大型推理模型中的过度思考 2505.22411v1 -
526 05-28 Decoupled Subgraph Federated Learning Entkoppelter Subgraph Federated Learning 解耦子图联邦学习 2402.19163v3 -
527 05-28 Beyond External Monitors: Enhancing Transparency of Large Language Models for Easier Monitoring Jenseits von externen Monitoren: Verbesserung der Transparenz von großen Sprachmodellen für eine einfachere Überwachung 外部监测之外的外部监测:提高大语言模型的透明度,促进更易监测 2502.05242v2 -
528 05-28 BILBO: BILevel Bayesian Optimization BILBO: BILevel Bayesian Optimierung BILBO: BI级巴耶斯最佳优化 2502.02121v2 -
529 05-28 Simultaneously Solving FBSDEs and their Associated Semilinear Elliptic PDEs with Small Neural Operators Gleichzeitige Lösung von FBSDEs und ihren zugehörigen semilinearen elliptischen PDEs mit kleinen neuralen Operatoren 使用小型神经算子同时求解FBSDEs及其相关的半线性椭圆型PDEs 2410.14788v2 -
530 05-28 Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing Inferenz-Time Scaling für Flow-Modelle über stochastische Generation und Rollover Budget Forcing 通过存储器生成和滚转预算推力对流动模型的推推时间调整 2503.19385v4 -
531 05-28 Physics-Informed Distillation of Diffusion Models for PDE-Constrained Generation Physik-informierte Destillation von Diffusionsmodellen für PDE-kontrainierte Generation PDE - 受培训的一代的传播模型的物理改造 2505.22391v1 -
532 05-28 Revisiting Feature Interactions from the Perspective of Quadratic Neural Networks for Click-through Rate Prediction Überprüfung von Feature-Interaktionen aus der Perspektive quadratischer neuraler Netzwerke für Click-through-Rate-Vorhersage 从 “ 点击通速率预测 “ 四方神经网络的角度重新审视地貌相互作用 2505.17999v2 -
533 05-28 DAM: Domain-Aware Module for Multi-Domain Dataset Condensation DAM: Domain-Aware-Modul für Multi-Domain-Datensatz-Kondensation DAM: 多域数据集集中的域- 软件模块 2505.22387v1 -
534 05-28 When do neural networks learn world models? Wann lernen neuronale Netzwerke Weltmodelle? 神经网络何时学习世界模型? 2502.09297v3 -
535 05-28 Infinite-dimensional Mahalanobis Distance with Applications to Kernelized Novelty Detection Infinite-dimensionale Mahalanobis-Distanz mit Anwendungen zur kernisierten Neuheitserkennung 无限的马哈拉诺比斯距离,应用内核新闻探测技术 2407.11873v2 -
536 05-28 Overcoming Dimensional Factorization Limits in Discrete Diffusion Models through Quantum Joint Distribution Learning Überwindung von Dimensional Factorization Limits in diskreten Diffusionsmodellen durch Quantum Joint Distribution Learning 通过量子联合分发学习克服分辨传播模式中的分量限制 2505.05151v2 -
537 05-28 A Divide-and-Conquer Approach for Modeling Arrival Times in Business Process Simulation Ein Divide-and-Conquer-Ansatz für die Modellierung von Ankunftszeiten in der Business Process Simulation 在模拟商业进程中模拟抵达时 2505.22381v1 -
538 05-28 Smooth Sailing: Lipschitz-Driven Uncertainty Quantification for Spatial Association Smooth Sailing: Lipschitz-Driven Uncertainty Quantification for Spatial Association Lipschitz-Driven 不确定性为空间协会量化 2502.06067v2 -
539 05-28 Memento No More: Coaching AI Agents to Master Multiple Tasks via Hints Internalization Memento No More: Coaching von KI-Agenten zu Master mehrere Aufgaben durch Hinweise Internalisierung 不再纪念:通过Hints内部化,指导AI代理人员掌握多项任务 2502.01562v2 -
540 05-28 Update Your Transformer to the Latest Release: Re-Basin of Task Vectors Aktualisieren Sie Ihren Transformer auf die neueste Version: Re-Basin der Task-Vektoren 将您的变换器更新为最新版本: 任务矢量的重新 Basin 2505.22697v1 -
541 05-28 An Empirical Evaluation of Rewiring Approaches in Graph Neural Networks Eine empirische Bewertung der Verdrahtungsansätze in Graphen-Neuralen Netzwerken 对图形神经网络重新布线方法的经验评价 2305.19717v2 -
542 05-28 Topological Eigenvalue Theorems for Tensor Analysis in Multi-Modal Data Fusion Topologische Eigenwert-Theoreme für die Tensoranalyse in multi-Modal Data Fusion 多模态数据融合中张量分析的拓扑特征值定理 2409.09392v3 -
543 05-28 Computing Optimal Transport Maps and Wasserstein Barycenters Using Conditional Normalizing Flows Computing Optimal Transport Maps und Wasserstein Barycenter mit bedingten Normalisierungsflüssen 使用条件性正常流动的最佳运输地图和瓦塞尔斯坦百分点 2505.22364v1 -
544 05-28 Directed Homophily-Aware Graph Neural Network Gerichtetes homophiliebewusstes Graph Neural Network 有向同质性感知图神经网络 2505.22362v1 -
545 05-28 Continuum-armed Bandit Optimization with Batch Pairwise Comparison Oracles Kontinuierliche Bandit-Optimierung mit Batch Pairwise Vergleich Oracles 以批次对称比较甲骨文优化利用批次对称比较 2505.22361v1 -
546 05-28 Multiclass Loss Geometry Matters for Generalization of Gradient Descent in Separable Classification Multiclass Loss Geometry Matters for Generalization of Gradient Descent in Separable Classification 多类损失的几何结构对可分分类中梯度下降的泛化至关重要 2505.22359v1 -
547 05-28 Budget-Adaptive Adapter Tuning in Orthogonal Subspaces for Continual Learning in LLMs Budget-Adaptive Adapter Tuning in Orthogonal Subspaces für kontinuierliches Lernen in LLMs 用于LLM持续学习的正交子空间中的预算自适应适配器调优 2505.22358v1 -
548 05-28 Suitability Filter: A Statistical Framework for Classifier Evaluation in Real-World Deployment Settings Eignungsfilter: Ein statistisches Rahmenwerk für die Klassifikator-Evaluierung in Real-World-Einsatzeinstellungen 适用性过滤器:在现实世界部署设置中进行分类评价的统计框架 2505.22356v1 -
549 05-28 Look Within or Look Beyond? A Theoretical Comparison Between Parameter-Efficient and Full Fine-Tuning Schauen Sie nach innen oder schauen Sie darüber hinaus? Ein theoretischer Vergleich zwischen Parameter-Effizient und Full Fine-Tuning 内观还是外观? 参数有效与完全精准之间的理论比较。 2505.22355v1 -
550 05-28 Context-sensitive neocortical neurons transform the effectiveness and efficiency of neural information processing Kontext-sensible neocortical Neuronen verwandeln die Wirksamkeit und Effizienz der neuronalen Informationsverarbeitung 环境敏感的新园艺神经元改变神经信息处理的效益和效率 2207.07338v7 -
551 05-28 AKRMap: Adaptive Kernel Regression for Trustworthy Visualization of Cross-Modal Embeddings AKRMap: Adaptive Kernel-Regression für vertrauenswürdige Visualisierung von Cross-Modal-Embeddings AKRMap:跨模式嵌入的可信赖可视化的适应性内核倒退 2505.14664v2 -
552 05-28 Progressive Data Dropout: An Embarrassingly Simple Approach to Faster Training Progressive Data Dropout: Ein verblüffend einfacher Ansatz zum schnelleren Training 渐进数据辍学:快速培训的一个令人尴尬的简单方法 2505.22342v1 -
553 05-28 Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start Multimodale Reasoning durch verstärktes Lernen mit kaltem Start fördern 通过 “ 冷起 “ 的强化学习推进多模式理由 2505.22334v1 -
554 05-28 Credal Prediction based on Relative Likelihood Credal Prediction basierend auf relativer Likelihood 基于相对可能性的裂变预测 2505.22332v1 -
555 05-28 Learning in Stackelberg Games with Non-myopic Agents Lernen in Stackelberg Spiele mit nicht-myopischen Agenten 学习与非中色剂在斯塔克尔贝格运动会中的学习 2208.09407v3 -
556 05-28 When Does Neuroevolution Outcompete Reinforcement Learning in Transfer Learning Tasks? Wann führt Neuroevolution das Verstärkte Lernen in Transfer-Lernaufgaben durch? 在转让学习任务方面,神经革命何时会超越竞争加强学习? 2505.22696v1 -
557 05-28 LLM-ODDR: A Large Language Model Framework for Joint Order Dispatching and Driver Repositioning LLM-ODDR: Ein großes Sprachmodell für Joint Order Dispatching und Driver Repositioning LLM-ODDR:联合调度和司机重新定位大语言示范框架 2505.22695v1 -
558 05-28 Individualised Counterfactual Examples Using Conformal Prediction Intervals Individualisierte gegenfaktische Beispiele mit konformen Vorhersageintervallen 使用非正式预测间隔的个别反事实实例 2505.22326v1 -
559 05-28 A Closer Look on Memorization in Tabular Diffusion Model: A Data-Centric Perspective Ein genauerer Blick auf die Erinnerung an Tabular Diffusion Modell: Eine datenzentrische Perspektive 更仔细地看一看表格传播模型中的记忆化:数据核心视角 2505.22322v1 -
560 05-28 Core Context Aware Transformers for Long Context Language Modeling Core Context Aware Transformers für lange Kontext-Sprachenmodellierung 长语语言建模核心认知变型器 2412.12465v2 -
561 05-28 Copresheaf Topological Neural Networks: A Generalized Deep Learning Framework Copresheaf Topologische neurale Netzwerke: Ein generalisiertes Deep Learning Framework Copresheaf 地形神经网络:普遍深层学习框架 2505.21251v2 -
562 05-28 If Pigs Could Fly… Can LLMs Logically Reason Through Counterfactuals? Wenn Schweine fliegen könnten… können LLMs logischerweise durch Gegenfakten denken? 如果猪能飞… 2505.22318v1 -
563 05-28 Rethinking BPS: A Utility-Based Evaluation Framework Rethinking BPS: Ein Nutzen-basierter Bewertungsrahmen 重新思考BPS:基于公用事业的评价框架 2505.22316v1 -
564 05-28 MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections MUDDFormer: Breaking Residual Engpässe in Transformatoren über Multiway Dynamic Dense Connections MUDDFormer:通过多路动态感应连接在变形器中打破残余瓶颈 2502.12170v2 -
565 05-28 From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization Von Dormant zu Gelöscht: Tamper-Resistent Unlearning durch Gewicht-Raum-Regularisierung 从杜尔曼特移到删除:通过宽空正规化,让塔帕-较远摆脱学习 2505.22310v1 -
566 05-28 FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration FireQ: Schnelle INT4-FP8-Kernel- und RoPE-gestützte Quantisierung für LLM-Inferenzbeschleunigung 消防:快速INT4-FFP8 内核和ROPE-感知的LLM 推推加速量 2505.20839v2 -
567 05-28 Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning Transformer vorgebildet auf verfahrenstechnische Daten enthalten modulare Strukturen für algorithmische Vernunft 在包含用于算法理由的模块结构的程序性数据方面受过预先培训的变异器 2505.22308v1 -
568 05-28 Risk-Informed Diffusion Transformer for Long-Tail Trajectory Prediction in the Crash Scenario Risiko-informierter Diffusionstransformator für langspurige Trajektorien-Vorhersage im Crash-Szenario 崩溃设想情景中长帆轨迹预测风险化传导变异器 2501.16349v2 -
569 05-28 Robustness and Cybersecurity in the EU Artificial Intelligence Act Robustheit und Cybersicherheit im EU-Gesetz über künstliche Intelligenz 《欧盟人工情报法》中的强力和网络安全 2502.16184v2 -
570 05-28 Versatile Cardiovascular Signal Generation with a Unified Diffusion Transformer Vielseitige kardiovaskuläre Signalgenerierung mit einem Unified Diffusion Transformer 具有统一扩散变异器的心血管心血管信号生成 2505.22306v1 -
571 05-28 LLäMmlein: Compact and Competitive German-Only Language Models from Scratch LLäMmlein: Kompakte und wettbewerbsfähige deutschsprachige Sprachmodelle von Scratch LLäMmlein:从零开始构建的紧凑且有竞争力的纯德语语言模型 2411.11171v4 -
572 05-28 Diss-l-ECT: Dissecting Graph Data with Local Euler Characteristic Transforms Diss-l-ECT: Entschlüsselung von Graphendaten mit lokalen Euler-Charakteristik-Transformationen Diss- l- ECT: 用本地电磁特征变换解析图表数据 2410.02622v2 -
573 05-28 360-LLaMA-Factory: Plug & Play Sequence Parallelism for Long Post-Training 360-LLaMA-Factory: Plug & Play-Sequenz-Parallelität für langes Nachtraining 360-LLaMA-Factory: 长期培训之后的插件和播放序列平行主义 2505.22296v1 -
574 05-28 Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Light-R1: Curriculum SFT, DPO und RL für Long COT aus Scratch und darüber hinaus Light-R1:SFT、DPO和RL课程,用于Scratch及以后的长期COT 2503.10460v4 -
575 05-28 MoRE: A Mixture of Low-Rank Experts for Adaptive Multi-Task Learning MoRE: Eine Mischung aus Low-Rank Experten für adaptives Multi-Task Learning MoRE: 适应性多任务学习低级专家混合组合 2505.22694v1 -
576 05-28 Rethinking the Unsolvable: When In-Context Search Meets Test-Time Scaling Das Unlösbare neu denken: Wenn In-Context Search Test-Time Scaling trifft 重新思考无法解答的问题: 当 In-Context 搜索遇到测试时间缩放时 2505.22290v1 -
577 05-28 A Variational Perspective on Generative Protein Fitness Optimization Eine abwechslungsreiche Perspektive auf generative Protein-Fitness-Optimierung 关于最优化的生质蛋白质健身的变异视角 2501.19200v2 -
578 05-28 Random Feature Representation Boosting Zufällige Merkmalsdarstellung steigert sich 随机特性显示促进 2501.18283v3 -
579 05-28 Sample Efficient Robot Learning in Supervised Effect Prediction Tasks Beispiel Effizientes Roboter-Lernen in überwachten Effekt-Vorhersage-Aufgaben 在监督效应预测任务中提高机器人学习效率 2412.02331v2 -
580 05-28 From Kernels to Features: A Multi-Scale Adaptive Theory of Feature Learning Von den Kerneln zu den Features: Eine Multi-Scale Adaptive Theorie des Feature Learning 从核心到地貌特征:多尺度适应性地貌学习理论 2502.03210v2 -
581 05-28 Zero-Shot Mono-to-Binaural Speech Synthesis Null-Schuss-Mono-bis-Binaural-Sprachsynthese 零热单声词合成 2412.08356v2 -
582 05-28 Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration 利用语言代理框架中的双重进程理论促进实时同时人类-AI合作 2502.11882v5 -
583 05-28 TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup TransMLA: Migration von GQA-Modellen zu MLA mit voller DeepSeek-Kompatibilität und Speedup TransMLA:将GQA模型迁移到具有全深搜索兼容性和加速性的司法协助模式 2502.07864v4 -
584 05-28 Full Domain Analysis in Fluid Dynamics Vollständige Domänenanalyse in Fluiddynamik 流体动态全域分析 2505.22275v1 -
585 05-28 EventFlow: Forecasting Temporal Point Processes with Flow Matching EventFlow: Vorhersage von zeitlichen Punktprozessen mit Flow Matching 事件:预测与流动匹配的时点进程 2410.07430v2 -
586 05-28 Reward Generalization in RLHF: A Topological Perspective Lohnverallgemeinerung in RLHF: Eine topologische Perspektive RLHF的奖励普遍化:地形学观点 2402.10184v7 -
587 05-28 A Novel Characterization of the Population Area Under the Risk Coverage Curve (AURC) and Rates of Finite Sample Estimators Eine neuartige Charakterisierung des Populationsgebiets unter der Risikodeckungskurve (AURC) und Raten von Finite Sample-Schätzern 风险覆盖曲线下人口区的新特点和有限抽样估计率 2410.15361v3 -
588 05-28 Improving Rule-based Reasoning in LLMs using Neurosymbolic Representations Verbesserung der regelbasierten Reasoning in LLMs mit neurosymbolischen Darstellungen 改进使用新阳性表示法的LLM中基于规则的理据 2502.01657v3 -
589 05-28 Training on Plausible Counterfactuals Removes Spurious Correlations Training auf Plausible Counterfactals entfernt spurlose Korrelationen 关于可视反事实消除污损的培训 2505.16583v3 -
590 05-28 LiDAR Based Semantic Perception for Forklifts in Outdoor Environments LiDAR basierte semantische Wahrnehmung für Gabelstapler im Freien 室外环境中叉车使用基于 LiDAR 的语义感 2505.22258v1 -
591 05-28 Something’s Fishy In The Data Lake: A Critical Re-evaluation of Table Union Search Benchmarks Irgendetwas ist Fishy In The Data Lake: Eine kritische Neubewertung der Tabelle Union Suche Benchmarks “数据湖中的鱼:对表格联合搜索基准的重要重新评估” 2505.21329v2 -
592 05-28 Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training Revisiting Group Relative Policy Optimization: Einblicke in die On-Policy- und Off-Policy-Schulung 重新审视小组相对政策优化:对政策和非政策培训的深入了解 2505.22257v1 -
593 05-28 Train Sparse Autoencoders Efficiently by Utilizing Features Correlation Bahnsparse Autoencoder effizient durch die Nutzung von Funktionen Korrelation 通过使用地物关联, 高效地列列“ 分散的自动编译器” 。 2505.22255v1 -
594 05-28 A Unified Online-Offline Framework for Co-Branding Campaign Recommendations Ein einheitliches Online-Offline-Rahmenwerk für Co-Branding-Kampagnenempfehlungen 联合捆绑运动建议统一在线离线框架 2505.22254v1 -
595 05-28 B-XAIC Dataset: Benchmarking Explainable AI for Graph Neural Networks Using Chemical Data B-XAIC Datensatz: Benchmarking Erklärbare KI für Graph Neuronale Netzwerke unter Verwendung chemischer Daten B-XAIC数据集:使用化学数据的图形神经网络基准可解释的AI 2505.22252v1 -
596 05-28 Evaluating Compact LLMs for Zero-Shot Iberian Language Tasks on End-User Devices Bewertung kompakter LLMs für blitzfreie iberische Sprachaufgaben auf Endbenutzer-Geräten 评价关于最终用户装置的零 - 低 - 低 - 高 - 伊比利亚语语言任务 2504.03312v2 -
597 05-28 UDuo: Universal Dual Optimization Framework for Online Matching UDuo: Universal Dual Optimization Framework für Online-Matching UDuo: 通用双优化在线匹配框架 2505.22243v1 -
598 05-28 Reinforcement Learning with Verifiable Rewards: GRPO’s Effective Loss, Dynamics, and Success Amplification Verstärktes Lernen mit überprüfbaren Belohnungen: Effektiver Verlust, Dynamik und Erfolgsverstärkung von GRPO 利用可核实的奖励加强学习:GRPO的有效损失、动态和成功扩展 2503.06639v3 -
599 05-28 Rethinking GNN Expressive Power from a Distributed Computational Model Perspective Überdenken von GNN Expressive Power aus einer distributed Computational Model Perspective 从分配的计算模型模型角度重新思考GNNN 的表达力 2410.01308v3 -
600 05-28 NRFormer: Nationwide Nuclear Radiation Forecasting with Spatio-Temporal Transformer NRFormer: landesweite Vorhersage der nuklearen Strahlung mit Spatio-Temporal Transformer NRFormer:利用时空变压器进行全国核辐射预报 2410.11924v3 -
601 05-28 On Provable Length and Compositional Generalization Über beweisbare Längen- und kompositionelle Verallgemeinerung 关于可证明的长度泛化与组合泛化 2402.04875v6 -
602 05-28 Yambda-5B – A Large-Scale Multi-modal Dataset for Ranking And Retrieval Yambda-5B – Ein multimodaler Datensatz für das Ranking und das Retrieval Yambda-5B – – 用于排名和检索的大型多模式数据集 2505.22238v1 -
603 05-28 Decision-Focused Forecasting: A Differentiable Multistage Optimisation Architecture Entscheidungsorientierte Prognose: Eine differenzierbare mehrstufige Optimierungsarchitektur 决定重点预测:可区别的多阶段优化结构 2405.14719v2 -
604 05-28 Optimal kernel regression bounds under energy-bounded noise Optimale Kernel-Regressionsgrenzen unter energiegebundenem Rauschen 在受能源限制的噪音下的最佳内核回归界限 2505.22235v1 -
605 05-28 Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models Qualität Across-Sprachen beurteilen: Ein mehrsprachiger Ansatz zur Vorschulung von Datenfiltern mit Sprachmodellen 判断各语文的质量:采用多种语文办法,利用语言模式进行培训前数据过滤 2505.22232v1 -
606 05-28 You Do Not Fully Utilize Transformer’s Representation Capacity Sie nicht voll nutzen Transformer-Repräsentanz Kapazität 您没有充分利用变换器的代表能力 2502.09245v2 -
607 05-28 Solver-Free Decision-Focused Learning for Linear Optimization Problems Solver-Free decision-focused Learning für lineare Optimierungsprobleme 处理线性优化问题的无解决者决定-集中学习 2505.22224v1 -
608 05-28 Taming Recommendation Bias with Causal Intervention on Evolving Personal Popularity Zähmung des Empfehlungs-Bias durch kausale Intervention auf die sich entwickelnde persönliche Popularität 通过对演变中的个人流行度进行因果干预来抑制推荐偏差 2505.14310v2 -
609 05-28 Quantum framework for Reinforcement Learning: Integrating Markov decision process, quantum arithmetic, and trajectory search Quanten-Framework for Reinforcement Learning: Markov-Entscheidungsprozess, Quantenarithmetik und Flugbahnsuche integrieren 强化学习的量子框架:纳入Markov决策程序、量数算术和轨迹搜索 2412.18208v3 -
610 05-28 Advancing Sequential Numerical Prediction in Autoregressive Models Advancing Sequential Numerical Prediction in Autoregressive Modelle 自动递减模型中推进序列序号预测 2505.13077v2 -
611 05-28 On the Within-class Variation Issue in Alzheimer’s Disease Detection Zur klasseninternen Variationsfrage bei der Alzheimer-Erkennung 阿尔茨海默氏氏病检测的 类内变化变化问题 2409.16322v2 -
612 05-28 Interpreting CLIP with Hierarchical Sparse Autoencoders CLIP mit Hierarchical Sparse Autoencodern interpretieren 使用等级式的粗度自动解析器解释 CLIP 2502.20578v2 -
613 05-28 LaMM: Semi-Supervised Pre-Training of Large-Scale Materials Models LaMM: Halbüberwachte Vorausbildung von großformatigen Werkstoffmodellen LAMM: 大型材料模型的半监督前培训 2505.22208v1 -
614 05-28 Pitfalls of Rule- and Model-based Verifiers – A Case Study on Mathematical Reasoning Pitfalls of Rule- and Model-based Verifiers – Eine Fallstudie zur mathematischen Begründung 规则和基于示范的验证符咒 – – 关于数学理由的个案研究 2505.22203v1 -
615 05-28 Enhancing Uncertainty Estimation and Interpretability via Bayesian Non-negative Decision Layer Verbesserung der Unsicherheitsabschätzung und -interpretierbarkeit über Bayesian Non-negative Decision Layer 通过Bayesian非负决定层加强不确定性的估算和解释 2505.22199v1 -
616 05-28 An Augmentation-Aware Theory for Self-Supervised Contrastive Learning Eine Augmentations-Bewusst-Theorie für selbstüberwachtes kontrastives Lernen 自我监督违规学习的增强- 软件软件理论 2505.22196v1 -
617 05-28 Physics-inspired Generative AI models via real hardware-based noisy quantum diffusion Physik-inspirierte Generative KI-Modelle über reale Hardware-basierte laute Quantendiffusion 通过实实在在的硬件噪音量子扩散 产生人工智能模型 2505.22193v1 -
618 05-28 Beyond RMSE and MAE: Introducing EAUC to unmask hidden bias and unfairness in dyadic regression models Jenseits von RMSE und MAE: Einführung von EAUC zur Enttarnung versteckter Bias und Ungerechtigkeit in dyadischen Regressionsmodellen 超越RMSE和MAE:引入EAUC以揭示二元回归模型中隐藏的偏见和不公平现象 2401.10690v5 -
619 05-28 LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently LoRA-One: Ein-Schritt-Full Gradient könnte genug für feines Tuning von großen Sprachmodellen sein, beweisbar und effizient LoRA-One:一步全梯度即可可证明且高效地微调大语言模型 2502.01235v2 -
620 05-28 LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits LC-Tsallis-INF: Generalisierte Best-of-Both-Worlds Lineare Kontextbanditen LC-Tsallis-INF: 普遍化的两世界最佳线性线性直线性范围内的强盗 2403.03219v3 -
621 05-28 Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes Kontinuierliche und diskrete Diffusion mit nicht gleichzeitigen Diffusionsprozessen 与非平行扩散进程一起进行连续和分解的不连续和分解文本传播 2505.22165v1 -
622 05-28 AgriFM: A Multi-source Temporal Remote Sensing Foundation Model for Crop Mapping AgriFM: Multi-Source-Modell für die zeitliche Fernerkundung AgriFM:多种来源的时空遥感基金会作物绘图模型 2505.21357v2 -
623 05-28 The informativeness of the gradient revisited Die Aufschlusskraft des Gradienten wurde überarbeitet 重新讨论的梯度信息性 2505.22158v1 -
624 05-28 Towards Practical Defect-Focused Automated Code Review Auf dem Weg zu einer praktischen fehlerorientierten automatisierten Code-Überprüfung 走向实际失效-受污染的自动编码审查 2505.17928v2 -
625 05-28 Uncertainty Estimation for Heterophilic Graphs Through the Lens of Information Theory Ungewissheitsschätzung für heterophile Graphen durch die Linse der Informationstheorie 信息镜头信息理论流流中异血哲学图谱的不确定性估计 2505.22152v1 -
626 05-28 Oryx: a Performant and Scalable Algorithm for Many-Agent Coordination in Offline MARL Oryx: ein performanter und skalierbarer Algorithmus für viele-Agenten-Koordination in Offline MARL Oryx: MARL 离线下许多机构协调的性能和可缩放的数值 2505.22151v1 -
627 05-28 Gradient Boosting Reinforcement Learning Gradientenfördernde Stärkung des Lernens 逐步推进强化学习 2407.08250v2 -
628 05-28 Bridging Arbitrary and Tree Metrics via Differentiable Gromov Hyperbolicity Überbrückung von Willkür- und Baummetrics durch differenzierbare Gromov-Hyperbolizität 通过差别化格罗莫夫双向主义 2505.21073v2 -
629 05-28 Limited Generalizability in Argument Mining: State-Of-The-Art Models Learn Datasets, Not Arguments Begrenzte Verallgemeinerbarkeit im Argumentbergbau: State-of-The-Art-Modelle lernen Datensätze, keine Argumente 《争议采矿业的限制性通用性:国家与艺术中的模式学习数据集,非论据》 2505.22137v1 -
630 05-28 RAD: Redundancy-Aware Distillation for Hybrid Models via Self-Speculative Decoding RAD: Redundanz-Bewusst-Destillation für Hybridmodelle über selbstspekulative Decodierung RAD: 通过自投机代号为混合模型进行再利用-软件蒸馏 2505.22135v1 -
631 05-28 JEDI: Latent End-to-end Diffusion Mitigates Agent-Human Performance Asymmetry in Model-Based Reinforcement Learning JEDI: Latent End-to-End-Diffusion mildert die Asymmetrie von Agent-Human Performance im modellbasierten Verstärkungslernen JEDI: 以模型为基础的加强学习中前端至终端扩散 消化剂-人类性能对称性 2505.19698v2 -
632 05-28 Optimize Cardinality Estimation Model Pretraining by Simplifying the Training Datasets Kardinalitätsabschätzungsmodell optimieren Vorschulung durch Vereinfachung der Trainingsdatensätze 通过简化培训数据集,优化红红心估计模型预培训模式 2502.14350v2 -
633 05-28 Revisiting Weak-to-Strong Generalization in Theory and Practice: Reverse KL vs. Forward KL Neuvisualisierung von Schwach-zu-Strong-Verallgemeinerung in Theorie und Praxis: Reverse KL vs. Forward KL 重新审视理论和实践中弱到强的简单化:反向 KL vs. fward KL 2502.11107v3 -
634 05-28 BiMi Sheets: Infosheets for bias mitigation methods BiMi Sheets: Infosheets für Methoden zur Biasminderung BiMi 工作表:用于减少偏差方法的信息表 2505.22114v1 -
635 05-28 Understanding Model Ensemble in Transferable Adversarial Attack Model-Ensemble in übertragbarem Widersacher-Angriff verstehen 理解可转让反向攻击中可相互转让攻击的示范组合 2410.06851v3 -
636 05-28 The quest for the GRAph Level autoEncoder (GRALE) Die Suche nach dem GRAph Level AutoEncoder (GRALE) 寻求GRALE(GRALE)的GRAP 高级自动编码器(GRALE) 2505.22109v1 -
637 05-28 Inclusive, Differentially Private Federated Learning for Clinical Data Inklusives, differenziert privates Federated Learning für klinische Daten 包容性、差异化私联校临床数据学习 2505.22108v1 -
638 05-28 Curse of High Dimensionality Issue in Transformer for Long-context Modeling Fluch der Hochdimensionalitätsfrage im Transformer für die Langkontextmodellierung 变异器中高多维度问题的诅咒,用于长期建模 2505.22107v1 -
639 05-28 Devil is in the Details: Density Guidance for Detail-Aware Generation with Flow Models Devil ist in den Details: Dichte-Anleitung für Detail-Aware-Generation mit Flow-Modellen 魔鬼在细节中: 使用流动模型生成详细软件的密度指导 2502.05807v2 -
640 05-28 Visuospatial Cognitive Assistant Visuospatial Cognitive Assistant 活性呼吸空间感知助理 2505.12312v3 -
641 05-28 Efficient Dynamic Shielding for Parametric Safety Specifications Effiziente dynamische Abschirmung für parametrische Sicherheitsspezifikationen 用于参数安全规格的有效动态防护 2505.22104v1 -
642 05-28 Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts Auf dem Weg zur Visuospatialen Kognition durch hierarchische Fusion von visuellen Experten 争取通过视觉专家的等级化融合实现纵向空间聚合 2505.12363v3 -
643 05-28 Conditional Denoising Meets Polynomial Modeling: A Flexible Decoupled Framework for Time Series Forecasting Bedingtes Stören trifft auf Polynommodellierung: Ein flexibles entkoppeltes Framework für die Zeitreihenprognose 满足多面性建模:时间序列预测灵活拆分框架 2410.13253v6 -
644 05-28 On the Transferability and Discriminability of Representation Learning in Unsupervised Domain Adaptation Über die Übertragbarkeit und Diskriminierbarkeit von Representation Learning in unüberwachter Domain-Anpassung 关于无监督域适应中表征学习的可迁移性和可判别性 2505.22099v1 -
645 05-28 Knowledge Base Construction for Knowledge-Augmented Text-to-SQL Knowledge Base Construction für wissensbasierte Text-zu-SQL 知识强化文字到SQL知识基础建设 2505.22096v1 -
646 05-28 Diffusion Models as Cartoonists: The Curious Case of High Density Regions Diffusionsmodelle als Karikaturisten: Der seltsame Fall von Regionen mit hoher Dichte 作为漫画家的传播模型:高密度地区令人好奇的案例 2411.01293v4 -
647 05-28 High Volume Rate 3D Ultrasound Reconstruction with Diffusion Models Hohe Lautstärke 3D-Ultraschall-Rekonstruktion mit Diffusions-Modellen 3D超声波重建,采用传播模型 2505.22090v1 -
648 05-28 Base and Exponent Prediction in Mathematical Expressions using Multi-Output CNN Basis- und Exponentvorhersage in mathematischen Ausdrücken mit Multi-Output CNN 利用有线电视新闻网的多种产出对数学表达式进行基础和指数预测 2407.14967v2 -
649 05-28 Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations Domain-spezifisches Pruning von großen Mixture-of-Experts-Modellen mit nur wenigen Demonstrationen 大型混合型专家模型的域特定情景,少发示范 2504.06792v2 -
650 05-28 PADAM: Parallel averaged Adam reduces the error for stochastic optimization in scientific machine learning PADAM: Parallel gemittelter Adam reduziert Fehler bei stochastischer Optimierung im wissenschaftlichen maschinellen Lernen PADAM: 平行平均 Adam 减少科学机器学习中随机优化的错误 2505.22085v1 -
651 05-28 Hyperbolic recurrent neural network as the first type of non-Euclidean neural quantum state ansatz Hyperbolisches rezidivierendes neuronales Netzwerk als erste Art von nicht-euklidischen neuronalen Quantenzustandsansatz 超双曲经常性神经网络,作为第一种非欧洲的神经量子状态 ansatz 2505.22083v1 -
652 05-28 Improved Bounds for Swap Multicalibration and Swap Omniprediction Verbesserte Bounds für Swap Multikalibrierung und Swap Omniprediction 用于交换多校准和交换面宽度的改进宽度 2505.20885v2 -
653 05-28 LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation LongReD: Degradierung von Langtext-Großen Sprachmodellen durch Restaurationsdestillation LongReD:通过恢复蒸馏减少长长长大语言模型的短期退化 2502.07365v3 -
654 05-28 A Hybrid Multi-Factor Network with Dynamic Sequence Modeling for Early Warning of Intraoperative Hypotension Hybrides Multi-Factor-Netzwerk mit dynamischer Sequenzmodellierung zur Frühwarnung von intraoperativer Hypotonie 混合多要素网络,具有动态序列模型模型,以及早警告不合作水分的不合作状态; 2409.11064v3 -
655 05-28 Can Test-time Computation Mitigate Memorization Bias in Neural Symbolic Regression? Kann Testzeit-Computation Mitigate Memorization Bias in Neural Symbolische Regression? 测试时计算在神经符号回落中是否可模拟记忆回弹? 2505.22081v1 -
656 05-28 The Resurrection of the ReLU Die Auferstehung der ReLU 鲁鲁的复活, 2505.22074v1 -
657 05-28 PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models PRMBench: Ein feinkörniger und anspruchsvoller Benchmark für Prozess-Level-Reward-Modelle PRMBench:过程级奖励模型的细粒度且具有挑战性的基准 2501.03124v4 -
658 05-28 Message-Passing GNNs Fail to Approximate Sparse Triangular Factorizations Message-Passing-GNNs fehlschlagen an ungefähren Sparse Dreiecks-Fabrizierungen 投送信件 GNN 失败于近似偏差的三角三角因子化 2502.01397v2 -
659 05-28 Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head Dual-Head-Wissensdestillation: Optimierung der Logits-Nutzung mit Hilfe eines Hilfskopfes 双头知识蒸馏:用辅助头加强登录的使用 2411.08937v2 -
660 05-28 Learning Latent Graph Structures and their Uncertainty Lernen Latent Graph Structures und ihre Unsicherheit 学习后边图结构及其不确定性 2405.19933v2 -
661 05-28 Towards Resilient and Sustainable Global Industrial Systems: An Evolutionary-Based Approach Auf dem Weg zu stabilen und nachhaltigen globalen Industriesystemen: ein evolutionärer Ansatz 走向具有复原力和可持续的全球工业系统:基于演变的方法 2503.11688v2 -
662 05-28 Quantum Kernel Learning for Small Dataset Modeling in Semiconductor Fabrication: Application to Ohmic Contact Quanten-Kernel-Lernen für kleine Datensätze Modellierung in Halbleiterfertigung: Anwendung auf Ohm-Kontakt 半导体制造中小型数据集建模的量子核心学习: Ohmic 接触的应用 2409.10803v3 -
663 05-28 A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment Eine umfassende Umfrage in LLM(-Agent) Full Stack Sicherheit: Daten, Schulung und Bereitstellung 用LLLM(-代理)全堆安全:数据、培训和部署进行的全面调查 2504.15585v3 -
664 05-28 ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation ORIGEN: Zero-Shot 3D-Orientierungsgrundierung in Text-zu-Bild-Generierung 将零热3D定向定位作为产生文字到图像的基础 2503.22194v2 -
665 05-28 Reinforced Reasoning for Embodied Planning Verstärkte Begründung für die körperbetonte Planung 强化规划强化理由 2505.22050v1 -
666 05-28 Differentiable Generalized Sliced Wasserstein Plans Unterschiedliche generalisierte Wasserstein-Pläne 刀切瓦西斯坦计划 2505.22049v1 -
667 05-28 Learning Curves of Stochastic Gradient Descent in Kernel Regression Lernkurven des stochastischen Gradienten Abstiegs in Kernel-Regression 内核倒退中尾部渐变源的学习曲线 2505.22048v1 -
668 05-28 Learning to Steer Learners in Games Lernen zu Steer Learners in Spielen 在运动会中学习向运动会中的稳坐学生学习 2502.20770v2 -
669 05-28 PUATE: Efficient Average Treatment Effect Estimation from Treated (Positive) and Unlabeled Units PUATE: Effiziente Schätzung des durchschnittlichen Behandlungseffekts aus behandelten (Positiven) und nicht gekennzeichneten Einheiten PUATE: 高效平均处理效果估算处理(积极)单位和无标签单位的高效平均处理效果 2501.19345v2 -
670 05-28 MultiScale Contextual Bandits for Long Term Objectives MultiScale Contextual Bandits für langfristige Ziele 长期目标多层次背景影响 2503.17674v2 -
671 05-28 Latent Mamba Operator for Partial Differential Equations Latent Mamba Operator für partielle Differentialgleichungen 部分差异方程的 中端 Mamba 运算符 2505.19105v2 -
672 05-28 Estimating the Effects of Sample Training Orders for Large Language Models without Retraining Bewertung der Auswirkungen von Mustertrainingsaufträgen für große Sprachmodelle ohne Umschulung 估计无再培训的大语言模式抽样培训令的影响 2505.22042v1 -
673 05-28 Detecting Undesired Process Behavior by Means of Retrieval Augmented Generation Erkennung von unerwünschtem Prozessverhalten mittels retrievaler Augmented Generation 通过回收增加一代的手段检测不想要的流程行为 2505.22041v1 -
674 05-28 Revisiting In-Context Learning with Long Context Language Models Das In-Context-Lernen mit langen Kontext-Sprachmodellen 以长方语言模式重新研究内文学习 2412.16926v3 -
675 05-28 Weakly-Supervised Contrastive Learning for Imprecise Class Labels Schwachüberwachtes Kontrastives Lernen für ungenaue Klassen-Etiketten 简便类标签的微弱监督反竞争学习 2505.22028v1 -
676 05-28 Evaluation of the impact of expert knowledge: How decision support scores impact the effectiveness of automatic knowledge-driven feature engineering (aKDFE) Bewertung der Auswirkungen von Expertenwissen: Wie die Entscheidungsunterstützung die Wirksamkeit des automatischen wissensbasierten Feature Engineerings beeinflusst (aKDFE) 评价专家知识的影响:决策支持的评分如何影响知识驱动的自动知识特性工程(KDFE)的有效性 2504.05928v2 -
677 05-28 Efficient Online Reinforcement Learning for Diffusion Policy Effizientes Online-Verstärkungslernen für die Diffusionspolitik 高效在线强化学习促进传播政策 2502.00361v3 -
678 05-28 Model Diffusion for Certifiable Few-shot Transfer Learning Modell-Diffusion für zertifizierbares Transfer-Lernen mit wenigen Fotos 可核证的 “ 几光 “ 转让学习模型传播 2502.06970v2 -
679 05-28 Learning in Compact Spaces with Approximately Normalized Transformers Lernen in kompakten Räumen mit etwa normalisierten Transformatoren 学习与大约正常化变异器的紧凑空间的学习 2505.22014v1 -
680 05-28 SageAttention2++: A More Efficient Implementation of SageAttention2 SageAttention2++: Effizientere Umsetzung von SageAttention2 SageAttention2++:更有效地实施SageAttention2 2505.21136v2 -
681 05-28 A Comprehensive Real-World Assessment of Audio Watermarking Algorithms: Will They Survive Neural Codecs? Eine umfassende Real-World Bewertung von Audio Watermarking Algorithmen: Werden sie überleben Neural Codecs? 对音频水标定法的全面现实世界评估:它们能否生存神经规范? 2505.19663v2 -
682 05-28 Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains Domaino1s: Leitende LLM-Gründung für erklärbare Antworten in High-Stakes-Domains 域1:在高占用域中解释可解答案的 指导性LLM 2501.14431v2 -
683 05-28 Align-DA: Align Score-based Atmospheric Data Assimilation with Multiple Preferences Align-DA: Align Score-basierte atmosphärische Daten Assimilation mit mehreren Präferenzen Align-DA: 使基于分数的大气数据同化与多重偏好对齐 2505.22008v1 -
684 05-28 Generalization Analysis for Supervised Contrastive Representation Learning under Non-IID Settings Generalisierungsanalyse für überwachtes Kontrastives Repräsentationslernen unter Nicht-IID-Einstellungen 在非IID设置下受监督的违反代表制学习的通用分析 2505.04937v3 -
685 05-28 Locking-Free Training of Physics-Informed Neural Network for Solving Nearly Incompressible Elasticity Equations Locking-Free Training of Physics-informed Neural Network for Solving Fast Incompressible Elasticity Equations 用于解决近不压缩弹性等量的物理内成神经网络的无锁化培训 2505.21994v1 -
686 05-28 Identifying Causal Direction via Variational Bayesian Compression Identifizierung der Kausalrichtung durch variationale Bayesische Kompression 通过变异贝耶斯压缩确定因果方向 2505.07503v3 -
687 05-28 ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning ACE: Exploring Activation Cosine Ähnlichkeit und Varianz für genaues und kalibrationseffizientes LLM Pruning ACE: 探索在准确度和校准-有效LLM Pruning 方面活跃共生相近性和差异 2505.21987v1 -
688 05-28 Reward-Independent Messaging for Decentralized Multi-Agent Reinforcement Learning Reward-independent Messaging für dezentralisiertes Mehr-Agenten-Verstärkungs-Lernen 权力下放多机构加强学习分权式多机构加强学习的回报独立通信 2505.21985v1 -
689 05-28 How to Synthesize Text Data without Model Collapse? Wie können Sie Textdaten ohne Modellkollaps synthesieren? 如何在没有模式折叠的情况下合成文本数据 ? 2412.14689v3 -
690 05-28 Latent Weight Diffusion: Generating reactive policies instead of trajectories Latent Weight Diffusion: Erzeugen von reaktiven Strategien anstelle von Trajektorien 负负重扩散: 产生反应性政策, 而不是轨迹 2410.14040v2 -
691 05-28 Two-Stage Feature Generation with Transformer and Reinforcement Learning Zweistufige Feature-Generierung mit Transformer und Verstärkungslernen 具有变换器和强化学习的两阶段特色生成 2505.21978v1 -
692 05-28 Judging LLMs on a Simplex LLMs auf einem Simplex zu urteilen 以简单方式判断LLMs 2505.21972v1 -
693 05-28 Mitigating Heterogeneous Token Overfitting in LLM Knowledge Editing Heterogene Token-Übertragung in LLM-Wissensbearbeitung abmildern 减轻LLLM知识编辑中变异式 Tok 超称 2502.00602v2 -
694 05-28 Robust Reward Alignment via Hypothesis Space Batch Cutting Robuste Belohnung Ausrichtung durch Hypothesis Raum Batch Schneiden 通过假设空间批量切割进行强力奖励调整 2502.02921v3 -
695 05-28 Cooperation of Experts: Fusing Heterogeneous Information with Large Margin Kooperation von Experten: Verschmelzende Heterogene Informationen mit großer Spanne 专家合作:利用具有较大边际效应的异种信息 2505.20853v2 -
696 05-28 EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles EnsemW2S: Verbesserung der Schwach-zu-Strong-Verallgemeinerung mit großsprachigen Modellensembles EnsemW2S:用大语言模型组合加强弱至强的通用化 2505.21959v1 -
697 05-28 A Stochastic Approximation Approach for Efficient Decentralized Optimization on Random Networks Ein stochastischer Annäherungsansatz für eine effiziente dezentralisierte Optimierung von Random Networks 随机网络高效分散优化优化的斯托卡接近方法 2410.18774v2 -
698 05-28 Kimi k1.5: Scaling Reinforcement Learning with LLMs Kimi k1.5: Skalierungs-Verstärkungs-Lernen mit LLMs Kimi k1.5:利用LLMs加强加强学习 2501.12599v3 -
699 05-28 Stochastic Primal-Dual Double Block-Coordinate for Two-way Partial AUC Maximization Stochastische primäre Doppelblockkoordinate für Zwei-Wege-Partielle AUC-Maximierung 双向部分AUC 最大化 2505.21944v1 -
700 05-28 Continual Learning Beyond Experience Rehearsal and Full Model Surrogates Kontinuierliches Lernen über die Erfahrung hinaus Proben und vollständige Modellüberlagerungen 排练和全模模范代理公司 2505.21942v1 -
701 05-28 Go With the Flow: Fast Diffusion for Gaussian Mixture Models Mit dem Fluss gehen: Schnelle Diffusion für Gaussian Mixture Models 随流而去:高山混合模型的快速扩散 2412.09059v4 -
702 05-28 Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection Praktische Adversarialangriffe auf stochastische Banditen durch gefälschte Dateninjektion 通过假数据注射,实际对抗性攻击斯托卡强盗 2505.21938v1 -
703 05-28 ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation ReQFlow: Rektifizierter Quaternionsfluss für effiziente und hochwertige Protein-Backbone-Generation ReQFlow:为高效和高品质蛋白后骨生成而调整的四量流动 2502.14637v3 -
704 05-28 Higher-Order Group Synchronization Gruppensynchronisierung mit höherer Ordnung 高级分级组同步化 2505.21932v1 -
705 05-28 Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning Ermittlung von Kriterien für die Neugewichtung von Verlusten zur Verbesserung des LLM-Entlernens 探索损失重新加权标准,加强LLM 重新学习 2505.11953v2 -
706 05-28 Efficient Ensemble for Fine-tuning Language Models on Multiple Datasets Effizientes Ensemble für die Feinabstimmung von Sprachmodellen auf mehreren Datensätzen 多个数据集微调语言模型高效组合组合 2505.21930v1 -
707 05-28 Efficient Logit-based Knowledge Distillation of Deep Spiking Neural Networks for Full-Range Timestep Deployment Effiziente Logit-basierte Wissensdestillation von Tiefen-Spiking-Neural-Netzwerken für die Bereitstellung von Vollstrecken-Zeitschritten 用于全红时间步骤部署的深渗透神经网络的高效基于逻辑的知识蒸馏 2501.15925v2 -
708 05-28 Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology Subspezialitätsspezifisches Stiftungsmodell für intelligente Gastrointestinalpathologie 智能气胃肠道病理学 2505.21928v1 -
709 05-28 RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination RenderFormer: Transformer-basiertes Neural-Rendering von Dreiecksnetzen mit globaler Beleuchtung 成形前:以变形器为基础的以全球光化为工具的三角三角光板的神经成形 2505.21925v1 -
710 05-28 FALCON: An ML Framework for Fully Automated Layout-Constrained Analog Circuit Design FALCON: Ein ML-Framework für vollautomatisierte Layout-Kontrainierte analoge Schaltungen FALCON: 完全自动布局约束模拟电路设计 ML 框架 2505.21923v1 -
711 05-28 Self-supervised Learning Method Using Transformer for Multi-dimensional Sensor Data Processing Selbstüberwachte Lernmethode mit Transformer für mehrdimensionale Sensordatenverarbeitung 利用变压器进行多维传感器数据处理的自监督学习方法 2505.21918v1 -
712 05-28 SlimLLM: Accurate Structured Pruning for Large Language Models SlimLLM: Genau strukturiertes Pruning für große Sprachmodelle SlimLLM:大型语言模型的准确结构审慎 2505.22689v1 -
713 05-28 Understanding the behavior of representation forgetting in continual learning Das Verhalten der Repräsentation verstehen vergessen im kontinuierlichen Lernen 理解在不断学习中遗忘的代言人行为 2505.20970v2 -
714 05-28 ExpProof : Operationalizing Explanations for Confidential Models with ZKPs ExpProof : Operationalisierung von Erklärungen für vertrauliche Modelle mit ZKPs 利用:对ZKPs的机密模型的解释投入运作 2502.03773v3 -
715 05-28 Taming Transformer Without Using Learning Rate Warmup Zähmung Transformer ohne Verwendung von Lernrate Warmup 塔姆变形器不使用学习速率暖化 2505.21910v1 -
716 05-28 Criticality and Safety Margins for Reinforcement Learning Kritizität und Sicherheitsmargen für verstärktes Lernen 强化学习的临界和安全边缘 2409.18289v2 -
717 05-28 Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding Verstärktes Lernen für Out-of-Distribution-Reasoning in LLMs: Eine empirische Studie zur diagnostischen Gruppencodierung 在LLMM中加强分配外原因的强化学习:诊断相关群体编码经验研究 2505.21908v1 -
718 05-28 OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models OVERT: Ein Benchmark für die Bewertung übermäßiger Verweigerung bei Text-zu-Bild-Modellen OVERT: 文本到图像模型的过度拒绝评估基准 2505.21347v2 -
719 05-28 Geometry-Informed Neural Operator Transformer Geometrie-informierter Neuraloperator Transformer 智能神经操作器变换器 2504.19452v3 -
720 05-28 Integrating Intermediate Layer Optimization and Projected Gradient Descent for Solving Inverse Problems with Diffusion Models Integration von Intermediate Layer Optimization und projizierter Gradient Descent zur Lösung inverser Probleme mit Diffusionsmodellen 整合中间层优化和预测梯度,以解决传播模型的反向问题 2505.20789v2 -
721 05-28 Combinatorial Reinforcement Learning with Preference Feedback Kombinatorisches Stärkungslernen mit Präferenz-Feedback 结合强化学习与优先反馈 2502.10158v2 -
722 05-28 ReGNet: Reciprocal Space-Aware Long-Range Modeling for Crystalline Property Prediction ReGNet: Reziproke Raum-Bewusst-Langstrecken-Modellierung für kristalline Eigenschaftsvorhersage ReGNet:水晶财产预测的对等空间-软件长距离模型模型 2502.02748v2 -
723 05-28 Language-Enhanced Representation Learning for Single-Cell Transcriptomics Sprachverstärktes Repräsentationslernen für Single-Cell-Transkriptomik 单一计算机转基因学的提高语言代表性学习 2503.09427v3 -
724 05-28 Federated Continual Graph Learning Föderiertes kontinuierliches Graphenlernen 联邦连续图学习 2411.18919v3 -
725 05-28 Towards Large Reasoning Models for Agriculture Auf dem Weg zu groß angelegten Konzepten für die Landwirtschaft 争取实现农业大理由解释模式 2505.19259v2 -
726 05-28 Compressing Sine-Activated Low-Rank Adapters through Post-Training Quantization Komprimierende Sine-Activated Low-Rank-Adapter durch Quantisierung nach dem Training 通过培训后定量化压缩松状活动低Rank适应器 2505.21895v1 -
727 05-28 SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training SDPO: Importance-Sampled Direct Preference Optimierung für stabile Diffusionsschulungen SDPO: 稳定传播培训的重要性抽样直接优惠优化 2505.21893v1 -
728 05-28 ControlTac: Force- and Position-Controlled Tactile Data Augmentation with a Single Reference Image ControlTac: Kraft- und positionsgesteuerte taktile Datenvergrößerung mit einem einzigen Referenzbild 控制塔克: 带有单一参考图像的 力控和位置控轨迹数据增强 2505.20498v2 -
729 05-28 Almost Linear Convergence under Minimal Score Assumptions: Quantized Transition Diffusion Fast lineare Konvergenz unter Minimal-Score Annahmen: Quantisierte Transition Diffusion 在最低分数假设下几乎线性聚合:量化过渡扩散 2505.21892v1 -
730 05-28 Towards Robust Automated Perceptual Voice Quality Assessment with Speech Foundation Models Auf dem Weg zu robuster automatisierter Wahrnehmungsqualitätsbewertung mit Sprachstiftungsmodellen 以语音基金会模式进行强有力的自主声音质量评估 2505.21356v2 -
731 05-28 Symbolic Foundation Regressor on Complex Networks Symbolischer Foundation-Regressor auf komplexen Netzwerken 复杂网络上的反射器 2505.21879v1 -
732 05-28 Hybrid Batch Normalisation: Resolving the Dilemma of Batch Normalisation in Federated Learning Hybride Batch-Normalisierung: Lösung des Dilemmas der Batch-Normalisierung im Federated Learning 混合批次正常化:解决联邦学习中批次正常化的难题 2505.21877v1 -
733 05-28 Targeted Unlearning Using Perturbed Sign Gradient Methods With Applications On Medical Images Gezieltes Lernen mit gestörten Zeichen Gradient Methoden mit Anwendungen auf medizinischen Bildern 采用固定信号渐进方法,在医学图像上应用医学图象,有针对性地取消学习 2505.21872v1 -
734 05-28 Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Robot Learning Coarse-to-fine Q-Network mit Aktionssequenz für dateneffizientes Roboterlernen Coarse 至 fine Q 网络与数据效率机器人学习行动序列 2411.12155v4 -
735 05-28 Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures Mini-Batch Coresets für speichereffiziente Sprachmodellschulungen auf Datenmischungen 记忆效率语言数据混合模型培训微型批量核心数据集 2407.19580v4 -
736 05-28 Revisiting Bayesian Model Averaging in the Era of Foundation Models Bayesianisches Modell im Zeitalter der Gründungsmodelle neu besuchen 重新审查基金会模式时代的贝耶斯模式 2505.21857v1 -
737 05-28 Meta Co-Training: Two Views are Better than One Meta Co-Training: Zwei Ansichten sind besser als eine Meta联合培训:两种观点比一种观点更好 2311.18083v5 -
738 05-28 Investigating the effectiveness of multimodal data in forecasting SARS-COV-2 case surges Untersuchung der Wirksamkeit multimodaler Daten bei der Prognose von SARS-COV-2-Fallfluten 调查多式联运数据在预测SARS-COV-2案件激增方面的有效性 2505.22688v1 -
739 05-28 Multi-Label Bayesian Active Learning with Inter-Label Relationships Multi-Label Bayesian Aktives Lernen mit inter-Label Beziehungen 多标签贝耶斯人积极学习与跨标签关系 2411.17941v2 -
740 05-28 Improving the Variance of Differentially Private Randomized Experiments through Clustering Verbesserung der Varianz von differenziert privaten Randomisierten Experimenten durch Clustering 通过集群化改进差异私人随机化实验的差异 2308.00957v3 -
741 05-28 ItDPDM: Information-Theoretic Discrete Poisson Diffusion Model ItDPDM: Informationstheoretisches Diskretes Poisson-Diffusionsmodell ItDPDM:信息论离散泊松扩散模型 2505.05082v3 -
742 05-28 Solving Empirical Bayes via Transformers Lösen von Empirischen Buchten über Transformer 通过变换器解决实证贝贝 2502.09844v2 -
743 05-28 Continuous Thought Machines Kontinuierliche Gedankenmaschinen 连续思考机 2505.05522v3 -
744 05-28 Statistical Inference for Temporal Difference Learning with Linear Function Approximation Statistische Schlussfolgerung für zeitliches Differenzlernen mit linearer Funktionsannäherung 与线性函数接近一致的时空差异学习统计推推 2410.16106v3 -
745 05-28 A Provable Approach for End-to-End Safe Reinforcement Learning Ein realistischer Ansatz für das Ende-zu-Ende sichere Stärkungslernen 最终至最终安全强化学习的可行办法 2505.21852v1 -
746 05-28 Streaming Flow Policy: Simplifying diffusion/flow-matching policies by treating action trajectories as flow trajectories Streaming Flow Policy: Vereinfachung von Diffusions-/Flow-Matching-Richtlinien durch Behandlung von Aktionstrajektorien als Flow-Trajektorien 流式流策略:通过将动作轨迹视为流轨迹来简化扩散/流匹配策略 2505.21851v1 -
747 05-28 Spectral clustering for dependent community Hawkes process models of temporal networks Spektrales Clustering für abhängige Community Hawkes Prozessmodelle von zeitlichen Netzwerken 依赖依赖性社区霍克斯时间网络过程模型光谱群群群 2505.21845v1 -
748 05-28 A Physics-Informed Learning Framework to Solve the Infinite-Horizon Optimal Control Problem Ein physikinformiertes Lernrahmenwerk zur Lösung des Unendlichen-Horizon-Optimalen Steuerungsproblems 解决无限 – – 霍里佐最佳控制问题的物理综合学习框架 2505.21842v1 -
749 05-28 An Optimistic Algorithm for online CMDPS with Anytime Adversarial Constraints Optimistischer Algorithmus für Online-CMDPS mit jederzeit feindlichen Einschränkungen 带有任何时间的反逆限制的在线 CMDPS 优化算法 2505.21841v1 -
750 05-28 Natural Language Reinforcement Learning Natürliche Sprache Stärkung Lernen 自然语言强化学习 2411.14251v3 -
751 05-28 UniMoGen: Universal Motion Generation UniMoGen: Universal Motion Generation UniMoGen: 宇宙运动一代 2505.21837v1 -
752 05-27 (2) Inferring Traffic Models in Terminal Airspace from Flight Tracks and Procedures Ableiten von Verkehrsmodellen im Terminal-Luftraum von Flugspuren und -verfahren 从飞行轨道和程序中推断终端航空空间的交通模式 2303.09981v3 -
753 05-27 TuneComp: Joint Fine-tuning and Compression for Large Foundation Models TuneComp: Gemeinsame Feinabstimmung und Kompression für große Fundamentmodelle TuneComp:大型基础模型的联合微调和压缩 2505.21835v1 -
754 05-27 Constrained Discrete Diffusion Beschränkte diskrete Diffusion 限制的分解扩散 2503.09790v2 -
755 05-27 In Search of Adam’s Secret Sauce Auf der Suche nach Adams geheimer Sauce 寻找亚当的秘密香肠 2505.21829v1 -
756 05-27 Music Source Restoration Restaurierung der Musikquelle 音乐来源恢复 2505.21827v1 -
757 05-27 From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization Von EduVisBench zu EduVisAgent: Ein Benchmark- und Multi-Agent-Framework für eine sinnvolle pädagogische Visualisierung 从EduVisBench到EduVisAgent:面向推理驱动教学可视化的基准和多智能体框架 2505.16832v2 -
758 05-27 Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones Lassen Sie mich nachdenken! Eine lange Gedankenkette kann exponentiell viele kurze wert sein 让我想想!一条长思维链可能抵得上指数级数量的短思维链 2505.21825v1 -
759 05-27 Unsupervised Latent Pattern Analysis for Estimating Type 2 Diabetes Risk in Undiagnosed Populations Unüberwachte Latent Pattern Analyse zur Schätzung des Typ-2-Diabetes-Risikos in nicht diagnostizierten Populationen 未经监督的对未诊断的人群2型糖尿病风险估算的 2505.21824v1 -
760 05-27 An Innovative Data-Driven and Adaptive Reinforcement Learning Approach for Context-Aware Prescriptive Process Monitoring Ein innovativer datengetriebener und adaptiver Weiterbildungsansatz für die kontext-aware Prescriptive Prozessüberwachung 采用创新型数据驱动和适应性强化学习方法,用于内容软件指令程序监测 2501.10543v2 -
761 05-27 DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra DiffMS: Diffusionserzeugung von Molekülen auf Massenspektren DiffMS: 受质量光谱约束的分子的扩散生成 2502.09571v2 -
762 05-27 Representative Language Generation Repräsentative Sprachgenerierung 代表性语言生成 2505.21819v1 -
763 05-27 Optimizing Data Augmentation through Bayesian Model Selection Optimierung der Datenvergrößerung durch Bayesian Model Selection 通过Bayesian模式选择优化数据增加 2505.21813v1 -
764 05-27 Learning Enhanced Ensemble Filters Enhanced Ensemble Filter lernen 学习增强的组合过滤器 2504.17836v2 -
765 05-27 ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails ThinkGuard: Besonnenes langsames Denken führt zu voreiligen Wärtern 思考指南:慎重考虑的慢思考引领谨慎警卫车 2502.13458v2 -
766 05-27 Voice Quality Dimensions as Interpretable Primitives for Speaking Style for Atypical Speech and Affect Sprachqualitätsdimensionen als Interpretierbare Primitive für sprechenden Stil für atypische Sprache und Affekt 语音质量方面作为非非典型演讲和影响说话风格的可解释的原始语言 2505.21809v1 -
767 05-27 Towards Operational Automated Greenhouse Gas Plume Detection Auf dem Weg zu einer operationell automatisierten Treibhausgas-Plume-Erkennung 实现操作性自动温室气体管道探测 2505.21806v1 -
768 05-27 From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs Von Richtungen zu Kegeln: Erforschung multidimensionaler Darstellungen von Propositional Facts in LLMs 从方向到锥体:探索LLM中命题事实的多维表示 2505.21800v1 -
769 05-27 PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective PolarGrad: Eine Klasse von Matrix-Gradienten-Optimierern aus einer einheitlichen Sicht der Vorkonditionierung 极地格:从统一前置角度出发的矩阵-高压优化器类别 2505.21799v1 -
770 05-27 A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging Ein General-Purpose-Theorem für hochwahrscheinliche Grenzen stochastischer Annäherung mit Polyak Average 具有聚氨基挥动作用的斯托克相吸合高概率波断的普通用途理论 2505.21796v1 -
771 05-27 End-to-End Breast Cancer Radiotherapy Planning via LMMs with Consistency Embedding End-to-End-Brustkrebs-Radiotherapie Planung über LMMs mit Konsistenz-Embedding 通过具有一致嵌入的LMMs进行端至端乳腺癌放射治疗规划 2311.15876v4 -
772 05-27 Multimodal Federated Learning: A Survey through the Lens of Different FL Paradigms Multimodales Federated Learning: Eine Umfrage durch die Linse verschiedener FL-Paradigmen 多模式联邦学习:通过不同FL范式的镜头进行调查 2505.21792v1 -
773 05-27 LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models LV-XAttn: Verteilte Cross-Attention für lange visuelle Eingänge in multimodalen großen Sprachmodellen LV-XAttn:多式大语言模型中长视输入分布式交叉注意 2502.02406v3 -
774 05-27 Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks $\ell^p$正则化目标的全局最小化器产生最稀疏的ReLU神经网络 2505.21791v1 -
775 05-27 Faster Rates for Private Adversarial Bandits Schnellere Preise für private Adversarial Bandits 私人反盗贼的速率 2505.21790v1 -
776 05-27 Wanda++: Pruning Large Language Models via Regional Gradients Wanda++: Beschneiden großer Sprachmodelle über regionale Gradienten Wanda++:通过区域梯度剪枝大语言模型 2503.04992v3 -
777 05-27 Born a Transformer – Always a Transformer? Geboren ein Transformer - immer ein Transformer? 天生的变形人 - - 总是变形人? 2505.21785v1 -
778 05-27 Universal Approximation of Mean-Field Models via Transformers Universelle Annäherung von Mittelwert-Feld-Modellen über Transformer 通过变压器实现平均实地模型普遍接近 2410.16295v2 -
779 05-27 Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models Wasserzeichen im Sand: Unmöglichkeit der starken Wasserzeichen für generative Modelle 沙沙中的水印:在生成模型中使用强水标志的可能性 2311.04378v5 -
780 05-27 P-DROP: Poisson-Based Dropout for Graph Neural Networks P-DROP: Poisson-basiertes Dropout für Graphen-Neural-Netzwerke P-DROP: 图神经网络的基于泊松的Dropout 2505.21783v1 -
781 05-27 Diffusion Adversarial Post-Training for One-Step Video Generation Diffusions-Adversarial-Post-Training für die One-Step-Videogenerierung 单步制录像制作单步制片后培训 2501.08316v2 -
782 05-27 Memorization to Generalization: Emergence of Diffusion Models from Associative Memory Erinnerung an die Verallgemeinerung: Entstehung von Diffusionsmodellen aus dem assoziativen Gedächtnis 记忆化为普遍化:共同内存传播模型的出现 2505.21777v1 -
783 05-27 DualSchool: How Reliable are LLMs for Optimization Education? DualSchool: Wie zuverlässig sind LLMs für die Optimierungsbildung? 两所学校:优化教育LLMs有多可靠? 2505.21775v1 -
784 05-27 Backdoors in DRL: Four Environments Focusing on In-distribution Triggers Hintertüren in DRL: Vier Umgebungen mit Fokus auf In-Distribution Trigger DRL的后门:四个环境,侧重于内部分配触发器 2505.17248v2 -
785 05-27 Beyond 1D: Vision Transformers and Multichannel Signal Images for PPG-to-ECG Reconstruction Beyond 1D: Vision Transformers und Multichannel Signal Images für PPG-zu-ECG-Rekonstruktion 1D之后:为重建PPPG至ECG提供愿景变形器和多通道信号图像 2505.21767v1 -
786 05-27 Explainable Multi-modal Time Series Prediction with LLM-in-the-Loop Erklärbare multimodale Zeitreihenvorhersage mit LLM-in-the-Loop 与LLM in-Loop的可解释的多时时间序列预测 2503.01013v2 -
787 05-27 TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster TS-RAG: Retrieval-Augmented Generation basierte Time Series Foundation Modelle sind stärker Zero-Shot Forecaster TS-RAG:基于时间序列的回收-养殖一代基于时间序列的基础模型是更强的零热预测仪 2503.07649v3 -
788 05-27 Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization Puristische Korrelationen in der hochdimensionalen Regression: Die Rollen der Regularisierung, der Einfachheit Bias und der Überparameterisierung 高度倒退中的纯净误值:常规化、简易生物和过度计量化的作用 2502.01347v2 -
789 05-27 FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering FRAMES-VQA: Benchmarking Fine-Tuning Robustheit über Multi-Modal Shifts in der visuellen Fragestellung FRAMES-VQA:确定视觉问题解答中多模式变化的精确调整强度基准 2505.21755v1 -
790 05-27 Path Planning for Masked Diffusion Model Sampling Pfadplanung für maskierte Diffusions-Modell-Probenahme 蒙面扩散模型取样规划路径 2502.03540v4 -
791 05-27 Hierarchical Reinforcement Learning with Uncertainty-Guided Diffusional Subgoals Hierarchisches Stärkungslernen mit unsicheren, diffusionalen Unterzielen 具有不确定性的梯级强化学习,有不确定的辅助分传播目标 2505.21750v1 -
792 05-27 Revisiting Bi-Linear State Transitions in Recurrent Neural Networks Bi-Lineare State Transitions in recurrenten neuralen Netzwerken erneut besuchen 在经常性神经网络中重新审查双利那尔州过渡 2505.21749v1 -
793 05-27 Privacy for Free in the Overparameterized Regime Privatsphäre kostenlos im überparameterisierten Regime 过度计量制度中的免费隐私 2410.14787v2 -
794 05-27 Learning to See More: UAS-Guided Super-Resolution of Satellite Imagery for Precision Agriculture Mehr erfahren: UAS-geführte Super-Resolution von Satellitenbildern für Präzisionslandwirtschaft 学习更多见:UAS-UAS指导的精密农业卫星图像超级分辨率 2505.21746v1 -
795 05-27 Simulating the Unseen: Crash Prediction Must Learn from What Did Not Happen Das Unsichtbare simulieren: Crash Prediction muss lernen, was nicht passiert ist 模拟看不见:崩溃预测必须从没有发生的事情中吸取教训 2505.21743v1 -
796 05-27 Outlier-Robust Linear System Identification Under Heavy-tailed Noise Ausreißer-Robust Lineare System-Identifikation unter stark verdichtetem Lärm 在重尾噪音下识别线性系统 2501.00421v2 -
797 05-27 What is Adversarial Training for Diffusion Models? Was ist ein Adversarial Training für Diffusionsmodelle? 传播模型的反向培训是什么? 2505.21742v1 -
798 05-27 Polynomial Chaos Expanded Gaussian Process Polynomisches Chaos erweiterter Gauß-Prozess 扩大的高斯进程 2405.01052v2 -
799 05-27 Moment kernels: a simple and scalable approach for equivariance to rotations and reflections in deep convolutional networks Momentkerne: ein einfacher und skalierbarer Ansatz für Gleichmäßigkeit zu Rotationen und Reflexionen in tiefen konvolutionären Netzwerken 动力核心:一种简单和可伸缩的方法,在深刻的革命网络中,对轮换和反射的等同性采取简单和可伸缩的办法 2505.21736v1 -
800 05-27 Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization Adressierung von Konzept-Mislabeling in Konzept-Bottleneck-Modellen durch Preference-Optimierung 通过优先优化处理概念瓶颈模式中的概念误贴标签问题 2504.18026v2 -
801 05-27 Non-Markovian Discrete Diffusion with Causal Language Models Nicht-Markovianische Diskrepanz mit kausalen Sprachmodellen 非马尔科维语非马尔科维语分辨语言模式的传播 2502.09767v2 -
802 05-27 MIND-Stack: Modular, Interpretable, End-to-End Differentiability for Autonomous Navigation MIND-Stack: Modular, interpretierbar, End-to-End-Unterscheidbarkeit für die autonome Navigation MIND-Stack: 自主航行的模块、可解释、端到端至端差异 2505.21734v1 -
803 05-27 LaX: Boosting Low-Rank Training of Foundation Models via Latent Crossing LaX: Förderung der Low-Rank-Schulung von Stiftungsmodellen durch Latent Crossing LaX:通过中转交叉促进基金会模型的低射速培训 2505.21732v1 -
804 05-27 Deep Reinforcement Learning Agents are not even close to Human Intelligence Deep Enforcement Learning Agents sind nicht einmal der menschlichen Intelligenz nahe 深强化学习代理机构甚至离人类情报机构不近 2505.21731v1 -
805 05-27 Are Statistical Methods Obsolete in the Era of Deep Learning? Sind statistische Methoden im Zeitalter des tiefen Lernens überholt? 统计方法是否在深层学习时代过时? 2505.21723v1 -
806 05-27 Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape Sattel-zu-Sattel-Dynamik in Deep ReLU Networks: Low-Rank Bias bei der ersten Sattelflucht 深 ReLU 网络中的套装到套接的动态动态: 第一次套装逃跑中的低兰克比亚 2505.21722v1 -
807 05-27 CTBENCH: A Library and Benchmark for Certified Training CTBENCH: Eine Bibliothek und Benchmark für zertifizierte Ausbildung CTBENCH: 注册培训的图书馆和基准 2406.04848v4 -
808 05-27 Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference Nahezu dimensionsunabhängige Konvergenz der Mean-Field-Black-Box-Variationsinferenz 平均场黑盒变分推断的近乎维度无关的收敛 2505.21721v1 -
809 05-27 Simple Guidance Mechanisms for Discrete Diffusion Models Einfache Leitmechanismen für diskrete Diffusionsmodelle 分辨传播模型的简单指导机制 2412.10193v3 -
810 05-27 Training Dynamics of In-Context Learning in Linear Attention Trainingsdynamik des In-Context-Lernens in linearer Aufmerksamkeit 线线性关注的内文学习培训动态 2501.16265v2 -
811 05-27 Network classification through random walks Netzwerkklassifizierung durch zufällige Spaziergänge 通过随机行走进行网络分类 2505.21706v1 -
812 05-27 AMSFL: Adaptive Multi-Step Federated Learning via Gradient Difference-Based Error Modeling AMSFL: Adaptives Multi-Step-Federated Learning über gradient Difference-based Error Modeling AMSFL:通过基于差异的渐进错误建模进行适应性多阶段联邦学习 2505.21695v1 -
813 05-27 What Data Enables Optimal Decisions? An Exact Characterization for Linear Optimization Welche Daten ermöglichen optimale Entscheidungen? Eine genaue Charakterisierung für lineare Optimierung 什么数据能使最佳决定实现最佳决定? 线性优化的精确属性 2505.21692v1 -
814 05-27 LLMPR: A Novel LLM-Driven Transfer Learning based Petition Ranking Model LLMPR: Ein neuartiges LLM-getriebenes Transfer-Learning-basiertes Petitions-Ranking-Modell LLMPR:基于请愿排级的新式LLM-驱动转移学习模式 2505.21689v1 -
815 05-27 Empirical analysis of binding precedent efficiency in Brazilian Supreme Court via case classification Empirische Analyse der verbindlichen Präzedenzeffizienz im brasilianischen Obersten Gerichtshof über die Fallklassifizierung 通过案件分类对巴西最高法院具有约束力的先例效率进行经验分析 2407.07004v3 -
816 05-27 Probabilistic Reasoning with LLMs for k-anonymity Estimation Probabilistische Begründung mit LLMs für k-Anonymitätsschätzung K-匿名性估计法LLMs的概率推理 2503.09674v3 -
817 05-27 Improving User Behavior Prediction: Leveraging Annotator Metadata in Supervised Machine Learning Models Verbesserung der Benutzerverhaltensvorhersage: Annotator-Metadaten in überwachten Machine Learning-Modellen nutzen 改进用户行为预测:在受监督的机器学习模型中利用标记元数据 2503.21000v2 -
818 05-27 tenSVD algorithm for compression tenSVD-Algorithmus zur Kompression 用于压缩的 tenSVD 算法 2505.21686v1 -
819 05-27 Edit Distance Robust Watermarks via Indexing Pseudorandom Codes Editierdistanz-robuste Wasserzeichen über indexierende Pseudozufallscodes 通过索引伪随机码实现对编辑距离鲁棒的水印 2406.02633v2 -
820 05-27 Incentivizing Permissionless Distributed Learning of LLMs Anreize für das unbefugte Lernen von LLMs 激励对LLMM的无自由分配的学习 2505.21684v1 -
821 05-27 multivariateGPT: a decoder-only transformer for multivariate categorical and numeric data multivariateGPT: ein nur Decoder-Transformator für multivariate kategoriale und numerische Daten 多个变量GPT: 用于多变量绝对数据和数字数据的解码器专用变压器 2505.21680v1 -
822 05-27 Fast meta-solvers for 3D complex-shape scatterers using neural operators trained on a non-scattering problem Schnelle Meta-Lösung für 3D-Komplex-Spritzer mit neuronalen Operatoren, die auf einem nicht-streuenden Problem geschult sind 使用神经操作员就非碎裂问题接受培训的3D复合碎片散散射器快速元解析器 2405.12380v2 -
823 05-27 Robust LLM Alignment via Distributionally Robust Direct Preference Optimization Robuste LLM-Ausrichtung über distributiv robuste Direktpräferenzoptimierung 通过分布式强力直接首选项优化对齐 2502.01930v2 -
824 05-27 What happens when generative AI models train recursively on each others’ generated outputs? Was passiert, wenn generative KI-Modelle rekursiv auf den jeweils anderen generierten Ausgängen trainieren? 当基因化的AI模型对彼此产生的产出进行回溯性培训时会怎样呢? 2505.21677v1 -
825 05-27 In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention In-Context Lineare Regression Demystified: Trainingsdynamik und mechanistische Interpretierbarkeit von Multi-Head Softmax Achtung 内负线倒退:对多头软体注意力进行动态和机械解释的培训 2503.12734v2 -
826 05-27 Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations Schnelles lebenslanges Adaptives Inverses Verstärktes Lernen aus Demonstrationen 从示范活动中学习 2209.11908v8 -
827 05-27 Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing Adaptive Frontier Exploration von Graphen mit Anwendungen für netzwerkbasierte Krankheitstests 适应性边界探索应用网络基疾病测试图图的适应性边界探索 2505.21671v1 -
828 05-27 Efficient Controllable Diffusion via Optimal Classifier Guidance Effiziente steuerbare Diffusion über Optimal Classifier Guidance 通过最佳分类指南有效控制可控扩散 2505.21666v1 -
829 05-27 Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning Constraint-Adaptive Policy Switching für Offline-sicheres Ausbau-Lernen 离线安全强化学习约束性强化政策转换 2412.18946v2 -
830 05-27 PreGenie: An Agentic Framework for High-quality Visual Presentation Generation PreGenie: Agentisches Framework für hochwertige visuelle Präsentationsgeneration PreGenie:高质量视觉演示制作的代理框架 2505.21660v1 -
831 05-27 STACI: Spatio-Temporal Aleatoric Conformal Inference STACI: Spatio-Temporale aleatorische Konforme Schlussfolgerung STACI: 斯帕迪奥-时空空气迁移 2505.21658v1 -
832 05-27 Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations Erklärbarkeit großer Sprachmodelle mit SMILE: Statistische Modell-agnostische Interpretierbarkeit mit lokalen Erklärungen 使用SMILE解释大语言模型的可解释性:具有局部解释的统计模型无关可解释性 2505.21657v1 -
833 05-27 BACON: A fully explainable AI model with graded logic for decision making problems BACON: Ein voll erklärbares KI-Modell mit abgestufter Logik für Entscheidungsprobleme 具有决策问题分级逻辑的完全可解释的AI模型 2505.14510v3 -
834 05-27 AutoSGD: Automatic Learning Rate Selection for Stochastic Gradient Descent AutoSGD: Automatische Lernrate-Auswahl für stochastische Gradient Descent AutoSGD: 存储渐变后代自动学习率选择 2505.21651v1 -
835 05-27 QuARI: Query Adaptive Retrieval Improvement QuARI: Abfrage Adaptive Verbesserung des Retrievals QuARI: 查询适应性检索改进 2505.21647v1 -
836 05-27 PrivATE: Differentially Private Confidence Intervals for Average Treatment Effects PrivATE: Differenziell private Konfidenzintervalle für durchschnittliche Behandlungseffekte PrivATE:平均处理效应的差分隐私置信区间 2505.21641v1 -
837 05-27 Efficient Diffusion Models for Symmetric Manifolds Effiziente Diffusionsmodelle für symmetrische Manifolds 高效扩散对称操纵模型 2505.21640v1 -
838 05-27 Apprenticeship learning with prior beliefs using inverse optimization Lehrlingsstudium mit früheren Überzeugungen mit inverser Optimierung 利用反向优化进行具有先入先信的学徒学习 2505.21639v1 -
839 05-27 Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives Berechnet Ihr LLM Ihnen zu viel? Tokenization, Transparenz und Incentives 你的LLM是否多收了你的费用?分词、透明度与激励 2505.21627v1 -
840 05-27 Localized Weather Prediction Using Kolmogorov-Arnold Network-Based Models and Deep RNNs Lokalisierte Wettervorhersage mit Kolmogorov-Arnold-Netzwerk-basierten Modellen und tiefen RNNs 利用Kolmogorov-Arnold网络模型和深区域网网 2505.22686v1 -
841 05-27 Learning Where to Learn: Training Distribution Selection for Provable OOD Performance Lernen, wo man lernen kann: Training Distribution Selection for Provable OOD Performance 学习从何学习:选择培训分布,以选择可实现的OOD业绩 2505.21626v1 -
842 05-27 VideoMarkBench: Benchmarking Robustness of Video Watermarking VideoMarkBench: Benchmarking Robustheit von Video Watermarking 视频MarkBench:视频水标记基准的坚实性 2505.21620v1 -
843 05-27 Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making Schweigen ist kein Konsens: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making 沉默不是共识:通过用于临床决策的Catfish代理商在多方代理LLMs中破坏协议的偏见 2505.21503v1 -
844 05-27 UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents UI-Genie: Ein selbstverbesserender Ansatz zur iterativen Steigerung von MLLM-basierten mobilen GUI-Agenten UI-Genie: 一种自我改进的方法,用于在刺激下促进基于MLLLM的移动图形界面工具 2505.21496v1 -
845 05-27 Reinforcing General Reasoning without Verifiers Verstärkung der allgemeinen Vernunft ohne Prüfer 加强一般理由说明,无验证人 2505.21493v1 -
846 05-27 Be Decisive: Noise-Induced Layouts for Multi-Subject Generation Entscheidend sein: Lärminduzierte Layouts für die mehrteilige Generierung Be Decisive: 多主题生成的噪音生成布局 2505.21488v1 -
847 05-27 Hardware-Efficient Attention for Fast Decoding Hardware-Effiziente Aufmerksamkeit für schnelle Dekodierung 快速下标记的硬件高效关注 2505.21487v1 -
848 05-27 Algorithms and SQ Lower Bounds for Robustly Learning Real-valued Multi-index Models Algorithmen und SQ Lower Bounds für robustes Lernen Real-valuierte Multi-Index-Modelle 强力学习实时估价多指数模型的等级和 SQ 下角宽度 2505.21475v1 -
849 05-27 Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions Annealing Flow Generative Modelle zur Probenahme hochdimensionaler und multi-Modalen Verteilungen 面向高维和多模态分布采样的退火流生成模型 2409.20547v4 -
850 05-27 SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge SOSBENCH: Benchmarking der Sicherheitsausrichtung auf wissenschaftliche Erkenntnisse SOSBENCH:以科学知识为安全协调基准 2505.21605v1 -
851 05-27 Guide your favorite protein sequence generative model Führen Sie Ihre Lieblings-Protein-Sequenz generative Modell 指导您最喜爱的蛋白质序列基因模型 2505.04823v2 -
852 05-27 When Are Concepts Erased From Diffusion Models? Wann werden Konzepte von Diffusionsmodellen ausgelöscht? 概念何时从传播模型中消失? 2505.17013v3 -
853 05-27 On the Robustness of Adversarial Training Against Uncertainty Attacks Über die Robustheit des zweifelhaften Trainings gegen Ungewissheitsangriffe 关于防止不确定袭击的反逆训练的有力性 2410.21952v2 -
854 05-27 Causal Posterior Estimation Kausale Posterior-Schätzung 因果后验估计 2505.21468v1 -
855 05-27 GeLLMO: Generalizing Large Language Models for Multi-property Molecule Optimization GeLLMO: Verallgemeinern von großen Sprachmodellen für Multi-Property-Molekül-Optimierung GeLLMO:面向多性质分子优化的通用大语言模型 2502.13398v2 -
856 05-27 High-Dimensional Calibration from Swap Regret Hochdimensionale Kalibrierung aus Swap-Regret 从 Swap Regret 进行高维校准 2505.21460v1 -
857 05-27 Designing Cyclic Peptides via Harmonic SDE with Atom-Bond Modeling Konzipieren von Cyclic Peptides über Harmonische SDE mit Atom-Bond-Modellierung 通过使用原子-体型建模的波力SDE, 设计圆性五氯苯并配有原子-体型建模 2505.21452v1 -
858 05-27 Training neural control variates using correlated configurations Ausbildung von Neuralsteuerungsvariaten mit korrelierten Konfigurationen 使用相关配置的培训神经控制变异 2505.07719v2 -
859 05-27 When Two LLMs Debate, Both Think They’ll Win Wenn zwei LLMs diskutieren, denken beide, dass sie gewinnen werden 当两个LLM 辩论, 双方都认为他们会赢 2505.19184v2 -
860 05-27 Leveraging XP and CRISP-DM for Agile Data Science Projects Nutzung von XP und CRISP-DM für agile Data Science Projekte 利用XP和CRISP-DM为敏感数据科学项目发挥杠杆作用 2505.21603v1 -
861 05-27 Can Large Reasoning Models Self-Train? Können sich große vernünftigen Modelle selbst entwickeln? 大理由模型能够自我培训吗? 2505.21444v1 -
862 05-27 Autoencoding Random Forests Zufällige Wälder automatisch kodieren 自动编码随机森林 2505.21441v1 -
863 05-27 ANCHOLIK-NER: A Benchmark Dataset for Bangla Regional Named Entity Recognition ANCHOLIK-NER: Ein Benchmark-Datensatz für Bangla Regional Named Entity Recognition ANCHOLIK-NER:孟加拉地区命名实体识别基准数据集 2502.11198v3 -
864 05-27 Measuring Fine-Grained Relatedness in Multitask Learning via Data Attribution Messung der feinkörnigen Verbundenheit im Multitasking-Lernen über Datenzuweisung 通过数据归责衡量多任务学习中的细微关联 2505.21438v1 -
865 05-27 Distributional Scaling for Emergent Capabilities Verteilungsskalierung für Emergent Capabilities 新兴市场能力分配比例 2502.17356v3 -
866 05-27 Attribute-Efficient PAC Learning of Sparse Halfspaces with Constant Malicious Noise Rate Effizientes PAC-Lernen von Sparse-Halbräumen mit konstanter bösartiger Lärmrate 以常态恶意噪音率学习粗微半空空间的属性- 有效 PAC 学习 2505.21430v1 -
867 05-27 QuForge: A Library for Qudits Simulation QuForge: Eine Bibliothek für Qudits Simulation QuForge: Qudits 模拟库 2409.17716v2 -
868 05-27 Stochastic Online Conformal Prediction with Semi-Bandit Feedback Stochastische Online-Konforme Vorhersage mit Halbbandit Feedback 具有半银行反馈的在线非正式预测 2405.13268v3 -
869 05-27 R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing R2R: Effizientes Navigieren unterschiedlicher Vernunftpfade mit klein-großen Model Token Routing R2R: 以小型模型调速器有效导航差异性理性路径 2505.21600v1 -
870 05-27 Policy Induction: Predicting Startup Success via Explainable Memory-Augmented In-Context Learning Politische Induktion: Vorhersage des Startup-Erfolgs durch erklärbares Memory-Augmented In-Context Learning 政策介绍:通过可解释的记忆增强的内文学习预测启动成功 2505.21427v1 -
871 05-27 Learning Individual Behavior in Agent-Based Models with Graph Diffusion Networks Individuelles Verhalten in agentenbasierten Modellen mit Graph Diffusionsnetzwerken lernen 具有图表传播网络的基于代理模型的学习个人行为 2505.21426v1 -
872 05-27 GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning GenPO: Generative Diffusionsmodelle treffen auf On-Policy-Verstärkungs-Lernen GenPO: 符合政策强化学习的生成传播模式 2505.18763v2 -
873 05-27 A Lightweight Method to Disrupt Memorized Sequences in LLM Eine leichte Methode zum Disruptieren von gemerkten Sequenzen in LLM LLM 中破坏记忆序列的轻量方法 2502.05159v2 -
874 05-27 Can Large Language Models Understand Symbolic Graphics Programs? Können große Sprachmodelle symbolische Grafikprogramme verstehen? 大语言模型能理解符号图形程序吗? 2408.08313v4 -
875 05-27 Optimizing Deep Learning for Skin Cancer Classification: A Computationally Efficient CNN with Minimal Accuracy Trade-Off Deep Learning für Hautkrebs-Klassifikation optimieren: Ein Computational Efficient CNN mit minimaler Genauigkeit Trade-Off 优化皮肤癌分类的深度学习:计算高效且精度损失最小的CNN 2505.21597v1 -
876 05-27 Learning optimal treatment strategies for intraoperative hypotension using deep reinforcement learning Optimale Therapiestrategien für intraoperative Hypotonie mit Deep-Enforcement-Lernen 利用深强化学习学习,学习采用最佳治疗战略,以弥补职业内衰退 2505.21596v1 -
877 05-27 Relevance-driven Input Dropout: an Explanation-guided Regularization Technique Relevanz-gesteuerter Input Dropout: eine Erklärungs-geführte Regularisierungstechnik 由相关性驱动的输入Dropout:解释引导的正则化技术 2505.21595v1 -
878 05-27 Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges Benchmarking Spatiotemporal Reasoning in LLMs und Reasoning Models: Fähigkeiten und Herausforderungen 确定LLM和理由模型的偏差理由基准:能力和挑战 2505.11618v2 -
879 05-27 Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization Widersprüchliche Biasen am Rande der Stabilität: Norm versus Schärfe Regularisierung 稳定边缘的冲突两重冲突:规范与尖锐的规范化 2505.21423v1 -
880 05-27 When Shift Happens - Confounding Is to Blame Wenn es zu einer Verschiebung kommt - Verwirren ist die Schuld 发生变迁时 - 令人不安的是责怪 2505.21422v1 -
881 05-27 A Physics-Augmented GraphGPS Framework for the Reconstruction of 3D Riemann Problems from Sparse Data Ein physikgestütztes GraphGPS-Framework für den Wiederaufbau von 3D Riemann-Problemen aus Sparse-Daten 物理辅助图形GPS框架,用于从简简数据中重建3D里伊曼问题 2505.21421v1 -
882 05-27 From Continual Learning to SGD and Back: Better Rates for Continual Linear Models Vom kontinuierlichen Lernen bis hin zu SGD und Back: Bessere Preise für kontinuierliche lineare Modelle 从持续学习到SGD和后退:持续线性模型的更好比率 2504.04579v2 -
883 05-27 Efficiently Scaling LLM Reasoning with Certaindex Effiziente Skalierung der LLM-Vernunft mit Certaindex 使用 Certaindex 高效扩展 LLM 推理 2412.20993v2 -
884 05-27 A Framework for Adversarial Analysis of Decision Support Systems Prior to Deployment Ein Rahmen für die strittige Analyse von Entscheidungsunterstützungssystemen vor der Einführung 在部署之前对决定支助系统进行反对分析的框架 2505.21414v1 -
885 05-27 Comparison of the Cox proportional hazards model and Random Survival Forest algorithm for predicting patient-specific survival probabilities in clinical trial data Vergleich des Cox-Proportional-Hazards-Modells und des Random Survival Forest-Algorithmus zur Vorhersage patientenspezifischer Überlebenswahrscheinlichkeiten in klinischen Studiendaten 比较Cox按比例比例危害模型和随机生存森林算法,以预测临床试验数据中特定患者生存概率 2502.03119v2 -
886 05-27 MRSD: Multi-Resolution Skill Discovery for HRL Agents MRSD: Multi-Resolution Skill Discovery für HRL-Agenten MRSD: HRL代理机构多分辨率技能发现 2505.21410v1 -
887 05-27 Dual Natural Gradient Descent for Scalable Training of Physics-Informed Neural Networks Dual Natural Gradient Descent für skalierbare Ausbildung von physikinformierten Neuronalen Netzwerken 物理内成形神经网络可缩放培训 2505.21404v1 -
888 05-27 A Convergence Theory for Diffusion Language Models: An Information-Theoretic Perspective Eine Konvergenztheorie für Diffusions-Sprachmodelle: Eine informationstheoretische Perspektive 传播语言模型集成理论:信息理论视角 2505.21400v1 -
889 05-27 Factual Self-Awareness in Language Models: Representation, Robustness, and Scaling Factual Self-Awareness in Sprachmodellen: Repräsentation, Robustheit und Skalierung 语言模式中的事实自觉意识:代表性、强力和比例 2505.21399v1 -
890 05-27 Square$χ$PO: Differentially Private and Robust $χ^2$-Preference Optimization in Offline Direct Alignment Square$χ$PO: Differential privat und robust $χ^2$-Preference Optimierung in Offline Direct Alignment Square$χ$PO:离线直接对齐中的差分隐私且鲁棒的 $χ^2$-偏好优化 2505.21395v1 -
891 05-27 Foundation Models on a Budget: Approximating Blocks in Large Vision Models Basismodelle auf einem Budget: Annähernde Blöcke in großen Visionsmodellen 预算模式基础模式:大愿景模式中类似障碍 2410.04941v5 -
892 05-27 Leveraging the Power of Conversations: Optimal Key Term Selection in Conversational Contextual Bandits Die Macht der Gespräche nutzen: Optimale Auswahl der Schlüsselbegriffe in konversatorischen Kontextbanditen 利用对话的力量:在对话背景强盗中最佳关键条件选择 2505.21393v1 -
893 05-27 Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features Finite-Probenanalyse von linearen zeitlichen Unterschieden Lernen mit willkürlichen Funktionen 具有任意地貌特征的线性时间上差异学习的简单抽样分析 2505.21391v1 -
894 05-27 DeCAF: Decentralized Consensus-And-Factorization for Low-Rank Adaptation of Foundation Models DeCAF: Dezentrale Konsens-und-Factorisierung für Low-Rank-Anpassung von Stiftungsmodellen DeCAF: 基金会模式的低成本改造的分散化共识和因素 2505.21382v1 -
895 05-27 Securing Federated Learning against Backdoor Threats with Foundation Model Integration Sichern von Federated Learning gegen Hintertürbedrohungen durch die Integration von Foundation-Modellen 安全联邦学习应对后门威胁,采用基金会模式一体化模式 2410.17573v3 -
896 05-27 Linear $Q$-Learning Does Not Diverge in $L^2$: Convergence Rates to a Bounded Set Lineares $Q$-Lernen divergiert nicht in $L^2$: Konvergenzraten zu einer beschränkten Menge 线性 $Q$-学习在 $L^2$ 中不发散:收敛到有界集的速率 2501.19254v4 -
897 05-27 Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment Chain-of-Zoom: Extreme Super-Resolution über Scale Autoregression und Preference Alignment 缩放链 缩放链 : 通过 缩放自动递减和偏好对齐, 极超分辨率 2505.18600v2 -
898 05-27 Improving LLM-based Global Optimization with Search Space Partitioning Verbesserung der globalen Optimierung auf LLM-Basis mit Search Space Partitioning 改进以LLM为基础的全球最佳利用搜索空间分割法 2505.21372v1 -
899 05-27 PLANETALIGN: A Comprehensive Python Library for Benchmarking Network Alignment PLANETALIGN: Eine umfassende Python-Bibliothek für das Benchmarking von Netzwerk-Alignment PLANETALIGN: 用于网络对齐基准测试的综合性 Python 库 2505.21366v1 -
900 05-27 Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders Auf dem Weg zur Verdolmetschbarkeit ohne Opfer: treue Dense-Layer-Zersetzung mit Mischung aus Decodern 实现无牺牲的解释性:忠实的高密度层分解与代谢物混合 2505.21364v1 -
901 05-27 CRISP-NAM: Competing Risks Interpretable Survival Prediction with Neural Additive Models CRISP-NAM: Konkurrenzfähige Risiken interpretierbare Überlebensvorhersage mit neuralen Additivenmodellen CRISP-NAM: 与神经添加模型相竞争的风险解释性生存预测 2505.21360v1 -
902 05-27 Learning with Selectively Labeled Data from Multiple Decision-makers Lernen mit selektiv beschrifteten Daten von mehreren Entscheidungsträgern 学习来自多个决策者的选择性标签数据 2306.07566v4 -
903 05-27 Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning Nutzung von großen Sprachmodellen für Bengalische Mathematik-Wort-Probleme bei der Lösung der Kette der Gedankenveranlagung 利用大语言模型解决孟加拉语数学字词与思维链理性的解决问题 2505.21354v1 -
904 05-27 Diffusion Predictive Control with Constraints Diffusion Predictive Control mit Einschränkungen 受限制的预测控制 2412.09342v2 -
905 05-27 An Uncertainty-Aware ED-LSTM for Probabilistic Suffix Prediction Eine unsichere ED-LSTM für probabilistische Suffix-Vorhersage 用于概率后置物后置物预测的不确定性( ED-LSTM) 的不确定性警告 ED-LSTM 2505.21339v1 -
906 05-27 Controlling Participation in Federated Learning with Feedback Mit Feedback die Teilnahme am Föderierten Lernen kontrollieren 控制参加有反馈的联邦学习 2411.19242v2 -
907 05-27 PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning PeerGuard: Verteidigen von Multi-Agenten-Systemen gegen Hintertürangriffe durch gegenseitige Vernunft 同伴保护:捍卫多机构系统,防止通过相互理由进行后门攻击 2505.11642v2 -
908 05-27 Adaptive Sample Sharing for Multi Agent Linear Bandits Adaptive Probenfreigabe für Multi Agent Linear Bandits 多剂线性强盗的适应性样本共享 2309.08710v3 -
909 05-27 Sign Operator for Coping with Heavy-Tailed Noise in Non-Convex Optimization: High Probability Bounds Under $(L_0, L_1)$-Smoothness Sign-Operator für den Umgang mit schwerfälligen Geräuschen in Nicht-Konvex-Optimierung: Hohe Wahrscheinlichkeitsgrenzen unter $(L_0, L_1)$-Smoothness 非凸优化中应对重尾噪声的符号算子:$(L_0, L_1)$-光滑性下的高概率界 2502.07923v2 -
910 05-27 Joint Learning in the Gaussian Single Index Model Gemeinsames Lernen im Gaussischen Einzelindexmodell Gaussian单一指数模式联合学习 2505.21336v1 -
911 05-27 DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents DHP: Diskrete Hierarchische Planung für Hierarchische Verstärkungs-Learning Agents DHP: 等级加强学习代理的分级分级规划 2502.01956v2 -
912 05-27 Structure from Collision Struktur aus Kollision 来自碰撞的结构 2505.21335v1 -
913 05-27 Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach Robustheit und Genauigkeit in der Mischung von Experten optimieren: Ein Dual-Model-Ansatz 优化专家混合中的力量和准确性:双模式办法 2502.06832v3 -
914 05-27 Wrapped Gaussian on the manifold of Symmetric Positive Definite Matrices Eingewickelt Gaussian auf der Mannigfaltigkeit der Symmetrischen Positiven Definiten Matrizen 以正负负负负下方矩阵的方块包装高森 2502.01512v3 -
915 05-27 Scheduling with Uncertain Holding Costs and its Application to Content Moderation Planung mit unsicheren Holdingkosten und deren Anwendung auf Content Moderation 与不确定的控股成本及其对内容调节应用的时间安排 2505.21331v1 -
916 05-27 UGCE: User-Guided Incremental Counterfactual Exploration UGCE: User-Guided Incremental Counterfactual Exploration UGCE: 用户指导的递增反事实探索 2505.21330v1 -
917 05-27 Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization Bencher: Einfaches und reproduzierbares Benchmarking für Black-Box-Optimierung 座谈人: 简化和可复制的黑箱优化基准 2505.21321v1 -
918 05-27 A Cross Modal Knowledge Distillation & Data Augmentation Recipe for Improving Transcriptomics Representations through Morphological Features Ein Cross Modal Knowledge Destillation & Data Augmentation Rezept zur Verbesserung von Transkriptionsdarstellungen durch morphologische Merkmale 一种交叉模式知识蒸馏和数据增强休息室,以通过生理特征改进转基因医学的表现形式 2505.21317v1 -
919 05-27 It’s complicated. The relationship of algorithmic fairness and non-discrimination regulations for high-risk systems in the EU AI Act Es ist kompliziert. Das Verhältnis algorithmischer Fairness- und Nichtdiskriminierungsvorschriften für Hochrisikosysteme im EU-AI-Gesetz 这很复杂,在欧盟的AI法案中, 高风险系统的算法公正和不歧视规定之间的关系。 2501.12962v3 -
920 05-27 Item Cluster-aware Prompt Learning for Session-based Recommendation Artikel Cluster-aware Prompt Learning für sitzungsbasierte Empfehlung 项目 集群意识快速学习促进基于会议的建议 2410.04756v2 -
921 05-27 Overcoming Spurious Solutions in Semi-Dual Neural Optimal Transport: A Smoothing Approach for Learning the Optimal Transport Plan Überwindung von Scheinlösungen im semi-dualen neuronalen optimalen Transport: Ein Glättungsansatz zum Lernen des optimalen Transportplans 克服半对偶神经最优传输中的虚假解:学习最优传输计划的平滑方法 2502.04583v2 -
922 05-27 Interlocking-free Selective Rationalization Through Genetic-based Learning Interlocking-free Selektive Rationalisierung durch gentechnisch-basiertes Lernen 通过基于遗传的学习实现互连、无互闭和无互换的选择性合理化 2412.10312v2 -
923 05-27 Optimizing fMRI Data Acquisition for Decoding Natural Speech with Limited Participants Optimierung der fMRI-Datenerfassung für die Dekodierung von Natural Speech mit begrenzten Teilnehmern 优化FMRI数据获取,以便与有限参加者进行自然演讲 2505.21304v1 -
924 05-27 Large Language Models Miss the Multi-Agent Mark Große Sprachmodelle verfehlen das Multi-Agenten-Ziel 大语言模型未达到多智能体的要求 2505.21298v1 -
925 05-27 Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation Auf dem Weg zur Anpassung von Open Source großen Sprachmodellen für die Erstellung klinischer Notizen auf Expertenebene 努力调整用于专家级临床笔记制作的开放源大语言模型 2405.00715v6 -
926 05-27 LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning LoFT: Low-Rank-Anpassung, die sich wie Full-Fine-Tuning verhält LoFT: 表现如同全量微调的低秩适应 2505.21289v1 -
927 05-27 GSAT: Graph Structure Attention Networks GSAT: Graphstruktur-Aufmerksamkeitsnetzwerke GSAT: 图结构注意力网络 2505.21288v1 -
928 05-27 Learnable Kernel Density Estimation for Graphs Erlernbare Kerneldichteschätzung für Graphen 可学习的内核密度 2505.21285v1 -
929 05-27 Optimal Pricing for Data-Augmented AutoML Marketplaces Optimale Preise für datengesteigerte AutoML-Märkte 数据增强自动自动ML 市场最佳定价 2310.17843v2 -
930 05-27 Accelerated Parallel Tempering via Neural Transports Beschleunigung des parallelen Temperierens über neurale Transporte 通过神经运输加速平行探险 2502.10328v2 -
931 05-27 Dual-Directed Algorithm Design for Efficient Pure Exploration Dual-Directed-Algorithm-Design für effizientes Pure-Exploring 高效纯勘探的双重稀释算法设计 2310.19319v3 -
932 05-27 Taylor expansion-based Kolmogorov-Arnold network for blind image quality assessment Taylor-expansionsbasiertes Kolmogorov-Arnold-Netzwerk für blinde Bildqualitätsbewertung 以泰勒为扩展基地的Kolmogorov-Arnold盲人图像质量评估网络 2505.21592v1 -
933 05-27 Minimizing False-Positive Attributions in Explanations of Non-Linear Models Minimierung falsch-positiver Attribute in Erklärungen nicht-linearer Modelle 尽量减少解释非碱模型中的虚假动机归属 2505.11210v2 -
934 05-27 ResKoopNet: Learning Koopman Representations for Complex Dynamics with Spectral Residuals ResKoopNet: Koopman-Repräsentanzen für komplexe Dynamiken mit Spektralresidualen lernen ResKoopNet:学习 Koopman 代表器, 用于使用光谱残余物的复杂动态 2501.00701v4 -
935 05-27 Mitigating Molecular Aggregation in Drug Discovery with Predictive Insights from Explainable AI Mildernde molekulare Aggregation in der Drogenentdeckung mit vorausschauenden Erkenntnissen von erklärbarer KI 利用可解释的人工智能的预测洞察力减轻药物发现中的分子聚合 2306.02206v2 -
936 05-27 BindEnergyCraft: Casting Protein Structure Predictors as Energy-Based Models for Binder Design BindEnergyCraft: Proteinstrukturvorhersagen als energiebasierte Modelle für Binder-Design BindEnergyCraft: 将蛋白质结构预测器用作Binder设计的基于能量的模型 2505.21241v1 -
937 05-27 Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies Breaking the Performance Ceiling in komplexen Verstärkungs-Lernen erfordert Inferenz-Strategien 综合加强学习中业绩上限的打破需要推断战略 2505.21236v1 -
938 05-27 STRAP: Spatio-Temporal Pattern Retrieval for Out-of-Distribution Generalization STRAP: Spatio-Temporal Pattern Retrieval für Out-of-Distribution-Verallgemeinerung STRAP: 普遍分发的Spadio-Temporal 样板回收 2505.19547v2 -
939 05-27 FRIREN: Beyond Trajectories – A Spectral Lens on Time FRIREN: Jenseits von Trajektorien – Eine Spektrallinse auf Zeit FRIREN: 超越轨迹 – 时间的谱透镜 2505.17370v2 -
940 05-27 Is Hyperbolic Space All You Need for Medical Anomaly Detection? Ist hyperbolischer Raum alles, was Sie für medizinische Anomalie-Erkennung benötigen? 超双曲空间 是否所有你需要的 医疗异常检测? 2505.21228v1 -
941 05-27 Why Do More Experts Fail? A Theoretical Analysis of Model Merging Warum scheitern weitere Experten? Eine theoretische Analyse der Modellzusammenführung 为何有更多的专家失败?对模式合并的理论分析 2505.21226v1 -
942 05-27 The dark side of the forces: assessing non-conservative force models for atomistic machine learning Die dunkle Seite der Kräfte: Bewertung nicht konservativer Kraftmodelle für atomistisches maschinelles Lernen 部队的黑暗面:评估非保守力量模型,以进行原子学机器学习 2412.11569v3 -
943 05-27 Wavelet Flow For Extragalactic Foreground Simulations Wavelet Flow für extragalaktische Foreground Simulationen 用于外星际前景模拟的波浪流 2505.21220v1 -
944 05-27 Addressing Data Quality Decompensation in Federated Learning via Dynamic Client Selection Adressierung von Datenqualitätsentkompensation im Federated Learning über Dynamic Client Selection 通过动态客户选择解决联邦学习中的数据质量补偿问题 2505.21219v1 -
945 05-27 Transfer learning for multifidelity simulation-based inference in cosmology Transfer-Lernen für Multifidelity-Simulationsbasierte Schlussfolgerungen in der Kosmologie 宇宙学中基于多保真度模拟推断的迁移学习 2505.21215v1 -
946 05-27 Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning Auf dem Weg zur Enthüllung der Wirksamkeit von Klein-Scale-Fine-Tuning im R1-Stil Verstärktes Lernen 提高R1型强化学习中小规模微调的效力 2505.17988v2 -
947 05-27 Input Convex Kolmogorov Arnold Networks Input Convex Kolmogorov Arnold Networks 投入 Convex Kolmogorov Arnold 网络 2505.21208v1 -
948 05-27 Towards Identifiability of Interventional Stochastic Differential Equations Zur Identifizierbarkeit interventioneller stochastischer Differentialgleichungen 实现干预性斯托卡差异等同的可识别性 2505.15987v2 -
949 05-27 Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs Universal Reasoner: Ein einfacher, komponierbarer Plug-and-Play-Reasoner für gefrorene LLMs 通用理由:冻结长效LMs的单一、可合成插管和布局理由 2505.19075v2 -
950 05-27 Developing hybrid mechanistic and data-driven personalized prediction models for platelet dynamics Entwicklung hybrider mechanistischer und datengesteuerter personalisierter Vorhersagemodelle für Thrombozytendynamik 开发混合机械和数据驱动的小板板动力学混合机械和个人化预测模型 2505.21204v1 -
951 05-27 Implicit Dynamical Flow Fusion (IDFF) for Generative Modeling Implizite Dynamische Flussfusion (IDFF) für generative Modellierung 用于产生建模的隐含动态流动融合(IDFF) 2409.14599v4 -
952 05-27 Crop recommendation with machine learning: leveraging environmental and economic factors for optimal crop selection Kulturempfehlung mit maschinellem Lernen: Nutzung ökologischer und wirtschaftlicher Faktoren für eine optimale Ernteauswahl 采用机械学习的作物建议:利用环境和经济因素优化作物选择 2505.21201v1 -
953 05-27 Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning Pioniere 4-Bit FP-Quantisierung für Diffusionsmodelle: Mixup-Sign-Quantisierung und Timestep-Aware Feintuning 推出4-Bit FP 扩散模型量化:混合- Sign 量度和时间步骤- 软件精美调试 2505.21591v1 -
954 05-27 Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM’s Instruction-Following Capabilities Enthüllen von instruction-spezifischen Neuronen & Experten: Ein analytischer Rahmen für die instruction-following Fähigkeiten von LLM 具体未完成的指示性具体神经和专家:LLM教学-执行能力分析框架 2505.21191v1 -
955 05-27 Exploring the Latent Capacity of LLMs for One-Step Text Generation Erforschung der Latent-Kapazität von LLMs für die einstufige Textgenerierung 探索单步制文本生成LLMs的原始能力 2505.21189v1 -
956 05-27 Equivariant Representation Learning for Symmetry-Aware Inference with Guarantees Gleichwertiges Repräsentationslernen für Symmetrie-Bewusstschluss mit Garantien 关于有担保的对称-软件推断的等同代表制学习 2505.19809v2 -
957 05-27 PoisonSwarm: Universal Harmful Information Synthesis via Model Crowdsourcing PoisonSwarm: Universal Harmful Information Synthesis via Model Crowdsourcing PoisonSwarm:通过模型众包实现通用有害信息合成 2505.21184v1 -
958 05-27 Learning What to Do and What Not To Do: Offline Imitation from Expert and Undesirable Demonstrations Lernen, was zu tun ist und was nicht: Offline-Imitation von Experten und unerwünschten Demonstrationen 学会做什么做什么和不做什么:专家的脱线模仿和不受欢迎的示威 2505.21182v1 -
959 05-27 Latent label distribution grid representation for modeling uncertainty Latent Label Distribution Grid Darstellung für Modellierung Unsicherheit 用于模拟不确定性模型的延迟标签分配网格代表 2505.21180v1 -
960 05-27 Improved Online Confidence Bounds for Multinomial Logistic Bandits Verbesserte Online-Konfidenzgrenzen für multinomiale Logistische Banditen 多项逻辑老虎机的改进在线置信界 2502.10020v4 -
961 05-27 Topological Deep Learning for Speech Data Topologisches Deep Learning für Sprachdaten 为语音数据进行地形深层学习 2505.21173v1 -
962 05-27 Parameter Efficient Continual Learning with Dynamic Low-Rank Adaptation Parameter Effizientes kontinuierliches Lernen mit dynamischer Low-Rank-Anpassung 具有动态低Rank适应性的持续学习 2505.11998v2 -
963 05-27 STEB: In Search of the Best Evaluation Approach for Synthetic Time Series STEB: Auf der Suche nach dem besten Bewertungsansatz für die Synthetische Zeitreihe STEB:寻求合成时间系列的最佳评价方法 2505.21160v1 -
964 05-27 Model as Loss: A Self-Consistent Training Paradigm Modell als Verlust: Ein selbstkonsistentes Trainingsparadigma 损失模型:自我协调培训模型 2505.21156v1 -
965 05-27 FlexiReg: Flexible Urban Region Representation Learning FlexiReg: Flexibles Stadtraum-Repräsentanz-Lernen FlexiReg:灵活的城市区域表示学习 2503.09128v2 -
966 05-27 Predicate Invention for Bilevel Planning Prädikat Erfindung für Bilevel-Planung 双级规划预发明 2203.09634v3 -
967 05-27 Semi-Supervised Conformal Prediction With Unlabeled Nonconformity Score Halbüberwachte konforme Vorhersage mit nicht markiertem Nonkonformity Score 带有未贴标签的不合规分数的半超半常规预测 2505.21147v1 -
968 05-27 A Distributional Treatment of Real2Sim2Real for Object-Centric Agent Adaptation in Vision-Driven Deformable Linear Object Manipulation Eine distributive Behandlung von Real2Sim2Real für die Anpassung an Objekt-Zentrische Agenten in visionsgetriebener, deformierbarer linearer Objektmanipulation 在视觉-驱动式可变线性物体操纵中用于物体中心剂适应的Real2Sim2Real的分布式处理法 2502.18615v2 -
969 05-27 Hallucinations are inevitable but can be made statistically negligible. The “innate” inevitability of hallucinations cannot explain practical LLM issues Halluzinationen sind unvermeidlich, können aber statistisch vernachlässigbar gemacht werden. Die “angeborene” Unvermeidbarkeit von Halluzinationen kann praktische LLM-Probleme nicht erklären 幻觉不可避免,但可以使其在统计上可忽略不计。幻觉的“内在”不可避免性无法解释实际的LLM问题。 2502.12187v2 -
970 05-27 A Predicting Phishing Websites Using Support Vector Machine and MultiClass Classification Based on Association Rule Techniques Eine Vorhersage Phishing-Websites mit Unterstützung Vektor-Maschine und Multi-Klasse Klassifizierung basierend auf Assoziation Regel Techniken 基于协会规则技术的利用辅助病媒机和多类分类的预测钓鱼网站 2505.21141v1 -
971 05-27 HeteroBA: A Structure-Manipulating Backdoor Attack on Heterogeneous Graphs HeteroBA: Ein strukturmanipulierender Backdoor-Angriff auf Heterogene Graphen 异型BA:结构调节式后门对异种图的后门攻击 2505.21140v1 -
972 05-27 Identifying Heart Attack Risk in Vulnerable Population: A Machine Learning Approach Identifikation von Herzinfarktrisiko in gefährdeter Bevölkerung: Ein Ansatz zum maschinellen Lernen 查明弱势人口中的心脏攻击风险:机械学习方法 2505.21139v1 -
973 05-27 Learning Single Index Models with Diffusion Priors Einzelindexmodelle mit Diffusion Priors lernen 具有传播前版本的学习单一指数模式 2505.21135v1 -
974 05-27 Robust and Computation-Aware Gaussian Processes Robuste und rechnergestützte Gaußsche Prozesse 强力和计算- 软件软件高斯进程 2505.21133v1 -
975 05-27 Backpropagation-free Spiking Neural Networks with the Forward-Forward Algorithm Rückpropagierungsfreie Spiking-Neural-Netzwerke mit dem vorwärts-vorwärts-Algorithmus 带有前向前向演算法的无后向反向反向光谱反向神经网络 2502.20411v2 -
976 05-27 MetaGS: A Meta-Learned Gaussian-Phong Model for Out-of-Distribution 3D Scene Relighting MetaGS: Ein meta-erlerntes Gaussian-Phong-Modell für 3D-Szenen-Erhellung im Out-of-Distribution-Bereich MetaGS: 用于分布外3D场景重光照的元学习Gaussian-Phong模型 2405.20791v2 -
977 05-27 Universal Value-Function Uncertainties Universelle Wert-Funktions-Unsicherheiten 通用价值-功能不确定性 2505.21119v1 -
978 05-27 A Lightweight Multi-Expert Generative Language Model System for Engineering Information and Knowledge Extraction Ein leichtes Multi-Expert Generatives Sprachmodellsystem für Engineering Information and Knowledge Extraction 工程信息和知识采掘轻量多专家生成语言示范系统 2505.21109v1 -
979 05-27 Conditional Diffusion Models with Classifier-Free Gibbs-like Guidance Bedingte Diffusionsmodelle mit klassifikatorfreier Gibbs-ähnlicher Anleitung 有条件传播模式,附有无分类者免费吉布布斯类指南 2505.21101v1 -
980 05-27 Random Walk Diffusion for Efficient Large-Scale Graph Generation Random Walk Diffusion für effiziente großformatige Graphengeneration 高效大型图表生成的随机漫步扩散 2408.04461v2 -
981 05-27 Do you see what I see? An Ambiguous Optical Illusion Dataset exposing limitations of Explainable AI Sehen Sie, was ich sehe? Ein Ambiguous Optical Illusion Dataset, das Beschränkungen der erklärbaren KI aufdeckt 你看到我所看到的吗?一个模糊的光学幻影数据集暴露了可解释的人工智能的局限性。 2505.21589v1 -
982 05-27 Sequential Function-Space Variational Inference via Gaussian Mixture Approximation Sequentielle Funktions-Raum Variationelle Schlussfolgerung über Gaußsche Mischungsannäherung 通过高森混ixture近似加速发生序列函数-空间空间变动推断 2503.07114v2 -
983 05-27 Thinker: Learning to Think Fast and Slow Denker: Schnell und langsam denken lernen 思考者:学会快速和缓慢思考 2505.21097v1 -
984 05-27 Improved Impossible Tuning and Lipschitz-Adaptive Universal Online Learning with Gradient Variations Verbessertes Unmögliches Tuning und Lipschitz-Adaptives Universal Online-Lernen mit gradienten Variationen 改进不可能的图金和利普施维茨-适应性通用在线学习,有渐进变异 2505.21095v1 -
985 05-27 Recurrent Memory for Online Interdomain Gaussian Processes Recurrent Speicher für Online-Interdomain Gaussian Prozesse Gaussian 在线内部进程经常性内存 2502.08736v3 -
986 05-27 Out of the Shadows: Exploring a Latent Space for Neural Network Verification Out of the Shadows: Erforschen eines latenten Raumes für neurale Netzwerkverifizierung 暗影外:探索神经网络的原始空间核查 2505.17854v2 -
987 05-27 Efficient Large Language Model Inference with Neural Block Linearization Effiziente großsprachige Modellinferenz mit neuraler Blocklinearisierung 高效大语言模型与神经区块线性线性结合的推断 2505.21077v1 -
988 05-27 Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling Red-Teaming Text-to-Image-Systeme durch regelbasiertes Preference-Modelling 通过基于规则的首选模式建立红色团队式文本到图像系统 2505.21074v1 -
989 05-27 A domain adaptation neural network for digital twin-supported fault diagnosis Ein neuronales Netzwerk für die Domänenanpassung für die digitale Doppel-unterstützte Fehlerdiagnose 数字双支持缺陷诊断领域适应性神经神经网络 2505.21046v1 -
990 05-27 Scalable and adaptive prediction bands with kernel sum-of-squares Skalierbare und adaptive Vorhersagebänder mit Kernel-Summe von Quadraten 可缩放和适应性预测带带内核和平方总和的可缩放和适应性预测波段 2505.21039v1 -
991 05-27 Unraveling Indirect In-Context Learning Using Influence Functions Indirektes In-Context-Lernen mit Einflussfunktionen entschlüsseln 利用影响功能进行分散的间接间接内文学习 2501.01473v2 -
992 05-27 CellCLAT: Preserving Topology and Trimming Redundancy in Self-Supervised Cellular Contrastive Learning CellCLAT: Topologie und Trimming Redundanz im selbstüberwachten zellulären Kontrastiven Lernen erhalten CellCLAT: 在自我维持的细胞抵触学习中保留地形学和三角再利用 2505.21587v1 -
993 05-27 Directed Semi-Simplicial Learning with Applications to Brain Activity Decoding Direktes Semi-Simplizielles Lernen mit Anwendungen zur Entschlüsselung der Gehirnaktivität 定向半简化学习,应用脑活动解码 2505.17939v2 -
994 05-27 LLaMEA-BO: A Large Language Model Evolutionary Algorithm for Automatically Generating Bayesian Optimization Algorithms LLaMEA-BO: Ein evolutionärer Algorithmus für die automatische Generierung Bayesischer Optimierungsalgorithmen LLAMEA-BO:用于自动生成贝耶斯优化优化生成的大型语言模型进化演化算法 2505.21034v1 -
995 05-27 Optimizing Case-Based Reasoning System for Functional Test Script Generation with Large Language Models Optimierung des Case-Based-Reasoning-Systems für die Generierung funktionaler Testskripte mit großen Sprachmodellen 为具有大语言模型的功能测试脚本生成优化基于个案的理由说明系统 2503.20576v3 -
996 05-27 Generalizable and Robust Spectral Method for Multi-view Representation Learning Verallgemeinerbare und robuste Spektralmethode für Multi-View Representative Learning 多视角代表制学习通用和强力光谱方法 2411.02138v3 -
997 05-27 FeatInv: Spatially resolved mapping from feature space to input space using conditional diffusion models FeatInv: Räumlich aufgelöstes Mapping vom Feature Space zum Input Space mit bedingten Diffusionsmodellen FeatInv:使用有条件扩散模型从地物空间到输入空间的空间空间的空间分辨率绘图 2505.21032v1 -
998 05-27 TabAttackBench: A Benchmark for Adversarial Attacks on Tabular Data TabAttackBench: Ein Benchmark für feindliche Angriffe auf Tabellendaten TabAttackBench: 表格数据对抗性攻击基准 2505.21027v1 -
999 05-27 PaSa: An LLM Agent for Comprehensive Academic Paper Search PaSa: Ein LLM-Agent für die umfassende Suche nach wissenschaftlichen Arbeiten PaSa: 用于全面学术论文搜索的LLM代理 2501.10120v2 -
1000 05-27 Multi-Mode Process Control Using Multi-Task Inverse Reinforcement Learning Multi-Mode-Prozesssteuerung mit Multi-Task Inverse Verstärkungslernen 利用多任务反向强化学习进行多模式程序控制 2505.21026v1 -
1001 05-27 Text-Queried Audio Source Separation via Hierarchical Modeling Textbefragte Audioquelle Trennung über Hierarchische Modellierung 通过等级制建模模式对文本查询的音频源分离 2505.21025v1 -
1002 05-27 Pause Tokens Strictly Increase the Expressivity of Constant-Depth Transformers Pause Tokens erhöhen streng die Expressivität der konstant-tiefen Transformer 严格提高常数面变换器的表达性 2505.21024v1 -
1003 05-27 NeuralOM: Neural Ocean Model for Subseasonal-to-Seasonal Simulation NeuralOM: Neurales Ozeanmodell für die Simulation von Subsaisonal-zu-Seasonal 神经力OM:次季节到季节模拟神经海洋模型 2505.21020v1 -
1004 05-27 Cardiac Digital Twins at Scale from MRI: Open Tools and Representative Models from ~55000 UK Biobank Participants Cardiac Digital Twins auf Scale von MRI: Offene Werkzeuge und repräsentative Modelle von ~55000 britischen Biobank-Teilnehmern 来自MRI的大规模心脏病数字双对:来自~55000英国生物库参与者的开放工具和代表模型 2505.21019v1 -
1005 05-27 Federated Instrumental Variable Analysis via Federated Generalized Method of Moments Federated Instrumental Variable Analysis via Federated Generalized Method of Moments 通过联邦通用时数方法进行的联邦仪器变量分析 2505.21012v1 -
1006 05-27 Unified Alignment Protocol: Making Sense of the Unlabeled Data in New Domains Unified Alignment Protocol: Sense der unmarkierten Daten in neuen Domains 统一对齐协议: 在新域域中感知无标签数据 2505.21010v1 -
1007 05-27 Transformers in Protein: A Survey Transformer in Protein: Eine Umfrage 蛋白质变换器:调查 2505.20098v2 -
1008 05-27 Fairness in Federated Learning: Fairness for Whom? Fairness im Federated Learning: Fairness für wen? 联邦学习中的公平性:谁的公平性? 2505.21584v1 -
1009 05-27 Efficient and Unbiased Sampling from Boltzmann Distributions via Variance-Tuned Diffusion Models Effiziente und unvoreingenommene Probenahme von Boltzmann Distributionen über Variance-Tuned Diffusion Modelle Boltzmann分销公司通过差异传播模型进行高效和无偏见的抽样 2505.21005v1 -
1010 05-27 BIPNN: Learning to Solve Binary Integer Programming via Hypergraph Neural Networks BIPNN: Lernen, Binäre Integer-Programmierung über Hypergraph Neuronale Netzwerke zu lösen BIPNN: 学习通过超光速神经网络解决二元整数编程 2505.20997v1 -
1011 05-27 Efficient Identity and Position Graph Embedding via Spectral-Based Random Feature Aggregation Effiziente Einbettung von Identitäts- und Positionsdiagrammen über spektralbasierte Random Feature Aggregation 通过光谱-基于随机地物聚合的高效身份和位置图嵌入 2505.20992v1 -
1012 05-27 Identifying Super Spreaders in Multilayer Networks Identifizieren von Superspreizern in Multilayer-Netzwerken 识别多层网络中的超级传播器 2505.20980v1 -
1013 05-27 Deep k-grouping: An Unsupervised Learning Framework for Combinatorial Optimization on Graphs and Hypergraphs Deep k-grouping: Ein unüberwachter Lernrahmen für die kombinatorische Optimierung von Graphen und Hypergraphen 深 k 组: 图形和高光谱组合优化的无人监督的学习框架 2505.20972v1 -
1014 05-27 Semantic Communication meets System 2 ML: How Abstraction, Compositionality and Emergent Languages Shape Intelligence Semantische Kommunikation trifft System 2 ML: Wie Abstraktion, Kompositionalität und Emergente Sprachen Formintelligenz 语义通信满足系统2 ML:如何抽象、组成和新兴语言形式情报 2505.20964v1 -
1015 05-27 Resampling Filter Design for Multirate Neural Audio Effect Processing Resampling Filter Design für Multirate Neural Audio Effect Processing 多立体神经音频效果处理的抽取过滤器设计 2501.18470v2 -
1016 05-27 Efficient and Microphone-Fault-Tolerant 3D Sound Source Localization Effiziente und Mikrofon-Fehler-Tolerante 3D-Soundquelle Lokalisierung 高效的麦克风和麦克风-默认的 3D 声音源源本地化 2505.20961v1 -
1017 05-27 Personalized Clustering via Targeted Representation Learning Personalisiertes Clustering über gezieltes Repräsentationslernen 通过有针对性的代表学习进行个性化集群组合 2412.13690v3 -
1018 05-27 Unveiling Impact of Frequency Components on Membership Inference Attacks for Diffusion Models Auswirkungen von Frequenzkomponenten auf Mitgliedschafts-Inferenzangriffe für Diffusionsmodelle enthüllen 频率组成部分对传播模型的传播成员推断攻击的不懈影响 2505.20955v1 -
1019 05-27 More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives Mehr ist nicht immer besser? Viel-Shot-In-Context-Lernen mit differenzierten und neugewichtigen Zielen verbessern 越多越好,越多越好?用差异化和再加权目标,加强多热化的内流学习 2501.04070v3 -
1020 05-27 Double Descent Meets Out-of-Distribution Detection: Theoretical Insights and Empirical Analysis on the role of model complexity Doppelter Abstieg trifft auf Out-of-Distribution Detection: Theoretische Erkenntnisse und empirische Analyse zur Rolle der Modellkomplexität 双重人种与分配外探测:关于模型复杂性作用的理论洞察和经验分析 2411.02184v2 -
1021 05-27 Recovering Fairness Directly from Modularity: a New Way for Fair Community Partitioning Fairness direkt aus Modularität zu gewinnen: ein neuer Weg für faire Gemeinschaftspartitionierung 直接从模式中恢复公平:公平社区分割的新途径 2505.22684v1 -
1022 05-27 Scattering Networks on Noncommutative Finite Groups Streunetze für nichtkommutative Finite-Gruppen 关于非调解性有限集团的散射网络 2505.20950v1 -
1023 05-27 shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python shapr: Erklären von Machine Learning-Modellen mit bedingten Shapley-Werten in R und Python Shapr:解释R和Python中带有有条件阴影值的机器学习模型 2504.01842v2 -
1024 05-27 Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training Zwei Experten sind alles, was Sie zum Lenken Denken brauchen: Kognitive Bemühungen in MoE-Reasoning-Modellen ohne zusätzliches Training verstärken 引导思考只需两个专家:无需额外训练即可增强MoE推理模型中的认知努力 2505.14681v2 -
1025 05-27 Efficient Spectral Control of Partially Observed Linear Dynamical Systems Effiziente Spektralsteuerung teilweise beobachteter linearer dynamischer Systeme 局部观察线性动态系统的有效光谱控制 2505.20943v1 -
1026 05-27 Towards Training One-Step Diffusion Models Without Distillation Auf dem Weg zum Training von Ein-Schritt-Diffusionsmodellen ohne Destillation 培训不蒸馏的单级传播模型 2502.08005v3 -
1027 05-27 Revisiting Sparsity Constraint Under High-Rank Property in Partial Multi-Label Learning Überprüfung der Sparsamkeitsbeschränkungen unter Hochrangigem Eigentum im Teil-Multi-Label-Lernen 重新审视部分多标签学习中高等级属性下的平等限制 2505.20938v1 -
1028 05-27 EPIC: Efficient Position-Independent Caching for Serving Large Language Models EPIC: Effizientes positionsunabhängiges Caching für das Servieren großer Sprachmodelle EPIC: 高效的、独立定位的为大语言模式服务的工作 2410.15332v3 -
1029 05-27 Linear Bandits with Non-i.i.d. Noise Lineare Banditen mit Non-i.i.d. Lärm 带有非i.i.d. 噪音的线形强盗 2505.20017v2 -
1030 05-27 NatADiff: Adversarial Boundary Guidance for Natural Adversarial Diffusion NatADiff: Adversariale Grenzführung für natürliche Adversariale Diffusion NatADiff: 自然对抗扩散的对抗边界引导 2505.20934v1 -
1031 05-27 MLMC-based Resource Adequacy Assessment with Active Learning Trained Surrogate Models MLMC-basierte Ressourcenadäquatitätsbewertung mit aktiven Learning-Trained-Surrogate-Modellen 以MLMC为基础的基于MLMC的资源充足性评估,与积极学习、经过培训的代用模型进行资源充足性评估 2505.20930v1 -
1032 05-27 Label Leakage in Federated Inertial-based Human Activity Recognition Label-Leakage in Föderated Inertial-based Human Activity Recognition 以联邦为本的人类活动确认中 联邦内地人类活动确认中的Label渗漏 2505.20924v1 -
1033 05-27 Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective Multi-Agenten-Weltmodellierung aus einer diffusionsinspirierten Perspektive Revue passieren 从传播启发的视角重新审视多股权世界建模 2505.20922v1 -
1034 05-27 Humble AI in the real-world: the case of algorithmic hiring Humble KI in der realen Welt: der Fall der algorithmischen Einstellung 现实世界中的黄土人工智能:算法雇用案例 2505.20918v1 -
1035 05-27 A Kernelised Stein Discrepancy for Assessing the Fit of Inhomogeneous Random Graph Models Eine zerkleinerte Stein-Diskrepanz für die Beurteilung der Passform von inhomogenen Zufallsgraphenmodellen 用于评估不相容随机图模型是否适合的内核化石 Stein 差异性评估 2505.21580v1 -
1036 05-27 Exploring the Boundary of Diffusion-based Methods for Solving Constrained Optimization Erforschung der Grenzen von diffusionsbasierten Methoden zur Lösung eingeschränkter Optimierung 探索以传播为基础的解决受限制的优化的解决方法的界限 2502.10330v3 -
1037 05-27 A data augmentation strategy for deep neural networks with application to epidemic modelling Eine Datenvergrößerungsstrategie für tiefe neuronale Netzwerke mit Anwendung in der Epidemiemodellierung 用于流行病建模的深层神经网络数据增强战略 2502.21033v2 -
1038 05-27 “Oh LLM, I’m Asking Thee, Please Give Me a Decision Tree”: Zero-Shot Decision Tree Induction and Embedding with Large Language Models “Oh LLM, ich frage dich, bitte gib mir einen Entscheidungsbaum”: Nullschnelle Entscheidungsbauminduktion und Einbettung mit großen Sprachmodellen “哦,LLM,我问你,请给我一棵决定树”: “零热决定树上演和嵌入大语言模型” 2409.18594v2 -
1039 05-27 Music Foundation Model as Generic Booster for Music Downstream Tasks Music Foundation Modell als Generic Booster für Downstream-Aufgaben 音乐基金会模式,作为音乐下流任务通用推进器 2411.01135v3 -
1040 05-27 Simple Relative Deviation Bounds for Covariance and Gram Matrices Einfache relative Abweichungen für Kovarianz und Gram Matrices 常数和小数母体的简单相对偏差宽度 2410.05754v3 -
1041 05-27 Enhancing Performance of Explainable AI Models with Constrained Concept Refinement Leistungssteigerung erklärbarer KI-Modelle mit eingeschränkter Konzeptverfeinerung 增强可解释的AI 概念改进模型的绩效 2502.06775v2 -
1042 05-27 Achieving binary weight and activation for LLMs using Post-Training Quantization Erreichen des binären Gewichts und Aktivierung für LLMs mit Post-Training Quantization 利用培训后量化办法使LLMMs实现二进制加权和激活 2504.05352v2 -
1043 05-27 Frequency-Aware Masked Autoencoders for Human Activity Recognition using Accelerometers Frequency-Aware Maskierte Autoencoder für die Erkennung menschlicher Aktivität mit Beschleunigungsmessern 使用加速计识别人类活动的频率软件 2502.17477v2 -
1044 05-27 How Do Transformers Learn Variable Binding in Symbolic Programs? Wie lernen Transformer variable Bindungen in Symbolischen Programmen? 变换者如何在符号程序中学习变数绑定 ? 2505.20896v1 -
1045 05-27 DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition DeepConvContext: Ein mehrskaliger Ansatz zur Zeitreihenklassifizierung in der Erkennung menschlicher Aktivität DeepConvContext:人类活动识别中时间序列分类的多尺度方法 2505.20894v1 -
1046 05-27 One-Time Soft Alignment Enables Resilient Learning without Weight Transport One-Time Soft Alignment ermöglicht resilientes Lernen ohne Gewicht Transport 一次性软对齐使有弹性的学习无需体力运输 2505.20892v1 -
1047 05-27 ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention ComplexFormer: Disruptive Verbesserung der Transformer-Inferenzfähigkeit durch Head-Specific Complex Vector Attention ComplexFormer:通过头部特定的复向量注意力颠覆性地提升Transformer推理能力 2505.10222v2 -
1048 05-27 Power-Law Decay Loss for Large Language Model Finetuning: Focusing on Information Sparsity to Enhance Generation Quality Power-Law-Decay-Loss für das Finetuning großer Sprachmodelle: Fokussierung auf Informationssparsität zur Verbesserung der Generierungsqualität 大语言模型微调的幂律衰减损失:关注信息稀疏性以提高生成质量 2505.16900v3 -
1049 05-27 Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective Auf dem Weg zur Analyse und dem Verständnis der Grenzen von VAPO: Eine theoretische Perspektive 分析和理解VAPO的局限性:理论视角 2505.17997v2 -
1050 05-27 Fedivertex: a Graph Dataset based on Decentralized Social Networks for Trustworthy Machine Learning Fedivertex: ein Graph Dataset auf Basis dezentralisierter sozialer Netzwerke für vertrauenswürdiges maschinelles Lernen Fedivertex:基于分散社会网络的图表数据集,用于可信赖的机器学习 2505.20882v1 -
1051 05-27 Generalizable Heuristic Generation Through Large Language Models with Meta-Optimization Generalisierbare Heuristische Generation durch große Sprachmodelle mit Meta-Optimierung 通过配有元-优化的大型语言模型实现可普遍实现的超营养代 2505.20881v1 -
1052 05-27 Conditional Distribution Compression via the Kernel Conditional Mean Embedding Conditional Distribution Compression über den Kernel Conditional Mean Embedding 通过内核有条件平均嵌入式压缩有条件分发 2504.10139v2 -
1053 05-27 Machine Learning - Driven Materials Discovery: Unlocking Next-Generation Functional Materials – A minireview Machine Learning - Driven Materials Discovery: Unlocking Next-Generation Functional Materials – Eine Minireview 机器学习驱动的材料发现:解锁下一代功能材料 – 小型综述 2503.18975v2 -
1054 05-27 In Context Learning with Vision Transformers: Case Study Im Kontext Lernen mit Vision Transformers: Fallstudie 与愿景变异者进行背景学习:案例研究 2505.20872v1 -
1055 05-27 RL-SPH: Learning to Achieve Feasible Solutions for Integer Linear Programs RL-SPH: Lernen, um durchführbare Lösungen für Integer-Lineare-Programme zu erreichen RL-SPH:学习为整数线性方案找到可行的解决办法 2411.19517v5 -
1056 05-27 Leveraging Diffusion Models for Parameterized Quantum Circuit Generation Nutzung von Diffusionsmodellen für die parameterisierte Quantum Circuit Generation 利用可计量量子电路生成的传播模型 2505.20863v1 -
1057 05-27 Model Agnostic Differentially Private Causal Inference Modell Agnostisch unterschiedliche private Kausalableitung 示范性Agnistic 区分法私人原因推断 2505.19589v2 -
1058 05-27 UOD: Unseen Object Detection in 3D Point Cloud UOD: Unsichtbare Objekterkennung in 3D-Punkt-Cloud UOD: 3D点云中未见物体探测 2401.03846v2 -
1059 05-27 Aggregation Buffer: Revisiting DropEdge with a New Parameter Block Aggregation Buffer: DropEdge mit einem neuen Parameterblock erneut aufrufen 聚合缓冲:用新参数块重新检查下坡面 2505.20840v1 -
1060 05-27 Tuning LLM Judge Design Decisions for 1/1000 of the Cost Tuning von LLM-Judge-Designentscheidungen für 1/1000 der Kosten 以千分之一的成本调优LLM评判器的设计决策 2501.17178v4 -
1061 05-27 HAD: Hybrid Architecture Distillation Outperforms Teacher in Genomic Sequence Modeling HAD: Hybride Architektur Destillation übertrifft Lehrer in genomischer Sequenzmodellierung HAD:混合架构蒸馏在基因组序列建模中超越教师模型 2505.20836v1 -
1062 05-27 Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens Jenseits von Semantik: Die unvernünftige Wirksamkeit von begründungslosen Zwischentokens 超越语义:无推理中间token的不合理有效性 2505.13775v2 -
1063 05-27 Concentration Distribution Learning from Label Distributions Konzentrationsverteilung Lernen von Etikettenverteilungen 从标签分发中学习 2505.21576v1 -
1064 05-27 The Third Pillar of Causal Analysis? A Measurement Perspective on Causal Representations Die dritte Säule der Kausalanalyse? Eine Messperspektive auf Kausaldarstellungen Causal 分析的第三个支柱? Causal 代表比例的衡量观点 2505.17708v2 -
1065 05-27 HybridLinker: Topology-Guided Posterior Sampling for Enhanced Diversity and Validity in 3D Molecular Linker Generation HybridLinker: Topologie-geführte hintere Probenahme für verbesserte Diversität und Validität in der 3D-Molekularlinker-Generation HybridLinker: 3D 分子联系器生成中加强多样性和有效性的地形学-指导外表抽样 2502.17349v3 -
1066 05-27 Do We Need All the Synthetic Data? Towards Targeted Synthetic Image Augmentation via Diffusion Models Brauchen wir alle synthetischen Daten? Auf dem Weg zu einer gezielten Synthetischen Bildvergrößerung über Diffusionsmodelle 我们需要所有合成数据吗?通过扩散模型实现有针对性的合成图像增强 2505.21574v1 -
1067 05-27 Spectral-inspired Neural Operator for Data-efficient PDE Simulation in Physics-agnostic Regimes Spektral-inspirierter Neuraloperator für dateneffiziente PDE-Simulation in physik-agnostischen Regimes 物理 – – 不可知系统数据高效PDE模拟光导神经操作器 2505.21573v1 -
1068 05-27 Convergence of Clipped-SGD for Convex $(L_0,L_1)$-Smooth Optimization with Heavy-Tailed Noise Konvergenz von Clipped-SGD für konvexe $(L_0,L_1)$-glatte Optimierung mit Heavy-Tailed-Rauschen 重尾噪声下凸 $(L_0,L_1)$-光滑优化的 Clipped-SGD 收敛性 2505.20817v1 -
1069 05-27 Mixture of Low Rank Adaptation with Partial Parameter Sharing for Time Series Forecasting Mischung aus Low-Rank-Anpassung mit Teilparameter-Sharing für Zeitreihen-Prognose 低级别适应与时间序列预测部分参数共享混合 2505.17872v2 -
1070 05-27 Interpretable Credit Default Prediction with Ensemble Learning and SHAP Interpretierbare Credit Default Vorhersage mit Ensemble Learning und SHAP 组合学习和SHAP的可解释信用默认预测 2505.20815v1 -
1071 05-27 Geometry Aware Operator Transformer as an Efficient and Accurate Neural Surrogate for PDEs on Arbitrary Domains Geometry Aware Operator Transformer als effizientes und präzises Neural Surrogate für PDEs auf willkürlichen Domains 操作者变异器作为任意域中PDEs的高效和准确神经外壳 2505.18781v2 -
1072 05-27 Thickness-aware E(3)-Equivariant 3D Mesh Neural Networks Dicke bewusst E(3)-Equivariante 3D-Mesh-Neurale Netze E(3)-等离 3D 3D 气象神经网络 2505.21572v1 -
1073 05-27 Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs Schrittweise adaptive Integration von überwachtem Feinabstimmungs- und Verstärkungslernen für aufgabenspezifische LLMs 监督特定任务专责性微调和强化学习的渐进式适应性整合 2505.13026v2 -
1074 05-27 Simple yet Effective Graph Distillation via Clustering Einfache und dennoch effektive Graphendestillation über Clustering 通过集群进行简单而有效的图形蒸馏 2505.20807v1 -
1075 05-27 FCOS: A Two-Stage Recoverable Model Pruning Framework for Automatic Modulation Recognition FCOS: Ein zweistufiges, wiederherstellbares Modell-Beschneidungs-Framework für die automatische Modulationserkennung FCOS: 自动调整识别的双层可回收模型保护框架 2505.21571v1 -
1076 05-27 Quantum Machine Learning in Healthcare: Evaluating QNN and QSVM Models Quantum Machine Learning in Healthcare: Bewertung von QNN- und QSVM-Modellen 医疗保健中的量子机器学习:评估QNN和QSVM模型 2505.20804v1 -
1077 05-27 Sentiment Reasoning for Healthcare Sentiment Reasoning für die Gesundheitsversorgung 保健的情感理由 2407.21054v4 -
1078 05-27 Leaner Transformers: More Heads, Less Depth Leaner Transformer: Mehr Köpfe, weniger Tiefe 皮质变形器: 更多的头, 更少深度 2505.20802v1 -
1079 05-27 Multi-VQC: A Novel QML Approach for Enhancing Healthcare Classification Multi-VQC: Ein neuartiger QML-Ansatz zur Verbesserung der Gesundheitsklassifikation 多VQC:加强保健分类的新QML方法 2505.20797v1 -
1080 05-27 A Graph Perspective to Probe Structural Patterns of Knowledge in Large Language Models Eine Graphenperspektive zur Untersuchung struktureller Wissensmuster in großen Sprachmodellen 《大语言模式知识结构模式研究图示展望》 2505.19286v2 -
1081 05-27 Amortized Bayesian Workflow Amortisierter Bayesischer Workflow 摊还的贝耶斯人工作流量 2409.04332v2 -
1082 05-27 Where You Place the Norm Matters: From Prejudiced to Neutral Initializations Wo Sie die Norm platzieren, ist entscheidend: Von voreingenommenen zu neutralen Initialisierungen 规范层放在哪里很重要: 从偏见到中立初始化 2505.11312v3 -
1083 05-27 Enhancing Wearable Tap Water Audio Detection through Subclass Annotation in the HD-Epic Dataset Verbesserung der tragbaren Wasserhahn-Audioerkennung durch Unterklasse-Annotation im HD-Epic-Datensatz 通过在HD-Epic数据集中分级注解,加强穿戴式塔普水音频探测 2505.20788v1 -
1084 05-27 LIB-KD: Learning Inductive Bias, Not Just Parameters A New Perspective on Knowledge Distillations LIB-KD: Induktive Bias lernen, nicht nur Parameter Eine neue Perspektive auf Wissensdestillationen LIB-KD:学习感性偏见,而不仅仅是知识蒸馏的新视角参数 2310.00369v3 -
1085 05-27 Low-Rank Adapting Models for Sparse Autoencoders Low-Rank Anpassungsmodelle für Sparse Autoencoder 普通自动解析器低 Rank 适应模型 2501.19406v2 -
1086 05-27 STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation STITCH-OPE: Trajektorienstiche mit geführter Diffusion für Off-Policy-Bewertung STITCH-OPE: 非政策评价的引导传播的轨迹 2505.20781v1 -
1087 05-27 SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences SpecExtend: Ein Drop-in-Enhancement für spekulative Decoding von langen Sequenzen 外观:对长期序列的投机性代谢的减少增强 2505.20776v1 -
1088 05-27 T-REX: Mixture-of-Rank-One-Experts with Semantic-aware Intuition for Multi-task Large Language Model Finetuning T-REX: Mixture-of-Rank-One-Experts mit semantischer Intuition für Multi-Task Large Language Model Finetuning T-REX:多任务大语言模型微调中具有语义认知度的多任务大语言模型微调混合型兰克单方专家 2404.08985v2 -
1089 05-27 Non-invasive maturity assessment of iPSC-CMs based on optical maturity characteristics using interpretable AI Nicht-invasive Reifebewertung von iPSC-CMs auf der Grundlage optischer Reifemerkmale unter Verwendung interpretierbarer KI 使用可解释的AI根据光学成熟度特性对iPSC-CMs进行非侵入性成熟度评估 2505.20775v1 -
1090 05-27 TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state TimePro: Effiziente Multivariate Langzeit-Zeitreihen-Prognose mit variabler und zeitversetzter Hyperstate 具有可变和时间warware超状态预测的高效多变长期时间序列 2505.20774v1 -
1091 05-27 MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning MetaSlot: Durchbruch durch die feste Anzahl von Slots im Objekt-Zentrischen Lernen MetaSlot: 打破对象中心学习中的固定空格数 2505.20772v1 -
1092 05-27 ChemHAS: Hierarchical Agent Stacking for Enhancing Chemistry Tools ChemHAS: Hierarchische Agenzien-Stacking zur Verbesserung von Chemiewerkzeugen ChemHAS:加强化学工具的等级代理人 2505.21569v1 -
1093 05-27 Divide-Fuse-Conquer: Eliciting “Aha Moments” in Multi-Scenario Games Divide-Fuse-Conquer: Eliciting “Aha Momente” in Multi-Szenario-Spiele 分裂-裂变:在多种场景运动会中激发“哈动力” 2505.16401v2 -
1094 05-27 Robust and Explainable Detector of Time Series Anomaly via Augmenting Multiclass Pseudo-Anomalies Robuster und erklärbarer Detektor der Zeitreihenanomalie durch Augmenting-Multiclass-Pseudoanomalien 通过增强多类伪异常实现稳健且可解释的时间序列异常检测器 2505.20765v1 -
1095 05-27 ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval ConText-CIR: Von Konzepten lernen im Text für das komponierte Bild-Retrieval ConText-CIR:从合成图像检索文本中的概念学习 2505.20764v1 -
1096 05-27 Learning to Explain Air Traffic Situation Erklären der Lage im Luftverkehr 学习解释空中交通状况 2502.10764v2 -
1097 05-27 Practical estimation of the optimal classification error with soft labels and calibration Praktische Schätzung des optimalen Klassifizierungsfehlers mit Softlabels und Kalibrierung 用软标签和校准校准对最佳分类错误的实际估计 2505.20761v1 -
1098 05-27 Multi-Stage Speaker Diarization for Noisy Classrooms Mehrstufige Speaker-Diarisierung für Lärmklassenräume 嘈杂教室的多阶段说话人日志 2505.10879v2 -
1099 05-27 Pairwise Optimal Transports for Training All-to-All Flow-Based Condition Transfer Model Paarweise Optimale Transporte für Training All-to-All Flow-Based Condition Transfer Modell 以对等方式最佳运输培训全到所有流动条件转让模式 2504.03188v2 -
1100 05-27 Scalable Model Merging with Progressive Layer-wise Distillation Skalierbares Modell Zusammenführen mit progressiver schichtweiser Destillation 可缩放模型与递进图层蒸馏法合并 2502.12706v2 -
1101 05-27 Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction Uni-Instruct: Einstufiges Diffusionsmodell durch Unified Diffusion Divergence Instruction Uni- Instruct: 通过统一扩散分散指令单步扩散模型 2505.20755v1 -
1102 05-27 Stationary MMD Points for Cubature Stationäre MMD-Punkte für Kubatur Cubature 固定的 MMD点 2505.20754v1 -
1103 05-27 EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models EaqVLA: Kodierungsorientierte Quantisierung für Vision-Language-Action-Modelle EaqVLA: 愿景-语言-行动模式的编码和一致的量化 2505.21567v1 -
1104 05-27 Map Space Belief Prediction for Manipulation-Enhanced Mapping Karte Raum Glaube Vorhersage für manipulations-verbesserte Mapping 人工-增强绘图的地图空间信仰预测 2502.20606v2 -
1105 05-27 MOLLM: Multi-Objective Large Language Model for Molecular Design – Optimizing with Experts MOLLM: Multi-Objective Large Language Model for Molecular Design – Optimierung mit Experten MOLLM: 分子设计多目标大语言模型 – – 与专家优化 2502.12845v2 -
1106 05-27 ‘Hello, World!’: Making GNNs Talk with LLMs “Hallo, Welt!”: GNNs mit LLMs sprechen zu lassen “你好,世界!” “让GNNs和LLMs说话” 2505.20742v1 -
1107 05-27 Can Small Language Models Learn, Unlearn, and Retain Noise Patterns? Können kleine Sprachmodelle Geräuschmuster lernen, nicht lernen und erhalten? 小语言模型能够学习、不学习和保留噪音模式吗? 2407.00996v3 -
1108 05-27 Detecting Informative Channels: ActionFormer Informative Kanäle erkennen: ActionFormer 检测信息渠道:ActionFormer 2505.20739v1 -
1109 05-27 Adversarial bandit optimization for approximately linear functions Adversariale Bandit-Optimierung für etwa lineare Funktionen 近似线性函数的对抗性赌博机优化 2505.20734v1 -
1110 05-27 SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution SPA-RL: Verstärkung der LLM-Agenten durch schrittweise Fortschrittszuweisung SPA-RL:通过逐步推进加强LLM代理 2505.20732v1 -
1111 05-27 Semi-supervised Clustering Through Representation Learning of Large-scale EHR Data Halbüberwachtes Clustering durch Repräsentationslernen von EHR-Großdaten 通过代表学习大规模电子人力资源数据,进行半监督的集群组合 2505.20731v1 -
1112 05-27 What LLMs Miss in Recommendations: Bridging the Gap with Retrieval-Augmented Collaborative Signals Was LLMs in Empfehlungen vermissen: Die Lücke mit retrieval-Augmented Collaborative Signals überbrücken LLM在推荐中遗漏了什么:用检索增强的协同信号弥合差距 2505.20730v1 -
1113 05-27 Energy-based generator matching: A neural sampler for general state space Energiebasierte Generator-Matching: Ein neuronaler Sampler für den allgemeinen Zustandsraum 基于能源的发电机匹配:一般状态空间的神经取样器 2505.19646v2 -
1114 05-27 A reinforcement learning agent for maintenance of deteriorating systems with increasingly imperfect repairs Ein Verstärkungs-Lernmittel für die Instandhaltung von verschlechternden Systemen mit zunehmend unvollkommenen Reparaturen 强化学习代理,用于维护修理越来越不完善的恶化系统 2505.20725v1 -
1115 05-27 LeDiFlow: Learned Distribution-guided Flow Matching to Accelerate Image Generation LeDiFlow: Erlernter, verteilungsgeführter Fluss passend zur beschleunigten Bildgenerierung LeDiFlow:为加速图像生成而实现的派发指导流动匹配 2505.20723v1 -
1116 05-27 Diffusion Model-based Activity Completion for AI Motion Capture from Videos Diffusion Modellbasierte Aktivitätsvervollständigung für AI Motion Capture aus Videos AI 从视频中抓取 AI 运动的传播示范活动完成 2505.21566v1 -
1117 05-27 Recurrent Neural Operators: Stable Long-Term PDE Prediction Recurrent Neural Operators: Stabile Langzeit-PDE-Vorhersage 经常性神经操作员:稳定的长期PDE预测 2505.20721v1 -
1118 05-27 ProgCo: Program Helps Self-Correction of Large Language Models ProgCo: Programm hilft bei der Selbstkorrektur großer Sprachmodelle ProgCo:帮助大语言模式自我校正方案 2501.01264v2 -
1119 05-27 LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multimodal Large Language Models LatentExplainer: Erklären von latenten Darstellungen in tiefgenerativen Modellen mit multimodalen großen Sprachmodellen LatentExplainer:利用多模态大语言模型解释深度生成模型中的潜在表示 2406.14862v6 -
1120 05-27 PCDCNet: A Surrogate Model for Air Quality Forecasting with Physical-Chemical Dynamics and Constraints PCDCNet: Ein Surrogate-Modell für die Luftqualitätsprognose mit physikalisch-chemischer Dynamik und Einschränkungen PCDCNet:利用物理化学动态和制约因素进行空气质量预测的替代模型 2505.19842v2 -
1121 05-27 What is Fair? Defining Fairness in Machine Learning for Health Was ist fair? Fairness im maschinellen Lernen für die Gesundheit definieren 什么是公平?界定机器保健学习的公平性 2406.09307v5 -
1122 05-27 Are Data Embeddings effective in time series forecasting? Sind Daten-Embeddings in der Zeitreihenvorhersage wirksam? 数据嵌入在时间序列预测中是否有效? 2505.20716v1 -
1123 05-27 Wideband RF Radiance Field Modeling Using Frequency-embedded 3D Gaussian Splatting Wideband RF Radiance Field Modellierung mit Frequenz eingebettet 3D Gaussian Splatting 使用频率组合的 3D 高斯平面 2505.20714v1 -
1124 05-27 Does Graph Prompt Work? A Data Operation Perspective with Theoretical Analysis Funktioniert Graph Prompt? Eine Datenbetriebsperspektive mit theoretischer Analyse 《图表迅速工作吗? 带有理论分析的数据操作视角》 2410.01635v2 -
1125 05-27 Time-Series Learning for Proactive Fault Prediction in Distributed Systems with Deep Neural Structures Time-Series Learning für proaktive Fehlervorhersage in verteilten Systemen mit tiefen neuralen Strukturen 深心神经结构分布系统预发性故障预测时间序列学习 2505.20705v1 -
1126 05-27 NeUQI: Near-Optimal Uniform Quantization Parameter Initialization NeUQI: Beinahe-optimale einheitliche Quantisierung Parameter Initialisierung NeUQI: 近最佳统一量化参数初始化 2505.17595v2 -
1127 05-27 Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases Zwischen Circuits und Chomsky: Pre-Pretraining auf Formal Languages Imparts Linguistic Biases 巡回巡回和乔姆斯基之间:正式语言语言语言预科培训 2502.19249v2 -
1128 05-27 vCache: Verified Semantic Prompt Caching vCache: Verifizierter semantischer Prompt-Caching vCache: 校验语义快速缓冲 2502.03771v3 -
1129 05-27 Multi-instance Learning as Downstream Task of Self-Supervised Learning-based Pre-trained Model Multi-Instance-Lernen als Downstream-Aufgabe des selbstüberwachten Learning-basierten vortrainierten Modells 将多机构学习作为自监督学习模式培训前模式的下游任务 2505.21564v1 -
1130 05-27 Sparsified State-Space Models are Efficient Highway Networks Sparsifizierte State-Space-Modelle sind effiziente Highway-Netzwerke 国家空间模型是高效公路网 2505.20698v1 -
1131 05-27 Token-level Accept or Reject: A Micro Alignment Approach for Large Language Models Token-Level Akzeptieren oder ablehnen: Ein Micro Alignment-Ansatz für große Sprachmodelle 接受或拒绝时肯级别:大语言模式微调整方法 2505.19743v2 -
1132 05-27 Generating Hypotheses of Dynamic Causal Graphs in Neuroscience: Leveraging Generative Factor Models of Observed Time Series Generieren von Hypothesen dynamischer Kausalgraphen in der Neurowissenschaft: Nutzung generativer Faktorenmodelle beobachteter Zeitreihen 在神经科学中生成动态因果图的假设:利用观测时间序列的生成因数模型 2505.20697v1 -
1133 05-27 Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration Navigieren Sie das Unbekannte: Verbesserung der LLM-Vernunft mit intrinsischer Motivation geführte Exploration 导航未知:利用内在动力性引导探索加强LLM 2505.17621v2 -
1134 05-27 Temporal Saliency-Guided Distillation: A Scalable Framework for Distilling Video Datasets Temporale Saliency-geführte Destillation: Ein skalierbares Framework für die Destillierung von Videodatensätzen 时间性盐度-指导蒸馏:用于蒸馏视频数据集的可缩放框架 2505.20694v1 -
1135 05-27 Phir Hera Fairy: An English Fairytaler is a Strong Faker of Fluent Speech in Low-Resource Indian Languages Phir Hera Fairy: Ein englisches Märchen ist ein starker Faker der fließenden Rede in Low-Resource indischen Sprachen Phir Hera Fairy:英国仙女是印度低资源语言流利流利的有力名人 2505.20693v1 -
1136 05-27 Evidential Deep Active Learning for Semi-Supervised Classification Evidentielles tiefes aktives Lernen für semi-überwachte Klassifikation 半监督分类的证明深层积极学习 2505.20691v1 -
1137 05-27 Accelerating RL for LLM Reasoning with Optimal Advantage Regression Beschleunigung der RL für LLM-Vernunft mit Optimal-Advantage-Regression 以最优优势回归加速LLM推理的强化学习 2505.20686v1 -
1138 05-27 A Survey of LLM $\times$ DATA Ein Überblick über LLM $\times$ DATA LLM $\times$ DATA 综述 2505.18458v2 -
1139 05-27 MODULI: Unlocking Preference Generalization via Diffusion Models for Offline Multi-Objective Reinforcement Learning MODULI: Unlocking Preference Generalization via Diffusion Models for Offline Multi-Objective Reinforcement Learning MODULI:通过离线多目标强化学习扩散模型解锁偏好泛化 2408.15501v2 -
1140 05-27 SELF-PERCEPT: Introspection Improves Large Language Models’ Detection of Multi-Person Mental Manipulation in Conversations SELF-PERCEPT: Introspection verbessert die Erkennung von Multi-Person-Gedankenmanipulation in Gesprächen durch große Sprachmodelle SELF-PERCEPT: 调查改进大语言模型在对话中探测多人心理操纵 2505.20679v1 -
1141 05-27 Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System Viele Köpfe sind besser als eins: Verbesserte wissenschaftliche Idee-Generation durch ein LLM-basiertes Multi-Agent-System 许多领导人比一个领导人好得多:由以LLM为基础的多种机构系统改进科学思想的一代 2410.09403v4 -
1142 05-27 LLM-Guided Reinforcement Learning: Addressing Training Bottlenecks through Policy Modulation LLM-geführtes Stärkungslernen: Bewältigung von Ausbildungsengpässen durch politische Modulation LLM-LLM-指导强化学习:通过政策调整解决培训瓶颈问题 2505.20671v1 -
1143 05-27 From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation Vom Sehen zum Tun: Überbrücken von Vernunft und Entscheidung für die Robotermanipulation 从看到做:机器人操纵的搭桥理由和决定 2505.08548v2 -
1144 05-27 RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts RE-Bench: Bewertung der KI-FuE-Fähigkeiten von Sprachmodellagenten gegen menschliche Experten RE-BENCH: 对照人类专家评估语言模范代理商的AI研究与开发的前沿能力 2411.15114v2 -
1145 05-27 Predicting and Understanding College Student Mental Health with Interpretable Machine Learning Vorhersagen und Verständnis College Student Mental Health mit Interpretable Machine Learning 预测和理解学院学生心理健康与可解释机器学习 2503.08002v2 -
1146 05-27 Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers Continuous-Time-Achtung: PDE-geführte Mechanismen für lange Sequenztransformatoren 持续关注:长序列变换者PDE-指导机制 2505.20666v1 -
1147 05-27 Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond Auf dem Weg zu LLM Unlearning Resilient to Relearning Attacks: Eine Sharpness-Aware-Minimierungsperspektive und darüber hinaus 迈向抵御再学习攻击的LLM遗忘:锐度感知最小化视角及其延伸 2502.05374v4 -
1148 05-27 BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models BLAST: Ausgewogene Zeitreihen für universelle Vorhersagemodelle BLAST: 通用预测模型平衡抽样时间序列 2505.17871v2 -
1149 05-27 Generalized and Personalized Federated Learning with Foundation Models via Orthogonal Transformations Generalisiertes und personalisiertes Federated Learning mit Gründungsmodellen über Orthogonale Transformationen 通过矫形转变形成基础模型的通用和个性化联邦学习 2505.19888v2 -
1150 05-27 ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning ReMA: Meta-Denken lernen für LLMs mit Multi-Agenten-Verstärkungs-Lernen ReMA:学习多机构强化学习的LLMLM的元思维 2503.09501v3 -
1151 05-27 How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines Wie können neurale Netzwerke mit Skalierungsgesetzen ausgebaut werden? Eine Umfrage und praktische Leitlinien 如何提升具有扩展法的神经网络? 2502.12051v3 -
1152 05-27 Enhancing Time Series Forecasting via a Parallel Hybridization of ARIMA and Polynomial Classifiers Verbesserung der Zeitreihenprognose über eine parallele Hybridisierung von ARIMA und Polynom-Klassifikatoren 通过ARIMA和多边分类的平行混合预测增强时间序列 2505.06874v2 -
1153 05-27 An Optimisation Framework for Unsupervised Environment Design Ein Rahmen für die Optimierung des unbeaufsichtigten Umweltdesigns 无人监督环境设计优化框架 2505.20659v1 -
1154 05-27 When More is Less: Understanding Chain-of-Thought Length in LLMs Wenn mehr weniger ist: Verständnis der Chain-of-Thought-Länge in LLMs 多即是少: 理解LLM中的思维链长度 2502.07266v3 -
1155 05-27 Prompting Decision Transformers for Zero-Shot Reach-Avoid Policies Prompting Decision Transformers für Zero-Shot-Reach-Avoid-Policies 面向零样本到达-规避策略的提示决策Transformer 2505.19337v2 -
1156 05-27 New Paradigm of Adversarial Training: Releasing Accuracy-Robustness Trade-Off via Dummy Class Neues Paradigma des Adversarial Training: Freigabe von Genauigkeit-Robustheit-Trade-Off über Dummy-Klasse 对抗训练新范式:通过Dummy类释放准确性与鲁棒性的权衡 2410.12671v2 -
1157 05-27 FRABench and GenEval: Scaling Fine-Grained Aspect Evaluation across Tasks, Modalities FRABench und GenEval: Skalierung feinkörniger Aspekte Bewertung über Aufgaben, Modalitäten hinweg FRA Bench和GenEval:扩大对各任务、方式、方式和方式的精细评价 2505.12795v2 -
1158 05-27 Voronoi-grid-based Pareto Front Learning and Its Application to Collaborative Federated Learning Voronoi-Grid-basiertes Pareto-Front-Lernen und seine Anwendung auf kollaboratives Federated Learning 以Voronoi-Grid为基础的Pareto阵线学习及其在联邦学习合作组织中的应用 2505.20648v1 -
1159 05-27 Moment Expansions of the Energy Distance Momenterweiterungen der Energieentfernung 扩大能源距离时间 2505.20647v1 -
1160 05-27 Evaluating Training in Binarized Neural Networks Through the Lens of Algorithmic Information Theory Bewertung der Ausbildung in Binarized Neural Networks durch die Linse der algorithmischen Informationstheorie 通过分析信息理论的透镜评估神经网络的觉测培训 2505.20646v1 -
1161 05-27 Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain Aufgabenoptimierte konvolutionäre recurrente Netzwerke richten sich an taktile Verarbeitung im Nagetierhirn 与鼠脑中触摸处理相适应的 任务优化的革命经常网络 2505.18361v2 -
1162 05-27 Can Past Experience Accelerate LLM Reasoning? Kann vergangene Erfahrung LLM Reasoning beschleunigen? 以往经验能否加快LLM理由解释? 2505.20643v1 -
1163 05-27 PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation PosterO: Strukturierung von Layout-Strukturen zur Aktivierung von Sprachmodellen in der Generierung von generalisierten Content-Aware-Layouts PosterO: 构建布局树以在通用内容软件布局生成中启用语言模型 2505.07843v2 -
1164 05-27 Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation Rethinking MUSHRA: Bewältigung moderner Herausforderungen in der Text-zu-Speech-Bewertung 重新思考MUSHRA:应对文本到语音评价中的现代挑战 2411.12719v3 -
1165 05-27 Pointing the Way: Refining Radar-Lidar Localization Using Learned ICP Weights Den Weg weisen: Verfeinerung der Radar-Lidar-Lokalisierung mit erfahrenen ICP-Gewichten 指向方向:利用比较方案所积累的重量改进雷达-里达尔的本地化 2309.08731v4 -
1166 05-27 GMoE: Empowering LLMs Fine-Tuning via MoE Graph Collaboration GMoE: Stärkung von LLMs Feinsteuerung über MoE Graph Collaboration GMoE:通过MoE图协作增强LLM微调 2412.16216v3 -
1167 05-27 Non-identifiability distinguishes Neural Networks among Parametric Models Nicht-Identifizierbarkeit unterscheidet neurale Netzwerke zwischen parametrischen Modellen 不可识别性将神经网络区分为参数模型 2504.18017v2 -
1168 05-27 Scintillation pulse characterization with spectrum-inspired temporal neural networks: case studies on particle detector signals Scintillation-Pulscharakterisierung mit spektruminspirierten zeitlichen neuronalen Netzwerken: Fallstudien zu Partikeldetektor-Signalen 与受频谱启发的时时神经网络的闪烁脉冲定性:粒子探测器信号案例研究 2410.07267v3 -
1169 05-27 Policy Design for Two-sided Platforms with Participation Dynamics Politikgestaltung für zweiseitige Plattformen mit Partizipationsdynamik 具有参与动态的双面平台政策设计 2502.01792v2 -
1170 05-27 Explaining Concept Shift with Interpretable Feature Attribution Erklären von Konzeptverschiebungen mit interpretierbarer Eigenschaftszuweisung 解释解释概念转变与可解释性地物归属 2505.20634v1 -
1171 05-27 Adaptive Backtracking Line Search Adaptive Rückverfolgungszeilensuche 适应性后回跟踪线搜索 2408.13150v2 -
1172 05-27 Test-Time Learning for Large Language Models Test-Time Learning für große Sprachmodelle 大语言模型试验时间学习 2505.20633v1 -
1173 05-27 Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training Einschließlich flexibler Bildkonditionierung in Text-zu-Video-Diffusionsmodelle ohne Training 将灵活的图像条件纳入无培训的文本到视频传播模型 2505.20629v1 -
1174 05-27 Position: Adopt Constraints Over Penalties in Deep Learning Position: Überstrapazierte Strafen im Deep Learning adoptieren 职位:在深深学习中采用约束措施以凌驾刑罚 2505.20628v1 -
1175 05-27 JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes JaxRobotarium: Schulung und Einsatz von Multi-Roboter-Politik in 10 Minuten JaxRobotarium:10分钟内培训和部署多机器人政策 2505.06771v2 -
1176 05-27 Knowledge Distillation Approach for SOS Fusion Staging: Towards Fully Automated Skeletal Maturity Assessment Wissensdestillationsansatz für SOS-Fusionsstaging: Auf dem Weg zu einer vollautomatischen Skeletalreifebewertung 利用知识蒸馏方法解决求求求融合问题:全面自动化骨骼成熟期评估 2505.21561v1 -
1177 05-27 SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation SeqPO-SiMT: Sequentielle Politikoptimierung für die gleichzeitige maschinelle Übersetzung SeqPO-SIMT:同步机器翻译的序列政策优化 2505.20622v1 -
1178 05-27 Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning Mehrstufige Zertifizierte Verteidigung gegen vergiftende Angriffe im Offline-Verstärkungslernen 多级认证防卫,防止在离线强化学习中进行毒物攻击 2505.20621v1 -
1179 05-27 An Inexact Halpern Iteration with Application to Distributionally Robust Optimization Eine ungenaue Halpern-Iteration mit Anwendung zur distributiv robusten Optimierung 用于分布强力优化优化的不精确 Halpern 迭代 2402.06033v3 -
1180 05-27 SoftPQ: Robust Instance Segmentation Evaluation via Soft Matching and Tunable Thresholds SoftPQ: Robuste Instance Segmentierungsbewertung über Soft Matching und Tunable Thresholds SoftPQ:通过软匹配和可调阈值进行稳健的实例分割评价 2505.12155v2 -
1181 05-27 Real-Time Stress Monitoring, Detection, and Management in College Students: A Wearable Technology and Machine-Learning Approach Echtzeit-Stress-Monitoring, Detection und Management in College-Studenten: Ein Wearable-Technologie- und Machine-Learning-Ansatz 大学生实时应力监测、检测和管理:穿戴技术和机械学习方法 2505.15974v2 -
1182 05-27 LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers LLM-FE: Automatisiertes Feature Engineering für Tabellendaten mit LLMs als Evolutionsoptimierer LLM-FE: 制表数据的自动地貌工程,LLMM作为进化优化器 2503.14434v2 -
1183 05-27 PhySense: Sensor Placement Optimization for Accurate Physics Sensing PhySense: Sensor-Platzierungs-Optimierung für präzise Physik Sensing 感应:精确物理遥感传感器定位优化 2505.18190v2 -
1184 05-27 Intelligent Incident Hypertension Prediction in Obstructive Sleep Apnea Intelligente Hypertonie-Vorhersage bei obstruktiver Schlafapnoe 阻碍睡眠的智能性事件超强度预测 2505.20615v1 -
1185 05-27 A Concentration Bound for TD(0) with Function Approximation Ein Konzentrationsbund für TD(0) mit Funktionsannäherung 具有函数接近度的 TD(0) 的浓度界值 2312.10424v3 -
1186 05-27 REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning 实际检索: 数学理由的回收增量精液预言 2505.20613v1 -
1187 05-27 Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models Roboflow100-VL: Ein Multi-Domain-Objekterkennungs-Benchmark für Vision-Language-Modelle 机器人流100-VL:愿景-语言模型多功能物体探测基准 2505.20612v1 -
1188 05-27 Hierarchical Mamba Meets Hyperbolic Geometry: A New Paradigm for Structured Language Embeddings Hierarchische Mamba trifft auf Hyperbolische Geometrie: Ein neues Paradigma für strukturierte Spracheinbettungen 等级式 Mamba 相遇超双曲几何: 结构化语言嵌入的新范式 2505.18973v2 -
1189 05-27 Integral Imprecise Probability Metrics Integral Ungenaue Wahrscheinlichkeits-Metriken 积分非精确概率度量 2505.16156v2 -
1190 05-27 Improving Generative Inverse Design of Rectangular Patch Antennas with Test Time Optimization Verbesserung des generativen Inversen Designs von rechteckigen Patchantennen mit Testzeitoptimierung 改进带测试时间优化的矩形补边天线的生成反向设计 2505.18188v2 -
1191 05-27 InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling InstGenIE: Generative Bildbearbeitung mit Mask-aware Caching und Scheduling effizient gemacht InstGenIE: 生成图像编辑, 通过掩码感知的缓存和调度实现高效 2505.20600v1 -
1192 05-27 Randomly Sampled Language Reasoning Problems Explain Limits of LLMs Zufällig gemusterte Sprachbegründungsprobleme erklären Grenzen von LLMs 随机抽样 语言原因问题解释LLMM限制 2501.02825v5 -
1193 05-26 (1) GenMol: A Drug Discovery Generalist with Discrete Diffusion GenMol: Ein Drug Discovery Generalist mit diskreter Diffusion GenMol: 具有分辨扩散作用的药物发现通俗主义者 2501.06158v2 -
1194 05-26 Prot2Token: A Unified Framework for Protein Modeling via Next-Token Prediction Prot2Token: Ein einheitliches Framework für Proteinmodellierung über Next-Token-Vorhersage Prot2Token:通过次声预测建立蛋白模型的统一框架 2505.20589v1 -
1195 05-26 Bidirectional Variational Autoencoders Bidirektionale Variationale Autoencoder 双向多向自动自动编码器 2505.16074v2 -
1196 05-26 Balancing Performance and Costs in Best Arm Identification Ausgewogene Leistung und Kosten bei der Best-Arm-Identifikation 平衡最优臂识别的性能和费用 2505.20583v1 -
1197 05-26 Training a Generally Curious Agent Einen allgemein neugierigen Agenten ausbilden 训练通用的好奇智能体 2502.17543v3 -
1198 05-26 Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL Strg-DNA: Kontrollierbare Zell-Typ-spezifische Regulatorische DNA-Design über eingeschränkte RL Ctrl-DNA:通过受控RL设计可控细胞-Type-具体监管DNA 2505.20578v1 -
1199 05-26 Emotion Classification In-Context in Spanish Emotion Classification In-Context auf Spanisch 西班牙文《情感分类西班牙文内引文》 2505.20571v1 -
1200 05-26 Bi-Level Unsupervised Feature Selection Bi-Level-Unüberwachte Feature-Auswahl 双级不受监督的地物选择 2505.20563v1 -
1201 05-26 Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning Jenseits von Markovian: Reflektierende Exploration über Bayes-Adaptive RL für LLM-Reasoning 马尔科维安之后:通过Bayes-Adapative RL进行反射勘探,用于LLM 理由分析 2505.20561v1 -
1202 05-26 Advancing Molecular Machine Learning Representations with Stereoelectronics-Infused Molecular Graphs Advancing Molecular Machine Learning Representations mit stereoelectronics-infused Molecular Graphs 具有立体电子成份式分子图的分子机学习演示 2408.04520v2 -
1203 05-26 Causal Composition Diffusion Model for Closed-loop Traffic Generation Causal Composition Diffusion Modell für die Closed-Loop-Verkehrserzeugung 闭闭环交通流量生成原因构成传播模式 2412.17920v3 -
1204 05-26 Task-Informed Anti-Curriculum by Masking Improves Downstream Performance on Text Task-informierte Anti-Kurriculum durch Masken verbessert Downstream-Performance auf Text 通过遮罩改进文字下流业绩,以任务化的反文体 2502.12953v2 -
1205 05-26 Learning a Pessimistic Reward Model in RLHF Ein pessimistisches Belohnungsmodell in RLHF lernen 在RLHF学习悲观奖励模式 2505.20556v1 -
1206 05-26 A ZeNN architecture to avoid the Gaussian trap Eine ZeNN-Architektur, um die Gaussische Falle zu vermeiden 避免高斯陷阱的 ZeNN 建筑 2505.20553v1 -
1207 05-26 Estimating Motor Symptom Presence and Severity in Parkinson’s Disease from Wrist Accelerometer Time Series using ROCKET and InceptionTime Abschätzung von Motorsymptome und Schweregrad bei Parkinson-Krankheit aus der Wrist Accelerometer Time Serie mit ROCKET und InceptionTime 利用 ROCKET 和 受孕时间从风速计时间序列中估计帕金森氏病的机动症状存在和严重性 2304.11265v3 -
1208 05-26 TAPIP3D: Tracking Any Point in Persistent 3D Geometry TAPIP3D: Verfolgung eines beliebigen Punktes in persistenter 3D-Geometrie TAPIP3D:跟踪持久性三维几何中的任何点 2504.14717v2 -
1209 05-26 Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling Erreichen von Anpassungsfähigkeit und Optimalität für mehrarmige Banditen mit Exponential-Kullback Leibler Maillard Sampling 利用Exponential-Kullback Leibler Maillard抽样,实现多臂赌博机的适应性和最优性 2502.14379v2 -
1210 05-26 Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes Quantum Speedups bei der Bedauernsanalyse von Unendlichen Horizon durchschnittlichen Markov-Entscheidungsprozessen 对无限地平地平平线平均回报Markov决定程序进行遗憾分析时的量量加速 2310.11684v4 -
1211 05-26 RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs RL nur im Namen? Analyse der strukturellen Annahmen im RL-Post-Training für LLMs 仅限名称的RL?分析在RL为LLMs提供的培训后培训中的结构假设 2505.13697v2 -
1212 05-26 Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models Kovariate-adjusted Deep Causal Learning für heterogene Panel-Datenmodelle 异质小组数据模型的共变调整深因学习 2505.20536v1 -
1213 05-26 Rotary Masked Autoencoders are Versatile Learners Rotary Masked Autoencoder sind vielseitige Lerner 扶轮式遮罩自动算术员是多功能学习者 2505.20535v1 -
1214 05-26 HiPoNet: A Multi-View Simplicial Complex Network for High Dimensional Point-Cloud and Single-Cell Data HiPoNet: Ein Multi-View-Komplexnetzwerk für hochdimensionale Point-Cloud- und Single-Cell-Daten HipoNet:高多面点和单细胞数据多视图简易复杂的网络 2502.07746v2 -
1215 05-26 One-shot Robust Federated Learning of Independent Component Analysis One-shot Robust Federated Learning of Independent Component Analysis 强力学习独立构成部分分析 2505.20532v1 -
1216 05-26 Prediction-Enhanced Monte Carlo: A Machine Learning View on Control Variate Vorhersage-erweitert Monte Carlo: Eine Machine-Learning-Ansicht auf Steuerungsvariate 预测增强的蒙特卡洛:关于控制Variatte的机械学习观点 2412.11257v2 -
1217 05-26 Fast Calculation of Feature Contributions in Boosting Trees Schnelle Berechnung von Feature-Beiträgen bei der Förderung von Bäumen 快速计算推动树的特性贡献 2407.03515v2 -
1218 05-26 Training Articulatory Inversion Models for Inter-Speaker Consistency Training Artikulatorische Inversionsmodelle für die Konsistenz zwischen den Lautsprechern 供发言者间和谐使用的培训用人工转换模型 2505.20529v1 -
1219 05-26 DYMAG: Rethinking Message Passing Using Dynamical-systems-based Waveforms DYMAG: Nachricht neu denken Passieren mit Dynamisch-Systeme-basierten Wellenformen DYMAG: 利用动态系统波形重新思考信息传递方式 2309.09924v5 -
1220 05-26 Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks Lernpolitische Ausschüsse für effektive Personalisierung in MDPs mit unterschiedlichen Aufgaben 在有不同任务的多边发展方案中促进有效个性化的学习政策委员会 2503.01885v2 -
1221 05-26 Towards Fully FP8 GEMM LLM Training at Scale Auf dem Weg zum vollständigen FP8 GEMM LLM Training auf Scale 迈向大规模全 FP8 GEMM LLM 训练 2505.20524v1 -
1222 05-26 Scaling over Scaling: Exploring Test-Time Scaling Pareto in Large Reasoning Models Skalierung über Skalierung: Untersuchung von Test-Zeit-Skalierung Pareto in großen vernünftigen Modellen 缩放过缩放: 探索大型理由模型中的测试时间缩放派 2505.20522v1 -
1223 05-26 Semi-Explicit Neural DAEs: Learning Long-Horizon Dynamical Systems with Algebraic Constraints Halbexplizite neurale DAEs: Lernen von langhorizontigen dynamischen Systemen mit algebraischen Einschränkungen 半显性神经DAEs:学习具有代数限制的长毛利区动态系统 2505.20515v1 -
1224 05-26 On a Neural Implementation of Brenier’s Polar Factorization Über eine neurale Umsetzung von Breniers Polarfaktorisierung 布赖尼尔极地化的神经实施 2403.03071v4 -
1225 05-26 A Novel Convolutional Neural Network-Based Framework for Complex Multiclass Brassica Seed Classification Ein neuartiges konvolutionäres neurales Netzwerk-basiertes Framework für die komplexe Klassifizierung von mehrstufigen Brassica-Samen 复杂多级巴西种子种子分类新革命神经网络框架 2505.21558v1 -
1226 05-26 Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures Beispiel und Karte aus einem einzigen Convex-Potential: Erzeugung mit konjugierenden Momenten 单一汇合潜能的样本和地图:使用协同时间措施生成 2503.10576v2 -
1227 05-26 Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review Verkörperte KI mit Basismodellen für mobile Serviceroboter: Eine systematische Übersicht 面向移动服务机器人的基础模型具身AI:系统综述 2505.20503v1 -
1228 05-26 Retrieve to Explain: Evidence-driven Predictions for Explainable Drug Target Identification Erklären Sie: Evidenz-getriebene Vorhersagen für erklärbare Drogenziel-Identifikation 寻求解释:对可解释药物目标识别的由证据驱动的预测 2402.04068v4 -
1229 05-26 CLEVRER-Humans: Describing Physical and Causal Events the Human Way CLEVRER-Mensch: Physikalische und kausale Ereignisse auf menschliche Weise beschreiben CLEVRER-人类:将自然和因果事件描述为人类道路 2310.03635v2 -
1230 05-26 Distributionally Robust Optimization Verteilungsstarke Optimierung 分布强力优化 2411.02549v3 -
1231 05-26 Avoid Forgetting by Preserving Global Knowledge Gradients in Federated Learning with Non-IID Data Vermeiden Sie das Vergessen, indem Sie globale Wissensgradienten im Föderierten Lernen mit Nicht-IID-Daten bewahren 在非IID数据的联邦学习中通过保留全局知识梯度避免遗忘 2505.20485v1 -
1232 05-26 Towards Efficient Training of Graph Neural Networks: A Multiscale Approach Auf dem Weg zu einer effizienten Ausbildung von Graphen-Neuralen Netzwerken: Ein multiskaliger Ansatz 争取对图形神经网络进行有效培训:一种多部门办法 2503.19666v3 -
1233 05-26 CardioPatternFormer: Pattern-Guided Attention for Interpretable ECG Classification with Transformer Architecture CardioPatternFormer: Mustergeführte Aufmerksamkeit für die Interpretierbare EKG-Klassifikation mit Transformer-Architektur 卡尔迪·皮德·皮德罗·弗德:对具有变形结构的可解释的ECG分类的典型引导关注 2505.20481v1 -
1234 05-26 Leveraging Sparsity for Sample-Efficient Preference Learning: A Theoretical Perspective Sparsamkeit für stichprobeneffizientes Preference-Lernen: Eine theoretische Perspektive 利用差距促进抽样有效优先学习:理论视角 2501.18282v3 -
1235 05-26 From learnable objects to learnable random objects Von lernbaren Objekten zu lernbaren zufälligen Objekten 从可学习对象到可学习随机对象 2504.00847v2 -
1236 05-26 Stochastic Preconditioning for Neural Field Optimization Stochastische Vorkonditionierung für die Neuralfeldoptimierung 神经场优化的斯托克预设设备 2505.20473v1 -
1237 05-26 WeatherEdit: Controllable Weather Editing with 4D Gaussian Field WeatherEdit: Kontrollierbare Wetterbearbeitung mit 4D Gaussian Field 气象编辑: 4D Gaussian 字段的可控天气编辑 2505.20471v1 -
1238 05-26 Recursive Deep Inverse Reinforcement Learning Rekursives tiefes Inverse-Verstärkung-Lernen 递归深反向强化学习 2504.13241v4 -
1239 05-26 Learning with Expected Signatures: Theory and Applications Lernen mit erwarteten Signaturen: Theorie und Anwendungen 学习与预期签名:理论和应用 2505.20465v1 -
1240 05-26 Federated Learning-Distillation Alternation for Resource-Constrained IoT Federated Learning-Destillation Alternation für ressourcenbeschränktes IoT 资源受限IoT的联邦学习-蒸馏交替 2505.20456v1 -
1241 05-26 Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection Skalierungsgesetze für das Vergessen beim Finetuning mit Vorschulungs-Dateninjektion 调整前数据输入时遗忘法律的扩大范围 2502.06042v2 -
1242 05-26 BlastOFormer: Attention and Neural Operator Deep Learning Methods for Explosive Blast Prediction BlastOFormer: Aufmerksamkeit und neuraler Operator Deep Learning Methoden zur explosiven Blast-Vorhersage BlastOFormer: 爆炸性爆炸预测的注意和神经操作员深学习方法 2505.20454v1 -
1243 05-26 Active Learning for Multiple Change Point Detection in Non-stationary Time Series with Deep Gaussian Processes Aktives Lernen für Multiple Change Point Detection in nicht-stationären Zeitreihen mit tiefen Gauß-Prozessen 与深高斯进程一起在非静止时间序列中进行多变点探测活动学习 2505.20452v1 -
1244 05-26 Symmetry constrained neural networks for detection and localization of damage in metal plates Symmetrie eingeschränkte neuronale Netze zur Erkennung und Lokalisierung von Schäden in Metallplatten 用于金属板块损害探测和定位的对称约束神经网络 2409.06084v3 -
1245 05-26 Time Series Generation Under Data Scarcity: A Unified Generative Modeling Approach Zeitreihenerstellung unter Datenknappheit: Ein einheitlicher generativer Modellierungsansatz 数据缺乏情况下的时间序列生成:统一生成模式方法 2505.20446v1 -
1246 05-26 HoPE: Hybrid of Position Embedding for Length Generalization in Vision-Language Models HoPE: Hybrid der Positionseinbettung für die Längenverallgemeinerung in Vision-Language-Modelle HoPE:愿景-语言模型中长期通用化所嵌入的立场组合 2505.20444v1 -
1247 05-26 AI Learning Algorithms: Deep Learning, Hybrid Models, and Large-Scale Model Integration KI-Learning-Algorithmen: Deep Learning, hybride Modelle und großformatige Modellintegration AI 学习等级:深学习、混合模型和大型模型整合 2410.09186v3 -
1248 05-26 Holes in Latent Space: Topological Signatures Under Adversarial Influence Löcher im latenten Raum: Topologische Signaturen unter dem Einfluss von Adversarien 低空空洞:在对立影响下的地形签名 2505.20435v1 -
1249 05-26 Kernel Quantile Embeddings and Associated Probability Metrics Kernel-Quantile-Embeddings und zugehörige Wahrscheinlichkeits-Metriken 内核量量嵌入器及相关概率 2505.20433v1 -
1250 05-26 Differentiable Quadratic Optimization For The Maximum Independent Set Problem Unterschiedliche quadratische Optimierung für das maximale unabhängige Set-Problem 最大独立集集问题可区别的二次二次曲线优化 2406.19532v6 -
1251 05-26 Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution? Selbstreflektierende Unsicherheiten: Kennen LLMs ihre interne Antwortverteilung? 自我反感的不确定性:LLMs知道他们的内部答案分布吗? 2505.20295v1 -
1252 05-26 Reasoning LLMs are Wandering Solution Explorers Grundlegende LLMs sind wandernde Lösungs-Explorer 理据LLMs是游荡的解决方案探索者 2505.20296v1 -
1253 05-26 Lorentz Local Canonicalization: How to Make Any Network Lorentz-Equivariant Lorentz lokale Canonicalization: Wie man jedes Netzwerk Lorentz-äquivariant macht Lorentz 局部规范化:如何使任何网络具有 Lorentz 等变性 2505.20280v1 -
1254 05-26 Solving Hidden Monotone Variational Inequalities with Surrogate Losses Lösen versteckter monotoner Variationsungleichheiten mit Surrogatverlusten 解决与代谢损失的隐藏单式单体差异性不平等 2411.05228v3 -
1255 05-26 The Coverage Principle: A Framework for Understanding Compositional Generalization Das Coverage-Prinzip: Ein Rahmen für das Verständnis der kompositorischen Verallgemeinerung 覆盖范围原则:理解普遍组成框架 2505.20278v1 -
1256 05-26 Probabilistic Kernel Function for Fast Angle Testing Probabilistische Kernel-Funktion für schnelle Winkelprüfung 用于快速角测试的概率内核函数 2505.20274v1 -
1257 05-26 Comparing Neural Network Encodings for Logic-based Explainability Vergleich von Neural Network Encodings für Logic-basierte Erklärbarkeit 比较基于逻辑的解释性神经网络编码 2505.20269v1 -
1258 05-26 Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits Ergebnisbasiertes Online-Verstärkungslernen: Algorithmen und grundlegende Grenzen 基于成果的在线强化学习:等级和基本限制 2505.20268v1 -
1259 05-26 syftr: Pareto-Optimal Generative AI syftr: Pareto-Optimal Generative KI syftr: Pareto-Optimal 生成 AI 2505.20266v1 -
1260 05-26 Lifelong Safety Alignment for Language Models Lebenslange Sicherheitsausrichtung für Sprachmodelle 语言模型终身安全比对 2505.20259v1 -
1261 05-26 GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining GRAPE: Optimierung der Datenmischung für ein robustes Multi-Target-Adaptives Vortraining GRAPE: 优化集体强力多目标适应性预备培训的数据混合 2505.20380v1 -
1262 05-26 Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs Position: Mechanische Dolmetschbarkeit sollte Feature-Konsistenz in SAEs priorisieren 位置: 机械可解释性:应优先考虑高级专业环境评估中的地物一致性 2505.20254v1 -
1263 05-26 Unveiling AI’s Blind Spots: An Oracle for In-Domain, Out-of-Domain, and Adversarial Errors Enthüllen der Blind-Spots von KI: Ein Oracle für In-Domain-, Out-of-Domain- und Adversarial-Fehler 揭示AI的盲点:针对域内、域外和对抗性错误的预言机 2410.02384v3 -
1264 05-26 Learning Extrapolative Sequence Transformations from Markov Chains Extrapolative Sequenztransformationen von Markov-Ketten lernen 来自Markov 链条的学习外推序列变换 2505.20251v1 -
1265 05-26 On the Guidance of Flow Matching Über die Anleitung von Flow Matching 流动配对指南 2502.02150v3 -
1266 05-26 TACO: Training-free Sound Prompted Segmentation via Semantically Constrained Audio-visual CO-factorization TACO: Schulungsfreie Klang-Prompt-Segmentierung über semantisch eingeschränkte Audio-visuelle CO-Fabrizierung TACO:通过模拟压缩培训的视听共同推动因素,进行无培训、无培训的音频快速分割 2412.01488v3 -
1267 05-26 Efficient Optimization Accelerator Framework for Multistate Ising Problems Effizientes Optimierungs-Beschleuniger-Framework für Multistate Ising-Probleme 高效高效优化多州化问题加速加速框架 2505.20250v1 -
1268 05-26 RedAHD: Reduction-Based End-to-End Automatic Heuristic Design with Large Language Models RedAHD: Reduktionsbasiertes, End-to-End-Automatisches Heuristisches Design mit großen Sprachmodellen REDAHD: 具有大语言模型的后端至后端自动超量设计 2505.20242v1 -
1269 05-26 DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning DreamPRM: Domain-regewichtetes Prozess-Reward-Modell für multimodale Vernunft DreamPRM: 多边理由解释的负重评分进程奖励模式 2505.20241v1 -
1270 05-26 SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems SITCOM: Triple-Consistent Diffusions-Probenahme für inverse Probleme SITCOM: 反问题递进三联扩散抽样 2410.04479v2 -
1271 05-26 A Temporal Difference Method for Stochastic Continuous Dynamics Eine zeitliche Differenzmethode für stochastische kontinuierliche Dynamik 存储连续动态的时差方法 2505.15544v3 -
1272 05-26 RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning RAGEN: Selbst-Evolution in LLM-Agenten durch Multi-Turn-Verstärkungs-Lernen verstehen 通过多阶段强化学习了解LLM代理商的自我演变 2504.20073v2 -
1273 05-26 SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training SFT-Erinnerungen, RL Generalisiert: Eine vergleichende Studie des Stiftungsmodells nach der Ausbildung SFT Memorizes,RL一般化:基金会培训模式模型比较研究 2501.17161v2 -
1274 05-26 Variational Deep Learning via Implicit Regularization Variationales Deep Learning durch Implizite Regularisierung 通过隐性规范化进行不同的深层学习 2505.20235v1 -
1275 05-26 Multimodal Federated Learning With Missing Modalities through Feature Imputation Network Multimodales Federated Learning mit fehlenden Modalitäten durch Feature Imputation Network 通过特征截肢网络以失踪模式进行多模式联邦学习 2505.20232v1 -
1276 05-26 From What to How: Attributing CLIP’s Latent Components Reveals Unexpected Semantic Reliance Von was zu wie: Zuweisen von CLIPs latenten Komponenten zeigt ungeahnte semantische Zuverlässigkeit 从何到如何: 将 CLIP 的内部部件流出异常的语义依赖性归结为 CLIP 的内部批量 。 2505.20229v1 -
1277 05-26 FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models FLAME-MoE: Eine transparente End-to-End-Forschungsplattform für Mixture-of-Experts-Sprachmodelle FLAME-MOE:混合专家语言模型透明端对端研究平台 2505.20225v1 -
1278 05-26 Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects Chain-of-Thought für autonomes Fahren: Eine umfassende Umfrage und Zukunftsaussichten 寻求自主驾驶:全面调查和未来前景 2505.20223v1 -
1279 05-26 Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction Rollen Sie die Würfel & Blick, bevor Sie springen: Gehen über die kreativen Grenzen der Next-Token-Vorhersage 跳跃前的骰子滚动和看一看:超越了次声预测的创造性极限 2504.15266v2 -
1280 05-26 Gradient Flow Matching for Learning Update Dynamics in Neural Network Training Gradient Flow Passend zum Lernen von Update-Dynamik im neuralen Netzwerktraining 神经网络培训中学习更新动态动态的渐进流程匹配 2505.20221v1 -
1281 05-26 Open the Eyes of MPNN: Vision Enhances MPNN in Link Prediction Öffnen Sie die Augen von MPNN: Vision verbessert MPNN in Link Prediction MPNNN的 “ 睁开眼 “ :愿景在 “ 连结预测 “ 中加强MPNN 2505.08266v2 -
1282 05-26 New Perspectives on the Polyak Stepsize: Surrogate Functions and Negative Results Neue Perspektiven auf die Polyak Stepsize: Surrogate-Funktionen und negative Ergebnisse 关于 “ 多边步骤的新观点:代理功能和消极结果 “ 2505.20219v1 -
1283 05-26 Fine-grained List-wise Alignment for Generative Medication Recommendation Feinkörnige List-Wise-Ausrichtung für Generative Medikamente Empfehlung 生产用药建议精制清单调整 2505.20218v1 -
1284 05-26 Parameter-Efficient Fine-Tuning with Column Space Projection Parameter-Effizient Feintuning mit Säulenraumprojektion 带有列空间投射的高效参数精密设计 2505.20211v1 -
1285 05-26 FedECA: A Federated External Control Arm Method for Causal Inference with Time-To-Event Data in Distributed Settings FedECA: Eine Federated External Control Arm Methode für ursächliche Schlussfolgerungen mit Zeit-bis-Event-Daten in verteilten Einstellungen FedECA:在分布环境中利用时间到时间的数据进行因果关系推断的联邦外部控制武器法 2311.16984v9 -
1286 05-26 Temporal Sampling for Forgotten Reasoning in LLMs Zeitliche Probenahme für vergessene Vernunft in LLMs LLM 被遗忘原因的时间抽样 2505.20196v1 -
1287 05-26 FunReason: Enhancing Large Language Models’ Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement FunReason: Erweiterung der Funktion großer Sprachmodelle durch Multiscale-Verluste und automatisierte Datenverfeinerung durch Selbst-Refinement FunReason:通过自我改进、多尺度损失和数据自动化改进加强大语言模型功能 2505.20192v1 -
1288 05-26 Private Geometric Median in Nearly-Linear Time Private Geometrische Medien in fast linearer Zeit 近利时私人几何中位数 2505.20189v1 -
1289 05-26 Research on feature fusion and multimodal patent text based on graph attention network Forschungsarbeiten über Feature Fusion und multimodalen Patenttext auf der Grundlage von Graphen Aufmerksamkeit Netzwerk 根据图示关注网络研究地物聚合和多式专利法 2505.20188v1 -
1290 05-26 UniMoMo: Unified Generative Modeling of 3D Molecules for De Novo Binder Design UniMoMo: Unified Generative Modellierung von 3D-Molekülen für De Novo Binder Design UniMoMo:De Novo Binder 设计3D Molecules的统一生成模型 2503.19300v3 -
1291 05-26 Linearization of ReLU Activation Function for Neural Network-Embedded Optimization: Optimal Day-Ahead Energy Scheduling Linearisierung der ReLU-Aktivierungsfunktion für neurale Netzwerk-Embedded-Optimierung: Optimale Day-Ahead-Energieplanung ReLU神经网络激活功能的线性化 2310.01758v2 -
1292 05-26 Bayesian Optimisation Against Climate Change: Applications and Benchmarks Bayesische Optimierung gegen den Klimawandel: Anwendungen und Benchmarks Bayesian最佳应对气候变化:应用和基准 2306.04343v2 -
1293 05-26 On the Volatility of Shapley-Based Contribution Metrics in Federated Learning Über die Volatilität von Shapley-Based Contribution Metrics im Federated Learning 联邦学习中基于毛质的贡献度量变化无常 2405.08044v4 -
1294 05-26 No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference Kein kostenloses Mittagessen: Nicht-asymptotische Analyse von Vorhersage-Powered Inferenz 无免费午餐:预测力推论的非心理分析 2505.20178v1 -
1295 05-26 The Power of Iterative Filtering for Supervised Learning with (Heavy) Contamination Die Macht des iterativen Filterns für überwachtes Lernen mit (schwerer) Kontaminierung 受监督学习(重)污染的迭代过滤功能 2505.20177v1 -
1296 05-26 “KAN you hear me?” Exploring Kolmogorov-Arnold Networks for Spoken Language Understanding “KAN hörst du mich?” Kolmogorov-Arnold-Netzwerke für gesprochenes Sprachverständnis erkunden 探索科尔莫戈洛夫-阿诺尔德语言理解网络 2505.20176v1 -
1297 05-26 mPOLICE: Provable Enforcement of Multi-Region Affine Constraints in Deep Neural Networks mPOLICE: Beweisbare Durchsetzung von Multi-Region Affine-Konstraints in tiefen neuralen Netzwerken mPOLICE: 在深神经网络中可证明地执行多区域仿射约束 2502.02434v2 -
1298 05-26 Virtual Cells: Predict, Explain, Discover Virtuelle Zellen: Vorhersagen, Erklären, Entdecken 虚拟细胞: 预测、解释、发现 2505.14613v2 -
1299 05-26 A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation Ein theoretischer Rahmen für Grokking: Interpolation gefolgt von Riemannsche Norm Minimierung Grokking理论框架:内插,然后是Riemannian Norm 最小化 2505.20172v1 -
1300 05-26 From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data Von der Ausrichtung zur Weiterentwicklung: Bootstrapping Audio-Language Alignment mit synthetischen Daten 从对齐到推进: 用合成数据推动音频语言对齐 2505.20166v1 -
1301 05-26 Capability-Based Scaling Laws for LLM Red-Teaming Capability-Based Scaling-Gesetze für LLM Red-Teaming LLM 红色团队合作以能力为基础的增强法律 2505.20162v1 -
1302 05-26 Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning Prismatische Synthese: Gradientenbasierte Datendiversifizierung steigert Generalisierung in LLM-Reasoning 理论综合:基于逐步的数据多样化促进LLM理由说明的概括化 2505.20161v1 -
1303 05-26 Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities Gedachte politische Optimierung: Überwindung externer Leitlinien und interner Fähigkeiten 优化政策:将外部指导和内部能力结合起来 2505.15692v2 -
1304 05-26 Polynomial, trigonometric, and tropical activations Polynomische, trigonometrische und tropische Aktivierungen 多边、三角和热带活性 2502.01247v2 -
1305 05-26 On the (Non) Injectivity of Piecewise Linear Janossy Pooling Auf der (Nicht-)Injektivität der stückweise linearen Janossy-Pooling 在Peaxy Linear Janosy 集合的喷射上, 2505.20150v1 -
1306 05-26 SeMe: Training-Free Language Model Merging via Semantic Alignment SeMe: Training-freies Sprachmodell Zusammenführen über semantische Ausrichtung SeMe:通过语义一致合并的无培训语言模式 2505.20144v1 -
1307 05-26 Model Stitching by Functional Latent Alignment Modellstitching durch funktionale Latent Alignment 通过功能性前端对齐进行模型切换 2505.20142v1 -
1308 05-26 GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models GUARD: Rollenspiel zur Generierung von Jailbreakings in natürlicher Sprache zur Prüfung der Einhaltung der Leitlinie für große Sprachmodelle GUARD: 利用《大语言模式遵守试验准则准则》创造以自然语言破门破门 2402.03299v5 -
1309 05-26 Error Optimization: Overcoming Exponential Signal Decay in Deep Predictive Coding Networks Fehler-Optimierung: Überwindung exponentieller Signaldekay in tiefen vorausschauenden Codierungsnetzwerken 错误 优化 : 克服深预报编码网络中的指数信号衰减 2505.20137v1 -
1310 05-26 P$^2$ Law: Scaling Law for Post-Training After Model Pruning P$^2$ Gesetz: Skalierungsgesetz für Post-Training nach Modell-Pruning P$^2$ 法则:模型剪枝后进行后训练的缩放法则 2411.10272v3 -
1311 05-26 AweDist: Attention-aware Embedding Distillation for New Input Token Embeddings AweDist: Aufmerksamkeitsbewusste Einbettung Destillation für neue Eingabe-Token-Einbettungen AweDist: 新的输入式嵌入式嵌入器的注意嵌入蒸馏 2505.20133v1 -
1312 05-26 InfoBridge: Mutual Information estimation via Bridge Matching InfoBridge: Gegenseitige Informationsschätzung über Bridge Matching InfoBridge:通过桥梁匹配进行相互信息估计 2502.01383v2 -
1313 05-26 Outcome-based Reinforcement Learning to Predict the Future Ergebnisbasiertes Bewehrungslernen zur Vorhersage der Zukunft 基于成果的强化学习,以预测未来 2505.17989v2 -
1314 05-26 Tensorization is a powerful but underexplored tool for compression and interpretability of neural networks Tensorisierung ist ein leistungsfähiges, aber unerforschtes Werkzeug zur Kompression und Interpretationsfähigkeit neuronaler Netzwerke 电温是压缩和解释神经网络的强大但探索不足的工具 2505.20132v1 -
1315 05-26 MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning MolEditRL: Strukturschonende molekulare Bearbeitung durch diskretes Diffusions- und Verstärkungslernen MolEditRL:通过离散扩散和强化学习保持结构的分子编辑 2505.20131v1 -
1316 05-26 Balancing Interference and Correlation in Spatial Experimental Designs: A Causal Graph Cut Approach Balance zwischen Interferenz und Korrelation in räumlichen Experimentaldesigns: Ein ursächlicher Graphenschnitt-Ansatz 空间实验设计中平衡干扰和关联:因果图表切割法 2505.20130v1 -
1317 05-26 Uncertainty Quantification for LLM-Based Survey Simulations Ungewissheitsquantifizierung für LLM-basierte Umfragesimulationen 以LLM为基础的LLM调查模拟器的不确定性定量 2502.17773v3 -
1318 05-26 From Tables to Time: How TabPFN-v2 Outperforms Specialized Time Series Forecasting Models Von Tabellen zur Zeit: Wie TabPFN-v2 Modelle der speziellen Zeitreihenvorhersage übertrifft 从表格到时间: TabPFN-v2 如何表现超过专门时间序列预测模型 2501.02945v3 -
1319 05-26 Understanding Generalization in Diffusion Models via Probability Flow Distance Verallgemeinerung in Diffusionsmodellen über Wahrscheinlichkeitsflussentfernung verstehen 通过概率流动远距离理解扩散模型的通用化 2505.20123v1 -
1320 05-26 Likelihood-Ratio Regularized Quantile Regression: Adapting Conformal Prediction to High-Dimensional Covariate Shifts Likelihood-Ratio Regularized Quantile Regression: Anpassung der konformen Vorhersage an hochdimensionale Kovariate Verschiebungen 常规量化递减:调整对高多元共变变化的正规预测 2502.13030v2 -
1321 05-26 Algorithmic Control Improves Residential Building Energy and EV Management when PV Capacity is High but Battery Capacity is Low Algorithmische Steuerung verbessert das Energie- und EV-Management von Wohngebäuden, wenn die PV-Kapazität hoch, aber die Batteriekapazität gering ist 当光伏容量高但电池容量低时,算法控制改善住宅建筑的能源和EV管理 2505.20377v1 -
1322 05-26 Generative diffusion for perceptron problems: statistical physics analysis and efficient algorithms Generative Diffusion für Perceptronprobleme: statistische Physikanalyse und effiziente Algorithmen 生成感官问题扩散:统计物理分析和有效算法 2502.16292v2 -
1323 05-26 Proxy-Free GFlowNet Proxy-freies GFlowNet 无代理的GFlowNet 2505.20110v1 -
1324 05-26 Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning Verfeinerung von Text-zu-Multiview-Diffusion durch Verstärkungslernen 通过强化学习改进微小的中文本到多视图传播 2505.20107v1 -
1325 05-26 Preference-Based Gradient Estimation for ML-Guided Approximate Combinatorial Optimization Präferenzbasierte Gradientenschätzung für ML-geführte annähernde Kombinator-Optimierung ML- Guided 近似组合优化的基于优惠的渐进式测算 2502.19377v2 -
1326 05-26 Spurious Privacy Leakage in Neural Networks Spurious Privacy Leakage in neuralen Netzwerken 神经网络中的净隐私渗漏 2505.20095v1 -
1327 05-26 A fast sound power prediction tool for genset noise using machine learning Ein schnelles Sound-Power-Prognose-Tool für Genset-Rausch mit maschinellem Lernen 利用机器学习来快速可靠电源预测工具,用于使用机器学习的genseet噪音 2505.20079v1 -
1328 05-26 Grokking ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior Grokking ExPLAIND: Vereinheitlichung von Modell, Daten und Trainingszuweisung zum Studieren von Modellverhalten Grokking ExPLAIND: 用于研究模型行为的统一模型、数据和培训归属 2505.20076v1 -
1329 05-26 An Out-Of-Distribution Membership Inference Attack Approach for Cross-Domain Graph Attacks Ein Out-Of-Distribution-Mitgliedschaft Inferenz Angriff Ansatz für Cross-Domain Graph Attacks 跨领域石块袭击的批外分配成员推推攻击方法 2505.20074v1 -
1330 05-26 SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety SafeDPO: Ein einfacher Ansatz zur direkten Preference-Optimierung mit erhöhter Sicherheit SafeDPO: 以强化安全方式直接优化优惠的简单办法 2505.20065v1 -
1331 05-26 SAEs Are Good for Steering – If You Select the Right Features SAEs sind gut für das Lenken – wenn Sie die richtigen Funktionen auswählen SAEs 有利于指导 – – 如果您选择了正确的特性 2505.20063v1 -
1332 05-26 Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting Time-VLM: Erforschung multimodaler Vision-Sprachenmodelle für Augmented Time Series Forecasting 时间-VLM:探索扩大时间序列预测的多模式愿景-语言模型 2502.04395v2 -
1333 05-26 Sable: a Performant, Efficient and Scalable Sequence Model for MARL Sable: ein leistungsfähiges, effizientes und skalierbares Sequenzmodell für MARL 电缆:MARL的性能、高效和可缩放序列模型 2410.01706v5 -
1334 05-26 Ankh3: Multi-Task Pretraining with Sequence Denoising and Completion Enhances Protein Representations Ankh3: Multi-Task Pretraining mit Sequenz Denoisieren und Vollendung verbessert Proteindarstellungen Ankh3: 具有序列取消和完成的多任务预先培训,加强蛋白质代表制 2505.20052v1 -
1335 05-26 Catoni-Style Change Point Detection for Regret Minimization in Non-Stationary Heavy-Tailed Bandits Catoni-Style Change Point Detection für Reue Minimierung in nicht-stationären schwer-gefährdeten Banditen 用于在非连续重型重航匪徒中最遗憾最小化的 卡特托尼- 轮式变速点探测 2505.20051v1 -
1336 05-26 Synthetic Time Series Forecasting with Transformer Architectures: Extensive Simulation Benchmarks Synthetische Zeitreihenprognosen mit Transformer-Architekturen: Umfangreiche Simulations-Benchmarks 利用变形建筑结构预测合成时间序列:广泛模拟基准 2505.20048v1 -
1337 05-26 Convex Approximation of Two-Layer ReLU Networks for Hidden State Differential Privacy Convex-Annäherung von Zwei-Layer-ReLU-Netzwerken für versteckte staatliche differentielle Privatsphäre 隐藏式国家差异隐私双线雷路网络的连接近似 2407.04884v3 -
1338 05-26 Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning Kontrolle des neuralen Zusammenbruchs verbessert Out-of-Distribution Detection und Transfer Learning 控制神经崩溃增强传播外探测和转让学习 2502.10691v2 -
1339 05-26 Beyond Simple Concatenation: Fairly Assessing PLM Architectures for Multi-Chain Protein-Protein Interactions Prediction Beyond Simple Concatenation: Fairly Assessing PLM Architectures for Multi-Chain Protein-Protein Interaktionen Prediction 超越简单星系:公平评估多沙因蛋白因-蛋白因相互作用预测的PLM结构 2505.20036v1 -
1340 05-26 TeleSparse: Practical Privacy-Preserving Verification of Deep Neural Networks TeleSparse: Praktische Datenschutz-Bewahrung von Tiefen-Neural-Netzwerken 远程分离:深海神经网络的实际隐私保护核查 2504.19274v2 -
1341 05-26 ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers ViTaPEs: Visuotaktile Positionskodierungen für die modalitätsübergreifende Ausrichtung in multimodalen Transformern ViTaPEs:多模态Transformer中跨模态对齐的视触觉位置编码 2505.20032v1 -
1342 05-26 Multiple Descents in Deep Learning as a Sequence of Order-Chaos Transitions Mehrere Abstiege im Deep Learning als Folge von Order-Chaos-Übergängen 作为有秩序的赵国过渡的一个序列的深层学习中的多种族后裔 2505.20030v1 -
1343 05-26 Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain) Korrelation von Instruktions-Tuning (in multimodalen Modellen) mit visionssprachlicher Verarbeitung (im Gehirn) 与视觉语言处理(大脑中)相交校正(多式联运模式) 2505.20029v1 -
1344 05-26 Multi-modal brain encoding models for multi-modal stimuli Multimodale Gehirnkodierungsmodelle für multimodale Reize 多模式刺激多模式大脑编码模型 2505.20027v1 -
1345 05-26 Gradient Inversion Transcript: Leveraging Robust Generative Priors to Reconstruct Training Data from Gradient Leakage Gradient Inversion Transcript: Leveraging Robust Generative Priors to Reconstruct Trainingsdaten von Gradient Leakage 梯度反转轨迹:从梯度渗漏中重新构建培训数据的杠杆化强力生成前程 2505.20026v1 -
1346 05-26 Human-Aligned Image Models Improve Visual Decoding from the Brain Menschlich ausgerichtete Imagemodelle verbessern die visuelle Dekodierung aus dem Gehirn 人与人之间的图像模型改进大脑的视觉解码 2502.03081v2 -
1347 05-26 Ontology- and LLM-based Data Harmonization for Federated Learning in Healthcare Ontologie- und LLM-basierte Datenharmonisierung für das Federated Learning in Healthcare 以本体学和LLM为基础的保健方面联邦学习数据统一 2505.20020v1 -
1348 05-26 ProcessBench: Identifying Process Errors in Mathematical Reasoning ProcessBench: Identifizierung von Prozessfehlern in mathematischer Reasoning 进程快节: 识别数学原因中的进程错误 2412.06559v4 -
1349 05-26 Kernel-based estimators for functional causal effects kernbasierte Schätzwerte für funktionelle kausale Effekte 功能因果效应的内核核心估计值 2503.05024v3 -
1350 05-26 Data-Dependent Regret Bounds for Constrained MABs Datendependent Regret Bounds for Constrained MABs 受约束 MAB 的受控数据依赖的 Regret Bounds 2505.20010v1 -
1351 05-26 Prediction-Powered E-Values Voraussichtliche E-Werte 预测力电子价值 2502.04294v2 -
1352 05-26 TabPFN: One Model to Rule Them All? TabPFN: Ein Modell, um sie alle zu beherrschen? TabPFN: 一种模式来统治他们吗? 2505.20003v1 -
1353 05-26 Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based Agents Unvollkommenheit: Simulieren von Studenten mit unterschiedlichen kognitiven Ebenen mit LLM-basierten Agenten 普及缺陷:利用基于LLM的代理物模拟具有不同认知水平的学生 2505.19997v1 -
1354 05-26 Learning Optimal Multimodal Information Bottleneck Representations Optimales Lernen multimodaler Informationen Engpässe Vertretungen 学习最佳最佳多模式信息 2505.19996v1 -
1355 05-26 Distortion Resilience for Goal-Oriented Semantic Communication Distortion Resilienz für zielorientierte semantische Kommunikation 目标导向语义交流的扭曲复原力 2309.14587v2 -
1356 05-26 Federated Domain Generalization with Data-free On-server Matching Gradient Föderierte Domain-Verallgemeinerung mit datenfreiem On-Server-Zustimmungs-Gradient 具有无数据观测站上与渐变匹配的无数据观测器的联邦通用域 2501.14653v2 -
1357 05-26 Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach Bedauerliche Analyse von durchschnittlichen Unichain-MDPs über einen actor-Critic-Ansatz 通过“行动者-批评办法”对平均回报单链式微DP的遗憾分析 2505.19986v1 -
1358 05-26 Bridging The Multi-Modality Gaps of Audio, Visual and Linguistic for Speech Enhancement Überbrückung der Multi-Modalitätslücken von Audio, Visual und Linguistik zur Sprachverbesserung 弥合视听和语言的多模式差距,加强语言、视听能力 2501.13375v2 -
1359 05-26 Rethinking Probabilistic Circuit Parameter Learning Probabilistisches Parameter-Lernen neu denken 重新思考概率电路参数学习 2505.19982v1 -
1360 05-26 Differential Privacy Analysis of Decentralized Gossip Averaging under Varying Threat Models Differential Privacy Analyse dezentralisierter Gossip Average unter unterschiedlichen Bedrohungsmodellen 对不同威胁模式下分散的流民的隐私差异分析 2505.19969v1 -
1361 05-26 Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking) Position: Löse schichtweise lineare Modelle, um neurale dynamische Phänomene zu verstehen (Neuraler Kollaps, Emergence, Lazy/Rich Regime und Grokking) 位置:首先理解神经动态现象的解层图层线性模型(神经崩溃、新出现、Lazy/Rich制度和Grokking) 2502.21009v2 -
1362 05-26 Learning to Select In-Context Demonstration Preferred by Large Language Model Lernen, In-Kontext-Demonstration zu wählen Bevorzugt nach großen Sprachmodellen 学习选择大语言模式首选的文本内演示 2505.19966v1 -
1363 05-26 The Limits of Preference Data for Post-Training Die Grenzen der Präferenzdaten für das Post-Training 培训后优先数据限值 2505.19964v1 -
1364 05-26 Robustly optimal dynamics for active matter reservoir computing Robust optimale Dynamik für das Recreservoir Computing mit aktiven Materien 活性物质储油层计算强有力的最佳动态 2505.05420v2 -
1365 05-26 Explanatory Summarization with Discourse-Driven Planning Erklärende Zusammenfassung mit diskursgetriebener Planung 与 “ 分流规划 “ 结合的解释性总结 2504.19339v3 -
1366 05-26 RAP: Runtime-Adaptive Pruning for LLM Inference RAP: Runtime-Adaptive Pruning für LLM-Inferenz RAP:LLM 推断的运行时间-适应性节制 2505.17138v2 -
1367 05-26 Multi-Type Point Cloud Autoencoder: A Complete Equivariant Embedding for Molecule Conformation and Pose Multi-Type-Punkt-Cloud-Autoencoder: Ein komplettes Equivariant-Embedding für Molekülkonformation und Pose 多类型点云云自动编码器:分子构造和脉冲的完全等同嵌入 2405.13791v3 -
1368 05-26 MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research MLR-Bench: Bewertung von KI-Agenten auf Open-Ended Machine Learning Research MLR-Bench:评估AI公司在开放式机械学习研究方面的代理机构 2505.19955v1 -
1369 05-26 An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning Ein erklärbares Diagnose-Framework für neurodegenerative Dementias durch Verstärkungsoptimierte LLM-Reasoning 通过强化-优化LLM解释性理疗理由的神经医学性痴呆症可解释的诊断框架 2505.19954v1 -
1370 05-26 Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions Welche Datenattribute stimulieren die Mathe- und Code-Reasoning? Eine Untersuchung über Einflussfunktionen 哪些数据属性刺激数学和代码理由? 通过影响函数进行调查 2505.19949v1 -
1371 05-26 SaSi: A Self-augmented and Self-interpreted Deep Learning Approach for Few-shot Cryo-ET Particle Detection SaSi: Ein selbst-augmentierter und selbst-interpretierter Deep-Learning-Ansatz für die wenige Schuss Cryo-ET Partikelerkennung SaSi:对几近的Cryo-ET粒子探测自增强和自我解释的深层学习方法 2505.19948v1 -
1372 05-26 Dynamically Learned Test-Time Model Routing in Language Model Zoos with Service Level Guarantees Dynamisch gelerntes Test-Time-Modell-Routing in Sprachmodell Zoos mit Service-Level-Garantien 具有服务级保障的语文示范动物园动态学习测试时间模型运行 2505.19947v1 -
1373 05-26 Inverse Q-Learning Done Right: Offline Imitation Learning in $Q^π$-Realizable MDPs Inverse Q-Learning Done Right: Offline-Imitation Lernen in $Q^π$-realisierbaren MDPs 逆向Q学习的正确做法:$Q^π$-可实现MDP中的离线模仿学习 2505.19946v1 -
1374 05-26 RefinedFields: Radiance Fields Refinement for Planar Scene Representations RefinedFields: Verfeinerung von Strahlungsfeldern für planare Szenendarstellungen RefinedFields: 面向平面场景表示的辐射场细化 2312.00639v4 -
1375 05-26 Can Visual Encoder Learn to See Arrows? Kann Visual Encoder lernen, Pfeile zu sehen? 视觉编码器能学会看到箭头吗 ? 2505.19944v1 -
1376 05-26 Beyond Freezing: Sparse Tuning Enhances Plasticity in Continual Learning with Pre-Trained Models Beyond Freezing: Sparse Tuning verbessert Plastizität im kontinuierlichen Lernen mit vortrainierten Modellen 超出冻结范围:在继续学习过程中,采用培训前模式,粗略的加注可增强可塑性 2505.19943v1 -
1377 05-26 Task-Oriented Low-Label Semantic Communication With Self-Supervised Learning Aufgabenorientierte kabelarme semantische Kommunikation mit selbstüberwachtem Lernen 以任务为导向的低标签低标签语义交流与自控学习 2505.19940v1 -
1378 05-26 Efficient Time Series Processing for Transformers and State-Space Models through Token Merging Effiziente Zeitreihenverarbeitung für Transformatoren und State-Space-Modelle durch Token Merging 通过 Token 合并对变形器和国家空间模型的有效时间序列处理 2405.17951v2 -
1379 05-26 Constructing a BPE Tokenization DFA Aufbau einer BPE Tokenization DFA 正在构建 BPE 磁盘化 DFA 2405.07671v2 -
1380 05-26 Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent Modellierung von Multi-Task-Modellen, die als adaptives projektives Gradientenabsinken zusammenwachsen 模拟多任务模式模型合并为适应性预测梯度下层 2501.01230v3 -
1381 05-26 Logic Gate Neural Networks are Good for Verification Logic Gate Neural Networks sind gut für die Verifikation 逻辑门神经网络有利于核查 2505.19932v1 -
1382 05-26 JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs JailbreakRadar: Umfassende Bewertung von Jailbreak Attacken gegen LLMs JailbreakRadar:全面评估对LLMs的越狱袭击 2402.05668v3 -
1383 05-26 Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement Learning Semantic-Aware Ressourcenmanagement für C-V2X Platooning über Multi-Agent Verstärkungslernen 通过多机构强化学习进行 C-V2X 等离子处理的语义软件资源管理 2411.04672v2 -
1384 05-26 Cellwise and Casewise Robust Covariance in High Dimensions Cellwise und Casewise Robuste Kovarianz in hohen Abmessungen 高维度的单元格和大小写常量 2505.19925v1 -
1385 05-26 Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL Bellman-Updates vertrauen lernen: Selektive State-Adaptive Regularisierung für Offline RL 学习信任 Bellman 更新信息: 选择性国家适应性离线转线常规化 2505.19923v1 -
1386 05-26 (Un)supervised Learning of Maximal Lyapunov Functions (Un)überwachtes Lernen von maximalen Lyapunov-Funktionen (无)监督学习最大Lyapunov函数 2408.17246v2 -
1387 05-26 A Probabilistic Model for Non-Contrastive Learning Ein probabilistisches Modell für nicht kontrastives Lernen 非交流性学习概率模型 2501.13031v2 -
1388 05-26 APE: A Data-Centric Benchmark for Efficient LLM Adaptation in Text Summarization APE: Ein datenzentrischer Benchmark für effiziente LLM-Anpassung in der Textzusammenfassung APE: 文本摘要中高效LLM适应数据中心基准 2505.19912v1 -
1389 05-26 Inverse Problem Sampling in Latent Space Using Sequential Monte Carlo Inverse Problem-Sampling im Latent Space mit Sequential Monte Carlo 利用定序蒙特卡洛在低层空间进行逆向问题抽样 2502.05908v2 -
1390 05-26 ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining ESLM: Risiko-Averse Selective Language Modeling für effizientes Vortraining ESLM: 有效培训前风险-反风险选择语言建模 2505.19893v1 -
1391 05-26 APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs APB: Beschleunigung verteilter Long-Context-Inferenz durch Übergeben komprimierter Kontextblöcke über GPUs APB: 通过跨GPU传递压缩的上下文块加速分布式长上下文推理 2502.12085v2 -
1392 05-26 A Langevin sampling algorithm inspired by the Adam optimizer Ein Langevin-Sampling-Algorithmus, inspiriert vom Adam-Optimierer 由亚当优化器启发的Langevin取样算法 2504.18911v2 -
1393 05-26 Learning mechanical systems from real-world data using discrete forced Lagrangian dynamics Mechanische Systeme aus realen Daten mit diskreter, erzwungener Lagrange-Dynamik lernen 使用离散强制拉格朗江动力从真实世界数据中学习机械系统 2505.20370v1 -
1394 05-26 Single-Agent vs. Multi-Agent LLM Strategies for Automated Student Reflection Assessment Single-Agent vs. Multi-Agent LLM-Strategien für die automatisierte Bewertung von Studentenreflexionen 学生自动反省评估战略 2504.05716v2 -
1395 05-26 Target Specific De Novo Design of Drug Candidate Molecules with Graph Transformer-based Generative Adversarial Networks Zielspezifisches De Novo-Design von Wirkstoff-Kandidatenmolekülen mit Graph Transformer-basierten Generativen Adversarial-Netzwerken 配有基于图形变形器的成形反转基因网络的药物候选分子具体新设计 2302.07868v7 -
1396 05-26 Risk-Averse Reinforcement Learning with Itakura-Saito Loss Risiko-Averse Verstärkungs-Lernen mit Itakura-Saito-Verlust 以Itakura-Saito损失进行反风险强化学习 2505.16925v2 -
1397 05-26 Explaining the role of Intrinsic Dimensionality in Adversarial Training Erklärung der Rolle der Intrinsischen Dimensionalität im Adversarial Training 解释内在多面性在相互培训中的作用 2405.17130v2 -
1398 05-26 Multi-Graph Inductive Representation Learning for Large-Scale Urban Rail Demand Prediction under Disruptions Multi-Graph Induktives Repräsentationslernen für großflächige Nachfragevorhersage für die Stadtbahn unter Störungen 中断情况下大规模城市轨道交通需求预测的多图归纳表示学习 2408.15619v2 -
1399 05-26 Deep Active Inference Agents for Delayed and Long-Horizon Environments Tiefe aktive Inferenz-Agenten für verzögerte und lang-Horizonte Umgebungen 延迟和长-Horizon环境的深海活性推断剂 2505.19867v1 -
1400 05-26 HS-STAR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation HS-STAR: Hierarchische Probenahme für selbstlernende Vernunfter über Schwierigkeitsschätzung und Budget-Umverteilung HS-STAR:通过难以估计和预算重新定位为自学理性者进行等级抽样 2505.19866v1 -
1401 05-26 Information-theoretic Generalization Analysis for Expected Calibration Error Informationstheoretische Generalisierungsanalyse für erwarteten Kalibrierungsfehler 预期校准错误信息理论概括分析 2405.15709v2 -
1402 05-26 FruitNeRF++: A Generalized Multi-Fruit Counting Method Utilizing Contrastive Learning and Neural Radiance Fields FruitNeRF++: Eine generalisierte Multi-Fruit-Counting-Methode, die kontrastives Lernen und neurale Strahlungsfelder nutzt FruitNeRF++:利用对比学习和神经辐射场的通用多水果计数方法 2505.19863v1 -
1403 05-26 KAN we improve on HEP classification tasks? Kolmogorov-Arnold Networks applied to an LHC physics example KAN verbessern wir die HEP-Klassifizierungsaufgaben? Kolmogorov-Arnold Networks für ein LHC-Physikbeispiel KAN我们改进了HEP分类任务? Kolmogorov-Arnold网络应用到一个LHC物理范例 2408.02743v2 -
1404 05-26 Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity Varianzreduziertes Kaskade Q-Lernen: Algorithmen und Probenkomplexität 差异减少的连级学习:等级和抽样复杂性 2408.06544v2 -
1405 05-26 REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Large Reasoning Models REA-RL: Reflection-Aware Online-Verstärkungs-Lernen für effiziente große Vernunftmodelle REA-RL:为高效大型理由模型进行反思-软件在线强化学习 2505.19862v1 -
1406 05-26 Editing as Unlearning: Are Knowledge Editing Methods Strong Baselines for Large Language Model Unlearning? Editing as Unlearning: Sind Methoden der Wissensbearbeitung starke Baselines für das Unlearning großer Sprachmodelle? 编辑即遗忘:知识编辑方法是否是大语言模型遗忘的强基线? 2505.19855v1 -
1407 05-26 DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning DISCOVER: Automatisierte Curricula für Sparse-Reward-Verstärkungs-Lernen DISCOVER: 稀疏奖励强化学习的自动化课程 2505.19850v1 -
1408 05-26 Efficient Deconvolution in Populational Inverse Problems Effiziente Dekonvolution in inversen Bevölkerungsproblemen 人口逆向问题的有效演变 2505.19841v1 -
1409 05-26 One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP Ein Surrogat, um sie alle zu täuschen: Universelle, übertragbare und gezielte Adversarial-Angriffe mit CLIP 一个代理模型骗过所有模型:基于CLIP的通用、可迁移和有针对性的对抗攻击 2505.19840v1 -
1410 05-26 Multi-Agent Reinforcement Learning in Cybersecurity: From Fundamentals to Applications Multi-Agenten-Verstärkung Lernen in Cybersicherheit: Von Grundlagen zu Anwendungen 网络安全多机构强化多机构网络安全学习:从基础到应用 2505.19837v1 -
1411 05-26 DiffNMR: Advancing Inpainting of Randomly Sampled Nuclear Magnetic Resonance Signals DiffNMR: Advancing Inpainting von zufällig gemusterten Kernmagnetresonanzsignalen DiffNMR:推进随机抽样核磁共振信号的油漆 2505.20367v1 -
1412 05-26 Revisiting Glorot Initialization for Long-Range Linear Recurrences Wiederbesuch der Glorot-Initialisierung für langanhaltende lineare Wiederholungen 重新审查长频线性线性重现的地球初始化 2505.19827v1 -
1413 05-26 Foundation Models for Tabular Data within Systemic Contexts Need Grounding Basismodelle für tabellarische Daten in systemischen Kontexten benötigen Erdung 系统环境中需要依据的表格数据基础模型 2505.19825v1 -
1414 05-26 An Introductory Survey to Autoencoder-based Deep Clustering – Sandboxes for Combining Clustering with Deep Learning Eine Einführungsstudie zum Autoencoder-basierten Deep Clustering – Sandboxen für die Kombination von Clustering mit Deep Learning 以自动编码器为基础的深层集束 – – 将集束与深层学习相结合的沙箱的介绍性调查 2504.02087v2 -
1415 05-26 LAPA-based Dynamic Privacy Optimization for Wireless Federated Learning in Heterogeneous Environments LAPA-basierte Dynamic Privacy Optimization for Wireless Federated Learning in heterogenen Umgebungen 以LAPA为基础的在多种不同环境无线联邦学习的动态隐私优化 2505.19823v1 -
1416 05-26 Poison in the Well: Feature Embedding Disruption in Backdoor Attacks Gift im Brunnen: Feature Einbetten von Disruption in Backdoor-Angriffe 井中毒:幕后袭击中的特异性嵌入干扰 2505.19821v1 -
1417 05-26 InfoCons: Identifying Interpretable Critical Concepts in Point Clouds via Information Theory InfoCons: Identifizieren von interpretierbaren kritischen Konzepten in Punktwolken über Informationstheorie 信息库:通过信息理论确定点云中可解释的关键概念 2505.19820v1 -
1418 05-26 Fast Differentiable Modal Simulation of Non-linear Strings, Membranes, and Plates Schnelle differenzierbare Modale Simulation von nichtlinearen Strings, Membranen und Platten 非线性字符串、膜和平板等非线性字符串的快速可区分模式模拟 2505.05940v2 -
1419 05-26 Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models Jailbreak-AudioBench: In-Depth-Bewertung und Analyse von Jailbreak-Bedrohungen für große Audio-Sprachenmodelle Jailbreak-AudioBench:对大型音频语言模型越狱威胁的深入评估与分析 2501.13772v2 -
1420 05-26 Density Ratio-Free Doubly Robust Proxy Causal Learning Dichte Verhältnis-frei doppelt robust Proxy Kausal Lernen 低密度比率-无杜布利强力代理原因学习 2505.19807v1 -
1421 05-26 Continuous Simplicial Neural Networks Kontinuierliche simplizielle Neuralnetze 简单连续神经网络 2503.12919v2 -
1422 05-26 Modulated differentiable STFT and balanced spectrum metric for freight train wheelset bearing cross-machine transfer monitoring under speed fluctuations Modulierte differenzierbare STFT und symmetrische Spektralmetrik für Güterzug-Radsatzlager-Übertragungsüberwachung unter Geschwindigkeitsschwankungen 速度波动下货运列车轮对轴承跨机器迁移监测的调制可微STFT和平衡频谱度量 2406.11917v3 -
1423 05-26 Exploring Consciousness in LLMs: A Systematic Survey of Theories, Implementations, and Frontier Risks Erforschung des Bewusstseins in LLMs: Eine systematische Untersuchung von Theorien, Implementierungen und Grenzrisiken 探索LLM中的意识:对理论、实施和前沿风险的系统调查 2505.19806v1 -
1424 05-26 GraphAU-Pain: Graph-based Action Unit Representation for Pain Intensity Estimation GraphAU-Pain: Darstellung der Graph-basierten Aktionseinheit für Schmerzintensitätsabschätzung GraphAU-Pain: 用于疼痛强度估计的基于图的动作单元表示 2505.19802v1 -
1425 05-26 Non-asymptotic convergence analysis of the stochastic gradient Hamiltonian Monte Carlo algorithm with discontinuous stochastic gradient with applications to training of ReLU neural networks Nicht-asymptotische Konvergenzanalyse des stochastischen Gradienten Hamiltonian Monte Carlo Algorithmus mit diskontinuierlichem stochastischem Gradienten mit Anwendungen zum Training von ReLU-Neuralnetzwerken 对具有不连续随机梯度的随机梯度哈密顿蒙特卡洛算法的非渐近收敛分析,及其在ReLU神经网络训练中的应用 2409.17107v2 -
1426 05-26 The Missing Point in Vision Transformers for Universal Image Segmentation Der fehlende Punkt in Vision Transformers für die universelle Bildsegmentierung 通用图像分割的愿景变异器中的缺失点 2505.19795v1 -
1427 05-26 What Can RL Bring to VLA Generalization? An Empirical Study Was kann RL zur VLA-Verallgemeinerung bringen? Eine empirische Studie RL能带给VLA的概括化带来什么?经验研究。 2505.19789v1 -
1428 05-26 MedDreamer: Model-Based Reinforcement Learning with Latent Imagination on Complex EHRs for Clinical Decision Support MedDreamer: Modellbasiertes Verstärkungslernen mit latenter Imagination auf komplexen EHRs für die klinische Entscheidungsunterstützung MedDreamer:基于模型的强化学习,在复杂电子健康记录上进行潜在想象以支持临床决策 2505.19785v1 -
1429 05-26 Out-of-distribution Reject Option Method for Dataset Shift Problem in Early Disease Onset Prediction Out-of-Distribution Ablehnung der Option Methode für Datensatz Verschiebung Problem bei Früherkrankungen Beginn Vorhersage 用于早期疾病上移预测中数据集移位问题的不分发拒绝选项方法 2405.19864v2 -
1430 05-26 Mol-LLM: Multimodal Generalist Molecular LLM with Improved Graph Utilization Mol-LLM: Multimodaler Generalist Molecular LLM mit verbesserter Graphenverwendung Mol-LLM:改进图利用的多模态通用分子LLM 2502.02810v2 -
1431 05-26 Advancements in Medical Image Classification through Fine-Tuning Natural Domain Foundation Models Fortschritte bei der Klassifikation medizinischer Bilder durch Modelle der Fine-Tuning Natural Domain Foundation 通过精美开发自然域基金会模型提高医学图像分类 2505.19779v1 -
1432 05-26 Query Performance Prediction using Relevance Judgments Generated by Large Language Models Abfrage der Leistungsvorhersage anhand von Relevanzurteilen, die von großen Sprachmodellen erzeugt werden 使用大语言模型产生的相关性判断的查询性绩效预测 2404.01012v3 -
1433 05-26 Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO Verständnis der Leistungslücke im Preference Learning: Eine Dichotomie von RLHF und DPO 了解优先学习方面的绩效差距:RLHF和DPO的二分切开术 2505.19770v1 -
1434 05-26 Diff-Def: Diffusion-Generated Deformation Fields for Conditional Atlases Diff-Def: Diffusionsgenerierte Deformationsfelder für Bedingte Atlase Diff-Def: 用于条件图集的扩散生成形变场 2403.16776v2 -
1435 05-26 Agentic Predictor: Performance Prediction for Agentic Workflows via Multi-View Encoding Agentic Predictor: Leistungsvorhersage für Agentic Workflows über Multi-View-Encoding Agentic Predictor:通过多视图编码对智能体工作流进行性能预测 2505.19764v1 -
1436 05-26 Unfolding AlphaFold’s Bayesian Roots in Probability Kinematics AlphaFolds Bayesische Wurzeln in der Wahrscheinlichkeitskinematik entfalten 将 AlphaFold 的贝叶根在概率 Kinematics 中卸载 2505.19763v1 -
1437 05-26 In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement In-Context-Demonstration zählt: Zur Prompt-Optimierung für Pseudo-Supervision-Verfeinerung 上下文示例很重要:面向伪监督细化的提示优化 2410.03124v2 -
1438 05-26 Semantic-Aware Interpretable Multimodal Music Auto-Tagging Semantic-Aware Interpretierbare multimodale Musik Auto-Tagging 解析多式音乐 自动调制 2505.17233v2 -
1439 05-26 CIDRe: A Reference-Free Multi-Aspect Criterion for Code Comment Quality Measurement CIDRe: Ein referenzfreies Multi-Aspekt-Kriterium für die Qualitätsmessung von Code Comment CIDRe: 守则评论质量衡量的无参考性、无参考性、多特征的多标准标准 2505.19757v1 -
1440 05-26 Discrete Markov Bridge Diskretierte Markov-Brücke 分立马尔科夫桥 2505.19752v1 -
1441 05-26 Machine Learning Algorithm for Noise Reduction and Disease-Causing Gene Feature Extraction in Gene Sequencing Data Maschinelles Lernen Algorithmen zur Lärmreduzierung und krankheitsverursachende Gen-Feature-Extraktion in Gensequenzierungsdaten 用于减少噪音和在基因测序数据中进行疾病传播的基因特征采掘的机器学习算法 2505.19740v1 -
1442 05-26 Weighted Leave-One-Out Cross Validation Gewichtete Leave-One-Out Cross-Validierung 加权请假一次性离职后交叉验证 2505.19737v1 -
1443 05-26 Using Time Structure to Estimate Causal Effects Zeitstruktur zur Schätzung von Kausalitätseffekten verwenden 利用时间结构估计因果关系 2504.11076v2 -
1444 05-26 Accelerating Nash Learning from Human Feedback via Mirror Prox Beschleunigendes Nash-Lernen aus menschlichem Feedback über Spiegelprox 通过镜像Prox从人类反馈中加快学习 2505.19731v1 -
1445 05-26 Stuffed Mamba: Oversized States Lead to the Inability to Forget Gefüllte Mamba: Übergroße Staaten führen zu der Unfähigkeit zu vergessen 马姆巴:国家规模过大,导致无法忘却 2410.07145v2 -
1446 05-26 A Structured Tour of Optimization with Finite Differences Eine strukturierte Tour der Optimierung mit endlichen Unterschieden 结构化优化与有限差异旅游 2505.19720v1 -
1447 05-26 OCN: Effectively Utilizing Higher-Order Common Neighbors for Better Link Prediction OCN: Höhere Ordnung effektiv nutzen gemeinsame Nachbarn für bessere Link-Vorhersage OCN:有效利用高端共同邻居改善联系预测 2505.19719v1 -
1448 05-26 Graceful Forgetting in Generative Language Models Anmutiges Vergessen in generativen Sprachmodellen 在创用语言模型中优雅地忘却 2505.19715v1 -
1449 05-26 MT$^{3}$: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning MT$^{3}$: Skalierung von MLLM-basierten Textbildmaschinenübersetzungen über Multi-Task-Verstärkungslernen MT$^{3}$:通过多任务强化学习,扩大基于MLLM的文本图像机翻译 2505.19714v1 -
1450 05-26 On the Relation between Rectified Flows and Optimal Transport Über die Beziehung zwischen rektifizierten Strömungen und optimalem Transport 整流流与最优传输之间的关系 2505.19712v1 -
1451 05-26 Automated Scientific Discovery: From Equation Discovery to Autonomous Discovery Systems Automatisierte wissenschaftliche Entdeckung: Von der Gleichungsentdeckung zu autonomen Entdeckungssystemen 自动科学发现:从方程发现到自主发现系统 2305.02251v2 -
1452 05-26 Solving Euler equations with Multiple Discontinuities via Separation-Transfer Physics-Informed Neural Networks Lösen von Euler-Gleichungen mit mehreren Diskontinuitäten über Separation-Transfer-Physik-informierte Neuronale Netzwerke 通过分离-迁移物理信息神经网络求解具有多个间断的欧拉方程 2505.20361v1 -
1453 05-26 Future-Oriented Navigation: Dynamic Obstacle Avoidance with One-Shot Energy-Based Multimodal Motion Prediction Zukunftsorientierte Navigation: Dynamische Hindernisvermeidung mit einer heißen energiebasierten Multimodal-Bewegungsvorhersage 面向未来的导航:以单热能源为基础的多模式动力预测,动态障碍避免动态障碍 2505.00237v2 -
1454 05-26 HRP: High-Rank Preheating for Superior LoRA Initialization HRP: High-Rank-Vorwärmung für überlegene LoRA-Initialisierung HRP: 用于更优LoRA初始化的高秩预热 2502.07739v3 -
1455 05-26 Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments Mosaic: Datenfreies Wissen Destillieren über Mixture-of-Experts für Heterogene verteilte Umgebungen Mosaic:通过混合专家进行无数据知识蒸馏,促进异基因分布式环境 2505.19699v1 -
1456 05-26 Graph Guided Diffusion: Unified Guidance for Conditional Graph Generation Graph Guided Diffusion: Unified Guidance for Conditional Graph Generation 向导扩散:有条件图形生成统一指南 2505.19685v1 -
1457 05-26 CauSkelNet: Causal Representation Learning for Human Behaviour Analysis CauSkelNet: Kausales Repräsentationslernen für die menschliche Verhaltensanalyse CauSkelNet: 人类行为分析的因果关系学习 2409.15564v3 -
1458 05-26 Deep Actor-Critics with Tight Risk Certificates Deep Actor-Critics mit engen Risikozertifikaten 具有严格风险证书的深行为者-批评者 2505.19682v1 -
1459 05-26 Cut out and Replay: A Simple yet Versatile Strategy for Multi-Label Online Continual Learning Cut out und Replay: Eine einfache, aber vielseitige Strategie für Multi-Label Online Continual Learning 剪切和重放:一个简单但通俗易懂的多标签在线持续学习战略 2505.19680v1 -
1460 05-26 Optimal Multi-Fidelity Best-Arm Identification Optimale Multi-Fidelity Best-Arm-Identifikation 最佳最佳多纤维最佳武器标识 2406.03033v2 -
1461 05-26 Bridging Privacy and Robustness for Trustworthy Machine Learning Überbrückung von Privatsphäre und Robustheit für vertrauenswürdiges maschinelles Lernen 连接隐私和强力,促进可信赖的机器学习 2403.16591v4 -
1462 05-26 Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling Zero-Shot-Streaming-Text zur Sprachsynthese mit Transducer und Auto-Regressive Modellierung 零热流文本,用于带有传感器和自动递减建模的语音合成 2505.19669v1 -
1463 05-26 GTR: Graph-Table-RAG for Cross-Table Question Answering GTR: Graph-Table-RAG für Cross-Table-Frageantworten GTR:用于跨表问题解答的图表表-RAG 2504.01346v3 -
1464 05-26 Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning Mehrbildbeschreibungen für mehrsprachige, leichte Kognitive Impairment-Erkennung durch kontrastives Lernen enthüllen 通过差异学习发现多语种轻视认知缺陷的单形多语种描述 2505.17067v2 -
1465 05-26 Best-Arm Identification in Unimodal Bandits Best-Arm-Identifikation in unimodalen Banditen 统一强盗中的最佳武器识别 2411.01898v2 -
1466 05-26 MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE MoESD: Spekulatives Decoding-Potential zur Beschleunigung von Sparse MoE enthüllen MoESD: 揭示推测解码加速稀疏MoE的潜力 2505.19645v1 -
1467 05-26 Navigating Conflicting Views: Harnessing Trust for Learning Navigieren gegensätzlicher Ansichten: Vertrauen fürs Lernen gewinnen 引导冲突观点:利用信任学习 2406.00958v3 -
1468 05-26 When fractional quasi p-norms concentrate Wenn sich fraktionale Quasi-p-Normen konzentrieren 当分数阶拟p范数集中时 2505.19635v1 -
1469 05-26 Decoupling Spatio-Temporal Prediction: When Lightweight Large Models Meet Adaptive Hypergraphs Entkoppelung Spatio-Temporale Vorhersage: Wenn leichte große Modelle adaptive Hypergraphen treffen 解耦时空预测:当轻量级大模型遇上自适应超图 2505.19620v1 -
1470 05-26 SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows SESaMo: Symmetrie-verstärkende stochastische Modulation für normalisierende Strömungen SESaMo: 正常流动的对称性-强化斯托调动 2505.19619v1 -
1471 05-26 When the Left Foot Leads to the Right Path: Bridging Initial Prejudice and Trainability Wenn der linke Fuß auf den rechten Weg führt: Überbrückung von anfänglichen Vorurteilen und Trainingsfähigkeit 当左脚引向右路时:弥合最初的偏见和可训练性 2505.12096v2 -
1472 05-26 Learning and Interpreting Gravitational-Wave Features from CNNs with a Random Forest Approach Lernen und Interpretieren von Gravitationswellen-Merkmalen aus CNNs mit einem Random-Forest-Ansatz 使用随机森林方法从CNN中学习和解释引力波特征 2505.20357v1 -
1473 05-26 Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models Diagnostizieren und Abmildern von Modalitätsstörungen in multimodalen großen Sprachmodellen 多式联运大语言模型中的诊断和减缓模式干预 2505.19616v1 -
1474 05-26 Multiplicity is an Inevitable and Inherent Challenge in Multimodal Learning Vielfältigkeit ist eine unvermeidliche und inhärente Herausforderung im multimodalen Lernen 多重性是多模式学习中不可避免和内在的挑战。 2505.19614v1 -
1475 05-26 Skrull: Towards Efficient Long Context Fine-tuning through Dynamic Data Scheduling Skrull: Auf dem Weg zu einem effizienten langen Kontext Feinabstimmung durch Dynamic Data Scheduling Skrull:通过动态数据安排,实现高效长处微调 2505.19609v1 -
1476 05-26 Energy-based Preference Optimization for Test-time Adaptation Energiebasierte Preference-Optimierung für die Testzeitanpassung 以能源为基础的试验时间适应最佳应用 2505.19607v1 -
1477 05-26 Kuramoto-FedAvg: Using Synchronization Dynamics to Improve Federated Learning Optimization under Statistical Heterogeneity Kuramoto-FedAvg: Synchronisationsdynamik zur Verbesserung der Federated Learning Optimization unter statistischer Heterogenität Kuramoto-FedAvg:利用同步动态改善统计多样性下的联邦学习优化 2505.19605v1 -
1478 05-26 Evaluating Machine Translation Models for English-Hindi Language Pairs: A Comparative Analysis Machine Translation Models für Englisch-Hindi Sprachpaare bewerten: Eine vergleichende Analyse 英语-印地语语言对的机器翻译模型评估:比较分析 2505.19604v1 -
1479 05-26 Distributional Reinforcement Learning with Dual Expectile-Quantile Regression Verstärktes Lernen mit Dual Expectile-Quantile Regression 双预期量递减分布强化学习 2305.16877v4 -
1480 05-26 Rep3D: Re-parameterize Large 3D Kernels with Low-Rank Receptive Modeling for Medical Imaging Rep3D: Große 3D-Kernel mit Low-Rank-Empfangsmodellierung für die medizinische Bildgebung neu parametrieren Rep3D: 医疗成像低射感应模型的大型 3D 内核再修复 2505.19603v1 -
1481 05-26 Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression Speichereffiziente visuelle Autoregressive Modellierung mit Scale-Aware-KV-Cache-Kompression KV缓存压缩的内存有效视觉自动递减模型 2505.19602v1 -
1482 05-26 Preference Optimization by Estimating the Ratio of the Data Distribution Präferenzoptimierung durch Schätzung des Verhältnisses der Datenverteilung 通过估计数据分配比率实现最佳优化 2505.19601v1 -
1483 05-26 Inconsistent Tokenizations Cause Language Models to be Perplexed by Japanese Grammar Inkonsistente Tokenisierungen führen dazu, dass Sprachmodelle von japanischer Grammatik verblüfft werden. 前后不一致的招数导致语言模式被日语语法所混淆 2505.19599v1 -
1484 05-26 Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs Residual Connections und Normalisierung können Oversmoothing in GNNs nachweislich verhindern 残差连接和归一化可证明地防止GNN中的过平滑 2406.02997v3 -
1485 05-26 How Well Can Differential Privacy Be Audited in One Run? Wie gut kann die Privatsphäre in einem einzigen Lauf überprüft werden? 如何在单一运行中对差异隐私进行审计? 2503.07199v2 -
1486 05-26 Learning to Reason without External Rewards Vernunft lernen ohne externe Belohnungen 学习没有外部奖励的理性 2505.19590v1 -
1487 05-26 WQLCP: Weighted Adaptive Conformal Prediction for Robust Uncertainty Quantification Under Distribution Shifts WQLCP: Gewichtete adaptive konforme Vorhersage für robuste Unsicherheit Quantifizierung unter Verteilungsverschiebungen WQLCP: 分配变化下强势不确定性量化的加权适应性统一预测 2505.19587v1 -
1488 05-26 Accelerating Prefilling for Long-Context LLMs via Sparse Pattern Sharing Beschleunigung der Vorfüllung für Langkontext-LLMs über Sparse Pattern Sharing 通过 Sparse 模式共享加速预填长文本 LLMs 2505.19578v1 -
1489 05-26 GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning GraLoRA: Granulare Low-Rank-Anpassung für parametereffizientes Feintuning GraLoRA: 用于参数高效微调的细粒度低秩适应 2505.20355v1 -
1490 05-26 Situationally-Aware Dynamics Learning Situational-Aware Dynamics Learning 情况认知动态学习 2505.19574v1 -
1491 05-26 Truncated Kernel Stochastic Gradient Descent on Spheres Trunkierter Kernel-stochastischer Gradientenabstieg auf Sphären 球面上的截断核随机梯度下降 2410.01570v5 -
1492 05-26 MSD-LLM: Predicting Ship Detention in Port State Control Inspections with Large Language Model MSD-LLM: Schiffshaft in Hafenstaatkontrolle mit großem Sprachmodell vorhersagen MSD-LLM:用大语言模型预测港口国控制检查中船舶扣留情况 2505.19568v1 -
1493 05-26 BackSlash: Rate Constrained Optimized Training of Large Language Models BackSlash: Rate Constrained Optimized Training of Large Language Models BackSlash: 大语言模型的速率约束优化训练 2504.16968v3 -
1494 05-26 Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams Lego Sketch: Ein skalierbares speichererweitertes neurales Netzwerk für das Sketching von Datenströmen Lego Sketch: 一个可扩展的记忆增强神经网络,用于数据流草图 2505.19561v1 -
1495 05-26 EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding EuroCon: Benchmarking Parlament Beratung für politische Konsensfindung EuroCon:确定议会审议政治共识结果的基准 2505.19558v1 -
1496 05-26 Aligning Multiclass Neural Network Classifier Criterion with Task Performance Metrics Ausrichten von Multiclass Neural Network Klassifikator Kriterium mit Task Performance Metrics 将多等神经网络分类标准与任务性性能计量对齐 2405.20954v2 -
1497 05-26 On scalable and efficient training of diffusion samplers Zur skalierbaren und effizienten Schulung von Diffusionssammlern 对推广采样员进行可推广和高效率的培训 2505.19552v1 -
1498 05-26 Unlocking the Power of Diffusion Models in Sequential Recommendation: A Simple and Effective Approach Entsperren der Macht von Diffusionsmodellen in der sequentiellen Empfehlung: Ein einfacher und effektiver Ansatz 在 “ 序列建议:简单而有效办法 “ 中解锁扩散模型扩散能力 2505.19544v1 -
1499 05-26 Cuff-KT: Tackling Learners’ Real-time Learning Pattern Adjustment via Tuning-Free Knowledge State Guided Model Updating Cuff-KT: Anpassung von Lernmustern in Echtzeit durch Tuning-Free Knowledge State Guided Model Aktualisieren Cuff-KT:通过免调优的知识状态引导模型更新,应对学习者实时学习模式调整 2505.19543v1 -
1500 05-26 FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation FastCache: Schnelles Caching für Diffusionstransformator durch erlernbare lineare Annäherung FastCache: 通过可学习的线性近似为扩散Transformer提供快速缓存 2505.20353v1 -
1501 05-26 R3: Robust Rubric-Agnostic Reward Models R3: Robuste Rubric-Agnostische Belohnungsmodelle R3:坚固的Rubric-不可知奖赏模型 2505.13388v2 -
1502 05-26 Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs Amulet: Neuausrichtung während der Testzeit für personalisierte Präferenzanpassung von LLMs Amulet:在测试时重新对齐,以实现LLM的个性化偏好适应 2502.19148v2 -
1503 05-26 CITRAS: Covariate-Informed Transformer for Time Series Forecasting CITRAS: Kovariat-informierter Transformer für die Zeitreihenprognose CITRAS: 用于时间序列预测的共变-内建变换器 2503.24007v2 -
1504 05-26 Continuous-Time Analysis of Heavy Ball Momentum in Min-Max Games Zeitkontinuierliche Analyse des Heavy-Ball-Momentums in Min-Max-Spielen 极小极大博弈中重球动量的连续时间分析 2505.19537v1 -
1505 05-26 Training-Free Multi-Step Audio Source Separation Schulungsfreie Mehrstufen-Audio-Quellentrennung 无培训的多步骤多步骤音频来源分离 2505.19534v1 -
1506 05-26 ExAnte: A Benchmark for Ex-Ante Inference in Large Language Models ExAnte: Ein Benchmark für Ex-Ante-Schlussfolgerungen in großen Sprachmodellen ExAnte:大语言模型前推定基准 2505.19533v1 -
1507 05-26 Fox in the Henhouse: Supply-Chain Backdoor Attacks Against Reinforcement Learning Fuchs im Hühnerstall: Supply-Chain-Backdoor-Angriffe gegen Verstärkungslernen 鸡舍里的狐狸:针对强化学习的供应链后门攻击 2505.19532v1 -
1508 05-26 Minimalist Softmax Attention Provably Learns Constrained Boolean Functions Minimalistische Softmax-Attention lernt nachweislich eingeschränkte Boolean-Funktionen 极简Softmax注意力可证明地学习受约束的布尔函数 2505.19531v1 -
1509 05-26 SLOT: Sample-specific Language Model Optimization at Test-time SLOT: Beispielspezifische Sprachmodelloptimierung zur Testzeit SLOT: 测试时针对特定样本的语言模型优化 2505.12392v2 -
1510 05-26 Navigating loss manifolds via rigid body dynamics: A promising avenue for robustness and generalisation Navigieren von Verlustkrümmern über starre Körperdynamik: Ein vielversprechender Weg für Robustheit und Verallgemeinerung 通过僵硬体体体动态来控制损失方块:加强和普及的有希望的途径 2505.19527v1 -
1511 05-26 Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate Rethinking Gating Mechanism in Sparse MoE: Arbiträre Modalitätsinputs mit vertrauensgeführtem Tor bearbeiten 微粒MOE中的重新思考定位机制:用信任引导门处理任意模式投入 2505.19525v1 -
1512 05-26 Semi-Supervised Model-Free Bayesian State Estimation from Compressed Measurements Halbüberwachte modellfreie bayesische Staatsschätzung aus komprimierten Messungen 根据压缩计量法对贝耶斯州无模式模型的半有效估算 2407.07368v5 -
1513 05-26 Applications and Effect Evaluation of Generative Adversarial Networks in Semi-Supervised Learning Anwendungen und Wirkungsbewertung generativer adversarialer Netzwerke im semi-überwachten Lernen 半监测学习中产生反效果网络的应用和效果评价 2505.19522v1 -
1514 05-26 Learning Dynamics under Environmental Constraints via Measurement-Induced Bundle Structures Dynamisches Lernen unter Umweltauflagen durch messinduzierte Bundle-Strukturen 通过衡量产生的捆绑结构,在环境制约因素下学习动力 2505.19521v1 -
1515 05-26 SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback SIPDO: Closed-Loop Prompt Optimierung über Synthetic Data Feedback SIPDO:通过合成数据反馈,通过闭闭电话快速优化 2505.19514v1 -
1516 05-26 Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models Benchmarking multimodaler Wissenskonflikt für große multimodale Modelle 确定大型多式联运模式多模式知识冲突基准 2505.19509v1 -
1517 05-26 Multimodal Machine Translation with Visual Scene Graph Pruning Multimodale maschinelle Übersetzung mit visuellen Szenendiagrammen 带有视觉场景图的多式机器翻译 2505.19507v1 -
1518 05-26 Understanding Why Large Language Models Can Be Ineffective in Time Series Analysis: The Impact of Modality Alignment Verständnis, warum große Sprachmodelle in der Zeitreihenanalyse unwirksam sein können: Die Auswirkungen der Modalitätsausrichtung 理解为何大语言模型在时间序列分析中无效:方式调整的影响 2410.12326v2 -
1519 05-26 DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation DOGe: Defensive Output Generation für LLM-Schutz vor Wissensdestillation DOGe: 防知识蒸馏保护LLM的防御性产出产生 2505.19504v1 -
1520 05-26 Differentially private ratio statistics Statistiken über unterschiedliche private Verhältnisse 差异性私人比率统计 2505.20351v1 -
1521 05-26 Learning for Dynamic Combinatorial Optimization without Training Data Lernen für dynamische kombinatorische Optimierung ohne Trainingsdaten 没有培训数据的动态组合优化学习 2505.19497v1 -
1522 05-26 MetaSTNet: Multimodal Meta-learning for Cellular Traffic Conformal Prediction MetaSTNet: Multimodales Meta-Learning für zellulären Verkehr Konforme Vorhersage MetaSTNet: 细胞交通预测的多模式元学习 2505.21553v1 -
1523 05-26 Discounted Online Convex Optimization: Uniform Regret Across a Continuous Interval Discounted Online Convex-Optimierung: Einheitlicher Bedauern über einen kontinuierlichen Intervall 贴现的在线 Convex 优化: 连续间隔的统一遗憾 2505.19491v1 -
1524 05-26 Understanding Transformer from the Perspective of Associative Memory Transformer aus der Perspektive des assoziativen Gedächtnisses verstehen 从共同记忆的角度理解变异器 2505.19488v1 -
1525 05-26 VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning VLMLight: Verkehrssignalsteuerung über Vision-Language Meta-Control und Dual-Branch-Reasoning VLMLight:通过视觉语言、超控制和双层理由解释控制交通信号控制 2505.19486v1 -
1526 05-26 Understanding the learned look-ahead behavior of chess neural networks Das gelernte Look-Ahead-Verhalten von neuronalen Schachnetzwerken verstehen 了解国际象棋神经网络所学的直视行为 2505.21552v1 -
1527 05-26 Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs Gewinnen Sie schnell oder verlieren Sie langsam: Ausgleichende Geschwindigkeit und Genauigkeit in Latenz-Sensitive Entscheidungen von LLMs 慢赢或慢输:LLMs的延缓敏感决定中平衡速度和准确性 2505.19481v1 -
1528 05-26 Revolutionizing Wildfire Detection with Convolutional Neural Networks: A VGG16 Model Approach Revolutionierung der Wildfire-Detektion mit konvolutionären neuralen Netzwerken: Ein VGG16-Modellansatz 与革命神经神经网络一起革命性野火探测革命:VGG16示范方法 2505.19479v1 -
1529 05-26 Weighted quantization using MMD: From mean field to mean shift via gradient flows Gewichtete Quantisierung mit MMD: Vom mittleren Feld zur mittleren Verschiebung über Gradientenströme 使用 MMD 加权量化: 从平均字段到通过梯度流转移 2502.10600v2 -
1530 05-26 Information-theoretic Generalization Analysis for VQ-VAEs: A Role of Latent Variables Informationstheoretische Generalisierungsanalyse für VQ-VAEs: Eine Rolle latenter Variablen VQ-VAEs 信息理论概括分析:隐性变量的作用 2505.19470v1 -
1531 05-26 Diversity-Driven Generative Dataset Distillation Based on Diffusion Model with Self-Adaptive Memory Diversity-getriebene Generative Datensatzdestillation basierend auf Diffusionsmodell mit selbstadaptivem Speicher 基于带有自适应内存的传播模型的传播模型的多样化生成数据集蒸馏 2505.19469v1 -
1532 05-26 Parrot: Multilingual Visual Instruction Tuning Papagei: Mehrsprachige visuelle Anleitung Parrot: 多语言视觉教学图示 2406.02539v3 -
1533 05-26 Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin Auf dem Weg zum Ende der Ausbildung zur automatischen Spracherkennung für nigerianische Pidgin 走向尼日利亚皮吉纳自动语音识别的端至端培训 2010.11123v2 -
1534 05-26 Decision Flow Policy Optimization Optimierung der Entscheidungsflusspolitik 优化决策流程政策 2505.20350v1 -
1535 05-26 Origin Tracer: A Method for Detecting LoRA Fine-Tuning Origins in LLMs Herkunfts-Tracer: Eine Methode zur Erkennung von LoRA-Feinabstimmungs-Ursprungen in LLMs 来源追踪器:用LLMM探测LORA精导来源的方法 2505.19466v1 -
1536 05-26 Residual Cross-Attention Transformer-Based Multi-User CSI Feedback with Deep Joint Source-Channel Coding Residual Cross-Attention Transformer-basierte Multi-User CSI Feedback mit Deep Joint Source-Channel Coding CSI 与深源-源-汇联合编码的反馈 2505.19465v1 -
1537 05-26 Your Classifier Can Do More: Towards Bridging the Gaps in Classification, Robustness, and Generation Ihr Klassifikator kann mehr: Auf dem Weg zur Überbrückung der Lücken in Klassifizierung, Robustheit und Generation 您的分类员可以做更多的事情: 缩小分类、强健和代际差距 2505.19459v1 -
1538 05-26 Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians Recurrent Self-Attention Dynamics: Eine energie-agnostische Perspektive von Jacobians 《自我注意动态:雅各布人对能源不可知的视角》 2505.19458v1 -
1539 05-26 MM-Prompt: Cross-Modal Prompt Tuning for Continual Visual Question Answering MM-Prompt: Cross-Modal Prompt Tuning zur kontinuierlichen visuellen Fragestellung MM-Prompt: 用于持续视觉问答的跨模式快速测试 2505.19455v1 -
1540 05-26 MetaGMT: Improving Actionable Interpretability of Graph Multilinear Networks via Meta-Learning Filtration MetaGMT: Verbesserung der handlungsrelevanten Interpretierbarkeit von Graph-Multilinearen Netzwerken durch Meta-Learning-Filtration MetaGMT:通过元学习过滤改进图多线性网络的可操作可解释性 2505.19445v1 -
1541 05-26 Discovering Forbidden Topics in Language Models Verbotene Themen in Sprachmodellen entdecken 发现语言模型中的禁止专题 2505.17441v2 -
1542 05-26 MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding MORE-Brain:可解释和可通用跨主题FMRI视觉解码专家有条不紊混合 2505.15946v2 -
1543 05-26 RDI: An adversarial robustness evaluation metric for deep neural networks based on model statistical features RDI: Eine gegnerische Robustheitsbewertungsmetrik für tiefe neuronale Netzwerke basierend auf modellstatistischen Merkmalen RDI:基于示范统计特征的深神经网络对抗性强力评价标准 2504.18556v2 -
1544 05-26 Fairness Practices in Industry: A Case Study in Machine Learning Teams Building Recommender Systems Fairness Practices in der Industrie: Eine Fallstudie in Machine Learning Teams Bau von Recommender Systemen 工业公平做法:机械学习小组建立建议系统个案研究 2505.19441v1 -
1545 05-26 The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models Die Geburt des Wissens: Emergente Funktionen über Zeit, Raum und Maßstab in großen Sprachmodellen 知识的诞生:跨越时间、空间和大语言模型规模的新兴特征 2505.19440v1 -
1546 05-26 Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression Können komprimierte LLMs wirklich handeln? Eine empirische Bewertung der agentischen Fähigkeiten bei der LLM-Kompression 压缩后的LLM真的能行动吗?对LLM压缩中代理能力的实证评估 2505.19433v1 -
1547 05-26 Advanced long-term earth system forecasting by learning the small-scale nature Fortschrittliche Langzeitprognosen des Erdsystems durch Erlernen der kleinmaßstäblichen Natur 学习小规模性质,进行高级长期地球系统预测 2505.19432v1 -
1548 05-26 Importance Weighted Score Matching for Diffusion Samplers with Enhanced Mode Coverage Bedeutung Gewichteter Score passend für Diffusion Sampler mit erweiterten Modus Abdeckung 具有强化模式覆盖率的传播采样器比对重要加权分数 2505.19431v1 -
1549 05-26 MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision MAS-ZERO: Konzipieren von Multi-Agenten-Systemen mit Zero Supervision MAS-ZERO: 设计无监督的多机构系统 2505.14996v2 -
1550 05-26 WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference WINA: Gewichtsinformierte Neuronen-Aktivierung zur Beschleunigung der Large Language Model Inferenz WINA: 用于加速大语言模型推理的权重感知神经元激活 2505.19427v1 -
1551 05-26 The Role of Diversity in In-Context Learning for Large Language Models Die Rolle der Vielfalt im In-Context-Lernen für große Sprachmodelle 多样性在为大语言模式进行内文学习方面的作用 2505.19426v1 -
1552 05-26 Structure Disruption: Subverting Malicious Diffusion-Based Inpainting via Self-Attention Query Perturbation Strukturstörung: Verringern von bösartiger Diffusions-basierter Inpainting durch Selbstaufmerksamkeit Abfrage Störung 结构混乱:通过自控查询干扰来改变恶意扩散的涂漆 2505.19425v1 -
1553 05-26 Each Graph is a New Language: Graph Learning with LLMs Jeder Graph ist eine neue Sprache: Graph Learning mit LLMs 每图都是一种新语言:用LLMM学习图表 2501.11478v3 -
1554 05-26 Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift Jetzt richtig, damals falsch: Nicht-stationäre Direktpräferenz-Optimierung unter Präferenzdrift 现在正确,当时错误:偏好漂移下的非平稳直接偏好优化 2407.18676v2 -
1555 05-26 SaVe-TAG: Semantic-aware Vicinal Risk Minimization for Long-Tailed Text-Attributed Graphs SaVe-TAG: Semantisch-bewusst Vicinal Risk Minimierung für langgestreckte Text-Attribute Graphen SaVe-TAG: 长途脱轨文本可归图解析相邻风险最小化 2410.16882v3 -
1556 05-26 Strictly Constrained Generative Modeling via Split Augmented Langevin Sampling Streng eingeschränkte generative Modellierung über Split Augmented Langevin Sampling 通过分分扩大Langevin抽样进行严格约束的生成模型模拟 2505.18017v2 -
1557 05-26 Toward Physics-Informed Machine Learning for Data Center Operations: A Tropical Case Study Auf dem Weg zum physikinformierten maschinellen Lernen für Rechenzentrumsoperationen: Eine Tropische Fallstudie 争取为数据中心业务进行物理一体化机械学习:热带案例研究 2505.19414v1 -
1558 05-26 Future Link Prediction Without Memory or Aggregation Zukünftige Link-Vorhersage ohne Gedächtnis oder Aggregation 没有记忆或聚合的未来联系预测 2505.19408v1 -
1559 05-26 FedHERO: A Federated Learning Approach for Node Classification Task on Heterophilic Graphs FedHERO: Ein Federated Learning Approach für Knotenklassifikation Aufgaben auf heterophilen Graphen FedHERO: 异配图节点分类任务的联邦学习方法 2504.21206v2 -
1560 05-26 Exploring the Possibility of TypiClust for Low-Budget Federated Active Learning Erforschung der Möglichkeit des TypiClusts für budgetarmes, föderiertes aktives Lernen 探讨低预算联邦积极学习的TypiClust 2505.19404v1 -
1561 05-26 KHRONOS: a Kernel-Based Neural Architecture for Rapid, Resource-Efficient Scientific Computation KHRONOS: Eine Kernel-basierte Neuralarchitektur für schnelle, ressourceneffiziente wissenschaftliche Berechnung KHRONOS:一个以核心为基础的神经结构,用于快速、资源高效科学计算 2505.13315v2 -
1562 05-26 Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMs Können LLMs helfen, Erkenntnisse über LLMs zu enthüllen? Eine groß angelegte, sich entwickelnde Literaturanalyse von Frontier LLMs LLMs 帮助发现关于LLM的见识? 大型、不断发展的前沿LMS文学分析 2502.18791v3 -
1563 05-26 Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent Auf dem Weg zum Verständnis der Verallgemeinerbarkeit des verzögerten stochastischen Absinkens 了解拖延的拖延的逐步后世后代的普遍适用性 2308.09430v4 -
1564 05-26 Are Time-Series Foundation Models Deployment-Ready? A Systematic Study of Adversarial Robustness Across Domains Sind Time-Series-Stiftungsmodelle bereit? Eine systematische Studie über die widerrechtliche Robustheit über Domains hinweg 时间-系列基金会的模型是部署-准备模型吗? 2505.19397v1 -
1565 05-26 Uniform convergence of the smooth calibration error and its relationship with functional gradient Einheitliche Konvergenz des glatten Kalibrierfehlers und seines Verhältnisses mit dem funktionellen Gradienten 平稳校准误差及其与功能梯度的关系统一汇合 2505.19396v1 -
1566 05-26 Towards the Causal Complete Cause of Multi-Modal Representation Learning Auf dem Weg zur kausalen vollständigen Ursache des multi-Modalen Repräsentationslernens 走向多模式代表制学习的事业完全原因 2407.14058v6 -
1567 05-26 Alignment of large language models with constrained learning Ausrichtung großer Sprachmodelle mit eingeschränktem Lernen 大型语言模式与限制学习的结合 2505.19387v1 -
1568 05-26 JingFang: An Expert-Level Large Language Model for Traditional Chinese Medicine Clinical Consultation and Syndrome Differentiation-Based Treatment JingFang: Ein sachverständiges Sprachmodell für die traditionelle chinesische Medizin Klinische Beratung und Syndromdifferenzierungsbasierte Behandlung JingFang:中国传统医学临床咨询和综合症差别治疗专家级大语言模式 2502.04345v2 -
1569 05-26 Unsupervised Anomaly Detection Using Diffusion Trend Analysis for Display Inspection Unüberwachte Anomalieerkennung mit Diffusion Trendanalyse für Display-Inspektion 用于显示检查的利用扩散趋势分析进行无监督异常探测 2407.09578v2 -
1570 05-25 (7) SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning SALSA-RL: Stabilitätsanalyse im Latent Space of Actions zur Stärkung des Lernens SALSA-RL:加强学习行动空间的稳定分析 2502.15512v2 -
1571 05-25 Foundations of Top-$k$ Decoding For Language Models Grundlagen von Top-$k$ Dekodierung für Sprachmodelle 语言模式最高价基数 2505.19371v1 -
1572 05-25 SETransformer: A Hybrid Attention-Based Architecture for Robust Human Activity Recognition SETransformer: Eine hybride, auf Aufmerksamkeit basierende Architektur für robuste menschliche Aktivitätserkennung 转型:以关注为基础的混合结构,以确认强有力的人类活动 2505.19369v1 -
1573 05-25 One Step Diffusion via Shortcut Models Ein Schritt Diffusion über Shortcut-Modelle 通过快捷键模型进行单步扩散 2410.12557v2 -
1574 05-25 Adaptive Diffusion Guidance via Stochastic Optimal Control Adaptive Diffusionsführung über stochastische Optimale Kontrolle 通过斯托卡优化控制进行适应性扩散指导 2505.19367v1 -
1575 05-25 FD-Bench: A Modular and Fair Benchmark for Data-driven Fluid Simulation FD-Bench: Modularer und fairer Benchmark für datengetriebene Fluidsimulation FD-时区:数据驱动流流模拟模块化公平基准 2505.20349v1 -
1576 05-25 Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments Konsistenzbasierte abduktive Begründung über Wahrnehmungsfehler mehrerer vortrainierter Modelle in neuartigen Umgebungen 创新环境中多个未受过培训的多种模式的认知错误的基于一致性的直截力理由 2505.19361v1 -
1577 05-25 Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval Optimierte Text-Embedding-Modelle und Benchmarks für die Amharische Passage Retrieval 阿姆光通过通过检索的最佳文本嵌入模型和基准 2505.19356v1 -
1578 05-25 FlashMD: long-stride, universal prediction of molecular dynamics FlashMD: Langstride, universelle Vorhersage der molekularen Dynamik FlashMD:长途、全方位预测分子动态 2505.19350v1 -
1579 05-25 Communication-Efficient Multi-Device Inference Acceleration for Transformer Models Kommunikationseffiziente Multi-Device-Inferenzbeschleunigung für Transformer-Modelle 变换模型的通信效率高多变量推推加速 2505.19342v1 -
1580 05-25 Flow Q-Learning Fluss Q-Lernen 流动学习 2502.02538v2 -
1581 05-25 Improving Compositional Generation with Diffusion Models Using Lift Scores Verbesserung der kompositorischen Generierung mit Diffusionsmodellen mit Lift-Scores 利用使用提升分数的传播模型改善组成型 2505.13740v2 -
1582 05-25 TRANSIT your events into a new mass: Fast background interpolation for weakly-supervised anomaly searches Übertragen Sie Ihre Ereignisse in eine neue Masse: Schnelle Hintergrundinterpolation für schwach überwachte Anomaliensuche 将您的事件转换成一个新的质量: 快速背景内插, 用于受微弱监督的异常搜索 2503.04342v2 -
1583 05-25 WhisperD: Dementia Speech Recognition and Filler Word Detection with Whisper WhisperD: Dementia Spracherkennung und Filler-Worterkennung mit Whisper 耳语:痴呆症言语识别和用耳语探测填字词 2505.21551v1 -
1584 05-25 Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales Likert oder nicht: Absolute LLM-Relevanzurteile auf feinkörnigen Ordinalskalen Likert 与否:LLM 在细粒度序数量表上的绝对相关性判断 2505.19334v1 -
1585 05-25 Bayesian Comparisons Between Representations Bayesische Vergleiche zwischen Repräsentationen 代表之间的贝叶比较 2411.08739v3 -
1586 05-25 Paying Alignment Tax with Contrastive Learning Steuern mit kontraproduktivem Lernen ausgleichen 与反向学习支付一致税 2505.19327v1 -
1587 05-25 An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces Eine Adversarial Analyse von Thompson Sampling für Full-Information Online-Lernen: von Finite zu Unendlichen Aktionsräumen 对Thompson网上全面信息学习抽样分析:从有限到无限行动空间 2502.14790v4 -
1588 05-25 Regress, Don’t Guess – A Regression-like Loss on Number Tokens for Language Models Regress, nicht raten – Ein Rückschritt-ähnlicher Verlust an Zahlenzeichen für Sprachmodelle Regress, don’t guess - 语言模型数字调的回归式损失 2411.02083v2 -
1589 05-25 PIGPVAE: Physics-Informed Gaussian Process Variational Autoencoders PIGPVAE: Physik-informierte Gauß-Prozessvariationelle Autoencoder PIGPVAE: 物理化高斯进程变异自动编码器 2505.19320v1 -
1590 05-25 Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data? Sind Transformer durch die Verbindung getrennter Kenntnisse in Trainingsdaten in der Lage, Vernunft zu erreichen? 将培训数据方面的单独知识连接起来的变换者是否具有理性? 2501.15857v6 -
1591 05-25 Effort-aware Fairness: Incorporating a Philosophy-informed, Human-centered Notion of Effort into Algorithmic Fairness Metrics Effort-aware Fairness: Aufnahme einer philosophisch-informierten, menschlich-zentrierten Nennung von Effort in algorithmische Fairness-Metriken 努力做到公平:将了解哲学、以人为中心的努力理念纳入到算法公平度量中 2505.19317v1 -
1592 05-25 Demand Selection for VRP with Emission Quota Auswahl der Nachfrage nach VRP mit Emissionsquoten 具有排放配额的VRP需求选择 2505.19315v1 -
1593 05-25 Concept Reachability in Diffusion Models: Beyond Dataset Constraints Konzept-Erreichbarkeit in Diffusions-Modellen: Jenseits von Datensatzbeschränkungen 传播模型中可达到的概念:超越数据集的制约 2505.19313v1 -
1594 05-25 Stochastic Hessian Fittings with Lie Groups Stochastische hessische Beschläge mit Lie Groups 配有谎言组的假体装配机 2402.11858v5 -
1595 05-25 Fractional-Boundary-Regularized Deep Galerkin Method for Variational Inequalities in Mixed Optimal Stopping and Control Fraktional-Boundary-Regularized Deep Galerkin-Methode für unterschiedliche Ungleichheiten in gemischten Optimalen Stoppen und Steuern 用于混合最佳制止和控制中差异性不平等的 分数-界分- 常规深加热法 2505.19309v1 -
1596 05-25 From Single Images to Motion Policies via Video-Generation Environment Representations Von Einzelbildern zu Motion Policies über Video-Generation Umweltvertretungen 从单一图像到通过视频环境代表从单一图像到运动政策 2505.19306v1 -
1597 05-25 Time Series Embedding Methods for Classification Tasks: A Review Zeitreihen Einbetten von Methoden für die Klassifizierung Aufgaben: Eine Überprüfung 分类任务所含方法:审查 2501.13392v2 -
1598 05-25 LLM-Based Emulation of the Radio Resource Control Layer: Towards AI-Native RAN Protocols LLM-basierte Emulation der Funkressourcenkontrollschicht: Auf dem Weg zu KI-Native RAN-Protokollen 基于LLM的无线电资源控制层仿真:迈向AI原生RAN协议 2505.16821v2 -
1599 05-25 On the status of current quantum machine learning software Zum Status der aktuellen Quantenmaschinen-Lernsoftware 关于当前量子机器学习软件现状 2503.08962v2 -
1600 05-25 100-LongBench: Are de facto Long-Context Benchmarks Literally Evaluating Long-Context Ability? 100-LongBench: Sind de facto Long-Context-Benchmarks wortwörtlich die Lang-Context-Fähigkeit zu bewerten? 100-LongBench:事实上的长文本基准是否实际评价长文本能力? 2505.19293v1 -
1601 05-25 Hypercube-RAG: Hypercube-Based Retrieval-Augmented Generation for In-domain Scientific Question-Answering Hypercube-RAG: Hypercube-based Retrieval-Augmented Generation for In-domain Scientific Question-Answering Hypercube-RAG: 内地科学问题解答的超立方体回收回溯性养代 2505.19288v1 -
1602 05-25 Provably Overwhelming Transformer Models with Designed Inputs Wahrscheinlich überwältigende Transformer-Modelle mit designten Eingängen 具有设计投入的、可预见地压得压得压倒的变压器模型 2502.06038v2 -
1603 05-25 A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning Eine Momentaufnahme des Einflusses: Ein lokales Daten-Attributions-Framework für Online-Verstärkungs-Lernen 《影响概览:在线强化学习地方数据归属框架》 2505.19281v1 -
1604 05-25 Optimal Transport Barycenter via Nonconvex-Concave Minimax Optimization Optimaler Transport Barycenter über Nonconvex-Concave Minimax-Optimierung 通过非凸-凹极小极大优化求解最优传输重心 2501.14635v2 -
1605 05-25 Achieving $\tilde{\mathcal{O}}(1/N)$ Optimality Gap in Restless Bandits through Gaussian Approximation Erreichen von $\tilde{\mathcal{O}}(1/N)$ Optimality Gap in ruhelosen Banditen durch Gaußsche Annäherung 通过高斯近似度实现无休止强盗的最佳差距 $\tilde{\mathcal{O}}(1/N)$ 2410.15003v2 -
1606 05-25 Cellular Traffic Prediction via Byzantine-robust Asynchronous Federated Learning Zelluläre Verkehrsvorhersage über byzantinisches-robustes Asynchrones Federated Learning 通过Byzantine-Robust 亚同步联谊会学习的细胞交通预测 2505.19263v1 -
1607 05-25 Towards a Spatiotemporal Fusion Approach to Precipitation Nowcasting Auf dem Weg zu einem Spatiotemporalen Fusionsansatz zur Niederschlagung von Nowcasting 迈向对降水即时播送采取相向时间融合办法 2505.19258v1 -
1608 05-25 Learning-Augmented Online Bipartite Fractional Matching Learning-Augmented Online Bipartite Fraktional Matching 学习增强的在线双两派人数配对 2505.19252v1 -
1609 05-25 Empirical Privacy Variance Empirische Datenschutzvarianz 隐私经验差异 2503.12314v2 -
1610 05-25 Improving Value Estimation Critically Enhances Vanilla Policy Gradient Verbesserung der Wertschätzung Kritisch verbessert Vanilla Policy Gradient 显著加强香草政策梯度 2505.19247v1 -
1611 05-25 To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers To CoT or To Loop? Ein formaler Vergleich zwischen Ketten-of-Thought und Schleiftransformatoren 尝试链和循环变换器之间的正式比较 2505.19245v1 -
1612 05-25 ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment ActiveDPO: Aktive Direktpräferenzoptimierung für eine stichprobeneffiziente Ausrichtung 主动式DPO:为抽样有效对齐积极直接首选优化 2505.19241v1 -
1613 05-25 CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling CLIP-UP: Ein einfaches und effizientes Mixture-of-Experts CLIP Training Rezept mit Sparse Upcycling CLIP-UP:一种简单高效的基于稀疏升级改造的专家混合CLIP训练方案 2502.00965v2 -
1614 05-25 LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models LLLMs: Eine datengestützte Untersuchung der sich entwickelnden Forschung über Grenzen großer Sprachmodelle LLLMs:关于大语言模式限制的不断发展的研究数据驱动调查 2505.19240v1 -
1615 05-25 Learning Transformer-based World Models with Contrastive Predictive Coding Transformer-basierte Weltmodelle mit kontradiktivem Predictive Coding lernen 以学习变换器为基础的世界差异预测编码模式 2503.04416v2 -
1616 05-25 Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees Effiziente Politikoptimierung in robusten, eingeschränkten MDPs mit Iterationskomplexitätsgarantien 在强力约束下,在具有迭接复杂度保障的多用途发展方案中提高政策效率的优化 2505.19238v1 -
1617 05-25 To See a World in a Spark of Neuron: Disentangling Multi-task Interference for Training-free Model Merging Eine Welt in einem Funken Neuron zu sehen: Entwirren von Multi-Task-Interferenzen für trainingsfreies Modellverschmelzen 《在中世纪的火花中看到世界:为无培训模式合并拆散多任务干预》 2503.05320v2 -
1618 05-25 CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models CoreMatching: Co-adaptive Sparse Inference Framework mit Token und Neuron Pruning für eine umfassende Beschleunigung von Vision-Language-Modellen 核心配料:与Token 和Neron Prurning 共同调适的简单推断框架,以全面加速视觉语言模型 2505.19235v1 -
1619 05-25 Learning Flexible Forward Trajectories for Masked Molecular Diffusion Flexible Forward-Trajektorien für maskierte molekulare Diffusion lernen 蒙面分子扩散学习灵活前向轨迹 2505.16790v2 -
1620 05-25 Statistical Collusion by Collectives on Learning Platforms Statistische Kollusion von Kollektiven über Lernplattformen 学习平台集体统计协作 2502.04879v3 -
1621 05-25 Imitation Learning via Focused Satisficing Imitation Learning via Focused Satisficing 通过有重点的满意度学习模拟学习 2505.14820v2 -
1622 05-25 CLEVER: A Curated Benchmark for Formally Verified Code Generation CLEVER: Ein kuratierter Benchmark für die formal verifizierte Codegenerierung 正式核实的代码生成基准 2505.13938v3 -
1623 05-25 Scalarisation-based risk concepts for robust multi-objective optimisation Scalarisierungsbasierte Risikokonzepte für eine robuste multiobjektive Optimierung 实现稳健的多目标优化的以尺度化为基础的风险风险概念 2405.10221v4 -
1624 05-25 Dynamic Angle Selection in X-Ray CT: A Reinforcement Learning Approach to Optimal Stopping Dynamische Winkelauswahl in X-Ray CT: Ein verstärkten Lernansatz zum optimalen Stoppen X- Ray CT: 优化停止的强化学习方法 2503.12688v2 -
1625 05-25 Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More Sprachmodelle, Graph Searching und Überwachung Ehebruch: Wenn mehr Aufsicht weniger ist und wie man mehr macht 语言模式、图图搜索和监督通配:越少越少监督,如何做越多 2503.10542v3 -
1626 05-25 Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf’s Law Skalierungsgesetze für gradienten Abstieg und Zeichenabstieg für lineare Bigram-Modelle unter Zipf’s Gesetz 齐普夫法下线形大梁模型的渐渐后裔和信号后裔法律扩大法 2505.19227v1 -
1627 05-25 LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models LLaDA 1.5: Varianzreduzierte Preference-Optimierung für große Sprachdiffusionsmodelle LLADA 1.5:大语言传播模式差异-减少优惠 2505.19223v1 -
1628 05-25 A Novel Transformer-Based Self-Supervised Learning Method to Enhance Photoplethysmogram Signal Artifact Detection Eine neuartige, auf Transformer basierende, selbstüberwachte Lernmethode zur Verbesserung der Photoplethysmogramm-Signal-Artefakt-Erkennung 一种基于新颖变形器的以自我监督为基础的学习方法,用以加强光膜成像信号异形探测 2401.01013v2 -
1629 05-25 Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding Where Paths Collide: Eine umfassende Untersuchung der klassischen und lernbasierten multi-agenten Pathfinding 路径相撞之处:对经典和以学习为基础的多方代理调查的全面调查 2505.19219v1 -
1630 05-25 Clustering by Nonparametric Smoothing Clustering durch nichtparametrisches Glätten 以非参数平滑为群集 2503.09134v2 -
1631 05-25 Symmetries in Overparametrized Neural Networks: A Mean-Field View Symmetrien in überparametrisierten Neuralen Netzwerken: Eine Mittelfeldansicht 过度对称的神经神经网络的对称性:平均实地观点 2405.19995v3 -
1632 05-25 Adaptive Cyclic Diffusion for Inference Scaling Adaptive zyklische Diffusion zur Inferenzskalierung 用于推断力缩放的适应性二次循环传播 2505.14036v2 -
1633 05-25 SpeakStream: Streaming Text-to-Speech with Interleaved Data SpeakStream: Streaming von Text-zu-Speech mit interleaved Daten 语音Stream:用断开数据流流流文本到语音 2505.19206v1 -
1634 05-25 Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety Gutartige Proben sind wichtig! Feinabstimmung auf gutartigen Ausreißerproben bricht die Sicherheit erheblich 良性样本很重要!在离群良性样本上微调会严重破坏安全性 2505.06843v2 -
1635 05-25 FedGuCci: Making Local Models More Connected in Landscape for Federated Learning FedGuCci: Lokale Modelle in der Landschaft für das Federated Learning stärker miteinander verbunden FedGuCci:使地方模型在全局景观中更紧密地连接起来,促进联邦学习 2402.18949v3 -
1636 05-25 iTool: Reinforced Fine-Tuning with Dynamic Deficiency Calibration for Advanced Tool Use iTool: Verstärkte Feinsteuerung mit dynamischer Kalibrierung bei fortgeschrittenem Werkzeugeinsatz i Tool:加强先进工具使用动态缺乏度校准的精细测试 2501.09766v4 -
1637 05-25 Diffusion Instruction Tuning Diffusions-Anleitung Tuning 传播指示图 2502.06814v2 -
1638 05-25 Curvature Dynamic Black-box Attack: revisiting adversarial robustness via dynamic curvature estimation Krümmung Dynamischer Black-Box-Angriff: Wiederherstellung der gegnerischen Robustheit durch dynamische Krümmungsschätzung 曲线 动态黑盒攻击: 通过动态曲线估计, 重新审视对抗性对称稳健性 2505.19194v1 -
1639 05-25 Interpretable Graph Learning Over Sets of Temporally-Sparse Data Interpretable Graph Learning Over Sets von temporär-Spardaten 一组暂时分隔数据上的解释性图表学习 2505.19193v1 -
1640 05-25 I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts I2MoE:可解释的多式多式互动意识混合企业专家 2505.19190v1 -
1641 05-25 Chordless Structure: A Pathway to Simple and Expressive GNNs Chordless Structure: Ein Weg zu einfachen und expressiven GNNs 无字结构:通往简单和表达性全球NNN的路径 2505.19188v1 -
1642 05-25 Heterogeneous networks in drug-target interaction prediction Heterogene Netzwerke in der Vorhersage von Wechselwirkungen mit Drogenzielen 药物目标相互作用预测中的不同类型网络 2504.16152v2 -
1643 05-25 A Physics-preserved Transfer Learning Method for Differential Equations Eine physikkonservierte Transfer-Lernmethode für Differentialgleichungen 不同等分法的受物理保留转移学习方法 2505.01281v2 -
1644 05-25 CAGES: Cost-Aware Gradient Entropy Search for Efficient Local Multi-Fidelity Bayesian Optimization CAGES: Kostenbewusste Gradienten-Entropie Suche nach effizienter lokaler Multi-Fidelity Bayesian-Optimierung CAGES: 成本-软件软件渐进式 Entropy 搜索以高效的本地多纤维贝叶斯优化 2405.07760v2 -
1645 05-25 Federated Learning: From Theory to Practice Föderiertes Lernen: Von der Theorie zur Praxis 联邦学习:从理论到实践 2505.19183v1 -
1646 05-25 DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation DiTAR: Diffusion Transformer Autoregressive Modellierung für Sprachgenerierung DITAR: 发声的传播变异器自动递减模型 2502.03930v3 -
1647 05-25 Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees Auf dem Weg zu Graph Foundation Models: Allgemeines Lernen über Graphen über Task-Trees 走向图图基础模型:通过TLT-Trees对图的学习概观 2412.16441v3 -
1648 05-25 Nteasee: Understanding Needs in AI for Health in Africa – A Mixed-Methods Study of Expert and General Population Perspectives Nteasee: Die Bedürfnisse von KI für die Gesundheit in Afrika verstehen – Eine gemischte Studie von Experten und allgemeinen Bevölkerungsperspektiven Nteasee:了解大赦国际关于非洲保健的需要 – – 专家和一般人口观点混合方法研究 2409.12197v4 -
1649 05-25 Beyond Message Passing: Neural Graph Pattern Machine Beyond Message Passing: Neural Graph Pattern Machine 超过消息传递: 神经图样机 2501.18739v2 -
1650 05-25 Saliency-guided Emotion Modeling: Predicting Viewer Reactions from Video Stimuli Saliency-guided Emotion Modeling: Vorhersage von Zuschauerreaktionen aus Video-Stimuli 以色素为指导的情感建模:视频刺激的预测查看器反应 2505.19178v1 -
1651 05-25 Mixture of Lookup Experts Mischung von Lookup-Experten 查找专家混合 2503.15798v2 -
1652 05-25 Computational Inertia as a Conserved Quantity in Frictionless and Damped Learning Dynamics Computational Inertia als konservierte Menge in friktionsloser und gedämpfter Lerndynamik 计算无损和断裂学习动力学的计算因电量 2505.19171v1 -
1653 05-25 JEDI: The Force of Jensen-Shannon Divergence in Disentangling Diffusion Models JEDI: Die Macht der Jensen-Shannon-Divergenz bei entwirrenden Diffusionsmodellen JEDI: 詹森-夏农分解扩散模型的分解力量 2505.19166v1 -
1654 05-25 CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter CORAL: Lerne konsistente Repräsentationen über mehrstufiges Training mit leichterem spekulativen Entwurfer CORAL: 利用轻型投机性起草者在多阶段培训中学习一致的代表性 2502.16880v3 -
1655 05-25 Efficient Training of Multi-task Neural Solver for Combinatorial Optimization Effiziente Schulung von Multi-Task-Neural Solver zur kombinatorischen Optimierung 综合优化多任务神经溶剂高效培训 2305.06361v5 -
1656 05-25 Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation Divide-Then-Aggregat: Eine effiziente Tool-Learning-Methode über parallele Tool-Invokation 分离后生成工具:通过平行工具使用使用效率高的工具学习方法 2501.12432v2 -
1657 05-25 Mean-Shift Distillation for Diffusion Mode Seeking Mean-Shift-Destillation für den Diffusionsmodus 用于扩散模式搜索的中质蒸馏 2502.15989v2 -
1658 05-25 Do Large Language Models (Really) Need Statistical Foundations? Brauchen große Sprachmodelle (wirklich) statistische Grundlagen? 大语言模式(真正)是否需要统计基础? 2505.19145v1 -
1659 05-25 ADGSyn: Dual-Stream Learning for Efficient Anticancer Drug Synergy Prediction ADGSyn: Dual-Stream-Lernen für effiziente Anti-Krebs-Arzneimittel-Synergie-Vorhersage ADGSyn:双层学习促进高效抗癌药物协同效应预测 2505.19144v1 -
1660 05-25 AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering durch Verstärkungslernen AdaCoT:通过强化学习实现帕累托最优的自适应思维链触发 2505.11896v2 -
1661 05-25 CER: Confidence Enhanced Reasoning in LLMs CER: Vertrauen in LLMs gestärkte Vernunft CER: LLM 中增强信任的理由 2502.14634v2 -
1662 05-25 Uncertainty Quantification for Physics-Informed Neural Networks with Extended Fiducial Inference Ungewissheitsquantifizierung für physikinformierte Neuronale Netzwerke mit erweiterter fiduzieller Schlussfolgerung 具有扩展影响推断力的物理成形神经网络的不确定性量化 2505.19136v1 -
1663 05-25 Incentivizing High-Quality Human Annotations with Golden Questions Anreize für hochwertige menschliche Anmerkungen mit goldenen Fragen 以金质问题激励高品质人文说明 2505.19134v1 -
1664 05-25 Fast and Accurate Power Load Data Completion via Regularization-optimized Low-Rank Factorization Schnelle und präzise Leistungslastdatenvervollständigung über Regularisierungsoptimierte Low-Rank-Fabrikisierung 通过正规化、优化低射速电荷因子化完成快速和准确电源负载数据 2505.19133v1 -
1665 05-25 Rank-One Modified Value Iteration Rang eins geänderte Wert Iteration Ran- One 修改值迭代 2505.01828v2 -
1666 05-25 Natural Language Generation from Visual Events: Challenges and Future Directions Natürliche Sprachgenerierung aus visuellen Veranstaltungen: Herausforderungen und Zukunftsrichtungen 从视觉活动中产生自然语言:挑战和未来方向 2502.13034v2 -
1667 05-25 Interacting Large Language Model Agents. Interpretable Models and Social Learning Interagieren von Large Language Model Agents. Interpretierbare Modelle und soziales Lernen 跨大语言示范工具、可解释模型和社会学习 2411.01271v2 -
1668 05-25 Adaptive Sensor Steering Strategy Using Deep Reinforcement Learning for Dynamic Data Acquisition in Digital Twins Adaptive Sensorlenkungsstrategie mit tief greifendem Verstärkungslernen für die dynamische Datenerfassung in digitalen Zwillingen 利用深度强化学习进行数字孪生动态数据采集的自适应传感器转向策略 2504.10248v2 -
1669 05-25 Birch SGD: A Tree Graph Framework for Local and Asynchronous SGD Methods Birke SGD: Ein Baumdiagramm-Framework für lokale und asynchrone SGD-Methoden Birch SGD: 当地和非同步 SGD 方法树图框架 2505.09218v2 -
1670 05-25 Deep Active Speech Cancellation with Mamba-Masking Network Deep Active Speech Stornierung mit Mamba-Masking Network 使用 Mamba- Masking 网络的深活动语音取消 2502.01185v2 -
1671 05-25 Exploring Magnitude Preservation and Rotation Modulation in Diffusion Transformers Erforschung der Magnitudenerhaltung und Rotationsmodulation in Diffusionstransformatoren 在扩散变异器中探索磁力保护与旋转调节 2505.19122v1 -
1672 05-25 FP4 All the Way: Fully Quantized Training of LLMs FP4 All the Way: Vollständig quantisiertes Training von LLMs FP4 全程:完全量化的LLM训练 2505.19115v1 -
1673 05-25 Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling Verwandeln von Müll in Schatz: Beschleunigen von Inferenzen von großen Sprachmodellen mit Token-Recycling 将垃圾垃圾变成宝库:加快使用 Tok 回收利用大语言模型的推论 2408.08696v3 -
1674 05-25 Stochastic Compositional Optimization with Compositional Constraints Stochastische kompositorische Optimierung mit kompositorischen Einschränkungen 具有组成限制的斯托具组成优化 2209.04086v2 -
1675 05-25 An Interpretable Representation Learning Approach for Diffusion Tensor Imaging Ein interpretierbarer Representations-Lernansatz für Diffusion Tensor Imaging 传播显像成像的可解释代表性学习方法 2505.19110v1 -
1676 05-25 Optimization-Inspired Few-Shot Adaptation for Large Language Models Optimization-Inspired Wenig-Shot-Anpassung für große Sprachmodelle 优化- 激发了对大语言模型的微热适应 2505.19107v1 -
1677 05-25 Statistical inference for Linear Stochastic Approximation with Markovian Noise Statistische Schlussfolgerung zur linearen stochastischen Annäherung an Markovsche Geräusche 与Markovian噪音的线性斯托口接近的统计推推 2505.19102v1 -
1678 05-25 Towards Robust Influence Functions with Flat Validation Minima Auf dem Weg zu robusten Einflussfunktionen mit Flat Validation Minima 以平滑校准微型方式向强力影响函数方向 2505.19097v1 -
1679 05-25 A Unified Framework for Variable Selection in Model-Based Clustering with Missing Not at Random Ein einheitliches Framework zur variablen Auswahl im modellbasierten Clustering mit Fehlen nicht zufällig 以模型为基础的集束模式中变量选择的统一框架, 随机不失踪 2505.19093v1 -
1680 05-25 ReadBench: Measuring the Dense Text Visual Reading Ability of Vision-Language Models ReadBench: Vermessen der Dichte an Text Visuelle Lesefähigkeit von Vision-Sprachen-Modellen ” 阅读 “ :衡量视觉-语言模型的阅读能力 2505.19091v1 -
1681 05-25 CMoS: Rethinking Time Series Prediction Through the Lens of Chunk-wise Spatial Correlations CMoS: Die Vorhersage der Zeitreihen durch die Linse der spaltweisen räumlichen Korrelationen neu denken CMoS: 重新思考时间序列,通过整节空间交汇的镜头预测 2505.19090v1 -
1682 05-25 Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes Temperatur ist alles, was Sie für die Generalisierung in Langevin Dynamics und anderen Markov-Prozessen benötigen Langevin Dynamics 和其他Markov 进程需要的温度是全部您需要的普遍化 2505.19087v1 -
1683 05-25 Jodi: Unification of Visual Generation and Understanding via Joint Modeling Jodi: Vereinheitlichung der visuellen Erzeugung und des Verständnisses durch gemeinsame Modellierung Jodi:通过联合建模统一视觉生成和理解 2505.19084v1 -
1684 05-25 Geometric Determinations Of Characteristic Redshifts From DESI-DR2 BAO and DES-SN5YR Observations: Hints For New Expansion Rate Anomalies Geometrische Bestimmung charakteristischer Rotverschiebungen aus DESI-DR2 BAO und DES-SN5YR Beobachtungen: Hinweise für neue Erweiterungsraten Anomalien DESI-DR2 BAO和DES-SN5YR观测的特征红移的几何测定:新膨胀率异常现象的提示 2505.19083v1 -
1685 05-25 On Continuity of Robust and Accurate Classifiers Über die Kontinuität von robusten und präzisen Klassifikatoren 关于强力和准确性分类的连续性 2309.17048v2 -
1686 05-25 Flow Annealed Importance Sampling Bootstrap meets Differentiable Particle Physics Flow Annealed Bedeutung Sampling Bootstrap trifft differenzierbare Teilchenphysik 流动的隐形重要性取样器装置符合可区分的粒子物理 2411.16234v2 -
1687 05-25 Cluster-Aware Multi-Round Update for Wireless Federated Learning in Heterogeneous Environments Cluster-Aware Multi-Round Update für drahtloses Federated Learning in heterogenen Umgebungen 为不同不同环境无线联邦学习提供多功能集群软件多功能更新 2505.06268v2 -
1688 05-25 Recalibrating binary probabilistic classifiers Rekalibrierung von binären probabilistischen Klassifikatoren 重新计算二进制概率分解器 2505.19068v1 -
1689 05-25 Adversarial Bandit over Bandits: Hierarchical Bandits for Online Configuration Management Adversarial Bandit über Bandits: Hierarchische Bandits für Online-Konfigurationsmanagement 反强盗强盗: 用于在线配置管理的等级强盗 2505.19061v1 -
1690 05-25 An Initial Exploration of Fine-tuning Small Language Models for Smart Contract Reentrancy Vulnerability Detection Eine erste Erkundung von Feinsteuerungs-Kleinsprachenmodellen für intelligente Vertragsrepentrancy Sicherheitserkennung 初步探索智能合同留置率易变性探测智能合同微调小型语言模型 2505.19059v1 -
1691 05-25 Policy Gradient with Tree Expansion Politischer Gradient mit Baumerweiterung 随着树树扩张的政策渐变 2301.13236v2 -
1692 05-25 Distributionally Robust Deep Q-Learning Verteilungsstarkes tiefes Q-Lernen 分布强力深学习 Q- 学习 2505.19058v1 -
1693 05-25 An Embarrassingly Simple Defense Against LLM Abliteration Attacks Eine erschreckend einfache Verteidigung gegen LLM-Abliterationsangriffe 一种令人尴尬的简单防御 对付LLM 缩写攻击 2505.19056v1 -
1694 05-25 Reduce Computational Cost In Deep Reinforcement Learning Via Randomized Policy Learning Computerische Kosten im Deep-Verstärkung-Lernen durch Randomized Policy Learning reduzieren 降低深强化学习的计算成本 2505.19054v1 -
1695 05-25 Structured Reinforcement Learning for Combinatorial Decision-Making Strukturiertes Stärkungslernen für kombinatorische Entscheidungsfindung 结构强化学习促进综合决策决策 2505.19053v1 -
1696 05-25 Efficient Data Selection at Scale via Influence Distillation Effiziente Datenauswahl auf Scale durch Einflussdestillation 通过影响蒸馏在规模上高效数据选择 2505.19051v1 -
1697 05-25 SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models SliM-LLM: Salience-getriebene Mixed-Precision-Quantisierung für große Sprachmodelle SliM-LLM:大语言模型的盐度驱动混合精度量 2405.14917v2 -
1698 05-25 PII-Scope: A Comprehensive Study on Training Data PII Extraction Attacks in LLMs PII-Scope: Eine umfassende Studie über Trainingsdaten PII-Extraktionsangriffe in LLMs PII-范围:关于培训数据的综合研究 2410.06704v2 -
1699 05-25 When Models Don’t Collapse: On the Consistency of Iterative MLE Wenn Modelle nicht zusammenbrechen: Über die Konsistenz iterativer MLE 当模型不折叠时: 在迭代 MLE 一致性上 2505.19046v1 -
1700 05-25 Offline Clustering of Linear Bandits: Unlocking the Power of Clusters in Data-Limited Environments Offline-Clustering von linearen Banditen: Entriegelung der Macht von Clustern in datenbeschränkten Umgebungen 线性强盗离线集群:解锁数据限制环境中的群集力量 2505.19043v1 -
1701 05-25 Turb-L1: Achieving Long-term Turbulence Tracing By Tackling Spectral Bias Turb-L1: Langfristige Turbulenzverfolgung durch Bekämpfung des spektralen Bias Turb-L1:通过处理频谱偏差,实现长期湍流追踪 2505.19038v1 -
1702 05-25 Optimal Conformal Prediction under Epistemic Uncertainty Optimale konforme Vorhersage unter epistemischer Unsicherheit 在不确定性下最优化的共变预测 2505.19033v1 -
1703 05-25 SoK: Dataset Copyright Auditing in Machine Learning Systems SoK: Datensatz Copyright Auditing in Machine Learning Systemen SoK:机器学习系统中的数据集版权审计 2410.16618v2 -
1704 05-25 Learn Beneficial Noise as Graph Augmentation Beneficial Noise als Graph Augmentation lernen 学习以图增益为受益噪音 2505.19024v1 -
1705 05-25 A Smart Healthcare System for Monkeypox Skin Lesion Detection and Tracking Ein intelligentes Gesundheitssystem für Monkeypox-Hautläsionserkennung und -verfolgung 用于探测和跟踪猴子天花皮肤皮层的智能保健系统 2505.19023v1 -
1706 05-25 Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs Unsicherheitsquantifizierung auf Funktionsebene für kalibriertes Feintuning von LLMs 对LLM进行校准微调的功能级不确定性量化 2410.06431v3 -
1707 05-25 AnchorFormer: Differentiable Anchor Attention for Efficient Vision Transformer AnchorFormer: Differentielle Anker-Achtung für effizienten Vision Transformer Anchor Former: 高效愿景变异器的可区别的锁定器注意 2505.16463v2 -
1708 05-25 When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers Wann ist Task Vector für die Modellbearbeitung wahrscheinlich wirksam? Eine Generalisierungsanalyse von nichtlinearen Transformern 任务矢量何时对模式编辑有效? 非线性变换器的概括分析 2504.10957v3 -
1709 05-25 Fractured Chain-of-Thought Reasoning Fragmentiertes Chain-of-Thought-Reasoning 断裂的思维链推理 2505.12992v2 -
1710 05-25 Lorentzian Graph Isomorphic Network Lorentzian Graph Isomorphic Network Lorentzian 图形异形网络 2504.00142v4 -
1711 05-25 Querying Kernel Methods Suffices for Reconstructing their Training Data Abfrage von Kernel-Methoden Möglichkeiten zur Wiederherstellung ihrer Trainingsdaten 查询重新构建其培训数据所需的核心内核方法 2505.19019v1 -
1712 05-25 Accurate and Efficient Multivariate Time Series Forecasting via Offline Clustering Genaue und effiziente Multivariate Zeitreihenprognose über Offline-Clustering 通过离线群集预测准确而高效的多变量时间序列 2505.05738v2 -
1713 05-25 Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis Ausbildung nichtlinearer Transformer für den Schlussfolgerungsketten-of-Thought: Eine theoretische Generalisierungsanalyse 培训非线性非线性变换器,用于研究链推论:理论一般分析 2410.02167v3 -
1714 05-25 Understanding the Robustness of Graph Neural Networks against Adversarial Attacks Verständnis der Robustheit von Graphen-Neuralen Netzwerken gegen feindliche Angriffe 理解反对反向攻击的平面神经网络的强大力 2406.13920v2 -
1715 05-25 WorldEval: World Model as Real-World Robot Policies Evaluator WorldEval: Weltmodell als Real-World-Roboterpolitik Evaluator WorldEval:世界作为真实世界机器人政策评价人的世界模式 2505.19017v1 -
1716 05-25 Tokenizing Electron Cloud in Protein-Ligand Interaction Learning Tokenizing Electron Cloud in Protein-Ligand Interaktion Lernen 将电云投入蛋白碱的相互作用学习 2505.19014v1 -
1717 05-25 Faithful Group Shapley Value Treue Gruppe Shapley Wert 忠实的群群形状值 2505.19013v1 -
1718 05-25 Alberta Wells Dataset: Pinpointing Oil and Gas Wells from Satellite Imagery Alberta Wells Datensatz: Pinpointing Öl- und Gasquellen aus Satellitenbildern 艾伯塔·韦尔斯数据集:从卫星图象中点出石油和天然气井 2410.09032v3 -
1719 05-25 FERGI: Automatic Scoring of User Preferences for Text-to-Image Generation from Spontaneous Facial Expression Reaction FERGI: Automatische Bewertung von Benutzereinstellungen für die Text-zu-Bild-Erzeugung aus spontaner Gesichtsausdrucksreaktion FERGI: 自动自发面性表达反应生成文本到图像的用户首选项自动排序 2312.03187v4 -
1720 05-25 Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization Handhabung von Etikettengeräuschen über Instance-Level-Schwierigkeitsmodellierung und dynamische Optimierung 通过实度难度建模和动态优化处理标签噪音 2505.00812v2 -
1721 05-25 Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding Galaxy Walker: Geometry-aware VLMs für Galaxy-Skala Verständnis Galaxy Walker: 用于银河系统系统理解的几何觉测甚低LMS 2503.18578v3 -
1722 05-25 Inductive Gradient Adjustment For Spectral Bias In Implicit Neural Representations Induktive Gradientenanpassung für Spektralbien in impliziten Neuraldarstellungen 隐含神经表层旁观生物的感应梯度调整 2410.13271v2 -
1723 05-25 Semi-pessimistic Reinforcement Learning Halbpessimistisches Erlernen der Verstärkung 半悲观强化学习 2505.19002v1 -
1724 05-25 Automatic and Structure-Aware Sparsification of Hybrid Neural ODEs Automatische und strukturschonende Sparsifikation von Hybrid-Neural-ODEs 混合神经代码的自动和结构软件分离 2505.18996v1 -
1725 05-25 Reinforcement Learning for Reasoning in Large Language Models with One Training Example Verstärktes Lernen zur Vernunft in großen Sprachmodellen mit einem Trainingsbeispiel 采用 “ 一个培训实例 “ 采用大语言模式强化学习 2504.20571v2 -
1726 05-25 PDFBench: A Benchmark for De novo Protein Design from Function PDFBench: Ein Benchmark für De novo Protein Design von der Funktion PDFBench:从函数调出新蛋白设计基准 2505.20346v1 -
1727 05-25 STRICT: Stress Test of Rendering Images Containing Text STRICT: Stresstest von Rendering-Bildern mit Text STRICT: 含有文字的图像渲染压力测试 2505.18985v1 -
1728 05-25 AmorLIP: Efficient Language-Image Pretraining via Amortization AmorLIP: Effizientes Sprach-Bild-Vortraining über Amortisation AmorLIP:通过摊销进行高效的语文图像预培训 2505.18983v1 -
1729 05-25 Learning Mamba as a Continual Learner: Meta-learning Selective State Space Models for Efficient Continual Learning Mamba als Continual Learner lernen: Meta-Learning Selective State Space Models für effizientes Continual Learning Mamba作为不断学习者学习Mamba:高效持续学习的元学习选择性国家空间模型 2412.00776v4 -
1730 05-25 LLMScan: Causal Scan for LLM Misbehavior Detection LLMScan: Kausalscan zur Erkennung von LLM-Missverhalten LLMScan:用于LLM不当行为检测的因果扫描 2410.16638v4 -
1731 05-25 FedSKC: Federated Learning with Non-IID Data via Structural Knowledge Collaboration FedSKC: Föderiertes Lernen mit Nicht-IID-Daten über strukturelle Wissenskooperation FedSKC:通过结构性知识协作,采用非IID数据的联邦学习 2505.18981v1 -
1732 05-25 GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization GhostPrompt: Jailbreaking Text-to-image Generative Modelle basierend auf dynamischer Optimierung GhostPrompt:基于动态优化的文本到图像生成模型越狱 2505.18979v1 -
1733 05-25 ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting ScaleBiO: Skalierbare Bilevel-Optimierung für LLM-Datenumgewichtung 缩放 BIO: LLM 数据重新加权的可缩放双级优化 2406.19976v2 -
1734 05-25 GraSS: Scalable Influence Function with Sparse Gradient Compression GraSS: Skalierbare Einflussfunktion mit Sparse Gradient Compression GraSS: 带有微缩梯度压缩的可缩放影响函数 2505.18976v1 -
1735 05-25 The Final Layer Holds the Key: A Unified and Efficient GNN Calibration Framework Die letzte Ebene hält den Schlüssel: Ein einheitliches und effizientes GNN-Kalibrierungssystem 最后一层掌握着关键:统一高效的GNN校准框架 2505.11335v2 -
1736 05-25 MoLAE: Mixture of Latent Experts for Parameter-Efficient Language Models MoLAE: Mischung aus latenten Experten für Parameter-Effiziente Sprachmodelle MoLAE:参数有效语言模型原始专家混合 2503.23100v2 -
1737 05-25 Multi-Step Consistency Models: Fast Generation with Theoretical Guarantees Multi-Step-Konsistenzmodelle: Schnelle Generation mit theoretischen Garantien 多层次一致性模式:有理论保障的快速一代 2505.01049v2 -
1738 05-25 Genetic Influences on Brain Aging: Analyzing Sex Differences in the UK Biobank using Structural MRI Genetische Einflüsse auf das Altern des Gehirns: Analyse von Geschlechtsunterschieden in der britischen Biobank mittels struktureller MRT 对大脑老龄化的遗传基因影响:利用结构MRI分析联合王国生物库中的性别差异 2505.20344v1 -
1739 05-25 Protein Design with Dynamic Protein Vocabulary Protein Design mit dynamischem Protein Vokabular 配有动态蛋白质词汇词典的蛋白因设计 2505.18966v1 -
1740 05-25 Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models Expansion Span: Kombinieren von Fading Memory und Retrieval in Hybrid State Space Models 扩展空间:在混合国家空间模型中将平缓内存和检索合并 2412.13328v2 -
1741 05-25 How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation Wie richten und ergänzen Bilder LiDAR? Auf dem Weg zu einer harmonisierten multimodalen 3D-Panoptischen Segmentierung 图像如何对齐和补充 LiDAR ? 2505.18956v1 -
1742 05-25 Online Knowledge Distillation with Reward Guidance Online-Wissensdestillation mit lohnender Anleitung 网上知识蒸馏与奖励指导 2505.18952v1 -
1743 05-25 The Price of Format: Diversity Collapse in LLMs Der Preis des Formats: Diversity Collapse in LLMs 格式价格:多样化在LLMM中崩溃 2505.18949v1 -
1744 05-25 Exact Expressive Power of Transformers with Padding Exakte Expressive Kraft von Transformatoren mit Padding 带有斜面的变形器的精确表达力 2505.18948v1 -
1745 05-25 Minimax Optimal Reinforcement Learning with Quasi-Optimism Minimax Optimales Stärkungslernen mit Quasi-Optimismus 以准适应主义进行最优化强化学习 2503.00810v2 -
1746 05-25 Efficient Pauli channel estimation with logarithmic quantum memory Effiziente Pauli-Kanalschätzung mit logarithmischem Quantenspeicher 具有对数量内存的高效保利频道估计 2309.14326v4 -
1747 05-25 Structural Alignment Improves Graph Test-Time Adaptation Struktural Alignment verbessert Graph Test-Time Anpassung 结构调整改进图示测试时间适应 2502.18334v2 -
1748 05-25 Chi-Square Wavelet Graph Neural Networks for Heterogeneous Graph Anomaly Detection Chi-Square Wavelet Graph Neural Networks für Heterogene Graph Anomalie Detection 用于异源图异常异常图探测的千平方波浪图神经网络 2505.18934v1 -
1749 05-25 Can Large Language Models Infer Causal Relationships from Real-World Text? Können große Sprachmodelle Kausalbeziehungen aus Real-World Text ableiten? 大语言模型能否从真实世界文本中推断出因果关系? 2505.18931v1 -
1750 05-25 Hybrid Neural-MPM for Interactive Fluid Simulations in Real-Time Hybrid-Neural-MPM für interaktive Fluidsimulationen in Echtzeit 用于实时交互流力模拟的神经-MPM混合神经-MPM 2505.18926v1 -
1751 05-25 Graph-Based Operator Learning from Limited Data on Irregular Domains Graph-based Operator Lernen von begrenzten Daten über irreguläre Domains 以图图为基础的操作员 学习关于非常规域域的有限数据 2505.18923v1 -
1752 05-25 ALPCAHUS: Subspace Clustering for Heteroscedastic Data ALPCAHUS: Subraum-Clustering für heteroskedastische Daten ALPCAHUS: 用于异方差数据的子空间聚类 2505.18918v1 -
1753 05-25 Behavior Injection: Preparing Language Models for Reinforcement Learning Verhaltensinjektion: Vorbereitung von Sprachmodellen für verstärktes Lernen 行为注射:为强化学习准备语言模式 2505.18917v1 -
1754 05-25 PySAD: A Streaming Anomaly Detection Framework in Python PySAD: Ein Streaming-Anomaly Detection-Framework in Python PySAD: Python 流动异常检测框架 2009.02572v2 -
1755 05-25 Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach Multimodale LLMs unter Verteilungsverschiebungen verstehen: Ein informationstheoretischer Ansatz 在分销变更下理解多式LLMs:信息理论方法 2502.00577v2 -
1756 05-25 On the Role of Label Noise in the Feature Learning Process Über die Rolle von Etikettengeräuschen im Feature-Learning-Prozess 关于标签噪音在专题学习过程中的作用 2505.18909v1 -
1757 05-25 Stronger Enforcement of Instruction Hierarchy via Augmented Intermediate Representations Stärkere Durchsetzung der Instruktionshierarchie durch Augmented Intermediate Representations 通过扩大中级代表,加强执行指示分级制度 2505.18907v1 -
1758 05-24 (6) Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services Pre-trained Encoder-Schlussfolgerung: Enthüllen Upstream-Encoder in Downstream Machine Learning Services 培训前编码器推断:在下游机器学习服务中向上游编码器 2408.02814v2 -
1759 05-24 PromptWise: Online Learning for Cost-Aware Prompt Assignment in Generative Models PromptWise: Online-Lernen für kostenbewusste Prompt-Zuweisung in generativen Modellen 快速Wise:在创用模型中进行成本-软件快速指派在线学习 2505.18901v1 -
1760 05-24 Beyond Domain Randomization: Event-Inspired Perception for Visually Robust Adversarial Imitation from Videos Beyond Domain Randomization: Event-inspirierte Wahrnehmung für visuell robuste Adversarial Imitation aus Videos 超出域随机化: 视频中视觉强力反逆模仿受事件启发的感知 2505.18899v1 -
1761 05-24 Marginal Fairness: Fair Decision-Making under Risk Measures Marginal Fairness: Faire Entscheidungsfindung im Rahmen von Risikomaßnahmen 边际公平:风险措施下的公平决策 2505.18895v1 -
1762 05-24 Conformal Prediction for Uncertainty Estimation in Drug-Target Interaction Prediction Konforme Vorhersage für Unsicherheitsschätzungen in der Drogen-Ziel-Interaktionsvorhersage 药物-目标相互作用预测中不确定性估计的 非正式预测 2505.18890v1 -
1763 05-24 Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators Ermöglichung unstrukturierter Spars-Beschleunigung bei strukturierten Spars-Beschleunigern 启用结构散开加速器, 启用无结构的分散加速器 2403.07953v3 -
1764 05-24 Neural Encoding and Decoding at Scale Neurale Enkodierung und Dekodierung auf Scale 缩放时神经编码和解码 2504.08201v4 -
1765 05-24 Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey Datenvergrößerung für die Zeitreihenklassifikation: Eine umfangreiche empirische Studie und umfassende Umfrage 时间-系列分类数据扩充:广泛经验研究和全面调查 2310.10060v6 -
1766 05-24 KerZOO: Kernel Function Informed Zeroth-Order Optimization for Accurate and Accelerated LLM Fine-Tuning KerZOO: Kernel-Funktion informierte Zeroth-Order-Optimierung für präzise und beschleunigte LLM-Feinsteuerung KerZOO:为准确和加速 LLM 精密推荐而优化使用核心(KerZOO): 2505.18886v1 -
1767 05-24 LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders LORE: Lagrangian-optimierte robuste Einbettungen für visuelle Encoder Lagrangian- 优化的视觉编码器强力嵌入器 2505.18884v1 -
1768 05-24 LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity LinGen: Auf dem Weg zur High-Resolution Minute-Length Text-to-Video-Generation mit linearer Computational Complexity LinGen:迈向具有线性比较复杂度的高分辨率分钟-语言文本到视频的生成 2412.09856v2 -
1769 05-24 Partition Generative Modeling: Masked Modeling Without Masks Partition Generative Modellierung: Maskenmodellierung ohne Masken 生成建模:没有遮罩的蒙面建模 2505.18883v1 -
1770 05-24 RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models RefLoRA: Refactored Low-Rank-Anpassung für effizientes Feintuning großer Modelle RefLoRA:为对大型模型进行高效微调而进行重构的低秩适应 2505.18877v1 -
1771 05-24 Non-Stationary Lipschitz Bandits Nicht-stationäre Lipschitz Banditen 非固定的利普施奇茨猛匪 2505.18871v1 -
1772 05-24 Sci-LoRA: Mixture of Scientific LoRAs for Cross-Domain Lay Paraphrasing Sci-LoRA: Mischung aus wissenschaftlichen LoRAs für Cross-Domain Lay Paraphrasing Sci-LoRA:混合科学LoRA,用于跨领域通俗化改写 2505.18867v1 -
1773 05-24 Distribution-Aware Mobility-Assisted Decentralized Federated Learning Distribution-Aware Mobility-Assisted Dezentrales Federated Learning 分发通知 – – 流动协助 – – 分权力下放的联邦学习 2505.18866v1 -
1774 05-24 Guided by Guardrails: Control Barrier Functions as Safety Instructors for Robotic Learning Geführt von Guardrails: Steuerungsbarrierenfunktionen als Sicherheitsinstruktoren für das Roboterlernen 由警卫队指导:作为机器人学习安全教官的控制障碍功能 2505.18858v1 -
1775 05-24 USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations USDC: Ein Datensatz von $\underline{U}$ser $\underline{S}$tance und $\underline{D}$ogmatism in langen $\underline{C}$onversations USDC: Long $\underline{C}$onversations 中 $\underline{U}$ser $\underline{S}$tance 和 $\underline{D}$ogmatism 的数据集 2406.16833v2 -
1776 05-24 Toward Malicious Clients Detection in Federated Learning Auf dem Weg zu bösartigen Kunden Erkennung im Föderierten Lernen 争取在联邦学习中发现恶意客户 2505.09110v2 -
1777 05-24 Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation Korruption-Bewusst Training von latenten Video-Diffusions-Modellen für robuste Text-zu-Video-Generation 原始视频视频传播模型的反腐败知识培训 2505.21545v1 -
1778 05-24 On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization Auf die Wirkung des negativen Gradienten in der Gruppe Relative Tiefenverstärkung Optimierung 对群体相对深强化优化中的负梯度效应的影响 2505.18830v1 -
1779 05-24 Multi-Agent Best Arm Identification in Stochastic Linear Bandits Multi-Agent Best Arm Identification in stochastische Linear Banditen 斯托切斯定线强盗中多代理最佳武器识别 2411.13690v2 -
1780 05-24 Improved Regret and Contextual Linear Extension for Pandora’s Box and Prophet Inequality Verbesserte regret und kontextuelle lineare Erweiterung für Pandora’s Box und Prophet Inequality 改进潘多拉盒子和先知不平等的遗憾和背景扩展线性扩展 2505.18828v1 -
1781 05-24 A Real-World Energy Management Dataset from a Smart Company Building for Optimization and Machine Learning Ein Echtzeit-Energiemanagement-Datensatz aus einem Smart Company Building für Optimierung und maschinelles Lernen 最佳优化和机器学习智能公司大楼的 “ 现实世界能源管理数据集 “ 2503.11469v2 -
1782 05-24 How to build a consistency model: Learning flow maps via self-distillation Wie man ein Konsistenzmodell baut: Flusskarten über Selbstdestillation lernen 如何建立一致性模式:通过自我蒸馏学习流程图 2505.18825v1 -
1783 05-24 Robust multi-coil MRI reconstruction via self-supervised denoising Robuste Multi-Coil-MRT-Rekonstruktion durch selbstüberwachte Denoisierung 通过自我监督的自监管的去注水进行强有力的多石油MRI重建 2411.12919v4 -
1784 05-24 Fully tensorial approach to hypercomplex neural networks Voller Tensoransatz für hyperkomplexe neuronale Netzwerke 对超复合性神经神经网络采取完全强制的全方位方法 2407.00449v3 -
1785 05-24 Stealing Training Graphs from Graph Neural Networks Stealing Training Graphen aus Graph Neural Networks 图表神经网络中的偷窃培训图 2411.11197v2 -
1786 05-24 GRoQ-LoCO: Generalist and Robot-agnostic Quadruped Locomotion Control using Offline Datasets GRoQ-LoCO: Generalist und Roboter-agnostische Quadruped Locomotion Control mit Offline-Datensätzen GRoQ-LoCO:使用离线数据集的通用且与机器人无关的四足运动控制 2505.10973v3 -
1787 05-24 Preference Leakage: A Contamination Problem in LLM-as-a-judge Bevorzugte Leckage: Ein Kontaminierungsproblem im LLM-as-a-Richter 优先渗漏:LLM-作为法官的LLM中的污染问题 2502.01534v2 -
1788 05-24 Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis Erforschung der QUIC-Dynamik: Ein großformatiger Datensatz für verschlüsselte Verkehrsanalyse 探索 QUIC 动态动态:加密流量分析的大型数据集 2410.03728v6 -
1789 05-24 DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services DiSCo: Geräte-Server Kollaborative LLM-basierte Text-Streaming-Dienste DiSCo: 设备-服务器协作的基于LLM的文本流服务 2502.11417v2 -
1790 05-24 Operator-Informed Score Matching for Markov Diffusion Models Operator-Informed Score Matching für Markov Diffusion Modelle Markov 扩散模型的操作员不完善的评分匹配 2406.09084v2 -
1791 05-24 Expert-Agnostic Learning to Defer Experten-Agnostisches Lernen zur Abwehr 专家 – – 无法无天学习 2502.10533v2 -
1792 05-24 Partial Distribution Matching via Partial Wasserstein Adversarial Networks Teilverteilung Passend über Teilwasserstein Adversarial Networks 通过部分瓦森斯坦对冲网络进行部分配配 2409.10499v2 -
1793 05-24 MAPLE: Enhancing Review Generation with Multi-Aspect Prompt LEarning in Explainable Recommendation MAPLE: Verbesserung der Review Generation mit Multi-Aspect Prompt Learning in erklärbarer Empfehlung MAPLE: 在可解释推荐中通过多方面提示学习增强评论生成 2408.09865v2 -
1794 05-24 Governing Equation Discovery from Data Based on Differential Invariants Regulierende Gleichungs-Entdeckung aus Daten basierend auf unterschiedlichen Invarianten 从基于差异内在变量的数据中分离出来的数据 2505.18798v1 -
1795 05-24 Guarding Graph Neural Networks for Unsupervised Graph Anomaly Detection Überwachung von Graphen-Neuralnetzwerken für unbeaufsichtigte Graphenanomalienerkennung 用于不受监督的异常图图探测的 保护图形神经网络 2404.16366v2 -
1796 05-24 Leveraging Per-Instance Privacy for Machine Unlearning Per-Instance-Leveraging-Privatsphäre für das maschinelle Lernen 利用个人隐私促进机器脱学 2505.18786v1 -
1797 05-24 A physics-guided smoothing method for material modeling with digital image correlation (DIC) measurements Ein physikgeführtes Glättverfahren für die Materialmodellierung mit Messungen der digitalen Bildkorrelation (DIC) 采用物理制导平滑法进行数字图像相关测量材料建模 2505.18784v1 -
1798 05-24 Soft Weighted Machine Unlearning Weichgewichtete Maschine nicht lernen 软加权机器脱学 2505.18783v1 -
1799 05-24 One Policy but Many Worlds: A Scalable Unified Policy for Versatile Humanoid Locomotion Eine Politik, aber viele Welten: Eine skalierbare, einheitliche Politik für vielseitige humanoide Lokomotion 一个政策,但许多世界:一个可扩展的统一政策,促进有生命力的人类活动 2505.18780v1 -
1800 05-24 HD-PiSSA: High-Rank Distributed Orthogonal Adaptation HD-PiSSA: High-Rank verteilte Orthogonalanpassung HD-PiSSA: 高射分散的正心调整适应 2505.18777v1 -
1801 05-24 Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models Starke Mitgliedschafts-Inferenzangriffe auf massive Datensätze und (Moderate) große Sprachmodelle 对大规模数据集和(口头)大语言模型的强烈成员推论攻击 2505.18773v1 -
1802 05-24 CageNet: A Meta-Framework for Learning on Wild Meshes CageNet: Ein Meta-Rahmen für das Lernen auf Wild Meshes CageNet:野生动物类学习的元框架 2505.18772v1 -
1803 05-24 Dual-Path Stable Soft Prompt Generation for Domain Generalization Dual-Path stabile Soft Prompt Generation für Domain-Verallgemeinerung 两平面稳定软软生成域通用化快速生成 2505.18770v1 -
1804 05-24 Multiple Wasserstein Gradient Descent Algorithm for Multi-Objective Distributional Optimization Vielfacher Wasserstein Gradient Descent Algorithmus für Multi-Objective Distributional Optimization 多目标分布优化多瓦森斯坦梯度底源值 2505.18765v1 -
1805 05-24 Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model Textgeführte Multi-Property-Molekularoptimierung mit einem Diffusions-Sprachenmodell 带有传播语言模型的文本引导多财产分子优化 2410.13597v2 -
1806 05-24 How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark Wie wird LLM-Reasoning vom irrelevanten Kontext abgelenkt? Eine Analyse mit einem kontrollierten Benchmark LLM 为何被不相关背景所忽略? 2505.18761v1 -
1807 05-24 The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation Die Suche nach einer effizienten Begründung: Ein datenzentrischer Benchmark zur CoT-Destillation 有效合理理由的查询:COT蒸馏的数据中心基准 2505.18759v1 -
1808 05-24 Lean and Mean Adaptive Optimization via Subset-Norm and Subspace-Momentum with Convergence Guarantees Lean and Mean Adaptive Optimization via Subset-Norm und Subspace-Momentum mit Konvergenzgarantien 通过具有聚合担保的子元和子空间动力及子空间动力进行皮和平均适应性优化 2411.07120v2 -
1809 05-24 Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding Reduzierung der Speicherung vortrainierter neuraler Netzwerke durch ratenkontrainierte Quantisierung und Entropiecodierung 通过受费率限制的量化和元件编码减少储存预培训神经网络 2505.18758v1 -
1810 05-24 Smart Energy Guardian: A Hybrid Deep Learning Model for Detecting Fraudulent PV Generation Smart Energy Guardian: Ein hybrides Deep-Learning-Modell zur Erkennung betrügerischer PV-Generation 智能能源守护者:发现欺诈性光电池发电的混合深学习模式 2505.18755v1 -
1811 05-24 HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting HiMoE: Heterogenitäts-informierte Mixture-of-Experts für faire räumlich-zeitliche Vorhersagen HiMoE:面向公平时空预测的异质性感知专家混合 2412.00316v3 -
1812 05-24 Season-Independent PV Disaggregation Using Multi-Scale Net Load Temporal Feature Extraction and Weather Factor Fusion Saisonunabhängige PV-Disaggregation mittels Multi-Scale Net Load Temporal Feature Extraktion und Wetterfaktor Fusion 使用多种规模净负荷时间特征抽取和天气因素融合的季节独立光电池拆分 2505.18747v1 -
1813 05-24 C3R: Channel Conditioned Cell Representations for unified evaluation in microscopy imaging C3R: Kanalkonditionierte Zelldarstellungen zur einheitlichen Auswertung in der Mikroskopie-Bildgebung C3R:用于对显微镜成像进行统一评价的有条件细胞代表的频道 2505.18745v1 -
1814 05-24 Interpretable Company Similarity with Sparse Autoencoders Interpretierbare Firmenähnlichkeit mit Sparse Autoencodern 与Sparse Autoencolders 相似 2412.02605v3 -
1815 05-24 Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models Feature-Extraktion und -Lenkung für eine verbesserte Kettenbildung in Sprachmodellen 语言模型中强化研究链理由的特征采掘和指南 2505.15634v2 -
1816 05-24 An Interpretable Deep-Learning Framework for Predicting Hospital Readmissions From Electronic Health Records Ein interpretierbarer Deep-Learning-Rahmen für die Vorhersage von Krankenhausrückübernahmen aus elektronischen Gesundheitsakten 预测医院从电子健康记录中读取的医院可解释的深学习框架 2310.10187v2 -
1817 05-24 AuroRA: Breaking Low-Rank Bottleneck of LoRA with Nonlinear Mapping AuroRA: Breaking Low-Rank Engpass von LoRA mit nichtlinearer Kartierung AuroRA:用非线性绘图法打破LORA的低兰克瓶尾裂 2505.18738v1 -
1818 05-24 Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings 知识强化画画视觉表现神经网络 2105.08190v2 -
1819 05-24 MADCAT: Combating Malware Detection Under Concept Drift with Test-Time Adaptation MADCAT: Bekämpfung der Malware-Erkennung unter Konzept Drift mit Test-Zeit-Anpassung MADCAT: 在 “ 漂流 “ 概念下,通过测试-时间适应来打击 “ 恶意探测 “ 2505.18734v1 -
1820 05-24 ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search ReGUIDE: Dateneffizientes GUI Grounding über räumliche Vernunft und Suche 数据高效界面:通过空间理性和搜索进行数据高效界面定位 2505.15259v2 -
1821 05-24 Reward-Driven Interaction: Enhancing Proactive Dialogue Agents through User Satisfaction Prediction Reward-Driven Interaction: Verbesserung proaktiver Dialog-Agenten durch Nutzerzufriedenheitsvorhersage 回报率互动:通过用户满意度预测加强积极主动的对话机构 2505.18731v1 -
1822 05-24 Influence Functions for Scalable Data Attribution in Diffusion Models Einflussfunktionen für skalierbare Datenzuweisungen in Diffusionsmodellen 扩散模型中可缩放数据归属的影响函数 2410.13850v5 -
1823 05-24 Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling Message-Passing State-Space-Modelle: Verbesserung des Graphen-Lernens mit moderner Sequenzmodellierung 传递信息的国家空间模型:利用现代序列模型改进图表学习 2505.18728v1 -
1824 05-24 Length independent generalization bounds for deep SSM architectures via Rademacher contraction and stability constraints Längenunabhängige Verallgemeinerungsgrenzen für tiefe SSM-Architekturen über Rademacher Kontraktion und Stabilitätsbeschränkungen 通过雷德马赫公司收缩和稳定制约因素对深层的SMS结构进行长度独立概括的界限 2405.20278v3 -
1825 05-24 Audio Geolocation: A Natural Sounds Benchmark Audio Geolocation: Ein natürlicher Klang Benchmark 音频地理定位:自然声音基准 2505.18726v1 -
1826 05-24 LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning LoTA-QAF: Lossless Ternary Adaptation für Quantization-Aware Fine-Tuning LoTA-QAF:量化软件微调的无损失田间适应 2505.18724v1 -
1827 05-24 Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization Optimales Transport-basiertes Token-Gewichtungssystem für verbesserte Preference-Optimierung 增强优惠优化的优化运输托肯加权计划 2505.18720v1 -
1828 05-24 Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer Neurale Parameter Suche nach schlankeren Modellen und besserer Übertragung 搜索细微精制模型和更好传输的神经参数 2505.18713v1 -
1829 05-24 Learning on LLM Output Signatures for gray-box Behavior Analysis Lernen auf LLM-Ausgangssignaturen für graue Verhaltensanalyse 学习用于灰箱行为分析的 LLM 输出签名 2503.14043v2 -
1830 05-24 Steering LLM Reasoning Through Bias-Only Adaptation Steuerung der LLM-Vernunft durch Bias-Only-Anpassung 仅有的偏差调整导致的偏差调整 2505.18706v1 -
1831 05-24 (Implicit) Ensembles of Ensembles: Epistemic Uncertainty Collapse in Large Models (Implizit) Ensembles von Ensembles: Epistemische Ungewissheit bricht in großen Modellen zusammen 群集集合:大型模型中的不确定性粒子折叠 2409.02628v2 -
1832 05-24 Data Overvaluation Attack and Truthful Data Valuation in Federated Learning Datenüberbewertung Angriff und Truthful Data Bewertung im Föderierten Lernen 联邦学习联盟的数据评价高估攻击和真实数据估值 2502.00494v3 -
1833 05-24 MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention MonarchAchtung: Null-Schuss-Umwandlung zu schneller, Hardware-Bewusst strukturierter Aufmerksamkeit MonarchAttention: 零热转换为快速硬件软件 2505.18698v1 -
1834 05-24 Can LLMs Alleviate Catastrophic Forgetting in Graph Continual Learning? A Systematic Study Kann LLMs in Graph Continual Learning Katastrophisches Vergessen lindern? Eine systematische Studie LLMs LLM 能够减轻图持续学习中的灾难性遗忘吗?系统研究 2505.18697v1 -
1835 05-24 Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment Revisiting Model Inversion Evaluation: Von irreführenden Standards zur zuverlässigen Datenschutzbewertung 重新审视示范反向评价:从错误领导标准到可靠隐私评估 2505.03519v3 -
1836 05-24 Simultaneous Optimization of Efficiency and Degradation in Tunable HTL-Free Perovskite Solar Cells with MWCNT-Integrated Back Contact Using a Machine Learning-Derived Polynomial Regressor Gleichzeitige Optimierung von Effizienz und Degradation in Tunablen HTL-freien Perovskite-Solarzellen mit MWCNT-Integriert Zurück Kontakt mit einem maschinenlernenden Polynom-Regressor 利用机械学习多面制反转器,与MWCNT综合后退联系,同时优化金枪鱼可HTL-无 Perovskite的无Perovskite太阳能电池的效率和退化 2505.18693v1 -
1837 05-24 Variational Schrödinger Diffusion Models Variationelle Schrödinger-Diffusionsmodelle 挥发模型 2405.04795v5 -
1838 05-24 Large Language Models in the Task of Automatic Validation of Text Classifier Predictions Große Sprachmodelle in der Aufgabe der automatischen Validierung von Textklassifikatoren Vorhersagen 文本分类自动验证任务中的大语言模型 2505.18688v1 -
1839 05-24 Predictive Performance of Deep Quantum Data Re-uploading Models Predictive Performance von Deep Quantum Data Re-Uploading-Modellen 深量量数据数据重新加载模型的预测性性能 2505.20337v1 -
1840 05-24 A fast algorithm to minimize prediction loss of the optimal solution in inverse optimization problem of MILP Ein schneller Algorithmus zur Minimierung des Vorhersageverlusts der optimalen Lösung im inversen Optimierungsproblem von MILP 快速算法,以尽量减少MILP反优化问题最佳解决办法的预测损失 2405.14273v3 -
1841 05-24 Thinking like a CHEMIST: Combined Heterogeneous Embedding Model Integrating Structure and Tokens Wie ein CHEMIST denken: Kombiniertes Heterogenes Einbetten von Modellintegrationsstrukturen und Tokens 思考像CHEMIST: 混合异基因嵌入模型集成结构和调子 2502.17986v2 -
1842 05-24 Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi Erweiterung des Aktionsraums mit Konventionen zur Verbesserung der Multi-Agenten-Kooperation in Hanabi 与公约扩大行动空间,以改进哈纳比多剂合作 2412.06333v3 -
1843 05-24 COPA: Comparing the incomparable in multi-objective model evaluation COPA: Vergleich des Unvergleichbaren in der multiobjektiven Modellauswertung CCOPA: 比较在多目标模式评价中无法比较的模型评价 2503.14321v2 -
1844 05-24 End-to-End Framework for Predicting the Remaining Useful Life of Lithium-Ion Batteries End-to-End-Framework zur Vorhersage der verbleibenden Nutzungsdauer von Lithium-Ionen-Batterien 预测锂-碘电池剩余使用寿命的端至端框架 2505.16664v2 -
1845 05-24 A Quantum Approximation Scheme for k-Means Ein Quantenannäherungsprogramm für k-Means k- Means 的量接近量计划 2308.08167v3 -
1846 05-24 Generating Full-field Evolution of Physical Dynamics from Irregular Sparse Observations Erzeugen der Vollfeld-Evolution der physikalischen Dynamik aus irregulären Sparse-Beobachtungen 从不定期的偏差观测中生成物理动态全场演变 2505.09284v2 -
1847 05-24 Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment? Findet Repräsentationsintervention wirklich Wunschvorstellungen und Ausgeglichenheit wieder? 代表权干预是否真正确定了理想概念和目的一致? 2505.18672v1 -
1848 05-24 Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape Flat-LoRA: Low-Rank Anpassung über eine flache verlorene Landschaft Flat-LORA: 适应平坦损失地貌的低Rank适应 2409.14396v2 -
1849 05-24 DeCaFlow: A Deconfounding Causal Generative Model DeCaFlow: Ein entkonfoundierendes Kausalgeneratives Modell DeCaFlow:一个破碎的因果创造模型 2503.15114v2 -
1850 05-24 Self-Supervised Evolution Operator Learning for High-Dimensional Dynamical Systems Selbstüberwachtes Evolutionsoperator-Lernen für hochdimensionelle dynamische Systeme 高多元动态系统学习 2505.18671v1 -
1851 05-24 Memory-Efficient Super-Resolution of 3D Micro-CT Images Using Octree-Based GANs: Enhancing Resolution and Segmentation Accuracy Speichereffiziente Super-Resolution von 3D-Mikro-CT-Bildern mit oktree-basierten GANs: Verbesserung der Auflösung und Segmentierung Genauigkeit 使用以屋底为主的GANs:加强分辨率和分解准确度 2505.18664v1 -
1852 05-24 Adaptive Prediction-Powered AutoEval with Reliability and Efficiency Guarantees Adaptive Vorhersage-Powered AutoEval mit Zuverlässigkeit und Effizienzgarantien 具有可靠性和效率保障的适应性预测力自动评估 2505.18659v1 -
1853 05-24 Robustness in Large Language Models: A Survey of Mitigation Strategies and Evaluation Metrics Robustheit in großen Sprachmodellen: Eine Umfrage zu Mitigationsstrategien und Evaluationsmetrics 大语言模式的强强力:减轻战略调查和评价 2505.18658v1 -
1854 05-24 LLM-QFL: Distilling Large Language Model for Quantum Federated Learning LLM-QFL: Destillieren eines großen Sprachmodells für Quantum-Federated Learning LLM-QFL:为量子联邦学习保留大语言模式 2505.18656v1 -
1855 05-24 On the Emergence of Linear Analogies in Word Embeddings Zur Entstehung linearer Analogien in Word-Embeddings 单线模拟在文字嵌入中的出现 2505.18651v1 -
1856 05-24 Flow Matching for Geometric Trajectory Simulation Flow Matching für geometrische Trajektoriensimulation 几何轨迹模拟流程匹配 2505.18647v1 -
1857 05-24 Randomized Midpoint Method for Log-Concave Sampling under Constraints Randomisierte Midpoint-Methode für Log-Concave-Sampling unter Einschränkungen 制约下对日志集点取样的随机中点方法 2405.15379v2 -
1858 05-24 STaRFormer: Semi-Supervised Task-Informed Representation Learning via Dynamic Attention-Based Regional Masking for Sequential Data StaRFormer: Halbüberwachtes Task-Informiertes Representation-Lernen über dynamisches, aufmerksamkeitsbasiertes regionales Masking für sequentielle Daten STARFormer:通过动态关注-基于关注的区域按顺序数据区域掩码,进行半超常任务化代表性学习 2504.10097v2 -
1859 05-24 ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation ThanoRA: Aufgabe Heterogenität bewusst Multi-Task Low-Rank-Anpassung 塔诺拉:任务差异性-软件多功能、多任务、低风险适应 2505.18640v1 -
1860 05-24 Graph-Supported Dynamic Algorithm Configuration for Multi-Objective Combinatorial Optimization Graphunterstützte dynamische Algorithmenkonfiguration für multi-objektive Kombinator-Optimierung 多目标组合优化多目标组合优化支持的图形支持动态算法配置 2505.16471v2 -
1861 05-24 DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection DitHub: Modulares Framework zur inkrementellen Open-Vocabulary-Objekterkennung DitHub: 递增开放词汇物体探测模块框架 2503.09271v2 -
1862 05-24 Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees Multi-Step Alignment als Markov Games: Ein optimaler Online-Gradient-Abstieg mit Konvergenzgarantien 作为Markov运动会的多步对齐:带有一致保障的乐观的在线逐渐递增人种方法 2502.12678v2 -
1863 05-24 Leveraging Structural Knowledge in Diffusion Models for Source Localization in Data-Limited Graph Scenarios Nutzung struktureller Kenntnisse in Diffusionsmodellen für die Quellenlokalisierung in datenbeschränkten Graphenszenarien 利用传播模型中的结构性知识,在数据限制的图表假设情景中实现源本地化 2502.17928v2 -
1864 05-24 Asymmetric Duos: Sidekicks Improve Uncertainty Asymmetrische Duos: Sidekicks verbessern Unsicherheit 非对称 Duos: 侧边icks 改善不确定性 2505.18636v1 -
1865 05-24 You Can Wash Hands Better: Accurate Daily Handwashing Assessment with a Smartwatch Sie können Hände besser waschen: Genaue tägliche Handwäsche Bewertung mit einer Smartwatch 你可以更好地洗手:用智能观察准确进行每日洗手评估 2112.06657v5 -
1866 05-24 Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding Denken Sie, bevor Sie akzeptieren: Semantische Reflektierende Verifizierung für schnellere spekulative Dekodierung 在你接受之前先想想: 快速投机代号的语义反省校验 2505.18629v1 -
1867 05-24 HARP: Hesitation-Aware Reframing in Transformer Inference Pass HARP: Hezitation-Aware Reframing in Transformer Inferenz Pass HARP: 变压器推断通过中的偏移-软件重新配置 2412.07282v2 -
1868 05-24 QUCE: The Minimisation and Quantification of Path-Based Uncertainty for Generative Counterfactual Explanations QUCE: Die Minimierung und Quantifizierung pfadbasierter Unsicherheiten für generative gegenfaktische Erklärungen QUCE: 产生反事实解释的路径不确定性的最小化和量化 2402.17516v5 -
1869 05-24 Mind The Gap: Deep Learning Doesn’t Learn Deeply Mind The Gap: Deep Learning lernt nicht tief 思想差距:深学习不深入学习 2505.18623v1 -
1870 05-24 Trust, or Don’t Predict: Introducing the CWSA Family for Confidence-Aware Model Evaluation Vertrauen oder nicht voraussagen: Einführung der CWSA-Familie für vertrauensbewusste Modellbewertung 信任或不要预测:介绍CWSA家庭促进信任-了解模型评价 2505.18622v1 -
1871 05-24 Neural Solver Selection for Combinatorial Optimization Neural Solver Selection zur kombinatorischen Optimierung 组合优化的神经溶剂选择 2410.09693v2 -
1872 05-24 Federated Class-Incremental Learning with Hierarchical Generative Prototypes Föderiertes Klassen-Inkrementelles Lernen mit Hierarchischen Generativen Prototypen 具有等级制起源原型的联邦高级高等程度学习 2406.02447v4 -
1873 05-24 MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation MAVL: Ein mehrsprachiger Audio-Video-Text Datensatz für animierte Song-Übersetzung MAVL: 动动歌曲翻译多语种视听歌词数据集 2505.18614v1 -
1874 05-24 MLRan: A Behavioural Dataset for Ransomware Analysis and Detection MLRan: Ein Verhaltensdatensatz für Ransomware Analyse und Erkennung MLran:用于分析和探测Ransomware 分析和探测的行为数据集 2505.18613v1 -
1875 05-24 An Artificial Intelligence Model for Early Stage Breast Cancer Detection from Biopsy Images Ein Modell der Künstlichen Intelligenz zur Früherkennung von Brustkrebs aus Biopsiebildern 早期从生物心理图像中检测乳腺癌的人工智能模型 2505.20332v1 -
1876 05-24 Exemplar-Free Continual Learning for State Space Models Beispielfreies kontinuierliches Lernen für Staatsraummodelle 国家空间模型免税免费持续学习 2505.18604v1 -
1877 05-24 LLM-Meta-SR: Learning to Evolve Selection Operators for Symbolic Regression LLM-Meta-SR: Lernen, Auswahloperatoren für symbolische Regression zu entwickeln LLM-Meta-SR:学习如何向演进中的反射反射选择操作员学习 2505.18602v1 -
1878 05-24 Learning to Program Quantum Measurements for Machine Learning Lernen, Quantenmessungen für maschinelles Lernen zu programmieren 学习机器学习量度方案 2505.13525v2 -
1879 05-24 Sum of Squares Circuits Summe der Quadrate Schaltungen 平方电路总和 2408.11778v3 -
1880 05-24 LLMs for Supply Chain Management LLMs für Supply Chain Management 供应链管理LLMs 2505.18597v1 -
1881 05-24 MisoDICE: Multi-Agent Imitation from Unlabeled Mixed-Quality Demonstrations MisoDICE: Multi-Agent-Imitation aus nicht gekennzeichneten Mixed-Quality-Demonstrationen MisoDICE:从未贴标签的混合质量示范中多机构吸收 2505.18595v1 -
1882 05-24 Bayesian Meta-Reinforcement Learning with Laplace Variational Recurrent Networks Bayesian Meta-Reinforcement Learning mit Laplace Variational Recurrent Networks 采用拉位变换经常网络加强Bayesian Met-加强学习 2505.18591v1 -
1883 05-24 CiRL: Open-Source Environments for Reinforcement Learning in Circular Economy and Net Zero CiRL: Open-Source-Umgebungen für verstärktes Lernen in der Kreislaufwirtschaft und Net Zero CIRL: 在循环经济和净零中加强学习的开放源环境 2505.21536v1 -
1884 05-24 Model Extrapolation Expedites Alignment Modell Extrapolation Expeditionen Ausrichtung 模型外推快速调整 2404.16792v4 -
1885 05-24 Continuous Multi-Task Pre-training for Malicious URL Detection and Webpage Classification Kontinuierliches Multi-Task-Vortraining für bösartige URL-Erkennung und Webpage-Klassifikation 恶意URL探测和网页分类连续多任务连续培训 2402.11495v2 -
1886 05-24 REAL: Representation Enhanced Analytic Learning for Exemplar-free Class-incremental Learning REAL: Darstellungsverstärktes analytisches Lernen für exemplarisch-freies Klassen-inkrementelles Lernen 实际:为免世禁初级入门学习加强代表性分析学习 2403.13522v2 -
1887 05-24 AFL: A Single-Round Analytic Approach for Federated Learning with Pre-trained Models AFL: Ein eingleisiger analytischer Ansatz für das Federated Learning mit vortrainierten Modellen ACL: 采用培训前模式的联邦学习单一分析方法 2405.16240v2 -
1888 05-24 Mechanical in-sensor computing: a programmable meta-sensor for structural damage classification without external electronic power Mechanische In-Sensor-Computing: ein programmierbarer Meta-Sensor für die Klassifizierung von Strukturschäden ohne externe elektronische Leistung 传感器中的机械内传感器计算:可编程的元传感器,用于结构损害分类,无外部电子电源 2505.18579v1 -
1889 05-24 Trust-Region Twisted Policy Improvement Vertrauensregion verdrehte politische Verbesserung 改变政策改进 2504.06048v3 -
1890 05-24 TabICL: A Tabular Foundation Model for In-Context Learning on Large Data TabICL: Ein tabellarisches Grundlagenmodell für das In-Context-Lernen mit großen Datenmengen TabICL: 大型数据内部知识学习表示基础模型 2502.05564v2 -
1891 05-24 DAL: A Practical Prior-Free Black-Box Framework for Non-Stationary Bandit Environments DAL: Ein praktisches Prior-Free Black-Box Framework für nicht-stationäre Bandit-Umgebungen DAL:非高度强盗环境实际的、事先免费的黑盒框架 2501.19401v2 -
1892 05-24 Convergence Analysis of Natural Gradient Descent for Over-parameterized Physics-Informed Neural Networks Konvergenzanalyse des natürlichen Gradientenabstiegs für überparameterisierte physikinformierte neurale Netzwerke 超参数物理内成形神经神经网络的自然梯分源相趋同分析 2408.00573v3 -
1893 05-24 Autocomp: LLM-Driven Code Optimization for Tensor Accelerators Autocomp: LLM-gesteuerte Code-Optimierung für Tensor-Beschleuniger 自动comp: LLM- Driven 代码对 Tensor 加速器的优化 2505.18574v1 -
1894 05-24 Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs Steigerung der Effizienz und Exploration bei der Stärkung des Lernens für LLMs 提高LLMM 强化学习的效率和探索 2505.18573v1 -
1895 05-24 VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis VISTA: Vision-Language-Schlussfolgerung für eine trainingsfreie Analyse der Stock-Zeitreihen VISTA:无培训-库存无培训-时间-系列分析的远景-语言推断 2505.18570v1 -
1896 05-24 Learning without Isolation: Pathway Protection for Continual Learning Lernen ohne Isolation: Pfadschutz für kontinuierliches Lernen 无孤立的学习:持续学习的路径保护 2505.18568v1 -
1897 05-24 ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework ReflectDiffu: Reflect zwischen emotional-intent Ansteckung und Mimicry für Empathetic Response Generation über ein RL-Diffusion Framework 反省:通过RL-扩散框架,对情感-情感内聚变和Mmimimicry之间的反射,以便产生同情性反应 2409.10289v3 -
1898 05-24 Learning Fluid-Structure Interaction Dynamics with Physics-Informed Neural Networks and Immersed Boundary Methods Learning Fluid-Struktur-Interaktion Dynamik mit physikinformierten Neuronalen Netzwerken und eingetauchten Grenzmethoden 与物理内成形神经网络和混合边界方法的互动动态 2505.18565v1 -
1899 05-24 Joint-stochastic-approximation Random Fields with Application to Semi-supervised Learning Gelenk-Stochastische-Annäherung Random Fields mit Anwendung auf semi-überwachtes Lernen 应用到半监督学习的混合随机场 2505.20330v1 -
1900 05-24 Joint-stochastic-approximation Autoencoders with Application to Semi-supervised Learning Gelenkstochastische Approximation Autoencoder mit Anwendung auf semi-überwachtes Lernen 应用到半监督学习的 联合研究- 接近自动校方 2505.18558v1 -
1901 05-24 LAMDA: A Longitudinal Android Malware Benchmark for Concept Drift Analysis LAMDA: Ein Longitudinal Android Malware Benchmark für Konzept Drift Analyse LAMDA: 关于概念漂流分析的纵向和机器人毛毛虫基准 2505.18551v1 -
1902 05-24 ReflectGAN: Modeling Vegetation Effects for Soil Carbon Estimation from Satellite Imagery ReflectGAN: Modellierung von Vegetationseffekten für Bodenkohlenstoffschätzungen aus Satellitenbildern 反射GAN:从卫星图像中模拟土壤碳估计的植被效应 2505.18546v1 -
1903 05-24 B-score: Detecting biases in large language models using response history B-Score: Voreingenommenheit in großen Sprachmodellen anhand der Antworthistorie erkennen B-序号:利用回应历史在大型语言模型中发现偏见 2505.18545v1 -
1904 05-24 Benchmarking Poisoning Attacks against Retrieval-Augmented Generation Benchmarking von Giftangriffen gegen retrieval-angereicherte Generation 制定基准,确定对回收一代人进行中毒袭击的基准 2505.18543v1 -
1905 05-24 Mind Your Vision: Multimodal Estimation of Refractive Disorders Using Electrooculography and Eye Tracking Denken Sie an Ihre Vision: Multimodale Abschätzung refraaktiver Störungen mittels Elektrookulographie und Eye Tracking 思考你的愿景:利用电光学和眼视跟踪对折发性失常进行多模式估计 2505.18538v1 -
1906 05-24 Convergence, Sticking and Escape: Stochastic Dynamics Near Critical Points in SGD Konvergenz, Haft und Flucht: Stochastische Dynamik in der Nähe kritischer Punkte in SGD 聚合、粘合和逃离:SGD中近临界点的斯托卡动态 2505.18535v1 -
1907 05-24 CMoE: Converting Mixture-of-Experts from Dense to Accelerate LLM Inference CMoE: Konvertieren von Mischungen von Experten aus Dense zu beschleunigter LLM-Inferenz CMoE: 将混合专家从高能转换为加速LLM推理 2502.04416v2 -
1908 05-24 Preserving AUC Fairness in Learning with Noisy Protected Groups AUC Fairness beim Lernen mit geräuschgeschützten Gruppen bewahren 维护AUC在与噪音保护群体学习中的公平公平 2505.18532v1 -
1909 05-24 SMART: Self-Aware Agent for Tool Overuse Mitigation SMART: Self-Aware Agent für Tool Overuse Mitigation SMART: 减少工具过度使用自智能剂 2502.11435v2 -
1910 05-24 Compositional Generalization via Forced Rendering of Disentangled Latents Zusammensetzungelle Verallgemeinerung durch Zwangsverleumdung entwirrter Latente 通过强迫拆散的内流流流体 2501.18797v2 -
1911 05-24 CLaDMoP: Learning Transferrable Models from Successful Clinical Trials via LLMs CLaDMoP: Übertragbare Modelle aus erfolgreichen klinischen Studien über LLMs lernen CLADMOP:通过LLMs成功临床试验学习可转让模型 2505.18527v1 -
1912 05-24 Scalable Gaussian Processes with Low-Rank Deep Kernel Decomposition Skalierbare Gauß-Prozesse mit niederrassiger Tiefenkernzersetzung 可缩放高斯进程,且低射深内核内核分解 2505.18526v1 -
1913 05-24 LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs LiSTEN: Soft Token-Embeddings für neurale Audio-LLMs lernen LISTEN: 神经音频LMS学习软软制嵌入器 2505.18517v1 -
1914 05-24 Test-Time Adaptation with Binary Feedback Test-Zeit-Anpassung mit Binär-Feedback 带有二进制反馈的测试时间适应 2505.18514v1 -
1915 05-24 Enhancing Training Data Attribution with Representational Optimization Verbesserung der Schulungsdatenzuweisung mit repräsentativer Optimierung 提高培训数据分配,优化代表性 2505.18513v1 -
1916 05-24 AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking AcuRank: Ungewissheits-Bewusst-Adaptive-Computation für Listwise-Reranking AcuRank: 列表排序的不确定性- 软件适应性计算 2505.18512v1 -
1917 05-24 SPDEBench: An Extensive Benchmark for Learning Regular and Singular Stochastic PDEs SPDEBench: Ein umfassender Benchmark für das Lernen regelmäßiger und singulärer stochastischer PDEs SPDEBENCH: 定期学习和单声速学项目的广泛基准 2505.18511v1 -
1918 05-24 How Particle System Theory Enhances Hypergraph Message Passing Wie Partikelsystemtheorie die Hypergraph-Nachricht verbessert 粒子系统理论如何增强超光速消息传递 2505.18505v1 -
1919 05-24 Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks Repräsentationslernen mit gegenseitigem Einfluss von Modalitäten für die Knotenklassifikation in multimodalen Heterogenen Netzwerken 多模式不同形式网络节点分类方式相互影响,代表学习 2505.07895v2 -
1920 05-24 LiDAR-EDIT: LiDAR Data Generation by Editing the Object Layouts in Real-World Scenes LiDAR-EDIT: LiDAR-Datenerstellung durch Bearbeiten der Objektlayouts in realen Szenen LiDAR-EDIT:通过在真实世界景点中编辑对象布局生成LIDAR数据 2412.00592v3 -
1921 05-24 EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents EscapeBench: Auf dem Weg zu mehr kreativer Intelligenz von Sprachmodell-Agenten 逃避:努力推进语言示范代理的创意智能 2412.13549v2 -
1922 05-24 Perception-Informed Neural Networks: Beyond Physics-Informed Neural Networks Wahrnehmungs-informierte neurale Netzwerke: Jenseits physikinformierter neuraler Netzwerke 感知内化神经网络:超越物理内化神经网络 2505.03806v2 -
1923 05-24 Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection Gruppenadaptive Schwellenoptimierung für robuste KI-generierte Texterkennung 强力AI-发光的文本探测的集团-适应性阈值优化 2502.04528v4 -
1924 05-24 Knowledge Grafting of Large Language Models Wissen Graften von großen Sprachmodellen 大语言模式知识转让 2505.18502v1 -
1925 05-24 MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning MENTOR: Mixture-of-Experts-Netzwerk mit Task-Oriented Perturbation für visuelles Verstärkungslernen INTOOR: 视力强化学习中以任务为导向的干扰干扰模拟专家网络 2410.14972v2 -
1926 05-24 G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning G1: LLMs zur Vernunft bringen bei Diagrammen mit Verstärkungslernen G1:在加强学习的图表方面向理性者传授法学硕士 2505.18499v1 -
1927 05-24 Quantum Feature Space of a Qubit Coupled to an Arbitrary Bath Quanten-Feature-Raum eines Qubits in Verbindung mit einem willkürlichen Bad 与任意浴室结合的Qubit夫妇的 量量地貌空间 2505.03397v3 -
1928 05-24 FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers FuseGPT: Lernbare Ebenen Fusion generativer vortrainierter Transformer FuseGPT: 训练前改造器的产生型先导变异器的可学习层融合 2411.14507v2 -
1929 05-24 Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking Beyond Masked and Unmasked: Diskrete Diffusion Models via Partial Masking 超越遮盖和无遮盖:通过部分遮盖分解扩散模型 2505.18495v1 -
1930 05-24 FedHL: Federated Learning for Heterogeneous Low-Rank Adaptation via Unbiased Aggregation FedHL: Föderiertes Lernen für heterogene Low-Rank-Anpassung durch unvoreingenommene Aggregation FFHL:通过无偏见的聚合体进行异种性、低兰克低差异适应的联邦学习 2505.18494v1 -
1931 05-24 TextArena TextArena TextArenna 文本 2504.11442v2 -
1932 05-24 Statistical Inference under Performativity Statistische Schlussfolgerung unter Performativität 性能下统计推断值 2505.18493v1 -
1933 05-24 Synthesizing and Adapting Error Correction Data for Mobile Large Language Model Applications Synchronisieren und Anpassen von Fehlerkorrekturdaten für mobile Großsprachen-Modellanwendungen 合成和调整移动大语言模型应用错误校正数据 2505.18488v1 -
1934 05-24 Grounding Bodily Awareness in Visual Representations for Efficient Policy Learning Bodily Bewusstsein in visuellen Darstellungen für effizientes politisches Lernen geerdet 提高政策学习效率的视觉表现方面的共同认识 2505.18487v1 -
1935 05-24 The Prompt is Mightier than the Example Die Aufforderung ist mächtiger als das Beispiel 火急比例子更强 2505.18485v1 -
1936 05-24 DiffPuter: Empowering Diffusion Models for Missing Data Imputation DiffPuter: Empowering Diffusion Modelle für fehlende Daten-Imputation DiffPuter:赋予缺失数据计算传播模型权力 2405.20690v2 -
1937 05-24 Change Point Detection in the Frequency Domain with Statistical Reliability Punkterkennung im Frequenzbereich mit statistischer Zuverlässigkeit ändern 具有统计可靠性的频率域的更改点探测 2502.03062v2 -
1938 05-24 Sigmoid Self-Attention has Lower Sample Complexity than Softmax Self-Attention: A Mixture-of-Experts Perspective Sigmoid-Selbstaufmerksamkeit hat eine geringere Probenkomplexität als Softmax-Selbstaufmerksamkeit: Eine Mischung aus Experten-Perspektive 与 Softmax自觉:混合专家视角相比,Sigmoid自觉的样本复杂性较低。 2502.00281v2 -
1939 05-24 Provably Robust Training of Quantum Circuit Classifiers Against Parameter Noise Wahrscheinlich robustes Training von Quantum Circuit Klassifikatoren gegen Parametergeräusche 针对参数噪音的量子电路分级器的可证实的强力培训 2505.18478v1 -
1940 05-24 CAPE: Covariate-Adjusted Pre-Training for Generalized Epidemic Time Series Forecasting CAPE: Kovariat-adjustierte Vorschulung für generalisierte epidemische Zeitreihen CAPE: 通用流行病时间序列预测共同调整前培训 2502.03393v3 -
1941 05-24 Using Large Language Models to Tackle Fundamental Challenges in Graph Learning: A Comprehensive Survey Große Sprachmodelle nutzen, um grundlegende Herausforderungen im Graphenlernen zu bewältigen: Eine umfassende Umfrage 使用大语言模式应对图表学习中的基本挑战:全面调查 2505.18475v1 -
1942 05-24 Performance and Generalizability Impacts of Incorporating Geolocation into Deep Learning for Dynamic PM2.5 Estimation Leistung und Verallgemeinerbarkeit Auswirkungen der Einbeziehung von Geolocation in Deep Learning für dynamische PM2.5 Abschätzung 将地理定位纳入深入学习以进行动态PP2.5估算的绩效和通用性影响 2505.18461v1 -
1943 05-24 EdgeAgentX: A Novel Framework for Agentic AI at the Edge in Military Communication Networks EdgeAgentX: Ein neuartiges Framework für Agentische KI am Rand in militärischen Kommunikationsnetzwerken EdgeAgengengenderX:军事通信网络边缘地带AAA剂性AI新框架 2505.18457v1 -
1944 05-24 On the Limitations and Possibilities of Nash Regret Minimization in Zero-Sum Matrix Games under Noisy Feedback Über die Einschränkungen und Möglichkeiten der Nash Regret Minimierung in Zero-Sum Matrix Games unter Noisy Feedback 根据噪音反馈在零-苏姆母体运动会中尽量减少纳什迟缓的限制和可能性 2306.13233v3 -
1945 05-24 Reinforcement Learning for Stock Transactions Verstärkungslernen für Aktientransaktionen 证券交易强化学习 2505.16099v2 -
1946 05-24 Anchored Diffusion Language Model Verankertes Diffusions-Sprachenmodell 原成品的传播语言模式 2505.18456v1 -
1947 05-24 On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts Zur Minimax-Abschätzung von Parametern in Softmax-kontaminierter Mischung von Experten 关于Softmax 被污染的专家混合体参数最小估计 2505.18455v1 -
1948 05-24 $μ$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts $μ$-MoE: Test-Time Pruning als Mikro-Grained Mixture-of-Experts 美元-MoE:作为微粒混合剂专家进行试验时休整 2505.18451v1 -
1949 05-24 Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting Breaking Silos: Adaptive Modellfusion löst bessere Zeitreihen voraus 破碎硅:适应性模型融合解锁更好的时间序列预测 2505.18442v1 -
1950 05-24 DB-KSVD: Scalable Alternating Optimization for Disentangling High-Dimensional Embedding Spaces DB-KSVD: Skalierbare alternierende Optimierung für das Entwirren hochdimensionaler Einbettungsräume DB-KSVD: 拆分高多元嵌入空间的可缩放变换最佳优化 2505.18441v1 -
1951 05-24 Finite-Time Global Optimality Convergence in Deep Neural Actor-Critic Methods for Decentralized Multi-Agent Reinforcement Learning Finite-Time Global Optimality Convergence in Deep Neural Actor-Critic Methoden für dezentralisiertes Mehr-Agenten-Verstärkungs-Lernen 分散式多机构强化学习的深神经立体-集中式多机构强化学习方法中全球最佳程度趋同 2505.18433v1
Article 0
Title@2025-05-29 (4): From Chat Logs to Collective Insights: Aggregative Question Answering
Title: From Chat Logs to Collective Insights: Aggregative Question Answering | Von Chat Logs zu Collective Insights: Aggregative Question Answering | 从聊天日志到集体透视:聚合问题解答 2505.23765v1 |
Authors: Wentao Zhang, Woojeong Kim, Yuntian Deng
Conversational agents powered by large language models (LLMs) are rapidly becoming integral to our daily interactions, generating unprecedented amounts of conversational data. Such datasets offer a powerful lens into societal interests, trending topics, and collective concerns. Yet, existing approaches typically treat these interactions as independent and miss critical insights that could emerge from aggregating and reasoning across large-scale conversation logs. In this paper, we introduce Aggregative Question Answering, a novel task requiring models to reason explicitly over thousands of user-chatbot interactions to answer aggregative queries, such as identifying emerging concerns among specific demographics. To enable research in this direction, we construct a benchmark, WildChat-AQA, comprising 6,027 aggregative questions derived from 182,330 real-world chatbot conversations. Experiments show that existing methods either struggle to reason effectively or incur prohibitive computational costs, underscoring the need for new approaches capable of extracting collective insights from large-scale conversational data.
由大型语言模型(LLMs)驱动的交汇代理机构正在迅速成为我们日常互动的有机组成部分,产生前所未有的对话数据数量。这类数据集为社会利益、趋势话题和集体关注提供了强大的透镜。然而,现有方法通常将这些互动视为独立和缺乏从大规模对话日志的汇总和推理中可能产生的关键洞察力。在本文中,我们引入了聚合问题回答,这是一项新颖的任务,要求模型明确解释数千个用户-聊天机器人互动,以解答聚合问题,例如确定特定人口群中新出现的关切问题。为了能够进行这方面的研究,我们建立了一个基准,即WildChat-AQA,由182,330个实时聊天室对话产生的6,027个汇总问题组成。实验表明,现有的方法要么是试图有效解释,要么是产生令人望而望而却望而却步的计算成本,这突出表明需要采取新的方法,能够从大规模对话数据中获取集体见解。
Article 1
Title@2025-05-29 (4): Differential Information: An Information-Theoretic Perspective on Preference Optimization
Title: Differential Information: An Information-Theoretic Perspective on Preference Optimization | Differentialinformation: Eine informationstheoretische Perspektive zur Preference-Optimierung | 差别信息:关于首选优化的信息理论观点 2505.23761v1 |
Authors: Yunjae Won, Hyunji Lee, Hyeonbin Hwang, Minjoon Seo
Direct Preference Optimization (DPO) has become a standard technique for aligning language models with human preferences in a supervised manner. Despite its empirical success, the theoretical justification behind its log-ratio reward parameterization remains incomplete. In this work, we address this gap by utilizing the Differential Information Distribution (DID): a distribution over token sequences that captures the information gained during policy updates. First, we show that when preference labels encode the differential information required to transform a reference policy into a target policy, the log-ratio reward in DPO emerges as the uniquely optimal form for learning the target policy via preference optimization. This result naturally yields a closed-form expression for the optimal sampling distribution over rejected responses. Second, we find that the condition for preferences to encode differential information is fundamentally linked to an implicit assumption regarding log-margin ordered policies, an inductive bias widely used in preference optimization yet previously unrecognized. Finally, by analyzing the entropy of the DID, we characterize how learning low-entropy differential information reinforces the policy distribution, while high-entropy differential information induces a smoothing effect, which explains the log-likelihood displacement phenomenon. We validate our theoretical findings in synthetic experiments and extend them to real-world instruction-following datasets. Our results suggest that learning high-entropy differential information is crucial for general instruction-following, while learning low-entropy differential information benefits knowledge-intensive question answering. Overall, our work presents a unifying perspective on the DPO objective, the structure of preference data, and resulting policy behaviors through the lens of differential information.
直接偏好优化(DPO)已成为以监督方式使语言模式与人类偏好相一致的一种标准技术。尽管它取得了经验上的成功,但其日志-鼠标奖励参数的理论理由仍然不完整。在这项工作中,我们通过使用差异信息分布(DID):在象征性序列上分配,捕捉政策更新过程中获得的信息。首先,我们表明,当偏爱标签将将参考政策转化为目标政策所需的差异信息编码成一个目标政策时,DPO的正轨偏差奖励将成为通过偏好优化学习目标政策的独特最佳形式。这自然产生一种封闭式的表达形式,用于最佳抽样分布,而不是被拒绝的答复。第二,我们发现,对差异信息进行编码的偏好与一个隐含的假设从根本上联系在一起,即对在政策更新政策更新过程中广泛使用的政策偏差分配。最后,我们通过分析数据变现的精度,我们如何学习低偏差的视角加强了政策分布,而高偏差信息则带来一种顺畅的效果,这解释了结果的正统化的理论性分析结果,同时,我们学习了我们关于数据流化数据流化的演化的理论性分析,从而验证了我们的数据。
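To make the log-ratio reward parameterization mentioned in this abstract concrete, the following is a minimal sketch of the standard DPO implicit reward and pairwise loss. The function names, the default `beta`, and the scalar log-probabilities are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def log_ratio_reward(logp_policy, logp_ref, beta=0.1):
    """Implicit DPO reward: beta * (log pi(y|x) - log pi_ref(y|x))."""
    return beta * (logp_policy - logp_ref)

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Pairwise DPO loss for one (chosen, rejected) pair.

    logp_w / logp_l are policy log-probabilities of the chosen and
    rejected responses; ref_* are the same under the reference policy.
    """
    margin = (log_ratio_reward(logp_w, ref_logp_w, beta)
              - log_ratio_reward(logp_l, ref_logp_l, beta))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference, the margin is zero and the loss is log 2; raising the chosen response's log-probability relative to the reference lowers the loss.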
Article 2
Title@2025-05-29 (4): Model Immunization from a Condition Number Perspective
Title: Model Immunization from a Condition Number Perspective | Modell Immunisierung aus einem Zustand Anzahl Perspektive | 从条件数字角度进行示范免疫 2505.23760v1 |
Authors: Amber Yijia Zheng, Cedar Site Bai, Brian Bullins, Raymond A. Yeh
Model immunization aims to pre-train models that are difficult to fine-tune on harmful tasks while retaining their utility on other non-harmful tasks. Though prior work has shown empirical evidence for immunizing text-to-image models, the key understanding of when immunization is possible and a precise definition of an immunized model remain unclear. In this work, we propose a framework, based on the condition number of a Hessian matrix, to analyze model immunization for linear models. Building on this framework, we design an algorithm with regularization terms to control the resulting condition numbers after pre-training. Empirical results on linear models and non-linear deep-nets demonstrate the effectiveness of the proposed algorithm on model immunization. The code is available at https://github.com/amberyzheng/model-immunization-cond-num.
示范免疫旨在为难以微调有害任务,同时保留其用于其他非有害任务的训练前模型。虽然先前的工作已经表明对文本到图像模型进行免疫的经验证据,但对于何时可能进行免疫的关键理解以及对免疫模式的准确定义仍然不明确。在这项工作中,我们提议了一个框架,以赫森矩阵的条件编号为基础,分析线性模型的免疫模式。在这个框架的基础上,我们设计了一个算法,以规范条款控制培训前产生的条件数字。线性模型和非线性深网的经验结果显示了拟议模式免疫算法的有效性。该代码可在https://github.com/amberyzheng/model-immunization-cond-num上查阅。
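The role of the Hessian condition number for linear models can be illustrated with a small sketch. It assumes a two-feature least-squares model, whose Hessian is X^T X / n, and uses the closed-form eigenvalues of a symmetric 2x2 matrix. The helper names are hypothetical and this is not the paper's algorithm; it only shows the quantity the framework is built around. A larger condition number generally means slower gradient-based fitting on that task.

```python
import math

def hessian_linear_least_squares(X):
    """Hessian of (1/2n) * ||Xw - y||^2 with respect to w, i.e. X^T X / n.

    X is a list of rows with exactly two features. Returns (a, b, d),
    representing the symmetric matrix [[a, b], [b, d]].
    """
    n = len(X)
    a = sum(r[0] * r[0] for r in X) / n
    b = sum(r[0] * r[1] for r in X) / n
    d = sum(r[1] * r[1] for r in X) / n
    return a, b, d

def condition_number_2x2(a, b, d):
    """Ratio of largest to smallest eigenvalue of [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b * b)
    lam_max, lam_min = mean + radius, mean - radius
    # A singular (or numerically non-positive) Hessian has infinite condition number.
    return math.inf if lam_min <= 0 else lam_max / lam_min
```

For example, rescaling one feature by a factor of 2 multiplies the corresponding Hessian eigenvalue by 4, which is how feature geometry can make a task easier or harder to fine-tune on.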
Article 3
Title@2025-05-29 (4): Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint
Title: Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint | Puzzlet von Puzzles: Wenn Vision-Language-Modelle keinen Hinweis aufnehmen können | 由谜题拼取的谜题: 当视觉语言模型无法使用提示时 2505.23759v1 |
Authors: Heekyung Lee, Jiaxin Ge, Tsung-Han Wu, Minwoo Kang, Trevor Darrell, David M. Chan
Rebus puzzles, visual riddles that encode language through imagery, spatial arrangement, and symbolic substitution, pose a unique challenge to current vision-language models (VLMs). Unlike traditional image captioning or question answering tasks, rebus solving requires multi-modal abstraction, symbolic reasoning, and a grasp of cultural, phonetic and linguistic puns. In this paper, we investigate the capacity of contemporary VLMs to interpret and solve rebus puzzles by constructing a hand-generated and annotated benchmark of diverse English-language rebus puzzles, ranging from simple pictographic substitutions to spatially-dependent cues (“head” over “heels”). We analyze how different VLMs perform, and our findings reveal that while VLMs exhibit some surprising capabilities in decoding simple visual clues, they struggle significantly with tasks requiring abstract reasoning, lateral thinking, and understanding visual metaphors.
通过图像、空间安排和符号替代将语言编码成像的Rebus 拼图、视觉拼图、视觉拼图,对当前的视觉语言模型(VLM)构成了独特的挑战。 与传统的图像字幕或问答任务不同,变复解决需要多式抽象、象征性推理以及掌握文化、语音和语言标语。 在本文中,我们调查当代VLMs通过构建一个手动生成的和附加注释的多种英语复交拼图的基准来解释和解决变现拼图的能力,从简单的图像替代到空间依赖的提示(“头”到“耳目 ” 。 我们分析了不同的VLMs是如何运作的,我们的发现表明,虽然VLMs在解码简单的视觉线索方面表现出一些惊人的能力,但是他们与需要抽象推理、横向思维和理解视觉比喻的任务进行了巨大的斗争。
Article 4
Title@2025-05-29 (4): REOrdering Patches Improves Vision Models
Title: REOrdering Patches Improves Vision Models | REOrdering Patches verbessert Vision Modelle | 重新排列补丁改进愿景模式 2505.23751v1 |
Authors: Declan Kutscher, David M. Chan, Yutong Bai, Trevor Darrell, Ritwik Gupta
Sequence models such as transformers require inputs to be represented as one-dimensional sequences. In vision, this typically involves flattening images using a fixed row-major (raster-scan) order. While full self-attention is permutation-equivariant, modern long-sequence transformers increasingly rely on architectural approximations that break this invariance and introduce sensitivity to patch ordering. We show that patch order significantly affects model performance in such settings, with simple alternatives like column-major or Hilbert curves yielding notable accuracy shifts. Motivated by this, we propose REOrder, a two-stage framework for discovering task-optimal patch orderings. First, we derive an information-theoretic prior by evaluating the compressibility of various patch sequences. Then, we learn a policy over permutations by optimizing a Plackett-Luce policy using REINFORCE. This approach enables efficient learning in a combinatorial permutation space. REOrder improves top-1 accuracy over row-major ordering on ImageNet-1K by up to 3.01% and Functional Map of the World by 13.35%.
变压器等序列模型需要输入作为一维序列。 在视觉中, 这通常涉及使用固定的行主( raster- scan) 排序平整图像。 虽然完全自省是异位的, 现代长序变压器越来越依赖建筑近似, 打破这种偏差并引入修补顺序的灵敏度。 我们显示补丁顺序会大大影响这种环境中的模型性能, 简单替代物, 如列主或希尔伯特曲线, 产生显著的精确性变。 我们为此提议了 REODER, 是一个发现任务优化补丁排序的两阶段框架。 首先, 我们先通过评估各种补差序列的可压缩性来获得信息理论性。 然后, 我们通过利用 REINFORCE 优化 Plackett-Luce 政策来学习一种对调的政策。 这种方法可以使调色空间中的有效学习。 REOrorder 能够提高图像Net-1 K 的顶级精度, 最高为3. 01% , 和 World 地图 13. 35% 。
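The two ingredients named in this abstract, fixed patch orderings and a Plackett-Luce distribution over permutations, can be sketched as follows. This is an illustrative toy under stated assumptions: patches are indexed row-major as `r * w + c`, the scores are free parameters, and the function names are hypothetical. The REINFORCE training loop and the compressibility prior are not shown.

```python
import math
import random

def row_major(h, w):
    """Patch indices visited row by row (raster scan) on an h x w grid."""
    return [r * w + c for r in range(h) for c in range(w)]

def column_major(h, w):
    """The same patch indices visited column by column."""
    return [r * w + c for c in range(w) for r in range(h)]

def sample_plackett_luce(scores, rng):
    """Sample a permutation from a Plackett-Luce model.

    Items are drawn without replacement; at each step an unused item i is
    chosen with probability proportional to exp(scores[i]).
    """
    remaining = list(range(len(scores)))
    order = []
    while remaining:
        weights = [math.exp(scores[i]) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)
        order.append(pick)
    return order
```

A policy over orderings would adjust `scores` so that permutations leading to higher downstream accuracy become more likely; items with much larger scores tend to be placed earlier in the sampled order.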
Article 5
Title@2025-05-29 (4): Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?
Title: Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences? | Verzerrung der AI Alignment: Optimiert Preference Optimization für Preferences? | AI对齐的扭曲:偏好优化是否优化优惠? 2505.23749v1 |
Authors: Paul Gölz, Nika Haghtalab, Kunhe Yang
After pre-training, large language models are aligned with human preferences based on pairwise comparisons. State-of-the-art alignment methods (such as PPO-based RLHF and DPO) are built on the assumption of aligning with a single preference model, despite being deployed in settings where users have diverse preferences. As a result, it is not even clear that these alignment methods produce models that satisfy users on average – a minimal requirement for pluralistic alignment. Drawing on social choice theory and modeling users’ comparisons through individual Bradley-Terry (BT) models, we introduce an alignment method’s distortion: the worst-case ratio between the optimal achievable average utility, and the average utility of the learned policy. The notion of distortion helps draw sharp distinctions between alignment methods: Nash Learning from Human Feedback achieves the minimax optimal distortion of $(\frac{1}{2} + o(1)) \cdot \beta$ (for the BT temperature $\beta$), robustly across utility distributions, distributions of comparison pairs, and permissible KL divergences from the reference policy. RLHF and DPO, by contrast, suffer $\geq (1 - o(1)) \cdot \beta$ distortion already without a KL constraint, and $e^{\Omega(\beta)}$ or even unbounded distortion in the full setting, depending on how comparison pairs are sampled.
预训练之后,大型语言模型会基于成对比较与人类偏好对齐。最先进的对齐方法(如基于PPO的RLHF和DPO)建立在与单一偏好模型对齐的假设之上,尽管它们被部署在用户偏好各异的环境中。因此,甚至不清楚这些对齐方法得到的模型能否在平均意义上让用户满意,而这是多元对齐的最低要求。借鉴社会选择理论,并通过个体Bradley-Terry(BT)模型对用户的比较进行建模,我们引入了对齐方法的扭曲度(distortion):可实现的最优平均效用与所学策略的平均效用之间的最坏情况比值。扭曲度的概念有助于在对齐方法之间划出清晰的界限:Nash Learning from Human Feedback 达到了极小极大最优的扭曲度 $(\frac{1}{2} + o(1)) \cdot \beta$(其中 $\beta$ 为BT温度),并且对效用分布、比较对的分布以及相对于参考策略所允许的KL散度都具有鲁棒性。相比之下,RLHF和DPO在没有KL约束时就已有 $\geq (1 - o(1)) \cdot \beta$ 的扭曲度,而在完整设定下,其扭曲度为 $e^{\Omega(\beta)}$ 甚至无界,具体取决于比较对的采样方式。
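The ratio underlying the distortion notion can be illustrated on a single toy instance. The sketch below assumes a finite set of responses, a utility table with one row per user, and no KL constraint, so the best average utility is attained by a single response. The names are hypothetical, and the worst case over instances that defines distortion in the abstract is not computed here.

```python
def average_utility(policy, utilities):
    """Mean over users of the expected utility under `policy`.

    policy: probabilities over responses (sums to 1).
    utilities: one row per user, one utility value per response.
    """
    per_user = [sum(p * u for p, u in zip(policy, user)) for user in utilities]
    return sum(per_user) / len(per_user)

def utility_ratio(policy, utilities):
    """Optimal average utility divided by the policy's average utility.

    Average utility is linear in the policy, so without further constraints
    the optimum is reached by always giving the single best response.
    """
    n_users = len(utilities)
    best = max(sum(user[j] for user in utilities) / n_users
               for j in range(len(policy)))
    return best / average_utility(policy, utilities)
```

With two of three users preferring the first response, a uniform policy has ratio 4/3, while always giving the first response has ratio 1.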
Article 6
Title@2025-05-29 (4): Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Title: Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence | Raum-MLLM: Steigerung der MLLM-Kapazitäten in visueller räumlicher Intelligenz | 空间-MLLM:增强以视觉为基础的空间情报中的MLLM能力 2505.23747v1 |
Authors: Diankun Wu, Fangfu Liu, Yi-Hsin Hung, Yueqi Duan
Recent advancements in Multimodal Large Language Models (MLLMs) have significantly enhanced performance on 2D visual tasks. However, improving their spatial intelligence remains a challenge. Existing 3D MLLMs always rely on additional 3D or 2.5D data to incorporate spatial awareness, restricting their utility in scenarios with only 2D inputs, such as images or videos. In this paper, we present Spatial-MLLM, a novel framework for visual-based spatial reasoning from purely 2D observations. Unlike conventional video MLLMs which rely on CLIP-based visual encoders optimized for semantic understanding, our key insight is to unleash the strong structure prior from the feed-forward visual geometry foundation model. Specifically, we propose a dual-encoder architecture: a pretrained 2D visual encoder to extract semantic features, and a spatial encoder, initialized from the backbone of the visual geometry model, to extract 3D structure features. A connector then integrates both features into unified visual tokens for enhanced spatial understanding. Furthermore, we propose a space-aware frame sampling strategy at inference time, which selects the spatially informative frames of a video sequence, ensuring that even under limited token length, the model focuses on frames critical for spatial reasoning. Beyond architecture improvements, we construct the Spatial-MLLM-120k dataset and train the model on it using supervised fine-tuning and GRPO. Extensive experiments on various real-world datasets demonstrate that Spatial-MLLM achieves state-of-the-art performance in a wide range of visual-based spatial understanding and reasoning tasks. Project page: https://diankun-wu.github.io/Spatial-MLLM/.
多模态大语言模型(MLLM)的最新进展显著提升了其在二维视觉任务上的表现,但提升其空间智能仍是一项挑战。现有的三维MLLM总是依赖额外的3D或2.5D数据来获得空间感知,这限制了它们在只有图像或视频等二维输入的场景中的实用性。本文提出Spatial-MLLM,这是一个仅凭二维观测进行基于视觉的空间推理的新框架。传统的视频MLLM依赖为语义理解而优化的、基于CLIP的视觉编码器;与之不同,我们的关键思路是释放前馈式视觉几何基础模型中强大的结构先验。具体而言,我们提出一种双编码器架构:一个预训练的二维视觉编码器用于提取语义特征,一个由视觉几何模型主干初始化的空间编码器用于提取三维结构特征。随后,一个连接器将两类特征整合为统一的视觉token,以增强空间理解。此外,我们提出一种在推理阶段使用的空间感知帧采样策略,用于挑选视频序列中富含空间信息的帧,从而确保即使在token长度受限的情况下,模型也能聚焦于对空间推理至关重要的帧。除架构改进之外,我们还构建了Spatial-MLLM-120k数据集,并使用监督微调和GRPO在其上训练模型。在多个真实世界数据集上的大量实验表明,Spatial-MLLM在广泛的基于视觉的空间理解与推理任务中取得了最先进的性能。项目主页:https://diankun-wu.github.io/Spatial-MLLM/
Article 7
Title@2025-05-29 (4): To Trust Or Not To Trust Your Vision-Language Model’s Prediction
Title: To Trust Or Not To Trust Your Vision-Language Model’s Prediction | Vertrauen oder nicht Vertrauen in die Vorhersage Ihres Vision-Sprache-Modells | 相信或不相信你的视觉语言模型的预测 2505.23745v1 |
Authors: Hao Dong, Moru Liu, Jian Liang, Eleni Chatzi, Olga Fink
Vision-Language Models (VLMs) have demonstrated strong capabilities in aligning visual and textual modalities, enabling a wide range of applications in multimodal understanding and generation. While they excel in zero-shot and transfer learning scenarios, VLMs remain susceptible to misclassification, often yielding confident yet incorrect predictions. This limitation poses a significant risk in safety-critical domains, where erroneous predictions can lead to severe consequences. In this work, we introduce TrustVLM, a training-free framework designed to address the critical challenge of estimating when VLM’s predictions can be trusted. Motivated by the observed modality gap in VLMs and the insight that certain concepts are more distinctly represented in the image embedding space, we propose a novel confidence-scoring function that leverages this space to improve misclassification detection. We rigorously evaluate our approach across 17 diverse datasets, employing 4 architectures and 2 VLMs, and demonstrate state-of-the-art performance, with improvements of up to 51.87% in AURC, 9.14% in AUROC, and 32.42% in FPR95 compared to existing baselines. By improving the reliability of the model without requiring retraining, TrustVLM paves the way for safer deployment of VLMs in real-world applications. The code will be available at https://github.com/EPFL-IMOS/TrustVLM.
视觉语言模型(VLM)在调和视觉和文字模型(VLM)方面展示了强大的能力,使视觉和文字模型(VLM)能够适应多种多式理解和生成的多种应用。虽然VLM在零射和传输学习情景中表现优异,但它们仍然容易被错误分类,往往产生自信但不正确的预测。这种限制在安全关键领域构成了巨大的风险,错误预测可能导致严重后果。在这项工作中,我们引入了信任VLM(VLLM),这是一个没有培训的框架,旨在应对在VLM预测可以信任时进行估算的重大挑战。受到VLMS中观察到的模式差距的激励,以及一些概念在图像嵌入空间中更明显地代表了某些概念的洞察力。我们提出一个新的信任分级功能,利用这一空间来改进对错误分类的检测。我们在17个不同的数据集中严格评价我们的方法,使用4个架构和2 VLMM(VLM),并展示最先进的表现,在AURC的51.87%、AURO(9.14%)和FPRM(M)中的32.42%(FM)比现有的基准更安全的部署要更加可靠。
Article 8
Title@2025-05-29 (4): On the Convergence Analysis of Muon
Title: On the Convergence Analysis of Muon | Zur Konvergenzanalyse von Muon | Muon的趋同分析 2505.23737v1 |
Authors: Wei Shen, Ruichuan Huang, Minhui Huang, Cong Shen, Jiawei Zhang
The majority of parameters in neural networks are naturally represented as matrices. However, most commonly used optimizers treat these matrix parameters as flattened vectors during optimization, potentially overlooking their inherent structural properties. Recently, an optimizer called Muon has been proposed, specifically designed to optimize matrix-structured parameters. Extensive empirical evidence shows that Muon can significantly outperform traditional optimizers when training neural networks. Nonetheless, the theoretical understanding of Muon’s convergence behavior and the reasons behind its superior performance remain limited. In this work, we present a comprehensive convergence rate analysis of Muon and its comparison with Gradient Descent (GD). We further characterize the conditions under which Muon can outperform GD. Our theoretical results reveal that Muon can benefit from the low-rank and approximate blockwise diagonal structure of Hessian matrices – phenomena widely observed in practical neural network training. Our experimental results support and corroborate the theoretical findings.
神经网络中的大多数参数自然以矩阵形式呈现。然而,最常用的优化器将这些矩阵参数视为优化过程中的平坦矢量,可能忽略了它们固有的结构特性。最近,提出了称为Muon的优化器,专门设计以优化矩阵结构参数。广泛的实证证据表明,Muon在培训神经网络时可以大大优于传统优化器。然而,对Muon趋同行为的理论理解及其优异性能背后的原因仍然有限。在这项工作中,我们对Muon的趋同率进行了全面分析,并与GD进行了比较。我们进一步确定了Muon能够超越GD的条件。我们的理论结果表明,Muon可以受益于海珊矩阵的低端和近乎成块的对角结构,这是在实际神经网络培训中广泛观察到的现象。我们的实验结果支持和证实了理论结论。
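For readers unfamiliar with Muon, the sketch below shows the general idea as it is commonly described: the momentum matrix is replaced by an approximately orthogonalized version before the weight update. This is an assumption-laden illustration, not the analysis in the paper. It uses the classical cubic Newton-Schulz iteration rather than the tuned polynomial found in public implementations, plain nested lists instead of tensors, and hypothetical names and default hyperparameters.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def orthogonalize(M, steps=30):
    """Approximate the orthogonal polar factor of M (assumed full rank)
    with cubic Newton-Schulz iterations X <- 1.5 X - 0.5 X X^T X."""
    norm = math.sqrt(sum(x * x for row in M for x in row))
    if norm == 0.0:
        return [row[:] for row in M]
    # Dividing by the Frobenius norm puts all singular values in (0, 1].
    X = [[x / norm for x in row] for row in M]
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)]
             for rx, ry in zip(X, XXtX)]
    return X

def muon_like_step(W, grad, momentum, lr=0.02, beta=0.95):
    """One simplified update: accumulate momentum, orthogonalize, descend."""
    momentum = [[beta * m + g for m, g in zip(rm, rg)]
                for rm, rg in zip(momentum, grad)]
    update = orthogonalize(momentum)
    W = [[w - lr * u for w, u in zip(rw, ru)] for rw, ru in zip(W, update)]
    return W, momentum
```

The orthogonalized update treats the parameter as a matrix rather than a flattened vector, which is the structural property the abstract contrasts with conventional optimizers.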
Article 9
Title@2025-05-29 (4): EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast
Title: EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast | EmotionRankCLAP: Bridging Natural Language Speaking Styles und Ordinal Speech Emotion via Rank-N-Contrast | 情感-RankCLAP:通过Ran-N-Contrast将自然语言语言语言的口语风格和普通语言的情感联系起来 2505.23732v1 |
Authors: Shreeram Suresh Chandra, Lucas Goncalves, Junchen Lu, Carlos Busso, Berrak Sisman
Current emotion-based contrastive language-audio pretraining (CLAP) methods typically learn by naively aligning audio samples with corresponding text prompts. Consequently, this approach fails to capture the ordinal nature of emotions, hindering inter-emotion understanding and often resulting in a wide modality gap between the audio and text embeddings due to insufficient alignment. To handle these drawbacks, we introduce EmotionRankCLAP, a supervised contrastive learning approach that uses dimensional attributes of emotional speech and natural language prompts to jointly capture fine-grained emotion variations and improve cross-modal alignment. Our approach utilizes a Rank-N-Contrast objective to learn ordered relationships by contrasting samples based on their rankings in the valence-arousal space. EmotionRankCLAP outperforms existing emotion-CLAP methods in modeling emotion ordinality across modalities, measured via a cross-modal retrieval task.
以情感为基础的对比性语言-语言-语言预培训(CLAP)方法通常通过“将音频样本与相应的文本提示相匹配”来学习。 因此,这一方法未能捕捉情绪的规律性,妨碍了情感之间的理解,并常常由于调整不力而导致音频和文字嵌入之间的模式差异很大。 为了处理这些缺陷,我们引入了情感-Rank-语言预培训(CLAP),这是一种监督式的对比性学习方法,它使用情感言语和自然语言的维性属性,促进共同捕捉细微的情感变异,改善跨模式的对齐。 我们的方法利用一个Rank-N-Contrast目标,通过对比其在价值-繁荣空间的排名来学习定型关系。 情感- CLAP(EMERRank-CAP)比现有的情感- CLAP方法在模式上超越了现有的情感-常态模式模型化方法,通过跨模式的检索任务来衡量。
Article 10
Title@2025-05-29 (4): Keep Everyone Happy: Online Fair Division of Numerous Items with Few Copies
Title: Keep Everyone Happy: Online Fair Division of Numerous Items with Few Copies | Halten Sie alle glücklich: Online Fair Division von zahlreichen Artikeln mit wenigen Kopien | 让人人快乐:许多物品的在线公平分会,只有很少的影印件。 2408.12845v2 |
Authors: Arun Verma, Indrajit Saha, Makoto Yokoo, Bryan Kian Hsiang Low
This paper considers a novel variant of the online fair division problem involving multiple agents in which a learner sequentially observes an indivisible item that has to be irrevocably allocated to one of the agents while satisfying a fairness and efficiency constraint. Existing algorithms assume a small number of items with a sufficiently large number of copies, which ensures a good utility estimation for all item-agent pairs from noisy bandit feedback. However, this assumption may not hold in many real-life applications, for example, an online platform that has a large number of users (items) who use the platform’s service providers (agents) only a few times (a few copies of items), which makes it difficult to accurately estimate utilities for all item-agent pairs. To address this, we assume utility is an unknown function of item-agent features. We then propose algorithms that model online fair division as a contextual bandit problem, with sub-linear regret guarantees. Our experimental results further validate the effectiveness of the proposed algorithms.
本文审议了在线公平分配问题的一个新变体,其中涉及多个代理商,学习者依次观察一个不可分割的项目,必须不可撤销地分配给其中的一个代理商,同时满足公平和效率方面的限制;现有的算法假设少数项目,其副本数量足够多,确保了对来自吵闹的土匪反馈的所有物品代理对的有用性评估;然而,这一假设可能在许多现实应用中并不具备,例如,一个使用平台服务提供商(代理商)的用户数量众多的在线平台(项目)只有几次(项目份数不多),因此难以准确估计所有物品代理商的公用事业。为了解决这一问题,我们假设物品代理商的功能是未知的。我们然后提出一种算法,将在线公平划分模式作为背景的土匪问题,并附带线性遗憾保证。我们的实验结果进一步验证了拟议算法的有效性。
Article 11
Title@2025-05-29 (4): MuLoCo: Muon is a practical inner optimizer for DiLoCo
Title: MuLoCo: Muon is a practical inner optimizer for DiLoCo | MuLoCo: Muon ist ein praktischer Innenoptimierer für DiLoCo | MuLoCo: Muon 是 DiLoCo 的实用内部优化器 2505.23725v1 |
Authors: Benjamin Thérien, Xiaolong Huang, Irina Rish, Eugene Belilovsky
DiLoCo is a powerful framework for training large language models (LLMs) under networking constraints with advantages for increasing parallelism and accelerator utilization in data center settings. Despite significantly reducing communication frequency, however, DiLoCo’s communication steps still involve all-reducing a complete copy of the model’s parameters. While existing works have explored ways to reduce communication in DiLoCo, the role of error feedback accumulators and the effect of the inner-optimizer on compressibility remain under-explored. In this work, we investigate the effectiveness of standard compression methods including Top-k sparsification and quantization for reducing the communication overhead of DiLoCo when paired with two local optimizers (AdamW and Muon). Our experiments on pre-training decoder-only transformer language models (LMs) reveal that leveraging Muon as the inner optimizer for DiLoCo along with an error-feedback accumulator allows the communicated delta to be aggressively compressed to 2 bits with next to no performance degradation. Crucially, MuLoCo (Muon inner optimizer DiLoCo) significantly outperforms DiLoCo while communicating 8x less and having identical memory complexity.
DILOCO是一个强大的框架,用于在网络制约下培训大型语言模型(LLMS),其优势在于增加在数据中心环境中的平行和加速利用。尽管通信频率显著降低,但DILOCO的通信步骤仍然涉及全面减少该模型参数的完整副本。虽然现有工作探索了减少DILOCO的通信的方法,但错误反馈累积器的作用以及内装节能器对压缩作用的影响仍然未得到充分探讨。在这项工作中,我们调查标准压缩方法的有效性,包括高空透析和量化,以减少DILOCO与两个当地优化器(AdamW和Muon)的通信管理费。我们的实验前训练只使用变压器语言模型(LMS)显示,利用Muon作为DILOCO的内部优化器,以及错误反馈累积器的作用,能够将传送的三角盘压缩到2位,而下一个是无性能退化。关键是, Muloco(MUon 内部优化存储器与DLOLO的复杂度小于DLO),大大超越了DLOC。
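Top-k sparsification with an error-feedback accumulator, two of the compression ingredients named in this abstract, can be sketched generically as below. The flat-list representation and the function names are assumptions for illustration; quantization, the DiLoCo outer loop, and the all-reduce itself are omitted.

```python
def top_k_sparsify(vec, k):
    """Keep the k largest-magnitude entries of vec and zero the rest."""
    keep = set(sorted(range(len(vec)), key=lambda i: abs(vec[i]),
                      reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]

def compress_with_error_feedback(delta, residual, k):
    """Compress a parameter delta while remembering what was dropped.

    The residual from earlier rounds is added back before sparsifying,
    and whatever is not sent this round becomes the new residual.
    """
    corrected = [d + r for d, r in zip(delta, residual)]
    sent = top_k_sparsify(corrected, k)
    new_residual = [c - s for c, s in zip(corrected, sent)]
    return sent, new_residual
```

Because dropped entries accumulate in the residual, a coordinate that is repeatedly too small to be sent will eventually grow large enough to be communicated, rather than being lost.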
Article 12
Title@2025-05-29 (4): SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA
Title: SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA | SC-LoRA: Ausbalancieren effizienter Feinsteuerung und Wissenserhaltung über Subraum-kontrainierte LoRA | SC-LORA:通过分空间训练LORA平衡高效微调和知识保护 2505.23724v1 |
Authors: Minrui Luo, Fuhang Kuang, Yu Wang, Zirui Liu, Tianxing He
Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), are indispensable for efficiently customizing Large Language Models (LLMs). However, vanilla LoRA suffers from slow convergence speed and knowledge forgetting problems. Recent studies have leveraged the power of designed LoRA initialization, to enhance the fine-tuning efficiency, or to preserve knowledge in the pre-trained LLM. However, none of these works can address the two cases at the same time. To this end, we introduce Subspace-Constrained LoRA (SC-LoRA), a novel LoRA initialization framework engineered to navigate the trade-off between efficient fine-tuning and knowledge preservation. We achieve this by constraining the output of trainable LoRA adapters in a low-rank subspace, where the context information of fine-tuning data is most preserved while the context information of preserved knowledge is least retained, in a balanced way. Such constraint enables the trainable weights to primarily focus on the main features of fine-tuning data while avoiding damaging the preserved knowledge features. We provide theoretical analysis on our method, and conduct extensive experiments including safety preservation and world knowledge preservation, on various downstream tasks. In our experiments, SC-LoRA succeeds in delivering superior fine-tuning performance while markedly diminishing knowledge forgetting, surpassing contemporary LoRA initialization methods.
高效定制大语言模型(LLMS)离不开低频调试(LORA)方法,特别是低频调试(LORA),这是高效定制大语言模型(LLM)不可或缺的。然而,Vanilla LoRA的趋同速度缓慢,知识被忽略了问题。最近的研究利用了设计Lora的初始化能力,提高了微调效率,或保留了预先培训的LLLMM的知识。然而,所有这些工程都没有能够同时处理这两个案例。为此,我们引入了子空间调试LORA(SC-LORA),这是一个新颖的LORA初始化框架,目的是在高效微调和知识保护之间实现取舍。我们通过限制低层亚空间可培训的LORA适应者的产出来实现这一目标,在低层亚空间中保留了微调数据的背景信息,而保留知识的背景信息最少,以平衡的方式保存。这种限制使得可训练的重度能够主要侧重于微调制调数据的主要特征,同时避免损害保存的知识特征。我们从理论角度分析了我们的方法,并进行了广泛的试验,在不断改进的下层级调整世界知识,同时进行广泛的试验,在不断改进后改进后,在改进后进行。
Article 13
Title@2025-05-29 (4): ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering
Title: ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering | ML-Agent: Verstärkung von LLM-Agenten für autonome Maschinenbautechnik | ML-代理:加强自动机械学习工程的LLM代理 2505.23723v1 |
Authors: Zexi Liu, Jingyi Chai, Xinyu Zhu, Shuo Tang, Rui Ye, Bo Zhang, Lei Bai, Siheng Chen
The emergence of large language model (LLM)-based agents has significantly advanced the development of autonomous machine learning (ML) engineering. However, most existing approaches rely heavily on manual prompt engineering, failing to adapt and optimize based on diverse experimental experiences. Focusing on this, for the first time, we explore the paradigm of learning-based agentic ML, where an LLM agent learns through interactive experimentation on ML tasks using online reinforcement learning (RL). To realize this, we propose a novel agentic ML training framework with three key components: (1) exploration-enriched fine-tuning, which enables LLM agents to generate diverse actions for enhanced RL exploration; (2) step-wise RL, which enables training on a single action step, accelerating experience collection and improving training efficiency; (3) an agentic ML-specific reward module, which unifies varied ML feedback signals into consistent rewards for RL optimization. Leveraging this framework, we train ML-Agent, driven by a 7B-sized Qwen-2.5 LLM for autonomous ML. Remarkably, despite being trained on merely 9 ML tasks, our 7B-sized ML-Agent outperforms the 671B-sized DeepSeek-R1 agent. Furthermore, it achieves continuous performance improvements and demonstrates exceptional cross-task generalization capabilities.
基于大语言模型(LLM)的智能体的出现显著推动了自主机器学习(ML)工程的发展。然而,现有方法大多严重依赖人工提示工程,无法根据多样的实验经验进行适应和优化。针对这一问题,我们首次探索了基于学习的智能体式ML范式,即LLM智能体利用在线强化学习(RL),通过在ML任务上的交互式实验来学习。为此,我们提出了一个新的智能体式ML训练框架,包含三个关键组成部分:(1)探索增强的微调,使LLM智能体能够生成多样的动作,以加强RL探索;(2)逐步RL,使训练可以在单个动作步上进行,从而加速经验收集并提高训练效率;(3)面向智能体式ML的奖励模块,将各种ML反馈信号统一为一致的奖励,用于RL优化。借助该框架,我们训练了ML-Agent,它由7B规模的Qwen-2.5 LLM驱动,用于自主ML。值得注意的是,尽管仅在9个ML任务上训练,我们的7B规模ML-Agent仍优于671B规模的DeepSeek-R1智能体。此外,它还能持续提升性能,并展现出出色的跨任务泛化能力。
Article 14
Title@2025-05-29 (4): Understanding and Mitigating Distribution Shifts For Machine Learning Force Fields
Title: Understanding and Mitigating Distribution Shifts For Machine Learning Force Fields | Verteilungsverschiebungen für maschinelle Lernkräfte verstehen und abmildern | 机器学习领域理解和缩小分布变化 2503.08674v2 |
Authors: Tobias Kreiman, Aditi S. Krishnapriyan
Machine Learning Force Fields (MLFFs) are a promising alternative to expensive ab initio quantum mechanical molecular simulations. Given the diversity of chemical spaces that are of interest and the cost of generating new data, it is important to understand how MLFFs generalize beyond their training distributions. In order to characterize and better understand distribution shifts in MLFFs, we conduct diagnostic experiments on chemical datasets, revealing common shifts that pose significant challenges, even for large foundation models trained on extensive data. Based on these observations, we hypothesize that current supervised training methods inadequately regularize MLFFs, resulting in overfitting and learning poor representations of out-of-distribution systems. We then propose two new methods as initial steps for mitigating distribution shifts for MLFFs. Our methods focus on test-time refinement strategies that incur minimal computational cost and do not use expensive ab initio reference labels. The first strategy, based on spectral graph theory, modifies the edges of test graphs to align with graph structures seen during training. Our second strategy improves representations for out-of-distribution systems at test-time by taking gradient steps using an auxiliary objective, such as a cheap physical prior. Our test-time refinement strategies significantly reduce errors on out-of-distribution systems, suggesting that MLFFs are capable of and can move towards modeling diverse chemical spaces, but are not being effectively trained to do so. Our experiments establish clear benchmarks for evaluating the generalization capabilities of the next generation of MLFFs. Our code is available at https://tkreiman.github.io/projects/mlff_distribution_shifts/.
机器学习力场(MLFFs)是取代昂贵的初始量子机械分子模拟的有希望的替代方案。鉴于化学空间的多样性,人们感兴趣的化学空间的多样性以及产生新数据的成本,我们必须了解MLFFs如何超越其培训分布范围加以概括。为了确定和更好地理解MLFFs的分布变化,我们对化学数据集进行诊断性实验,发现共同变化带来重大挑战,甚至对经过广泛数据培训的大型基础模型也是如此。根据这些观察,我们假设当前受监督的培训方法对MLFFs不够正规化,导致过度装配和学习超出分配系统的不良表现。我们随后提出了两种新方法,作为减缓MLFFs分布变化的初步步骤。我们的方法侧重于测试时间的改进战略,这些战略需要最低计算成本,而不使用昂贵的ab元素参考标签。第一个战略以光谱图表理论为基础,调整测试图表的边缘,使之与图表结构一致。我们第二个战略改进了在测试时间里程/空域模式系统外的配置结构,而不是在测试时间里程中有效地评估我们测得的缩缩缩缩的M-Lsalalalalalalal 战略,通过一个辅助目标,可以减少我们测测测测算系统。
Article 15
Title@2025-05-29 (4): DiffER: Categorical Diffusion for Chemical Retrosynthesis
Title: DiffER: Categorical Diffusion for Chemical Retrosynthesis | DiffER: Kategorische Diffusion für chemische Retrosynthese | DiffER: 化学复制合成的分类扩散 2505.23721v1 |
Authors: Sean Current, Ziqi Chen, Daniel Adu-Ampratwum, Xia Ning, Srinivasan Parthasarathy
Methods for automatic chemical retrosynthesis have found recent success through the application of models traditionally built for natural language processing, primarily through transformer neural networks. These models have demonstrated significant ability to translate between the SMILES encodings of chemical products and reactants, but are constrained as a result of their autoregressive nature. We propose DiffER, an alternative template-free method for retrosynthesis prediction in the form of categorical diffusion, which allows the entire output SMILES sequence to be predicted in unison. We construct an ensemble of diffusion models which achieves state-of-the-art performance for top-1 accuracy and competitive performance for top-3, top-5, and top-10 accuracy among template-free methods. We prove that DiffER is a strong baseline for a new class of template-free model, capable of learning a variety of synthetic techniques used in laboratory settings and outperforming a variety of other template-free methods on top-k accuracy metrics. By constructing an ensemble of categorical diffusion models with a novel length prediction component with variance, our method is able to approximately sample from the posterior distribution of reactants, producing results with strong metrics of confidence and likelihood. Furthermore, our analyses demonstrate that accurate prediction of the SMILES sequence length is key to further boosting the performance of categorical diffusion models.
通过应用传统上为自然语言处理而建造的模型,主要是通过变压器神经网络,自动化学复古法方法最近取得了成功。这些模型展示了在化学产品和反应器SMILES编码之间翻译化学产品和反应器SMILES编码的巨大能力,但因其自反性质而受到限制。我们提议DiffER,一种不使用模板的替代反转合成预测方法,即以绝对扩散的形式进行反转合成预测的替代方法,它使得整个输出SMILES序列能够以一致的方式预测。我们建造了一个综合的传播模型,能够达到顶层3、顶层5和顶层10级无模板方法的最先进的精确性能。我们证明,DiffER是新型无模板模型的强大基准,能够学习实验室环境中使用的各种合成技术,在顶层精确度度测量仪上,超过其他不使用模板的方法。通过构建一个带有新长度预测组件的绝对扩散模型,我们的方法能够从反应器的后端分布到顶端3级的竞争性性能,我们用精确度模型的精确度分析结果进一步展示。
Article 16
Title@2025-05-29 (4): COBRA: Contextual Bandit Algorithm for Ensuring Truthful Strategic Agents
Title: COBRA: Contextual Bandit Algorithm for Ensuring Truthful Strategic Agents | COBRA: Kontextueller Bandit-Algorithmus für die Sicherung wahrheitsgetreuer strategischer Agenten | COBRA: 确保真实战略媒介的背景土匪比重 2505.23720v1 |
Authors: Arun Verma, Indrajit Saha, Makoto Yokoo, Bryan Kian Hsiang Low
This paper considers a contextual bandit problem involving multiple agents, where a learner sequentially observes the contexts and the agent’s reported arms, and then selects the arm that maximizes the system’s overall reward. Existing work in contextual bandits assumes that agents truthfully report their arms, which is unrealistic in many real-life applications. For instance, consider an online platform with multiple sellers; some sellers may misrepresent product quality to gain an advantage, such as having the platform preferentially recommend their products to online users. To address this challenge, we propose an algorithm, COBRA, for contextual bandit problems involving strategic agents that disincentivize their strategic behavior without using any monetary incentives, while having incentive compatibility and a sub-linear regret guarantee. Our experimental results also validate the different performance aspects of our proposed algorithm.
本文考虑了涉及多个代理商的背景强盗问题, 学习者按顺序观察背景和代理商报告的武器,然后选择能最大限度地提高系统总体报酬的手臂。 背景强盗的现有工作假设代理商真实地报告其武器,这在许多现实生活中是不现实的。 例如, 考虑一个有多个销售者的在线平台; 一些销售者可能会为了获得好处而歪曲产品质量, 比如让平台优先向在线用户推荐产品。 为了应对这一挑战, 我们建议使用一种算法, COBRA, 用于涉及战略代理商的背景强盗问题, 这些战略代理商在不使用任何货币奖励的情况下不鼓励其战略行为,同时具有激励兼容性和亚线性遗憾保证。 我们的实验结果还验证了我们提议的算法的不同性。
Article 17
Title@2025-05-29 (4): FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control
Title: FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control | FastTD3: Einfaches, schnelles und fähiges Verstärkungslernen für die humanoide Kontrolle | 快速TD3: 人类控制简单、快速和有能力的强化学习 2505.22642v2 |
Authors: Younggyo Seo, Carmelo Sferrazza, Haoran Geng, Michal Nauman, Zhao-Heng Yin, Pieter Abbeel
Reinforcement learning (RL) has driven significant progress in robotics, but its complexity and long training times remain major bottlenecks. In this report, we introduce FastTD3, a simple, fast, and capable RL algorithm that significantly speeds up training for humanoid robots in popular suites such as HumanoidBench, IsaacLab, and MuJoCo Playground. Our recipe is remarkably simple: we train an off-policy TD3 agent with several modifications – parallel simulation, large-batch updates, a distributional critic, and carefully tuned hyperparameters. FastTD3 solves a range of HumanoidBench tasks in under 3 hours on a single A100 GPU, while remaining stable during training. We also provide a lightweight and easy-to-use implementation of FastTD3 to accelerate RL research in robotics.
强化学习(RL)催生了机器人方面的重大进步,但其复杂性和漫长的培训时间仍然是主要的瓶颈。在本报告中,我们引入了快速TD3, 一种简单、快速、有能力的RL算法,大大加快了人类机器人在流行套房的培训,如人形堡、IsaacLab和Mujoco游乐场。我们的配方非常简单:我们培训了一个脱离政策的TD3代理,并进行了若干修改 – – 平行模拟、大批量更新、分布式评论和仔细调控的超参数。快速TD3在3小时内解决了单个A100 GPU上的一系列人形堡任务,同时在培训期间保持稳定。我们还提供了轻量和易于使用的快速TD3,以加速机器人的RL研究。
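As background for the TD3 agent this abstract builds on, here is a minimal sketch of the vanilla TD3 bootstrap target: target-policy smoothing with clipped noise, followed by the minimum over two target critics. It assumes scalar states and actions and callables for the target networks, all of which are hypothetical stand-ins. The FastTD3 modifications listed in the abstract (parallel simulation, large batches, a distributional critic) are not represented.

```python
import random

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               rng, gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Clipped double-Q target with target-policy smoothing.

    done is 1.0 for terminal transitions and 0.0 otherwise.
    """
    # Smooth the target action with clipped Gaussian noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, -act_limit, act_limit)
    # Use the more pessimistic of the two target critics.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    return reward + gamma * (1.0 - done) * q_min
```

Both critics are then regressed toward this shared target, while the actor is updated less frequently; those update steps are left out of the sketch.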
Article 18
Title@2025-05-29 (4): TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning
Title: TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning | TiRex: Nullschnelle Vorhersagen über lange und kurze Horizonte mit verbessertem In-Context-Lernen | TiRex: 利用强化的内文学习,对长地和短地平线进行零热预测 2505.23719v1 |
Authors: Andreas Auer, Patrick Podest, Daniel Klotz, Sebastian Böck, Günter Klambauer, Sepp Hochreiter
In-context learning, the ability of large language models to perform tasks using only examples provided in the prompt, has recently been adapted for time series forecasting. This paradigm enables zero-shot prediction, where past values serve as context for forecasting future values, making powerful forecasting tools accessible to non-experts and increasing performance when training data are scarce. Most existing zero-shot forecasting approaches rely on transformer architectures, which, despite their success in language, often fall short of expectations in time series forecasting, where recurrent models like LSTMs frequently have the edge. Conversely, while LSTMs are well-suited for time series modeling due to their state-tracking capabilities, they lack strong in-context learning abilities. We introduce TiRex, which closes this gap by leveraging xLSTM, an enhanced LSTM with competitive in-context learning skills. Unlike transformers, state-space models, or parallelizable RNNs such as RWKV, TiRex retains state-tracking, a critical property for long-horizon forecasting. To further facilitate its state-tracking ability, we propose a training-time masking strategy called CPM. TiRex sets a new state of the art in zero-shot time series forecasting on the HuggingFace benchmarks GiftEval and Chronos-ZS, outperforming significantly larger models including TabPFN-TS (Prior Labs), Chronos Bolt (Amazon), TimesFM (Google), and Moirai (Salesforce) across both short- and long-term forecasts.
内文学习,大型语言模型仅使用快速实例执行任务的能力最近已经适应了时间序列预测。这一模式使得零点预测成为了零点预测,因为过去的价值是预测未来价值的背景,使非专家可以使用强大的预测工具,培训数据稀缺时提高了绩效。大多数现有的零点预测方法都依赖变压器结构,尽管在语言上取得成功,但在时间序列预测中往往低于预期,而LSTMS等经常模型往往处于优势。相反,虽然LSTMS因其国家跟踪能力而完全适合时间序列模型,但它们缺乏很强的内流学习能力。我们引入了TiRex,通过利用xLSTM(一个具有竞争力的内流学习技能的增强LSTM)来弥补这一差距。与变压器、状态空间模型或RWKV等可并行的RNN不同,TiRex保留了状态跟踪能力,这是长期预测的关键属性。为了进一步促进其状态跟踪能力,我们提出了一种称为CPM的训练时掩码策略。TiRex在HuggingFace基准GiftEval和Chronos-ZS上创下了零样本时间序列预测的新的最先进水平,在短期和长期预测中均优于规模大得多的模型,包括TabPFN-TS(Prior Labs)、Chronos Bolt(Amazon)、TimesFM(Google)和Moirai(Salesforce)。
Article 19
Title@2025-05-29 (4): Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation
Title: Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation | Fundamentalmodell versteckte Darstellungen für die Herzfrequenzschätzung aus der Auskultation | 基金会 “ 基金会 “ 用于从修术中心速估计的模型隐藏模型代表 2505.20745v2 |
Authors: Jingping Nie, Dung T. Tran, Karan Thakkar, Vasudha Kowtha, Jon Huang, Carlos Avendano, Erdrin Azemi, Vikramjit Mitra
Auscultation, particularly heart sound, is a non-invasive technique that provides essential vital sign information. Recently, self-supervised acoustic representation foundation models (FMs) have been proposed to offer insights into acoustics-based vital signs. However, there has been little exploration of the extent to which auscultation is encoded in these pre-trained FM representations. In this work, using a publicly available phonocardiogram (PCG) dataset and a heart rate (HR) estimation model, we conduct a layer-wise investigation of six acoustic representation FMs: HuBERT, wav2vec2, wavLM, Whisper, Contrastive Language-Audio Pretraining (CLAP), and an in-house CLAP model. Additionally, we implement the baseline method from Nie et al., 2024 (which relies on acoustic features) and show that overall, representation vectors from pre-trained foundation models (FMs) offer comparable performance to the baseline. Notably, HR estimation using the representations from the audio encoder of the in-house CLAP model outperforms the results obtained from the baseline, achieving a lower mean absolute error (MAE) across various train/validation/test splits despite the domain mismatch.
最近,提出了自我监督的声学代表基础模型(FMS),以提供对基于声学的重要信号的洞察力;然而,对于这些经过事先训练的调频演示中电解学的编码程度,几乎没有探索。在这项工作中,我们使用公开可得的光心图数据集(PCG)和心率(HR)估计模型,对六个声学代表调频进行了分层调查:HuBERT、 wav2vec2、 wavLM、Whisper、对比语言学预修课(CLAP)和内部CLAP模型。此外,我们实施了Nie等人(依赖声学特征的)2024年基准方法,并表明,从经过训练的基础模型(FMS)中代表的矢量总体表现可与基线相比。值得注意的是,通过内部CLAP模型音频导导出的数据(CLAP模型的音频导演算率超过CLAP的绝对值/模型),尽管从低基线中得出了不同程度的校程结果,我们还是跨了EMA的精确度。
Article 20
Title@2025-05-29 (4): Skin Lesion Phenotyping via Nested Multi-modal Contrastive Learning
Title: Skin Lesion Phenotyping via Nested Multi-modal Contrastive Learning | Haut-Lesion-Phenotypisierung über verschachteltes multimodales kontrastives Lernen | 通过Nested多模式反竞争学习进行皮肤脱 Le基因分析 2505.23709v1 |
Authors: Dionysis Christopoulos, Sotiris Spanos, Eirini Baltzi, Valsamis Ntouskos, Konstantinos Karantzalos
We introduce SLIMP (Skin Lesion Image-Metadata Pre-training) for learning rich representations of skin lesions through a novel nested contrastive learning approach that captures complex relationships between images and metadata. Melanoma detection and skin lesion classification based solely on images pose significant challenges due to large variations in imaging conditions (lighting, color, resolution, distance, etc.) and the lack of clinical and phenotypical context. Clinicians typically follow a holistic approach for assessing the risk level of the patient and for deciding which lesions may be malignant and need to be excised, by considering the patient’s medical history as well as the appearance of other lesions of the patient. Inspired by this, SLIMP combines the appearance and the metadata of individual skin lesions with patient-level metadata relating to their medical record and other clinically relevant information. By fully exploiting all available data modalities throughout the learning process, the proposed pre-training strategy improves performance compared to other pre-training strategies on downstream skin lesion classification tasks, highlighting the quality of the learned representations.
我们引入了SLIMP(皮肤悬浮图像-元数据预培训),以便通过一种新颖的巢状对比式学习方法来了解丰富的皮肤损伤表现,该方法捕捉到图像和元数据之间的复杂关系;仅仅基于图像的皮肤瘤检测和皮肤损伤分类,由于成像条件(亮光、颜色、分辨率、距离等)和缺乏临床和临床信息等)的巨大差异而构成重大挑战;临床医生通常采取综合办法,评估病人的风险程度,确定哪些损伤可能是恶性,哪些需要切除,考虑到病人的医学史以及病人其他损伤的外观;受此启发,SLIMP将个别皮肤损伤的外观和元数据与病人病历和其他临床相关信息的元数据结合起来;在整个学习过程中,充分利用所有可用的数据模式,拟议的培训前战略将提高业绩,而与其他培训前战略相比,提高下游皮肤损伤分类工作的业绩,而培训前战略则强调所了解的面貌质量。
Article 21
Title@2025-05-29 (4): Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
Title: Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better | Wissensisolierende Vision-Sprache-Action-Modelle: Schnell trainieren, schnell laufen, besser generalisieren | 知识绝知识的愿景-语言-行动模式:快速列车、快速跑车、更普遍化 2505.23705v1 |
Authors: Danny Driess, Jost Tobias Springenberg, Brian Ichter, Lili Yu, Adrian Li-Bell, Karl Pertsch, Allen Z. Ren, Homer Walke, Quan Vuong, Lucy Xiaoyang Shi, Sergey Levine
Vision-language-action (VLA) models provide a powerful approach to training control policies for physical systems, such as robots, by combining end-to-end learning with transfer of semantic knowledge from web-scale vision-language model (VLM) training. However, the constraints of real-time control are often at odds with the design of VLMs: the most powerful VLMs have tens or hundreds of billions of parameters, presenting an obstacle to real-time inference, and operate on discrete tokens rather than the continuous-valued outputs that are required for controlling robots. To address this challenge, recent VLA models have used specialized modules for efficient continuous control, such as action experts or continuous output heads, which typically require adding new untrained parameters to the pretrained VLM backbone. While these modules improve real-time and control capabilities, it remains an open question whether they preserve or degrade the semantic knowledge contained in the pretrained VLM, and what effect they have on the VLA training dynamics. In this paper, we study this question in the context of VLAs that include a continuous diffusion or flow matching action expert, showing that naively including such experts significantly harms both training speed and knowledge transfer. We provide an extensive analysis of various design choices, their impact on performance and knowledge transfer, and propose a technique for insulating the VLM backbone during VLA training that mitigates this issue. Videos are available at https://pi.website/research/knowledge_insulation.
视觉-语言行动模式(VLA)模式提供了一种强有力的方法,通过将端到端的学习与从网络规模的视觉-语言模型(VLM)培训的语义知识转让结合起来,为控制机器人等物理系统的培训控制政策提供培训,将终端到端的学习与从网络规模的视觉-语言模型(VLM)培训的语义知识转让结合起来,然而,实时控制的限制往往与VLM的设计不相符:最强大的VLM模型拥有数百亿或数千亿参数,对实时推断构成障碍,以离散的象征而不是控制机器人所需的持续价值高的输出进行操作。为了应对这一挑战,最近的VLA模型使用专门模块来进行有效的连续控制,例如行动专家或连续产出负责人,这通常需要为预先培训的VLM骨干增加新的未经培训的参数。虽然这些模块提高了实时和控制能力,但它们保存或降低预先培训VLM的语义知识,以及它们对VLA培训动态有何影响。在本文件中,我们从VLA的角度研究这一问题,其中包括在持续传播或流动的行动分析过程中提供这种知识的流学程分析。
Article 22
Title@2025-05-29 (4): (U)NFV: Supervised and Unsupervised Neural Finite Volume Methods for Solving Hyperbolic PDEs
Title: (U)NFV: Supervised and Unsupervised Neural Finite Volume Methods for Solving Hyperbolic PDEs | (U)NFV: Überwachte und unüberwachte neurale Finite-Volume-Methoden zur Lösung hyperbolischer PDEs | (U) NFV: 被监督和不受监督的解决双曲 PDE 的神经有限量方法 2505.23702v1 |
Authors: Nathan Lichtlé, Alexi Canesse, Zhe Fu, Hossein Nick Zinat Matin, Maria Laura Delle Monache, Alexandre M. Bayen
We introduce (U)NFV, a modular neural network architecture that generalizes classical finite volume (FV) methods for solving hyperbolic conservation laws. Hyperbolic partial differential equations (PDEs) are challenging to solve, particularly conservation laws whose physically relevant solutions contain shocks and discontinuities. FV methods are widely used for their mathematical properties: convergence to entropy solutions, flow conservation, or total variation diminishing, but often lack accuracy and flexibility in complex settings. Neural Finite Volume addresses these limitations by learning update rules over extended spatial and temporal stencils while preserving conservation structure. It supports both supervised training on solution data (NFV) and unsupervised training via weak-form residual loss (UNFV). Applied to first-order conservation laws, (U)NFV achieves up to 10x lower error than Godunov’s method, outperforms ENO/WENO, and rivals discontinuous Galerkin solvers with far less complexity. On traffic modeling problems, both from PDEs and from experimental highway data, (U)NFV captures nonlinear wave dynamics with significantly higher fidelity and scalability than traditional FV approaches.
我们引入了(U)NFV, 这是一种模块式神经网络结构,它概括了解决双曲养护法的经典有限体积(FV)方法。双曲部分偏差方程式(PDE)是难以解决的难题,特别是其物理相关解决方案包含冲击和不连续性的养护法。FV方法被广泛用于其数学属性:与恒温解决方案的趋同、流动保护或整体变异减少,但在复杂环境中往往缺乏准确性和灵活性。神经中量量解决这些局限性的方法是:在保存保护结构的同时,学习更新关于超长空间和时空超时短体积的规则。它既支持关于解决方案数据(NFV)的监督培训,又支持通过弱形残余损失(UNFV)进行不受监督的培训。适用于一阶保护法,(U)NFVV达到比Godunov方法低10倍的错误,优于ENO/WENO和不连续的加勒金溶剂,其复杂性要小得多。关于交通建模的问题,来自PDEs和实验性高速公路数据,(U)NFV捕捉取非直线波波波波波波动力,其传统和可变性方法远。
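For context on the classical baseline named in the abstract, the sketch below implements Godunov's finite volume method for a first-order conservation law. The flux (Greenshields traffic flux on normalized density), the periodic boundary, and all names are assumptions chosen for illustration; this is not the (U)NFV architecture. The `numerical_flux` argument only marks, hypothetically, where a learned update rule could replace the hand-derived interface flux while the conservative update form stays fixed.

```python
RHO_CRIT = 0.5  # density that maximizes the Greenshields flux


def flux(rho):
    """Greenshields flux f(rho) = rho * (1 - rho) for normalized density in [0, 1]."""
    return rho * (1.0 - rho)


def godunov_flux(rho_left, rho_right):
    """Godunov interface flux for a concave flux: min(demand, supply)."""
    demand = flux(min(rho_left, RHO_CRIT))   # what the upstream cell can send
    supply = flux(max(rho_right, RHO_CRIT))  # what the downstream cell can accept
    return min(demand, supply)


def fv_step(rho, dt, dx, numerical_flux=godunov_flux):
    """One conservative update on a periodic grid (stable when dt <= dx here)."""
    n = len(rho)
    # interface[i] is the flux between cell i-1 and cell i (index -1 wraps around).
    interface = [numerical_flux(rho[i - 1], rho[i]) for i in range(n)]
    return [rho[i] - dt / dx * (interface[(i + 1) % n] - interface[i])
            for i in range(n)]
```

Because each interface flux is added to one cell and subtracted from its neighbor, the total mass is conserved regardless of which flux function is plugged in.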
Article 23
Title@2025-05-29 (4): DiCoFlex: Model-agnostic diverse counterfactuals with flexible control
Title: DiCoFlex: Model-agnostic diverse counterfactuals with flexible control | DiCoFlex: Modell-agnostische diverse Gegenfakten mit flexibler Steuerung | DiCoFlex:具有灵活控制的模型 – – 不可知性多元反事实 2505.23700v1 |
Authors: Oleksii Furman, Ulvi Movsum-zada, Patryk Marszalek, Maciej Zięba, Marek Śmieja
Counterfactual explanations play a pivotal role in explainable artificial intelligence (XAI) by offering intuitive, human-understandable alternatives that elucidate machine learning model decisions. Despite their significance, existing methods for generating counterfactuals often require constant access to the predictive model, involve computationally intensive optimization for each instance, and lack the flexibility to adapt to new user-defined constraints without retraining. In this paper, we propose DiCoFlex, a novel model-agnostic, conditional generative framework that produces multiple diverse counterfactuals in a single forward pass. Leveraging conditional normalizing flows trained solely on labeled data, DiCoFlex addresses key limitations by enabling real-time user-driven customization of constraints such as sparsity and actionability at inference time. Extensive experiments on standard benchmark datasets show that DiCoFlex outperforms existing methods in terms of validity, diversity, proximity, and constraint adherence, making it a practical and scalable solution for counterfactual generation in sensitive decision-making domains.
反事实解释在可解释的人工智能(XAI)中发挥着关键作用,它提供了直观的、人所无法理解的替代方法,阐明机器学习模式决定。尽管这些方法很重要,但现有的反事实方法往往需要不断访问预测模型,涉及对每种情况进行计算密集的优化,缺乏适应新的用户定义的限制而不进行再培训的灵活性。在本文中,我们提议DicoFlex,这是一个在单一前方传递过程中产生多种反事实的新颖的、不易理解的、有条件的模型化框架。DicoFlex利用仅以标签数据培训的有条件的正常流动,通过实时用户驱动的定制限制(如在推论时间的宽度和可操作性)来解决关键限制。关于标准基准数据集的广泛实验表明,DicoFlex在有效性、多样性、近距离和约束性方面超越了现有方法,使其成为敏感决策领域反事实生成的一个实用和可扩展的解决办法。
Article 24
Title@2025-05-29 (4): Computational Algebra with Attention: Transformer Oracles for Border Basis Algorithms
Title: Computational Algebra with Attention: Transformer Oracles for Border Basis Algorithms | Computational Algebra mit Achtung: Transformer Oracles für Border Basis Algorithmen | 注意的计算代数:边境基准比值的变异甲骨文 2505.23696v1 |
Authors: Hiroshi Kera, Nico Pelleriti, Yuki Ishihara, Max Zimmer, Sebastian Pokutta
Solving systems of polynomial equations, particularly those with finitely many solutions, is a crucial challenge across many scientific fields. Traditional methods like Gröbner and Border bases are fundamental but suffer from high computational costs, which have motivated recent Deep Learning approaches to improve efficiency, albeit at the expense of output correctness. In this work, we introduce the Oracle Border Basis Algorithm, the first Deep Learning approach that accelerates Border basis computation while maintaining output guarantees. To this end, we design and train a Transformer-based oracle that identifies and eliminates computationally expensive reduction steps, which we find to dominate the algorithm’s runtime. By selectively invoking this oracle during critical phases of computation, we achieve substantial speedup factors of up to 3.5x compared to the base algorithm, without compromising the correctness of results. To generate the training data, we develop a sampling method and provide the first sampling theorem for border bases. We construct a tokenization and embedding scheme tailored to monomial-centered algebraic computations, resulting in a compact and expressive input representation, which reduces the number of tokens to encode an $n$-variate polynomial by a factor of $O(n)$. Our learning approach is data efficient, stable, and a practical enhancement to traditional computer algebra algorithms and symbolic computation.
解决多式方程式的系统,特别是那些有有限多种解决方案的系统,是许多科学领域的一项关键挑战。传统方法,如Gr'obner和边界基地,是基本的基本方法,但有很高的计算成本,这些方法激励了最近的深学习方法提高效率,尽管牺牲了产出的正确性。在这项工作中,我们引入了甲骨边边边界基础算法,这是在维持输出保证的同时加速边界基础计算的第一个深学习方法。为此,我们设计和培训了一个基于变异器的变异器,它识别并消除了计算成本昂贵的削减步骤,我们发现这些步骤在算法运行时占据了主导地位。通过在关键计算阶段有选择地援引这个步子,我们实现了与基算法相比高达3.5x的大幅加速因素,但不会损害结果的正确性。为了生成培训数据,我们开发了一个取样方法,并为边界基地提供了第一个抽样标本。我们根据单项(核心的代数计算方法)设计了一种象征性和嵌套式的计算方法,从而形成一个压缩和直观的输入式的输入,从而减少以美元为最高级计算方法的象征值的代数,从而降低了我们将一个数字的模型的模型的增压乘数。
Article 25
Title@2025-05-29 (4): On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures
Title: On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures | Über die Ausbildungskonvergenz von Transformern für die In-Context-Klassifizierung von Gauß-Mischungen | Gaussian混合物内集成分类变异器培训趋同 2410.11778v3 |
Authors: Wei Shen, Ruida Zhou, Jing Yang, Cong Shen
Although transformers have demonstrated impressive capabilities for in-context learning (ICL) in practice, theoretical understanding of the underlying mechanism that allows transformers to perform ICL is still in its infancy. This work aims to theoretically study the training dynamics of transformers for in-context classification tasks. We demonstrate that, for in-context classification of Gaussian mixtures under certain assumptions, a single-layer transformer trained via gradient descent converges to a globally optimal model at a linear rate. We further quantify the impact of the training and testing prompt lengths on the ICL inference error of the trained transformer. We show that when the lengths of training and testing prompts are sufficiently large, the prediction of the trained transformer approaches the ground truth distribution of the labels. Experimental results corroborate the theoretical findings.
虽然变压器在实践中表现出令人印象深刻的内流学习能力(ICL),但对于使变压器能够执行ICL的基本机制的理论理解仍处于初级阶段,这项工作旨在从理论上研究变压器在内流分类任务的培训动态,我们证明,对于某些假设对高斯混合物的内流分类,通过梯度下降而培训的单层变压器以线性速度与全球最佳模式相融合。我们进一步量化培训和测试快速长度对ICL所培训变压器的推断错误的影响。我们表明,在培训和测试时间足够长的情况下,经过培训的变压器的预测将接近标签的地面真实分布。实验结果证实了理论结论。
Article 26
Title@2025-05-29 (4): From Individual Experience to Collective Evidence: A Reporting-Based Framework for Identifying Systemic Harms
Title: From Individual Experience to Collective Evidence: A Reporting-Based Framework for Identifying Systemic Harms | Von der individuellen Erfahrung zu kollektiven Beweisen: Ein meldepflichtiger Rahmen für die Identifizierung systemischer Schäden | 从个人经验到集体证据:查明系统危害的报告框架 2502.08166v2 |
Authors: Jessica Dai, Paula Gradu, Inioluwa Deborah Raji, Benjamin Recht
When an individual reports a negative interaction with some system, how can their personal experience be contextualized within broader patterns of system behavior? We study the reporting database problem, where individual reports of adverse events arrive sequentially, and are aggregated over time. In this work, our goal is to identify whether there are subgroups, defined by any combination of relevant features, that are disproportionately likely to experience harmful interactions with the system. We formalize this problem as a sequential hypothesis test, and identify conditions on reporting behavior that are sufficient for making inferences about disparities in true rates of harm across subgroups. We show that algorithms for sequential hypothesis tests can be applied to this problem with a standard multiple testing correction. We then demonstrate our method on real-world datasets, including mortgage decisions and vaccine side effects; on each, our method (re-)identifies subgroups known to experience disproportionate harm using only a fraction of the data that was initially used to discover them.
当个人报告与某个系统的负面互动时,他们的个人经验如何在更广泛的系统行为模式中被联系到背景?我们研究报告数据库问题,即关于不利事件的个别报告按顺序出现,并随着时间的推移加以汇总。在这项工作中,我们的目标是确定是否有由相关特征组合来界定的分组,这些分组极有可能与系统发生有害互动。我们将此问题正式确定为顺序假设测试,并查明报告行为的条件,足以推断各分组之间实际伤害率的差异。我们显示,连续假设测试的算法可以用标准的多次测试校正来适用于这一问题。我们然后在现实世界数据集中展示我们的方法,包括按揭决定和疫苗副作用;在每种数据中,我们的方法(重新)仅使用最初用于发现这些数据的一部分,将已知遭受过度伤害的分组确定为已知的分组。
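To make "a sequential hypothesis test with a standard multiple testing correction" concrete, here is a generic sketch: Wald's sequential probability ratio test on Bernoulli observations, run once per subgroup with a Bonferroni-corrected level. This is a textbook construction, not the authors' procedure; the hypotheses `p0` and `p1`, the encoding of reports as 0/1 observations, and all names are assumptions.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha, beta=0.2):
    """Wald SPRT of H0: p = p0 against H1: p = p1 (with p1 > p0).

    Returns (decision, number_of_observations_used).
    """
    upper = math.log((1.0 - beta) / alpha)   # cross above: reject H0
    lower = math.log(beta / (1.0 - alpha))   # cross below: accept H0
    llr = 0.0
    for t, x in enumerate(observations, start=1):
        # Add the log-likelihood ratio contribution of one observation.
        llr += math.log(p1 / p0) if x else math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "reject_null", t
        if llr <= lower:
            return "accept_null", t
    return "undecided", len(observations)


def flag_subgroups(streams, p0, p1, alpha=0.05):
    """Run one SPRT per subgroup at the Bonferroni-corrected level alpha / m."""
    m = len(streams)
    return {name: sprt_bernoulli(obs, p0, p1, alpha / m)
            for name, obs in streams.items()}
```

The test can stop as soon as the evidence crosses a threshold, which is how a sequential procedure may need only part of the data that a fixed-sample analysis would use.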
Article 27
Title@2025-05-29 (4): Mobi-$π$: Mobilizing Your Robot Learning Policy
Title: Mobi-$π$: Mobilizing Your Robot Learning Policy | Mobi-$π$: Mobilisierung Ihrer Roboter-Lernpolitik | Mobi-$ 美元:调动机器人学习政策 2505.23692v1 |
Authors: Jingyun Yang, Isabella Huang, Brandon Vu, Max Bajracharya, Rika Antonova, Jeannette Bohg
Learned visuomotor policies are capable of performing increasingly complex manipulation tasks. However, most of these policies are trained on data collected from limited robot positions and camera viewpoints. This leads to poor generalization to novel robot positions, which limits the use of these policies on mobile platforms, especially for precise tasks like pressing buttons or turning faucets. In this work, we formulate the policy mobilization problem: find a mobile robot base pose in a novel environment that is in distribution with respect to a manipulation policy trained on a limited set of camera viewpoints. Compared to retraining the policy itself to be more robust to unseen robot base pose initializations, policy mobilization decouples navigation from manipulation and thus does not require additional demonstrations. Crucially, this problem formulation complements existing efforts to improve manipulation policy robustness to novel viewpoints and remains compatible with them. To study policy mobilization, we introduce the Mobi-$\pi$ framework, which includes: (1) metrics that quantify the difficulty of mobilizing a given policy, (2) a suite of simulated mobile manipulation tasks based on RoboCasa to evaluate policy mobilization, (3) visualization tools for analysis, and (4) several baseline methods. We also propose a novel approach that bridges navigation and manipulation by optimizing the robot’s base pose to align with an in-distribution base pose for a learned policy. Our approach utilizes 3D Gaussian Splatting for novel view synthesis, a score function to evaluate pose suitability, and sampling-based optimization to identify optimal robot poses. We show that our approach outperforms baselines in both simulation and real-world environments, demonstrating its effectiveness for policy mobilization.
在这项工作中,我们制定了政策动员问题:找到一个移动机器人基地,这是在经过有限的相机观点培训的操纵政策方面的新环境;然而,大多数这些政策都是根据从有限的机器人位置和相机观点收集的数据进行的培训;这导致对新机器人位置的概括化不力,从而限制在移动平台上使用这些政策,特别是用于诸如按键或旋转水龙头等精确任务。在这项工作中,我们制定了政策动员问题:找到一个移动机器人基地,这是在经过有限的摄影师观点培训的操纵政策方面新出现的环境。相比之下,对政策本身进行再培训,使之更强有力地对看不见的机器人基地进行初始化,政策动员从操纵到操作,因此不需要额外的演示。 很显然,这一问题的提出补充了目前在移动平台上改进操纵政策对新观点的强大性和与这些观点的兼容性。 为了研究政策动员,我们引入了mobi-$的框架,其中包括:(1) 量化调动特定政策难度的计量标准,(2) 基于RoboCas的模拟移动操作任务,以评价政策动员政策,(3) 可视化分析工具,以及(4) 将一些基线方法用于分析,我们还提出升级的升级的升级。
Article 28
Title@2025-05-29 (4): Unifying Perspectives: Plausible Counterfactual Explanations on Global, Group-wise, and Local Levels
Title: Unifying Perspectives: Plausible Counterfactual Explanations on Global, Group-wise, and Local Levels | Vereinheitlichende Perspektiven: Plausible gegenfaktische Erklärungen auf globaler, gruppenweiser und lokaler Ebene | 统一观点:关于全球、集团和当地雇员的可视反事实解释 2405.17642v2 |
Authors: Oleksii Furman, Patryk Wielopolski, Łukasz Lenkiewicz, Jerzy Stefanowski, Maciej Zięba
The growing complexity of AI systems has intensified the need for transparency through Explainable AI (XAI). Counterfactual explanations (CFs) offer actionable “what-if” scenarios on three levels: Local CFs providing instance-specific insights, Global CFs addressing broader trends, and Group-wise CFs (GWCFs) striking a balance and revealing patterns within cohesive groups. Despite the availability of methods for each granularity level, the field lacks a unified method that integrates these complementary approaches. We address this limitation by proposing a gradient-based optimization method for differentiable models that generates Local, Global, and Group-wise Counterfactual Explanations in a unified manner. We especially enhance GWCF generation by combining instance grouping and counterfactual generation into a single efficient process, replacing traditional two-step methods. Moreover, to ensure trustworthiness, we introduce plausibility criteria into the GWCF domain, making explanations both valid and realistic. Our results demonstrate the method’s effectiveness in balancing validity, proximity, and plausibility while optimizing group granularity, with utility validated through practical use cases.
通过可解释的AI(XAI),AI系统日益复杂,增加了透明度的必要性; 反事实解释(CFS)在以下三个层面提供了可操作的“如果是什么”情景:地方CFS提供具体实例的洞察力,全球CFS处理更广泛的趋势,以及集体CFs在具有凝聚力的群体中取得平衡和揭示模式; 尽管为每个微粒层面提供了方法,但实地缺乏一种统一的方法,将这些互补方法结合起来。我们通过提出一种基于梯度的优化方法来解决这一局限性,以统一的方式为产生地方、全球和集团之间反事实解释的不同模型提出一种基于梯度的优化方法。我们特别通过将实例分组和反事实生成合并到一个单一的有效过程,以取代传统的两步方法。此外,为了确保信任性,我们创新地将可信赖性标准纳入GWCF领域,同时作出合理和现实的解释。我们的结果表明,在优化群体粒子性的同时,在优化群体有效性、近近和可信赖性方面,同时通过实际使用案例来验证实用性,从而增强GWCF的生成。
Article 29
Title@2025-05-29 (4): Learning Compositional Functions with Transformers from Easy-to-Hard Data
Title: Learning Compositional Functions with Transformers from Easy-to-Hard Data | Komponative Funktionen mit Transformern von einfachen Daten lernen | 学习从易读数据转换器的学习构成函数 2505.23683v1 |
Authors: Zixuan Wang, Eshaan Nichani, Alberto Bietti, Alex Damian, Daniel Hsu, Jason D. Lee, Denny Wu
Transformer-based language models have demonstrated impressive capabilities across a range of complex reasoning tasks. Prior theoretical work exploring the expressive power of transformers has shown that they can efficiently perform multi-step reasoning tasks involving parallelizable computations. However, the learnability of such constructions, particularly the conditions on the data distribution that enable efficient learning via gradient-based optimization, remains an open question. Towards answering this question, in this work we study the learnability of the $k$-fold composition task, which requires computing an interleaved composition of $k$ input permutations and $k$ hidden permutations, and can be expressed by a transformer with $O(\log k)$ layers. On the negative front, we prove a Statistical Query (SQ) lower bound showing that any SQ learner that makes only polynomially-many queries to an SQ oracle for the $k$-fold composition task distribution must have sample size exponential in $k$, thus establishing a statistical-computational gap. On the other hand, we show that this function class can be efficiently learned, with runtime and sample complexity polynomial in $k$, by gradient descent on an $O(\log k)$-depth transformer via two different curriculum learning strategies: one in which data consists of $k'$-fold composition functions with $k' \le k$ presented in increasing difficulty, and another in which all such data is presented simultaneously. Our work sheds light on the necessity and sufficiency of having both easy and hard examples in the data distribution for transformers to learn complex compositional tasks.
以变换器为基础的语言模型在一系列复杂的推理任务中表现出了令人印象深刻的能力。 先前的探索变压器显性力量的理论工作表明, 变压器能够高效地执行包含平行计算在内的多步推理任务。 然而, 这样的构造, 特别是数据分配条件的学习性, 以便通过基于梯度的优化来有效学习, 仍然是个未决问题。 在回答这个问题时, 我们研究美元倍数构成任务的学习性, 需要计算美元输入值的跨端构成和美元隐藏的平价, 并且可以通过一个具有$(logal)的变压层来表达。 在负面上, 我们证明一个统计质(SQ) , 特别是数据分配条件, 使得通过基于渐变法的数据结构中, 任何SQ 类的简单质查询, 都必须以 $( $) 为单位, 来显示一个复杂的统计- 计算差距。 在另一方面, 我们显示, 这个函数类可以高效地学习, 运行时间和 数据变压变法中, 通过一个变压式的变压法, 。
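One way to see why logarithmic depth is plausible for the $k$-fold composition task is that permutation composition is associative, so a chain of $2k$ permutations can be reduced pairwise in a balanced tree. The sketch below compares sequential composition with such a tree reduction. The interleaving order (input permutation first, then hidden permutation) and every function name are assumptions for illustration; the code does not reproduce the transformer construction from the paper.

```python
def compose(p, q):
    """Permutation 'apply p first, then q', with permutations as index tuples."""
    return tuple(q[p[i]] for i in range(len(p)))


def interleave(inputs, hidden):
    """Chain [pi_1, sigma_1, ..., pi_k, sigma_k] (assumed ordering)."""
    chain = []
    for pi, sigma in zip(inputs, hidden):
        chain.extend([pi, sigma])
    return chain


def sequential_composition(perms):
    """Compose left to right, one permutation per step."""
    result = tuple(range(len(perms[0])))
    for p in perms:
        result = compose(result, p)
    return result


def tree_composition(perms):
    """Compose adjacent pairs in parallel rounds; returns (result, rounds)."""
    level, rounds = list(perms), 0
    while len(level) > 1:
        nxt = [compose(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd element carries over unchanged
        level, rounds = nxt, rounds + 1
    return level[0], rounds
```

With $k = 4$ the chain has 8 permutations, and the tree reduction finishes in 3 rounds instead of 8 sequential steps.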
Article 30
Title@2025-05-29 (4): Understanding Mode Connectivity via Parameter Space Symmetry
Title: Understanding Mode Connectivity via Parameter Space Symmetry | Mode-Konnektivität über Parameter Raumsymmetrie verstehen | 通过参数空间对称法理解模式连通性 2505.23681v1 |
Authors: Bo Zhao, Nima Dehmamy, Robin Walters, Rose Yu
Neural network minima are often connected by curves along which train and test loss remain nearly constant, a phenomenon known as mode connectivity. While this property has enabled applications such as model merging and fine-tuning, its theoretical explanation remains unclear. We propose a new approach to exploring the connectedness of minima using parameter space symmetry. By linking the topology of symmetry groups to that of the minima, we derive the number of connected components of the minima of linear networks and show that skip connections reduce this number. We then examine when mode connectivity and linear mode connectivity hold or fail, using parameter symmetries which account for a significant part of the minimum. Finally, we provide explicit expressions for connecting curves in the minima induced by symmetry. Using the curvature of these curves, we derive conditions under which linear mode connectivity approximately holds. Our findings highlight the role of continuous symmetries in understanding the neural network loss landscape.
神经网络迷宫往往通过曲线连接, 火车和测试损失几乎保持不变, 这是一种被称为模式连接的现象。 虽然此属性使得模型合并和微调等应用得以进行, 但其理论解释仍然不清楚。 我们提出一种新的方法, 利用参数空间对称来探索微型连接性。 通过将对称组的地形学与微型对称组联系起来, 我们得出线性网络微型网的连接部分的数量, 并显示跳过连接会减少这个数量。 然后我们用参数对称来检查模式连接和线性模式连接在最小值中占相当一部分的值时, 我们用参数对称来检查模式连接性连接性连接性。 最后, 我们用这些曲线的曲线的曲线来得出线性连接性模式连接性条件。 我们的发现突出了持续对称性连接在理解神经网络损失景观中所起的作用 。
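A toy example may help separate mode connectivity from linear mode connectivity. For a scalar two-layer linear network, rescaling $(w_1, w_2) \mapsto (s\,w_1, w_2 / s)$ with $s > 0$ leaves the function unchanged, so the orbit of a minimum under this symmetry is a curve of minima. The sketch below is an assumed minimal setup (single input, single target, hypothetical names), not the general construction in the paper.

```python
def loss(w1, w2, target=2.0):
    """Squared error of the scalar network f(x) = w2 * w1 * x evaluated at x = 1."""
    return (w2 * w1 - target) ** 2


def symmetry_curve(w1, w2, s_end, t):
    """Point at time t in [0, 1] on the orbit (s * w1, w2 / s), s going 1 -> s_end."""
    s = s_end ** t
    return s * w1, w2 / s


def linear_interpolation(a, b, t):
    """Straight-line path between two parameter vectors."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))
```

Starting from the minimum $(1, 2)$ and moving along the orbit to $(4, 0.5)$ keeps the loss at zero, while the straight line between the same two minima passes through $(2.5, 1.25)$, where the product is $3.125$ and the loss is positive.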
Article 31
Title@2025-05-29 (4): SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem
Title: SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem | SVRPBench: Ein realistischer Maßstab für stochastisches Fahrzeugrouting-Problem | SVRPBench: 蒸汽车辆流出问题的现实基准 2505.21887v2 |
Authors: Ahmed Heakl, Yahia Salaheldin Shaaban, Martin Takac, Salem Lahlou, Zangir Iklassov
Robust routing under uncertainty is central to real-world logistics, yet most benchmarks assume static, idealized settings. We present SVRPBench, the first open benchmark to capture high-fidelity stochastic dynamics in vehicle routing at urban scale. Spanning more than 500 instances with up to 1000 customers, it simulates realistic delivery conditions: time-dependent congestion, log-normal delays, probabilistic accidents, and empirically grounded time windows for residential and commercial clients. Our pipeline generates diverse, constraint-rich scenarios, including multi-depot and multi-vehicle setups. Benchmarking reveals that state-of-the-art RL solvers like POMO and AM degrade by over 20% under distributional shift, while classical and metaheuristic methods remain robust. To enable reproducible research, we release the dataset and evaluation suite. SVRPBench challenges the community to design solvers that generalize beyond synthetic assumptions and adapt to real-world uncertainty.
不确定情况下的强力航向是现实世界物流的核心,但大多数基准是静态、理想化的环境。我们介绍了SVRPBench,这是第一个在城市规模车辆航道中捕捉高纤维性随机动态的开放基准。它覆盖了500多例,有多达1000名客户,模拟了现实的交付条件:根据时间的拥堵、逻辑正常的延误、概率性事故,以及基于经验的住宅和商业客户时间窗口。我们的输油管道产生了多种多样的、限制性强的情景,包括多功能和多车辆设置。基准显示,在分布式转换中,POM和AM等最先进的RL解答器在20 % , 而传统和计量方法依然健全。为了进行再生研究,我们发布了数据集和评价套件。SVRPBench挑战社区设计超越合成假设并适应现实世界不确定性的解决方案。
Article 32
Title@2025-05-29 (4): Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds
Title: Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds | Bayesische Optimierung durch menschliches Feedback: Nah-optimale Reue-Bounds | Bayesian 人体反馈的优化:接近最佳的冷却环 2505.23673v1 |
Authors: Aya Kayal, Sattar Vakili, Laura Toni, Da-shan Shiu, Alberto Bernacchia
Bayesian optimization (BO) with preference-based feedback has recently garnered significant attention due to its emerging applications. We refer to this problem as Bayesian Optimization from Human Feedback (BOHF), which differs from conventional BO by learning the best actions from a reduced feedback model, where only the preference between two actions is revealed to the learner at each time step. The objective is to identify the best action using a limited number of preference queries, typically obtained through costly human feedback. Existing work, which adopts the Bradley-Terry-Luce (BTL) feedback model, provides regret bounds for the performance of several algorithms. In this work, within the same framework we develop tighter performance guarantees. Specifically, we derive regret bounds of $\tilde{\mathcal{O}}(\sqrt{\Gamma(T)T})$, where $\Gamma(T)$ represents the maximum information gain (a kernel-specific complexity term) and $T$ is the number of queries. Our results significantly improve upon existing bounds. Notably, for common kernels, we show that the order-optimal sample complexities of conventional BO, achieved with richer feedback models, are recovered. In other words, the same number of preferential samples as scalar-valued samples is sufficient to find a nearly optimal solution.
通过基于优惠的反馈,巴伊西亚优化(BOO)最近因其新出现的应用而引起极大关注。我们提到这一问题,即巴伊西亚优化来自人类反馈(BOHF),它与传统BO不同,它从一个减少的反馈模式中学习了最佳行动,每个步骤都向学习者透露了两种行动之间的偏好。目标是利用有限的优惠查询来确定最佳行动,通常通过昂贵的人类反馈获得。采用布拉德-泰鲁斯(BTL)的反馈模式(BBTL)为若干算法的性能提供了遗憾界限。在这项工作中,我们在同一个框架内发展了更严格的绩效保障。具体地说,我们得出了 $\tilde{\mathcal{O}}(\sqrt{\Gamma(T)T})$ 的遗憾界限,其中 $\Gamma(T)$ 表示最大信息增益(一个与核相关的复杂度项),$T$ 是查询次数。我们的结果显著改进了现有界限。值得注意的是,对于常见的核,我们表明可以恢复传统BO在更丰富反馈模型下达到的阶最优样本复杂度。换言之,与标量值样本相同数量的偏好样本足以找到接近最优的解。
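The "reduced feedback model" in this abstract can be illustrated with the Bradley-Terry-Luce (BTL) rule, under which the probability that action $x$ is preferred over action $y$ is the logistic function of the utility difference $f(x) - f(y)$. The sketch below simulates such binary feedback. The utility values passed in are placeholders, and the function names are hypothetical; nothing here implements the regret-optimal algorithm analyzed in the paper.

```python
import math


def btl_preference_prob(f_x, f_y):
    """P(x preferred over y) under BTL: logistic function of f(x) - f(y)."""
    return 1.0 / (1.0 + math.exp(-(f_x - f_y)))


def sample_preference(f_x, f_y, rng):
    """Draw one binary preference label (1 means x was preferred)."""
    return 1 if rng.random() < btl_preference_prob(f_x, f_y) else 0
```

Each query therefore reveals a single noisy bit about the utility difference, in contrast to conventional BO, where a query returns a scalar observation of $f$ itself.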
Article 33
Title@2025-05-29 (4): GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents
Title: GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents | GSO: Herausfordernde Software-Optimierungsaufgaben zur Bewertung von SWE-Agenten | GSO:评估SWE-Agentics的有挑战的软件优化任务 2505.23671v1 |
Authors: Manish Shetty, Naman Jain, Jinjian Liu, Vijay Kethanaboyina, Koushik Sen, Ion Stoica
Developing high-performance software is a complex task that requires specialized expertise. We introduce GSO, a benchmark for evaluating language models’ capabilities in developing high-performance software. We develop an automated pipeline that generates and executes performance tests to analyze repository commit histories to identify 102 challenging optimization tasks across 10 codebases, spanning diverse domains and programming languages. An agent is provided with a codebase and performance test as a precise specification, and tasked to improve the runtime efficiency, which is measured against the expert developer optimization. Our quantitative evaluation reveals that leading SWE-Agents struggle significantly, achieving less than 5% success rate, with limited improvements even with inference-time scaling. Our qualitative analysis identifies key failure modes, including difficulties with low-level languages, practicing lazy optimization strategies, and challenges in accurately localizing bottlenecks. We release the code and artifacts of our benchmark along with agent trajectories to enable future research.
开发高性能软件是一项复杂的任务,需要专门知识。我们引入了GSO,这是评价语言模型开发高性能软件能力的基准。我们开发了一个自动管道,生成和执行绩效测试,以分析存储库,承诺历史查明10个代码库的102项挑战性优化任务,涵盖不同的领域和编程语言。向代理商提供了一个代码库和性能测试,作为精确的规格,并负责提高运行时间效率,以专家开发师的优化为衡量标准。我们的定量评估显示,领先的SWE-Agency 进行了巨大的斗争,取得了不到5%的成功率,即便在推论时间上也有有限的改进。我们的质量分析确定了关键的失败模式,包括使用低度语言的困难、采用懒惰性优化战略,以及在准确定位瓶颈方面存在的挑战。我们发布了基准的代码和工艺以及代理轨迹,以利今后的研究。
Article 34
Title@2025-05-29 (4): Maximizing Confidence Alone Improves Reasoning
Title: Maximizing Confidence Alone Improves Reasoning | Maximierung des Vertrauens allein verbessert die Vernunft | 使信心最大化单独提高合理性 2505.22660v2 |
Authors: Mihir Prabhudesai, Lili Chen, Alex Ippoliti, Katerina Fragkiadaki, Hao Liu, Deepak Pathak
Reinforcement learning (RL) has enabled machine learning models to achieve significant advances in many fields. Most recently, RL has empowered frontier language models to solve challenging math, science, and coding problems. However, central to any RL algorithm is the reward function, and reward engineering is a notoriously difficult problem in any domain. In this paper, we propose RENT: Reinforcement Learning via Entropy Minimization – a fully unsupervised RL method that requires no external reward or ground-truth answers, and instead uses the model’s entropy of its underlying distribution as an intrinsic reward. We find that by reinforcing the chains of thought that yield high model confidence on its generated answers, the model improves its reasoning ability. In our experiments, we showcase these improvements on an extensive suite of commonly-used reasoning benchmarks, including GSM8K, MATH500, AMC, AIME, and GPQA, and models of varying sizes from the Qwen and Mistral families. The generality of our unsupervised learning method lends itself to applicability in a wide range of domains where external supervision is unavailable.
强化学习(RL)使机器学习模式在许多领域取得了显著进步。 最近,RL授权前沿语言模式解决具有挑战性的数学、科学和编码问题。然而,任何RL算法的核心是奖赏功能,而奖赏工程则是任何领域一个臭名昭著的困难问题。 在本文中,我们提议RENT:通过最小化强化学习(Entropy最小化) – – 一种完全不受监督的RL方法,不需要外部奖赏或地面真相回答,而是使用模型基本分布的螺旋状作为内在奖赏。我们发现,通过加强能够对其生成的答案产生高度模型信心的思维链,模型提高了其推理能力。我们在实验中展示了这些改进之处,展示了一套广泛通用的推理基准,包括GSM8K、MATH500、AMC、AIME和GPQA,以及来自Quen和Mistral家庭不同大小的模式。我们未超超超的学习方法的通用性适用于无法进行外部监督的广泛领域。
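To make the idea of entropy as an intrinsic reward concrete, here is a minimal sketch that scores a generated answer by the negative mean entropy of its per-token next-token distributions. The averaging over tokens and the function names are assumptions for illustration; the abstract does not specify how the paper aggregates entropy.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def negative_entropy_reward(per_token_logits):
    """Assumed reward: minus the mean token entropy, so confident outputs score higher."""
    entropies = [token_entropy(logits) for logits in per_token_logits]
    return -sum(entropies) / len(entropies)


if __name__ == "__main__":
    confident = [[8.0, 0.0, 0.0, 0.0], [0.0, 9.0, 0.0, 0.0]]
    uncertain = [[0.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.2, 0.0]]
    print(negative_entropy_reward(confident), negative_entropy_reward(uncertain))
```

No ground-truth answer appears anywhere in the computation, which is what makes a reward of this kind usable without external supervision.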
Article 35
Title@2025-05-29 (4): SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression
Title: SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression | SLiM: Ein-Schuss-Quantisierung und Sparsamkeit mit Low-Rank-Annäherung für LLM-Gewichtskompression | SLiM: LLM 重量压缩的单射量和与低级别近似相近的分数 2410.09615v3 |
Authors: Mohammad Mozaffari, Amir Yazdanbakhsh, Maryam Mehri Dehnavi
Conventional model compression techniques for LLMs address high memory consumption and slow inference challenges but typically require computationally expensive retraining to preserve accuracy. In contrast, one-shot compression methods eliminate retraining cost, but struggle to achieve accuracy comparable to dense models. This paper presents SLiM, a new one-shot compression framework that holistically integrates hardware-friendly quantization, sparsity, and low-rank approximation into a unified process. First, we formulate the quantization process using a probabilistic approach (SLiM-Quant) that enables us to apply uniform quantization. Then, we use an existing one-shot pruning method to apply semi-structured sparsity on top of the quantized weights. Finally, to compensate for the introduced aggregated quantization and sparsity error, we use a novel saliency function with unique invertible and additive features that enables us to mathematically compute the value of low-rank adapters. SLiM improves model accuracy by up to 5.66% (LLaMA-2-7B) for 2:4 sparsity with 4-bit weight quantization, outperforming prior methods. Models compressed with SLiM achieve up to 4.3x and 3.8x on Nvidia RTX3060 and A100 GPUs, respectively. Additionally, they achieve up to 0.23x end-to-end memory reduction in comparison to their dense counterparts. We also propose an optional PEFT recipe that further improves accuracy by up to 1.66% (LLaMA-2-13B) compared to SLiM without fine-tuning.
LLMS的常规模型压缩技术解决了高内存消耗和缓慢发酵的挑战,但通常需要计算昂贵的再培训才能保持准确性。相比之下,一发压缩方法消除了再培训成本,但努力达到与密度模型相近的精确度。本文展示了SLIM,这是一个新的一发压缩框架,在整体上将硬件友好的量化、宽度和低调近似值整合到一个统一的进程中。首先,我们采用概率化方法(SLIM-Quant)来制定量化进程,这使我们能够应用统一的量化方法。然后,我们使用现有的一发压缩方法,在四比重的重量顶端上应用半结构的松散度。最后,为了补偿引入的复合四分化和宽度错误,我们使用新的显眼功能,将低端适应器的价值进行数学的计算。
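The first two stages described in this abstract (uniform 4-bit quantization, then 2:4 semi-structured sparsity on the quantized weights) can be sketched in plain Python as below. This is a generic illustration under simple assumptions (symmetric max-abs scaling, magnitude-based pruning); it is not SLiM-Quant, and the saliency-based low-rank adapter step is not shown. All function names are hypothetical.

```python
def quantize_uniform_symmetric(weights, bits=4):
    """Round weights onto a symmetric uniform grid scaled by the largest magnitude."""
    qmax = 2 ** (bits - 1) - 1  # 7 positive levels for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax
    return [round(w / scale) * scale for w in weights]


def prune_2_of_4(weights):
    """Keep the 2 largest-magnitude entries in every group of 4; zero the rest."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4 for 2:4 sparsity")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


if __name__ == "__main__":
    w = [0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.6]
    compressed = prune_2_of_4(quantize_uniform_symmetric(w))
    # The residual is the aggregated error that a low-rank term would need to absorb.
    residual = [a - b for a, b in zip(w, compressed)]
    print(compressed)
    print(residual)
```

The residual printed at the end is the combined quantization and sparsity error that the abstract says the low-rank adapters are meant to compensate.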
Article 36
Title@2025-05-29 (4): LoLA: Low-Rank Linear Attention With Sparse Caching
Title: LoLA: Low-Rank Linear Attention With Sparse Caching | LoLA: Low-Rank Lineare Aufmerksamkeit mit Sparse Caching | LoLA: 低兰克线性注意, 以粗糙的缓存 2505.23666v1 |
Authors: Luke McDermott, Robert W. Heath Jr., Rahul Parhi
Transformer-based large language models suffer from quadratic complexity at inference on long sequences. Linear attention methods are efficient alternatives; however, they fail to provide an accurate approximation of softmax attention. By additionally incorporating sliding window attention into each linear attention head, this gap can be closed for short context-length tasks. Unfortunately, these approaches cannot recall important information from long contexts due to “memory collisions”. In this paper, we propose LoLA: Low-rank Linear Attention with sparse caching. LoLA separately stores additional key-value pairs that would otherwise interfere with past associative memories. Moreover, LoLA further closes the gap between linear attention models and transformers by distributing past key-value pairs into three forms of memory: (i) recent pairs in a local sliding window; (ii) difficult-to-memorize pairs in a sparse, global cache; and (iii) generic pairs in the recurrent hidden state of linear attention. As an inference-only strategy, LoLA enables pass-key retrieval on up to 8K context lengths on needle-in-a-haystack tasks from RULER. It boosts the accuracy of the base subquadratic model from 0.6% to 97.4% at 4K context lengths, with a 4.6x smaller cache than that of Llama-3.1 8B. LoLA demonstrates strong performance on zero-shot commonsense reasoning tasks among 1B and 8B parameter subquadratic models. Finally, LoLA is an extremely lightweight approach: Nearly all of our results can be reproduced on a single consumer GPU.
以变换器为基础的大型语言模型在长序列的推论中具有二次复杂性。 线性关注方法是高效的替代方法, 但是它们无法提供精确的软体关注近似值。 此外, 线性关注方法通过将滑动窗口关注点纳入每个线性关注头, 这一差距可以因短期上下文任务而缩小。 不幸的是, 由于“ 模拟碰撞” , 这些方法无法回忆长背景下的重要信息 。 在本文中, 我们提议 LoLA: 低端线性关注, 且缓冲不小。 LoLA 单独存储了额外的关键值配对, 否则会干扰过去的关联记忆。 此外, LoLA 进一步缩小线性关注模型和变异器之间的差距, 将过去的键性值配对分配成三种记忆形式 :(i) 本地滑动窗口中的最近一对; (ii) 难以在“ 全球缓冲器” 中进行模拟的双对; (iii) 经常隐藏线性关注状态下的通用对配对。
Article 37
Title@2025-05-29 (4): AMBER: Adaptive Mesh Generation by Iterative Mesh Resolution Prediction
Title: AMBER: Adaptive Mesh Generation by Iterative Mesh Resolution Prediction | AMBER: Adaptive Mesh-Generierung durch iterative Mesh-Auflösungsvorhersage | 以迭代网目分辨率预测的适应性代谢代谢 2505.23663v1 |
Authors: Niklas Freymuth, Tobias Würth, Nicolas Schreiber, Balazs Gyenes, Andreas Boltres, Johannes Mitsch, Aleksandar Taranovic, Tai Hoang, Philipp Dahlinger, Philipp Becker, Luise Kärger, Gerhard Neumann
The cost and accuracy of simulating complex physical systems using the Finite Element Method (FEM) scale with the resolution of the underlying mesh. Adaptive meshes improve computational efficiency by refining resolution in critical regions, but typically require task-specific heuristics or cumbersome manual design by a human expert. We propose Adaptive Meshing By Expert Reconstruction (AMBER), a supervised learning approach to mesh adaptation. Starting from a coarse mesh, AMBER iteratively predicts the sizing field, i.e., a function mapping from the geometry to the local element size of the target mesh, and uses this prediction to produce a new intermediate mesh using an out-of-the-box mesh generator. This process is enabled through a hierarchical graph neural network, and relies on data augmentation by automatically projecting expert labels onto AMBER-generated data during training. We evaluate AMBER on 2D and 3D datasets, including classical physics problems, mechanical components, and real-world industrial designs with human expert meshes. AMBER generalizes to unseen geometries and consistently outperforms multiple recent baselines, including ones using Graph and Convolutional Neural Networks, and Reinforcement Learning-based approaches.
使用精密元素法(FEM)尺度模拟复杂的物理系统,其成本和准确性与基本网格的分辨率相仿。适应性 meshes通过在关键区域改进分辨率来提高计算效率,但通常需要由一位人类专家进行任务特定的超光度或烦琐的手工设计。我们提议通过专家重建(AMBER)进行适应性模拟,这是对网状适应的一种监督的学习方法。从粗略的网格开始,AMBER迭接地预测了缩放场,即从几何到目标网格的本地元件大小的函数映射,并利用这一预测来利用一个箱外网格生成一个新的中间网格。这一过程通过一个等级式的图形神经网络来启动,并依靠通过在培训期间将专家标签自动投射到AMBER生成的数据上来增强数据。我们从粗略的网格中对2D和3D数据集进行了评估,其中包括经典物理学问题、机械部件和与人类专家模拟的实界工业设计。 AMBER 将一般地分为看不见的和连续的超缓度。
Article 38
Title@2025-05-29 (4): Bayesian Perspective on Memorization and Reconstruction
Title: Bayesian Perspective on Memorization and Reconstruction | Bayesische Perspektive auf Erinnerung und Wiederaufbau | Bayes人对记忆和重建的看法 2505.23658v1 |
Authors: Haim Kaplan, Yishay Mansour, Kobbi Nissim, Uri Stemmer
We introduce a new Bayesian perspective on the concept of data reconstruction, and leverage this viewpoint to propose a new security definition that, in certain settings, provably prevents reconstruction attacks. We use our paradigm to shed new light on one of the most notorious attacks in the privacy and memorization literature: fingerprinting code attacks (FPC). We argue that these attacks are really a form of membership inference attacks, rather than reconstruction attacks. Furthermore, we show that if the goal is solely to prevent reconstruction (but not membership inference), then in some cases the impossibility results derived from FPC no longer apply.
我们从新的贝叶斯人的角度看待数据重建的概念,并利用这个观点提出一个新的安全定义,在某些环境下,可以明显地防止重建攻击。我们利用我们的范式,对隐私和记忆文献中最臭名昭著的攻击之一——指纹代码攻击(FCC ) —— 提供新的信息。 我们争辩说,这些攻击实际上是成员推论攻击的一种形式,而不是重建攻击。 此外,我们表明,如果目标仅仅在于防止重建(而不是成员推论),那么在某些情况下,从FPC产生的不可能的结果不再适用。
Article 39
Title@2025-05-29 (4): Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation
Title: Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation | Aktives Layer-Kontrastives Decodieren reduziert Halluzination bei der Generierung von Großsprachenmodellen | 大型语言模式生成中活性多语言解层解码减少幻觉 2505.23657v1 |
Authors: Hongxiang Zhang, Hao Chen, Tianyi Zhang, Muhao Chen
Recent decoding methods improve the factuality of large language models (LLMs) by refining how the next token is selected during generation. These methods typically operate at the token level, leveraging internal representations to suppress superficial patterns. Nevertheless, LLMs remain prone to hallucinations, especially over longer contexts. In this paper, we propose Active Layer-Contrastive Decoding (ActLCD), a novel decoding strategy that actively decides when to apply contrasting layers during generation. By casting decoding as a sequential decision-making problem, ActLCD employs a reinforcement learning policy guided by a reward-aware classifier to optimize factuality beyond the token level. Our experiments demonstrate that ActLCD surpasses state-of-the-art methods across five benchmarks, showcasing its effectiveness in mitigating hallucinations in diverse generation scenarios.
最近解码方法通过精炼代代代中如何选择下一个符号来提高大型语言模型~(LLMs)的实际情况质量。 这些方法通常在象征性层面运作,利用内部代表来压制表面模式。 尽管如此,LLMs仍然容易产生幻觉,特别是在较长的环境下。 在本文中,我们提议了一种新的解码战略,即积极的多层调解码战略,即积极决定代中何时应用对比层。通过将解码作为一个相继的决策问题,ActLCD采用了一种强化学习政策,由有奖分的分类师指导,使事实质量在象征性层面之外达到最佳水平。我们的实验表明,AcLCD超越了五个基准的最新方法,显示了它在减少不同代中幻觉方面的有效性。
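The notion of deciding per step whether to contrast layers can be sketched as follows. The contrast rule used here (final-layer log-probabilities minus early-layer log-probabilities) is an assumption borrowed from the general layer-contrastive idea, and the boolean `apply_contrast` stands in for the learned policy described in the abstract; neither is claimed to match ActLCD's actual formulation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def next_token(final_logits, early_logits, apply_contrast):
    """Pick the next token id, optionally contrasting final against early layer."""
    final_lp = log_softmax(final_logits)
    if not apply_contrast:
        scores = final_lp
    else:
        early_lp = log_softmax(early_logits)
        # Tokens that gain probability between the early and final layer are favored.
        scores = [f - e for f, e in zip(final_lp, early_lp)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    final_logits = [2.0, 1.9, 0.0]  # hypothetical final-layer logits
    early_logits = [2.0, 0.5, 0.0]  # hypothetical early-layer logits
    print(next_token(final_logits, early_logits, apply_contrast=False))
    print(next_token(final_logits, early_logits, apply_contrast=True))
```

In this toy input, greedy decoding picks token 0, while the contrastive score picks token 1 because its probability rises most between the two layers.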
Article 40
Title@2025-05-29 (4): How does Transformer Learn Implicit Reasoning?
Title: How does Transformer Learn Implicit Reasoning? | Wie lernt Transformer Implizite Vernunft? | 变形者如何学习隐含理由? 2505.23653v1 |
Authors: Jiaran Ye, Zijun Yao, Zhidian Huang, Liangming Pan, Jinxin Liu, Yushi Bai, Amy Xin, Liu Weichuan, Xiaoyin Che, Lei Hou, Juanzi Li
Recent work suggests that large language models (LLMs) can perform multi-hop reasoning implicitly – producing correct answers without explicitly verbalizing intermediate steps – but the underlying mechanisms remain poorly understood. In this paper, we study how such implicit reasoning emerges by training transformers from scratch in a controlled symbolic environment. Our analysis reveals a three-stage developmental trajectory: early memorization, followed by in-distribution generalization, and eventually cross-distribution generalization. We find that training with atomic triples is not necessary but accelerates learning, and that second-hop generalization relies on query-level exposure to specific compositional structures. To interpret these behaviors, we introduce two diagnostic tools: cross-query semantic patching, which identifies semantically reusable intermediate representations, and a cosine-based representational lens, which reveals that successful reasoning correlates with the cosine-based clustering in hidden space. This clustering phenomenon in turn provides a coherent explanation for the behavioral dynamics observed across training, linking representational structure to reasoning capability. These findings provide new insights into the interpretability of implicit multi-hop reasoning in LLMs, helping to clarify how complex reasoning processes unfold internally and offering pathways to enhance the transparency of such models.
最近的工作表明,大型语言模型(LLMS)可以隐含地进行多动脉推理 – – 提出正确的答案,而没有明确地解释中间步骤 – – 但基本机制仍然不易理解。在本文中,我们研究这些隐含的推理如何通过在受控制的象征性环境中从零开始培训变压器而产生。我们的分析揭示了一个三阶段的发展轨迹:早期记忆,随后是分布式的概括,最终是跨分布式的概括化。我们发现,用原子三联体进行的培训没有必要,而是加速学习,而第二波的概括化取决于对特定组成结构的询问程度的暴露。为了解释这些行为,我们引入了两种诊断工具:交叉拼写语的语义拼接,它识别了可重新使用的语义中间表达器,以及基于共弦的表达镜,它揭示了成功的推理与隐蔽空间的正基组合有关。这种组合现象反过来为整个培训中观察到的行为动态提供了一致的解释,将代表性结构与推理能力联系起来。这些发现为LLMSMS的隐含多动性多动推理提供了新的理解性解释性提供了新的见解。
Article 41
Title@2025-05-29 (4): Optimization-Free Diffusion Model – A Perturbation Theory Approach
Title: Optimization-Free Diffusion Model – A Perturbation Theory Approach | Optimierungsfreies Diffusionsmodell – Ein Perturbationstheorie-Ansatz | 优化-无优化传播模式 – – 扰动理论方法 2505.23652v1 |
Authors: Yuehaw Khoo, Mathias Oster, Yifan Peng
Diffusion models have emerged as a powerful framework in generative modeling, typically relying on optimizing neural networks to estimate the score function via forward SDE simulations. In this work, we propose an alternative method that is both optimization-free and forward SDE-free. By expanding the score function in a sparse set of eigenbasis of the backward Kolmogorov operator associated with the diffusion process, we reformulate score estimation as the solution to a linear system, avoiding iterative optimization and time-dependent sample generation. We analyze the approximation error using perturbation theory and demonstrate the effectiveness of our method on high-dimensional Boltzmann distributions and real-world datasets.
传播模型已成为基因模型的强大框架,通常依靠优化神经网络,通过前方SDE模拟来估计得分函数。在这项工作中,我们提出了一种替代方法,既无优化,又无前方SDE。通过扩大与扩散过程相关的落后的科尔莫戈罗夫操作员的零星的分数功能,我们重新将得分估计作为线性系统的解决方案,避免迭代优化和根据时间生成样本。我们使用扰动理论分析近似误差,并展示我们在高维波尔茨曼分布和真实世界数据集上的方法的有效性。
Article 42
Title@2025-05-29 (4): Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation
Title: Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation | Merge-Friendly Post-Training Quantization für Multi-Target Domain-Anpassung | 多目标域适应培训后量化 2505.23651v1 |
Authors: Juncheol Shin, Minsang Seok, Seonggon Kim, Eunhyeok Park
Model merging has emerged as a powerful technique for combining task-specific weights, achieving superior performance in multi-target domain adaptation. However, when applied to practical scenarios, such as quantized models, new challenges arise. In practical scenarios, quantization is often applied to target-specific data, but this process restricts the domain of interest and introduces discretization effects, making model merging highly non-trivial. In this study, we analyze the impact of quantization on model merging through the lens of error barriers. Leveraging these insights, we propose a novel post-training quantization, HDRQ - Hessian and distant regularizing quantization - that is designed to consider model merging for multi-target domain adaptation. Our approach ensures that the quantization process incurs minimal deviation from the source pre-trained model while flattening the loss surface to facilitate smooth model merging. To our knowledge, this is the first study on this challenge, and extensive experiments confirm its effectiveness.
合并模型已成为一种强大的技术,可以将特定任务的权重结合起来,在多目标领域的适应中实现优异性能。然而,当应用到实际情景,例如量化模型时,就会出现新的挑战。在实际情景中,量化往往适用于特定目标数据,但这一过程限制了关注领域,并引入了分化效应,使模型高度非三边性合并。在本研究中,我们分析了量化对通过错误障碍透镜合并模型的影响。利用这些洞察力,我们提出了一种新的培训后量化(HDHRQ - Hessian和远处常规化量化)方案,旨在考虑将模型合并用于多目标领域的适应。我们的方法确保量化进程在平整损失表面的同时,尽可能避免偏离源源前培训模式,从而便利模型的顺利合并。据我们所知,这是关于这一挑战的首次研究,并且广泛的实验证实了其有效性。
Article 43
Title@2025-05-29 (4): Optimal Bounds for Adversarial Constrained Online Convex Optimization
Title: Optimal Bounds for Adversarial Constrained Online Convex Optimization | Optimale Grenzen für die Online-Konvergenzoptimierung | 优化在线电传优化优化 2503.13366v4 |
Authors: Ricardo N. Ferreira, Cláudia Soares
Constrained Online Convex Optimization (COCO) can be seen as a generalization of the standard Online Convex Optimization (OCO) framework. At each round, a cost function and a constraint function are revealed after the learner chooses an action. The goal is to minimize both the regret and the cumulative constraint violation (CCV) against an adaptive adversary. We show for the first time that it is possible to obtain the optimal $O(\sqrt{T})$ bound on both regret and CCV, improving the best known bounds of $O(\sqrt{T})$ and $\tilde{O}(\sqrt{T})$ for the regret and CCV, respectively. Based on a new surrogate loss function enforcing a minimum penalty on the constraint function, we demonstrate that both Follow-the-Regularized-Leader and Online Gradient Descent achieve the optimal bounds.
受约束的在线 convex 优化( COCO) 可以被视为对标准在线 Convex 优化( OCO) 框架的概括化。 在每一回合中, 在学习者选择一个动作后都会显示成本函数和约束功能。 目标是尽量减少对适应性对手的遗憾和累积约束性违反( CCV) 。 我们第一次显示有可能获得对遗憾和CCV 的约束的最佳美元, 改进已知的美元左( scqrt{ T}\right) 和 $\ tilde{ O} 左(\ lft (\ qrt{ T}\right) 的最佳界限。 基于对约束功能实施最低处罚的新套位损失功能, 我们证明后续带和在线梯族都达到了最佳界限 。
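The COCO protocol (commit to an action, then see the cost and constraint) can be illustrated with projected Online Gradient Descent on a simple penalized surrogate. The surrogate below, cost plus a fixed penalty times the positive part of the constraint, is a generic stand-in chosen for illustration; it is not the surrogate loss proposed in the paper, and the one-dimensional quadratic costs, step size, and penalty weight are all hypothetical.

```python
import math


def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the feasible interval [lo, hi]."""
    return min(hi, max(lo, x))


def run_penalized_ogd(rounds, penalty=2.0, step=0.5):
    """Run OGD on cost (x - c)^2 with constraint x - b <= 0, revealed after each play.

    Returns the final iterate, the cumulative cost, and the cumulative
    constraint violation (CCV).
    """
    x = 0.0
    total_cost = 0.0
    ccv = 0.0
    for t, (c, b) in enumerate(rounds, start=1):
        # The learner has already committed to x; now (c, b) are revealed.
        total_cost += (x - c) ** 2
        violation = max(0.0, x - b)
        ccv += violation
        # Subgradient of the surrogate (x - c)^2 + penalty * max(0, x - b).
        grad = 2.0 * (x - c) + (penalty if violation > 0.0 else 0.0)
        x = project(x - step / math.sqrt(t) * grad)
    return x, total_cost, ccv


if __name__ == "__main__":
    # The cost pulls toward 1.0 while the constraint demands x <= 0.0.
    final_x, cost, ccv = run_penalized_ogd([(1.0, 0.0)] * 200)
    print(final_x, cost, ccv)
```

In this toy run the iterate overshoots once and then shrinks toward the constraint boundary, so the cumulative violation stays bounded while the cost keeps pulling toward the infeasible point.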
Article 44
Title@2025-05-29 (4): Continuous Chain of Thought Enables Parallel Exploration and Reasoning
Title: Continuous Chain of Thought Enables Parallel Exploration and Reasoning | Kontinuierliche Gedankenkette ermöglicht parallele Erkundung und Vernunft | 连续思考链有助于平行探索和推理 2505.23648v1 |
Authors: Halil Alperen Gozeten, M. Emrullah Ildiz, Xuechen Zhang, Hrayr Harutyunyan, Ankit Singh Rawat, Samet Oymak
Current language models generate chain-of-thought traces by autoregressively sampling tokens from a finite vocabulary. While this discrete sampling has achieved remarkable success, conducting chain-of-thought with continuously-valued tokens (CoT2) offers a richer and more expressive alternative. Our work examines the benefits of CoT2 through logical reasoning tasks that inherently require search capabilities, and provides optimization and exploration methods for CoT2. Theoretically, we show that CoT2 allows the model to track multiple traces in parallel, and we quantify its benefits for inference efficiency. Notably, a one-layer transformer equipped with CoT2 can provably solve the combinatorial “subset sum problem” given sufficient embedding dimension. These insights lead to a novel and effective supervision strategy where we match the softmax outputs to the empirical token distributions of a set of target traces. Complementing this, we introduce sampling strategies that unlock policy optimization and self-improvement for CoT2. Our first strategy samples and composes $K$ discrete tokens at each decoding step to control the level of parallelism, and reduces to standard CoT when $K=1$. Our second strategy relies on continuous exploration over the probability simplex. Experiments confirm that policy optimization with CoT2 indeed improves the performance of the model beyond its initial discrete or continuous supervision.
当前的语言模型通过从限定词汇中自动递增抽样符号产生思维链痕迹。 虽然这种离散的抽样已经取得了显著的成功, 使用持续价值的象征(CoT2) 进行思维链(CoT2) 提供了更丰富和更直观的替代方法。 我们的工作通过逻辑推理任务来审查COT2的好处,这些逻辑推理任务必然需要搜索能力,并为CoT2. 提供优化和探索方法。 我们从理论上表明, CoT2 允许该模型平行跟踪多种痕迹并量化其推论效率的好处。 值得注意的是, 一个配有 CoT2 的层变异器可以在足够嵌入层面之后, 解决组合“ 子集问题 ” 。 这些洞察力导致一种新颖和有效的监督战略, 我们通过将软负载输出与一组目标痕迹的经验符号分布相匹配。 作为补充, 我们引入采样战略, 解开政策优化和自我简化为CO2, 我们的第一个战略样本在控制平行水平的每个解码步骤中都配置$K=1美元, 并降低为标准的COT 标准值, 当模型显示其连续的精确度时, 我们的优化战略依赖于持续探索政策。
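The first sampling strategy in this abstract (sample $K$ discrete tokens and compose them into one continuous token) might look like the sketch below. Composing by averaging the sampled token embeddings is an assumption made for illustration, as are the function names and the toy one-hot embedding table; the abstract does not state the exact composition rule.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def continuous_token(logits, embeddings, k, rng):
    """Sample k token ids from softmax(logits) and average their embeddings.

    With k = 1 the result is a single token embedding, i.e. ordinary
    discrete chain-of-thought sampling.
    """
    probs = softmax(logits)
    sampled = rng.choices(range(len(probs)), weights=probs, k=k)
    dim = len(embeddings[0])
    return [sum(embeddings[i][d] for i in sampled) / k for d in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy vocabulary of 3 tokens with one-hot embeddings (hypothetical).
    table = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(continuous_token([1.0, 1.0, 0.0], table, k=1, rng=rng))
    print(continuous_token([1.0, 1.0, 0.0], table, k=4, rng=rng))
```

Larger `k` lets one step carry a mixture of several candidate tokens, which is the sense in which the level of parallelism is controlled.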
Article 45
Title@2025-05-29 (4): Are Reasoning Models More Prone to Hallucination?
Title: Are Reasoning Models More Prone to Hallucination? | Sind vernünftigere Modelle eher halluzinierend? | 理性模型更能让人产生幻觉吗? 2505.23646v1 |
Authors: Zijun Yao, Yantao Liu, Yanxu Chen, Jianhui Chen, Junfeng Fang, Lei Hou, Juanzi Li, Tat-Seng Chua
Recently evolved large reasoning models (LRMs) show powerful performance in solving complex tasks with long chain-of-thought (CoT) reasoning capability. As these LRMs are mostly developed by post-training on formal reasoning tasks, whether they generalize the reasoning capability to help reduce hallucination in fact-seeking tasks remains unclear and debated. For instance, DeepSeek-R1 reports increased performance on SimpleQA, a fact-seeking benchmark, while OpenAI-o3 observes even more severe hallucination. This discrepancy naturally raises the following research question: Are reasoning models more prone to hallucination? This paper addresses the question from three perspectives. (1) We first conduct a holistic evaluation of hallucination in LRMs. Our analysis reveals that LRMs that undergo a full post-training pipeline with cold-start supervised fine-tuning (SFT) and verifiable-reward RL generally alleviate their hallucination. In contrast, both distillation alone and RL training without cold-start fine-tuning introduce more nuanced hallucinations. (2) To explore why different post-training pipelines alter the impact on hallucination in LRMs, we conduct behavior analysis. We characterize two critical cognitive behaviors that directly affect the factuality of an LRM: Flaw Repetition, where the surface-level reasoning attempts repeatedly follow the same underlying flawed logic, and Think-Answer Mismatch, where the final answer fails to faithfully match the previous CoT process. (3) Further, we investigate the mechanism behind the hallucination of LRMs from the perspective of model uncertainty. We find that increased hallucination of LRMs is usually associated with the misalignment between model uncertainty and factual accuracy. Our work provides an initial understanding of the hallucination in LRMs.
最近发展起来的大型推理模型(LRMs)显示,在解决复杂任务时,有长期思维链推理能力(CoT)推理能力(LRMs)的强大表现。由于这些LRM多数是通过正式推理任务的培训后开发的,因此,它们是否广泛运用推理能力来帮助减少寻求事实的任务中的幻觉,现在仍然不清楚和辩论。例如,DeepSeek-RS1报告提高了简单QA(一个寻求事实的基准)的性能,而OpenAI-o3则观察到了更严重的幻觉。这种差异自然引起以下研究问题:推理模型更易产生幻觉吗?本文从三个角度处理问题。(1) 我们首先对LRMMs的幻觉进行整体评价。我们的分析显示,LRMRMs在培训后的全面演练中,经过寒冷监督的微调(SFT)和可核查的奖励RLL通常会减轻其幻觉。相比之下,光学和RM的L培训后演算过程通常也会改变我们错觉的正确性结果。
Article 46
Title@2025-05-29 (4): Towards Unified Attribution in Explainable AI, Data-Centric AI, and Mechanistic Interpretability
Title: Towards Unified Attribution in Explainable AI, Data-Centric AI, and Mechanistic Interpretability | Auf dem Weg zu einer einheitlichen Attribution in erklärbarer KI, datenzentraler KI und mechanistischer Interpretierbarkeit | 实现可解释的AI、数据集中AI和机械可解释性的统一归属 2501.18887v3 |
Authors: Shichang Zhang, Tessa Han, Usha Bhalla, Himabindu Lakkaraju
The increasing complexity of AI systems has made understanding their behavior critical. Numerous interpretability methods have been developed to attribute model behavior to three key aspects: input features, training data, and internal model components, which emerged from explainable AI, data-centric AI, and mechanistic interpretability, respectively. However, these attribution methods are studied and applied rather independently, resulting in a fragmented landscape of methods and terminology. This position paper argues that feature, data, and component attribution methods share fundamental similarities, and a unified view of them benefits both interpretability and broader AI research. To this end, we first analyze popular methods for these three types of attributions and present a unified view demonstrating that these seemingly distinct methods employ similar techniques (such as perturbations, gradients, and linear approximations) over different aspects and thus differ primarily in their perspectives rather than techniques. Then, we demonstrate how this unified view enhances understanding of existing attribution methods, highlights shared concepts and evaluation criteria among these methods, and leads to new research directions both in interpretability research, by addressing common challenges and facilitating cross-attribution innovation, and in AI more broadly, with applications in model editing, steering, and regulation.
AI系统越来越复杂,因此理解它们的行为至关重要。许多解释方法已经发展成许多,将模型行为分为三个关键方面:投入特征、培训数据和内部模型组成部分,这些组成部分分别来自可解释的AI、以数据为中心的AI和机械解释。然而,这些归属方法是相当独立的研究和应用,造成方法和术语的不成体系。本立场文件认为,特征、数据和组成部分归属方法具有基本相似性,统一看待这些方法既有利于解释性,又有利于更广泛的AI研究。为此,我们首先分析这三类属性的流行方法,提出统一的观点,表明这些似乎截然不同的方法在不同方面采用相似的技术(如扰动、梯度和线性近似),因此主要在它们的观点上不同,而不是在技术上不同。然后,我们展示这种统一的观点如何增进对现有归属方法的理解,突出这些方法的共同概念和评价标准,并导致在解释性研究方面找到新的研究方向,解决共同的挑战,促进交叉归属创新,在AI中更为广泛地应用模式编辑、指导和监管。
Article 47
Title@2025-05-29 (4): Global optimization of graph acquisition functions for neural architecture search
Title: Global optimization of graph acquisition functions for neural architecture search | Globale Optimierung von Graphen-Erfassungsfunktionen für die neuronale Architektursuche | 全球优化用于神经结构搜索的图图获取功能 2505.23640v1 |
Authors: Yilin Xie, Shiqiang Zhang, Jixiang Qing, Ruth Misener, Calvin Tsay
Graph Bayesian optimization (BO) has shown potential as a powerful and data-efficient tool for neural architecture search (NAS). Most existing graph BO works focus on developing graph surrogate models, i.e., metrics of networks and/or different kernels to quantify the similarity between networks. However, the acquisition optimization, as a discrete optimization task over graph structures, is not well studied due to the complexity of formulating the graph search space and acquisition functions. This paper presents explicit optimization formulations for graph input space including properties such as reachability and shortest paths, which are used later to formulate graph kernels and the acquisition function. We theoretically prove that the proposed encoding is an equivalent representation of the graph space and provide restrictions for the NAS domain with either node or edge labels. Numerical results over several NAS benchmarks show that our method efficiently finds the optimal architecture for most cases, highlighting its efficacy.
图表 Bayesian 优化(BO) 显示了作为神经结构搜索(NAS)的强大和数据效率工具的潜力。 多数现有图表BO 侧重于开发图形替代模型,即网络和/或不同内核的量度,以量化网络之间的相似性。然而,由于绘制图形搜索空间和获取功能的复杂性,作为与图形结构的单独优化任务,对获取优化没有进行很好研究。本文展示了图形输入空间的清晰优化配方,包括可达性和最短路径等属性,这些属性后来被用于绘制图形内核和获取功能。 我们理论上证明,拟议的编码相当于图形空间的等量度,并为NAS 域提供了限制,有节点或边缘标签。 几个NAS 基准的数值结果显示,我们的方法在多数情况下都有效地找到了最佳结构,突出其功效。
Article 48
Title@2025-05-29 (4): Position: Scaling LLM Agents Requires Asymptotic Analysis with LLM Primitives
Title: Position: Scaling LLM Agents Requires Asymptotic Analysis with LLM Primitives | Position: Skalierung von LLM-Agenten erfordert asymptotische Analyse mit LLM-Primitiven | 位置: 缩放 LLM 代理需要用 LLM 原始功能进行抗药性分析 2502.04358v2 |
Authors: Elliot Meyerson, Xin Qiu
Decomposing hard problems into subproblems often makes them easier and more efficient to solve. With large language models (LLMs) crossing critical reliability thresholds for a growing slate of capabilities, there is an increasing effort to decompose systems into sets of LLM-based agents, each of whom can be delegated sub-tasks. However, this decomposition (even when automated) is often intuitive, e.g., based on how a human might assign roles to members of a human team. How close are these role decompositions to optimal? This position paper argues that asymptotic analysis with LLM primitives is needed to reason about the efficiency of such decomposed systems, and that insights from such analysis will unlock opportunities for scaling them. By treating the LLM forward pass as the atomic unit of computational cost, one can separate out the (often opaque) inner workings of a particular LLM from the inherent efficiency of how a set of LLMs are orchestrated to solve hard problems. In other words, if we want to scale the deployment of LLMs to the limit, instead of anthropomorphizing LLMs, asymptotic analysis with LLM primitives should be used to reason about and develop more powerful decompositions of large problems into LLM agents.
将棘手问题分解成次级问题往往使这些问题更容易解决,更有效率。随着大型语言模型(LLMs)跨越关键可靠性临界临界值以达到日益成熟的能力,人们正日益努力将系统分解成以LLM为基础的代理器,每个代理器都可以被授予子任务。然而,这种分解(即使在自动化的情况下)往往不自然,例如,根据一个人如何将角色分配给人类团队的成员;这些角色是如何接近于最佳的分解?本立场文件认为,需要与LLLM原始体进行无症状分析,以说明这种分解系统的效率,而这种分析的洞见将释放出机会。通过将LLMM的前身作为计算成本的原子单位处理,可以将特定LLM的(往往不透明)内部工作与一组LMs如何精心安排以解决难题的内在效率区分开来。换句话说,如果我们想将LLMS的部署范围缩小到限度,而不是将这种分解的系统的效率提高到更强大的LMsrialms,那么,就应该将LMsrialmas进行更强大的分解分析。
Article 49
Title@2025-05-29 (4): MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment
Title: MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment | MCP Safety Training: Lernen, falsch benachbarte MCP-Exploits mit verbesserter Präferenzausrichtung abzulehnen | MCP 安全培训:学会利用改进的优惠协调,错误拒绝 MCP 剥削 2505.23634v1 |
Authors: John Halloran
The model context protocol (MCP) has been widely adopted as an open standard enabling the seamless integration of generative AI agents. However, recent work has shown the MCP is susceptible to retrieval-based “falsely benign” attacks (FBAs), allowing malicious system access and credential theft, but requiring that users download compromised files directly to their systems. Herein, we show that the threat model of MCP-based attacks is significantly broader than previously thought, i.e., attackers need only post malicious content online to deceive MCP agents into carrying out their attacks on unsuspecting victims’ systems. To improve alignment guardrails against such attacks, we introduce a new MCP dataset of FBAs and (truly) benign samples to explore the effectiveness of direct preference optimization (DPO) for the refusal training of large language models (LLMs). While DPO improves model guardrails against such attacks, we show that the efficacy of refusal learning varies drastically depending on the model’s original post-training alignment scheme; e.g., GRPO-based LLMs learn to refuse extremely poorly. Thus, to further improve FBA refusals, we introduce Retrieval Augmented Generation for Preference alignment (RAG-Pref), a novel preference alignment strategy based on RAG. We show that RAG-Pref significantly improves the ability of LLMs to refuse FBAs, particularly when combined with DPO alignment, thus drastically improving guardrails against MCP-based attacks.
模型上下文协议(MCP)已被广泛采用为一种开放标准,可实现生成式AI代理的无缝集成。然而,最近的工作表明,MCP容易受到基于检索的“伪良性”攻击(FBA),这类攻击可导致恶意系统访问和凭证窃取,但要求用户将被篡改的文件直接下载到自己的系统中。本文表明,基于MCP的攻击的威胁模型比此前认为的要广泛得多,即攻击者只需在网上发布恶意内容,就能欺骗MCP代理对毫无戒备的受害者的系统实施攻击。为了加强针对此类攻击的对齐护栏,我们引入了一个由FBA和(真正)良性样本组成的新MCP数据集,以探索直接偏好优化(DPO)在大型语言模型(LLM)拒绝训练中的有效性。虽然DPO能够改进模型针对此类攻击的护栏,但我们发现拒绝学习的效果因模型原有的后训练对齐方案而有很大差异,例如基于GRPO的LLM学习拒绝的效果极差。因此,为进一步改进对FBA的拒绝,我们提出了用于偏好对齐的检索增强生成(RAG-Pref),这是一种基于RAG的新型偏好对齐策略。我们证明,RAG-Pref显著提高了LLM拒绝FBA的能力,尤其是与DPO对齐相结合时,从而大幅加强了针对基于MCP的攻击的护栏。
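The DPO objective mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard per-example DPO loss, with the refusal as the "chosen" response and compliance with the attack as the "rejected" one; the function name, the `beta` default, and all numbers are hypothetical and are not taken from the paper. RAG-Pref itself is not reproduced here.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    All four inputs are sequence log-probabilities (sums of token log-probs).
    `beta` is the usual DPO temperature; 0.1 is an illustrative default,
    not a value taken from the paper.
    """
    # How much more the policy prefers each response than the reference does.
    chosen_gain = logp_chosen - ref_logp_chosen
    rejected_gain = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(m)) equals softplus(-m); this form is stable for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical numbers. Before training the policy equals the reference,
# so the margin is zero and the loss is log(2).
untrained = dpo_loss(-12.0, -9.0, -12.0, -9.0)
# After training the policy favours the refusal and disfavours compliance.
trained = dpo_loss(-8.0, -15.0, -12.0, -9.0)
```

The loss falls as the policy raises the refusal's log-probability relative to the reference and lowers that of the harmful completion, which is the behaviour refusal training is meant to reward.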
Article 50
Title@2025-05-29 (4): Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection
Title: Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection | Prompting Whisper für verbesserte wörtliche Transkription und End-to-End-Missue-Erkennung | 逐字记录和终端至终端杂项探测 2505.23627v1 |
Authors: Griffin Dietz Smith, Dianna Yee, Jennifer King Chen, Leah Findlater
Identifying mistakes (i.e., miscues) made while reading aloud is commonly approached post-hoc by comparing automatic speech recognition (ASR) transcriptions to the target reading text. However, post-hoc methods perform poorly when ASR inaccurately transcribes verbatim speech. To improve on current methods for reading error annotation, we propose a novel end-to-end architecture that incorporates the target reading text via prompting and is trained for both improved verbatim transcription and direct miscue detection. Our contributions include: first, demonstrating that incorporating the reading text through prompting improves verbatim transcription performance more than fine-tuning does, and second, showing that it is feasible to augment speech recognition tasks for end-to-end miscue detection. We conducted two case studies – children’s read-aloud and adult atypical speech – and found that our proposed strategies improve verbatim transcription and miscue detection compared to the current state of the art.
阅读声响时的错误(即错误)通常会通过将自动语音识别(ASR)记录抄录与目标读取文本进行比较,来识别在读出声响时产生的错误(即错误),从而在事后发现时通常会发现错误。然而,当ASR不准确地抄录逐字记录稿时,事后方法效果不佳。为了改进当前读出错误注释的方法,我们提议了一个新的端对端结构,通过提示和训练将目标阅读文本纳入,从而改进逐字记录誊写和直接检测错误。我们的贡献包括:第一,表明通过在微调后推动效益逐字抄录功能纳入阅读文本,第二,表明加强语音识别任务以发现端到端错误是可行的。我们进行了两个案例研究:儿童读音和成人非典型的演讲。 我们发现,我们提出的战略改善了逐字记录和错误检测,而与当前的最新技术相比,我们提出的战略则改善了逐字记录和错误检测。
Article 51
Title@2025-05-29 (4): Quartet: Native FP4 Training Can Be Optimal for Large Language Models
Title: Quartet: Native FP4 Training Can Be Optimal for Large Language Models | Quartett: Native FP4 Training kann für große Sprachmodelle optimal sein | Quartet:原生FP4训练可以成为大语言模型的最优方案 2505.14669v2 |
Authors: Roberto L. Castro, Andrei Panferov, Soroush Tabesh, Oliver Sieberling, Jiale Chen, Mahdi Nikdan, Saleh Ashkboos, Dan Alistarh
Training large language models (LLMs) directly in low precision offers a way to address computational costs by improving both throughput and energy efficiency. For those purposes, NVIDIA’s recent Blackwell architecture facilitates very low-precision operations using FP4 variants. Yet, current algorithms for training LLMs in FP4 precision face significant accuracy degradation and often rely on mixed-precision fallbacks. In this paper, we investigate hardware-supported FP4 training and introduce a new approach for accurate, end-to-end FP4 training with all the major computations (i.e., linear layers) in low precision. Through extensive evaluations on Llama-type models, we reveal a new low-precision scaling law that quantifies performance trade-offs across bit-widths and training setups. Guided by this investigation, we design an “optimal” technique in terms of accuracy-vs-computation, called Quartet. We implement Quartet using optimized CUDA kernels tailored for Blackwell, demonstrating that fully FP4-based training is a competitive alternative to FP16 half-precision and to FP8 training. Our code is available at https://github.com/IST-DASLab/Quartet.
直接以低精度训练大型语言模型(LLM)可以同时提高吞吐量和能效,从而为降低计算成本提供了一种途径。为此,NVIDIA最新的Blackwell架构支持使用FP4变体的极低精度运算。然而,目前以FP4精度训练LLM的算法面临显著的精度下降,并且常常依赖混合精度回退。本文研究硬件支持的FP4训练,并提出一种新方法,使所有主要计算(即线性层)均以低精度进行,从而实现准确的端到端FP4训练。通过对Llama类模型的大量评估,我们揭示了一条新的低精度缩放定律,用以量化不同位宽和训练设置下的性能权衡。在这一研究的指导下,我们设计了一种在精度与计算量权衡意义上“最优”的技术,称为Quartet。我们使用为Blackwell定制的优化CUDA内核实现了Quartet,证明完全基于FP4的训练是FP16半精度训练和FP8训练的有力替代方案。我们的代码可在 https://github.com/IST-DASLab/Quartet 获取。
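To give a sense of how coarse FP4 is, the sketch below rounds values to the nearest point of the E2M1 grid (1 sign, 2 exponent, 1 mantissa bit) after a shared absmax scaling. It assumes the E2M1 layout and per-tensor scaling purely for illustration; `quantize_fp4` is a hypothetical helper and does not reproduce Quartet's actual training procedure or its CUDA kernels.

```python
import math

# Magnitudes representable in the E2M1 FP4 format.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_fp4(values):
    """Round-to-nearest FP4 quantization with one shared absmax scale.

    Returns the dequantized values, i.e. what the numbers look like after a
    round trip through FP4. Per-tensor absmax scaling is an assumption made
    for this sketch. `values` must be a non-empty list of floats.
    """
    absmax = max(abs(v) for v in values)
    if absmax == 0.0:
        return [0.0 for _ in values]
    scale = absmax / FP4_GRID[-1]  # map the largest magnitude onto 6.0
    out = []
    for v in values:
        magnitude = abs(v) / scale
        # Pick the closest representable magnitude, then restore the sign.
        nearest = min(FP4_GRID, key=lambda g: abs(g - magnitude))
        out.append(math.copysign(nearest * scale, v))
    return out


roundtrip = quantize_fp4([6.0, 2.4, -0.2, 0.0])
```

With only eight magnitudes available, 2.4 collapses to 2.0 and -0.2 collapses to zero, which hints at why naive FP4 training loses accuracy.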
Article 52
Title@2025-05-29 (4): SPACE: SPike-Aware Consistency Enhancement for Test-Time Adaptation in Spiking Neural Networks
Title: SPACE: SPike-Aware Consistency Enhancement for Test-Time Adaptation in Spiking Neural Networks | SPACE: SPike-Aware Consistency Enhancement für Test-Time-Anpassung in Spiking Neuronal Networks | 空间:在Spiking神经网络中加强在测试-时间适应方面的SPike-Aware一致性增强 2504.02298v2 |
Authors: Xinyu Luo, Kecheng Chen, Pao-Sheng Vincent Sun, Chris Xing Tian, Arindam Basu, Haoliang Li
Spiking Neural Networks (SNNs), as a biologically plausible alternative to Artificial Neural Networks (ANNs), have demonstrated advantages in terms of energy efficiency, temporal processing, and biological plausibility. However, SNNs are highly sensitive to distribution shifts, which can significantly degrade their performance in real-world scenarios. Traditional test-time adaptation (TTA) methods designed for ANNs often fail to address the unique computational dynamics of SNNs, such as sparsity and temporal spiking behavior. To address these challenges, we propose SPike-Aware Consistency Enhancement (SPACE), the first source-free and single-instance TTA method specifically designed for SNNs. SPACE leverages the inherent spike dynamics of SNNs to maximize the consistency of spike-behavior-based local feature maps across augmented versions of a single test sample, enabling robust adaptation without requiring source data. We evaluate SPACE on multiple datasets. Furthermore, SPACE exhibits robust generalization across diverse network architectures, consistently enhancing the performance of SNNs on CNNs (such as VGG and ResNet), Transformer models, and ConvLSTM architectures. Experimental results show that SPACE outperforms state-of-the-art methods, highlighting its effectiveness and robustness in real-world settings.
脉冲神经网络(SNN)作为人工神经网络(ANN)的一种具有生物合理性的替代方案,在能效、时序处理和生物合理性方面已显示出优势。然而,SNN对分布偏移高度敏感,这会显著降低其在现实场景中的性能。为ANN设计的传统测试时适应(TTA)方法往往无法应对SNN独特的计算动态,例如稀疏性和时序脉冲行为。为应对这些挑战,我们提出了SPike-Aware Consistency Enhancement(SPACE),这是首个专为SNN设计的无源、单实例TTA方法。SPACE利用SNN固有的脉冲动态,最大化单个测试样本的各增强版本之间基于脉冲行为的局部特征图的一致性,从而无需源数据即可实现稳健的适应。我们在多个数据集上评估了SPACE。此外,SPACE在多种网络架构上表现出稳健的泛化能力,能够持续提升SNN在CNN(如VGG和ResNet)、Transformer模型和ConvLSTM架构上的性能。实验结果表明,SPACE优于最先进的方法,凸显了其在现实环境中的有效性和稳健性。
Article 53
Title@2025-05-29 (4): Instance-Optimality for Private KL Distribution Estimation
Title: Instance-Optimality for Private KL Distribution Estimation | Instanz-Optimalität für private KL-Verteilungsabschätzung | 私人 KL 分布分布估计的实情- 最佳度 2505.23620v1 |
Authors: Jiayuan Ye, Vitaly Feldman, Kunal Talwar
We study the fundamental problem of estimating an unknown discrete distribution $p$ over $d$ symbols, given $n$ i.i.d. samples from the distribution. We are interested in minimizing the KL divergence between the true distribution and the algorithm’s estimate. We first construct minimax optimal private estimators. Minimax optimality, however, fails to shed light on an algorithm’s performance on individual (non-worst-case) instances $p$, and simple minimax-optimal DP estimators can have poor empirical performance on real distributions. We then study this problem from an instance-optimality viewpoint, where the algorithm’s error on $p$ is compared to the minimum achievable estimation error over a small local neighborhood of $p$. Under natural notions of local neighborhood, we propose algorithms that achieve instance-optimality up to constant factors, with and without a differential privacy constraint. Our upper bounds rely on (private) variants of the Good-Turing estimator. Our lower bounds use additive local neighborhoods that more precisely capture the hardness of distribution estimation in KL divergence, compared to ones considered in prior works.
我们研究的是估算一个未知的离散分配 $p$ 超过 $d 符号的基本问题, 给出的分布样本为 美元 i.d. d. 。 我们有兴趣将真实分布和算法估计之间的 KL 差异最小化。 我们首先建造迷你最大优化的私人估计器。 但是, 最小最大性能未能揭示算法在个人( 非最坏情况) 情况下的性能( 美元 ) 和简单小型最大最大最大DP估计器在真实分布上的经验性能差。 然后, 我们从实例最佳性角度来研究这一问题, 将 $p$ 的算法误差与当地小区( $p$ ) 的最低可实现估计错误相比较。 在本地周围的自然概念下, 我们建议的算法能够达到常数性能, 并且没有差别的隐私限制。 我们的上限值依靠( 私人) 良好估计器的变式。 我们的下限使用比对本地社区进行添加, 比较之前工程的计算。
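As background for the Good-Turing estimator and the KL objective named in the abstract above, here is a minimal, non-private sketch. It shows only the classical Good-Turing missing-mass estimate and a plain KL divergence; the private variants and the instance-optimal algorithms of the paper are not reproduced, and both function names are hypothetical.

```python
import math
from collections import Counter


def good_turing_missing_mass(samples):
    """Estimate the total probability of symbols never seen in `samples`.

    The classical Good-Turing estimate is N1 / n, where N1 is the number of
    symbols observed exactly once and n is the (non-zero) sample size.
    """
    counts = Counter(samples)
    n = sum(counts.values())
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions given as aligned lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Four draws: 'a' twice, 'b' and 'c' once each, so N1 = 2 and n = 4.
missing = good_turing_missing_mass(list("aabc"))
```

The KL objective is what makes unseen symbols matter: an estimate that assigns zero probability to a symbol with positive true mass has infinite KL error, so reserving some mass for the unseen is essential.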
Article 54
Title@2025-05-29 (4): Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes
Title: Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes | Wenig scharfe Rede Deepfake Detection Anpassung an Gaußsche Prozesse | Gaussian 过程的“深假探测”适应 2505.23619v1 |
Authors: Neta Glazer, David Chernin, Idan Achituve, Sharon Gannot, Ethan Fetaya
Recent advancements in Text-to-Speech (TTS) models, particularly in voice cloning, have intensified the demand for adaptable and efficient deepfake detection methods. As TTS systems continue to evolve, detection models must be able to efficiently adapt to previously unseen generation models with minimal data. This paper introduces ADD-GP, a few-shot adaptive framework based on a Gaussian Process (GP) classifier for Audio Deepfake Detection (ADD). We show how combining a powerful deep embedding model with the flexibility of Gaussian processes can achieve strong performance and adaptability. Additionally, we show that this approach can also be used for personalized detection, with greater robustness to new TTS models and one-shot adaptability. To support our evaluation, a benchmark dataset is constructed for this task using new state-of-the-art voice cloning models.
近来在文本到语音(TTS)模型方面,特别是在语音克隆方面的进步,加强了对适应性和高效深假探测方法的需求。随着TTS系统不断发展,检测模型必须能够有效地适应以极少数据生成的先前不为人知的一代模型。本文介绍了ADD-GP,这是基于音频深藏器探测高山过程(ADD)的几张微小的适应性框架。我们展示了强大的深层嵌入模型与高山过程灵活性的结合如何能够实现强大的性能和适应性。此外,我们展示了这种方法也可以用于个性化检测,对新的TTS模型和一发式适应性更强。为了支持我们的评估,使用新的最新语音克隆模型为这项任务构建了一个基准数据集。
Article 55
Title@2025-05-29 (4): One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
Title: One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | Eine Trajektorie, ein Token: Erdliche Video-Tokenisierung über panoptische Sub-Objekt-Trajektorie | 一个轨迹, 一个 Token: 通过泛光子物件轨迹, 固定的视频轨迹 2505.23617v1 |
Authors: Chenhao Zheng, Jieyu Zhang, Mohammadreza Salehi, Ziqi Gao, Vishnu Iyengar, Norimasa Kobori, Quan Kong, Ranjay Krishna
Effective video tokenization is critical for scaling transformer models for long videos. Current approaches tokenize videos using space-time patches, leading to excessive tokens and computational inefficiencies. The best token reduction strategies degrade performance and barely reduce the number of tokens when the camera moves. We introduce grounded video tokenization, a paradigm that organizes tokens based on panoptic sub-object trajectories rather than fixed patches. Our method aligns with fundamental perceptual principles, ensuring that tokenization reflects scene complexity rather than video duration. We propose TrajViT, a video encoder that extracts object trajectories and converts them into semantically meaningful tokens, significantly reducing redundancy while maintaining temporal coherence. Trained with contrastive learning, TrajViT significantly outperforms space-time ViT (ViT3D) across multiple video understanding benchmarks; e.g., TrajViT outperforms ViT3D by a large margin of 6% in average top-5 recall on the video-text retrieval task with a 10x token reduction. We also show that TrajViT is a stronger video encoder than ViT3D for modern VideoLLMs, obtaining an average performance improvement of 5.2% across 6 VideoQA benchmarks while training 4x faster and using 18x fewer inference FLOPs. TrajViT is the first efficient encoder to consistently outperform ViT3D across diverse video analysis tasks, making it a robust and scalable solution.
有效的视频词元化对于将Transformer模型扩展到长视频至关重要。目前的方法使用时空图像块对视频进行词元化,导致词元过多、计算效率低下。最好的词元缩减策略会降低性能,并且在相机运动时几乎无法减少词元数量。我们提出了有依据的视频词元化(grounded video tokenization),这一范式基于全景子对象轨迹而非固定图像块来组织词元。我们的方法符合基本的感知原理,确保词元化反映的是场景复杂度而非视频时长。我们提出TrajViT,这是一种视频编码器,它提取对象轨迹并将其转换为具有语义意义的词元,在保持时间连贯性的同时显著减少冗余。经对比学习训练后,TrajViT在多个视频理解基准上显著优于时空ViT(ViT3D),例如在视频-文本检索任务上,TrajViT以10倍的词元缩减量,平均top-5召回率大幅领先ViT3D 6%。我们还表明,作为现代VideoLLM的视频编码器,TrajViT是比ViT3D更强的模型,在6个VideoQA基准上平均获得5.2%的性能提升,同时训练速度快4倍,推理FLOPs减少18倍。TrajViT是首个在多种视频分析任务上始终优于ViT3D的高效编码器,是一种稳健且可扩展的解决方案。
Article 56
Title@2025-05-29 (4): Causal Machine Learning in IoT-based Engineering Problems: A Tool Comparison in the Case of Household Energy Consumption
Title: Causal Machine Learning in IoT-based Engineering Problems: A Tool Comparison in the Case of Household Energy Consumption | Kausales maschinelles Lernen in IoT-basierten Engineering-Problemen: Ein Tool-Vergleich im Fall des Haushaltsenergieverbrauchs | 以木工工程问题为基础的因果机械学习:家庭能源消费工具比较 2505.12147v2 |
Authors: Nikolaos-Lysias Kosioris, Sotirios Nikoletseas, Gavrilis Filios, Stefanos Panagiotou
The rapid increase in computing power and the ability to store Big Data in the infrastructure have enabled predictions in a large variety of domains by Machine Learning. However, in many cases, existing Machine Learning tools are considered insufficient or incorrect since they exploit only probabilistic dependencies rather than inference logic. Causal Machine Learning methods seem to close this gap. In this paper, two prevalent tools based on Causal Machine Learning methods are compared, along with their mathematical underpinnings. The operation of the tools is demonstrated by examining their response to 18 queries, based on the IDEAL Household Energy Dataset, published by the University of Edinburgh. First, it was important to evaluate the causal relations assumption that allowed the use of this approach; this was based on the preexisting scientific knowledge of the domain and was implemented by use of the built-in validation tools. Results were encouraging and may easily be extended to other domains.
计算机能力的迅速增长和在基础设施中存储大数据的能力的迅速增长使得机器学习能够在大量领域作出预测,然而,在许多情况下,现有机器学习工具被认为不充分或不正确,因为它们只利用概率依赖性而不是推论逻辑。由于机器学习方法似乎缩小了这一差距。在本文中,比较了两个基于Causal机器学习方法的常用工具及其数学基础背景。工具的运作表现在审查它们对根据爱丁堡大学出版的IDEL家用能源数据集提出的18个问题的答复。首先,必须评估允许使用这一方法的因果关系假设;这是以领域先前的科学知识为基础,通过使用内部验证工具加以实施的。结果令人鼓舞,而且很容易推广到其他领域。
Article 57
Title@2025-05-29 (4): Learning Interpretable Differentiable Logic Networks for Tabular Regression
Title: Learning Interpretable Differentiable Logic Networks for Tabular Regression | Learning Interpretable Differentiable Logic Networks for Tabular Regression | 用于制表递减的可解释可解释逻辑网络 2505.23615v1 |
Authors: Chang Yue, Niraj K. Jha
Neural networks (NNs) achieve outstanding performance in many domains; however, their decision processes are often opaque and their inference can be computationally expensive in resource-constrained environments. We recently proposed Differentiable Logic Networks (DLNs) to address these issues for tabular classification based on relaxing discrete logic into a differentiable form, thereby enabling gradient-based learning of networks built from binary logic operations. DLNs offer interpretable reasoning and substantially lower inference cost. We extend the DLN framework to supervised tabular regression. Specifically, we redesign the final output layer to support continuous targets and unify the original two-phase training procedure into a single differentiable stage. We evaluate the resulting model on 15 public regression benchmarks, comparing it with modern neural networks and classical regression baselines. Regression DLNs match or exceed baseline accuracy while preserving interpretability and fast inference. Our results show that DLNs are a viable, cost-effective alternative for regression tasks, especially where model transparency and computational efficiency are important.
神经网络在许多领域都取得了杰出的成绩;然而,它们的决策过程往往不透明,在资源受限制的环境中,它们的推论可能计算得非常昂贵。我们最近提议了不同的逻辑网络(DLNs)来解决这些问题,以便根据松散的离散逻辑进行列表分类,将其分为不同的形式,从而能够以梯度为基础学习从二元逻辑操作中建立的网络。DLNs提供了可解释的推理,并大大降低了推论成本。我们把DLN框架扩大到受监督的表格回归。具体地说,我们重新设计了最后产出层,以支持连续目标,并将最初的两阶段培训程序统一到一个可区分的阶段。我们根据15个公共回归基准对由此产生的模型进行了评估,将其与现代神经网络和经典回归基线进行比较。回归DLNs在保存可解释性和快速推断性的同时匹配或超过基线精度。我们的结果表明,DLNs是回归任务的可行、成本效益高的替代方法,特别是在模型透明和计算效率重要的情况下。
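The idea of "relaxing discrete logic into a differentiable form" can be sketched with product-style relaxations on values in [0, 1]. This is one common choice and an assumption here; the abstract does not specify the exact relaxation DLNs use, and the softmax-weighted `soft_gate` is a hypothetical illustration of how a gate type could be learned by gradient descent.

```python
import math


def soft_and(a, b):
    """Product relaxation of AND; exact on {0, 1} inputs."""
    return a * b


def soft_or(a, b):
    """Probabilistic-sum relaxation of OR; exact on {0, 1} inputs."""
    return a + b - a * b


def soft_xor(a, b):
    """Relaxation of XOR; exact on {0, 1} inputs."""
    return a + b - 2.0 * a * b


def soft_gate(a, b, logits):
    """Softmax-weighted mixture of AND, OR and XOR.

    `logits` holds three learnable scores (an assumption of this sketch);
    as one score dominates, the gate approaches a single discrete operation.
    """
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]  # shift for numerical stability
    total = sum(weights)
    outputs = (soft_and(a, b), soft_or(a, b), soft_xor(a, b))
    return sum(w / total * o for w, o in zip(weights, outputs))


undecided = soft_gate(1.0, 0.0, [0.0, 0.0, 0.0])   # equal mix of the three gates
nearly_and = soft_gate(1.0, 1.0, [50.0, 0.0, 0.0])  # dominated by AND
```

Because every operation is a polynomial in its inputs, gradients flow through the whole network during training; after training, each gate can be snapped to its highest-scoring discrete operation, which is what keeps the final model interpretable and cheap to evaluate.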
Article 58
Title@2025-05-29 (4): Inference-time Scaling of Diffusion Models through Classical Search
Title: Inference-time Scaling of Diffusion Models through Classical Search | Inferenzzeit Skalierung von Diffusionsmodellen durch klassische Suche | 通过古典搜索对传播模型进行传播的推断-时间缩放 2505.23614v1 |
Authors: Xiangcheng Zhang, Haowei Lin, Haotian Ye, James Zou, Jianzhu Ma, Yitao Liang, Yilun Du
Classical search algorithms have long underpinned modern artificial intelligence. In this work, we tackle the challenge of inference-time control in diffusion models – adapting generated outputs to meet diverse test-time objectives – using principles from classical search. We propose a general framework that orchestrates local and global search to efficiently navigate the generative space. It employs a theoretically grounded local search via annealed Langevin MCMC and performs compute-efficient global exploration using breadth-first and depth-first tree search. We evaluate our approach on a range of challenging domains, including planning, offline reinforcement learning, and image generation. Across all tasks, we observe significant gains in both performance and efficiency. These results show that classical search provides a principled and practical foundation for inference-time scaling in diffusion models. Project page at diffusion-inference-scaling.github.io.
古典搜索算法长期以来一直是现代人工智能的基础。在这项工作中,我们利用古典搜索的原则,应对传播模型的推论时间控制的挑战 – – 调整产生的产出,以实现不同的测试时间目标。我们提出了一个总体框架,通过本地和全球搜索,以高效地导航基因空间。我们通过annealed Langevin MCMC 进行基于理论的本地搜索,并利用宽度第一和深度第一树搜索进行计算高效的全球探索。我们评估了我们在一系列具有挑战性的领域的做法,包括规划、离线强化学习和图像生成。我们观察了在业绩和效率方面所取得的重大进展。这些结果显示,古典搜索为传播模型的推论时间缩提供了原则和实践基础。
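For readers unfamiliar with Langevin MCMC, the local-search ingredient named in the abstract above, the sketch below runs plain unadjusted Langevin dynamics on a toy one-dimensional Gaussian. It is not annealed and involves no diffusion model; the target, step size, and function name are all assumptions chosen for illustration.

```python
import math
import random


def langevin_samples(score, x0, step_size, n_steps, rng):
    """Unadjusted Langevin dynamics in one dimension.

    `score(x)` is the gradient of the log-density at x. Each step moves along
    the score and adds Gaussian noise with variance 2 * step_size.
    """
    x = x0
    samples = []
    noise_scale = math.sqrt(2.0 * step_size)
    for _ in range(n_steps):
        x = x + step_size * score(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


# Toy target: a unit-variance Gaussian centred at 3, whose score is -(x - 3).
rng = random.Random(0)
chain = langevin_samples(lambda x: -(x - 3.0), x0=-10.0, step_size=0.05,
                         n_steps=20000, rng=rng)
kept = chain[len(chain) // 2:]  # discard the first half as burn-in
sample_mean = sum(kept) / len(kept)
```

Starting far from the mode, the chain drifts toward high-density regions while the injected noise keeps it exploring; an annealed variant would additionally vary the target or noise level over time.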
Article 59
Title@2025-05-29 (4): The Generalized Skew Spectrum of Graphs
Title: The Generalized Skew Spectrum of Graphs | Das generalisierte Skew-Spektrum der Graphen | 普通的Skew图象光谱 2505.23609v1 |
Authors: Armando Bellante, Martin Plávala, Alessandro Luongo
This paper proposes a family of permutation-invariant graph embeddings, generalizing the Skew Spectrum of graphs of Kondor & Borgwardt (2008). Grounded in group theory and harmonic analysis, our method introduces a new class of graph invariants that are isomorphism-invariant and capable of embedding richer graph structures - including attributed graphs, multilayer graphs, and hypergraphs - which the Skew Spectrum could not handle. Our generalization further defines a family of functions that enables a trade-off between computational complexity and expressivity. By applying generalization-preserving heuristics to this family, we improve the Skew Spectrum’s expressivity at the same computational cost. We formally prove the invariance of our generalization, demonstrate its improved expressiveness through experiments, and discuss its efficient computation.
本文提出了一组变异图嵌入, 概括了 Kondor & Borgwardt 和 Borgwardt 图表的Skew Spectrum(2008年) 。 我们的方法以群论和和谐分析为基础, 引入了一种新的图形变异物类别, 这些变异体是非正态的, 能够嵌入更丰富的图形结构, 包括可归属图、 多层图和高压图, Skew Spectrum 无法处理这些结构 。 我们的概括化进一步定义了能够平衡计算复杂性和表达性之间的函数组合。 通过对这个家庭应用一般化- 保留超自然论, 我们用同样的计算成本改进Skew Spectrum 的表达性。 我们正式证明了我们一般化的变异性, 通过实验来显示其更清晰的表达性, 并讨论其高效的计算 。
Article 60
Title@2025-05-29 (4): Data Model Design for Explainable Machine Learning-based Electricity Applications
Title: Data Model Design for Explainable Machine Learning-based Electricity Applications | Datenmodell-Design für erklärbare maschinelle Learning-basierte Stromanwendungen | 可解释机器学习用电力应用数据模型设计 2505.23607v1 |
Authors: Carolina Fortuna, Gregor Cerar, Blaz Bertalanic, Andrej Campa, Mihael Mohorcic
The transition from traditional power grids to smart grids, a significant increase in the use of renewable energy sources, and soaring electricity prices have triggered a digital transformation of the energy infrastructure that enables new, data-driven applications, often supported by machine learning models. However, the majority of the developed machine learning models rely on univariate data. To date, a structured study considering the role of metadata and additional measurements that result in multivariate data is missing. In this paper we propose a taxonomy that identifies and structures various types of data related to energy applications. The taxonomy can be used to guide application-specific data model development for training machine learning models. Focusing on a household electricity forecasting application, we validate the effectiveness of the proposed taxonomy in guiding the selection of the features for various types of models. As such, we study the effect of domain, contextual and behavioral features on the forecasting accuracy of four interpretable machine learning techniques and three openly available datasets. Finally, using a feature importance technique, we explain individual feature contributions to the forecasting accuracy.
传统电网向智能电网的过渡、可再生能源使用量的大幅增加、以及电价的飙升,都引发了能源基础设施的数字化转型,使新的、数据驱动的、往往由机器学习模型支持的应用得以实现。然而,大多数发达的机器学习模型依赖单体数据。迄今为止,尚缺乏一项结构化研究,研究元数据和导致多变量数据的额外测量的作用。在这份文件中,我们建议了一种分类学,确定和构建与能源应用有关的各类数据。分类学可用于指导用于培训机器学习模型的具体数据模型的开发。我们以家庭电力预测应用为重点,验证了拟议的分类学在指导选择各类模型特征方面的有效性。因此,我们研究了域、背景和行为特征对四种可解释的机器学习技术和三种公开提供的数据集预测准确性的影响。最后,我们用一种特征重要技术,解释了对预测准确性所作的个人特征贡献。
Article 61
Title@2025-05-29 (4): Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
Title: Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model | Muddit: Befreiende Generation jenseits von Text-zu-Bild mit einem Unified Discrete Diffusion Model | Muddit: 利用统一分解传播模型在文本到图像之外解放一代 2505.23606v1 |
Authors: Qingyu Shi, Jinbin Bai, Zhuoran Zhao, Wenhao Chai, Kaidong Yu, Jianzong Wu, Shuangyong Song, Yunhai Tong, Xiangtai Li, Xuelong Li, Shuicheng Yan
Unified generation models aim to handle diverse tasks across modalities – such as text generation, image generation, and vision-language reasoning – within a single architecture and decoding paradigm. Autoregressive unified models suffer from slow inference due to sequential decoding, and non-autoregressive unified models suffer from weak generalization due to limited pretrained backbones. We introduce Muddit, a unified discrete diffusion transformer that enables fast and parallel generation across both text and image modalities. Unlike prior unified diffusion models trained from scratch, Muddit integrates strong visual priors from a pretrained text-to-image backbone with a lightweight text decoder, enabling flexible and high-quality multimodal generation under a unified architecture. Empirical results show that Muddit achieves competitive or superior performance compared to significantly larger autoregressive models in both quality and efficiency. The work highlights the potential of purely discrete diffusion, when equipped with strong visual priors, as a scalable and effective backbone for unified generation.
单一一代模式旨在在一个单一架构和解码模式中处理不同模式的不同任务 – – 如文本生成、图像生成和视觉语言推理等。自动递减统一模式由于顺序解码而出现缓慢的推论,非自动递增统一模式由于受过训练的骨干有限而出现薄弱的概括化。我们引入了一个统一的离散扩散变压器,即Mudddit,它能够在文字和图像模式之间实现快速和平行的生成。与以前从零开始训练的统一传播模型不同,Mudddit将预先训练过的文本到图像主干网的强直观预感与轻量文本解码相结合,使得在一个统一的架构下能够进行灵活和高质量的多式联运生成。经验性结果显示,Mudddit在质量和效率两方面都具有竞争力或优异性,而自动递增型模型则大得多。工作凸显了纯离散扩散的潜力,在具备强的视觉前导力时,作为统一生成的可缩和有效骨干。
Article 62
Title@2025-05-29 (4): STeCa: Step-level Trajectory Calibration for LLM Agent Learning
Title: STeCa: Step-level Trajectory Calibration for LLM Agent Learning | STeCa: Schritt-Level-Trajektorienkalibrierung für LLM Agent Learning | STeCa:LLM代理学习的职级轨迹校准 2502.14276v2 |
Authors: Hanlin Wang, Jian Wang, Chak Tou Leong, Wenjie Li
Large language model (LLM)-based agents have shown promise in tackling complex tasks by interacting dynamically with the environment. Existing work primarily focuses on behavior cloning from expert demonstrations or preference learning through exploratory trajectory sampling. However, these methods often struggle to address long-horizon tasks, where suboptimal actions accumulate step by step, causing agents to deviate from correct task trajectories. To address this, we highlight the importance of timely calibration and the need to automatically construct calibration trajectories for training agents. We propose Step-Level Trajectory Calibration (STeCa), a novel framework for LLM agent learning. Specifically, STeCa identifies suboptimal actions through a step-level reward comparison during exploration. It constructs calibrated trajectories using LLM-driven reflection, enabling agents to learn from improved decision-making processes. We finally leverage these calibrated trajectories with successful trajectories for reinforced training. Extensive experiments demonstrate that STeCa significantly outperforms existing methods. Further analysis highlights that timely calibration enables agents to complete tasks with greater robustness. Our code and data are available at https://github.com/WangHanLinHenry/STeCa.
大型语言模型(LLM)的代理机构通过动态地与环境互动,在应对复杂任务方面表现出了希望。现有工作主要侧重于通过专家演示或探索性轨迹抽样学习来进行行为克隆;然而,这些方法往往难以解决长期对等任务,因为亚优化行动会一步步积累,使代理机构偏离正确的任务轨迹。为此,我们强调及时校准的重要性,以及自动为培训代理机构建立校准轨迹的必要性。我们提出了逐步轨迹校准(STeCa),这是LLLM代理机构学习的新颖框架。具体地说,STeCa在探索期间通过一步级的奖励比较确定了亚优性行动。它利用LLM驱动的反射来构建校准轨迹,使代理机构能够从改进的决策进程中学习。我们最后利用这些校准轨迹和成功轨迹来强化培训。广泛的实验表明STeCa大大超越了现有方法。进一步的分析强调,及时校准使代理机构能够以更稳健的方式完成任务。我们的代码和数据在 https/Wng/H.
Article 63
Title@2025-05-29 (4): On Transferring Transferability: Towards a Theory for Size Generalization
Title: On Transferring Transferability: Towards a Theory for Size Generalization | Übertragbarkeit: Auf dem Weg zu einer Theorie der Größenverallgemeinerung | 关于转让可转让性:走向一个通用规模理论 2505.23599v1 |
Authors: Eitan Levin, Yuxin Ma, Mateo Díaz, Soledad Villar
Many modern learning tasks require models that can take inputs of varying sizes. Consequently, dimension-independent architectures have been proposed for domains where the inputs are graphs, sets, and point clouds. Recent work on graph neural networks has explored whether a model trained on low-dimensional data can transfer its performance to higher-dimensional inputs. We extend this body of work by introducing a general framework for transferability across dimensions. We show that transferability corresponds precisely to continuity in a limit space formed by identifying small problem instances with equivalent large ones. This identification is driven by the data and the learning task. We instantiate our framework on existing architectures, and implement the necessary changes to ensure their transferability. Finally, we provide design principles for designing new transferable models. Numerical experiments support our findings.
许多现代学习任务需要能够吸收不同大小投入的模型。 因此, 已经为输入为图表、 数据集和点云的域提出了维维独立结构。 图表神经网络的近期工作探讨了一个接受过低维数据培训的模型能否将其性能转移至高维投入。 我们通过引入一个通用的跨维可转移框架扩展了这项工作内容。 我们显示,可转移性与通过识别小问题案例和同等大案例而形成的有限空间的连续性完全吻合。 这个识别由数据和学习任务驱动。 我们对现有结构的框架进行即时化,并进行必要的修改,以确保其可转移性。 最后, 我们为设计新的可转移模型提供了设计原则。 数字实验支持了我们的调查结果。
Article 64
Title@2025-05-29 (4): LLM Performance for Code Generation on Noisy Tasks
Title: LLM Performance for Code Generation on Noisy Tasks | LLM-Performance für Code-Generierung bei lauten Aufgaben | LLM 噪音任务代码生成的LLM性能 2505.23598v1 |
Authors: Radzim Sendyka, Christian Cabrera, Andrei Paleyes, Diana Robinson, Neil Lawrence
This paper investigates the ability of large language models (LLMs) to recognise and solve tasks which have been obfuscated beyond recognition. Focusing on competitive programming and benchmark tasks (LeetCode and MATH), we compare performance across multiple models and obfuscation methods, such as noise and redaction. We demonstrate that all evaluated LLMs can solve tasks obfuscated to a level where the text would be unintelligible to human readers and no longer contains key pieces of instruction or context. We introduce the concept of eager pattern matching to describe this behaviour, which is not observed in tasks published after the models’ knowledge cutoff date, indicating strong memorisation or overfitting to training data, rather than legitimate reasoning about the presented problem. We report empirical evidence of distinct performance decay patterns between contaminated and unseen datasets. We discuss the implications for benchmarking and evaluations of model behaviour, arguing for caution when designing experiments using standard datasets. We also propose measuring the decay of performance under obfuscation as a possible strategy for detecting dataset contamination and highlighting potential safety risks and interpretability issues for automated software systems.
本文调查了大型语言模型(LLMS)认识和解决超出认知范围的任务的能力。我们把注意力集中在竞争性编程和基准任务(LeetCode和MATH)上,比较了多种模型和模糊方法(例如噪音和编辑)的性能。我们证明,所有经过评价的LLMS都能够解决被模糊的任务,使其达到对人类读者不易理解的程度,而没有包含关键的指示或背景。我们引入了渴望模式匹配的概念来描述这种行为,在模型知识截止日期之后公布的任务中没有观察到这种行为,表明高度的记忆化或过度适应培训数据,而不是对所提出的问题进行合理的推理。我们报告了被污染的数据集和不可见的数据集之间不同性能衰减模式的经验证据。我们讨论了在使用标准数据集设计实验时对基准和行为评价的影响,我们主张谨慎。我们还提议测量模糊状态下性能的衰败,作为发现数据污染和突出潜在安全风险以及自动化软件系统可解释性问题的可能战略。
Article 65
Title@2025-05-29 (4): Multilook Coherent Imaging: Theoretical Guarantees and Algorithms
Title: Multilook Coherent Imaging: Theoretical Guarantees and Algorithms | Multilook Coherent Imaging: Theoretische Garantien und Algorithmen | 多视相协调成像:理论保障和理算 2505.23594v1 |
Authors: Xi Chen, Soham Jana, Christopher A. Metzler, Arian Maleki, Shirin Jalali
Multilook coherent imaging is a widely used technique in applications such as digital holography, ultrasound imaging, and synthetic aperture radar. A central challenge in these systems is the presence of multiplicative noise, commonly known as speckle, which degrades image quality. Despite the widespread use of coherent imaging systems, their theoretical foundations remain relatively underexplored. In this paper, we study both the theoretical and algorithmic aspects of likelihood-based approaches for multilook coherent imaging, providing a rigorous framework for analysis and method development. Our theoretical contributions include establishing the first theoretical upper bound on the Mean Squared Error (MSE) of the maximum likelihood estimator under the deep image prior hypothesis. Our results capture the dependence of MSE on the number of parameters in the deep image prior, the number of looks, the signal dimension, and the number of measurements per look. On the algorithmic side, we employ projected gradient descent (PGD) as an efficient method for computing the maximum likelihood solution. Furthermore, we introduce two key ideas to enhance the practical performance of PGD. First, we incorporate the Newton-Schulz algorithm to compute matrix inverses within the PGD iterations, significantly reducing computational complexity. Second, we develop a bagging strategy to mitigate projection errors introduced during PGD updates. We demonstrate that combining these techniques with PGD yields state-of-the-art performance. Our code is available at https://github.com/Computational-Imaging-RU/Bagged-DIP-Speckle.
多视相干成像是数字全息、超声成像和合成孔径雷达等应用中广泛使用的技术。这些系统的一个核心挑战是存在乘性噪声,通常称为散斑,它会降低图像质量。尽管相干成像系统已被广泛使用,其理论基础仍相对缺乏研究。本文研究多视相干成像中基于似然的方法的理论和算法两个方面,为分析和方法开发提供了严格的框架。我们的理论贡献包括:在深度图像先验假设下,首次建立了最大似然估计量均方误差(MSE)的理论上界。我们的结果刻画了MSE对深度图像先验中的参数数量、视数、信号维度以及每视测量数的依赖关系。在算法方面,我们采用投影梯度下降(PGD)作为计算最大似然解的高效方法。此外,我们提出两个关键思路来提升PGD的实际性能。第一,我们在PGD迭代中引入Newton-Schulz算法来计算矩阵的逆,显著降低了计算复杂度。第二,我们开发了一种bagging策略,以减轻PGD更新过程中引入的投影误差。我们证明,将这些技术与PGD相结合可获得最先进的性能。我们的代码可在 https://github.com/Computational-Imaging-RU/Bagged-DIP-Speckle 获取。
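The Newton-Schulz iteration mentioned in the abstract above approximates a matrix inverse using only matrix multiplications. The sketch below is a generic pure-Python version under stated assumptions: the initialization X0 = A^T / (||A||_1 ||A||_inf) is a standard convergent choice, not necessarily the one used in the paper, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def newton_schulz_inverse(a, iterations=30):
    """Approximate the inverse of a square, nonsingular matrix `a`.

    Iterates X <- X (2I - A X) starting from X0 = A^T / (||A||_1 ||A||_inf).
    Convergence is quadratic once the residual I - A X is small; badly
    conditioned matrices may need more iterations than the default.
    """
    n = len(a)
    norm_1 = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))  # max column sum
    norm_inf = max(sum(abs(v) for v in row) for row in a)               # max row sum
    # X0 is the scaled transpose of A.
    x = [[a[j][i] / (norm_1 * norm_inf) for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        ax = matmul(a, x)
        # correction = 2I - A X
        correction = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
                      for i in range(n)]
        x = matmul(x, correction)
    return x


# Small example whose exact inverse is [[0.3, -0.1], [-0.2, 0.4]].
approx_inv = newton_schulz_inverse([[4.0, 1.0], [2.0, 3.0]])
```

Because each step is two matrix products, the iteration maps well onto GPU hardware, which is the usual motivation for preferring it to a factorization-based inverse inside an iterative solver.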
Article 66
Title@2025-05-29 (4): Position: Federated Foundation Language Model Post-Training Should Focus on Open-Source Models
Title: Position: Federated Foundation Language Model Post-Training Should Focus on Open-Source Models | Position: Federated Foundation Language Model Nachschulung sollte sich auf Open-Source-Modelle konzentrieren | 立场:联邦基金会语文示范培训后培训应侧重于开放来源模式 2505.23593v1 |
Authors: Nikita Agrawal, Simon Mertel, Ruben Mayer
Post-training of foundation language models has emerged as a promising research domain in federated learning (FL), with the goal of enabling privacy-preserving model improvements and adaptations to users’ downstream tasks. Recent advances in this area adopt centralized post-training approaches that build upon black-box foundation language models where there is no access to model weights and architecture details. Although the use of black-box models has been successful in centralized post-training, their blind replication in FL raises several concerns. Our position is that using black-box models in FL contradicts the core principles of federation such as data privacy and autonomy. In this position paper, we critically analyze the usage of black-box models in federated post-training, and provide a detailed account of various aspects of openness and their implications for FL.
基础语言模型的后培训已成为联合会学习(FL)中一个有希望的研究领域,目标是使隐私保护模式的改进和适应用户的下游任务。该领域最近的进展是采用集中的培训后培训方法,以黑盒基础语言模型为基础,在没有机会获得模型重量和结构细节的情况下建立这种模式。虽然在集中培训后使用黑盒模式取得了成功,但在FL中盲目复制却引起了若干关切。我们的立场是,在FL中使用黑盒模式与联邦的核心原则,如数据隐私和自主性相矛盾。在本立场文件中,我们严格分析黑盒模式在黑盒培训后培训中的使用情况,并详细说明开放的各个方面及其对FL的影响。
Article 67
Title@2025-05-29 (4): Accelerated Training of Federated Learning via Second-Order Methods
Title: Accelerated Training of Federated Learning via Second-Order Methods | Beschleunigte Ausbildung des Föderierten Lernens über Methoden der zweiten Ordnung | 通过二级方法加快联邦学习培训 2505.23588v1 |
Authors: Mrinmay Sen, Sidhant R Nair, C Krishna Mohan
This paper explores second-order optimization methods in Federated Learning (FL), addressing the critical challenges of slow convergence and the excessive communication rounds required to achieve optimal performance from the global model. While existing surveys in FL primarily focus on challenges related to statistical and device label heterogeneity, as well as privacy and security concerns in first-order FL methods, less attention has been given to the issue of slow model training. This slow training often leads to the need for excessive communication rounds or increased communication costs, particularly when data across clients are highly heterogeneous. In this paper, we examine various FL methods that leverage second-order optimization to accelerate the training process. We provide a comprehensive categorization of state-of-the-art second-order FL methods and compare their performance based on convergence speed, computational cost, memory usage, transmission overhead, and generalization of the global model. Our findings show the potential of incorporating Hessian curvature through second-order optimization into FL and highlight key challenges, such as the efficient utilization of Hessian and its inverse in FL. This work lays the groundwork for future research aimed at developing scalable and efficient federated optimization methods for improving the training of the global model in FL.
本文探讨了联邦学习联合会(FL)的二级优化方法,探讨了缓慢趋同和为达到全球模式最佳业绩所需的过度通信周期等关键挑战。虽然FL的现有调查主要侧重于与统计和装置标签差异有关的挑战,以及一级FL方法的隐私和安全问题,但对模式培训缓慢问题的关注较少。这种缓慢的培训往往导致需要过多的通信回合或增加通信成本,特别是在客户数据高度差异的情况下。我们在本文件中审查了利用第二级优化来加快培训进程的多种FL方法。我们提供了第二级FL方法的全面分类,并根据趋同速度、计算成本、记忆使用、传承间接费用和全球模型的普及,比较其业绩。我们的调查结果显示,通过第二级优化将赫森曲线纳入FL的可能性,并突出了主要挑战,例如赫桑的有效利用及其在FL的反面。这项工作为今后旨在改进FC可升级和高效全球优化方法的示范研究奠定了基础。
Article 68
Title@2025-05-29 (4): PCA for Enhanced Cross-Dataset Generalizability in Breast Ultrasound Tumor Segmentation
Title: PCA for Enhanced Cross-Dataset Generalizability in Breast Ultrasound Tumor Segmentation | PCA für verbesserte Cross-Dataset-Verallgemeinerung in der Brust-Ultraschall-Tumor-Segmentierung | 五氯苯甲醚,用于在乳房超声波肿瘤分割中增强交叉数据的通用性 2505.23587v1 |
Authors: Christian Schmidt, Heinrich Martin Overhoff
In medical image segmentation, limited external validity remains a critical obstacle when models are deployed across unseen datasets, an issue particularly pronounced in the ultrasound image domain. Existing solutions, such as domain adaptation and GAN-based style transfer, are promising but often fall short in the medical domain, where datasets are typically small and diverse. This paper presents a novel application of principal component analysis (PCA) to address this limitation. PCA preprocessing reduces noise and emphasizes essential features by retaining approximately 90% of the dataset variance. We evaluate our approach across six diverse breast tumor ultrasound datasets comprising 3,983 B-mode images and corresponding expert tumor segmentation masks. For each dataset, a corresponding dimensionality-reduced PCA dataset is created, and U-Net-based segmentation models are trained on each of the twelve datasets. Each model trained on an original dataset was evaluated on the remaining five out-of-domain original datasets (baseline results), while each model trained on a PCA dataset was evaluated on five out-of-domain PCA datasets. Our experimental results indicate that using PCA-reconstructed datasets instead of original images improves the model’s recall and Dice scores, particularly for model-dataset pairs where baseline performance was lowest, achieving statistically significant gains in recall (0.57 $\pm$ 0.07 vs. 0.70 $\pm$ 0.05, $p = 0.0004$) and Dice scores (0.50 $\pm$ 0.06 vs. 0.58 $\pm$ 0.06, $p = 0.03$). Our method reduced the decline in recall values due to external validation by 33%. These findings underscore the potential of PCA reconstruction as a safeguard to mitigate declines in segmentation performance, especially in challenging cases, with implications for enhancing external validity in real-world medical applications.
在医学图像分割中,当模型在超声波图像域中部署时,有限的外部有效性仍是一个关键障碍。现有解决方案,例如域适应和基于 GAN 风格的转移,虽然很有希望,但往往在医疗领域落后,因为数据基通常规模较小和多样化。本文展示了一种新颖的主要元件分析(PCA)应用,以应对这一限制。CCA预处理噪音,并通过保留大约90美元的数据元差异来强调基本特征。我们评估了我们通过由 0983 B-mode 图像和相应的专家肿瘤分解掩罩组成的六种不同乳腺超声数据集的方法。对于每个数据集来说,一个相应的维度减少了CPA-data数据集,而基于U-Net的分解模型则在每12个数据集中都受到训练。每个在原始数据集中受训的模型都根据其余的5个外部原始数据集(基线结果)来减少噪音。每套模型都用具有挑战性地计算结果,每套货币值为0.07美元。在5 美元 美元 的外部数据模型中,我们的原始数据解算算算出50美元 。我们的原始数据模型中, 降为0.203 。我们的实验结果,特别地显示的成绩,特别地算算算算算得 。
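The PCA preprocessing described in the abstract above can be sketched as follows: flatten the images, keep the leading principal components that explain roughly 90% of the variance, and map back to image space. The function name, the SVD-based implementation, and the random stand-in images are assumptions for illustration; the abstract does not specify how the authors fit or apply PCA.

```python
import numpy as np

def pca_reconstruct(images, variance_to_keep=0.90):
    """Reconstruct images from the fewest components reaching the variance target."""
    X = images.reshape(len(images), -1).astype(float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions; S**2 is proportional to variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(S**2) / np.sum(S**2)
    # Smallest k whose cumulative explained variance reaches the target.
    k = min(int(np.searchsorted(explained, variance_to_keep)) + 1, len(S))
    X_rec = (Xc @ Vt[:k].T) @ Vt[:k] + mean
    return X_rec.reshape(images.shape), k

rng = np.random.default_rng(0)
images = rng.random((20, 8, 8))  # stand-in for B-mode images (assumption)
reconstructed, num_components = pca_reconstruct(images)
```

The reconstructed array has the same shape as the input, so it can be passed to a segmentation model in place of the original images.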
Article 69
Title@2025-05-29 (4): On-Policy RL with Optimal Reward Baseline
Title: On-Policy RL with Optimal Reward Baseline | On-Policy RL mit optimaler Prämienbasis | 具有最佳回报基准的 政策性RL 2505.23585v1 |
Authors: Yaru Hao, Li Dong, Xun Wu, Shaohan Huang, Zewen Chi, Furu Wei
Reinforcement learning algorithms are fundamental to align large language models with human preferences and to enhance their reasoning capabilities. However, current reinforcement learning algorithms often suffer from training instability due to loose on-policy constraints and computational inefficiency due to auxiliary models. In this work, we propose On-Policy RL with Optimal reward baseline (OPO), a novel and simplified reinforcement learning algorithm designed to address these challenges. OPO emphasizes the importance of exact on-policy training, which empirically stabilizes the training process and enhances exploration. Moreover, OPO introduces the optimal reward baseline that theoretically minimizes gradient variance. We evaluate OPO on mathematical reasoning benchmarks. The results demonstrate its superior performance and training stability without additional models or regularization terms. Furthermore, OPO achieves lower policy shifts and higher output entropy, encouraging more diverse and less repetitive responses. These results highlight OPO as a promising direction for stable and effective reinforcement learning in large language model alignment and reasoning tasks. The implementation is provided at https://github.com/microsoft/LMOps/tree/main/opo.
强化学习算法对于使大型语言模式与人类偏好相一致并提高其推理能力至关重要。然而,由于政策限制松散,而且由于辅助模式导致计算效率低下,目前的强化学习算法往往因培训不稳定而受到影响。在这项工作中,我们提议采用最佳奖励基线(OPO),即新的简化强化学习算法(OPO),以应对这些挑战。OPO强调精确的政策培训的重要性,这种培训在经验上稳定了培训过程,并加强了探索。此外,OPO还引入了最佳奖励基线,从理论上将梯度差异降到最低。我们评估了OPO的数学推理基准。结果显示,OPO在没有额外模型或正规化条件的情况下,其业绩和培训稳定性较高。此外,OPO实现了较低的政策变化和产出增量,鼓励了更多多样性和较少重复性的反应。这些结果突出OPO是稳定和有效加强大语言模式调整和推理任务的有希望的方向。实现代码见 https://github.com/microsoft/LMOps/tree/main/opo。
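The abstract above does not give the formula for its optimal reward baseline. As a hedged illustration, the sketch below uses the textbook variance-minimising baseline for score-function gradients, b* = E[g^2 R] / E[g^2], where g is the score (gradient of the log-probability). The exact form used by OPO may differ, and all names and numbers here are hypothetical.

```python
import numpy as np

def variance_minimizing_baseline(rewards, score_sq_norms):
    """Textbook baseline: rewards weighted by squared score norms (not OPO-specific)."""
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(score_sq_norms, dtype=float)
    return float(np.sum(weights * rewards) / np.sum(weights))

def second_moment(rewards, scores, baseline):
    """Mean of ((R - b) * g)^2. Because E[g] = 0 under the policy, the estimator's
    mean does not depend on b, so minimising this quantity minimises its variance."""
    return float(np.mean(((rewards - baseline) * scores) ** 2))

rng = np.random.default_rng(0)
scores = rng.standard_normal(1000)                          # toy scalar scores
rewards = 2.0 + scores**2 + 0.1 * rng.standard_normal(1000)  # toy correlated rewards
b_star = variance_minimizing_baseline(rewards, scores**2)
m_star = second_moment(rewards, scores, b_star)
m_mean = second_moment(rewards, scores, rewards.mean())
```

In this toy setup the weighted baseline gives a second moment no larger than the plain mean-reward baseline, which is the property the classic derivation guarantees.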
Article 70
Title@2025-05-29 (4): Improving Time Series Forecasting via Instance-aware Post-hoc Revision
Title: Improving Time Series Forecasting via Instance-aware Post-hoc Revision | Verbesserung der Zeitreihenprognose über Instance-aware Post-hoc-Revision | 改进时间序列预测,通过 “ 热后后预测 “ 改进时间序列预测 2505.23583v1 |
Authors: Zhiding Liu, Mingyue Cheng, Guanhao Zhao, Jiqian Yang, Qi Liu, Enhong Chen
Time series forecasting plays a vital role in various real-world applications and has attracted significant attention in recent decades. While recent methods have achieved remarkable accuracy by incorporating advanced inductive biases and training strategies, we observe that instance-level variations remain a significant challenge. These variations, stemming from distribution shifts, missing data, and long-tail patterns, often lead to suboptimal forecasts for specific instances, even when overall performance appears strong. To address this issue, we propose a model-agnostic framework, PIR, designed to enhance forecasting performance through Post-forecasting Identification and Revision. Specifically, PIR first identifies biased forecasting instances by estimating their accuracy. Based on this, the framework revises the forecasts using contextual information, including covariates and historical time series, from both local and global perspectives in a post-processing fashion. Extensive experiments on real-world datasets with mainstream forecasting models demonstrate that PIR effectively mitigates instance-level errors and significantly improves forecasting reliability.
时间序列预测在现实世界的各种应用中发挥着关键作用,近几十年来吸引了极大关注。虽然最近的方法通过纳入先进的感化偏差和培训战略取得了显著的准确性,但我们注意到,实例层面的差异仍是一个重大挑战。由于分布变化、数据缺失和长尾模式的变化,往往导致对具体实例的预测不尽如人意,即使总体性能看似强劲。为了解决这一问题,我们提议了一个模型-不可知性框架,即PIR,目的是通过预测后识别和订正来提高预测绩效。具体地说,PIR首先通过估计其准确性来查明偏差预测实例。基于这一点,该框架从后处理时的当地和全球角度对预测进行了修改,包括变量和历史时间序列。在现实世界数据集和主流预测模型上进行的广泛实验表明,PIR有效地减轻了实例级错误,并大大提高了预测的可靠性。
Article 71
Title@2025-05-29 (4): Wake-Informed 3D Path Planning for Autonomous Underwater Vehicles Using A* and Neural Network Approximations
Title: Wake-Informed 3D Path Planning for Autonomous Underwater Vehicles Using A* and Neural Network Approximations | Wake-Informierte 3D-Pfadplanung für autonome Unterwasserfahrzeuge mit A*- und Neuralnetzwerk-Annäherungen | 使用A* 和神经网络相近的自动水下车辆的觉醒3D路径规划 2502.01918v2 |
Authors: Zachary Cooper-Baldock, Stephen Turnock, Karl Sammut
Autonomous Underwater Vehicles (AUVs) encounter significant energy, control and navigation challenges in complex underwater environments, particularly during close-proximity operations, such as launch and recovery (LAR), where fluid interactions and wake effects present additional navigational and energy challenges. Traditional path planning methods fail to incorporate these detailed wake structures, resulting in increased energy consumption, reduced control stability, and heightened safety risks. This paper presents a novel wake-informed, 3D path planning approach that fully integrates localized wake effects and global currents into the planning algorithm. Two variants of the A* algorithm, a current-informed planner and a wake-informed planner, are created to assess its validity, and two neural network models are then trained to approximate these planners for real-time applications. Both the A* planners and NN models are evaluated using important metrics such as energy expenditure, path length, and encounters with high-velocity and turbulent regions. The results demonstrate that the wake-informed A* planner consistently achieves the lowest energy expenditure and minimizes encounters with high-velocity regions, reducing energy consumption by up to 11.3%. The neural network models are observed to offer a computational speedup of 6 orders of magnitude, but exhibit 4.51-19.79% higher energy expenditures and 9.81-24.38% less optimal paths. These findings underscore the importance of incorporating detailed wake structures into traditional path planning algorithms and the benefits of neural network approximations to enhance energy efficiency and operational safety for AUVs in complex 3D domains.
自主水下潜水器(AUV)在复杂的水下环境中,特别是在发射和回收(LAR)等近距离作业中,遇到能源、控制和导航方面的重大挑战,特别是在发射和回收(LAR)等近距离作业中,流动相互作用和后醒效应带来额外的航行和能源挑战;传统路径规划方法未能纳入这些详细的防守结构,导致能源消耗增加、控制稳定性降低、安全风险增加;本文介绍了一种新颖的知情后醒、3D路径规划方法,充分将局部后醒效应和全球潮流纳入规划算法;A* 算法的两个变式,即目前知情的计划者和后醒悟计划设计者,用来评估其有效性,然后对两个神经网络模型进行培训,以近似这些规划者进行实时应用。A* 规划师和NNN模型都使用能源支出增加、路径长度和与高速和动荡地区相遇等重要指标进行评估。结果显示,知情的A* 规划员始终实现最起码的能源支出,并将高速度区域遇到的能源消耗减少至11.3%,但晚知情规划。在19个神经网络模型显示,将更高速度的域域域域域域域域段的计算结果为24显示速度的进度为24的进度。
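To make the idea of a wake-informed planner concrete, here is a minimal A* sketch on a 3D grid in which entering a cell costs more when the local flow speed is high. The cost model (1 + alpha * speed), the 6-connected grid, and the toy wake region are assumptions for illustration; the paper's energy model is not described in the abstract.

```python
import heapq
import numpy as np

MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def a_star_3d(flow_speed, start, goal, alpha=1.0):
    """A* where moving into a cell costs 1 + alpha * flow_speed[cell] (speeds >= 0)."""
    def heuristic(cell):
        # Manhattan distance is admissible: every unit move costs at least 1.
        return sum(abs(a - b) for a, b in zip(cell, goal))

    best_cost = {start: 0.0}
    parent = {}
    frontier = [(heuristic(start), 0.0, start)]
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1], cost
        if cost > best_cost[cell]:
            continue  # stale heap entry
        for move in MOVES:
            nxt = tuple(c + d for c, d in zip(cell, move))
            if any(n < 0 or n >= s for n, s in zip(nxt, flow_speed.shape)):
                continue
            new_cost = cost + 1.0 + alpha * float(flow_speed[nxt])
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None, float("inf")

flow = np.zeros((5, 5, 3))
flow[2, 0:4, :] = 1.0  # toy wake region blocking the direct route (assumption)
wake_path, wake_cost = a_star_3d(flow, (0, 0, 0), (4, 0, 0), alpha=10.0)
plain_path, plain_cost = a_star_3d(flow, (0, 0, 0), (4, 0, 0), alpha=0.0)
```

With the penalty switched on, the planner detours around the wake (12 moves at unit cost) instead of crossing it (4 moves plus a penalty of 10), which mirrors the trade-off between path length and energy reported in the abstract.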
Article 72
Title@2025-05-29 (4): BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model
Title: BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | BioReason: Förderung multimodaler biologischer Vernunft innerhalb eines DNA-LLM-Modells | BioReason:在DNA-LLM模型中激励多式生物理由 2505.23579v1 |
Authors: Adibvafa Fallahpour, Andrew Magnuson, Purav Gupta, Shihao Ma, Jack Naimer, Arnav Shah, Haonan Duan, Omar Ibrahim, Hani Goodarzi, Chris J. Maddison, Bo Wang
Unlocking deep, interpretable biological reasoning from complex genomic data is a major AI challenge hindering scientific discovery. Current DNA foundation models, despite strong sequence representation, struggle with multi-step reasoning and lack inherent transparent, biologically intuitive explanations. We introduce BioReason, a pioneering architecture that, for the first time, deeply integrates a DNA foundation model with a Large Language Model (LLM). This novel connection enables the LLM to directly process and reason with genomic information as a fundamental input, fostering a new form of multimodal biological understanding. BioReason’s sophisticated multi-step reasoning is developed through supervised fine-tuning and targeted reinforcement learning, guiding the system to generate logical, biologically coherent deductions. On biological reasoning benchmarks including KEGG-based disease pathway prediction - where accuracy improves from 88% to 97% - and variant effect prediction, BioReason demonstrates an average 15% performance gain over strong single-modality baselines. BioReason reasons over unseen biological entities and articulates decision-making through interpretable, step-by-step biological traces, offering a transformative approach for AI in biology that enables deeper mechanistic insights and accelerates testable hypothesis generation from genomic data. Data, code, and checkpoints are publicly available at https://github.com/bowang-lab/BioReason
从复杂的基因组数据中解开的深层、可解释的生物推理是妨碍科学发现的一项重大挑战。目前的DNA基础模型,尽管有很强的顺序代表,却与多步推理斗争,缺乏内在的透明、生物学直观的解释。我们引入了BioReason,这是一个开创性架构,首次将DNA基础模型与大语言模型(LLM)深入结合。这种新颖的连接使LLLM能够直接处理和解释基因组信息,将其作为一种基本投入,促进一种新形式的多式联运生物理解。BioReason的尖端多步推理是通过监督的微调和有针对性的强化学习来发展,指导系统产生符合逻辑、生物一致性的推理。关于生物推理基准,包括基于KEGG的疾病路径预测,其精确率从88%提高到97%,以及变异效应预测,BioReason显示平均15%的性能超过强的单一模式基线。关于无形生物实体的理由,并通过可解释、逐步的生物追踪来解释决策,为生物学的系统提供变革方法,在生物学上提供改变性方法,使数据/基因系统生成的模型的模型的模型能够更深层次上得到的数据。
Article 73
Title@2025-05-29 (4): CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring
Title: CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring | CoT Red-Handed: Stresstesting Chain-of-Thought-Überwachung | COT 红手:压力测试研究链监测 2505.23575v1 |
Authors: Benjamin Arnav, Pablo Bernabeu-Pérez, Nathan Helm-Burger, Tim Kostolansky, Hannes Whittingham, Mary Phuong
As AI models are deployed with increasing autonomy, it is important to ensure they do not take harmful actions unnoticed. As a potential mitigation, we investigate Chain-of-Thought (CoT) monitoring, wherein a weaker trusted monitor model continuously oversees the intermediate reasoning steps of a more powerful but untrusted model. We compare CoT monitoring to action-only monitoring, where only final outputs are reviewed, in a red-teaming setup where the untrusted model is instructed to pursue harmful side tasks while completing a coding problem. We find that CoT monitoring improves detection by up to 27 percentage points in scenarios where action-only monitoring fails to reliably identify sabotage. However, CoT traces can also contain misleading rationalizations that deceive the monitor, reducing performance in more obvious sabotage cases. To address this, we introduce a hybrid protocol that independently scores both reasoning and final outputs and combines them using a weighted average. This hybrid monitor consistently outperforms both CoT and action-only monitors across all tested models and tasks, with detection rates over four times higher than action-only monitoring for subtle deception scenarios.
由于大赦国际模式的部署越来越具有自主性,因此必须确保它们不会不注意有害行动。作为可能的缓解措施,我们调查“努力链”监测,其中受信任程度较低的监测模式持续监督一个更强大但不信任的模式的中间推理步骤。我们比较“信任度较低的监测”和“只行动”监测,因为只有最后产出才接受审查的“行动”监测,在红色组合中,不受信任的模式被指示执行有害的侧面任务,同时完成编码问题。我们发现,“信任度”监测在只采取行动的监测无法可靠地查明破坏的情景中提高了高达27个百分点的检测。然而,“信任度”跟踪还包含误导性合理化,欺骗了监测,降低了更明显的破坏性案例的绩效。为了解决这一问题,我们引入了一个混合协议,独立地计算推理和最终产出,并使用加权平均数将其组合在一起。这种混合监测始终超越“信任”和“行动”监测,在所有测试的模式和任务中,探测率比“行动”监测率高出4倍多于“行动”监测。
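The hybrid protocol in the abstract above scores the reasoning trace and the final output independently and combines them with a weighted average. A minimal sketch of that combination step follows; the weight, the threshold, and the [0, 1] score range are hypothetical values, not the ones used in the paper.

```python
def hybrid_suspicion(cot_score, action_score, cot_weight=0.5):
    """Weighted average of two independently produced suspicion scores."""
    if not 0.0 <= cot_weight <= 1.0:
        raise ValueError("cot_weight must lie in [0, 1]")
    return cot_weight * cot_score + (1.0 - cot_weight) * action_score

def is_flagged(cot_score, action_score, cot_weight=0.5, threshold=0.6):
    """Flag a sample when the combined score reaches a (hypothetical) threshold."""
    return hybrid_suspicion(cot_score, action_score, cot_weight) >= threshold

# A persuasive rationalisation lowers the CoT score, but the action score
# still pulls the combined score over the threshold.
combined = hybrid_suspicion(cot_score=0.3, action_score=0.95, cot_weight=0.5)
flagged = is_flagged(cot_score=0.3, action_score=0.95)
```

Averaging limits how far a misleading reasoning trace can drag the final score down, since the action-only signal keeps contributing regardless of what the trace says.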
Article 74
Title@2025-05-29 (4): Maximum Likelihood Learning of Latent Dynamics Without Reconstruction
Title: Maximum Likelihood Learning of Latent Dynamics Without Reconstruction | Maximale Wahrscheinlichkeit Lernen von latenten Dynamiken ohne Rekonstruktion | 学习没有重建的原始动力学 2505.23569v1 |
Authors: Samo Hromadka, Kai Biegun, Lior Fox, James Heald, Maneesh Sahani
We introduce a novel unsupervised learning method for time series data with latent dynamical structure: the recognition-parametrized Gaussian state space model (RP-GSSM). The RP-GSSM is a probabilistic model that learns Markovian Gaussian latents explaining statistical dependence between observations at different time steps, combining the intuition of contrastive methods with the flexible tools of probabilistic generative models. Unlike contrastive approaches, the RP-GSSM is a valid probabilistic model learned via maximum likelihood. Unlike generative approaches, the RP-GSSM has no need for an explicit network mapping from latents to observations, allowing it to focus model capacity on inference of latents. The model is both tractable and expressive: it admits exact inference thanks to its jointly Gaussian latent prior, while maintaining expressivity with an arbitrarily nonlinear neural network link between observations and latents. These qualities allow the RP-GSSM to learn task-relevant latents without ad-hoc regularization, auxiliary losses, or optimizer scheduling. We show how this approach outperforms alternatives on problems that include learning nonlinear stochastic dynamics from video, with or without background distractors. Our results position the RP-GSSM as a useful foundation model for a variety of downstream applications.
我们对具有潜伏动态结构的时间序列数据采用了一种新的不受监督的学习方法:识别和平衡高斯州空间模型(RP-GSSM)。RP-GSSM是一个概率模型,它学习Markovian Gaussian潜伏,解释不同时间步骤观测之间的统计依赖性,将对比方法的直觉与概率基因模型的灵活工具结合起来。与对比方法不同,RP-GSSM是一种通过最大可能性学习的有效的概率模型。与基因化方法不同,RP-GSSM不需要从潜层到观测的清晰网络绘图,使其将模型能力集中在潜层的推断上。这个模型既具有可移植性和可表达性:它承认精确推导出不同时间步骤,同时保持与观测和潜层之间任意的非线性神经网络联系的直观性。这些特性使RP-GSSM能够学习与任务相关的潜层,而没有自动规范、辅助性损失或优化的定位。我们展示了模型在下游应用中如何将模型的定位定位定位作为不具有背景的图像基础,我们如何学习了在下流流式模型上的替代方法。
Article 75
Title@2025-05-29 (4): DRO: A Python Library for Distributionally Robust Optimization in Machine Learning
Title: DRO: A Python Library for Distributionally Robust Optimization in Machine Learning | DRO: Eine Python-Bibliothek für Distributional Robuste Optimierung im maschinellen Lernen | DRO: 一个用于在机器学习中进行分配式强力优化的 Python 图书馆 2505.23565v1 |
Authors: Jiashuo Liu, Tianyu Wang, Henry Lam, Hongseok Namkoong, Jose Blanchet
We introduce dro, an open-source Python library for distributionally robust optimization (DRO) for regression and classification problems. The library implements 14 DRO formulations and 9 backbone models, enabling 79 distinct DRO methods. Furthermore, dro is compatible with both scikit-learn and PyTorch. Through vectorization and optimization approximation techniques, dro reduces runtime by 10x to over 1000x compared to baseline implementations on large-scale datasets. Comprehensive documentation is available at https://python-dro.org.
我们引入了Dro,这是一个开放源码的Python图书馆,用于对回归和分类问题进行分布式强力优化(DRO),该图书馆安装了14个DRO配方和9个主干模型,使79种不同的DRO方法成为可能,此外,Dro与Scikit-learn和PyTorrch兼容,通过矢量化和优化近似技术,Dro将运行时间比大规模数据集的基准实施时间减少10x至1000x以上,综合文件可在https://python-dro.org上查阅。
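For readers unfamiliar with distributionally robust optimization, the sketch below computes one standard DRO objective, conditional value at risk (CVaR), in plain NumPy. It deliberately does not use the dro library: the function name and arguments are hypothetical and are not the library's API, which should be looked up in its own documentation.

```python
import numpy as np

def cvar_loss(losses, alpha=0.2):
    """Average of the worst alpha-fraction of per-sample losses.

    This equals the worst-case expected loss over all reweightings of the
    sample whose weights are at most 1 / (alpha * n) each.
    """
    ordered = np.sort(np.asarray(losses, dtype=float))[::-1]
    k = max(1, int(np.ceil(alpha * len(ordered))))
    return float(ordered[:k].mean())

per_sample_losses = np.arange(1, 11)          # toy losses 1..10
robust = cvar_loss(per_sample_losses, 0.2)    # mean of the two largest losses
average = cvar_loss(per_sample_losses, 1.0)   # alpha = 1 recovers the plain mean
```

Minimising such a worst-case objective instead of the average loss is what makes a model robust to shifts toward the hard examples.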
Article 76
Title@2025-05-29 (4): Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Title: Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models | Segment Policy Optimization: Effektive Segment-Level-Kreditvergabe in RL für große Sprachmodelle | 政策优化优化:大语言模式RL中有效的分部一级信用分配 2505.23564v1 |
Authors: Yiran Guo, Lijie Xu, Jie Liu, Dan Ye, Shuang Qiu
Enhancing the reasoning capabilities of large language models effectively using reinforcement learning (RL) remains a crucial challenge. Existing approaches primarily adopt two contrasting advantage estimation granularities: Token-level methods (e.g., PPO) aim to provide the fine-grained advantage signals but suffer from inaccurate estimation due to difficulties in training an accurate critic model. On the other extreme, trajectory-level methods (e.g., GRPO) solely rely on a coarse-grained advantage signal from the final reward, leading to imprecise credit assignment. To address these limitations, we propose Segment Policy Optimization (SPO), a novel RL framework that leverages segment-level advantage estimation at an intermediate granularity, achieving a better balance by offering more precise credit assignment than trajectory-level methods and requiring fewer estimation points than token-level methods, enabling accurate advantage estimation based on Monte Carlo (MC) without a critic model. SPO features three components with novel strategies: (1) flexible segment partition; (2) accurate segment advantage estimation; and (3) policy optimization using segment advantages, including a novel probability-mask strategy. We further instantiate SPO for two specific scenarios: (1) SPO-chain for short chain-of-thought (CoT), featuring novel cutpoint-based partition and chain-based advantage estimation, achieving $6$-$12$ percentage point improvements in accuracy over PPO and GRPO on GSM8K. (2) SPO-tree for long CoT, featuring novel tree-based advantage estimation, which significantly reduces the cost of MC estimation, achieving $7$-$11$ percentage point improvements over GRPO on MATH500 under 2K and 4K context evaluation. We make our code publicly available at https://github.com/AIFrameResearch/SPO.
有效利用强化学习(RL)提升大型语言模型的推理能力仍是一项关键挑战。现有方法主要采用两种截然不同的优势估计粒度:词元级方法(例如PPO)旨在提供细粒度的优势信号,但由于难以训练出准确的评论家(critic)模型,估计往往不准确;另一个极端是轨迹级方法(例如GRPO),仅依赖最终奖励给出的粗粒度优势信号,导致信用分配不精确。为解决这些局限,我们提出分段策略优化(SPO),这是一个新的RL框架,在中间粒度上进行分段级优势估计:它比轨迹级方法提供更精确的信用分配,又比词元级方法需要更少的估计点,从而无需评论家模型即可基于蒙特卡洛(MC)进行准确的优势估计。SPO包含三个采用新策略的组成部分:(1)灵活的分段划分;(2)准确的分段优势估计;(3)利用分段优势进行策略优化,其中包括一种新的概率掩码策略。我们进一步针对两种具体场景实例化SPO:(1)面向短思维链(CoT)的SPO-chain,采用新的基于切分点的划分和基于链的优势估计,在GSM8K上的准确率比PPO和GRPO提高6至12个百分点;(2)面向长思维链的SPO-tree,采用新的基于树的优势估计,显著降低了MC估计的成本,在2K和4K上下文评估下,于MATH500上比GRPO提高7至11个百分点。我们的代码公开于 https://github.com/AIFrameResearch/SPO。
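A rough sketch of segment-level credit assignment with Monte Carlo value estimates is given below. It assumes that the value at each segment boundary is the mean outcome of rollouts started there and that a segment's advantage is the difference between the values at its two ends. The abstract does not spell out the estimator, so this is an illustration rather than the SPO implementation, and the rollout outcomes are made up.

```python
def mc_value(rollout_outcomes):
    """Monte Carlo value estimate: mean final reward of rollouts from one state."""
    return sum(rollout_outcomes) / len(rollout_outcomes)

def segment_advantages(boundary_values):
    """Advantage of segment k taken as V(end of segment) - V(start of segment)."""
    return [end - start for start, end in zip(boundary_values, boundary_values[1:])]

# Hypothetical 0/1 correctness outcomes of four rollouts from each cutpoint.
rollouts_per_boundary = [
    [1, 0, 0, 1],  # before segment 1
    [1, 1, 0, 1],  # after segment 1
    [0, 0, 0, 1],  # after segment 2 (a likely mistake was made here)
    [1, 1, 1, 1],  # after segment 3 (final answer reached)
]
values = [mc_value(r) for r in rollouts_per_boundary]
advantages = segment_advantages(values)
```

Each segment then receives its own signal (here 0.25, -0.5 and 0.75), which is finer than one trajectory-level reward and needs far fewer value estimates than one per token.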
Article 77
Title@2025-05-29 (4): LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Title: LEXam: Benchmarking Legal Reasoning on 340 Law Exams | LEXam: Benchmarking der rechtlichen Begründung von 340 Rechtsprüfungen | LEXam:340项法律考试的法律依据基准 2505.12864v2 |
Authors: Yu Fan, Jingwei Ni, Jakob Merane, Etienne Salimbeni, Yang Tian, Yoan Hermstrüwer, Yinya Huang, Mubashara Akhtar, Florian Geering, Oliver Dreyer, Daniel Brunner, Markus Leippold, Mrinmaya Sachan, Alexander Stremitzer, Christoph Engel, Elliott Ash, Joel Niklaus
Long-form legal reasoning remains a key challenge for large language models (LLMs) in spite of recent advances in test-time scaling. We introduce LEXam, a novel benchmark derived from 340 law exams spanning 116 law school courses across a range of subjects and degree levels. The dataset comprises 4,886 law exam questions in English and German, including 2,841 long-form, open-ended questions and 2,045 multiple-choice questions. Besides reference answers, the open questions are also accompanied by explicit guidance outlining the expected legal reasoning approach such as issue spotting, rule recall, or rule application. Our evaluation shows that both open-ended and multiple-choice questions present significant challenges for current LLMs; in particular, they struggle with open questions that require structured, multi-step legal reasoning. Moreover, our results underscore the effectiveness of the dataset in differentiating between models with varying capabilities. Adopting an LLM-as-a-Judge paradigm with rigorous human expert validation, we demonstrate how model-generated reasoning steps can be evaluated consistently and accurately. Our evaluation setup provides a scalable method to assess legal reasoning quality beyond simple accuracy metrics. Project page: https://lexam-benchmark.github.io/
尽管最近测试时间的扩大有所进展,但大型语言模型(LLMS)的长期法律推理仍然是一项关键挑战。我们引入了LEXam,这是340次法律考试的新基准,涉及不同学科和学位水平的116个法学院课程;数据集包括英语和德语4 886个法律考试问题,包括2 841个长式、开放式问题和2 045个多种选择问题。除了参考答案外,未决问题还附有明确的指导,概述预期的法律推理方法,如问题识别、规则回顾或规则应用。我们对开放式和多种选择问题的评价对目前的LLMS提出了重大挑战;特别是,它们与需要结构化、多步的法律推理的开放问题作斗争。此外,我们的结果强调数据集在区分不同能力模型方面的有效性。采用LLM-as-a-judge模式,并严格地验证人类专家,我们展示如何连贯和准确地评价模型产生的推理步骤。我们的评价设置提供了一种可扩展的方法,用以评估超出简单精确度度度度的衡量标准的质量。项目: https://lexgis-bisgismamuspage.
Article 78
Title@2025-05-29 (4): Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information
Title: Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information | Qwen Look Again: Leitende Vision-Sprachen-Reasoning-Modelle, um visuelle Informationen erneut zu speichern | 再看一遍:指导视觉信息重新阅读的视觉-语言定位依据模式 2505.23558v1 |
Authors: Xu Chu, Xinrong Chen, Guanyu Wang, Zhijie Tan, Kui Huang, Wenyu Lv, Tong Mo, Weiping Li
Inference-time scaling drives extended reasoning to enhance the performance of Vision-Language Models (VLMs), thus forming powerful Vision-Language Reasoning Models (VLRMs). However, long reasoning dilutes visual tokens, so that visual information receives less attention, which may trigger hallucinations. Although introducing text-only reflection processes shows promise in language models, we demonstrate that it is insufficient to suppress hallucinations in VLMs. To address this issue, we introduce Qwen-LookAgain (Qwen-LA), a novel VLRM designed to mitigate hallucinations by incorporating a vision-text reflection process that guides the model to re-attend to visual information during reasoning. We first propose a reinforcement learning method, Balanced Reflective Policy Optimization (BRPO), which guides the model to decide on its own when to generate vision-text reflection and to balance the number and length of reflections. Then, we formally prove that VLRMs lose attention to visual tokens as reasoning progresses, and demonstrate that supplementing visual information during reflection enhances visual attention. Therefore, during training and inference, Visual Token COPY and Visual Token ROUTE are introduced to force the model to re-attend to visual information at the visual level, addressing the limitations of text-only reflection. Experiments on multiple visual QA datasets and hallucination metrics indicate that Qwen-LA achieves leading accuracy while reducing hallucinations. Our code is available at: https://github.com/Liar406/Look_Again.
推算时间缩放推动扩大推理,以提高视觉语言模型(VLMs)的性能,从而形成强大的视觉语言解释模型(VLRMs),从而形成强大的视觉语言解释模型(VLRMs),但长期推理会淡化视觉象征,导致视觉信息受到较少的关注,并可能引起幻觉。虽然引入仅以文字表示的反射过程在语言模型中显示出希望,但我们表明它不足以抑制VLMs中的幻觉。为了解决这一问题,我们引入了Quwen-LAgain(Qwen-LA),这是一个新的VLRMM(VLRM),旨在减轻幻觉,其方法是纳入一个视觉文字反思进程,引导模型在推理过程中重新保留视觉信息。我们首先建议强化学习方法,平衡思考政策优化(BROPO),该方法指导模型决定何时生成视觉反思本身的视觉反思,平衡反射次数和长度。然后,我们正式证明VLRMRMs失去对视觉信息作为推理学进步的注意,并表明在思考过程中补充视觉信息会加强视觉关注。因此,在培训和推理学期间,视觉TVOVY-CY-LVAL-LVAL-LA(OLVALA)在视觉记录中,在视觉判断到可判前的图像反映的图像反映的多次的图像的图像-LVALVALVALVALVA级水平上,在可判读数据到可辨),在可判读。
Article 79
Title@2025-05-29 (4): Learning Parametric Distributions from Samples and Preferences
Title: Learning Parametric Distributions from Samples and Preferences | Parametrische Verteilungen aus Proben und Präferenzen lernen | 抽样和优惠制的学习参数分布 2505.23557v1 |
Authors: Marc Jourdan, Gizem Yüce, Nicolas Flammarion
Recent advances in language modeling have underscored the role of preference feedback in enhancing model performance. This paper investigates the conditions under which preference feedback improves parameter estimation in classes of continuous parametric distributions. In our framework, the learner observes pairs of samples from an unknown distribution along with their relative preferences, which depend on the same unknown parameter. We show that preference-based M-estimators achieve a better asymptotic variance than sample-only M-estimators, and that deterministic preferences improve it further. Leveraging the hard constraints revealed by deterministic preferences, we propose an estimator achieving an estimation error scaling of $\mathcal{O}(1/n)$, a significant improvement over the $\Theta(1/\sqrt{n})$ rate attainable with samples alone. Next, we establish a lower bound that matches this accelerated rate, up to dimension and problem-dependent constants. While the assumptions underpinning our analysis are restrictive, they are satisfied by notable cases such as Gaussian or Laplace distributions for preferences based on the log-probability reward.
语言建模方面的最新进展凸显了偏好反馈在提高模型性能方面的作用。本文调查了偏好反馈改善连续参数分布类别参数估计的条件。 在我们的框架内, 学习者观察的是来自未知分布的样本配对, 以及根据相同未知参数的相对偏好。 我们显示, 以优惠为基础的M- 估测器比只采样的M- 估测器的无症状差异要好得多, 并通过确定性偏好进一步提高。 利用确定性偏好所揭示的硬性限制, 我们建议了一位估测器, 实现美元/ mathcal{O}( 1/ n) 的估算误差比例, 大大高于单凭标本即可得到的 $/ Theta (1/\ sqrt{n} 率。 接下来, 我们设定了一个更低的界限, 与这一加速率相匹配; 最高为尺寸和问题依赖的常数。 尽管我们的分析所依据的假设是限制性的, 但是他们对一些突出的例子感到满意, 例如高斯或拉比特分配基于日- 概率奖励的偏好。
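The role of hard constraints from deterministic preferences can be illustrated in the simplest case: a unit-variance Gaussian with unknown mean theta and preferences given by the log-probability reward. The sample closer to theta always wins, so each pair reveals on which side of the pair's midpoint theta lies. The estimator below, the centre of the feasible interval, is a toy construction under these assumptions and is not claimed to be the paper's estimator.

```python
import numpy as np

def feasible_interval(winners, losers):
    """Interval of means consistent with every deterministic preference."""
    midpoints = (winners + losers) / 2.0
    upper_bounds = midpoints[winners < losers]  # theta lies below these midpoints
    lower_bounds = midpoints[winners > losers]  # theta lies above these midpoints
    lo = lower_bounds.max() if lower_bounds.size else -np.inf
    hi = upper_bounds.min() if upper_bounds.size else np.inf
    return lo, hi

rng = np.random.default_rng(0)
theta, n = 1.5, 2000
x = rng.normal(theta, 1.0, n)
y = rng.normal(theta, 1.0, n)
x_wins = np.abs(x - theta) <= np.abs(y - theta)  # higher log-probability wins
winners = np.where(x_wins, x, y)
losers = np.where(x_wins, y, x)
lo, hi = feasible_interval(winners, losers)
theta_hat = (lo + hi) / 2.0
sample_mean = np.concatenate([x, y]).mean()
```

The width of the feasible interval is governed by the midpoints nearest to theta, which close in at a rate of order 1/n, whereas the error of the sample mean shrinks only at order 1/sqrt(n). This is the intuition behind the accelerated rate quoted in the abstract.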
Article 80
Title@2025-05-29 (4): Adaptive Federated LoRA in Heterogeneous Wireless Networks with Independent Sampling
Title: Adaptive Federated LoRA in Heterogeneous Wireless Networks with Independent Sampling | Adaptives Federated LoRA in heterogenen drahtlosen Netzwerken mit unabhängiger Probenahme | 具有独立抽样调查的多源无线网络中的联邦适应性 2505.23555v1 |
Authors: Yanzhao Hou, Jiaxiang Geng, Boyu Li, Xiaofeng Tao, Juncheng Wang, Xiaodong Xu, Bing Luo
Federated LoRA has emerged as a promising technique for efficiently fine-tuning large language models (LLMs) on distributed devices by reducing the number of trainable parameters. However, existing approaches often overlook the theoretical and practical implications of system and data heterogeneity, thereby failing to optimize the overall training efficiency, particularly in terms of wall-clock time. In this paper, we propose an adaptive federated LoRA strategy with independent client sampling to minimize the convergence wall-clock time of federated fine-tuning under both computation and communication heterogeneity. We first derive a new convergence bound for federated LoRA with arbitrary and independent client sampling, notably without requiring the stringent bounded gradient assumption. Then, we introduce an adaptive bandwidth allocation scheme that accounts for heterogeneous client resources and system bandwidth constraints. Based on the derived theory, we formulate and solve a non-convex optimization problem to jointly determine the LoRA sketching ratios and sampling probabilities, aiming to minimize wall-clock convergence time. An efficient and low-complexity algorithm is developed to approximate the solution. Finally, extensive experiments demonstrate that our approach significantly reduces wall-clock training time compared to state-of-the-art methods across various models and datasets.
通过减少可训练参数的数量,联邦洛拉联盟已成为高效微调分布式设备上大型语言模型(LLMs)的一个很有希望的技术,通过减少可训练参数的数量,可以有效地微调分布式设备上的大型语言模型(LLMs),但是,现有的方法往往没有适当地忽视系统和数据差异的理论和实践影响,从而未能优化总体培训效率,特别是墙时时段的培训效率。在本文件中,我们提出了一个适应性的联邦洛拉联盟战略,通过独立客户抽样,尽量减少计算和通信差异性两种情况下联合微调的同步时间。我们首先为具有任意和独立客户抽样的联邦洛拉公司找到新的趋同点,特别是不需要严格的封闭梯度假设。然后,我们引入了适应性带宽分配计划,考虑到各种客户资源和系统带宽限制。根据推理,我们制定并解决非凝固型优化问题,共同确定洛拉的草图比例和取样概率,目的是最大限度地减少墙时段的趋同时间。我们制定了高效和低兼容性的算法,以近解决方案。最后,广泛的实验表明我们的做法大大缩短了各种壁点培训时间和不同状态的数据。
Article 81
Title@2025-05-29 (4): Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters
Title: Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters | Nachhaltiges CO2-basiertes und wassereffizientes LLM-Scheeduling in Geo-verteilten Cloud-Rechenzentren | 地球分布云数据中心的可持续碳软件和水效率高的LLM 2505.23554v1 |
Authors: Hayden Moore, Sirui Qi, Ninad Hogade, Dejan Milojicic, Cullen Bash, Sudeep Pasricha
In recent years, Large Language Models (LLMs) such as ChatGPT, CoPilot, and Gemini have been widely adopted in different areas. As the use of LLMs continues to grow, many efforts have focused on reducing the massive training overheads of these models. However, the environmental impact of handling user requests to LLMs is increasingly becoming a concern. Recent studies estimate that the costs of operating LLMs in their inference phase can exceed training costs by 25x per year. As LLMs are queried incessantly, the cumulative carbon footprint for the operational phase has been shown to far exceed the footprint during the training phase. Further, estimates indicate that 500 ml of fresh water is expended for every 20-50 requests to LLMs during inference. To address these important sustainability issues with LLMs, we propose a novel framework called SLIT to co-optimize LLM quality of service (time-to-first token), carbon emissions, water usage, and energy costs. The framework utilizes a machine learning (ML) based metaheuristic to enhance the sustainability of LLM hosting across geo-distributed cloud datacenters. Such a framework will become increasingly vital as LLMs proliferate.
近年来,大语言模型(LLM),如ChatGPT、CoPilot和Gemini等,在不同领域被广泛采用。随着LLMs的使用继续增加,许多努力集中于减少这些模型的大量培训间接费用。但是,处理用户对LLMs的要求对环境的影响日益引起关注。最近的研究估计,在推论阶段操作LMs的成本每年可超过培训成本25x。LMs不断被问及,运行阶段的累积碳足迹显示远远超过培训阶段的足迹。此外,估计表明,在推断过程中,每20-50个LMs提出的LMs申请中,就有500毫升的淡水花费。为了与LMS解决这些重要的可持续性问题,我们提出了一个名为SLIT的新框架,以共同优化LMs服务质量(时间到头等)、碳排放、水使用和能源成本。框架利用基于机器的MLAEuric来提高LM公司在地理分布式云中托管服务的可持续性。这一框架将日益成为至关重要的一个框架。
Article 82
Title@2025-05-29 (4): Comparing the Moore-Penrose Pseudoinverse and Gradient Descent for Solving Linear Regression Problems: A Performance Analysis
Title: Comparing the Moore-Penrose Pseudoinverse and Gradient Descent for Solving Linear Regression Problems: A Performance Analysis | Vergleich der Moore-Penrose Pseudoinverse und Gradient Descent zur Lösung linearer Regressionsprobleme: Eine Leistungsanalyse | 将摩尔-彭罗斯-普塞多温和梯底比较以解决线性倒退问题:绩效分析 2505.23552v1 |
Authors: Alex Adams
This paper investigates the comparative performance of two fundamental approaches to solving linear regression problems: the closed-form Moore-Penrose pseudoinverse and the iterative gradient descent method. Linear regression is a cornerstone of predictive modeling, and the choice of solver can significantly impact efficiency and accuracy. I review and discuss the theoretical underpinnings of both methods, analyze their computational complexity, and evaluate their empirical behavior on synthetic datasets with controlled characteristics, as well as on established real-world datasets. My results delineate the conditions under which each method excels in terms of computational time, numerical stability, and predictive accuracy. This work aims to provide practical guidance for researchers and practitioners in machine learning when selecting between direct, exact solutions and iterative, approximate solutions for linear regression tasks.
本文件调查了解决线性回归问题的两种基本方法的比较性能:封闭式摩尔-彭罗斯伪反射和迭代梯度下降法。线性回归是预测型模型的基石,而求解器的选择可以极大地影响效率和准确性。我审查并讨论这两种方法的理论基础,分析其计算复杂性,评价其在具有受控特性的合成数据集和既定真实世界数据集方面的实证行为。我的结果描述了每种方法在计算时间、数字稳定性和预测准确性方面优异的条件。这项工作旨在为研究人员和从业者在选择直线回归任务的直接、精确解决方案和迭接性、近似近似解决方案时,提供机器学习的实际指导。
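To make the comparison in this abstract concrete, here is a minimal sketch of both solvers on a small synthetic problem. The data, learning rate, and step count are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Synthetic regression problem (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)

# Direct solution: w = X^+ y, with X^+ the Moore-Penrose pseudoinverse.
w_pinv = np.linalg.pinv(X) @ y


def gradient_descent(X, y, lr=0.1, steps=500):
    """Minimize the mean squared error ||Xw - y||^2 / n by plain gradient descent."""
    n = len(y)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)
        w -= lr * grad
    return w


# Iterative solution: should approach the same least-squares minimizer.
w_gd = gradient_descent(X, y)
print(w_pinv, w_gd)
```

On a well-conditioned problem like this one, both approaches should land on essentially the same weights; the trade-offs the abstract refers to show up as the problem grows or becomes ill-conditioned.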
Article 83
Title@2025-05-29 (4): Diffusion Sampling Correction via Approximately 10 Parameters
Title: Diffusion Sampling Correction via Approximately 10 Parameters | Diffusions-Probenahmekorrektur über ca. 10 Parameter | 通过大约10个参数校正传播抽样校正 2411.06503v3 |
Authors: Guangyi Wang, Wei Peng, Lijiang Li, Wenyu Chen, Yuren Cai, Songzhi Su
While powerful for generation, Diffusion Probabilistic Models (DPMs) face slow sampling challenges, for which various distillation-based methods have been proposed. However, they typically require significant additional training costs and model parameter storage, limiting their practicality. In this work, we propose PCA-based Adaptive Search (PAS), which optimizes existing solvers for DPMs with minimal additional costs. Specifically, we first employ PCA to obtain a few basis vectors to span the high-dimensional sampling space, which enables us to learn just a set of coordinates to correct the sampling direction; furthermore, based on the observation that the cumulative truncation error exhibits an "S" shape, we design an adaptive search strategy that further enhances the sampling efficiency and reduces the number of stored parameters to approximately 10. Extensive experiments demonstrate that PAS can significantly enhance existing fast solvers in a plug-and-play manner with negligible costs. E.g., on CIFAR10, PAS optimizes DDIM’s FID from 15.69 to 4.37 (NFE=10) using only 12 parameters and sub-minute training on a single A100 GPU. Code is available at https://github.com/onefly123/PAS.
虽然具有发电能力,但扩散概率模型(DPM)面临缓慢的取样挑战,为此提出了各种蒸馏法方法,但通常需要大量额外的培训费用和模型参数储存,限制其实用性;在这项工作中,我们提议以五氯苯甲醚为基础的适应性搜索(PAS),以尽可能少的额外费用优化DPM的现有解决方案;具体地说,我们首先利用五氯苯甲醚获得一些基础矢量,以跨越高维取样空间,使我们能够只学习一套坐标,以纠正取样方向;此外,根据累积脱轨误差显示“S”形状的观察,我们设计了适应性搜索战略,进一步提高取样效率,并将储存参数的数量减少到大约10个,广泛的实验表明,五氯苯甲醚能够以插装方式大大增强现有的快速解决方案,费用微不足道。
Article 84
Title@2025-05-29 (4): Fast Large Language Model Collaborative Decoding via Speculation
Title: Fast Large Language Model Collaborative Decoding via Speculation | Schnelles Large Language Model Kollaboratives Decodieren über Spekulation | 通过投机进行快速大语言合作示范模式 2502.01662v2 |
Authors: Jiale Fu, Yuchu Jiang, Junkai Chen, Jiaming Fan, Xin Geng, Xu Yang
Large Language Model (LLM) collaborative decoding techniques improve output quality by combining the outputs of multiple models at each generation step, but they incur high computational costs. In this paper, we introduce Collaborative decoding via Speculation (CoS), a novel framework that accelerates collaborative decoding without compromising performance. Inspired by Speculative Decoding, where a small proposal model generates tokens sequentially and a larger target model verifies them in parallel, our approach builds on two key insights: (1) the verification distribution can be the combined distribution of both the proposal and target models, and (2) alternating each model as the proposer and verifier can further enhance efficiency. We generalize this method to collaboration among n models and theoretically prove that CoS is never slower than standard collaborative decoding, typically achieving faster speed. Extensive experiments demonstrate CoS is 1.11x-2.23x faster than standard collaborative decoding without compromising generation quality. Our code is available at https://github.com/Kamichanw/CoS/.
大型语言模型(LLM)合作解码技术(LLM)通过将多种模型的输出在每一代阶段结合起来,提高了产出质量,但计算成本很高。在本文中,我们引入了通过投机(COS)协作解码(COS),这是一个在不损害性能的情况下加速协作解码的新框架。受一个小型提案模型依次生成代号的投机解码(LLLM)的启发,而一个更大的目标模型平行核查,我们的方法基于两个主要的洞察力:(1) 核查分配可以是提案和目标模型的混合分布,以及(2) 作为提议方和核查方可以进一步提高效率,对每一种模型进行交替。我们将这种方法推广到n模式之间的合作,理论上证明COS从未比标准的合作解码慢过,通常能更快。广泛的实验显示COS比标准的协作解码速度快1.1x-2.23x比标准的代码在不影响生成质量的情况下更快。我们的代码可以在https://github.com/Kamichaw/COS/上查阅。
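The first insight above can be illustrated with the standard speculative-sampling acceptance rule, using a combined distribution as the verification target. This is a sketch under assumptions: the equal-weight average in `combine` and the toy three-token vocabulary are illustrative choices, not the paper's actual combination rule or implementation.

```python
import random


def combine(dists, weights=None):
    """Weighted average of several next-token distributions (assumed combination rule)."""
    n = len(dists)
    weights = weights or [1.0 / n] * n
    vocab = len(dists[0])
    return [sum(w * d[i] for w, d in zip(weights, dists)) for i in range(vocab)]


def sample(dist, rng):
    """Draw one token index; weights need not be normalized."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def speculative_token(q, p, rng):
    """Propose from q, accept with prob min(1, p/q), else resample from max(0, p - q).

    The returned token is distributed exactly according to p.
    """
    x = sample(q, rng)
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return sample(residual, rng)


rng = random.Random(0)
proposal = [0.7, 0.2, 0.1]          # toy proposal-model distribution
target = [0.1, 0.3, 0.6]            # toy target-model distribution
verify = combine([proposal, target])  # verification distribution: the combination
tokens = [speculative_token(proposal, verify, rng) for _ in range(20000)]
freqs = [tokens.count(i) / len(tokens) for i in range(3)]
print(verify, freqs)
```

The empirical frequencies should track the combined distribution rather than either model alone, which is the property that lets the proposer's cheap drafts stand in for collaborative decoding.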
Article 85
Title@2025-05-29 (4): Domain-Aware Tensor Network Structure Search
Title: Domain-Aware Tensor Network Structure Search | Domain-Aware Tensor Netzwerkstruktur Suche | 域- 软件显示器网络网络结构搜索 2505.23537v1 |
Authors: Giorgos Iacovides, Wuyang Zhou, Chao Li, Qibin Zhao, Danilo Mandic
Tensor networks (TNs) provide efficient representations of high-dimensional data, yet identification of the optimal TN structures, the so called tensor network structure search (TN-SS) problem, remains a challenge. Current state-of-the-art (SOTA) algorithms are computationally expensive as they require extensive function evaluations, which is prohibitive for real-world applications. In addition, existing methods ignore valuable domain information inherent in real-world tensor data and lack transparency in their identified TN structures. To this end, we propose a novel TN-SS framework, termed the tnLLM, which incorporates domain information about the data and harnesses the reasoning capabilities of large language models (LLMs) to directly predict suitable TN structures. The proposed framework involves a domain-aware prompting pipeline which instructs the LLM to infer suitable TN structures based on the real-world relationships between tensor modes. In this way, our approach is capable of not only iteratively optimizing the objective function, but also generating domain-aware explanations for the identified structures. Experimental results demonstrate that tnLLM achieves comparable TN-SS objective function values with much fewer function evaluations compared to SOTA algorithms. Furthermore, we demonstrate that the LLM-enabled domain information can be used to find good initializations in the search space for sampling-based SOTA methods to accelerate their convergence while preserving theoretical performance guarantees.
电线网络(TNS)能够有效地反映高维数据,然而,确定最佳的TN结构,即所谓的高频网络结构搜索(TN-SS)问题,仍然是一项挑战。目前的先进(SOTA)算法在计算上成本很高,因为它们需要广泛的功能评估,而对于现实世界的应用来说,这种评估是令人望而却步的。此外,现有的方法忽视了现实世界数据所固有的宝贵域信息,而且其查明的TN结构缺乏透明度。为此,我们提议了一个新型的TN-SS框架,称为TnLLM,它包含数据域域信息并利用大型语言模型(LLMS)的推理能力直接预测适当的TN结构。拟议的框架涉及一种对域有觉的快速管道,它要求LM根据现实世界关系推导出适当的TN结构结构。我们的方法不仅能够反复优化目标功能,而且还能够为所确定的结构产生域觉悟解释。实验结果表明,TNLLM在S-SS的初始搜索功能上实现了可比较的TN-S-M-LTA的快速搜索功能,而我们使用的域域域域级搜索功能则可以少于SO-MA。
Article 86
Title@2025-05-29 (4): It’s a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
Title: It’s a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data | Es ist ein (Blind) Match! Richtung Vision-Sprache Korrespondenz ohne Paralleldaten | 这是一个( Blind) 匹配! 向没有平行数据的视觉语言对应函授 2503.24129v2 |
Authors: Dominik Schnaus, Nikita Araslanov, Daniel Cremers
The platonic representation hypothesis suggests that vision and language embeddings become more homogeneous as model and dataset sizes increase. In particular, pairwise distances within each modality become more similar. This suggests that as foundation models mature, it may become possible to match vision and language embeddings in a fully unsupervised fashion, i.e. without parallel data. We present the first feasibility study, and investigate conformity of existing vision and language foundation models in the context of unsupervised, or “blind”, matching. First, we formulate unsupervised matching as a quadratic assignment problem and introduce a novel heuristic that outperforms previous solvers. We also develop a technique to find optimal matching problems, for which a non-trivial match is very likely. Second, we conduct an extensive study deploying a range of vision and language models on four datasets. Our analysis reveals that for many problem instances, vision and language representations can be indeed matched without supervision. This finding opens up the exciting possibility of embedding semantic knowledge into other modalities virtually annotation-free. As a proof of concept, we showcase an unsupervised classifier, which achieves non-trivial classification accuracy without any image-text annotation.
柏拉图代表假设表明,随着模型和数据集大小的增加,视觉和语言嵌入会变得更加单一。特别是,每种模式内部的相近距离会变得更加相似。这表明,随着基础模型成熟,有可能以完全不受监督的方式,即没有平行数据,匹配视觉和语言嵌入。我们提出第一次可行性研究,调查现有视觉和语言基建模型在不受监督或“盲”匹配背景下的兼容性。首先,我们将不受监督的匹配作为二次分配问题,并引入比以往解决者更相近的新型超模范。我们还开发了一种找到最佳匹配问题的技术,而非三角匹配的可能性很大。第二,我们在四个数据集上进行广泛的研究,运用了一系列的视觉和语言模型。我们的分析表明,对于许多问题的情况,视觉和语言表达确实可以在没有监督的情况下相匹配。这打开了将语义知识嵌入其他模式的令人兴奋的可能性,几乎是无注释的。作为概念的证明,我们展示了一种不受监督的图像分类的准确性,我们展示了一种不精确性。
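The quadratic-assignment formulation mentioned in this abstract can be sketched as follows: given pairwise-distance matrices for two modalities, find the permutation that best aligns them. The brute-force search and the toy distance matrices are assumptions for illustration; the paper proposes its own heuristic solver, which this does not reproduce.

```python
from itertools import permutations


def qap_cost(A, B, perm):
    """Sum of squared differences between A[i][j] and B[perm[i]][perm[j]]."""
    n = len(A)
    return sum((A[i][j] - B[perm[i]][perm[j]]) ** 2 for i in range(n) for j in range(n))


def blind_match(A, B):
    """Exhaustively search all permutations; only feasible for very small n."""
    n = len(A)
    return min(permutations(range(n)), key=lambda perm: qap_cost(A, B, perm))


# Toy "vision" distances: points on a line at 0, 1, 3, 7 (all pairwise distances distinct).
points = [0, 1, 3, 7]
A = [[abs(a - b) for b in points] for a in points]

# Toy "language" distances: the same geometry under a hidden shuffle.
hidden = [2, 0, 3, 1]  # language item k corresponds to vision item hidden[k]
B = [[A[hidden[a]][hidden[b]] for b in range(4)] for a in range(4)]

best = blind_match(A, B)
print(best, qap_cost(A, B, best))
```

Because the two matrices share the same geometry, the recovered permutation inverts the hidden shuffle without using any paired examples.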
Article 87
Title@2025-05-29 (4): NACHOS: Neural Architecture Search for Hardware Constrained Early Exit Neural Networks
Title: NACHOS: Neural Architecture Search for Hardware Constrained Early Exit Neural Networks | NACHOS: Neurale Architektur Suche nach Hardware eingeschränkt Early Exit Neural Networks | NACHOS: 早期外出神经网络硬件控制系统神经结构搜索 2401.13330v2 |
Authors: Matteo Gambella, Jary Pomponi, Simone Scardapane, Manuel Roveri
Early Exit Neural Networks (EENNs) endow a standard Deep Neural Network (DNN) with Early Exit Classifiers (EECs), to provide predictions at intermediate points of the processing when enough confidence in classification is achieved. This leads to many benefits in terms of effectiveness and efficiency. Currently, the design of EENNs is carried out manually by experts, a complex and time-consuming task that requires accounting for many aspects, including the correct placement, the thresholding, and the computational overhead of the EECs. For this reason, the research is exploring the use of Neural Architecture Search (NAS) to automate the design of EENNs. Currently, few comprehensive NAS solutions for EENNs have been proposed in the literature, and a fully automated, joint design strategy taking into consideration both the backbone and the EECs remains an open problem. To this end, this work presents Neural Architecture Search for Hardware Constrained Early Exit Neural Networks (NACHOS), the first NAS framework for the design of optimal EENNs satisfying constraints on the accuracy and the number of Multiply and Accumulate (MAC) operations performed by the EENNs at inference time. In particular, this provides the joint design of backbone and EECs to select a set of admissible (i.e., respecting the constraints) Pareto Optimal Solutions in terms of best tradeoff between the accuracy and number of MACs. The results show that the models designed by NACHOS are competitive with the state-of-the-art EENNs. Additionally, this work investigates the effectiveness of two novel regularization terms designed for the optimization of the auxiliary classifiers of the EENN
早期出国神经网络(EENNs)在早期出国分类(EECs)下设置了标准的深心神经网络(DNN),在达到对分类足够信任时,在处理的中间点提供预测,这在有效性和效率方面带来许多好处。目前,EENNes的设计是由专家手工完成的,这是一项复杂和耗时的任务,需要考虑许多方面,包括正确定位、门槛设置和欧亚经济共同体的计算间接费用。因此,研究正在探索利用神经建筑搜索(NAS)实现EENNes设计自动化。目前,文献中为EENNNNes提出了很少的全面的ENAS解决方案,而考虑到骨干和欧亚经济共同体的完全自动化的联合设计战略仍然是一个尚未解决的问题。为此,这项工作提出了为硬体骨架经过训练的早期退出神经网络(NACHOS)进行神经建筑结构搜索,这是国家空间建筑结构中第一个用于设计最佳环境指标的升级框架,满足了对EENNES的精确度和数量的限制。
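The early-exit mechanism described above (predict at an intermediate classifier once confidence is high enough) can be sketched in a few lines. The function name, the max-probability confidence measure, and the per-stage MAC counts are hypothetical; the abstract does not specify how NACHOS measures confidence or cost.

```python
def early_exit_predict(exit_probs, thresholds, macs_per_stage):
    """Return (label, exit_index, macs_spent) for one input.

    exit_probs:     per-exit class-probability vectors; the last entry is the final classifier.
    thresholds:     one confidence threshold per early exit (none needed for the final one).
    macs_per_stage: MACs spent to reach each exit from the previous one (assumed known).
    """
    macs = 0
    for k, probs in enumerate(exit_probs):
        macs += macs_per_stage[k]
        confidence = max(probs)
        is_last = k == len(exit_probs) - 1
        # Stop at the first exit that is confident enough, or at the final classifier.
        if is_last or confidence >= thresholds[k]:
            return probs.index(confidence), k, macs


probs = [[0.6, 0.4], [0.1, 0.9], [0.2, 0.8]]
print(early_exit_predict(probs, [0.8, 0.8], [10, 20, 30]))
print(early_exit_predict(probs, [0.95, 0.95], [10, 20, 30]))
```

Raising the thresholds pushes inputs to later exits and increases the MAC count, which is the accuracy-versus-cost trade-off that the architecture search has to balance.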
Article 88
Title@2025-05-29 (4): Subgraph Gaussian Embedding Contrast for Self-Supervised Graph Representation Learning
Title: Subgraph Gaussian Embedding Contrast for Self-Supervised Graph Representation Learning | Subgraph Gaussian Einbettungskontrast für selbstüberwachtes Graphen-Darstellungslernen | 自支持图表代表制学习的 Subgraph Gaussian 嵌入式对比对比度 2505.23529v1 |
Authors: Shifeng Xie, Aref Einizade, Jhony H. Giraldo
Graph Representation Learning (GRL) is a fundamental task in machine learning, aiming to encode high-dimensional graph-structured data into low-dimensional vectors. Self-Supervised Learning (SSL) methods are widely used in GRL because they can avoid expensive human annotation. In this work, we propose a novel Subgraph Gaussian Embedding Contrast (SubGEC) method. Our approach introduces a subgraph Gaussian embedding module, which adaptively maps subgraphs to a structured Gaussian space, ensuring the preservation of input subgraph characteristics while generating subgraphs with a controlled distribution. We then employ optimal transport distances, more precisely the Wasserstein and Gromov-Wasserstein distances, to effectively measure the similarity between subgraphs, enhancing the robustness of the contrastive learning process. Extensive experiments across multiple benchmarks demonstrate that SubGEC outperforms or presents competitive performance against state-of-the-art approaches. Our findings provide insights into the design of SSL methods for GRL, emphasizing the importance of the distribution of the generated contrastive pairs.
图形教学( GRL) 是机器学习的一项基本任务, 目的是将高方图结构数据编码为低维矢量。 自我支持学习( SSL) 方法在 GRL 中被广泛使用, 因为它们可以避免昂贵的人类批注。 在这项工作中, 我们提出一个新的Subgraph Gausian 嵌入对比( SubGEC) 方法。 我们的方法引入了子集子集成模块, 该模块将子集成成到结构化的高斯空间, 确保在生成受控分布的子集时保存输入子集特性。 然后我们使用最佳的运输距离, 更精确地说, 瓦瑟斯坦 和 Gromov- Wasserstein 的距离, 以有效测量子集之间的相似性, 增强对比性学习过程的稳健性。 跨多个基准的大规模实验表明, 血压~ 外形或显示与最新技术方法相比具有竞争力的性能。 我们的发现为 GRSL 设计 SL 方法提供了洞察 设计 方法的洞察力, , 强调生成对比配对分布的重要性 。
Article 89
Title@2025-05-29 (4): Comparative assessment of fairness definitions and bias mitigation strategies in machine learning-based diagnosis of Alzheimer’s disease from MR images
Title: Comparative assessment of fairness definitions and bias mitigation strategies in machine learning-based diagnosis of Alzheimer’s disease from MR images | Vergleichende Bewertung von Fairness-Definitionen und Bias-Minderungsstrategien in der maschinellen Lern-basierten Diagnose der Alzheimer-Krankheit aus MR-Bildern | 对利用MR图像对阿尔茨海默氏病进行机器学习诊断的公平定义和减少偏见战略的比较评估 2505.23528v1 |
Authors: Maria Eleftheria Vlontzou, Maria Athanasiou, Christos Davatzikos, Konstantina S. Nikita
The present study performs a comprehensive fairness analysis of machine learning (ML) models for the diagnosis of Mild Cognitive Impairment (MCI) and Alzheimer’s disease (AD) from MRI-derived neuroimaging features. Biases associated with age, race, and gender in a multi-cohort dataset, as well as the influence of proxy features encoding these sensitive attributes, are investigated. The reliability of various fairness definitions and metrics in the identification of such biases is also assessed. Based on the most appropriate fairness measures, a comparative analysis of widely used pre-processing, in-processing, and post-processing bias mitigation strategies is performed. Moreover, a novel composite measure is introduced to quantify the trade-off between fairness and performance by considering the F1-score and the equalized odds ratio, making it appropriate for medical diagnostic applications. The obtained results reveal the existence of biases related to age and race, while no significant gender bias is observed. The deployed mitigation strategies yield varying improvements in terms of fairness across the different sensitive attributes and studied subproblems. For race and gender, Reject Option Classification improves equalized odds by 46% and 57%, respectively, and achieves harmonic mean scores of 0.75 and 0.80 in the MCI versus AD subproblem, whereas for age, in the same subproblem, adversarial debiasing yields the highest equalized odds improvement of 40% with a harmonic mean score of 0.69. Insights are provided into how variations in AD neuropathology and risk factors, associated with demographic characteristics, influence model fairness.
目前的研究对机器学习(ML)模型进行全面的公平分析,以便根据MRI的神经成像特征,对诊断米氏认知缺陷和阿尔茨海默氏病(AD)的机器学习(MMI)模型进行综合的公平分析。对多科数据集中与年龄、种族和性别有关的双轨关系以及代用特征对这些敏感属性的编码的影响进行了调查。还评估了识别此类偏见的各种公平定义和指标的可靠性。根据最适当的公平措施,对广泛使用的预处理、处理和处理后神经偏差缓解战略的可比性进行了比较分析。此外,还采用了新的综合措施,通过考虑F1分数和对等率比率对公平与业绩之间的权衡进行量化。 所获得的结果显示存在与年龄和种族有关的偏见,但没有观察到严重的性别偏差。 部署的缓解战略在不同敏感属性和子问题中,对广泛使用的处理前处理、处理和处理后神经偏差缓解战略进行了比较分析。在种族和性别方面,拒绝选择的分类使公平性和业绩之间的权衡取价差,在最高等级和最高等级中分别为46%和57%和57%之间,在最高等级之间,最高等级为最高等级为最高等级,最高等级为最高等级为最高等级为最高等级,最高等级为最高等级为最高,最高等级为最高等级为最高等级为最高等级为最高等级为最高等级为最高,最高,最高等级为最高等级为最高等级为最高和最高等级为最高等级为最高等级为最高等级为最高等级为最高等级为最高。
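The composite measure described above combines the F1-score with the equalized odds ratio through a harmonic mean. The sketch below assumes one common definition of the equalized odds ratio (the smaller of the between-group TPR ratio and FPR ratio); the paper's exact definition may differ, and the labels and groups are made-up toy data.

```python
def rates(y_true, y_pred):
    """True-positive rate and false-positive rate for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def min_max_ratio(values):
    """Smallest value divided by largest; 1.0 means identical across groups."""
    hi = max(values)
    return min(values) / hi if hi > 0 else 1.0


def equalized_odds_ratio(y_true, y_pred, groups):
    """Assumed definition: min of the between-group TPR ratio and FPR ratio."""
    per_group = []
    for g in sorted(set(groups)):
        idx = [i for i, v in enumerate(groups) if v == g]
        per_group.append(rates([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    tprs, fprs = zip(*per_group)
    return min(min_max_ratio(tprs), min_max_ratio(fprs))


def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def harmonic_mean(a, b):
    return 2 * a * b / (a + b) if a + b else 0.0


# Toy data: two demographic groups, A and B (hypothetical).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
groups = ["A"] * 4 + ["B"] * 4

eo = equalized_odds_ratio(y_true, y_pred, groups)
f1 = f1_score(y_true, y_pred)
print(eo, f1, harmonic_mean(f1, eo))
```

A harmonic mean penalizes imbalance between its two inputs, so a model only scores well when it is both accurate and fair under the chosen criterion.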
Article 90
Title@2025-05-29 (4): Normalizing Flows are Capable Models for RL
Title: Normalizing Flows are Capable Models for RL | Normalisierende Strömungen sind fähige Modelle für RL | 正常流动是RL的能力模型 2505.23527v1 |
Authors: Raj Ghugare, Benjamin Eysenbach
Modern reinforcement learning (RL) algorithms have found success by using powerful probabilistic models, such as transformers, energy-based models, and diffusion/flow-based models. To this end, RL researchers often choose to pay the price of accommodating these models into their algorithms – diffusion models are expressive, but are computationally intensive due to their reliance on solving differential equations, while autoregressive transformer models are scalable but typically require learning discrete representations. Normalizing flows (NFs), by contrast, seem to provide an appealing alternative, as they enable likelihoods and sampling without solving differential equations or autoregressive architectures. However, their potential in RL has received limited attention, partly due to the prevailing belief that normalizing flows lack sufficient expressivity. We show that this is not the case. Building on recent work in NFs, we propose a single NF architecture which integrates seamlessly into RL algorithms, serving as a policy, Q-function, and occupancy measure. Our approach leads to much simpler algorithms, and achieves higher performance in imitation learning, offline, goal conditioned RL and unsupervised RL.
现代强化学习(RL)算法通过使用强大的概率模型(如变压器、能源基模型以及扩散/流基模型)获得了成功。为此,RL研究人员往往选择支付将这些模型纳入其算法的代价 – – 扩散模型具有表达性,但由于他们依赖解决差异方程式,而自动反向变压器模型是可伸缩的,但通常需要学习离散的表达方式。相比之下,正常流动(NFs)似乎提供了一种有吸引力的替代方法,因为它们使得可能性和取样能够不解决差异方程式或自动反向结构。然而,他们在RL中的潜力受到的注意有限,部分原因是普遍认为正常化流程缺乏足够的表达性。我们表明情况并非如此。根据NFs最近的工作,我们提议了一个单一的NF结构,将无缝合地纳入RL算法,作为政策、Q-功能和占用度尺度。我们的方法导致更简单的算法,并在模仿学习、离线、目标性、目标性、条件和不超光性RL方面实现更高的性功能。
Article 91
Title@2025-05-29 (4): Accelerating AllReduce with a Persistent Straggler
Title: Accelerating AllReduce with a Persistent Straggler | AllReduce mit einem persistenten Straggler beschleunigen | 使用持久性斯特拉格驱动器加速全部拖动 2505.23523v1 |
Authors: Arjun Devraj, Eric Ding, Abhishek Vijaya Kumar, Robert Kleinberg, Rachee Singh
Distributed machine learning workloads use data and tensor parallelism for training and inference, both of which rely on the AllReduce collective to synchronize gradients or activations. However, bulk-synchronous AllReduce algorithms can be delayed by a persistent straggler that is slower to reach the synchronization barrier required to begin the collective. To address this challenge, we propose StragglAR: an AllReduce algorithm that accelerates distributed training and inference in the presence of persistent stragglers. StragglAR implements a ReduceScatter among the remaining GPUs during the straggler-induced delay, and then executes a novel collective algorithm to complete the AllReduce once the straggler reaches the synchronization barrier. StragglAR achieves a 2x theoretical speedup over popular bandwidth-efficient AllReduce algorithms (e.g., Ring) for large GPU clusters with persistent stragglers. On an 8-GPU server, our implementation of StragglAR yields a 22% speedup over state-of-the-art AllReduce algorithms.
分散的机器学习工作量在培训和推论方面使用数据和分解的平行法,两者都依靠 AllReduce 集体组合来同步梯度或激活。 但是, 散装同步的全Reduce 算法可能会被一个持久性的分解器延缓, 而这种分解速度要慢到启动集体所需的同步屏障。 为了应对这一挑战, 我们提议 StragglAR : 一种全Reduce 算法, 加速在持久性排挤者面前的分布式培训和推论。 StragglAR 在 strggler 引发的延缓期间, 在其余的 GPU 中实施一个减少分解器, 然后执行一种新的集体算法, 以在拖动器到达同步屏障后完成全Reduce 。 StragglAR 实现2x理论速度, 超过流行的带宽效率的全Reduce 算法( 如 Ring) 。 在 8- GGPU 服务器上, 我们的 StragglAR 将产生一个超过 22% 的全局-Art- allRuedudes 算法 。
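The abstract relies on the fact that an AllReduce can be decomposed into a ReduceScatter followed by an AllGather. The single-process simulation below illustrates only that decomposition, with one scalar chunk per rank; it is not StragglAR's algorithm, and the function names are hypothetical.

```python
def reduce_scatter(buffers):
    """Each rank r ends up holding the sum of chunk r across all ranks.

    buffers[src][r] is chunk r of the buffer held by rank src.
    """
    n = len(buffers)
    return [sum(buffers[src][r] for src in range(n)) for r in range(n)]


def all_gather(chunks):
    """Every rank receives a copy of every rank's reduced chunk."""
    return [list(chunks) for _ in range(len(chunks))]


def all_reduce(buffers):
    """AllReduce expressed as ReduceScatter followed by AllGather."""
    return all_gather(reduce_scatter(buffers))


# Three ranks, each holding a buffer split into three chunks.
buffers = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(all_reduce(buffers))
```

Because the ReduceScatter phase is a separate step, it is natural to ask whether part of it can run among the ranks that are already ready while a straggler is still delayed, which is the opening the abstract describes.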
Article 92
Title@2025-05-29 (4): Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents
Title: Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents | Von Mäusen und Maschinen: Ein Vergleich des Lernens zwischen Real World Mäusen und RL Agenten | Mice和Mings:真实世界Mice和RL代理商之间的学习比较 2505.12204v2 |
Authors: Shuo Han, German Espinosa, Junda Huang, Daniel A. Dombeck, Malcolm A. MacIver, Bradly C. Stadie
Recent advances in reinforcement learning (RL) have demonstrated impressive capabilities in complex decision-making tasks. This progress raises a natural question: how do these artificial systems compare to biological agents, which have been shaped by millions of years of evolution? To help answer this question, we undertake a comparative study of biological mice and RL agents in a predator-avoidance maze environment. Through this analysis, we identify a striking disparity: RL agents consistently demonstrate a lack of self-preservation instinct, readily risking "death" for marginal efficiency gains. These risk-taking strategies are in contrast to biological agents, which exhibit sophisticated risk-assessment and avoidance behaviors. Towards bridging this gap between the biological and artificial, we propose two novel mechanisms that encourage more naturalistic risk-avoidance behaviors in RL agents. Our approach leads to the emergence of naturalistic behaviors, including strategic environment assessment, cautious path planning, and predator avoidance patterns that closely mirror those observed in biological systems.
在强化学习(RL)方面最近取得的进展表明,在复杂的决策任务方面,能力令人印象深刻。这一进展提出了一个自然的问题:这些人工系统如何与生物剂进行比较,生物剂是成百上千万年演变形成的?为了帮助回答这个问题,我们对捕食者避险的迷宫环境中的生物小鼠和RL剂进行了比较研究。我们通过这一分析发现了一个显著的差别:RL代理物一贯表明缺乏自我保护本能,很容易冒着“死亡”的风险来提高效率。这些冒险战略与生物剂不同,生物剂表现出复杂的风险评估和避免行为。为缩小生物剂与人工剂之间的这一差距,我们提出了两个新机制,鼓励生物剂中更自然的避免风险行为。我们的方法导致自然行为的出现,包括战略环境评估、谨慎的路径规划以及密切反映生物系统所观察到的捕食者避免模式。
Article 93
Title@2025-05-29 (4): An AI System for Continuous Knee Osteoarthritis Severity Grading Using Self-Supervised Anomaly Detection with Limited Data
Title: An AI System for Continuous Knee Osteoarthritis Severity Grading Using Self-Supervised Anomaly Detection with Limited Data | Ein KI-System für kontinuierliche Knie-Osteoarthritis Schweregraduierung mittels selbstüberwachter Anomalieerkennung mit begrenzten Daten | AI 使用有限数据的自超异常检测系统 2407.11500v2 |
Authors: Niamh Belton, Aonghus Lawlor, Kathleen M. Curran
The diagnostic accuracy and subjectivity of existing Knee Osteoarthritis (OA) ordinal grading systems have been a subject of ongoing debate and concern. Existing automated solutions are trained to emulate these imperfect systems, whilst also being reliant on large annotated databases for fully-supervised training. This work proposes a three-stage approach for automated continuous grading of knee OA that is built upon the principles of Anomaly Detection (AD): learning a robust representation of healthy knee X-rays and grading disease severity based on its distance to the centre of normality. In the first stage, SS-FewSOME is proposed, a self-supervised AD technique that learns the ‘normal’ representation, requiring only examples of healthy subjects and <3% of the labels that existing methods require. In the second stage, this model is used to pseudo label a subset of unlabelled data as ‘normal’ or ‘anomalous’, followed by denoising of pseudo labels with CLIP. The final stage involves retraining on labelled and pseudo labelled data using the proposed Dual Centre Representation Learning (DCRL), which learns the centres of two representation spaces: normal and anomalous. Disease severity is then graded based on the distance to the learned centres. The proposed methodology outperforms existing techniques by margins of up to 24% in terms of OA detection, and the disease severity scores correlate with the Kellgren-Lawrence grading system at the same level as human expert performance. Code available at https://github.com/niamhbelton/SS-FewSOME_Disease_Severity_Knee_Osteoarthritis.
现有Knee Osteoarthrates(OA) 或dinal等级系统(OA) 的诊断准确性和主观性一直是持续辩论和关注的主题。 现有的自动化解决方案经过培训,以效仿这些不完善的系统,同时依靠大型附加说明的数据库进行充分监督的培训。 这项工作提出了基于异常检测(AD) 原则的膝盖 OA 自动连续定级的三阶段方法; 学习健康的膝盖X光和定级疾病严重程度的强有力表现。 在第一阶段, 提出了SS- FewSOME, 一种自我监督的AD技术, 以学习“ 正常” 代表, 仅需要健康科目的实例和 < 3% 现有方法所需的标签。 在第二阶段, 该模型用于将一组未贴标签的数据贴上“ 正常” 或“ 恶性 ” 标签, 并随后与 CLIP 的假标签进行分级。 最后阶段涉及使用拟议的 DIML 内部研究中心(DCRL) 级 学习“正常” 标准/ Ralderalalalalalalalbalalalalal 系统进行再培训。 该模型用于当前正常的常规中心, 和历史中心。 的常规检测中心, 以现有的标准 和现有标准中心为常规 。
Article 94
Title@2025-05-29 (4): SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
Title: SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning | SimBa: Einfachheit Bias für das Skalieren von Parametern im Deep Reinforcement Learning | SimBA: 深强化学习中增强参数的简单比值 2410.09754v2 |
Authors: Hojoon Lee, Dongyoon Hwang, Donghu Kim, Hyunseung Kim, Jun Jet Tai, Kaushik Subramanian, Peter R. Wurman, Jaegul Choo, Peter Stone, Takuma Seno
Recent advances in CV and NLP have been largely driven by scaling up the number of network parameters, despite traditional theories suggesting that larger networks are prone to overfitting. These large networks avoid overfitting by integrating components that induce a simplicity bias, guiding models toward simple and generalizable solutions. However, in deep RL, designing and scaling up networks have been less explored. Motivated by this opportunity, we present SimBa, an architecture designed to scale up parameters in deep RL by injecting a simplicity bias. SimBa consists of three components: (i) an observation normalization layer that standardizes inputs with running statistics, (ii) a residual feedforward block to provide a linear pathway from the input to output, and (iii) a layer normalization to control feature magnitudes. By scaling up parameters with SimBa, the sample efficiency of various deep RL algorithms (including off-policy, on-policy, and unsupervised methods) is consistently improved. Moreover, solely by integrating SimBa architecture into SAC, it matches or surpasses state-of-the-art deep RL methods with high computational efficiency across DMC, MyoSuite, and HumanoidBench. These results demonstrate SimBa’s broad applicability and effectiveness across diverse RL algorithms and environments.
在CV和NLP中,尽管传统理论表明,较大的网络容易过于过度,但网络数量大,尽管有传统理论表明,更大的网络容易过度,但最近的进展在很大程度上是扩大网络参数的驱动力,尽管有传统理论表明,更大的网络容易过于适应。这些大型网络避免了过度的融合,将一些元素引致简单偏差,引导模式走向简单和普遍的解决办法;然而,在深入的RL中,设计和扩大网络规模的探索探索较少,但利用这个机会,我们提出SimBa这一旨在扩大深度RL的参数的架构,通过输入简单偏差,扩大网络和NLPRP的最近进展,我们提出SimBa。SimBa由三个组成部分组成:(一)观测正常化层,将投入标准化,与运行中统计标准化,投入标准化,(二)剩余的向前向前进块,从输入到产出的线性路径,(三)层,以控制特性大小的线路径,使控制特性大小的大小的尺寸正常化正常化。通过与SimL的参数、MMC、 MySUSite、宽的和人类的系统、宽的系统环境展示这些结果的结果显示结果。
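The three components listed in the abstract can be sketched in NumPy as follows. This is a sketch under assumptions: the pre-layer-norm placement inside the residual block, the ReLU activation, and all names and shapes are illustrative choices, not the authors' released implementation.

```python
import numpy as np


class RunningNorm:
    """Standardize observations with running mean and variance (component i)."""

    def __init__(self, dim, eps=1e-8):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.count = 0
        self.eps = eps

    def update(self, x):
        """Merge the statistics of a batch x of shape (batch, dim) into the running ones."""
        n = x.shape[0]
        batch_mean, batch_var = x.mean(axis=0), x.var(axis=0)
        total = self.count + n
        delta = batch_mean - self.mean
        m2 = self.var * self.count + batch_var * n + delta**2 * self.count * n / total
        self.mean = self.mean + delta * n / total
        self.var = m2 / total
        self.count = total

    def __call__(self, x):
        return (x - self.mean) / np.sqrt(self.var + self.eps)


def layer_norm(x, eps=1e-5):
    """Normalize each feature vector to zero mean and unit variance (component iii)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def residual_block(x, W1, b1, W2, b2):
    """x + MLP(LayerNorm(x)): the skip connection keeps a linear input-to-output path (component ii)."""
    h = np.maximum(layer_norm(x) @ W1 + b1, 0.0)
    return x + h @ W2 + b2


rng = np.random.default_rng(0)
obs = rng.normal(loc=3.0, scale=2.0, size=(100, 4))
norm = RunningNorm(4)
norm.update(obs[:40])
norm.update(obs[40:])
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 4)) * 0.1
features = layer_norm(residual_block(norm(obs), W1, np.zeros(8), W2, np.zeros(4)))
print(features.shape)
```

With the second weight matrix set to zero the block reduces to the identity, which is one way to see the linear pathway the abstract mentions.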
Article 95
Title@2025-05-29 (4): OmniEarth-Bench: Towards Holistic Evaluation of Earth’s Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data
Title: OmniEarth-Bench: Towards Holistic Evaluation of Earth’s Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data | OmniEarth-Bench: Auf dem Weg zu einer ganzheitlichen Bewertung der sechs Sphären und der Wechselwirkungen zwischen der Erde und multimodalen Erddaten | Omni地球环境:争取全面评价地球六层和与多模式对地观测地球数据交互作用 2505.23522v1 |
Authors: Fengxiang Wang, Mingshuo Chen, Xuming He, YiFan Zhang, Feng Liu, Zijie Guo, Zhenghao Hu, Jiong Wang, Jingyi Xu, Zhangrui Li, Fenghua Ling, Ben Fei, Weijia Li, Long Lan, Wenjing Yang, Wenlong Zhang, Lei Bai
Existing benchmarks for Earth science multimodal learning exhibit critical limitations in systematic coverage of geosystem components and cross-sphere interactions, often constrained to isolated subsystems (only in Human-activities sphere or atmosphere) with limited evaluation dimensions (less than 16 tasks). To address these gaps, we introduce OmniEarth-Bench, the first comprehensive multimodal benchmark spanning all six Earth science spheres (atmosphere, lithosphere, Oceansphere, cryosphere, biosphere and Human-activities sphere) and cross-spheres with one hundred expert-curated evaluation dimensions. Leveraging observational data from satellite sensors and in-situ measurements, OmniEarth-Bench integrates 29,779 annotations across four tiers: perception, general reasoning, scientific knowledge reasoning and chain-of-thought (CoT) reasoning. This involves the efforts of 2-5 experts per sphere to establish authoritative evaluation dimensions and curate relevant observational datasets, 40 crowd-sourcing annotators to assist experts for annotations, and finally, OmniEarth-Bench is validated via hybrid expert-crowd workflows to reduce label ambiguity. Experiments on 9 state-of-the-art MLLMs reveal that even the most advanced models struggle with our benchmarks, where none of them reach 35% accuracy. Especially, in some cross-spheres tasks, the performance of leading models like GPT-4o drops to 0.0%. OmniEarth-Bench sets a new standard for geosystem-aware AI, advancing both scientific discovery and practical applications in environmental monitoring and disaster prediction. The dataset, source code, and trained models were released.
地球科学多式联运学习的现有基准显示,在系统覆盖地球系统组成部分和跨孔互动方面,存在严重的局限性,往往局限于评估层面有限的孤立子系统(仅在人类活动领域或大气中),评估层面有限(不到16项任务)。为了填补这些差距,我们引入了涵盖所有六个地球科学领域(大气、地圈、海洋、冰层、生物圈和人类活动领域)和交叉球体的首个综合多式联运基准OmniEarth-Bench,这是涵盖所有六个地球科学领域(大气、地圈、海洋、冰层、生物圈和人类活动领域)的第一个综合多式联运基准,具有100个专家精准的评价层面。 利用卫星传感器和地表测量的观测数据,OmniEarth-Bench将29 779个说明整合到四个层面:感知、一般推理、科学知识推理和思维链推理(Cobni-Ben-Ben-Ben-Chen),这涉及每个领域2-5专家努力建立权威评价层面的多面评价维度和精确度评估范围,这些方面经过培训的IL-Ben-Ben-Ben-Ben-Ben-CS-S-S-S-S-S-S-S-LS-SLS-S-S-S-S-S-S-S-I-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-SL-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-SL-S-S-S-SL-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-
Article 96
Title@2025-05-29 (4): AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity
Title: AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity | AnkerAchtung: Differenz-Bewusst Sparse Achtung mit Streifen Granularität | 锁定目标: 带条形颗粒的差别- 软件分散注意 2505.23520v1 |
Authors: Yu Zhang, Dong Guo, Fang Wu, Guoliang Zhu, Dian Ding, Yiming Zhang
Large Language Models (LLMs) with extended context lengths face significant computational challenges during the pre-filling phase, primarily due to the quadratic complexity of self-attention. Existing methods typically employ dynamic pattern matching and block-sparse low-level implementations. However, their reliance on local information for pattern identification fails to capture global contexts, and the coarse granularity of blocks leads to persistent internal sparsity, resulting in suboptimal accuracy and efficiency. To address these limitations, we propose AnchorAttention, a difference-aware, dynamic sparse attention mechanism that efficiently identifies critical attention regions at a finer stripe granularity while adapting to global contextual information, achieving superior speed and accuracy. AnchorAttention comprises three key components: (1) Pattern-based Anchor Computation, leveraging the commonalities present across all inputs to rapidly compute a set of near-maximum scores as the anchor; (2) Difference-aware Stripe Sparsity Identification, performing difference-aware comparisons with the anchor to quickly obtain discrete coordinates of significant regions in a stripe-like sparsity pattern; (3) Fine-grained Sparse Computation, replacing the traditional contiguous KV block loading approach with simultaneous discrete KV position loading to maximize sparsity rates while preserving full hardware computational potential. With its finer-grained sparsity strategy, AnchorAttention achieves higher sparsity rates at the same recall level, significantly reducing computation time. Compared to previous state-of-the-art methods, at a text length of 128k, it achieves a speedup of 1.44x while maintaining higher recall rates.
具有长上下文的大型语言模型(LLM)在预填充阶段面临巨大的计算挑战,这主要源于自注意力的二次复杂度。现有方法通常采用动态模式匹配和块稀疏的底层实现。然而,它们依赖局部信息进行模式识别,无法捕捉全局上下文,而块的粗粒度又导致块内持续存在稀疏性,使准确率和效率都不够理想。为解决这些局限,我们提出 AnchorAttention,一种差异感知的动态稀疏注意力机制,能够在更细的条带粒度上高效识别关键注意力区域,同时适应全局上下文信息,从而获得更高的速度和准确率。AnchorAttention 包含三个关键组件:(1)基于模式的锚点计算,利用所有输入中存在的共性,快速计算一组接近最大值的分数作为锚点;(2)差异感知的条带稀疏识别,通过与锚点进行差异感知比较,快速获得条带状稀疏模式中重要区域的离散坐标;(3)细粒度稀疏计算,以同时加载离散 KV 位置取代传统的连续 KV 块加载方式,在保持硬件全部计算潜力的同时最大化稀疏率。凭借更细粒度的稀疏策略,AnchorAttention 在相同召回率下实现了更高的稀疏率,显著减少了计算时间。与此前最先进的方法相比,在 128k 文本长度下,它在保持更高召回率的同时实现了 1.44 倍的加速。
Article 97
Title@2025-05-29 (4): Hyperspherical Normalization for Scalable Deep Reinforcement Learning
Title: Hyperspherical Normalization for Scalable Deep Reinforcement Learning | Hypersphärische Normalisierung für skalierbares Deep Reinforcement Learning | 可缩放深强化学习超球常规化 2502.15280v2 |
Authors: Hojoon Lee, Youngdo Lee, Takuma Seno, Donghu Kim, Peter Stone, Jaegul Choo
Scaling up the model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to apply to reinforcement learning (RL) because training the model on non-stationary data easily leads to overfitting and unstable optimization. In response, we introduce SimbaV2, a novel RL architecture designed to stabilize optimization by (i) constraining the growth of weight and feature norm by hyperspherical normalization; and (ii) using a distributional value estimation with reward scaling to maintain stable gradients under varying reward magnitudes. Using the soft actor-critic as a base algorithm, SimbaV2 scales up effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks across 4 domains. The code is available at https://dojeon-ai.github.io/SimbaV2.
扩大模型规模和计算使受监督的学习得到一致的绩效改进。然而,这一教训往往不适用于强化学习(RL),因为培训非静止数据模型很容易导致超常和不稳定的优化。作为回应,我们引入了SimbaV2,这是一个新的RL结构,旨在通过超球正常化来稳定优化,(一) 限制重量和特征规范的增长;以及(二) 使用分配价值估计,并给予一定的奖励,以维持不同奖励程度下的稳定梯度。利用软性行为者-批评作为基本算法,SimbaV2 有效地与更大的模型进行升级,并进行更大的计算,在4个领域的57项连续控制任务上实现最先进的业绩。该代码可在https://dojeon-ai.github.io/SimbaV2上查阅。
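As a rough illustration of the idea of constraining weight norms by hyperspherical normalization, the sketch below takes a gradient step and then projects the weights back onto the unit sphere. It is a minimal sketch under stated assumptions, not the SimbaV2 implementation: the function names, the fixed gradient, and the learning rate are all hypothetical.

```python
import math

def l2_normalize(vec, eps=1e-8):
    """Project a vector onto the unit hypersphere (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]

def sgd_step_on_sphere(weights, grads, lr=0.1):
    """Hypothetical update: plain gradient step, then re-projection to unit norm."""
    stepped = [w - lr * g for w, g in zip(weights, grads)]
    return l2_normalize(stepped)

# Start from a normalized weight vector and apply many updates.
weights = l2_normalize([3.0, 4.0])
for _ in range(100):
    weights = sgd_step_on_sphere(weights, grads=[1.0, -2.0])

# The norm stays at 1 no matter how many steps are taken.
weight_norm = math.sqrt(sum(w * w for w in weights))
```

Because every update ends with a projection, the weight norm cannot grow over training, which is the property the abstract attributes to hyperspherical normalization.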
Article 98
Title@2025-05-29 (4): SGD Jittering: A Training Strategy for Robust and Accurate Model-Based Architectures
Title: SGD Jittering: A Training Strategy for Robust and Accurate Model-Based Architectures | SGD Jittering: Eine Schulungsstrategie für robuste und präzise modellbasierte Architekturen | SGD Jittering: 稳健且准确的基于模型架构的训练策略 2410.14667v2 |
Authors: Peimeng Guan, Mark A. Davenport
Inverse problems aim to reconstruct unseen data from corrupted or perturbed measurements. While most work focuses on improving reconstruction quality, generalization accuracy and robustness are equally important, especially for safety-critical applications. Model-based architectures (MBAs), such as loop unrolling methods, are considered more interpretable and achieve better reconstructions. Empirical evidence suggests that MBAs are more robust to perturbations than black-box solvers, but the accuracy-robustness tradeoff in MBAs remains underexplored. In this work, we propose a simple yet effective training scheme for MBAs, called SGD jittering, which injects noise iteration-wise during reconstruction. We theoretically demonstrate that SGD jittering not only generalizes better than the standard mean squared error training but is also more robust to average-case attacks. We validate SGD jittering using denoising toy examples, seismic deconvolution, and single-coil MRI reconstruction. Both SGD jittering and its SPGD extension yield cleaner reconstructions for out-of-distribution data and demonstrate enhanced robustness against adversarial attacks.
反面问题旨在从腐败或扰动测量中重建隐蔽数据。虽然大多数工作的重点是提高重建质量,但普遍化的准确性和稳健性同样重要,特别是对于安全关键应用而言。基于模型的建筑(MBAs),例如环状无滚动方法,被认为更容易解释,并实现更好的重建。经验性证据表明,MBAs比黑箱解决问题者更能进行扰动,但MBAs的准确性-紫色交易仍未得到充分探讨。在这项工作中,我们提出了一个简单而有效的MBAs培训计划,称为SGD振动,在重建过程中注入噪音。我们理论上证明SGD的振动不仅比标准的平均平方错误培训更好,而且比普通攻击更强。我们用不注意的玩具、地震变电和单原油MRI等例子来验证SGD振动。SGD的震动及其SPGD扩展为MD提供了更清洁的重建,用于向外分配数据,并显示对对抗性攻击的加强力度。
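The phrase "injects noise iteration-wise during reconstruction" can be pictured with a toy linear inverse problem. The sketch below runs plain gradient descent on 0.5*||Ax - y||^2 and adds Gaussian noise to every iterate. It is an assumption-laden illustration only: the paper applies jittering while training a learned model-based architecture, whereas this example has no learned component, and the matrix, step size, and noise level are hypothetical.

```python
import random

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def jittered_reconstruction(A, y, steps=200, lr=0.1, sigma=0.0, seed=0):
    """Gradient descent on 0.5*||Ax - y||^2 with Gaussian noise added at each iteration."""
    rng = random.Random(seed)
    At = transpose(A)
    x = [0.0] * len(A[0])
    for _ in range(steps):
        residual = [p - q for p, q in zip(matvec(A, x), y)]
        grad = matvec(At, residual)
        # The jitter term sigma * N(0, 1) is injected into every iterate.
        x = [xi - lr * gi + sigma * rng.gauss(0.0, 1.0) for xi, gi in zip(x, grad)]
    return x

A = [[2.0, 0.0], [0.0, 1.0]]
x_true = [1.0, -1.0]
y = matvec(A, x_true)

x_clean = jittered_reconstruction(A, y, sigma=0.0)    # noise-free iterations
x_jitter = jittered_reconstruction(A, y, sigma=0.01)  # jittered iterations
```

With sigma set to zero the loop reduces to ordinary gradient descent; with a small sigma the iterates hover around the same solution, which is the perturbation a jitter-trained reconstruction network would have to tolerate.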
Article 99
Title@2025-05-29 (4): Joint Localization and Activation Editing for Low-Resource Fine-Tuning
Title: Joint Localization and Activation Editing for Low-Resource Fine-Tuning | Gemeinsame Lokalisierungs- und Aktivierungsbearbeitung für Low-Resource Fine-Tuning | 低资源微调联合定位和启动编辑 2502.01179v4 |
Authors: Wen Lai, Alexander Fraser, Ivan Titov
Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, are commonly used to adapt LLMs. However, the effectiveness of standard PEFT methods is limited in low-resource scenarios with only a few hundred examples. Recent advances in interpretability research have inspired the emergence of activation editing (or steering) techniques, which modify the activations of specific model components. Due to their extremely small parameter counts, these methods show promise for small datasets. However, their performance is highly dependent on identifying the correct modules to edit and often lacks stability across different datasets. In this paper, we propose Joint Localization and Activation Editing (JoLA), a method that jointly learns (1) which heads in the Transformer to edit, (2) whether the intervention should be additive, multiplicative, or both, and (3) the intervention parameters themselves - the vectors applied as additive offsets or multiplicative scalings to the head output. Through evaluations on three benchmarks spanning commonsense reasoning, natural language understanding, and natural language generation, we demonstrate that JoLA consistently outperforms existing methods. The code for the method is released at https://github.com/wenlai-lavine/jola.
在低资源情景中,标准的PEFT方法的效力有限,仅举几百个例子; 最近在可解释性研究方面的进展促使启动编辑(或指导)技术的出现,这些技术改变特定模型组件的启动。由于这些技术的参数数极小,这些方法显示了对小型数据集的希望。然而,这些方法的性能在很大程度上取决于如何确定编辑的正确模块,而且往往缺乏不同数据集之间的稳定性。在本文件中,我们提议联合定位和激活编辑(JoLA),这种方法共同学习(1) 变换器中头头要编辑(2) 干预是否应当添加、倍增、或同时和(3) 干预参数本身——矢量作为添加的抵消或倍增缩到主输出。我们通过对三个基准的评价,跨越了共同思维推理、自然语言理解和自然语言生成,证明JoLA 一贯地超越了现有方法。该方法的代码在https://github.lain/lain-laime.
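The three learned quantities in the abstract (which heads to edit, additive versus multiplicative intervention, and the intervention vectors) can be pictured with a gated edit of a single head's output. The form below is one plausible reading, not the JoLA code: the gate values, the vectors, and the formula (1 + g_m * m) * h + g_a * a are assumptions made for illustration.

```python
def edit_head_output(head_out, add_vec, mul_vec, add_gate, mul_gate):
    """Apply a gated multiplicative scaling and a gated additive offset to one head's output.

    Assumed form: h' = (1 + mul_gate * m) * h + add_gate * a, elementwise.
    A gate of 0 switches that intervention off; both gates at 0 leave the head untouched.
    """
    return [
        (1.0 + mul_gate * m) * h + add_gate * a
        for h, a, m in zip(head_out, add_vec, mul_vec)
    ]

head_out = [1.0, -2.0, 0.5]
add_vec = [0.1, 0.1, 0.1]   # hypothetical additive offset
mul_vec = [0.5, 0.0, -1.0]  # hypothetical multiplicative scaling

unchanged = edit_head_output(head_out, add_vec, mul_vec, add_gate=0.0, mul_gate=0.0)
additive_only = edit_head_output(head_out, add_vec, mul_vec, add_gate=1.0, mul_gate=0.0)
both = edit_head_output(head_out, add_vec, mul_vec, add_gate=1.0, mul_gate=1.0)
```

In this picture, localization amounts to learning which heads keep non-zero gates, so head selection and intervention type are decided by the same parameters.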
Article 100
Title@2025-05-29 (4): DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration
Title: DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration | DeepFilterGAN: Ein Full-Band-Real-Time-Speech Enhancement-System mit GAN-basierter stochastischer Regeneration | DeepFilterGAN:全频实时语音增强系统,以GAN为基础进行蒸汽再生 2505.23515v1 |
Authors: Sanberk Serbest, Tijana Stojkovic, Milos Cernak, Andrew Harper
In this work, we propose a full-band real-time speech enhancement system with GAN-based stochastic regeneration. Predictive models focus on estimating the mean of the target distribution, whereas generative models aim to learn the full distribution. This behavior of predictive models may lead to over-suppression, i.e., the removal of speech content. In the literature, it was shown that combining a predictive model with a generative one within the stochastic regeneration framework can reduce the distortion in the output. We use this framework to obtain a real-time speech enhancement system. With 3.58M parameters and low latency, our system is designed for real-time streaming with a lightweight architecture. Experiments show that our system improves over the first stage in terms of the NISQA-MOS metric. Finally, through an ablation study, we show the importance of noisy conditioning in our system. We participated in the 2025 Urgent Challenge with our model and later made further improvements.
在这项工作中,我们建议采用基于GAN的随机再生功能全波实时语音增强系统。 预测模型侧重于估算目标分布的平均值, 而基因模型则旨在学习完全分布。 这种预测模型的行为可能导致过度压抑, 即删除语音内容。 在文献中, 文献显示, 将预测模型与随机再生框架内的基因模型相结合, 可以减少产出的扭曲。 我们使用这个框架来获取实时语音增强系统。 由于有3.58M参数和低耐久性, 我们的系统是设计用于使用轻量结构实时流的。 实验显示,我们的系统在第一阶段里在新QA- MOS 衡量标准方面有所改进。 最后, 我们通过模拟研究, 展示了我们系统中噪音调节的重要性。 我们参加了2025年的紧急挑战, 并随后做了进一步的改进。
Article 101
Title@2025-05-29 (4): Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds
Title: Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds | Spektrotemporale Modulation: Effiziente und interpretierbare Feature-Darstellung für die Klassifizierung von Sprach-, Musik- und Umweltgeräuschen | 时速变化:演讲、音乐和环境声音的分类化演讲、音乐和环境声音的高效和可解释的地物代表 2505.23509v1 |
Authors: Andrew Chang, Yike Li, Iran R. Roman, David Poeppel
Audio DNNs have demonstrated impressive performance on various machine listening tasks; however, most of their representations are computationally costly and uninterpretable, leaving room for optimization. Here, we propose a novel approach centered on spectrotemporal modulation (STM) features, a signal processing method that mimics the neurophysiological representation in the human auditory cortex. The classification performance of our STM-based model, without any pretraining, is comparable to that of pretrained audio DNNs across diverse naturalistic speech, music, and environmental sounds, which are essential categories for both human cognition and machine perception. These results show that STM is an efficient and interpretable feature representation for audio classification, advancing the development of machine listening and unlocking exciting new possibilities for basic understanding of speech and auditory sciences, as well as developing audio BCI and cognitive computing.
音频DNN在各种机器监听任务上表现出了令人印象深刻的表现;然而,它们的大部分陈述都是计算成本高且无法解释的,留下优化的空间。 在这里,我们建议了一种以光谱时温特征为核心的新颖方法,一种模仿人类听觉皮层中神经生理特征的信号处理方法。 我们基于STM的模型的分类性能,没有经过任何培训,与经过预先训练的来自各种自然学语言、音乐和环境声音的音频DNN的分类性能相似,而后者是人类认知和机器认知的基本类别。 这些结果表明,STM是一种高效且可解释的音频分类特征代表,推动了机器监听的发展,为基本理解语音和听觉科学以及发展音频 BCI 和认知计算提供了令人兴奋的新的可能性。
Article 102
Title@2025-05-29 (4): Why Machine Learning Models Fail to Fully Capture Epistemic Uncertainty
Title: Why Machine Learning Models Fail to Fully Capture Epistemic Uncertainty | Warum Modelle des maschinellen Lernens die epistemische Unsicherheit nicht vollständig erfassen | 机器学习模型为何不能完全捕捉宇宙的不确定性 2505.23506v1 |
Authors: Sebastián Jiménez, Mira Jürgens, Willem Waegeman
In recent years various supervised learning methods that disentangle aleatoric and epistemic uncertainty based on second-order distributions have been proposed. We argue that these methods fail to capture critical components of epistemic uncertainty, particularly due to the often-neglected component of model bias. To show this, we make use of a more fine-grained taxonomy of epistemic uncertainty sources in machine learning models, and analyse how the classical bias-variance decomposition of the expected prediction error can be decomposed into different parts reflecting these uncertainties. By using a simulation-based evaluation protocol which encompasses epistemic uncertainty due to both procedural- and data-driven uncertainty components, we illustrate that current methods rarely capture the full spectrum of epistemic uncertainty. Through theoretical insights and synthetic experiments, we show that high model bias can lead to misleadingly low estimates of epistemic uncertainty, and common second-order uncertainty quantification methods systematically blur bias-induced errors into aleatoric estimates, thereby underrepresenting epistemic uncertainty. Our findings underscore that meaningful aleatoric estimates are feasible only if all relevant sources of epistemic uncertainty are properly represented.
近些年来,提出了基于二阶分布的分解偏向性和共感性不确定性的各种受监督的学习方法。我们争辩说,这些方法未能捕捉成瘾性不确定性的关键组成部分,特别是由于模型偏向往往被忽略的成分。为了表明这一点,我们使用了一种在机器学习模型中更精细的成瘾性不确定性来源分类法,并分析了如何将预期的预测错误的典型偏差差异分解分解成反映这些不确定性的不同部分。我们通过使用模拟评价协议,包括基于程序和数据不确定性组成部分的共感性不确定性,我们说明目前的方法很少能捕捉成瘾性不确定性的全部范围。我们通过理论见解和合成实验,我们表明,高模型偏差可能导致对成瘾性不确定性的错误估计偏差,以及常见的二阶级不确定性量化方法,系统模糊偏差导致的误差,从而低估了这些不确定性。我们的调查结果强调,只有在所有有关的成瘾来源都得到适当体现的情况下,才可行。
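For reference, the classical bias-variance decomposition mentioned in the abstract can be written out for squared loss. The notation is an assumption of this note rather than the paper's: $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ is the predictor learned from a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{squared bias}}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Read against the abstract, the noise term is the aleatoric part, while both the squared bias and the variance are reducible. A method that tracks only the variance term would therefore report low epistemic uncertainty for a model that is confidently biased.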
Article 103
Title@2025-05-29 (4): Hijacking Large Language Models via Adversarial In-Context Learning
Title: Hijacking Large Language Models via Adversarial In-Context Learning | Entführen von großen Sprachmodellen über das adversarische In-Context-Lernen | 通过对抗性内书学习劫持大语言模式 2311.09948v3 |
Authors: Xiangyu Zhou, Yao Qiang, Saleh Zare Zade, Prashant Khanduri, Dongxiao Zhu
In-context learning (ICL) has emerged as a powerful paradigm leveraging LLMs for specific downstream tasks by utilizing labeled examples as demonstrations (demos) in the preconditioned prompts. Despite its promising performance, crafted adversarial attacks pose a notable threat to the robustness of LLMs. Existing attacks are either easy to detect, require a trigger in user input, or lack specificity towards ICL. To address these issues, this work introduces a novel transferable prompt injection attack against ICL, aiming to hijack LLMs to generate the target output or elicit harmful responses. In our threat model, the hacker acts as a model publisher who leverages a gradient-based prompt search method to learn and append imperceptible adversarial suffixes to the in-context demos via prompt injection. We also propose effective defense strategies using a few shots of clean demos, enhancing the robustness of LLMs during ICL. Extensive experimental results across various classification and jailbreak tasks demonstrate the effectiveness of the proposed attack and defense strategies. This work highlights the significant security vulnerabilities of LLMs during ICL and underscores the need for further in-depth studies.
理论内学(ICL)已成为一种强有力的范例,通过在先决条件的提示下,利用标记的例子作为示范(演示),利用LLM执行具体的下游任务,使LLM发挥杠杆作用。尽管其表现令人充满希望,但精心策划的对抗性攻击对LLM的强健性构成了显著的威胁。现有的攻击要么容易发现,需要用户投入,或者对ICL缺乏具体性。为解决这些问题,这项工作引入了针对ICL的新型可转移的迅速注射攻击,目的是劫持LLMS,以产生目标产出或引起有害反应。在我们的威胁模式中,黑客充当了利用基于梯度的快速搜索方法学习和通过迅速注射将无法察觉的对抗性后遗症附在文本内演示中的一种示范出版商。我们还提出使用几支干净的演示镜头的有效防御战略,加强LLMs在ICL期间的强健性。各种分类和破监狱任务的广泛实验结果表明拟议的攻击和防御战略的有效性。这项工作突出了LMS在ICL期间的重大安全脆弱性,并强调需要进一步深入研究。
Article 104
Title@2025-05-29 (4): Epistemic Errors of Imperfect Multitask Learners When Distributions Shift
Title: Epistemic Errors of Imperfect Multitask Learners When Distributions Shift | Epistemische Fehler von unvollkommenen Multitask Learner bei Verteilungsverschiebungen | 发行转移时不完美的多任务学习者 2505.23496v1 |
Authors: Sabina J. Sloman, Michele Caprio, Samuel Kaski
When data are noisy, a statistical learner’s goal is to resolve epistemic uncertainty about the data it will encounter at test time, i.e., to identify the distribution of test (target) data. Many real-world learning settings introduce sources of epistemic uncertainty that cannot be resolved on the basis of training (source) data alone: The source data may arise from multiple tasks (multitask learning), the target data may differ systematically from the source data tasks (distribution shift), and/or the learner may not arrive at an accurate characterization of the source data (imperfect learning). We introduce a principled definition of epistemic error, and provide a generic, decompositional epistemic error bound. Our error bound is the first to (i) consider epistemic error specifically, (ii) accommodate all the sources of epistemic uncertainty above, and (iii) separately attribute the error to each of multiple aspects of the learning procedure and environment. As corollaries of the generic result, we provide (i) epistemic error bounds specialized to the settings of Bayesian transfer learning and distribution shift within $\epsilon$-neighborhoods, and (ii) a set of corresponding generalization bounds. Finally, we provide a novel definition of negative transfer, and validate its insights in a synthetic experimental setting.
当数据过于吵闹时,统计学习者的目标是解决测试时将遇到的数据的隐含不确定性,即确定测试(目标)数据的分布。许多真实世界学习设置引入仅靠培训(源)数据无法解决的隐含不确定性来源:源数据可能来自多重任务(多任务学习),目标数据可能与源数据任务(分布转移)有系统差异,和/或学习者可能无法准确描述源数据(不完善学习)。我们引入了缩略错误的原则定义,并提供了一种通用的、异相的缩略出错误。我们的错误是(一) 具体考虑缩略出错误,(二) 容纳上述所有隐含不确定性来源,以及(三) 将错误与学习程序和环境的多个方面分别归结。作为一般结果的缩略图,我们提供了(一) 缩略图错误与Bayesian的负值设置、 合成转移和最终的缩略图的缩略图(一) 提供我们Gayesian的缩略图, 的缩略图的缩图的缩略图。
Article 105
Title@2025-05-29 (4): Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking
Title: Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking | Diagnose und Bewältigung von Pitfalls in KG-RAG-Datensätzen: Zu zuverlässigerem Benchmarking | 分析和处理KG-RAG数据集的缺陷:争取更可靠的基准 2505.23495v1 |
Authors: Liangliang Zhang, Zhuorui Jiang, Hongliang Chi, Haoyang Chen, Mohammed Elkoumy, Fali Wang, Qiong Wu, Zhengyi Zhou, Shirui Pan, Suhang Wang, Yao Ma
Knowledge Graph Question Answering (KGQA) systems rely on high-quality benchmarks to evaluate complex multi-hop reasoning. However, despite their widespread use, popular datasets such as WebQSP and CWQ suffer from critical quality issues, including inaccurate or incomplete ground-truth annotations, poorly constructed questions that are ambiguous, trivial, or unanswerable, and outdated or inconsistent knowledge. Through a manual audit of 16 popular KGQA datasets, including WebQSP and CWQ, we find that the average factual correctness rate is only 57%. To address these issues, we introduce KGQAGen, an LLM-in-the-loop framework that systematically resolves these pitfalls. KGQAGen combines structured knowledge grounding, LLM-guided generation, and symbolic verification to produce challenging and verifiable QA instances. Using KGQAGen, we construct KGQAGen-10k, a ten-thousand-scale benchmark grounded in Wikidata, and evaluate a diverse set of KG-RAG models. Experimental results demonstrate that even state-of-the-art systems struggle on this benchmark, highlighting its ability to expose limitations of existing models. Our findings advocate for more rigorous benchmark construction and position KGQAGen as a scalable framework for advancing KGQA evaluation.
知识图表解答系统(KGQA)依靠高质量的基准来评价复杂的多点推理,然而,尽管这些系统得到广泛使用,但广受欢迎的数据集,如WebQSP和CWQ等,却存在关键性的质量问题,包括不准确或不完整的地面真相说明、结构不当的问题模糊、微不足道或无法回答、过时或不一致。通过对16个广受欢迎的KGQA数据集,包括WebQSP和CWQ进行人工审计,我们发现平均事实正确率仅为57 % 。为了解决这些问题,我们引入了KGQAGen,这是一个系统地解决这些陷阱的LLM-loop框架。KGAG将结构化的知识基础、LLM-指导的生成和象征性的核查结合起来,以产生具有挑战性和可核查的QA实例。我们用KQGAG建立以维基数据为基础的十倍和规模基准基准,并评价一套不同的KG-RAG模型。实验结果显示,甚至将KGG的更严格的能力定位定位定位到KGA的模型。
Article 106
Title@2025-05-29 (4): Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning
Title: Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Kurzbefehle in audio-visuellen Deepfake-Erkennungsdatensätzen mit unüberwachtem Lernen | 在未经监督的学习的视听深假发现数据集中绕过捷径 2412.00175v3 |
Authors: Stefan Smeu, Dragos-Alexandru Boldisor, Dan Oneata, Elisabeta Oneata
Good datasets are essential for developing and benchmarking any machine learning system. Their importance is even greater for safety-critical applications such as deepfake detection, the focus of this paper. Here we reveal that two of the most widely used audio-video deepfake datasets suffer from a previously unidentified spurious feature: the leading silence. Fake videos start with a very brief moment of silence, and based on this feature alone we can separate the real and fake samples almost perfectly. As such, previous audio-only and audio-video models exploit the presence of silence in the fake videos and consequently perform worse when the leading silence is removed. To avoid latching onto this unwanted artifact, and possibly other unrevealed ones, we propose a shift from supervised to unsupervised learning by training models exclusively on real data. We show that by aligning self-supervised audio-video representations we remove the risk of relying on dataset-specific biases and improve robustness in deepfake detection.
良好的数据集对于开发和设定任何机器学习系统的基准至关重要。 它们的重要性对于安全关键应用来说甚至更为极端, 比如深假检测 — — 本文的焦点。 我们在这里揭示了两个最广泛使用的音像深假数据集存在一个先前不明的虚假特征: 领先的沉默。 假的视频从一个非常短暂的沉默开始, 仅以这一特征为基础, 我们就可以完美地分离真实和假的样本。 因此, 以前的只听音和视频模型利用了假的视频中的沉默, 从而在消除主要沉默时表现得更差。 为了绕过对此类不需要的文物和其他可能未销毁的文物的悬念, 我们提议从仅靠真实数据的培训模型进行监管的学习转向不受监督的学习。 我们表明,通过调整自我监控的视频演示,我们可以消除依赖数据集特定偏差的风险, 并改进深底片探测的稳健性。
Article 107
Title@2025-05-29 (4): A False Discovery Rate Control Method Using a Fully Connected Hidden Markov Random Field for Neuroimaging Data
Title: A False Discovery Rate Control Method Using a Fully Connected Hidden Markov Random Field for Neuroimaging Data | Eine falsche Discovery Rate Control-Methode mit einem vollständig verbundenen versteckten Markov Random Field für Neuroimaging-Daten | 假发现率控制方法, 使用完全连接的隐藏 Markov 随机字段来生成 Neuroimage 数据 2505.20688v2 |
Authors: Taehyo Kim, Qiran Jia, Mony J. de Leon, Hai Shu
False discovery rate (FDR) control methods are essential for voxel-wise multiple testing in neuroimaging data analysis, where hundreds of thousands or even millions of tests are conducted to detect brain regions associated with disease-related changes. Classical FDR control methods (e.g., BH, q-value, and LocalFDR) assume independence among tests and often lead to high false non-discovery rates (FNR). Although various spatial FDR control methods have been developed to improve power, they still fall short of jointly addressing three major challenges in neuroimaging applications: capturing complex spatial dependencies, maintaining low variability in both false discovery proportion (FDP) and false non-discovery proportion (FNP) across replications, and achieving computational scalability for high-resolution data. To address these challenges, we propose fcHMRF-LIS, a powerful, stable, and scalable spatial FDR control method for voxel-wise multiple testing. It integrates the local index of significance (LIS)-based testing procedure with a novel fully connected hidden Markov random field (fcHMRF) designed to model complex spatial structures using a parsimonious parameterization. We develop an efficient expectation-maximization algorithm incorporating mean-field approximation, the Conditional Random Fields as Recurrent Neural Networks (CRF-RNN) technique, and permutohedral lattice filtering, reducing the time complexity from quadratic to linear in the number of tests. Extensive simulations demonstrate that fcHMRF-LIS achieves accurate FDR control, lower FNR, reduced variability in FDP and FNP, and a higher number of true positives compared to existing methods. Applied to an FDG-PET dataset from the Alzheimer’s Disease Neuroimaging Initiative, fcHMRF-LIS identifies neurobiologically relevant brain regions and offers notable advantages in computational efficiency.
假发现率( FDR) 控制方法对于神经成像数据分析中以xoxel 方式进行多重测试至关重要。 在神经成像数据分析中,要检测与疾病相关变化有关的大脑区域,就必须进行数十万甚至数百万次的测试。经典FDR控制方法(如BH、q-价值和地方FDR)在测试中具有独立性,并常常导致高虚假的非发现率(FNR )。虽然已经开发了各种空间FDR控制方法来提高功率,但它们仍然不能共同应对神经成像应用中的三大挑战:获取复杂的空间依赖性,在错误发现比例(FDP)和虚假非发现比例(FNP)之间保持低变异性;实现高清晰度的计算率(FDRF-RF) 测试。我们建议FCRFRF-RD控制方法的当前正值、稳定且可缩缩放的FDRFRF-RFRDR 数据控制方法,它将本地重要指数(LIS) 测试程序与新完全连接的隐蔽随机字段随机字段随机字段(fMRIS) 和直径(fMRFRFRFRR) 的直径直径变变,将一个高的智能数据结构的智能数据结构的智能数据结构的模型的模型的模型,它能能的模型的模型的快速变现显示的模型的模型,它能能能能的模型的模型的模型的模型的模型显示的模型的模型结构变现变现的模型结构。
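The abstract names BH among the classical FDR baselines that assume independence across tests. As a point of reference, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. It is the textbook baseline, not fcHMRF-LIS, and the p-values in the example are made up.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose sorted p-value satisfies p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Hypothetical voxel-wise p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
decisions = benjamini_hochberg(p_values, alpha=0.05)
num_rejected = sum(decisions)
```

The procedure looks only at the sorted p-values and ignores where each voxel sits in the brain. Spatial methods such as the one proposed here aim to recover the power lost by discarding that neighborhood structure.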
Article 108
Title@2025-05-29 (4): Learning to Poison Large Language Models for Downstream Manipulation
Title: Learning to Poison Large Language Models for Downstream Manipulation | Große Sprachmodelle für Downstream-Manipulation zu vergiften | 学习下游操作毒物大语言模式 2402.13459v3 |
Authors: Xiangyu Zhou, Yao Qiang, Saleh Zare Zade, Mohammad Amin Roshani, Prashant Khanduri, Douglas Zytko, Dongxiao Zhu
The advent of Large Language Models (LLMs) has marked significant achievements in language processing and reasoning capabilities. Despite their advancements, LLMs face vulnerabilities to data poisoning attacks, where the adversary inserts backdoor triggers into training data to manipulate outputs. This work further identifies additional security risks in LLMs by designing a new data poisoning attack tailored to exploit the supervised fine-tuning (SFT) process. We propose a novel gradient-guided backdoor trigger learning (GBTL) algorithm to identify adversarial triggers efficiently, ensuring an evasion of detection by conventional defenses while maintaining content integrity. Through experimental validation across various language model tasks, including sentiment analysis, domain generation, and question answering, our poisoning strategy demonstrates a high success rate in compromising various LLMs’ outputs. We further propose two defense strategies against data poisoning attacks, including in-context learning (ICL) and continuous learning (CL), which effectively rectify the behavior of LLMs and significantly reduce the decline in performance. Our work highlights the significant security risks present during SFT of LLMs and the necessity of safeguarding LLMs against data poisoning attacks.
大语言模型(LLMS)的出现在语言处理和推理能力方面取得了显著成就。尽管取得了进步,LLMS面临数据中毒袭击的脆弱性,因为对手将后门触发器插入培训数据以操纵产出。这项工作进一步确定了LMS的额外安全风险,为此设计了新的数据中毒袭击,专门利用监管的微调(SFT)程序。我们提出了一个新的梯度引导后门触发学习算法,以有效识别对抗性触发器,确保常规防御在保持内容完整性的同时逃避发现。通过对各种语言模型任务(包括情绪分析、域生成和回答问题)的实验性验证,我们的中毒战略在损害LMS的各种产出方面表现出高成功率。我们进一步提出了两种防范数据中毒袭击的防御战略,包括文体内学习(ICL)和持续学习(CLF),以有效纠正LMs的行为并显著降低性能下降。我们的工作突出了SFTM公司在维持内容完整性方面所面临的重大安全风险,以及保护LMS公司免受数据中毒袭击的必要性。
Article 109
Title@2025-05-29 (4): SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training
Title: SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training | SGD als Freie Energie Minimierung: Ein thermodynamischer Blick auf neurales Netzwerktraining | SGD作为自由能源最小化:关于神经网络培训的热动力学观点 2505.23489v1 |
Authors: Ildus Sadrtdinov, Ivan Klimov, Ekaterina Lobacheva, Dmitry Vetrov
We present a thermodynamic interpretation of the stationary behavior of stochastic gradient descent (SGD) under fixed learning rates (LRs) in neural network training. We show that SGD implicitly minimizes a free energy function $F=U-TS$, balancing training loss $U$ and the entropy of the weights distribution $S$, with temperature $T$ determined by the LR. This perspective offers a new lens on why high LRs prevent training from converging to the loss minima and how different LRs lead to stabilization at different loss levels. We empirically validate the free energy framework on both underparameterized (UP) and overparameterized (OP) models. UP models consistently follow free energy minimization, with temperature increasing monotonically with LR, while for OP models, the temperature effectively drops to zero at low LRs, causing SGD to minimize the loss directly and converge to an optimum. We attribute this mismatch to differences in the signal-to-noise ratio of stochastic gradients near optima, supported by both a toy example and neural network experiments.
我们对神经网络培训中固定学习率(LRs)下的悬浮梯度下降(SGD)的固定行为进行热力学解释。我们表明,SGD隐含地最大限度地减少免费能源功能$F=U-TS$,平衡培训损失美元和重量分布的酶值S$的平衡,温度由LR确定。这个角度提供了一个新的透视点,说明高LD为何阻止培训与损失迷你相融合,以及不同LD如何在不同损失水平上稳定。我们从经验中验证了测量不足(UP)和超分度(OP)模型的免费能源框架。UPD模型始终遵循免费能源最小化,温度与LR单调,而对于OP模型来说,温度在低LR值时实际下降到零,使SGD直接将损失降至最小,并趋于最佳。我们将此错配对称,这归因于在选择系统附近的信号到噪音梯度梯度梯度梯度比率的差异,同时得到一个实例和神经网络实验的支持。
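The trade-off in $F = U - TS$ can be checked numerically on a toy system. The sketch below uses three discrete states with made-up energies (standing in for loss levels) and compares the free energy of three distributions. It relies on the standard fact that the Gibbs distribution minimizes $F$ at a given temperature; the states, energies, and temperature are assumptions of this example, not quantities from the paper.

```python
import math

def free_energy(probs, energies, temperature):
    """F = U - T*S, with U the expected energy and S the Shannon entropy (in nats)."""
    expected_energy = sum(p * u for p, u in zip(probs, energies))
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return expected_energy - temperature * entropy

def gibbs(energies, temperature):
    """Distribution proportional to exp(-U / T), the minimizer of F at temperature T."""
    weights = [math.exp(-u / temperature) for u in energies]
    total = sum(weights)
    return [w / total for w in weights]

energies = [0.0, 1.0, 2.0]  # hypothetical loss levels
T = 0.5                     # hypothetical temperature

p_gibbs = gibbs(energies, T)
f_gibbs = free_energy(p_gibbs, energies, T)
f_greedy = free_energy([1.0, 0.0, 0.0], energies, T)    # all mass on the loss minimum
f_uniform = free_energy([1 / 3, 1 / 3, 1 / 3], energies, T)
```

At positive temperature the minimizer keeps some mass on higher-energy states, which is the toy analogue of the abstract's point that high learning rates prevent convergence to the loss minimum. As T approaches zero the Gibbs distribution collapses onto the lowest-energy state.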
Article 110
Title@2025-05-29 (4): Federated Granger Causality Learning for Interdependent Clients with State Space Representation
Title: Federated Granger Causality Learning for Interdependent Clients with State Space Representation | Föderiertes Granger-Causality-Lernen für interdependente Kunden mit staatlicher Raumdarstellung | 为具有国家空间代表制的相互依存客户提供 2501.13890v4 |
Authors: Ayush Mohanty, Nazal Mohamed, Paritosh Ramanan, Nagi Gebraeel
Advanced sensors and IoT devices have improved the monitoring and control of complex industrial enterprises. They have also created an interdependent fabric of geographically distributed process operations (clients) across these enterprises. Granger causality is an effective approach to detect and quantify interdependencies by examining how one client’s state affects others over time. Understanding these interdependencies captures how localized events, such as faults and disruptions, can propagate throughout the system, possibly causing widespread operational impacts. However, the large volume and complexity of industrial data pose challenges in modeling these interdependencies. This paper develops a federated approach to learning Granger causality. We utilize a linear state space system framework that leverages low-dimensional state estimates to analyze interdependencies. This addresses bandwidth limitations and the computational burden commonly associated with centralized data processing. We propose augmenting the client models with the Granger causality information learned by the server through a Machine Learning (ML) function. We examine the co-dependence between the augmented client and server models and reformulate the framework as a standalone ML algorithm providing conditions for its sublinear and linear convergence rates. We also study the convergence of the framework to a centralized oracle model. Moreover, we include a differential privacy analysis to ensure data security while preserving causal insights. Using synthetic data, we conduct comprehensive experiments to demonstrate the robustness of our approach to perturbations in causality, the scalability to the size of communication, number of clients, and the dimensions of raw data. We also evaluate the performance on two real-world industrial control system datasets by reporting the volume of data saved by decentralization.
先进的传感器和IoT装置改善了对复杂工业企业的监测和控制,它们也形成了这些企业之间地理分布的流程操作(客户)的相互依存结构; 危险的因果关系是一种有效的方法,通过审查一个客户的状态如何长期影响他人来检测和量化相互依存关系; 理解这些相互依存关系可以捕捉整个系统如何传播本地事件,例如故障和干扰,从而可能造成广泛的业务影响; 然而,工业数据的数量和复杂性在模拟这些相互依存关系方面构成了挑战。 本文开发了一种在地理上分布的流程操作(客户)的相互依存结构; 我们利用一个直线式的产业空间系统框架,利用低维度状态估计数分析相互依存关系。 这解决了带宽限制和计算负担,通常与集中数据处理相联系。 我们建议利用服务器通过机器学习(ML)功能学习(ML)所学所学的 “ 重大因果关系 “ 信息来增强客户模式和服务器模型之间的相互依存关系,并将框架重新配置为独立的ML算法,为其次线性和线性因果关系提供了条件。 我们还利用一个在线性状态估算数据整合率进行在线性评估。 我们还通过使用一个数据分析系统来进行数据整合数据整合数据分析,我们用一个数据分析, 将数据质量分析。
Article 111
Title@2025-05-29 (4): TimePoint: Accelerated Time Series Alignment via Self-Supervised Keypoint and Descriptor Learning
Title: TimePoint: Accelerated Time Series Alignment via Self-Supervised Keypoint and Descriptor Learning | TimePoint: Beschleunigte Zeitreihenausrichtung über selbstüberwachtes Keypoint- und Descriptor-Lernen | 时间点:通过自上调关键点和描述学习加速时间序列调整 2505.23475v1 |
Authors: Ron Shapira Weber, Shahar Ben Ishay, Andrey Lavrinenko, Shahaf E. Finder, Oren Freifeld
Fast and scalable alignment of time series is a fundamental challenge in many domains. The standard solution, Dynamic Time Warping (DTW), struggles with poor scalability and sensitivity to noise. We introduce TimePoint, a self-supervised method that dramatically accelerates DTW-based alignment while typically improving alignment accuracy by learning keypoints and descriptors from synthetic data. Inspired by 2D keypoint detection but carefully adapted to the unique challenges of 1D signals, TimePoint leverages efficient 1D diffeomorphisms, which effectively model nonlinear time warping, to generate realistic training data. This approach, along with fully convolutional and wavelet convolutional architectures, enables the extraction of informative keypoints and descriptors. Applying DTW to these sparse representations yields major speedups and typically higher alignment accuracy than standard DTW applied to the full signals. TimePoint demonstrates strong generalization to real-world time series when trained solely on synthetic data, and further improves with fine-tuning on real data. Extensive experiments demonstrate that TimePoint consistently achieves faster and more accurate alignments than standard DTW, making it a scalable solution for time-series analysis. Our code is available at https://github.com/BGU-CS-VIL/TimePoint
在许多领域,时间序列的快速和可伸缩一致是一个根本性的挑战。标准解决方案,即动态时间扭曲(DTW),与不易缩放和对噪音的敏感度进行斗争。我们引入了时间定位(TimePoint),这是一个自我监督的方法,它大大加快了DTW的调整速度,同时通过学习合成数据的关键点和描述器来提高调整准确性。受2D关键点探测的启发,但经过仔细调整,以适应1D信号的独特挑战。TimePoint利用高效的1D二变形(它们有效地模拟非线性时间扭曲)来生成现实的培训数据。这个方法,加上全面进化和波盘旋的组合结构,使得能够提取信息化关键点和描述器。将DTW应用于这些稀疏少的表示方式可以产生重大加速,通常比对完整信号应用的标准DTW更精确。时间序列显示在仅接受合成数据培训时对现实世界时间序列进行强烈的概括化,并且通过对真实数据进行微调进一步改进。广泛的实验表明,Timerpoint 能够持续实现比标准的更快和更加精确的校正的校准。
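To make the scalability issue concrete, here is a minimal sketch of classic DTW with an absolute-difference cost. It is the textbook dynamic program rather than anything from the TimePoint repository, and the example signals are made up.

```python
def dtw_distance(a, b):
    """Classic DTW between two 1D sequences, using O(len(a) * len(b)) time and memory."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost between a[:i] and b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

signal = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched = [0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0]  # same shape, warped in time
shifted_up = [1.0, 2.0, 3.0, 2.0, 1.0]           # same timing, different values

d_warp = dtw_distance(signal, stretched)
d_shift = dtw_distance(signal, shifted_up)
```

The table has one cell per pair of samples, so the cost grows with the product of the two lengths. Running the same recurrence on a short list of keypoints instead of every sample shrinks that table, which is the source of the speedup the abstract describes.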
Article 112
Title@2025-05-29 (4): Refining Labeling Functions with Limited Labeled Data
Title: Refining Labeling Functions with Limited Labeled Data | Verfeinerung von Beschriftungsfunktionen mit begrenzten beschrifteten Daten | 用有限标签数据改进标签功能 2505.23470v1 |
Authors: Chenjie Li, Amir Gilad, Boris Glavic, Zhengjie Miao, Sudeepa Roy
Programmatic weak supervision (PWS) significantly reduces human effort for labeling data by combining the outputs of user-provided labeling functions (LFs) on unlabeled datapoints. However, the quality of the generated labels depends directly on the accuracy of the LFs. In this work, we study the problem of fixing LFs based on a small set of labeled examples. Towards this goal, we develop novel techniques for repairing a set of LFs by minimally changing their results on the labeled examples such that the fixed LFs ensure that (i) there is sufficient evidence for the correct label of each labeled datapoint and (ii) the accuracy of each repaired LF is sufficiently high. We model LFs as conditional rules which enables us to refine them, i.e., to selectively change their output for some inputs. We demonstrate experimentally that our system improves the quality of LFs based on surprisingly small sets of labeled datapoints.
方案薄弱监督(PWS)通过将用户提供的标签功能(LF)的产出与未贴标签的数据点结合起来,大大减少了人类在标签数据上的努力。然而,产生的标签的质量直接取决于LF的准确性。在这项工作中,我们根据一小组贴标签的例子来研究确定LF的问题。为实现这一目标,我们开发了修复一组LF的新技术,在标签例子上尽可能地改变其结果,以确保(一) 有足够的证据证明每个标签数据点的正确标签;(二) 每一所修理的LF的准确性足够高。我们将LF作为有条件规则,使我们能够改进这些规则,即有选择地改变其产出,用于某些投入。我们实验性地表明,我们的系统根据惊人的小组标签数据点改进了LF的质量。
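The idea of treating a labeling function as a conditional rule whose output can be selectively changed can be sketched as follows. This is an illustrative toy only, not the paper's repair algorithm: the function names, the spam/ham task, and the three labeled examples are all assumptions made for the example.

```python
ABSTAIN = None  # convention assumed here: an LF returns None when it has no opinion


def lf_contains_free(text):
    """Hypothetical labeling function: messages mentioning 'free' are spam."""
    return "spam" if "free" in text.lower() else ABSTAIN


def refine(lf, condition, new_label):
    """Wrap an LF in a conditional rule.

    Where `condition` holds, the output is overridden with `new_label`;
    everywhere else the original LF output is kept unchanged.
    """
    def refined(text):
        return new_label if condition(text) else lf(text)
    return refined


def accuracy(lf, labeled):
    """Accuracy of an LF over the labeled points on which it does not abstain."""
    votes = [(lf(x), y) for x, y in labeled if lf(x) is not ABSTAIN]
    return sum(p == y for p, y in votes) / len(votes) if votes else 0.0


# A small labeled set (invented for illustration).
labeled = [
    ("Win a FREE phone now", "spam"),
    ("Feel free to call me tomorrow", "ham"),
    ("Free entry to the prize draw", "spam"),
]

# Repair: abstain on the idiom "feel free", leave all other outputs untouched.
repaired = refine(lf_contains_free, lambda t: "feel free" in t.lower(), ABSTAIN)

print(accuracy(lf_contains_free, labeled), accuracy(repaired, labeled))
```

The refinement changes the LF's output on only one labeled example, which mirrors the abstract's goal of minimal changes; how the paper chooses such conditions is not described in the abstract and is not reproduced here.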
Article 113
Title@2025-05-29 (4): Surveying the space of descriptions of a composite system with machine learning
Title: Surveying the space of descriptions of a composite system with machine learning | Vermessung des Raumes der Beschreibungen eines Verbundsystems mit maschinellem Lernen | 勘查机器学习综合系统说明的空间 2411.18579v2 |
Authors: Kieran A. Murphy, Yujing Zhang, Dani S. Bassett
Multivariate information theory provides a general and principled framework for understanding how the components of a complex system are connected. Existing analyses are coarse in nature – built up from characterizations of discrete subsystems – and can be computationally prohibitive. In this work, we propose to study the continuous space of possible descriptions of a composite system as a window into its organizational structure. A description consists of specific information conveyed about each of the components, and the space of possible descriptions is equivalent to the space of lossy compression schemes of the components. We introduce a machine learning framework to optimize descriptions that extremize key information-theoretic quantities used to characterize organization, such as total correlation and O-information. Through case studies on spin systems, sudoku boards, and letter sequences from natural language, we identify extremal descriptions that reveal how system-wide variation emerges from individual components. By integrating machine learning into a fine-grained information-theoretic analysis of composite random variables, our framework opens new avenues for probing the structure of real-world complex systems.
多变量信息理论为理解复杂系统的各个组成部分是如何连接起来提供了一个一般性和原则性的框架。现有的分析性质粗糙,是由离散子子系统的特征所建立,而且可能无法进行计算。在这项工作中,我们提议研究合成系统作为窗口进入其组织结构的可能描述的连续空间。描述包含就每个组成部分传递的具体信息,而可能的描述空间相当于各组成部分损失压缩计划的空间。我们引入了一个机器学习框架,优化描述,将用于描述组织特征的关键信息理论量(如总相关性和O-信息)加以扩展。通过对旋转系统、苏杜库板和自然语言字母序列的案例研究,我们确定了显示各个组成部分如何出现全系统差异的极端描述。通过将机器学习合成随机变量的精细信息理论分析,我们的框架为验证现实世界复杂系统的结构开辟了新的途径。
Article 114
Title@2025-05-29 (4): Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models
Title: Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models | Retrieval Visuelle Kontrastive Dekodierung zu Mitigate-Objekt-Halluzinationen in großen Vision-Sprachen-Modellen | 在大型视觉-语言模型中,将检索视觉对抗性脱钩作为稀释物体幻觉的大型视觉-语言模型 2505.20569v2 |
Authors: Jihoon Lee, Min Song
Despite significant advancements in Large Vision-Language Models, Object Hallucination (OH) remains a persistent challenge. Building upon prior studies on contrastive decoding that address this issue without requiring additional model training, we introduce RVCD (Retrieval Visual Contrastive Decoding), an advanced method to suppress OH. RVCD leverages both negative and positive images at the logit level, explicitly referencing AI-generated images designed to represent a single concept. Our approach demonstrates substantial improvements over existing decoding-based methods.
尽管大型视觉语言模型(OH)取得了显著进步,但目标幻化(OH)仍然是一个长期挑战。我们以前曾研究过反比解码方法,在不需要额外示范培训的情况下解决这一问题。我们引入了RVCD(RVCD),这是抑制OH的先进方法。RVCD在登录层面利用了负面和正面图像,明确引用了AI生成的图像,目的是代表一个单一的概念。我们的方法表明,与现有的解码方法相比,有了很大的改进。
Article 115
Title@2025-05-29 (4): A Tutorial on Meta-Reinforcement Learning
Title: A Tutorial on Meta-Reinforcement Learning | Ein Tutorial zum Meta-Reinforcement-Lernen | 关于元加强学习的教学材料 2301.08028v4 |
Authors: Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson
While deep reinforcement learning (RL) has fueled multiple high-profile successes in machine learning, it is held back from more widespread adoption by its often poor data efficiency and the limited generality of the policies it produces. A promising approach for alleviating these limitations is to cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL. Meta-RL is most commonly studied in a problem setting where, given a distribution of tasks, the goal is to learn a policy that is capable of adapting to any new task from the task distribution with as little data as possible. In this survey, we describe the meta-RL problem setting in detail as well as its major variations. We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task. Using these clusters, we then survey meta-RL algorithms and applications. We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
虽然深层强化学习(RL)在机器学习方面催生了众多引人注目的成功,但是由于广泛采用这种学习方法,其数据效率往往很差,而且其所产生政策的普遍性有限,因此受到阻碍。一个有希望的减轻这些限制的方法是,将更好的RL算法发展成一个机器学习问题本身,在称为Met-RL的进程中,这是一种机器学习问题本身。Meta-RL最常在一种问题环境下进行研究,因为考虑到任务的分配,我们的目标是学习一种能够适应任务分配所产生的任何新任务的政策,尽可能少有数据。我们在这次调查中详细描述了元-RL问题设置及其主要变异。我们讨论如何在高层次上根据任务分配和每项任务可用的学习预算进行元-RL研究。我们利用这些组,然后调查元-RL的算法和应用。我们最后通过介绍在为深层RL执业者提供标准工具箱的元-RL部分的道路上的公开问题。
Article 116
Title@2025-05-29 (4): Agentic Knowledgeable Self-awareness
Title: Agentic Knowledgeable Self-awareness | Agentisch sachkundiges Selbstbewußtsein | A. 动态知识自觉意识 2504.03553v2 |
Authors: Shuofei Qiao, Zhisong Qiu, Baochang Ren, Xiaobin Wang, Xiangyuan Ru, Ningyu Zhang, Xiang Chen, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
Large Language Models (LLMs) have achieved considerable performance across various agentic planning tasks. However, traditional agent planning approaches adopt a “flood irrigation” methodology that indiscriminately injects gold trajectories, external feedback, and domain knowledge into agent models. This practice overlooks the fundamental human cognitive principle of situational self-awareness during decision-making: the ability to dynamically assess situational demands and strategically employ resources. We propose agentic knowledgeable self-awareness to address this gap, a novel paradigm enabling LLM-based agents to autonomously regulate knowledge utilization. Specifically, we propose KnowSelf, a data-centric approach that equips agents with human-like knowledgeable self-awareness. Concretely, we devise a heuristic situation judgement criterion to mark special tokens on the agent’s self-explored trajectories for collecting training data. Through a two-stage training process, the agent model can switch between different situations by generating specific special tokens, achieving optimal planning effects with minimal costs. Our experiments demonstrate that KnowSelf can outperform various strong baselines on different tasks and models with minimal use of external knowledge. Code is available at https://github.com/zjunlp/KnowSelf.
大型语言模型(LLMS)在各种代理规划任务中取得了相当大的成绩,然而,传统代理规划方法采用了一种“洪水灌溉”方法,不加区别地将金轨、外部反馈和领域知识注入代理模型中,这种做法忽略了在决策过程中对情况自我认识的基本人类认知原则,即动态地评估形势需求和在决策中战略性地利用资源的能力。我们提出了一种具有代理知识的自我意识来解决这一差距的新模式,使以LLM为基础的代理能够自主地规范知识的利用。具体地说,我们提出了一种以数据为中心的方法,将具有了解情况的自我认识的代理人应用到像人类那样有知识的自我意识的代理人。具体地说,我们设计了一种超常状况判断标准,以标志该代理人收集培训数据的自我探索轨迹的特殊标志。通过两阶段的培训过程,该代理模型可以在不同情况之间转换,产生特定的特殊标志,以最低的成本实现最佳的规划效果。我们的实验表明,“了解自我”可以超越不同任务和模型上的各种强的基线,而很少使用外部知识。《准则》可在 https://gimb/commus.
Article 117
Title@2025-05-29 (4): Pessimism Principle Can Be Effective: Towards a Framework for Zero-Shot Transfer Reinforcement Learning
Title: Pessimism Principle Can Be Effective: Towards a Framework for Zero-Shot Transfer Reinforcement Learning | Pessimismus-Prinzip kann wirksam sein: Auf dem Weg zu einem Rahmen für Null-Shot-Transfer-Verstärkungs-Lernen | 悲观主义原则可以有效:建立一个零热转移强化学习框架 2505.18447v2 |
Authors: Chi Zhang, Ziying Jia, George K. Atia, Sihong He, Yue Wang
Transfer reinforcement learning aims to derive a near-optimal policy for a target environment with limited data by leveraging abundant data from related source domains. However, it faces two key challenges: the lack of performance guarantees for the transferred policy, which can lead to undesired actions, and the risk of negative transfer when multiple source domains are involved. We propose a novel framework based on the pessimism principle, which constructs and optimizes a conservative estimation of the target domain’s performance. Our framework effectively addresses the two challenges by providing an optimized lower bound on target performance, ensuring safe and reliable decisions, and by exhibiting monotonic improvement with respect to the quality of the source domains, thereby avoiding negative transfer. We construct two types of conservative estimations, rigorously characterize their effectiveness, and develop efficient distributed algorithms with convergence guarantees. Our framework provides a theoretically sound and practically robust solution for transfer learning in reinforcement learning.
加强转让学习的目的是,通过利用相关来源领域的大量数据,为数据有限的目标环境制定接近最佳的政策,但面临两个主要挑战:对转移的政策缺乏业绩保障,这可能导致不理想的行动,以及在涉及多个来源领域时出现负转移的风险。我们提议基于悲观原则的新框架,该新框架构建并优化对目标领域绩效的保守估计。我们的框架有效地应对了两个挑战,对目标业绩提供了最优化的较低约束,确保了安全可靠的决定,并展示了源领域质量方面的单一改进,从而避免了负面转移。我们构建了两种保守的估算,严格地描述其有效性,并制定了具有趋同保证的高效分布算法。我们的框架为在强化学习中转让学习提供了一种理论上健全和切实可靠的解决方案。
Article 118
Title@2025-05-29 (4): LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
Title: LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection | LENSLLM: Enthüllen von Feintuning-Dynamik für die LLM-Auswahl | LENSLLLM: 用于选择LLM的连续精细调整动态 2505.03793v2 |
Authors: Xinyue Zeng, Haohui Wang, Junhong Lin, Jun Wu, Tyler Cody, Dawei Zhou
The proliferation of open-sourced Large Language Models (LLMs) and diverse downstream tasks necessitates efficient model selection, given the impracticality of fine-tuning all candidates due to computational constraints. Despite the recent advances in LLM selection, a fundamental research question largely remains nascent: how can we model the dynamic behaviors of LLMs during fine-tuning, thereby enhancing our understanding of their generalization performance across diverse downstream tasks? In this work, we propose a novel theoretical framework that provides a proper lens to assess the generalization capabilities of LLMs, thereby enabling accurate and efficient LLM selection for downstream applications. In particular, we first derive a PAC-Bayesian Generalization Bound that unveils fine-tuning dynamics of LLMs and then introduce LENSLLM, a Neural Tangent Kernel (NTK)-based Rectified Scaling Model that enables accurate performance predictions across diverse tasks while maintaining computational efficiency. Extensive empirical results on 3 large-scale benchmarks demonstrate that our model achieves up to 91.1% accuracy and reduces computational cost by up to 88.5% in LLM selection, outperforming 5 state-of-the-art methods. We open-source our proposed LENSLLM model and corresponding results at LensLLM.io.
开放源码大语言模型(LLMS)和多种下游任务的扩散要求高效的模型选择,因为由于计算限制,对所有候选人进行微调不切实际。尽管LLM的选择最近有所进展,但一个基本研究问题基本上仍然新生:我们如何在微调中模拟LLM的动态行为,从而增进我们对不同下游任务一般表现的理解?在这项工作中,我们提出了一个新的理论框架,为评估LLMS的普及能力提供一个适当的透镜,从而能够准确和高效地选择下游应用的LLM。特别是,我们首先得出一个PAC-Bayesian通用圈,它揭示LLMS的微调动态,然后采用以NENSLLM(NTK)为基础的重新定位模型,它能够准确预测不同任务的业绩,同时保持计算效率。关于3个大规模基准的广泛经验结果显示,我们的模型在LM选择中达到91.1%的准确度,并将计算成本降低到88.5%。我们提议的LENSLLLLLLLA和相应结果。
Article 119
Title@2025-05-29 (4): Broadband Ground Motion Synthesis by Diffusion Model with Minimal Condition
Title: Broadband Ground Motion Synthesis by Diffusion Model with Minimal Condition | Broadband Ground Motion Synthese durch Diffusion Modell mit minimalem Zustand | 以最小条件传播模型进行宽带地面移动合成 2412.17333v2 |
Authors: Jaeheun Jung, Jaehyuk Lee, Changhae Jung, Hanyoung Kim, Bosung Jung, Donghun Lee
Shock waves caused by earthquakes can be devastating. Generating realistic earthquake-caused ground motion waveforms helps reduce losses of life and property, yet generative models for the task tend to generate subpar waveforms. We present the High-fidelity Earthquake Groundmotion Generation System (HEGGS) and demonstrate its superior performance using earthquakes from North American, East Asian, and European regions. HEGGS exploits the intrinsic characteristics of earthquake datasets and learns the waveforms using an end-to-end differentiable generator containing a conditional latent diffusion model and a high-fidelity waveform construction model. We show the learning efficiency of HEGGS by training it on a single GPU machine and validate its performance using earthquake databases from North America, East Asia, and Europe, using diverse criteria from waveform generation tasks and seismology. Once trained, HEGGS can generate three-dimensional E-N-Z seismic waveforms with accurate P/S phase arrivals, envelope correlation, signal-to-noise ratio, GMPE analysis, frequency content analysis, and section plot analysis.
地震引发的冲击波可能是毁灭性的。 产生现实的地震引发的地面运动波形有助于减少生命和财产损失,但这项任务的基因模型往往产生亚波形。 我们展示了高虚震地震地面变化系统(HEGGS ) , 并用来自北美、东亚和欧洲区域的地震来展示其优异性。 高震地震数据组利用地震数据集的内在特征,并利用含有有条件潜伏扩散模型和高频波形构建模型的端到端不同发电机来学习波形。 我们用来自北美、东亚和欧洲的地震数据库对高频地震震震动产生系统进行培训,并使用波形生成任务和地震学的不同标准来验证其性能,我们展示了高频GGS的学习效率。 高频系统一旦经过培训,可以生成三维E-N-Z地震波状,并配有准确的P/S阶段到达、信封连接、信号到噪音比率、GPEPE分析、频率内容分析和部分绘图分析。
Article 120
Title@2025-05-29 (4): On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment
Title: On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment | Globale Konvergenzraten für Föderierten politischen Gradienten unter heterogener Umwelt | 关于不同不同环境下联邦政策分级制全球趋同率的全球趋同率 2505.23459v1 |
Authors: Safwan Labbi, Paul Mangold, Daniil Tiapkin, Eric Moulines
Ensuring convergence of policy gradient methods in federated reinforcement learning (FRL) under environment heterogeneity remains a major challenge. In this work, we first establish that heterogeneity, perhaps counter-intuitively, can require optimal policies to be non-deterministic or even time-varying, even in tabular environments. Subsequently, we prove global convergence results for federated policy gradient (FedPG) algorithms employing local updates, under a Łojasiewicz condition that holds only for each individual agent, in both entropy-regularized and non-regularized scenarios. Crucially, our theoretical analysis shows that FedPG attains linear speed-up with respect to the number of agents, a property central to efficient federated learning. Leveraging insights from our theoretical findings, we introduce b-RS-FedPG, a novel policy gradient method that employs a carefully constructed softmax-inspired parameterization coupled with an appropriate regularization scheme. We further demonstrate explicit convergence rates for b-RS-FedPG toward near-optimal stationary policies. Finally, we demonstrate empirically that both FedPG and b-RS-FedPG consistently outperform federated Q-learning in heterogeneous settings.
在这种工作中,我们首先确定,即使在表格环境中,差异性或许是反直觉的,可以要求最佳政策非决定性甚至时间变化性的最佳政策,即使在表格环境中也是如此。随后,我们证明,在使用本地更新的federated 政策梯度(FedPG)算法中,根据一种只对每个个体代理商(在正对式和非正式假设中)保持的 ojjasiewicz 条件,采用当地更新的fredadd point(FedPG) ,全球趋同结果。我们进一步表明,B-RS-FedPG在代理商数量方面实现了线性加速,这是高效联进化学习的中央财产。最后,我们从我们的理论结论中汲取了深刻的见解,我们引入了 b-RS-FedPG,这是一种新的政策梯度方法,采用经过精心构建的软式激励参数化参数化,加上适当的正规化计划。我们进一步证明,B-RS-FDPGG与近优性固定政策之间的明确趋同率率。最后,我们证明,我们不断学习FGPGF和FM-FMFMFMFFFFFFF的不断进化模式。
Article 121
Title@2025-05-29 (4): Diffusion Guidance Is a Controllable Policy Improvement Operator
Title: Diffusion Guidance Is a Controllable Policy Improvement Operator | Diffusion Guidance ist ein kontrollierbarer Politikverbesserungs-Betreiber | 传播指导是可控制的政策改进操作员 2505.23458v1 |
Authors: Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine
At the core of reinforcement learning is the idea of learning beyond the performance in the data. However, scaling such systems has proven notoriously tricky. In contrast, techniques from generative modeling have proven remarkably scalable and are simple to train. In this work, we combine these strengths, by deriving a direct relation between policy improvement and guidance of diffusion models. The resulting framework, CFGRL, is trained with the simplicity of supervised learning, yet can further improve on the policies in the data. On offline RL tasks, we observe a reliable trend – increased guidance weighting leads to increased performance. Of particular importance, CFGRL can operate without explicitly learning a value function, allowing us to generalize simple supervised methods (e.g., goal-conditioned behavioral cloning) to further prioritize optimality, gaining performance for “free” across the board.
强化学习的核心是超越数据性能的学习理念。然而,推广这种系统证明是极其棘手的。相反,基因模型的技巧被证明非常可扩展,而且很容易培训。在这项工作中,我们将这些优势结合起来,在政策改进与传播模式指导之间建立直接的关系。由此形成的框架CFGRL经过监督学习的简单培训,但可以进一步改进数据中的政策。在离线的RL任务中,我们观察到一种可靠的趋势 – – 指导权重的增加导致性能的提高。特别重要的是,CFGRL可以在不明确学习价值功能的情况下运作,从而使我们能够将简单的监督方法(如有目标的克隆行为)推广到更优先的层次上,从而获得“免费”的性能。
Article 122
Title@2025-05-29 (4): TabReason: A Reinforcement Learning-Enhanced Reasoning LLM for Explainable Tabular Data Prediction
Title: TabReason: A Reinforcement Learning-Enhanced Reasoning LLM for Explainable Tabular Data Prediction | TabReason: Eine verstärkte Lern-verbesserte Begründung LLM für erklärbare tabellarische Datenvorhersage | TabReson: 用于可解释的图表数据预测的强化学习-提高合理理由的强化学习-强化LLMLM 2505.21807v2 |
Authors: Tommy Xu, Zhitian Zhang, Xiangyu Sun, Lauren Kelly Zung, Hossein Hajimirsadeghi, Greg Mori
Predictive modeling on tabular data is the cornerstone of many real-world applications. Although gradient boosting machines and some recent deep models achieve strong performance on tabular data, they often lack interpretability. On the other hand, large language models (LLMs) have demonstrated powerful capabilities to generate human-like reasoning and explanations, but still underperform on tabular data prediction. In this paper, we propose a new approach that leverages reasoning-based LLMs, trained using reinforcement learning, to perform more accurate and explainable predictions on tabular data. Our method introduces custom reward functions that guide the model not only toward high prediction accuracy but also toward human-understandable reasons for its predictions. Experimental results show that our model achieves promising performance on financial benchmark datasets, outperforming most existing LLMs.
在表格数据上建立预测模型是许多现实世界应用的基石。 虽然梯度推动机和最近一些深层模型在表格数据上表现良好,但它们往往缺乏可解释性。 另一方面,大型语言模型(LLMs)已经展示出产生人性推理和解释的强大能力,但在表格数据预测方面表现仍然不足。 在本文中,我们提出了一种新的方法,利用经过强化学习培训的基于推理的LMs,对列表数据进行更准确和可解释的预测。 我们的方法引入了定制奖励功能,不仅引导模型实现高预测准确性,而且引导模型预测的人性难以理解的原因。 实验结果显示,我们的模型在财务基准数据集上取得了有希望的业绩,优于大多数现有的LLMs。
Article 123
Title@2025-05-29 (4): Learning Cascade Ranking as One Network
Title: Learning Cascade Ranking as One Network | Kaskaden-Ranking als ein Netzwerk lernen | 学习连级安排 “ 一个网络 “ 网络 2503.09492v2 |
Authors: Yunli Wang, Zhen Zhang, Zhiqiang Wang, Zixuan Yang, Yu Li, Jian Yang, Shiyang Wen, Peng Jiang, Kun Gai
Cascade Ranking is a prevalent architecture in large-scale top-k selection systems like recommendation and advertising platforms. Traditional training methods focus on single-stage optimization, neglecting interactions between stages. Recent advances have introduced interaction-aware training paradigms, but still struggle to 1) align training objectives with the goal of the entire cascade ranking (i.e., end-to-end recall of ground-truth items) and 2) learn effective collaboration patterns for different stages. To address these challenges, we propose LCRON, which introduces a novel surrogate loss function derived from the lower bound probability that ground truth items are selected by cascade ranking, ensuring alignment with the overall objective of the system. According to the properties of the derived bound, we further design an auxiliary loss for each stage to drive the reduction of this bound, leading to a more robust and effective top-k selection. LCRON enables end-to-end training of the entire cascade ranking system as a unified network. Experimental results demonstrate that LCRON achieves significant improvement over existing methods on public benchmarks and industrial applications, addressing key limitations in cascade ranking training and significantly enhancing system performance.
为了应对这些挑战,我们提议LCRON采用新的替代损失功能,即地面真相项目按级联排名的低约束概率选择,确保符合系统的总体目标。根据衍生约束性培训的特性,我们进一步设计每个阶段的附带损失,推动减少这一约束性,导致更有力和更有效的顶级选择。 LCRON能够对整个级联排名系统进行端到端培训,作为一个统一的网络。实验结果显示,LCRON大大改进了公共基准和工业应用的现有方法,解决了级联排名培训的关键限制,并大大加强了系统绩效。
Article 124
Title@2025-05-29 (4): DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
Title: DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation | DynaMem: Online-Dynamischer Raum-Semantischer Speicher für mobile Manipulationen in der offenen Welt | DynaMem: 用于开放世界移动操纵的在线动态空间-空间内存 2411.04999v2 |
Authors: Peiqi Liu, Zhanqiu Guo, Mohit Warke, Soumith Chintala, Chris Paxton, Nur Muhammad Mahi Shafiullah, Lerrel Pinto
Significant progress has been made in open-vocabulary mobile manipulation, where the goal is for a robot to perform tasks in any environment given a natural language description. However, most current systems assume a static environment, which limits the system’s applicability in real-world scenarios where environments frequently change due to human intervention or the robot’s own actions. In this work, we present DynaMem, a new approach to open-world mobile manipulation that uses a dynamic spatio-semantic memory to represent a robot’s environment. DynaMem constructs a 3D data structure to maintain a dynamic memory of point clouds, and answers open-vocabulary object localization queries using multimodal LLMs or open-vocabulary features generated by state-of-the-art vision-language models. Powered by DynaMem, our robots can explore novel environments, search for objects not found in memory, and continuously update the memory as objects move, appear, or disappear in the scene. We run extensive experiments on the Stretch SE3 robots in three real and nine offline scenes, and achieve an average pick-and-drop success rate of 70% on non-stationary objects, which is more than a 2x improvement over state-of-the-art static systems. Our code as well as our experiment and deployment videos are open sourced and can be found on our project website: https://dynamem.github.io/
在开放词汇移动操作方面已经取得了显著进展, 目标是让机器人在自然语言描述下的任何环境中执行任务。 然而, 大多数当前系统都假设一个静态环境, 从而限制系统在现实世界环境中的可应用性, 因为在现实世界环境中, 环境经常因人类干预或机器人自己的行动而变化。 在此工作中, 我们展示DynaMem, 这是开放世界移动操作的新方法, 使用动态的spatio- 语义内存来代表机器人的环境。 Dynamem 在三个真实和九个离线场的屏幕上, 构建了一个三维数据结构, 以维持点云的动态记忆, 并用modoral LLMS 或由状态视觉语言模型生成的开放词汇特性解答开放词汇对象本地化查询。 Dynam, 我们的机器人可以探索新的环境, 搜索在记忆中找不到的物体, 并随着物体移动、 出现或消失在现场, 不断更新记忆。 我们在三个真实和九个离线场的屏幕上进行广泛的实验 Stech SE3机器人, 并实现一个平均选取- lib- 成功率在我们的系统上, 70% 。 在非静位系统上找到的系统上, 我们的系统可以找到一个正常的系统, 。
Article 125
Title@2025-05-29 (4): Network Inversion for Uncertainty-Aware Out-of-Distribution Detection
Title: Network Inversion for Uncertainty-Aware Out-of-Distribution Detection | Netzwerk-Inversion für unsichere Out-of-Distribution-Erkennung | 用于不确定性软件发送外检测的网络转换 2505.23448v1 |
Authors: Pirzada Suhail, Rehna Afroz, Amit Sethi
Out-of-distribution (OOD) detection and uncertainty estimation (UE) are critical components for building safe machine learning systems, especially in real-world scenarios where unexpected inputs are inevitable. In this work, we propose a novel framework that combines network inversion with classifier training to simultaneously address both OOD detection and uncertainty estimation. For a standard n-class classification task, we extend the classifier to an (n+1)-class model by introducing a “garbage” class, initially populated with random Gaussian noise to represent outlier inputs. After each training epoch, we use network inversion to reconstruct input images corresponding to all output classes; these initially appear noisy and incoherent and are therefore assigned to the garbage class for retraining the classifier. This cycle of training, inversion, and exclusion continues iteratively until the inverted samples begin to resemble the in-distribution data more closely, suggesting that the classifier has learned to carve out meaningful decision boundaries while sanitising the class manifolds by pushing OOD content into the garbage class. During inference, this training scheme enables the model to effectively detect and reject OOD samples by classifying them into the garbage class. Furthermore, the confidence scores associated with each prediction can be used to estimate uncertainty for both in-distribution and OOD inputs. Our approach is scalable, interpretable, and does not require access to external OOD datasets or post-hoc calibration techniques while providing a unified solution to the dual challenges of OOD detection and uncertainty estimation.
在这项工作中,我们提出了一个新框架,将网络与分类培训相结合,同时处理OOD检测和不确定性估算问题。对于标准的n级分类任务,我们将分类器扩展为(n+1)级模型,引入一个“垃圾”类,最初由随机的Gausian噪音组成,以代表外部投入。在每次培训后,我们使用网络转换来重建与所有产出类别相对应的不确定性图像,这些产出类别最初显得吵闹和不协调,因此被排除在垃圾分类班之外,以便再培训 OOD 检测和不确定性估算同时处理 OO 。这一培训、 转换和排斥的周期继续反复进行,直到被倒置的样本开始更接近分发中的数据,表明分类器学会了将有意义的决定界限分离出来,同时通过将 OOD 内容推入外部检测班级,从而将 OOD 的可校准方法推入到垃圾类中。这一培训计划使得该模型能够有效地检测和拒绝所有产出类别中的输入图象,同时将OOD 的样本用于对质量的估算。
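The inference step described in the abstract above, where an (n+1)-class classifier rejects an input by assigning it to the garbage class and reports a confidence score, might look roughly like the sketch below. It covers inference only, not the training and inversion loop; the function names, the choice of the last index as the garbage class, and the example logits are all assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_rejection(logits):
    """Classify with an (n + 1)-way output whose last index is the garbage class.

    Returns (predicted_index, confidence, is_ood). An input is flagged as
    out-of-distribution when the garbage class wins the argmax.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    garbage_index = len(probs) - 1  # assumed position of the garbage class
    return best, probs[best], best == garbage_index


# Invented logits for a 3-class problem plus one garbage class.
print(predict_with_rejection([2.0, 0.1, 0.1, 0.0]))  # class 0 wins
print(predict_with_rejection([0.0, 0.0, 0.0, 3.0]))  # garbage class wins
```

The returned confidence is simply the softmax probability of the winning class, used here as a plain proxy for the uncertainty estimate the abstract mentions.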
Article 126
Title@2025-05-29 (4): GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
Title: GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning | GSQ-Tuning: Group-Shared Exponents integer in einer voll quantifizierten Schulung für LLMs On-Device-Fine-Tuning | GSQ-Turning:为在线设计精微调LLM女士提供全面量化培训的集团共享指数整数 2502.12913v3 |
Authors: Sifan Zhou, Shuo Wang, Zhihang Yuan, Mingjia Shi, Yuzhang Shang, Dawei Yang
Large Language Model (LLM) fine-tuning technologies have achieved remarkable results. However, traditional LLM fine-tuning approaches face significant challenges: they require large Floating Point (FP) computation, raise privacy concerns when handling sensitive data, and are impractical for resource-constrained edge devices. While Parameter-Efficient Fine-Tuning (PEFT) techniques reduce trainable parameters, their reliance on floating-point arithmetic creates fundamental incompatibilities with edge hardware. In this work, we introduce a novel framework for on-device LLM fine-tuning that eliminates the need for floating-point operations in both inference and training, named GSQ-Tuning. At its core is the Group-Shared Exponents Integer format, which efficiently represents model parameters in integer format using shared exponents among parameter groups. When combined with LoRA-like adapters, this enables fully integer-based fine-tuning that is both memory and compute efficient. We demonstrate that our approach achieves accuracy comparable to BF16-based fine-tuning while reducing memory usage by 1.85x. Moreover, compared to FP8, our method can reduce power consumption by 5x and chip area by 11x at the same performance, making large-scale model adaptation feasible on edge devices.
大型语言模型(LLMS)的微调技术取得了显著成果。然而,传统的LLM微调方法面临重大挑战:它们需要大型浮点计算,在处理敏感数据时引起隐私问题,对资源限制的边缘设备不切实际。虽然参数-有效精美微调(PEFT)技术减少了可训练参数,但对浮点计算方法的依赖使得对边端硬件的精确性与边端硬件产生根本的不兼容性。在这项工作中,我们引入了一个新型的LLM微调框架,消除了在感应和培训(称为GSQ-Tuning)中进行浮点操作的需要。其核心是群体共享集价 Integer格式,该格式有效地代表了使用各参数组共享的整数格式的模型参数。当它们与类似LORA的适应器相结合时,可以使完全基于整流点的微调既具有记忆性又具有计算效率。我们的方法达到了与基于BF16的微调的精确性能,同时大大减少了1.85x记忆使用。此外,与可操作性平级的平方标准的S-11级吸能装置相比,我们的方法可以降低高能区平位的平段的平段的性能。
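The notion of integers that share one exponent per parameter group can be illustrated with a generic block-floating-point style quantizer. This is a sketch under stated assumptions, not the GSQ-Tuning format itself: the abstract does not give bit widths, group sizes, or rounding rules, so `quantize_group`, `dequantize_group`, the 8-bit default, and the power-of-two exponent choice are all hypothetical.

```python
import math


def quantize_group(values, bits=8):
    """Quantize a group of floats to signed integers sharing one exponent.

    Returns (ints, exponent) such that each value is approximated by
    ints[i] * 2 ** exponent. The exponent is stored once per group.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 0
    # Smallest power-of-two scale that keeps the largest value within qmax.
    exponent = math.ceil(math.log2(max_abs / qmax))
    scale = 2.0 ** exponent
    # Round to nearest integer and clamp to the representable range.
    ints = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return ints, exponent


def dequantize_group(ints, exponent):
    """Reconstruct approximate floats from integers and the shared exponent."""
    return [q * 2.0 ** exponent for q in ints]


group = [0.5, -0.25, 0.1, 0.0]  # invented parameter group
ints, exponent = quantize_group(group)
print(ints, exponent)
print(dequantize_group(ints, exponent))
```

With a power-of-two scale, rescaling reduces to bit shifts on integer hardware, which is one plausible reason to share an exponent rather than a floating-point scale; whether GSQ-Tuning does exactly this is not stated in the abstract.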
Article 127
Title@2025-05-29 (4): SCoTT: Strategic Chain-of-Thought Tasking for Wireless-Aware Robot Navigation in Digital Twins
Title: SCoTT: Strategic Chain-of-Thought Tasking for Wireless-Aware Robot Navigation in Digital Twins | SCoTT: Strategisches Chain-of-Thought-Tasking für Wireless-Aware-Roboternavigation in digitalen Zwillingen | SCTT: “ 数字双双 “ 中无线软件机器人导航战略研究链任务 2411.18212v2 |
Authors: Aladin Djuhera, Amin Seffo, Vlad C. Andrei, Holger Boche, Walid Saad
Path planning under wireless performance constraints is a complex challenge in robot navigation. However, naively incorporating such constraints into classical planning algorithms often incurs prohibitive search costs. In this paper, we propose SCoTT, a wireless-aware path planning framework that leverages vision-language models (VLMs) to co-optimize average path gains and trajectory length using wireless heatmap images and ray-tracing data from a digital twin (DT). At the core of our framework is Strategic Chain-of-Thought Tasking (SCoTT), a novel prompting paradigm that decomposes the exhaustive search problem into structured subtasks, each solved via chain-of-thought prompting. To establish strong baselines, we compare classical A* and wireless-aware extensions of it, and derive DP-WA*, an optimal, iterative dynamic programming algorithm that incorporates all path gains and distance metrics from the DT, but at significant computational cost. In extensive experiments, we show that SCoTT achieves path gains within 2% of DP-WA* while consistently generating shorter trajectories. Moreover, SCoTT’s intermediate outputs can be used to accelerate DP-WA* by reducing its search space, saving up to 62% in execution time. We validate our framework using four VLMs, demonstrating effectiveness across both large and small models, thus making it applicable to a wide range of compact models at low inference cost. We also show the practical viability of our approach by deploying SCoTT as a ROS node within Gazebo simulations. Finally, we discuss data acquisition pipelines, compute requirements, and deployment considerations for VLMs in 6G-enabled DTs, underscoring the potential of natural language interfaces for wireless-aware navigation in real-world applications.
无线性能限制下的路径规划是机器人导航中的一项复杂挑战。然而,将此类限制纳入经典规划算法中,天真地将此类限制纳入经典规划算法,往往带来令人望而却步的搜索费用。在本文中,我们提议SCOTT,即一个利用视觉语言模型(VLMS)来利用无线热映像和数字双体(DT)的射线追踪数据来优化平均路径增益和轨迹长度的无线天线路路路路路规划框架。在我们框架的核心是战略传输任务链(SCoTTT),这是一个将彻底搜索问题分解成结构化子任务的新颖的激励型模式,每个子线条都通过思维链加速解决。为了建立强有力的基线,我们比较了经典A* 和无线线路路路路路路路路路路路路路路路路路路路路路路路路图的扩展,我们通过广泛实验,在DP-WA* 内部实现路径上实现路径增益,同时持续产生较短的轨迹要求。 此外,SCTFTFS-LLL 快速运行运行系统运行运行系统运行运行中可以加速运行运行运行。我们使用大型搜索系统运行的运行中,在四大路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路路。
Article 128
Title@2025-05-29 (4): The Strong, Weak and Benign Goodhart’s law. An independence-free and paradigm-agnostic formalisation
Title: The Strong, Weak and Benign Goodhart’s law. An independence-free and paradigm-agnostic formalisation | The Strong, Weak and Benign Goodharts Gesetz. Eine unabhängigkeitsfreie und paradigmatisch-agnostische Formalisierung | 强势、弱弱和本尼·古德哈特法,无独立和无范式、不可知的正规化 2505.23445v1 |
Authors: Adrien Majka, El-Mahdi El-Mhamdi
Goodhart’s law is a famous adage in policy-making that states that “When a measure becomes a target, it ceases to be a good measure”. As machine learning models and the optimisation capacity to train them grow, growing empirical evidence has reinforced belief in the validity of this law, although it has not been formalised. Recently, a few attempts were made to formalise Goodhart’s law, either by categorising variants of it, or by looking at how optimising a proxy metric affects the optimisation of an intended goal. In this work, we relax the simplifying independence assumption made in previous works, as well as the assumption on the learning paradigm made in most of them, to study the effect of the coupling between the proxy metric and the intended goal on Goodhart’s law. Our results show that in the case of a light-tailed goal and light-tailed discrepancy, dependence does not change the nature of Goodhart’s effect. However, in the case of a light-tailed goal and heavy-tailed discrepancy, we exhibit an example where over-optimisation occurs at a rate inversely proportional to the heavy-tailedness of the discrepancy between the goal and the metric.
Goodhart的法律是决策中一个著名的格言,它指出“当一项措施成为目标时,它不再是一个好的衡量标准”。随着机器学习模式和训练它们的最佳能力的增长,越来越多的经验证据加强了对该法有效性的信念,而没有正式化。最近,有人试图将Goodhart的法律正规化,或者对它进行分类,或者研究如何优化代用衡量标准影响预期目标的优化。在这项工作中,我们减少了以前作品中所作的简化独立假设,以及其中多数作品中对学习模式所作的假设,以研究代用衡量标准与Goodhart法律预期目标的挂钩的影响。我们的结果显示,在光尾目标和小尾差异的情况下,依赖并不改变Goodhart效应的性质。然而,在浅尾尾目标和严重尾端差异的情况下,我们展示了一个例子,过度优化发生的速度与目标之间严重尾部差异之间的反比率。
Article 129
Title@2025-05-29 (4): Strategic Classification with Non-Linear Classifiers
Title: Strategic Classification with Non-Linear Classifiers | Strategische Klassifizierung mit nicht linearen Klassifikatoren | 战略分类与非链分类法战略分类 2505.23443v1 |
Authors: Benyamin Trachtenberg, Nir Rosenfeld
In strategic classification, the standard supervised learning setting is extended to support the notion of strategic user behavior in the form of costly feature manipulations made in response to a classifier. While standard learning supports a broad range of model classes, the study of strategic classification has, so far, been dedicated mostly to linear classifiers. This work aims to expand the horizon by exploring how strategic behavior manifests under non-linear classifiers and what this implies for learning. We take a bottom-up approach showing how non-linearity affects decision boundary points, classifier expressivity, and model class complexity. A key finding is that universal approximators (e.g., neural nets) are no longer universal once the environment is strategic. We demonstrate empirically how this can create performance gaps even on an unrestricted model class.
在战略分类方面,标准监督的学习环境扩大到支持战略用户行为的概念,其形式是针对分类者进行昂贵的特性操纵。虽然标准学习支持一系列广泛的模型类,但到目前为止,战略分类的研究大多专门针对线性分类者。这项工作旨在扩大视野,探讨非线性分类者的战略行为如何表现,以及这意味着学习什么。我们采取自下而上的方法,表明非线性如何影响决定边界点、分类性直观性和模型类的复杂性。一项关键发现是,一旦环境具有战略意义,通用近似器(例如神经网)就不再具有普遍性。我们从经验上证明,即使在一个不受限制的模型类中,这如何造成业绩差距。
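To make the strategic setting above concrete, here is a minimal Python sketch. It rests on assumptions that are not taken from the paper: features are one-dimensional, the manipulation cost is the distance moved, and each user moves at most `budget` to obtain a positive label. The names `classifier`, `best_response`, and `induced_label` are hypothetical.

```python
# Illustrative only: a 1-D toy, not the paper's model or experiments.

def classifier(x):
    """Hypothetical non-linear classifier: accepts two disjoint intervals."""
    return (0.0 <= x <= 1.0) or (2.0 <= x <= 3.0)

# Boundary points of the accepted region (assumed known to the users).
ACCEPT_EDGES = [0.0, 1.0, 2.0, 3.0]

def best_response(x, budget):
    """Move to the nearest accepted point if it lies within the cost budget."""
    if classifier(x):
        return x  # already accepted, so there is no reason to pay a cost
    nearest = min(ACCEPT_EDGES, key=lambda edge: abs(edge - x))
    return nearest if abs(nearest - x) <= budget else x

def induced_label(x, budget):
    """Label a user actually receives after responding strategically."""
    return classifier(best_response(x, budget))
```

With `budget = 0.6`, every point in the rejected gap between 1 and 2 can reach an accepted interval, so the induced decision rule accepts the whole gap even though the classifier itself rejects it. This is one simple way to see why the set of decision rules that can be realised under strategic behavior may be smaller than the model class.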
Article 130
Title@2025-05-29 (4): Rethinking Regularization Methods for Knowledge Graph Completion
Title: Rethinking Regularization Methods for Knowledge Graph Completion | Überdenken von Regularisierungsmethoden für Wissensgraphenvervollständigung | 重新思考知识图完成正规化方法 2505.23442v1 |
Authors: Linyu Li, Zhi Jin, Yuanpeng He, Dongming Jin, Haoran Duan, Zhengwei Tao, Xuan Zhang, Jiandong Li
Knowledge graph completion (KGC) has attracted considerable attention in recent years because it is critical to improving the quality of knowledge graphs. Researchers have continuously explored various models. However, most previous efforts have neglected to take advantage of regularization from a deeper perspective, so regularization has not been used to its full potential. This paper rethinks the application of regularization methods in KGC. Through extensive empirical studies on various KGC models, we find that carefully designed regularization not only alleviates overfitting and reduces variance but also enables these models to break through the upper bounds of their original performance. Furthermore, we introduce a novel sparse-regularization method that embeds the concept of rank-based selective sparsity into the KGC regularizer. The core idea is to selectively penalize those components with significant features in the embedding vector, thus effectively ignoring many components that contribute little and may only represent noise. Various comparative experiments on multiple datasets and multiple models show that the SPR regularization method is better than other regularization methods and can enable the KGC model to further break through the performance margin.
近些年来,知识图的完成(KGC)由于对提高知识图的质量至关重要,所以引起了相当大的注意。研究人员不断探索各种模型。然而,以往的多数努力都忽略了从更深的视角利用正规化,因此没有充分利用其潜力。本文件重新思考了在KGC应用正规化方法的问题。通过对各种KGC模型的广泛经验研究,我们发现,经过精心设计的正规化不仅减轻了过度和减少差异,而且使这些模型能够突破其原始性能的上限。此外,我们引入了一种新的稀有常规化方法,将基于等级的选择性聚变概念嵌入KGC正规化器中。核心思想是选择性地惩罚那些在嵌入矢量中具有重要特征的成分,从而实际上忽略了许多很少起作用的成分,而且可能只是代表噪音。关于多个数据集和多个模型的各种比较实验表明,SPR正规化方法比其他正规化方法要好,能够使KGC模型进一步突破性差。
Article 131
Title@2025-05-29 (4): The challenge of hidden gifts in multi-agent reinforcement learning
Title: The challenge of hidden gifts in multi-agent reinforcement learning | Die Herausforderung der versteckten Gaben in Multi-Agenten-Verstärkung Lernen | 多试剂强化学习中隐藏礼品的挑战 2505.20579v2 |
Authors: Dane Malenfant, Blake A. Richards
Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These “hidden gifts” represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. In addition, if all the agents unlock their doors, the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key, thus the act of dropping the key for others is a “hidden gift”. We show that several different state-of-the-art RL algorithms, including MARL algorithms, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that independent model-free policy gradient agents can solve the task when we provide them with information about their own action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for these independent agents, inspired by learning-aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show that credit assignment in multi-agent settings can be particularly challenging in the presence of “hidden gifts”, and demonstrate that learning awareness in independent agents can benefit these settings.
有时我们从其他人的行动中受益,即使我们不知道他们采取了这些行动。例如,如果邻居选择不在其家中时不在其家门前停泊,即使不知道他们采取了这一行动,也可以受益。这些“隐藏的礼物”代表了多试剂强化学习(MARL)的一个有趣的挑战,因为当其他人的有益行动被隐藏起来时,就分配信用是非三角的。在这里,我们研究隐藏的礼品的影响,任务很简单,MARL的任务非常简单。在这个任务中,网格世界环境中的代理商有单独的门可以打开,以获得个人报酬。同样,如果所有代理商都打开了他们的家门,他们也可以得到更大的集体奖赏。然而,所有这些“隐藏的礼物”只是当代理人在其他人的有益行动被隐藏起来的时候,集体奖赏才能得到。 值得注意的是,没有什么可以告诉代理商其他代理商已经放下了钥匙,因此,放弃他人的钥匙的行为就是“隐藏的礼物”。我们用不同的门打开门打开了自己的门来获得个人奖赏。同样,如果所有的代理商都打开他们的门门, 包括MAL 算算,那么,我们就能在他们自己学习了一个真正的历史任务中,我们如何在学习这些任务中,我们如何在学习这些任务中,我们是如何学习了。
Article 132
Title@2025-05-29 (4): LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty
Title: LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty | LoTUS: Großformatige Maschine entlernen mit einem Geschmack von Ungewissheit | LoTUS: 大型机器与不确定性的味道脱钩 2503.18314v4 |
Authors: Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves, Petros Daras
We present LoTUS, a novel Machine Unlearning (MU) method that eliminates the influence of training samples from pre-trained models, avoiding retraining from scratch. LoTUS smooths the prediction probabilities of the model up to an information-theoretic bound, mitigating its over-confidence stemming from data memorization. We evaluate LoTUS on Transformer and ResNet18 models against eight baselines across five public datasets. Beyond established MU benchmarks, we evaluate unlearning on ImageNet1k, a large-scale dataset, where retraining is impractical, simulating real-world conditions. Moreover, we introduce the novel Retrain-Free Jensen-Shannon Divergence (RF-JSD) metric to enable evaluation under real-world conditions. The experimental results show that LoTUS outperforms state-of-the-art methods in terms of both efficiency and effectiveness. Code: https://github.com/cspartalis/LoTUS.
我们介绍LotUS,这是消除培训样本对培训前模式的影响、避免从零开始再培训的新颖的机器不学习方法;LotUS将模型的预测概率顺通到信息理论约束,减轻数据记忆的过度信任;我们根据五个公共数据集的八个基线对变异器和ResNet18模型进行评估;除了已经确立的MU基准外,我们还评估在图像Net1k上的未学习,这是一个大规模数据集,在那里,再培训是不切实际的,模拟现实世界条件;此外,我们推出新的Retrain Free Jensen-Shannon Divergence (RF-JSD) 标准,以便能够在现实世界条件下进行评估;实验结果显示,LotUS在效率和有效性方面都超越了最新技术方法。代码:https://github.com/cspartalis/LATUS。
Article 133
Title@2025-05-29 (4): Bounded-Abstention Pairwise Learning to Rank
Title: Bounded-Abstention Pairwise Learning to Rank | Gebundene Abhaltung Pairwise Learning to Rank | 学习排名 2505.23437v1 |
Authors: Antonio Ferrara, Andrea Pugnana, Francesco Bonchi, Salvatore Ruggieri
Ranking systems influence decision-making in high-stakes domains like health, education, and employment, where they can have substantial economic and social impacts. This makes the integration of safety mechanisms essential. One such mechanism is abstention, which enables algorithmic decision-making systems to defer uncertain or low-confidence decisions to human experts. While abstention has been predominantly explored in the context of classification tasks, its application to other machine learning paradigms remains underexplored. In this paper, we introduce a novel method for abstention in pairwise learning-to-rank tasks. Our approach is based on thresholding the ranker’s conditional risk: the system abstains from making a decision when the estimated risk exceeds a predefined threshold. Our contributions are threefold: a theoretical characterization of the optimal abstention strategy, a model-agnostic, plug-in algorithm for constructing abstaining ranking models, and a comprehensive empirical evaluation across multiple datasets, demonstrating the effectiveness of our approach.
分级制度影响保健、教育和就业等高层次领域的决策,可以对这些领域产生巨大的经济和社会影响。这使得整合安全机制至关重要。其中一种机制是美元,使算法决策系统能够将不确定或低信任决定推迟给人类专家。虽然主要在分类任务方面探索了弃权权,但在其他机器学习范式中应用弃权权的情况仍未得到充分探讨。在本文中,我们引入了一种新颖的方法,在对齐学习到排序的任务中弃权权。我们的方法是基于对排级者的有条件风险设定门槛:当估计风险超过预先确定的门槛时,系统不做出决策。我们的贡献有三重:最佳弃权战略的理论定性、构建不排位模型的、插入式算法,以及跨多个数据集的全面经验评估,显示了我们的方法的有效性。
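The thresholding rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes some pairwise ranker supplies an estimated probability `p_i_over_j` that item i should rank above item j, and it takes the conditional risk under a 0-1 pairwise loss to be `min(p, 1 - p)`. Both function names are hypothetical.

```python
# Illustrative only: the risk definition below is an assumption.

def conditional_risk(p_i_over_j):
    """Estimated error probability of the more likely ordering of a pair."""
    return min(p_i_over_j, 1.0 - p_i_over_j)

def rank_pair_or_abstain(p_i_over_j, threshold):
    """Return 'i>j', 'j>i', or 'abstain' for a single pair of items."""
    if conditional_risk(p_i_over_j) > threshold:
        return "abstain"  # defer the uncertain comparison to a human expert
    return "i>j" if p_i_over_j >= 0.5 else "j>i"
```

Lowering `threshold` makes the system abstain on more pairs, which is how a bound on the abstention rate would trade off against ranking risk in a sketch like this one.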
Article 134
Title@2025-05-29 (4): Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning
Title: Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning | Trainieren mit Perturbation, schlussfolgern nach Merging: Ein Zwei-Stufen-Rahmen für kontinuierliches Lernen | 接转训练、合并后的推推:持续学习的双阶段框架 2505.22389v2 |
Authors: Haomiao Qiu, Miao Zhang, Ziyue Qiao, Liqiang Nie
Continual Learning (CL) aims to enable models to continuously acquire new knowledge from a sequence of tasks while avoiding the forgetting of learned information. However, existing CL methods only rely on the parameters of the most recent task for inference, which makes them susceptible to catastrophic forgetting. Inspired by the recent success of model merging techniques, we propose Perturb-and-Merge (P&M), a novel continual learning framework that integrates model merging into the CL paradigm to mitigate forgetting. Specifically, after training on each task, P&M constructs a new model by forming a convex combination of the previous model and the newly trained task-specific model. Through theoretical analysis, we minimize the total loss increase across all tasks and derive an analytical solution for the optimal merging coefficient. To further improve the performance of the merged model, we observe that the degradation introduced during merging can be alleviated by a regularization term composed of the task vector and the Hessian matrix of the loss function. Interestingly, we show that this term can be efficiently approximated using second-order symmetric finite differences, and a stochastic perturbation strategy along the task vector direction is accordingly devised which incurs no additional forward or backward passes while providing an effective approximation of the regularization term. Finally, we combine P&M with LoRA, a parameter-efficient fine-tuning method, to reduce memory overhead. Our proposed approach achieves state-of-the-art performance on several continual learning benchmark datasets.
持续学习(CL)旨在让模型能够不断从一系列任务中获取新知识,避免忘记已学习的信息;然而,现有的CL方法只依靠最近一项推断任务的参数,因此很容易发生灾难性的忘记。受最近成功的模型合并技术的启发,我们提议了\ textbf{Perturb-and-Merge(PM)},这是一个新的持续学习框架,将模式合并到CL模式中,以减少遗忘。具体地说,在对每项任务进行培训之后,PM通过将以前的模型和新培训的具体任务模型结合起来,来构建一个新的模型。通过理论分析,我们最大限度地减少所有任务的总损失,并为最佳合并系数找到分析解决办法。为了进一步改善合并模型的性能,我们观察到,合并过程中引入的退化可以通过由任务矢量组成的正规化术语和损失函数的赫西式矩阵来缓解。有趣的是,在对每项任务进行培训后,我们可以用二级的精细度定的固定差异和新培训的具体任务模式来构建一个新的模型。最后,我们用一个不断调整的轨道化的方法来将一个我们进化的系统化的系统化的系统化方向结合起来。
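Two ingredients named in the abstract above, the convex combination of models and the second-order symmetric finite difference, can be sketched as follows. This is an illustration under assumptions: parameters are flat Python lists, `alpha` is supplied by the caller instead of being derived analytically, and the curvature estimate evaluates the loss three times, so it does not reproduce the paper's stochastic perturbation scheme with no extra passes. The function names are hypothetical.

```python
# Illustrative only: shows the two identities, not the P&M training loop.

def merge(theta_prev, theta_task, alpha):
    """Convex combination of previous and task-specific parameters."""
    assert 0.0 <= alpha <= 1.0
    return [(1.0 - alpha) * p + alpha * t for p, t in zip(theta_prev, theta_task)]

def curvature_along(loss, theta, v, eps=1e-3):
    """Symmetric finite-difference estimate of v^T H v at theta.

    Uses loss(theta + eps*v) - 2*loss(theta) + loss(theta - eps*v),
    which equals eps^2 * v^T H v up to higher-order terms.
    """
    plus = [t + eps * d for t, d in zip(theta, v)]
    minus = [t - eps * d for t, d in zip(theta, v)]
    return (loss(plus) - 2.0 * loss(theta) + loss(minus)) / eps ** 2
```

Here `v` would play the role of the task vector, i.e. the difference between the task-specific and previous parameters. For a quadratic loss the estimate is exact up to floating-point error, which makes it easy to check on a toy example.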
Article 135
Title@2025-05-29 (4): Emergent Risk Awareness in Rational Agents under Resource Constraints
Title: Emergent Risk Awareness in Rational Agents under Resource Constraints | Emergent Risk Awareness in Rational Agents unter Ressourcenbeschränkungen | 资源限制下对合理代理的新兴风险意识 2505.23436v1 |
Authors: Daniel Jarne Ornia, Nicholas Bishop, Joel Dyer, Wei-Chen Lee, Ani Calinescu, Doyne Farmer, Michael Wooldridge
Advanced reasoning models with agentic capabilities (AI agents) are deployed to interact with humans and to solve sequential decision-making problems under (approximate) utility functions and internal models. When such problems have resource or failure constraints where action sequences may be forcibly terminated once resources are exhausted, agents face implicit trade-offs that reshape their utility-driven (rational) behaviour. Additionally, since these agents are typically commissioned by a human principal to act on their behalf, asymmetries in constraint exposure can give rise to previously unanticipated misalignment between human objectives and agent incentives. We formalise this setting through a survival bandit framework, provide theoretical and empirical results that quantify the impact of survival-driven preference shifts, identify conditions under which misalignment emerges and propose mechanisms to mitigate the emergence of risk-seeking or risk-averse behaviours. As a result, this work aims to increase understanding and interpretability of emergent behaviours of AI agents operating under such survival pressure, and offer guidelines for safely deploying such AI systems in critical resource-limited environments.
运用具有代理能力的高级推理模型(AI代理商)与人类互动,并在(近似)通用功能和内部模型下解决顺序决策问题。当这类问题有资源或失败制约,一旦资源用尽,行动序列可能被迫终止时,代理商面临暗含的权衡,从而改变其使用(理性)行为。此外,由于这些代理商通常由一位人类主委托代表其行事,因此,在受限制的接触中的不对称可能导致人类目标与代理商激励之间先前未曾预料到的不匹配。我们通过生存强盗框架将这一设置正规化,提供理论和经验结果,量化由生存驱动的优惠转移的影响,查明出现不匹配的条件,并提出机制,以减轻在这种生存压力下运作的AI代理商新出现的行为,并提供在关键资源有限的环境中安全部署此类AI系统的指导方针。
Article 136
Title@2025-05-29 (4): Diversity-Aware Policy Optimization for Large Language Model Reasoning
Title: Diversity-Aware Policy Optimization for Large Language Model Reasoning | Diversity-Aware-Politikoptimierung für groß angelegte Sprachmodell-Reasoning | 大语言示范理由的多样性政策优化 2505.23433v1 |
Authors: Jian Yao, Ran Cheng, Xingyu Wu, Jibin Wu, Kay Chen Tan
The reasoning capabilities of large language models (LLMs) have advanced rapidly, particularly following the release of DeepSeek R1, which has inspired a surge of research into data quality and reinforcement learning (RL) algorithms. Despite the pivotal role diversity plays in RL, its influence on LLM reasoning remains largely underexplored. To bridge this gap, this work presents a systematic investigation into the impact of diversity in RL-based training for LLM reasoning, and proposes a novel diversity-aware policy optimization method. Across evaluations on 12 LLMs, we observe a strong positive correlation between the solution diversity and Potential at k (a novel metric quantifying an LLM’s reasoning potential) in high-performing models. This finding motivates our method to explicitly promote diversity during RL training. Specifically, we design a token-level diversity and reformulate it into a practical objective, then we selectively apply it to positive samples. Integrated into the R1-zero training framework, our method achieves a 3.5 percent average improvement across four mathematical reasoning benchmarks, while generating more diverse and robust solutions.
大型语言模型(LLMs)的推理能力迅速发展,特别是在DeepSeek R1发布后,它激发了数据质量和强化学习算法研究的激增。尽管多样性在RL中发挥着关键作用,但对LLM推理的影响仍然在很大程度上没有得到充分探讨。为弥合这一差距,这项工作对基于RL培训的多样性对LLM推理的影响进行了系统调查,并提出了新的多样性认识政策优化方法。在对12LMs的评价中,我们看到在高绩效模型中,解决办法的多样性和潜力在 k(对LLM推理潜力进行量化的新指标)之间有着强烈的正相关关系。这一发现激励了我们在RL培训中明确促进多样性的方法。具体地说,我们设计了象征性的多样性并将其转化为一个实际目标,然后有选择地将其应用于积极的样本。我们的方法被纳入R1-零培训框架,在四个数学推理基准中实现了平均3.5%的改进,同时产生了更加多样化和有力的解决方案。
Article 137
Title@2025-05-29 (4): Improved Learning via k-DTW: A Novel Dissimilarity Measure for Curves
Title: Improved Learning via k-DTW: A Novel Dissimilarity Measure for Curves | Verbessertes Lernen über k-DTW: Ein neuartiges Maß an Unähnlichkeit für Kurven | 通过 k-DTW改进学习:曲线的新差异措施 2505.23431v1 |
Authors: Amer Krivošija, Alexander Munteanu, André Nusser, Chris Schwiegelshohn
This paper introduces $k$-Dynamic Time Warping ($k$-DTW), a novel dissimilarity measure for polygonal curves. $k$-DTW has stronger metric properties than Dynamic Time Warping (DTW) and is more robust to outliers than the Fréchet distance, which are the two gold standards of dissimilarity measures for polygonal curves. We show interesting properties of $k$-DTW and give an exact algorithm as well as a $(1+\varepsilon)$-approximation algorithm for $k$-DTW by a parametric search for the $k$-th largest matched distance. We prove the first dimension-free learning bounds for curves and further learning theoretic results. $k$-DTW not only admits smaller sample size than DTW for the problem of learning the median of curves, where some factors depending on the curves’ complexity $m$ are replaced by $k$, but we also show a surprising separation on the associated Rademacher and Gaussian complexities: $k$-DTW admits strictly smaller bounds than DTW, by a factor $\tilde\Omega(\sqrt{m})$ when $k\ll m$. We complement our theoretical findings with an experimental illustration of the benefits of using $k$-DTW for clustering and nearest neighbor classification.
本文介绍了对多边形曲线的美元- 机动时间扭曲(kk$- DTW) 。 美元- DTW比动态时间扭曲(DTW) 具有比动态时间扭曲(DTW) 更强的度量特性,并且比Fr'{e}chet 距离(Fr'{e}chet 距离) 更强的外端学习范围。 这是多边形曲线不同计量的两个金标准。 我们展示了K美元- DTW 的有趣特性,给出了精确的算法以及美元- DTW 的(1 varepsilon) 和 美元- DTW 和美元(美元- 美元) 相联的比方程式的比值( 美元- DDTQ_ m) 。 我们证明,对于曲线的计算结果来说,第一个无维度学习的学习范围比 DTW 的样本小。 美元- DTW 不仅接受比 DTW 的中位值小的样本大小, 某些取决于曲线的精度值 的精度值 以 美元 的精度值 。
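The abstract above does not spell out the definition of $k$-DTW. The Python sketch below assumes it is the minimum, over all monotone alignments of the two vertex sequences, of the sum of the $k$ largest matched distances, so that $k=1$ gives the discrete Fréchet distance and a large $k$ gives DTW with summed (unsquared) distances. It enumerates every alignment by brute force, so it is only suitable for tiny curves and is not the exact or approximation algorithm from the paper. All names are hypothetical.

```python
import math

# Illustrative only: exponential-time brute force under an assumed definition.

def warping_paths(n, m):
    """Yield every monotone alignment from (0, 0) to (n - 1, m - 1)."""
    def extend(path):
        i, j = path[-1]
        if (i, j) == (n - 1, m - 1):
            yield list(path)
            return
        for di, dj in ((1, 0), (0, 1), (1, 1)):
            if i + di < n and j + dj < m:
                path.append((i + di, j + dj))
                yield from extend(path)
                path.pop()
    yield from extend([(0, 0)])

def k_dtw_bruteforce(p, q, k):
    """Minimum over alignments of the sum of the k largest matched distances."""
    best = float("inf")
    for path in warping_paths(len(p), len(q)):
        dists = sorted((math.dist(p[i], q[j]) for i, j in path), reverse=True)
        best = min(best, sum(dists[:k]))
    return best
```

For example, `k_dtw_bruteforce([(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)], 1)` measures only the single worst matched pair, while a larger `k` also accounts for the next-largest matches, which is where the robustness trade-off between the two classical measures would come from under this assumed definition.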
Article 138
Title@2025-05-29 (4): Proper Dataset Valuation by Pointwise Mutual Information
Title: Proper Dataset Valuation by Pointwise Mutual Information | Richtiger Datensatz Bewertung durch pointwise Gegenseitige Informationen | 按点对点相互信息分列的适当数据集估价 2405.18253v3 |
Authors: Shuran Zheng, Xuan Qi, Rui Ray Chen, Yongchan Kwon, James Zou
Data plays a central role in advancements in modern artificial intelligence, with high-quality data emerging as a key driver of model performance. This has prompted the development of principled and effective data curation methods in recent years. However, existing methods largely rely on heuristics, and whether they are truly effective remains unclear. For instance, standard evaluation methods that assess a trained model’s performance on specific benchmarks may incentivize assigning high scores to data that merely resembles the test set. This issue exemplifies Goodhart’s law: when a measure becomes a target, it ceases to be a good measure. To address this issue, we propose an information-theoretic framework for evaluating data curation methods. We define dataset quality in terms of its informativeness about the true model parameters, formalized using the Blackwell ordering of informativeness. Under this ordering, Blackwell’s theorem ensures that more informative data yields optimal models with lower expected loss on the true underlying distribution. To measure informativeness, we show that the Blackwell order can be determined by the Shannon mutual information between the curated data and the test data. To estimate this mutual information, we introduce a novel method that trains Bayesian models on embedded datasets and computes mutual information from the posteriors of model parameters. Experiments on real-world data demonstrate that our mutual information-based evaluation assigns appropriately lower scores to data curation strategies that reduce dataset informativeness, while traditional test score-based evaluation methods may favor data curation strategies that overfit to the test set but compromise the training data’s informativeness.
数据在现代人工智能的进步中发挥着核心作用, 高品质数据正在成为模型性能的关键驱动力。 这促使近年来发展了有原则和有效的数据校正方法。 但是, 现有方法在很大程度上依赖脂质学, 并且它们是否真正有效, 仍然不清楚。 例如, 评估经过培训的模型在具体基准方面的绩效的标准评价方法, 可能会激励将高分分配到仅仅类似于测试集的数据中。 这个问题体现了Goodhart 的法律: 当一项措施成为目标时, 它不再是一个良好的衡量标准。 为了解决这个问题, 我们提出了一个用于评价数据校正性方法的信息―― 信息性框架。 我们用关于真正模型参数的信息性来定义数据集的质量, 使用Blackwell 的信息性排序正规化。 根据此命令, 更丰富的数据性数据性能可以产生最佳模型, 其真实基础分布的预期损失较低。 为了衡量基于信息性, 我们显示, 黑well 命令可以由香农数据曲线性数据与测试性数据模型之间的相互信息性评估确定。 我们从共同数据测试性数据测试模型中, 引入了一种共同数据级数据测试方法, 以测试性数据性数据性数据性数据性测试模型, 。
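The quantity at the centre of the abstract above is Shannon mutual information. As a reminder of what that quantity is, here is a minimal Python sketch for a discrete joint distribution given as a table. This is only the textbook definition: the paper estimates mutual information through Bayesian models trained on embedded datasets, which this sketch does not attempt. The function name is hypothetical.

```python
import math

# Illustrative only: textbook mutual information for a discrete joint table.

def mutual_information(joint):
    """Shannon mutual information in nats.

    joint[i][j] is P(X = i, Y = j); the entries are assumed to sum to 1.
    """
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:  # zero cells contribute nothing
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi
```

Independent variables give zero, and two perfectly coupled fair bits give log 2 nats. In the spirit of the abstract, a curation step that discards the dependence between curated data and test data would drive such a score towards zero.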
Article 139
Title@2025-05-29 (4): Understanding and Mitigating Overrefusal in LLMs from an Unveiling Perspective of Safety Decision Boundary
Title: Understanding and Mitigating Overrefusal in LLMs from an Unveiling Perspective of Safety Decision Boundary | Überrefusal in LLMs aus Sicht der Sicherheitsentscheidungsgrenze zu verstehen und zu mildern | 从安全裁定边界的始终如一的视角理解和减轻LLM女士的过度拒绝 2505.18325v2 |
Authors: Licheng Pan, Yongqi Tong, Xin Zhang, Xiaolu Zhang, Jun Zhou, Zhixuan Chu
Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, yet they often refuse to answer legitimate queries, a phenomenon known as overrefusal. Overrefusal typically stems from over-conservative safety alignment, causing models to treat many reasonable prompts as potentially risky. To systematically understand this issue, we probe and leverage the models’ safety decision boundaries to analyze and mitigate overrefusal. Our findings reveal that overrefusal is closely tied to misalignment at these boundary regions, where models struggle to distinguish subtle differences between benign and harmful content. Building on these insights, we present RASS, an automated framework for prompt generation and selection that strategically targets overrefusal prompts near the safety boundary. By harnessing steering vectors in the representation space, RASS efficiently identifies and curates boundary-aligned prompts, enabling more effective and targeted mitigation of overrefusal. This approach not only provides a more precise and interpretable view of model safety decisions but also seamlessly extends to multilingual scenarios. We explore the safety decision boundaries of various LLMs and construct the MORBench evaluation set to facilitate robust assessment of model safety and helpfulness across multiple languages. Code and datasets will be released at https://anonymous.4open.science/r/RASS-80D3.
大型语言模型(LLMS)在一系列广泛的任务中表现出了非凡的能力,然而,它们往往拒绝回答被称为过度反悔的正当问题。过度反悔通常源于过度保守的安全协调,导致许多合理的提示被视为潜在的风险。为了系统地理解这一问题,我们探究并利用模型的安全决定界限来分析和减轻过度反悔。我们的调查结果显示,过度反悔与这些边界区域的错误对立密切相关,这些模型努力区分良性内容和有害内容之间的微妙差异。我们根据这些洞察力提出RASS,这是一个迅速生成和选择的自动框架,其战略目标是在安全边界附近过度反悔。通过在代表空间利用指导矢量,RASS有效地识别和调节边界对立的提示,从而能够更有效和有针对性地减轻过度反悔。这种方法不仅为示范安全决定提供了更准确和可解释的视角,而且无缝地扩展到多语种情景。我们探索了各种LMS的安全决定的界限,并构建了MORBESCSE/ASSADR号多版本评估系统,将便利对安全性和安全/开放性准则进行强有力的评估。
Article 140
Title@2025-05-29 (4): On the Validity of Head Motion Patterns as Generalisable Depression Biomarkers
Title: On the Validity of Head Motion Patterns as Generalisable Depression Biomarkers | Über die Gültigkeit von Head Motion Patterns als Generalisable Depression Biomarkers | 头动模式作为可普遍适用的萧条生物标志物的有效性 2505.23427v1 |
Authors: Monika Gahalawat, Maneesh Bilalpur, Raul Fernandez Rojas, Jeffrey F. Cohn, Roland Goecke, Ramanathan Subramanian
Depression is a debilitating mood disorder negatively impacting millions worldwide. While researchers have explored multiple verbal and non-verbal behavioural cues for automated depression assessment, head motion has received little attention thus far. Further, the common practice of validating machine learning models via a single dataset can limit model generalisability. This work examines the effectiveness and generalisability of models utilising elementary head motion units, termed kinemes, for depression severity estimation. Specifically, we consider three depression datasets from different western cultures (German: AVEC2013, Australian: Blackdog and American: Pitt datasets) with varied contextual and recording settings to investigate the generalisability of the derived kineme patterns via two methods: (i) k-fold cross-validation over individual/multiple datasets, and (ii) model reuse on other datasets. Evaluating classification and regression performance with classical machine learning methods, our results show that: (1) head motion patterns are efficient biomarkers for estimating depression severity, achieving highly competitive performance for both classification and regression tasks on a variety of datasets, including achieving the second best Mean Absolute Error (MAE) on the AVEC2013 dataset, and (2) kineme-based features are more generalisable than (a) raw head motion descriptors for binary severity classification, and (b) other visual behavioural cues for severity estimation (regression).
虽然研究人员探索了多种语言和非语言行为提示,用于自动抑郁症评估,但头部运动迄今很少受到关注。此外,通过单一数据集验证机器学习模型的常见做法可能限制模型的通用性。这项工作审查了使用基本头动单元的模型的有效性和可概括性,即所谓的“直流”,以进行抑郁程度估计。具体地说,我们考虑了来自不同西方文化(德国:AVEC2013、澳大利亚:黑狗和美国:Pittt数据集)的三种抑郁症数据集,它们具有不同的背景和记录设置,以调查衍生的直系血型模式的可通用性。此外,通过两种方法:(一) 个人/多重数据集的千倍交叉校验,以及(二) 利用其他数据集的模型再利用。用经典机器学习方法评估分类和回归性表现。我们的结果显示:(1) 头动模式是评估抑郁症严重程度的高效生物标志,在各种数据集的分类和回归任务上实现高度竞争性的性表现,包括实现第二最佳直系直系模式(MAE) 和基于其他直观的直观性(直观性) AVA) 和直观性分析性(AVARC性(MA) ) 的直观) 和直观性(A-R) 直观) 。
Article 141
Title@2025-05-29 (4): Enhanced DACER Algorithm with High Diffusion Efficiency
Title: Enhanced DACER Algorithm with High Diffusion Efficiency | Verbesserter DACER-Algorithmus mit hoher Diffusionseffizienz | DACER 高传播效率增强的DACER 计算法 2505.23426v1 |
Authors: Yinuo Wang, Mining Tan, Wenjun Zou, Haotian Lin, Xujie Song, Wenxuan Wang, Tong Liu, Likun Wang, Guojian Zhan, Tianze Zhu, Shiqi Liu, Jingliang Duan, Shengbo Eben Li
Due to their expressive capacity, diffusion models have shown great promise in offline RL and imitation learning. Diffusion Actor-Critic with Entropy Regulator (DACER) extended this capability to online RL by using the reverse diffusion process as a policy approximator, trained end-to-end with policy gradient methods, achieving strong performance. However, this comes at the cost of requiring many diffusion steps, which significantly hampers training efficiency, while directly reducing the steps leads to noticeable performance degradation. Critically, the lack of inference efficiency becomes a significant bottleneck for applying diffusion policies in real-time online RL settings. To improve training and inference efficiency while maintaining or even enhancing performance, we propose a Q-gradient field objective as an auxiliary optimization target to guide the denoising process at each diffusion step. Nonetheless, we observe that the independence of the Q-gradient field from the diffusion time step negatively impacts the performance of the diffusion policy. To address this, we introduce a temporal weighting mechanism that enables the model to efficiently eliminate large-scale noise in the early stages and refine actions in the later stages. Experimental results on MuJoCo benchmarks and several multimodal tasks demonstrate that the DACER2 algorithm achieves state-of-the-art performance in most MuJoCo control tasks with only five diffusion steps, while also exhibiting stronger multimodality compared to DACER.
传播模型由于其表现能力,在离线RL和模仿学习中表现出了巨大的希望。与 Entropy 监管机构(DACER)的Difulation Actor-Cripic-Cripic with Entropy Control (DACER)将这一能力扩展至在线RL,方法是将反向传播进程用作政策辅助工具,通过政策梯度方法培训端至端端,取得强效。然而,这样做的代价是要求采取许多传播步骤,这严重妨碍培训效率,同时直接减少步骤导致显著的绩效退化。关键是,缺乏推论效率成为了在实时在线RL环境中应用传播政策的重大瓶颈。为了提高培训和推断效率,同时保持甚至提高绩效,我们提议将分级外地目标作为辅助性优化目标,以指导每个推广步骤的分解进程。然而,我们认为,相对于传播时间步骤的独立性对传播政策的业绩产生了消极影响。为了解决这一问题,我们引入了时间加权机制,使模型能够在早期应用大规模噪音,并且改进行动效率,同时提高培训和推导效效率,同时在后阶段也展示了FAROAL2 级任务,在最高级任务上展示了更强的MAL-CA上,在最高级任务上,仅能取得更强的模级基准,在后阶段展示了BA-CAMA级的进度上展示了5级基准。
Article 142
Title@2025-05-29 (4): Hierarchical Neuro-Symbolic Decision Transformer
Title: Hierarchical Neuro-Symbolic Decision Transformer | Hierarchischer neuro-symbolischer Entscheidungstransformator | 等级性神经-共制决定变换器 2503.07148v3 |
Authors: Ali Baheri, Cecilia O. Alm
We present a hierarchical neuro-symbolic control framework that tightly couples a classical symbolic planner with a transformer-based policy to address long-horizon decision-making under uncertainty. At the high level, the planner assembles an interpretable sequence of operators that guarantees logical coherence with task constraints, while at the low level each operator is rendered as a sub-goal token that conditions a decision transformer to generate fine-grained actions directly from raw observations. This bidirectional interface preserves the combinatorial efficiency and explainability of symbolic reasoning without sacrificing the adaptability of deep sequence models, and it permits a principled analysis that tracks how approximation errors from both planning and execution accumulate across the hierarchy. Empirical studies in stochastic grid-world domains demonstrate that the proposed method consistently surpasses purely symbolic, purely neural and existing hierarchical baselines in both success and efficiency, highlighting its robustness for sequential tasks.
我们提出了一个等级级神经 – – 精神共振控制框架,将传统的象征性规划师与基于变压器的政策紧紧结合在一起,以便在不确定的情况下处理长视距的决策。在高层,规划员将可解释的操作员序列组合在一起,保证逻辑上与任务限制保持一致,而在低层,每个操作员则作为次级目标象征,为决定变压器提供条件,直接从原始观测中产生细微的分级行动。这个双向界面保存组合的效率和象征性推理的可解释性,同时又不牺牲深层次序列模型的适应性,并允许进行有原则的分析,以跟踪规划和执行过程中的近似差如何在等级结构中积累。在随机化的网域域域的实证研究表明,拟议的方法在成功和效率方面始终超越纯粹的象征性、纯粹的神经性和现有的分级基线,突出其对于连续任务的坚固性。
Article 143
Title@2025-05-29 (4): Risk-aware Direct Preference Optimization under Nested Risk Measure
Title: Risk-aware Direct Preference Optimization under Nested Risk Measure | Risikobewusste Direktpräferenzoptimierung unter verschachtelter Risikomaßnahme | 内层风险措施下认识到风险的直接最优化 2505.20359v2 |
Authors: Lijun Zhang, Lin Li, Yajie Qi, Huizhong Song, Yaodong Yang, Jun Wang, Wei Wei
When fine-tuning pre-trained Large Language Models (LLMs) to align with human values and intentions, maximizing the estimated reward can lead to superior performance, but it also introduces potential risks due to deviations from the reference model’s intended behavior. Most existing methods introduce KL divergence to constrain deviations between the trained model and the reference model; however, this may not be sufficient in certain applications that require tight risk control. In this paper, we introduce Risk-aware Direct Preference Optimization (Ra-DPO), a novel approach that incorporates risk-awareness by employing a class of nested risk measures. This approach formulates a constrained risk-aware advantage function maximization problem and then converts the Bradley-Terry model into a token-level representation. The objective function maximizes the likelihood of the policy while suppressing the deviation between a trained model and the reference model using a sequential risk ratio, thereby enhancing the model’s risk-awareness. Experimental results across three open-source datasets (IMDb Dataset, Anthropic HH Dataset, and AlpacaEval) demonstrate the proposed method’s superior performance in balancing alignment performance and model drift. Our code is open-sourced at https://github.com/zlj123-max/Ra-DPO.
当微调经过训练的大型语言模型(LLMS)与人类价值观和意图相匹配时,尽量扩大估计的奖励可以带来优异业绩,但也带来因偏离参考模型预期行为而带来的潜在风险。大多数现有方法通常引入KL差异以限制经过训练的模型与参考模型之间的偏差;然而,在某些应用中,这也许不够充分,需要严格的风险控制。在本文件中,我们引入了风险觉悟直接优化(Ra-DPO),这是一种新颖的办法,通过采用某种类嵌巢式风险计量措施,纳入风险意识。这种方法提出了有限的风险意识优势功能最大化问题,然后将布拉德利-Terry模型转换成象征性代表。客观功能在抑制经过训练的模型与参考模型之间的偏差的同时,使用一个顺序风险比比来控制,从而增强模型的风险意识。三种开放源数据集(IMDb Dataset、Athropic HDataset和AlpacaEval)的实验结果,展示了拟议的方法在Ormal-Oral-Axligal oral coal Progisal dal disal dismalgard) 和Reval disal disalgresligresligaldaldaldaldaldormald.
Article 144
Title@2025-05-29 (4): OTPTO: Joint Product Selection and Inventory Optimization in Fresh E-commerce Front-End Warehouses
Title: OTPTO: Joint Product Selection and Inventory Optimization in Fresh E-commerce Front-End Warehouses | OTPTO: Gemeinsame Produktauswahl und Bestandsoptimierung in Fresh E-Commerce Front-End Warehouses | OTPTO: 在新的电子商务前端仓库中联合产品选择和清单优化 2505.23421v1 |
Authors: Zheming Zhang, Yan Jiang, Qingshan Li, Ai Han
In China’s competitive fresh e-commerce market, optimizing operational strategies, especially inventory management in front-end warehouses, is key to enhancing customer satisfaction and gaining a competitive edge. Front-end warehouses are placed in residential areas to ensure the timely delivery of fresh goods and are usually small. This brings the challenge of deciding which goods to stock and in what quantities, taking into account capacity constraints. Traditional predict-then-optimize (PTO) methods address this issue by predicting sales and then deciding on inventory, but they often fail to align prediction with inventory goals and to prioritize consumer satisfaction. This paper proposes a multi-task Optimize-then-Predict-then-Optimize (OTPTO) approach that jointly optimizes product selection and inventory management, aiming to increase consumer satisfaction by maximizing the full order fulfillment rate. Our method employs a 0-1 mixed integer programming model OM1 to determine historically optimal inventory levels, and then uses a product selection model PM1 and a stocking model PM2 for prediction. The combined results are further refined through a post-processing algorithm OM2. Experimental results from JD.com’s 7Fresh platform demonstrate the robustness and significant advantages of our OTPTO method. Compared to the PTO approach, our OTPTO method substantially enhances the full order fulfillment rate by 4.34% (a relative increase of 7.05%) and narrows the gap to the optimal full order fulfillment rate by 5.27%. These findings substantiate the efficacy of the OTPTO method in managing inventory at front-end warehouses of fresh e-commerce platforms and provide valuable insights for future research in this domain.
在中国竞争激烈的生鲜电商市场中,优化运营策略,尤其是前置仓的库存管理,是提升客户满意度和获得竞争优势的关键。前置仓设在居民区,以确保生鲜商品的及时配送,通常规模较小。这带来了在容量约束下决定备哪些商品以及备多少的挑战。传统的先预测后优化(PTO)方法先预测销量再决定库存,往往无法使预测与库存目标保持一致,也没有优先考虑消费者满意度。本文提出一种多任务的先优化、再预测、后优化(OTPTO)方法,联合优化选品和库存管理,旨在通过最大化整单满足率来提高消费者满意度。我们的方法采用0-1混合整数规划模型OM1确定历史最优库存水平,然后使用选品模型PM1和备货模型PM2进行预测,并通过后处理算法OM2进一步优化组合结果。来自JD.com的7Fresh平台的实验结果证明了OTPTO方法的稳健性和显著优势。与PTO方法相比,OTPTO方法将整单满足率提高了4.34%(相对提升7.05%),并将与最优整单满足率的差距缩小了5.27%。这些发现证实了OTPTO方法在生鲜电商平台前置仓库存管理中的有效性,并为该领域的未来研究提供了有价值的见解。
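The objective named in the abstract, the full order fulfillment rate, can be illustrated with a minimal sketch. The function name, the SKU dictionaries, and the rule that an order consumes stock only when every line item is available are assumptions made for illustration; the paper's exact definition may differ.

```python
from typing import Dict, List


def full_order_fulfillment_rate(
    orders: List[Dict[str, int]], stock: Dict[str, int]
) -> float:
    """Fraction of orders whose every line item can be served from stock.

    Assumption of this sketch: orders arrive in list order, and an order
    consumes stock only if it can be fulfilled completely.
    """
    remaining = dict(stock)  # copy so the caller's inventory is untouched
    fulfilled = 0
    for order in orders:
        # An order counts only if all requested quantities are available.
        if all(remaining.get(sku, 0) >= qty for sku, qty in order.items()):
            for sku, qty in order.items():
                remaining[sku] -= qty
            fulfilled += 1
    return fulfilled / len(orders) if orders else 0.0
```

Under this definition a single missing item fails the whole order, which is why product selection and stocking quantities have to be decided jointly.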
Article 145
Title@2025-05-29 (4): Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Title: Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition | Probeneffiziente menschliche Bewertung großer Sprachmodelle durch maximalen Diskrepanzwettbewerb | 通过最大差异竞争对大语言模式进行抽样有效人力评价 2404.08008v2 |
Authors: Kehua Feng, Keyan Ding, Hongzhi Tan, Kede Ma, Zhihua Wang, Shuangquan Guo, Yuzhou Cheng, Ge Sun, Guozhou Zheng, Qiang Zhang, Huajun Chen
Reliable evaluation of large language models (LLMs) is impeded by two key challenges: objective metrics often fail to reflect human perception of natural language, and exhaustive human labeling is prohibitively expensive. Here, we propose a sample-efficient human evaluation method for LLMs based on the principle of MAximum Discrepancy (MAD) Competition. Our method automatically and adaptively selects a compact set of input instructions that maximize semantic discrepancy between pairs of LLM responses. Human evaluators then perform three-alternative forced choices on these paired responses, which are aggregated into a global ranking using Elo rating. We apply our approach to compare eight widely used LLMs across four tasks: scientific knowledge understanding, mathematical reasoning, creative and functional writing, and code generation and explanation. Experimental results show that our sample-efficient evaluation method recovers “gold-standard” model rankings with a handful of MAD-selected instructions, reveals respective strengths and weaknesses of each LLM, and offers nuanced insights to guide future LLM development. Code is available at https://github.com/weiji-Feng/MAD-Eval .
对大型语言模型(LLMS)的可靠评价受到两大挑战的阻碍:客观指标往往不能反映人类对自然语言的看法,而详尽的人类标签则极其昂贵。在这里,我们提议根据Meximum差异(MAD)竞争原则,对LLMS进行抽样有效的人类评价。我们的方法自动和适应性地选择一套紧凑的投入指示,最大限度地扩大LLM对口答复之间的语义差异。然后,人类评价者对这些配对反应进行三种选择性强迫选择,然后用Elo等级汇总为全球排名。我们采用我们的方法,将八种广泛使用的LLMS对四大任务进行比较:科学知识理解、数学推理、创造性和功能性写作、代码生成和解释。实验结果表明,我们的抽样有效评价方法恢复了“古老标准”模型的排名,并有少数MAD选定的指示,揭示了每个LM的长处和短处,并提供了细微的洞察见解,以指导未来的LM发展。代码可在https://github.com/weiji-Feng/MAD-Eval查阅。
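The aggregation step mentioned in the abstract can be sketched with a standard Elo update. This is a minimal illustration, not the authors' implementation: it assumes the three forced-choice alternatives are "first response better", "second response better", and "tie", encoded as 1.0, 0.0, and 0.5, and the K-factor and starting rating are arbitrary choices.

```python
from typing import Dict, Iterable, Tuple


def elo_rank(
    comparisons: Iterable[Tuple[str, str, float]],
    k: float = 32.0,
    base: float = 1000.0,
) -> Dict[str, float]:
    """Aggregate pairwise human judgments into Elo ratings.

    Each comparison is (model_a, model_b, outcome), where outcome is
    1.0 if A is preferred, 0.0 if B is preferred, and 0.5 for a tie.
    """
    ratings: Dict[str, float] = {}
    for a, b, outcome in comparisons:
        ra = ratings.setdefault(a, base)
        rb = ratings.setdefault(b, base)
        # Expected score of A under the logistic Elo model.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        ratings[a] = ra + k * (outcome - expected_a)
        ratings[b] = rb + k * ((1.0 - outcome) - (1.0 - expected_a))
    return ratings
```

Sorting the returned dictionary by rating gives the global ranking; each update is zero-sum, so the total rating mass stays constant.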
Article 146
Title@2025-05-29 (4): Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates
Title: Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates | Robustheitskongruente Adversarial Training für sicheres maschinelles Lernen Modellaktualisierungen | 安全机器学习模型更新的强力和共性安全机器学习模型自动培训 2402.17390v2 |
Authors: Daniele Angioni, Luca Demetrio, Maura Pintor, Luca Oneto, Davide Anguita, Battista Biggio, Fabio Roli
Machine-learning models demand periodic updates to improve their average accuracy, exploiting novel architectures and additional data. However, a newly updated model may commit mistakes the previous model did not make. Such misclassifications are referred to as negative flips, experienced by users as a regression of performance. In this work, we show that this problem also affects robustness to adversarial examples, hindering the development of secure model update practices. In particular, when updating a model to improve its adversarial robustness, previously ineffective adversarial attacks on some inputs may become successful, causing a regression in the perceived security of the system. We propose a novel technique, named robustness-congruent adversarial training, to address this issue. It amounts to fine-tuning a model with adversarial training, while constraining it to retain higher robustness on the samples for which no adversarial example was found before the update. We show that our algorithm and, more generally, learning with non-regression constraints, provide a theoretically grounded framework to train consistent estimators. Our experiments on robust models for computer vision confirm that both accuracy and robustness, even if improved after model update, can be affected by negative flips, and our robustness-congruent adversarial training can mitigate the problem, outperforming competing baseline methods.
机器学习模型要求定期更新,以提高其平均准确性,利用新建筑和额外数据。然而,新更新的模型可能犯前一个模型没有犯过的错误。这种错误分类被称作负翻转,用户作为业绩的倒退经历。在这项工作中,我们表明,这一问题还影响到对抗性实例的稳健性,妨碍制定安全的模型更新做法。特别是,在更新一个模型以提高其对抗性强力的模型时,以前对一些投入的无效对抗性攻击可能会成功,导致对系统安全感知的倒退。我们为解决这一问题提出了一种叫作 “ 稳健的对立式培训 “ 的新技术。它相当于用对抗性培训对模型进行微调,同时限制它在更新前没有找到对抗性范例的样本上保持更高的稳健性。我们表明,我们的算法,更一般地说,在不倒退的限制下学习,为培训一致的估算者提供了一个有理论基础的框架。我们关于稳健的计算机视觉模型的实验证实,既准确又稳健又稳健,即使在更新模型后改进了对立性基准,也会受到消极性的影响。
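The notion of a negative flip from the abstract can be made concrete with a short sketch. The function name and list-based interface are illustrative assumptions; the same counting applies to robustness if the "prediction" for each sample is taken under attack.

```python
from typing import Sequence


def negative_flip_rate(
    y_true: Sequence[int], old_pred: Sequence[int], new_pred: Sequence[int]
) -> float:
    """Fraction of samples the old model got right and the new model gets wrong."""
    if not (len(y_true) == len(old_pred) == len(new_pred)):
        raise ValueError("all inputs must have the same length")
    if not y_true:
        return 0.0
    flips = sum(
        1
        for y, old, new in zip(y_true, old_pred, new_pred)
        if old == y and new != y  # regression introduced by the update
    )
    return flips / len(y_true)
```

A new model can have higher average accuracy and still show a nonzero negative flip rate, which is the regression the abstract describes.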
Article 147
Title@2025-05-29 (4): Privacy Amplification by Structured Subsampling for Deep Differentially Private Time Series Forecasting
Title: Privacy Amplification by Structured Subsampling for Deep Differentially Private Time Series Forecasting | Datenschutzverstärkung durch strukturierte Subsampling für tief differential private Zeitreihen Forecasting | 以结构化的分抽样对深相异私人时间序列预测进行隐私放大 2502.02410v2 |
Authors: Jan Schuchardt, Mina Dalirrooyfard, Jed Guzelkabaagac, Anderson Schneider, Yuriy Nevmyvaka, Stephan Günnemann
Many forms of sensitive data, such as web traffic, mobility data, or hospital occupancy, are inherently sequential. The standard method for training machine learning models while ensuring privacy for units of sensitive information, such as individual hospital visits, is differentially private stochastic gradient descent (DP-SGD). However, we observe in this work that the formal guarantees of DP-SGD are incompatible with time-series-specific tasks like forecasting, since they rely on the privacy amplification attained by training on small, unstructured batches sampled from an unstructured dataset. In contrast, batches for forecasting are generated by (1) sampling sequentially structured time series from a dataset, (2) sampling contiguous subsequences from these series, and (3) partitioning them into context and ground-truth forecast windows. We theoretically analyze the privacy amplification attained by this structured subsampling to enable the training of forecasting models with sound and tight event- and user-level privacy guarantees. Towards more private models, we additionally prove how data augmentation amplifies privacy in self-supervised training of sequence models. Our empirical evaluation demonstrates that amplification by structured subsampling enables the training of forecasting models with strong formal privacy guarantees.
许多形式的敏感数据,如网络流量、流动数据或医院占用等,本质上是相继的。培训机器学习模式的标准方法,在确保敏感信息单位隐私(如个别医院访问)的同时,确保个人隐私的机器学习模式的标准方法,有差异的私人随机梯度下降(DP-SGD)。然而,我们在这项工作中观察到,DP-SGD的正式保障与诸如预测等特定时间序列任务不相容,因为它们依赖从一个非结构化数据集抽样的小型、非结构化批次培训所实现的隐私放大。相比之下,预测的批次是通过以下方式产生的:(1) 从数据集中抽样按顺序结构排列的时间序列,(2) 取样这些序列的连续序列,(3) 将其分割到上下文和地面的预测窗口。我们从理论上分析这种结构化的子样本所实现的隐私扩展,以便能够对预测模型进行稳妥、紧的事件和用户级隐私保障的培训。为了建立更隐秘的模型,我们进一步证明数据扩充如何在自我监督的序列模型培训中增强隐私。我们的经验性评估表明,通过结构化的子模型进行严格的预测。
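The three-step batch construction listed in the abstract can be sketched as follows. This shows only the sampling structure; it adds no gradient clipping or noise and does no privacy accounting. Sampling series with replacement and uniform start positions are assumptions of the sketch, and all names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sample_forecast_batch(
    dataset: Sequence[Sequence[float]],
    batch_size: int,
    context_len: int,
    forecast_len: int,
    rng: random.Random,
) -> List[Tuple[Sequence[float], Sequence[float]]]:
    """Build a batch of (context, forecast) windows by structured subsampling."""
    window = context_len + forecast_len
    eligible = [s for s in dataset if len(s) >= window]
    if not eligible:
        raise ValueError("no series is long enough for the requested window")
    batch = []
    for _ in range(batch_size):
        series = rng.choice(eligible)  # (1) sample a time series
        start = rng.randrange(len(series) - window + 1)
        sub = series[start:start + window]  # (2) contiguous subsequence
        # (3) partition into context and ground-truth forecast windows
        batch.append((sub[:context_len], sub[context_len:]))
    return batch
```

Because each sensitive event can appear in several overlapping windows of the same series, the amplification analysis for this scheme differs from that of unstructured batch sampling.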
Article 148
Title@2025-05-29 (4): On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists
Title: On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists | On-Device Collaborative Language Modeling über eine Mischung aus Generalisten und Spezialisten | 通过通识主义者和专家混合组合的在线合作语言建模 2409.13931v4 |
Authors: Dongyang Fan, Bettina Messmer, Nikita Doikov, Martin Jaggi
On-device LLMs have gained increasing attention for their ability to enhance privacy and provide a personalized user experience. To facilitate private learning with scarce data, Federated Learning has become a standard approach. However, it faces challenges such as computational resource heterogeneity and data heterogeneity among end users. We propose CoMiGS ($\textbf{Co}$llaborative learning with a $\textbf{Mi}$xture of $\textbf{G}$eneralists and $\textbf{S}$pecialists), the first approach to address both challenges. A key innovation of our method is the bi-level optimization formulation of the Mixture-of-Experts learning objective, where the router is optimized using a separate validation set to ensure alignment with the target distribution. We solve our objective with alternating minimization, for which we provide a theoretical analysis. Our method shares generalist experts across users while localizing a varying number of specialist experts, thereby adapting to users’ computational resources and preserving privacy. Through extensive experiments, we show that CoMiGS effectively balances general and personalized knowledge for each token generation. We demonstrate that CoMiGS remains robust against overfitting, owing to the generalists’ regularizing effect, while adapting to local data through specialist expertise. We open-source our codebase for collaborative LLMs.
端侧大语言模型(LLM)因能够增强隐私并提供个性化用户体验而日益受到关注。为了在数据稀缺的情况下实现隐私学习,联邦学习已成为一种标准方法。然而,它面临终端用户之间计算资源异构和数据异构等挑战。我们提出CoMiGS(Collaborative learning with a Mixture of Generalists and Specialists,通才与专才混合的协作学习),这是首个同时应对这两项挑战的方法。我们方法的一项关键创新是对混合专家(Mixture-of-Experts)学习目标进行双层优化建模,其中路由器使用单独的验证集进行优化,以确保与目标分布对齐。我们用交替最小化求解该目标,并给出了理论分析。我们的方法在用户之间共享通才专家,同时在本地保留数量可变的专才专家,从而适应用户的计算资源并保护隐私。通过大量实验,我们表明CoMiGS能够在每个词元的生成中有效平衡通用知识和个性化知识。我们证明,得益于通才的正则化作用,CoMiGS对过拟合保持稳健,同时通过专才的专长适应本地数据。我们开源了用于协作式LLM的代码库。
Article 149
Title@2025-05-29 (4): KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction
Title: KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction | KVzip: Query-Agnostic KV Cache-Kompression mit Kontext-Rekonstruktion | KVzip: 在背景重建中压缩缓存 2505.23416v1 |
Authors: Jang-Hyun Kim, Jinuk Kim, Sangwoo Kwon, Jae W. Lee, Sangdoo Yun, Hyun Oh Song
Transformer-based large language models (LLMs) cache context as key-value (KV) pairs during inference. As context length grows, KV cache sizes expand, leading to substantial memory overhead and increased attention latency. This paper introduces KVzip, a query-agnostic KV cache eviction method enabling effective reuse of compressed KV caches across diverse queries. KVzip quantifies the importance of a KV pair using the underlying LLM to reconstruct original contexts from cached KV pairs, subsequently evicting pairs with lower importance. Extensive empirical evaluations demonstrate that KVzip reduces KV cache size by 3-4$\times$ and FlashAttention decoding latency by approximately 2$\times$, with negligible performance loss in question-answering, retrieval, reasoning, and code comprehension tasks. Evaluations include various models such as LLaMA3.1-8B, Qwen2.5-14B, and Gemma3-12B, with context lengths reaching up to 170K tokens. KVzip significantly outperforms existing query-aware KV eviction methods, which suffer from performance degradation even at a 90% cache budget ratio under multi-query scenarios.
以变换器为基础的大语言缓存模型(LLMS)作为关键值(KV)在推断过程中的对等缓存环境。随着上下文长度的扩大,KV缓存规模扩大,导致大量存储管理费用,并增加关注时间。本文件介绍KVzip,这是一个查询-分析KV缓存驱逐方法,可以在不同查询中有效再利用压缩的KV缓存。KVzip量化了KV配对的重要性,使用基本的LLMM重建原始背景,从缓存的KV配对中重建原始背景,随后驱逐重要性较低的双对。广泛的实证评估表明,KVzip将KV缓存规模减少3-4美元,使KV缓存规模减少约2美元/美元,闪电控制解码拉长约2美元,在问答、检索、推理和代码理解任务中可忽略不计的绩效损失。评估包括各种模型,如LLLAMA3.1-8B、Quen2.5-14B和Gemma3-12B,其上的背景长度高达170K表示。 KVzip显著超出现有的查询-awa KV驱逐比例。在预算下,在90-V驱逐假设下,其业绩退化为90。
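The eviction step described in the abstract, dropping KV pairs with lower importance, can be sketched independently of how the scores are produced. In KVzip the scores come from having the LLM reconstruct the context; here they are simply an input list, and the function name and `keep_ratio` parameter are assumptions for illustration.

```python
import math
from typing import List, Sequence


def evict_kv(importance: Sequence[float], keep_ratio: float) -> List[int]:
    """Return the positions of KV pairs to keep, in their original order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not importance:
        return []
    n_keep = max(1, math.ceil(len(importance) * keep_ratio))
    # Rank positions by importance, highest first.
    ranked = sorted(
        range(len(importance)), key=lambda i: importance[i], reverse=True
    )
    # Restore positional order so the kept cache stays sequential.
    return sorted(ranked[:n_keep])
```

Because the scores do not depend on any particular query, the kept positions can be computed once and reused across queries, which is the query-agnostic property the abstract emphasizes.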
Article 150
Title@2025-05-29 (4): Bidirectional predictive coding
Title: Bidirectional predictive coding | Bidirektionale vorausschauende Kodierung | 双向预测双向预测编码 2505.23415v1 |
Authors: Gaspard Oliviers, Mufeng Tang, Rafal Bogacz
Predictive coding (PC) is an influential computational model of visual learning and inference in the brain. Classical PC was proposed as a top-down generative model, where the brain actively predicts upcoming visual inputs, and inference minimises the prediction errors. Recent studies have also shown that PC can be formulated as a discriminative model, where sensory inputs predict neural activities in a feedforward manner. However, experimental evidence suggests that the brain employs both generative and discriminative inference, while unidirectional PC models show degraded performance in tasks requiring bidirectional processing. In this work, we propose bidirectional PC (bPC), a PC model that incorporates both generative and discriminative inference while maintaining a biologically plausible circuit implementation. We show that bPC matches or outperforms unidirectional models in their specialised generative or discriminative tasks, by developing an energy landscape that simultaneously suits both tasks. We also demonstrate bPC’s superior performance in two biologically relevant tasks including multimodal learning and inference with missing information, suggesting that bPC resembles biological visual inference more closely.
预测编码(PC)是大脑中视觉学习和推断的有影响力的计算模型。古典PC被建议为自上而下的基因模型,大脑积极预测即将出现的视觉输入,推断最小化预测错误。最近的研究还显示,PC可以作为一种歧视模型,感官输入以进食方式预测神经活动。然而,实验证据表明,大脑同时使用基因和歧视性推断,而单向PC模型则显示需要双向处理的任务的性能退化。在这项工作中,我们提出了双向PC(BPC),这是一种包含基因化和歧视性推断的PC(PC),同时保持生物上可信的电路执行。我们表明,BPC通过开发既适合两种任务又适合两种任务的能源景观,其单向性模型与单向性模型相匹配或超出。我们还表明,BPC在两种与生物相关的任务中表现优异,包括多式学习和与缺失信息的推断,表明BPC更接近生物视觉。
Article 151
Title@2025-05-29 (4): Identification and Optimal Nonlinear Control of Turbojet Engine Using Koopman Eigenfunction Model
Title: Identification and Optimal Nonlinear Control of Turbojet Engine Using Koopman Eigenfunction Model | Identifizierung und optimale nichtlineare Steuerung der Turbojet-Engine mit Koopman Eigenfunktionsmodell | 使用 Koopman Eigen功能模型对涡轮喷气发动机进行最佳非线性识别和最佳非线性控制 2505.10438v2 |
Authors: David Grasev
Gas turbine engines represent complex, highly nonlinear dynamical systems. Deriving their physics-based models can be challenging because it requires performance characteristics, which are not always available, and one often has to make many simplifying assumptions. In this paper, the limitations of conventional experimental methods used to derive component-level and locally linear parameter-varying models are discussed and addressed by employing identification techniques based on data collected from standard engine operation under closed-loop control. The rotor dynamics were estimated using the sparse identification of nonlinear dynamics. Subsequently, the autonomous part of the dynamics was mapped into an optimally constructed Koopman eigenfunction space. The process included eigenvalue optimization using metaheuristic algorithms and temporal projection, followed by gradient-based eigenfunction identification. The resulting Koopman model was validated against an in-house reference component-level model. A globally optimal nonlinear feedback controller and a Kalman estimator were then designed in the eigenfunction space and compared to the classical and gain-scheduled proportional-integral controllers, as well as a proposed internal model control approach. The eigenmode structure allowed targeting individual modes during the optimization process, resulting in better performance tuning. The results showed that the Koopman-based controller outperformed the other benchmark controllers in both reference tracking and disturbance rejection, under sea-level and varying flight conditions, due to its global nature.
燃气涡轮发动机是复杂的高度非线性动态系统。推导其基于物理的模型颇具挑战性,因为这需要并非总能获得的性能特性,而且往往必须做出许多简化假设。本文讨论了用于推导部件级模型和局部线性参数变化模型的传统实验方法的局限性,并采用基于闭环控制下标准发动机运行所采集数据的辨识技术加以解决。转子动力学采用非线性动力学稀疏辨识方法进行估计。随后,动力学的自治部分被映射到最优构造的Koopman特征函数空间。该过程包括使用元启发式算法和时间投影进行特征值优化,随后进行基于梯度的特征函数辨识。所得Koopman模型以内部参考部件级模型为基准进行了验证。然后在特征函数空间中设计了全局最优非线性反馈控制器和Kalman估计器,并与经典的和增益调度的比例积分控制器以及所提出的内模控制方法进行了比较。特征模态结构允许在优化过程中针对单个模态,从而实现更好的性能整定。结果表明,由于其全局性,基于Koopman的控制器在海平面和变化飞行条件下的参考跟踪和扰动抑制方面均优于其他基准控制器。
Article 152
Title@2025-05-29 (4): Buffer-free Class-Incremental Learning with Out-of-Distribution Detection
Title: Buffer-free Class-Incremental Learning with Out-of-Distribution Detection | Pufferfreies Klassen-Inkrementelles Lernen mit Out-of-Distribution Detection | 含有扩散外检测检测的无缓缓度免费类级学习 2505.23412v1 |
Authors: Srishti Gupta, Daniele Angioni, Maura Pintor, Ambra Demontis, Lea Schönherr, Battista Biggio, Fabio Roli
Class-incremental learning (CIL) poses significant challenges in open-world scenarios, where models must not only learn new classes over time without forgetting previous ones but also handle inputs from unknown classes that a closed-set model would misclassify. Recent works address both issues by (i) training multi-head models using the task-incremental learning framework, and (ii) predicting the task identity employing out-of-distribution (OOD) detectors. While effective, the latter mainly relies on joint training with a memory buffer of past data, raising concerns around privacy, scalability, and increased training time. In this paper, we present an in-depth analysis of post-hoc OOD detection methods and investigate their potential to eliminate the need for a memory buffer. We uncover that these methods, when applied appropriately at inference time, can serve as a strong substitute for buffer-based OOD detection. We show that this buffer-free approach achieves comparable or superior performance to buffer-based methods both in terms of class-incremental learning and the rejection of unknown samples. Experimental results on CIFAR-10, CIFAR-100 and Tiny ImageNet datasets support our findings, offering new insights into the design of efficient and privacy-preserving CIL systems for open-world settings.
类增量学习(CIL)在开放世界场景中面临重大挑战:模型不仅要随时间学习新类别而不遗忘旧类别,还要处理封闭集模型会误分类的未知类别输入。近期工作通过以下方式同时解决这两个问题:(i) 使用任务增量学习框架训练多头模型;(ii) 利用分布外(OOD)检测器预测任务身份。后者虽然有效,但主要依赖于与过去数据的记忆缓冲区进行联合训练,引发了对隐私、可扩展性和训练时间增加的担忧。本文对事后(post-hoc)OOD检测方法进行了深入分析,并研究其消除记忆缓冲区需求的潜力。我们发现,这些方法在推理时恰当应用时,可以有力地替代基于缓冲区的OOD检测。我们表明,这种无缓冲区方法在类增量学习和拒绝未知样本两方面都取得了与基于缓冲区的方法相当或更优的性能。在CIFAR-10、CIFAR-100和Tiny ImageNet数据集上的实验结果支持了我们的结论,为面向开放世界环境设计高效且保护隐私的CIL系统提供了新的见解。
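How a post-hoc OOD score can stand in for task-identity prediction in a multi-head model is sketched below. The maximum softmax probability is used as the score because it is a widely known post-hoc baseline; whether it is among the detectors the paper analyzes is an assumption, as are the function names and the threshold.

```python
import math
from typing import List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one head's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_rejection(
    head_logits: Sequence[Sequence[float]], threshold: float
) -> Optional[Tuple[int, int]]:
    """Pick the most confident task head, or reject the input as unknown.

    Returns (task_id, class_id_within_task), or None when no head's
    maximum softmax probability reaches the threshold.
    """
    best: Optional[Tuple[int, int]] = None
    best_score = -1.0
    for task_id, logits in enumerate(head_logits):
        probs = softmax(logits)
        score = max(probs)  # post-hoc OOD score for this head
        if score > best_score:
            best_score = score
            best = (task_id, probs.index(score))
    if best is None or best_score < threshold:
        return None
    return best
```

Nothing in this procedure needs stored samples from earlier tasks, which is the sense in which a post-hoc detector is buffer-free.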
Article 153
Title@2025-05-29 (4): Video Editing for Audio-Visual Dubbing
Title: Video Editing for Audio-Visual Dubbing | Videobearbeitung für Audio-Visual-Dubbing | 音像视频编辑 2505.23406v1 |
Authors: Binyamin Manela, Sharon Gannot, Ethan Fetyaya
Visual dubbing, the synchronization of facial movements with new speech, is crucial for making content accessible across different languages, enabling broader global reach. However, current methods face significant limitations. Existing approaches often generate talking faces, hindering seamless integration into original scenes, or employ inpainting techniques that discard vital visual information like partial occlusions and lighting variations. This work introduces EdiDub, a novel framework that reformulates visual dubbing as a content-aware editing task. EdiDub preserves the original video context by utilizing a specialized conditioning scheme to ensure faithful and accurate modifications rather than mere copying. On multiple benchmarks, including a challenging occluded-lip dataset, EdiDub significantly improves identity preservation and synchronization. Human evaluations further confirm its superiority, achieving higher synchronization and visual naturalness scores compared to the leading methods. These results demonstrate that our content-aware editing approach outperforms traditional generation or inpainting, particularly in maintaining complex visual elements while ensuring accurate lip synchronization.
视觉遮盖,即面部遮盖与新语言同步,对于让不同语言的内容能够无障碍使用至关重要,使全球范围更加广泛。然而,目前的方法面临巨大的限制。现有的方法往往产生说话面孔,阻碍无缝融入原始场景,或采用丢弃重要视觉信息的油漆技术,如部分隔离和照明变异。这项工作引入了EdiDub,这是一个将视觉遮盖重新配置为内容识别编辑任务的新框架。EdiDub通过利用专门设置确保忠实和准确的修改而不是仅仅复制来保存原始视频环境。在多个基准上,包括具有挑战性的隐蔽滑坡数据集,EdiDub显著改进了身份保护和同步。人类评估进一步证实了其优越性,实现了更高的同步性和视觉自然性分数,而与主要方法相比,这些结果表明,我们的内容觉编辑方法超越了传统生成或画面,特别是在保持复杂的视觉元素的同时确保准确的唇同步。
Article 154
Title@2025-05-29 (4): A Refined Analysis of UCBVI
Title: A Refined Analysis of UCBVI | Eine raffinierte Analyse von UCBVI | UCBVI的精细分析 2502.17370v2 |
Authors: Simone Drago, Marco Mussi, Alberto Maria Metelli
In this work, we provide a refined analysis of the UCBVI algorithm (Azar et al., 2017), improving both the bonus terms and the regret analysis. Additionally, we compare our version of UCBVI with both its original version and the state-of-the-art MVP algorithm. Our empirical validation demonstrates that improving the multiplicative constants in the bounds has significant positive effects on the empirical performance of the algorithms.
在这项工作中,我们提供了对UCBVI算法(Azar等人,2017年)的精细分析,改进了奖金条件和遗憾分析。此外,我们比较了我们的UCBVI版本及其原始版本和最新的MVP算法。我们的经验验证表明,改进界限中的多倍常数对算法的经验性表现具有重大的积极影响。
Article 155
Title@2025-05-29 (4): Closed-form Solutions: A New Perspective on Solving Differential Equations
Title: Closed-form Solutions: A New Perspective on Solving Differential Equations | Closed-form Lösungen: Eine neue Perspektive zur Lösung von Differentialgleichungen | 封闭式解决办法:解决差异等量的新视角 2405.14620v3 |
Authors: Shu Wei, Yanjie Li, Lina Yu, Weijun Li, Min Wu, Linjun Sun, Jufeng Han, Yan Pang
The quest for analytical solutions to differential equations has traditionally been constrained by the need for extensive mathematical expertise. Machine learning methods like genetic algorithms have shown promise in this domain, but are hindered by significant computational time and the complexity of their derived solutions. This paper introduces SSDE (Symbolic Solver for Differential Equations), a novel reinforcement learning-based approach that derives symbolic closed-form solutions for various differential equations. Evaluations across a diverse set of ordinary and partial differential equations demonstrate that SSDE outperforms existing machine learning methods, delivering superior accuracy and efficiency in obtaining analytical solutions.
对不同方程式的分析性解决办法的寻求历来受到对广泛数学专门知识需要的制约,基因算法等机械学习方法在这一领域显示了希望,但受到大量计算时间及其衍生解决方案复杂性的阻碍,本文件介绍了SDE(不同等式的Symbolic Solveer),这是一种新型强化学习法,为各种差异方程式提供象征性的封闭式解决方案。 对各种普通和部分差异方程式的评价表明,SSSDE优于现有机器学习方法,在获得分析解决方案方面提供了更高的准确性和效率。
Article 156
Title@2025-05-29 (4): Subgroups Matter for Robust Bias Mitigation
Title: Subgroups Matter for Robust Bias Mitigation | Untergruppen Materie für robuste Bias Mitigation | 稳健的Biust Bias 减轻风险的分组事项 2505.21363v2 |
Authors: Anissa Alloula, Charles Jones, Ben Glocker, Bartłomiej W. Papież
Despite the constant development of new bias mitigation methods for machine learning, no method consistently succeeds, and a fundamental question remains unanswered: when and why do bias mitigation techniques fail? In this paper, we hypothesise that a key factor may be the often-overlooked but crucial step shared by many bias mitigation methods: the definition of subgroups. To investigate this, we conduct a comprehensive evaluation of state-of-the-art bias mitigation methods across multiple vision and language classification tasks, systematically varying subgroup definitions, including coarse, fine-grained, intersectional, and noisy subgroups. Our results reveal that subgroup choice significantly impacts performance, with certain groupings paradoxically leading to worse outcomes than no mitigation at all. Our findings suggest that observing a disparity between a set of subgroups is not a sufficient reason to use those subgroups for mitigation. Through theoretical analysis, we explain these phenomena and uncover a counter-intuitive insight that, in some cases, improving fairness with respect to a particular set of subgroups is best achieved by using a different set of subgroups for mitigation. Our work highlights the importance of careful subgroup definition in bias mitigation and presents it as an alternative lever for improving the robustness and fairness of machine learning models.
尽管不断为机器学习开发新的减少偏见方法,但没有方法始终成功,还有一个根本问题仍未解答:减少偏见技术何时和为何失败?在本文中,我们假设一个关键因素可能是许多减轻偏见方法(即分组的定义)经常被忽略但至关重要的步骤:分组的定义。为了调查这一点,我们全面评估了多种愿景和语言分类任务中最先进的减轻偏见方法,系统化的不同分组定义,包括粗糙、细细的、交叉的和吵闹的分组。我们的结果显示分组选择对业绩产生了重大影响,某些分组的偏差导致的结果反常,而不是完全没有缓解。我们的调查结果表明,观察一组分组之间的差异并不是利用这些分组进行减缓的充分理由。我们通过理论分析,解释这些现象并找出反直觉的洞察力,即在某些情况下,通过使用不同的分组来改进对特定分组的公平性是最好的办法。我们的工作强调了审慎的分组定义在减轻偏见方面的重要性,并把它作为提高机器学习模型的稳健性和公正性的替代杠杆。
Article 157
Title@2025-05-29 (4): Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments
Title: Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments | Entschlüsselung des Interplays zwischen Übertragungseffekten und Belohnungsautokorrelationen in Switchback-Experimenten | 在回转实验中解开结转效应与回转回实验中回调自动关系之间的交互作用 2403.17285v3 |
Authors: Qianglin Wen, Chengchun Shi, Ying Yang, Niansheng Tang, Hongtu Zhu
A/B testing has become the gold standard for policy evaluation in modern technological industries. Motivated by the widespread use of switchback experiments in A/B testing, this paper conducts a comprehensive comparative analysis of various switchback designs in Markovian environments. Unlike many existing works which derive the optimal design based on specific and relatively simple estimators, our analysis covers a range of state-of-the-art estimators developed in the reinforcement learning (RL) literature. It reveals that the effectiveness of different switchback designs depends crucially on (i) the size of the carryover effect and (ii) the auto-correlations among reward errors over time. Meanwhile, these findings are estimator-agnostic, i.e., they apply to most RL estimators. Based on these insights, we provide a workflow to offer guidelines for practitioners on designing switchback experiments in A/B testing.
A/B测试已成为现代技术产业政策评价的黄金标准,由于在A/B测试中广泛使用回转实验,本文件对Markovian环境中的各种回转设计进行了全面比较分析。与许多现有工程不同,这些工程根据具体和相对简单的估测器得出最佳设计,我们的分析涵盖在强化学习文献(RL)中开发的一系列最先进的估计器。它显示,不同的回转设计的有效性关键取决于(一) 转转效应的大小和(二) 随时间推移奖励错误之间的自动反差。同时,这些结论是估计式的,也就是说,这些结果适用于大多数RL估计器。基于这些认识,我们提供了一个工作流程,为A/B测试中设计回转实验的从业者提供指导方针。
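A switchback design alternates treatment and control over time on the same unit. The sketch below generates one such schedule; the function name and the fixed-interval alternation are illustrative assumptions, and the paper compares a broader family of designs.

```python
from typing import List


def switchback_schedule(horizon: int, switch_every: int, first: int = 1) -> List[int]:
    """Treatment assignment (1 = new policy, 0 = baseline) per time step.

    The assignment flips every `switch_every` steps, starting with `first`.
    """
    if switch_every < 1:
        raise ValueError("switch_every must be at least 1")
    if first not in (0, 1):
        raise ValueError("first must be 0 or 1")
    return [
        first if (t // switch_every) % 2 == 0 else 1 - first
        for t in range(horizon)
    ]
```

The interval `switch_every` is the design knob whose best value, according to the abstract, depends on the size of the carryover effect and on the autocorrelation of reward errors.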
Article 158
Title@2025-05-29 (4): Dynamic Estimation Loss Control in Variational Quantum Sensing via Online Conformal Inference
Title: Dynamic Estimation Loss Control in Variational Quantum Sensing via Online Conformal Inference | Dynamische Abschätzungsverlustkontrolle bei der variationalen Quantensensing über Online-Konforme Inferenz | 通过在线非正式推断在变化量测量中动态估计损失控制 2505.23389v1 |
Authors: Ivana Nikoloska, Hamdi Joudeh, Ruud van Sloun, Osvaldo Simeone
Quantum sensing exploits non-classical effects to overcome limitations of classical sensors, with applications ranging from gravitational-wave detection to nanoscale imaging. However, practical quantum sensors built on noisy intermediate-scale quantum (NISQ) devices face significant noise and sampling constraints, and current variational quantum sensing (VQS) methods lack rigorous performance guarantees. This paper proposes an online control framework for VQS that dynamically updates the variational parameters while providing deterministic error bars on the estimates. By leveraging online conformal inference techniques, the approach produces sequential estimation sets with a guaranteed long-term risk level. Experiments on a quantum magnetometry task confirm that the proposed dynamic VQS approach maintains the required reliability over time, while still yielding precise estimates. The results demonstrate the practical benefits of combining variational quantum algorithms with online conformal inference to achieve reliable quantum sensing on NISQ devices.
量子遥感利用非古典效应来克服古典传感器的局限性,其应用范围从引力波探测到纳米级成像等,然而,在噪音和取样装置上建立的实际量子传感器面临重大的噪音和取样限制,而目前的变量测量方法缺乏严格的性能保障。本文建议为VQS建立一个在线控制框架,以动态更新变量参数,同时在估算中提供确定性误差栏。通过利用在线符合性推论技术,该方法产生有保障长期风险水平的顺序估算数据集。量子磁测量任务实验证实,拟议的VQS动态方法在一段时间内保持必要的可靠性,同时仍然得出准确的估计数。结果显示,将变量算法与在线一致性推导法相结合,以在 NISQ设备上实现可靠的量子测量的实际好处。
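The long-run risk guarantee mentioned in the abstract can be illustrated with the basic online conformal update, in which the radius of the estimation set grows after a miss and shrinks otherwise. This is a generic sketch of that update rule, not the paper's VQS controller; the function name, step size, and use of absolute estimation errors are assumptions.

```python
from typing import List, Sequence, Tuple


def online_conformal_radii(
    errors: Sequence[float],
    alpha: float,
    step: float,
    initial_radius: float = 1.0,
) -> Tuple[List[float], float]:
    """Track an error-bar radius so the long-run miss rate approaches alpha.

    errors[t] is the estimation error at time t, revealed after the set
    [estimate - radius, estimate + radius] has been reported.
    Returns the radius used at each step and the empirical miss rate.
    """
    if not errors:
        raise ValueError("errors must be non-empty")
    radius = initial_radius
    radii: List[float] = []
    misses = 0
    for e in errors:
        radii.append(radius)
        miss = 1 if abs(e) > radius else 0
        misses += miss
        # Widen after a miss, narrow after a hit.
        radius += step * (miss - alpha)
    return radii, misses / len(errors)
```

For bounded errors, the gap between the empirical miss rate and `alpha` shrinks as the number of rounds grows, regardless of how the errors are generated. That is the kind of deterministic long-term guarantee the abstract refers to.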
Article 159
Title@2025-05-29 (4): BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction
Title: BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction | BatteryLife: Ein umfassender Datensatz und Benchmark für die Vorhersage der Akkulaufzeit | 电池寿命:电池寿命预测综合数据集和基准 2502.18807v4 |
Authors: Ruifeng Tan, Weixiang Hong, Jiayue Tang, Xibin Lu, Ruijun Ma, Xiang Zheng, Jia Li, Jiaqiang Huang, Tong-Yi Zhang
Battery Life Prediction (BLP), which relies on time series data produced by battery degradation tests, is crucial for battery utilization, optimization, and production. Despite impressive advancements, this research area faces three key challenges. Firstly, the limited size of existing datasets impedes insights into modern battery life data. Secondly, most datasets are restricted to small-capacity lithium-ion batteries tested in labs with limited diversity, raising concerns about the generalizability of findings. Thirdly, inconsistent and limited benchmarks across studies obscure the effectiveness of baselines and leave it unclear whether models popular in other time series fields are effective for BLP. To address these challenges, we propose BatteryLife, a comprehensive dataset and benchmark for BLP. BatteryLife integrates 16 datasets, offering 2.5 times the sample size of the previous largest dataset, and provides the most diverse battery life resource, with batteries from 8 formats, 59 chemical systems, 9 operating temperatures, and 421 charge/discharge protocols, including both laboratory and industrial tests. Notably, BatteryLife is the first to release battery life datasets of zinc-ion batteries, sodium-ion batteries, and industry-tested large-capacity lithium-ion batteries. With the comprehensive dataset, we revisit the effectiveness of baselines popular in this and other time series fields. Furthermore, we propose CyclePatch, a plug-in technique that can be employed in various neural networks. Extensive benchmarking of 18 methods reveals that models popular in other time series fields can be unsuitable for BLP, and that CyclePatch consistently improves model performance, establishing state-of-the-art benchmarks. Moreover, BatteryLife evaluates model performance across aging conditions and domains. BatteryLife is available at https://github.com/Ruifeng-Tan/BatteryLife.
电池寿命预测(BLP)依赖于电池退化测试产生的时间序列数据,对电池的利用、优化和生产至关重要。尽管已取得令人瞩目的进展,该研究领域仍面临三大挑战。首先,现有数据集规模有限,妨碍了对现代电池寿命数据的深入认识。其次,大多数数据集仅限于在实验室中、多样性有限的条件下测试的小容量锂离子电池,令人担忧研究结论的可推广性。第三,各研究之间的基准不一致且有限,使基线方法的有效性难以判断,也不清楚其他时间序列领域流行的模型是否对BLP有效。为应对这些挑战,我们提出了BatteryLife,一个面向BLP的综合数据集和基准。BatteryLife整合了16个数据集,样本量是此前最大数据集的2.5倍,并提供了最多样化的电池寿命资源,涵盖8种规格、59种化学体系、9种工作温度和421种充放电协议的电池,同时包括实验室测试和工业测试。值得注意的是,BatteryLife首次发布了锌离子电池、钠离子电池以及经工业测试的大容量锂离子电池的电池寿命数据集。基于这一综合数据集,我们重新审视了本领域及其他时间序列领域中流行基线的有效性。此外,我们提出了CyclePatch,一种可用于多种神经网络的插件式技术。对18种方法的广泛基准测试表明,其他时间序列领域流行的模型可能不适用于BLP,而CyclePatch能够持续提升模型性能,建立了最先进的基准。此外,BatteryLife还评估了模型在不同老化条件和领域下的性能。BatteryLife可在 https://github.com/Ruifeng-Tan/BatteryLife 获取。
Article 160
Title@2025-05-29 (4): A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers
Title: A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers | Eine statistische Lernperspektive zur halbdualen Neural Optimal Transport Solvers | 半对半对半的神经神经优化运输解决方案的统计学习视角 2502.01310v2 |
Authors: Roman Tarasov, Petr Mokrov, Milena Gazdieva, Evgeny Burnaev, Alexander Korotin
Neural network-based optimal transport (OT) is a recent and fruitful direction in the generative modeling community. It finds applications in fields such as domain translation, image super-resolution, and computational biology. Among the existing OT approaches, of considerable interest are adversarial minimax solvers based on semi-dual formulations of OT problems. While promising, these methods lack theoretical investigation from a statistical learning perspective. Our work fills this gap by establishing upper bounds on the generalization error of an approximate OT map recovered by the minimax quadratic OT solver. Importantly, the bounds we derive depend solely on some standard statistical and mathematical properties of the considered functional classes (neural nets). While our analysis focuses on the quadratic OT, we believe that similar bounds could be derived for the general OT case, paving a promising direction for future research.
以神经网络为基础的最佳运输(OT)是基因模型界最近的一个富有成果的方向。它发现它在各个领域的应用,例如域译、图像超分辨率、计算生物学和其他领域。在现有OT方法中,相当感兴趣的是基于半双向的OT问题配方的对抗性微轴求解器。这些方法虽然很有希望,但缺乏从统计学习角度进行理论调查的理论性研究。我们的工作填补了这一空白,确定了迷你麦角四极OT求解器所恢复的近似OT地图的概括性误差的上限。重要的是,我们获得的界限完全取决于所考虑的功能类(内网)的一些标准统计和数学特性。虽然我们的分析侧重于二次式OT,但我们认为,一般OT案例可以得出类似的界限,为未来的研究铺平了有希望的方向。
Article 161
Title@2025-05-29 (4): Automated Modeling Method for Pathloss Model Discovery
Title: Automated Modeling Method for Pathloss Model Discovery | Automatisierte Modellierungsmethode für Pathloss Model Discovery | 路径损耗模型发现的自动建模方法 2505.23383v1 |
Authors: Ahmad Anaqreh, Shih-Kai Chou, Mihael Mohorčič, Carolina Fortuna
Modeling propagation is the cornerstone for designing and optimizing next-generation wireless systems, with a particular emphasis on 5G and beyond era. Traditional modeling methods have long relied on statistic-based techniques to characterize propagation behavior across different environments. With the expansion of wireless communication systems, there is a growing demand for methods that guarantee the accuracy and interoperability of modeling. Artificial intelligence (AI)-based techniques, in particular, are increasingly being adopted to overcome this challenge, although the interpretability is not assured with most of these methods. Inspired by recent advancements in AI, this paper proposes a novel approach that accelerates the discovery of path loss models while maintaining interpretability. The proposed method automates the model formulation, evaluation, and refinement, facilitating model discovery. We evaluate two techniques: one based on Deep Symbolic Regression, offering full interpretability, and the second based on Kolmogorov-Arnold Networks, providing two levels of interpretability. Both approaches are evaluated on two synthetic and two real-world datasets. Our results show that Kolmogorov-Arnold Networks achieve R^2 values close to 1 with minimal prediction error, while Deep Symbolic Regression generates compact models with moderate accuracy. Moreover, on the selected examples, we demonstrate that automated methods outperform traditional methods, achieving up to 75% reduction in prediction errors, offering accurate and explainable solutions with potential to increase the efficiency of discovering next-generation path loss models.
传播建模是设计和优化下一代无线系统的基石,对5G及以后的时代尤为重要。传统建模方法长期依赖基于统计的技术来刻画不同环境中的传播行为。随着无线通信系统的扩展,对能够保证建模准确性和互操作性的方法的需求日益增长。基于人工智能(AI)的技术正越来越多地被用来应对这一挑战,但其中大多数方法无法保证可解释性。受 AI 最新进展的启发,本文提出了一种新方法,在保持可解释性的同时加快路径损耗模型的发现。该方法将模型的构建、评估和改进自动化,从而便于模型发现。我们评估了两种技术:一种基于深度符号回归(Deep Symbolic Regression),具有完全的可解释性;另一种基于 Kolmogorov-Arnold 网络,提供两个层次的可解释性。两种方法均在两个合成数据集和两个真实数据集上进行了评估。结果表明,Kolmogorov-Arnold 网络的 R^2 值接近1且预测误差极小,而深度符号回归生成的模型结构紧凑、精度中等。此外,在所选示例上,我们证明自动化方法优于传统方法,预测误差最多可降低75%,提供了准确且可解释的解决方案,有望提高发现下一代路径损耗模型的效率。
Article 162
Title@2025-05-29 (4): Tracking Progress Towards Sustainable Development Goal 6 Using Satellite Imagery
Title: Tracking Progress Towards Sustainable Development Goal 6 Using Satellite Imagery | Fortschritte auf dem Weg zu einer nachhaltigen Entwicklung verfolgen Ziel 6 Nutzung von Satellitenbildern | 利用卫星图像跟踪可持续发展目标6的进展情况 2411.19093v2 |
Authors: Othmane Echchabi, Aya Lahlou, Nizar Talty, Josh Malcolm Manto, Ka Leung Lam
Clean water and sanitation are essential for health, well-being, and sustainable development, yet significant global disparities persist. Although the United Nations’ Sustainable Development Goal (SDG) 6 clearly defines targets for universal access to clean water and sanitation, limitations in data coverage and openness impede accurate tracking of progress in many countries. To bridge these gaps, this study integrates Afrobarometer survey data, satellite imagery from Landsat 8 and Sentinel-2, and advanced deep learning techniques using Meta’s self-supervised Distillation with No Labels (DINO) model to develop a modeling framework for evaluating access to piped water and sewage system across diverse African regions. The modeling framework achieved notable accuracy, with over 96% for piped water and 97% for sewage system access classification. When combined with geospatial population data, validation against official statistics from the United Nations Joint Monitoring Program demonstrated high concordance at the national scale (R2 of 0.95 for piped water access and R2 of 0.85 for sewage system access). The national-level estimates can represent SDG Indicators 6.1.1 and 6.2.1. This approach provides policymakers and stakeholders with an effective, scalable, and cost-efficient tool to pinpoint underserved areas requiring targeted intervention. The methodology developed herein can be adapted for assessing other infrastructure-related SDGs, promoting enhanced monitoring and informed decision-making towards achieving global sustainability objectives.
清洁饮用水和卫生设施对健康、福祉和可持续发展至关重要,但全球范围内仍存在显著差距。虽然联合国可持续发展目标(SDG)6明确规定了普遍获得清洁饮用水和卫生设施的目标,但数据覆盖面和开放程度的局限妨碍了对许多国家进展情况的准确跟踪。为弥合这些差距,本研究整合了非洲晴雨表(Afrobarometer)调查数据、Landsat 8 和 Sentinel-2 的卫星图像,以及基于 Meta 自监督模型 DINO(Distillation with No Labels)的先进深度学习技术,建立了一个建模框架,用于评估非洲各地区自来水和污水系统的可及性。该框架取得了较高的准确率:自来水可及性分类超过96%,污水系统可及性分类为97%。结合地理空间人口数据后,与联合国联合监测方案官方统计数据的对比验证显示,国家层面的一致性很高(自来水可及性的 R2 为0.95,污水系统可及性的 R2 为0.85)。国家层面的估计值可用于表示 SDG 指标6.1.1和6.2.1。该方法为决策者和利益相关方提供了一种有效、可扩展且具成本效益的工具,用于找出需要针对性干预的服务不足地区。本文方法还可用于评估其他与基础设施相关的可持续发展目标,促进加强监测和知情决策,以实现全球可持续性目标。
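The national-scale agreement above is reported as R2. A minimal sketch of that statistic follows, assuming it denotes the standard coefficient of determination (the abstract does not say whether it is that or a squared correlation). The function name `r_squared` and all numbers are illustrative and are not taken from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: disagreement between estimate and reference.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the reference values around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up example: official vs. model-estimated national access rates.
official = [0.20, 0.35, 0.50, 0.65, 0.80]
estimated = [0.22, 0.33, 0.52, 0.61, 0.83]
print(round(r_squared(official, estimated), 3))
```

A value of 1 means the estimates reproduce the reference statistics exactly, and a value of 0 means they do no better than predicting the mean of the reference values.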
Article 163
Title@2025-05-29 (4): Meta-Learning Approaches for Speaker-Dependent Voice Fatigue Models
Title: Meta-Learning Approaches for Speaker-Dependent Voice Fatigue Models | Meta-Learning-Ansätze für Sprecher-Abhängige Sprachmüdigkeitsmodelle | 说话人相关语音疲劳模型的元学习方法 2505.23378v1 |
Authors: Roseline Polle, Agnes Norbury, Alexandra Livia Georgescu, Nicholas Cummins, Stefano Goria
Speaker-dependent modelling can substantially improve performance in speech-based health monitoring applications. While mixed-effect models are commonly used for such speaker adaptation, they require computationally expensive retraining for each new observation, making them impractical in a production environment. We reformulate this task as a meta-learning problem and explore three approaches of increasing complexity: ensemble-based distance models, prototypical networks, and transformer-based sequence models. Using pre-trained speech embeddings, we evaluate these methods on a large longitudinal dataset of shift workers (N=1,185, 10,286 recordings), predicting time since sleep from speech as a function of fatigue, a symptom commonly associated with ill-health. Our results demonstrate that all meta-learning approaches tested outperformed both cross-sectional and conventional mixed-effects models, with a transformer-based method achieving the strongest performance.
说话人相关建模可以大幅提高基于语音的健康监测应用的性能。混合效应模型常用于此类说话人自适应,但每有新的观测都需要进行计算代价高昂的重新训练,因此在生产环境中并不实用。我们将该任务重新表述为元学习问题,并探索了三种复杂度递增的方法:基于集成的距离模型、原型网络和基于 Transformer 的序列模型。我们使用预训练的语音嵌入,在一个大型轮班工人纵向数据集(N=1,185,10,286段录音)上评估这些方法,根据语音预测距上次睡眠的时间,以此作为疲劳(一种常与健康不佳相关的症状)的度量。结果表明,所有被测试的元学习方法都优于横截面模型和传统混合效应模型,其中基于 Transformer 的方法性能最强。
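Of the three approaches listed, prototypical networks are the easiest to illustrate. The sketch below shows only the generic prototype step (class means plus a softmax over negative squared distances). It is not the authors' model: the two labels, the 2-D embeddings and the function names are hypothetical, and the paper predicts time since sleep rather than a binary label.

```python
import math


def class_prototypes(support):
    """Mean embedding per label; `support` maps label -> list of vectors."""
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(v[d] for v in vectors) / len(vectors) for d in range(dim)
        ]
    return prototypes


def classify(query, prototypes):
    """Softmax over negative squared Euclidean distances to each prototype."""
    scores = {
        label: -sum((q - p) ** 2 for q, p in zip(query, proto))
        for label, proto in prototypes.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical per-speaker support set of 2-D speech embeddings.
support = {
    "rested": [[0.0, 0.0], [0.2, 0.0]],
    "fatigued": [[1.0, 1.0], [1.0, 0.8]],
}
prototypes = class_prototypes(support)
print(classify([0.2, 0.1], prototypes))
```

Because the prototypes are simple means of a speaker's own recordings, adapting to a new observation only requires recomputing a mean, which is the practical contrast with retraining a mixed-effects model.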
Article 164
Title@2025-05-29 (4): GWQ: Gradient-Aware Weight Quantization for Large Language Models
Title: GWQ: Gradient-Aware Weight Quantization for Large Language Models | GWQ: Gradient-Aware Weight Quantization für große Sprachmodelle | GWQ: 面向大语言模型的梯度感知权重量化 2411.00850v4 |
Authors: Yihua Shao, Yan Gu, Siyu Chen, Haiyang Liu, Zixian Zhu, Zijian Ling, Minxi Yan, Ziyang Yan, Chenyu Zhang, Michele Magno, Haotong Qin, Yan Wang, Jingcai Guo, Ling Shao, Hao Tang
Large language models (LLMs) show impressive performance in solving complex language tasks. However, their large number of parameters presents significant challenges for deployment. Compressing LLMs to low bit widths makes it possible to deploy them on resource-constrained devices. To address this problem, we propose gradient-aware weight quantization (GWQ), the first low-bit weight quantization approach that leverages gradients to localize outliers, requiring only a minimal amount of calibration data for outlier detection. GWQ preferentially retains the top 1% of outliers at FP16 precision, while the remaining non-outlier weights are stored in a low-bit format. We extensively evaluate GWQ on different tasks, including language modeling, grounding detection, massive multitask language understanding, and vision-language question answering. Results show that models quantized by GWQ perform better than those produced by other quantization methods. During the quantization process, GWQ needs only one calibration set to achieve effective quantization. In addition, GWQ achieves a 1.2x inference speedup compared with the original model and effectively reduces inference memory.
大型语言模型(LLMS)在解决复杂的语言任务方面表现出令人印象深刻的成绩。然而,它的大量参数对部署提出了巨大的挑战。因此,将LLMS压缩到低位位位上可以使资源受限制的装置得到部署。为了解决这个问题,我们提议了低位重量四分法(GWQ),即低位重量四分法(GWQ),这是利用梯度使外层局部化的首个量化方法,只需要最低限度的校准数据来进行外层检测。GWQ在FP16精确度上优先保留顶端的1外端值,而剩余的非外层重量则储存在低位。我们广泛评价GWQ的不同任务包括语言建模、地基探测、大型多任务语言理解和视觉语言问题及回答。结果显示,GWQ量化的模型比其他四分法方法效果更好。在四分法过程中,GWQ只需要一个校准装置来实现有效的夸度。此外,GWQ在与原始模型相比,实现了1.2x的推力速度,并有效地减少了内存。
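The idea in this abstract (keep the weights flagged by large gradients in high precision, store the rest in low bit width) can be sketched roughly as follows. This is an illustration under stated assumptions, not the GWQ implementation: the per-tensor symmetric grid, the 4-bit default, the use of unchanged Python floats to stand in for FP16, and the name `gradient_aware_quantize` are all hypothetical.

```python
def gradient_aware_quantize(weights, grads, outlier_frac=0.01, bits=4):
    """Keep the weights with the largest |gradient| unchanged and round the
    rest onto a symmetric uniform grid with `bits` bits."""
    if len(weights) != len(grads):
        raise ValueError("weights and grads must have equal length")
    n_outliers = max(1, int(len(weights) * outlier_frac))
    # Rank weight indices by gradient magnitude, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(grads[i]), reverse=True)
    outliers = set(ranked[:n_outliers])

    levels = 2 ** (bits - 1) - 1  # e.g. 7 positive levels for 4 bits
    max_abs = max((abs(w) for i, w in enumerate(weights) if i not in outliers),
                  default=0.0)
    scale = max_abs / levels if max_abs > 0 else 1.0

    quantized = []
    for i, w in enumerate(weights):
        if i in outliers:
            quantized.append(w)  # stands in for the high-precision copy
        else:
            q = max(-levels, min(levels, round(w / scale)))
            quantized.append(q * scale)  # dequantized low-bit value
    return quantized, outliers


# Made-up weights and calibration gradients; index 3 has the largest gradient.
weights = [0.10, -0.20, 0.05, 3.00, -0.15, 0.07, 0.12, -0.03]
grads = [0.01, 0.02, 0.01, 0.90, 0.03, 0.01, 0.02, 0.01]
quantized, outliers = gradient_aware_quantize(weights, grads, outlier_frac=0.125)
print(sorted(outliers), [round(q, 3) for q in quantized])
```

Excluding the outlier from the scale computation is what keeps the grid fine for the remaining weights: with the value 3.00 included, the step size here would be fifteen times coarser.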
Article 165
Title@2025-05-29 (4): Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective
Title: Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective | Das Nachdenken über die Auswahlkriterien bei der Stärkung des Lernens für LLM-Reasoning: Eine Kompetenz-Schwierigkeits-Alignment-Perspektive | 重新思考在加强学习学习中为LLM 合理性提供强化学习的抽样标准:能力-困难-协调观点 2505.17652v2 |
Authors: Deyang Kong, Qi Guo, Xiangyu Xi, Wei Wang, Jingang Wang, Xunliang Cai, Shikun Zhang, Wei Ye
Reinforcement learning exhibits potential in enhancing the reasoning abilities of large language models, yet it is hard to scale because of low sample efficiency during the rollout phase. Existing methods attempt to improve efficiency by scheduling problems based on problem difficulty. However, these approaches suffer from unstable and biased estimates of problem difficulty and fail to capture the alignment between model competence and problem difficulty in RL training, leading to suboptimal results. To tackle these limitations, this paper introduces Competence-Difficulty Alignment Sampling (CDAS), which enables accurate and stable estimation of problem difficulty by aggregating historical performance discrepancies of problems. The model competence is then quantified to adaptively select problems whose difficulty is aligned with the model’s current competence using a fixed-point system. Experimental results across a range of challenging mathematical benchmarks show that CDAS achieves great improvements in both accuracy and efficiency. CDAS attains the highest average accuracy against baselines and exhibits significant speed advantages compared to Dynamic Sampling, a competitive strategy in DAPO, which is 2.33 times slower than CDAS.
强化学习在提升大语言模型推理能力方面展现出潜力,但由于 rollout 阶段样本效率低而难以扩展。现有方法试图根据问题难度调度问题来提高效率。然而,这些方法对问题难度的估计不稳定且有偏,也未能刻画强化学习训练中模型能力与问题难度之间的对齐关系,导致结果欠佳。为克服这些局限,本文提出能力-难度对齐采样(Competence-Difficulty Alignment Sampling,CDAS),通过聚合问题的历史表现差异,实现对问题难度准确而稳定的估计。随后对模型能力进行量化,并借助不动点系统自适应地选择难度与模型当前能力相匹配的问题。在一系列具有挑战性的数学基准上的实验结果表明,CDAS 在准确率和效率上都有显著提升。CDAS 的平均准确率高于各基线方法,并且相比 DAPO 中有竞争力的策略 Dynamic Sampling 具有明显的速度优势,后者比 CDAS 慢2.33倍。
Article 166
Title@2025-05-29 (4): Dynamic Spectral Backpropagation for Efficient Neural Network Training
Title: Dynamic Spectral Backpropagation for Efficient Neural Network Training | Dynamische Spektral-Backpropagation für effizientes Neural-Netzwerk-Training | 面向高效神经网络训练的动态谱反向传播 2505.23369v1 |
Authors: Mannmohan Muthuraman
Dynamic Spectral Backpropagation (DSBP) enhances neural network training under resource constraints by projecting gradients onto principal eigenvectors, reducing complexity and promoting flat minima. Five extensions are proposed, dynamic spectral inference, spectral architecture optimization, spectral meta learning, spectral transfer regularization, and Lie algebra inspired dynamics, to address challenges in robustness, fewshot learning, and hardware efficiency. Supported by a third order stochastic differential equation (SDE) and a PAC Bayes limit, DSBP outperforms Sharpness Aware Minimization (SAM), Low Rank Adaptation (LoRA), and Model Agnostic Meta Learning (MAML) on CIFAR 10, Fashion MNIST, MedMNIST, and Tiny ImageNet, as demonstrated through extensive experiments and visualizations. Future work focuses on scalability, bias mitigation, and ethical considerations.
在资源限制下,动态光谱反后推进(DSBP)通过预测主要成分器的梯度、降低复杂性和促进平板微型微粒,增强神经网络培训;提议了五个扩展,即动态光谱推断、光谱结构优化、光谱元学习、光谱传输正规化和立叶代数激励动态,以应对稳健性、微小学习和硬件效率方面的挑战。在第三顺序分异差方程和PAC贝耶限制的支持下,DSBP在CIFAR 10、时装MIS、MMDMISIS和Tiny图像网络上,通过广泛的实验和视觉化来显示,DSBP优于敏化意识最小化(SAM)、低品位适应(LORA)和模型Agnictic Muta Learning(MAMML),未来工作的重点是可扩展性、减少偏见和道德考虑。
Article 167
Title@2025-05-29 (4): Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs
Title: Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs | Graph of Records: Steigerung der retrieval Augmented Generation für Langkontext-Zusammenfassung mit Graphen | 记录图图:用图表进行长文本摘要的推进检索增量生成器 2410.11001v2 |
Authors: Haozhen Zhang, Tao Feng, Jiaxuan You
Retrieval-augmented generation (RAG) has revitalized Large Language Models (LLMs) by injecting non-parametric factual knowledge. Compared with long-context LLMs, RAG is considered an effective summarization tool in a more concise and lightweight manner, which can interact with LLMs multiple times using diverse queries to get comprehensive responses. However, the LLM-generated historical responses, which contain potentially insightful information, are largely neglected and discarded by existing approaches, leading to suboptimal results. In this paper, we propose graph of records (GoR), which leverages historical responses generated by LLMs to enhance RAG for long-context global summarization. Inspired by the retrieve-then-generate paradigm of RAG, we construct a graph by establishing an edge between the retrieved text chunks and the corresponding LLM-generated response. To further uncover the intricate correlations between them, GoR features a graph neural network and an elaborately designed BERTScore-based objective for self-supervised model training, enabling seamless supervision signal backpropagation between reference summaries and node embeddings. We comprehensively compare GoR with 12 baselines across four long-context summarization datasets, and the results indicate that our proposed method reaches the best performance (e.g., 15%, 8%, and 19% improvement over retrievers w.r.t. Rouge-L, Rouge-1, and Rouge-2 on the WCEP dataset). Extensive experiments further demonstrate the effectiveness of GoR.
检索增强生成(RAG)通过注入非参数化的事实知识,为大语言模型(LLM)注入了新的活力。与长上下文 LLM 相比,RAG 被认为是一种更简洁、更轻量的有效摘要工具,它可以用不同的查询多次与 LLM 交互以获得全面的回答。然而,LLM 生成的历史回答包含可能有价值的信息,却在很大程度上被现有方法忽视和丢弃,导致结果欠佳。本文提出记录图(graph of records,GoR),利用 LLM 生成的历史回答来增强 RAG 在长上下文全局摘要任务中的表现。受 RAG “先检索后生成”(retrieve-then-generate)范式的启发,我们在检索到的文本块与相应的 LLM 生成回答之间建立边,从而构建一张图。为进一步挖掘它们之间的复杂关联,GoR 采用图神经网络和精心设计的基于 BERTScore 的目标进行自监督模型训练,使监督信号能够在参考摘要与节点嵌入之间顺畅地反向传播。我们在四个长上下文摘要数据集上将 GoR 与12个基线进行了全面比较,结果表明所提方法取得了最佳性能(例如,在 WCEP 数据集上,Rouge-L、Rouge-1 和 Rouge-2 相对于检索器分别提升15%、8%和19%)。大量实验进一步证明了 GoR 的有效性。
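The graph construction step described in this abstract (an edge between each retrieved text chunk and the LLM response generated from it) can be sketched in a few lines. Only the edge rule comes from the abstract; the record format, the identifiers and the name `build_record_graph` are hypothetical, and the graph neural network and BERTScore-based training are not shown.

```python
from collections import defaultdict


def build_record_graph(records):
    """Build an undirected graph linking each LLM response to the chunks
    that were retrieved to produce it.

    `records` is a list of (response_id, [chunk_id, ...]) pairs.
    """
    graph = defaultdict(set)
    for response_id, chunk_ids in records:
        for chunk_id in chunk_ids:
            graph[response_id].add(chunk_id)
            graph[chunk_id].add(response_id)
    return dict(graph)


# Hypothetical history: two earlier queries over a four-chunk document.
history = [
    ("resp_0", ["chunk_0", "chunk_2"]),
    ("resp_1", ["chunk_2", "chunk_3"]),
]
graph = build_record_graph(history)
print(sorted(graph["chunk_2"]))  # ['resp_0', 'resp_1']
```

A chunk that was never retrieved (here `chunk_1`) simply has no node, while a chunk shared by several responses becomes the link between those responses.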
Article 168
Title@2025-05-29 (4): Guarantees of a Preconditioned Subgradient Algorithm for Overparameterized Asymmetric Low-rank Matrix Recovery
Title: Guarantees of a Preconditioned Subgradient Algorithm for Overparameterized Asymmetric Low-rank Matrix Recovery | Garantien eines vorkonditionierten Subgradienten Algorithmus für überparameterisierte asymmetrische Low-rank Matrix Erholung | 保证为超参数化的测量性对称低级矩阵恢复提供先决条件的亚梯分算法的保障 2410.16826v2 |
Authors: Paris Giampouras, HanQin Cai, Rene Vidal
In this paper, we focus on a matrix factorization-based approach to recover low-rank asymmetric matrices from corrupted measurements. We propose an Overparameterized Preconditioned Subgradient Algorithm (OPSA) and provide, for the first time in the literature, linear convergence rates independent of the rank of the sought asymmetric matrix in the presence of gross corruptions. Our work goes beyond existing results in preconditioned-type approaches by addressing their current limitation, i.e., the lack of convergence guarantees in the case of asymmetric matrices of unknown rank. By applying our approach to (robust) matrix sensing, we highlight its merits when the measurement operator satisfies a mixed-norm restricted isometry property. Lastly, we present extensive numerical experiments that validate our theoretical results and demonstrate the effectiveness of our approach for different levels of overparameterization and outlier corruptions.
在本文中,我们侧重于一个基于要素化的矩阵化方法,从腐败的测量中恢复低级别(Iit 不对称)矩阵。我们建议采用“超度超标预设亚临界测算法 ” ( OPSA ) , 并在文献中首次规定,在出现严重腐败的情况下,线性趋同率独立于所寻求的非对称矩阵的等级。我们的工作超越了处理当前限制的前提条件型方法的现有结果,即:在未知级别( iit 不对称矩阵 ) 的情况下缺乏趋同保证。我们通过对( robust) 矩阵感测采用我们的方法,我们强调测量操作员满足混合的规范限制的测量属性的优点。最后,我们提出了广泛的数字实验,以证实我们的理论结果,并表明我们处理不同程度的过度定量和外部腐败的方法的有效性。
Article 169
Title@2025-05-29 (4): Grower-in-the-Loop Interactive Reinforcement Learning for Greenhouse Climate Control
Title: Grower-in-the-Loop Interactive Reinforcement Learning for Greenhouse Climate Control | Grower-in-the-Loop Interaktives Verstärkungslernen für Greenhouse Climate Control | 种植者在Loop-Loop 互动强化学习促进温室气候控制 2505.23355v1 |
Authors: Maxiu Xiao, Jianglin Lan, Jingxing Yu, Eldert van Henten, Congcong Sun
Climate control is crucial for greenhouse production as it directly affects crop growth and resource use. Reinforcement learning (RL) has received increasing attention in this field, but still faces challenges, including limited training efficiency and high reliance on initial learning conditions. Interactive RL, which combines human (grower) input with the RL agent’s learning, offers a potential solution to overcome these challenges. However, interactive RL has not yet been applied to greenhouse climate control and may face challenges related to imperfect inputs. Therefore, this paper aims to explore the possibility and performance of applying interactive RL with imperfect inputs into greenhouse climate control, by: (1) developing three representative interactive RL algorithms tailored for greenhouse climate control (reward shaping, policy shaping and control sharing); (2) analyzing how input characteristics are often contradicting, and how the trade-offs between them make grower’s inputs difficult to perfect; (3) proposing a neural network-based approach to enhance the robustness of interactive RL agents under limited input availability; (4) conducting a comprehensive evaluation of the three interactive RL algorithms with imperfect inputs in a simulated greenhouse environment. The demonstration shows that interactive RL incorporating imperfect grower inputs has the potential to improve the performance of the RL agent. RL algorithms that influence action selection, such as policy shaping and control sharing, perform better when dealing with imperfect inputs, achieving 8.4% and 6.8% improvement in profit, respectively. In contrast, reward shaping, an algorithm that manipulates the reward function, is sensitive to imperfect inputs and leads to a 9.4% decrease in profit. This highlights the importance of selecting an appropriate mechanism when incorporating imperfect inputs.
气候控制对温室生产至关重要,因为它直接影响作物生长和资源使用。强化学习(RL)在这一领域受到越来越多的关注,但仍面临训练效率有限、对初始学习条件依赖度高等挑战。交互式 RL 将人(种植者)的输入与 RL 智能体的学习结合起来,为克服这些挑战提供了一种可能的解决方案。然而,交互式 RL 尚未应用于温室气候控制,并且可能面临输入不完美带来的挑战。因此,本文旨在探讨在温室气候控制中应用带有不完美输入的交互式 RL 的可行性和性能,具体包括:(1)开发三种面向温室气候控制的代表性交互式 RL 算法(奖励塑形、策略塑形和控制共享);(2)分析输入的各项特性为何常常相互矛盾,以及它们之间的权衡如何使种植者的输入难以完美;(3)提出一种基于神经网络的方法,以增强交互式 RL 智能体在输入可得性有限时的鲁棒性;(4)在模拟温室环境中对带有不完美输入的三种交互式 RL 算法进行全面评估。结果表明,结合不完美种植者输入的交互式 RL 有望提升 RL 智能体的性能。影响动作选择的 RL 算法(如策略塑形和控制共享)在处理不完美输入时表现更好,利润分别提高8.4%和6.8%。相比之下,操纵奖励函数的奖励塑形算法对不完美输入较为敏感,导致利润下降9.4%。这凸显了在引入不完美输入时选择合适机制的重要性。
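The contrast drawn in this abstract is between mechanisms that change the reward (reward shaping) and mechanisms that change action selection (policy shaping, control sharing). A minimal sketch of the two ends of that contrast follows. The additive form, the `weight` and `trust` parameters and the function names are assumptions for illustration, not the paper's algorithms, and policy shaping is not shown.

```python
import random


def shaped_reward(env_reward, grower_signal, weight=0.5):
    """Reward shaping: the grower's signal is added to the environment reward,
    so a wrong signal alters the value the agent learns from."""
    return env_reward + weight * grower_signal


def shared_action(agent_action, grower_action, trust=0.8, rng=random):
    """Control sharing: with probability `trust`, execute the grower's
    suggested action (if one was given) instead of the agent's choice."""
    if grower_action is not None and rng.random() < trust:
        return grower_action
    return agent_action


# Made-up step: the agent wants to heat, the grower suggests venting.
print(shaped_reward(1.0, -2.0))        # environment reward offset by feedback
print(shared_action("heat", "vent"))   # "vent" with probability 0.8
print(shared_action("heat", None))     # no suggestion: the agent's action
```

In this sketch an imperfect suggestion under control sharing affects only the executed action at that step, whereas an imperfect signal under reward shaping changes the reward the agent trains on.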
Article 170
Title@2025-05-29 (4): ChatHuman: Chatting about 3D Humans with Tools
Title: ChatHuman: Chatting about 3D Humans with Tools | ChatHuman: Chatten über 3D-Menschen mit Tools | 聊天:用工具聊天关于3D人类 2405.04533v2 |
Authors: Jing Lin, Yao Feng, Weiyang Liu, Michael J. Black
Numerous methods have been proposed to detect, estimate, and analyze properties of people in images, including 3D pose, shape, contact, human-object interaction, and emotion. While widely applicable in vision and other areas, such methods require expert knowledge to select, use, and interpret the results. To address this, we introduce ChatHuman, a language-driven system that integrates the capabilities of specialized methods into a unified framework. ChatHuman functions as an assistant proficient in utilizing, analyzing, and interacting with tools specific to 3D human tasks, adeptly discussing and resolving related challenges. Built on a Large Language Model (LLM) framework, ChatHuman is trained to autonomously select, apply, and interpret a diverse set of tools in response to user inputs. Our approach overcomes significant hurdles in adapting LLMs to 3D human tasks, including the need for domain-specific knowledge and the ability to interpret complex 3D outputs. The innovations of ChatHuman include leveraging academic publications to instruct the LLM on tool usage, employing a retrieval-augmented generation model to create in-context learning examples for managing new tools, and effectively discriminating between and integrating tool results by transforming specialized 3D outputs into comprehensible formats. Experiments demonstrate that ChatHuman surpasses existing models in both tool selection accuracy and overall performance across various 3D human tasks, and it supports interactive chatting with users. ChatHuman represents a significant step toward consolidating diverse analytical methods into a unified, robust system for 3D human tasks.
已有大量方法被提出,用于检测、估计和分析图像中人物的属性,包括三维姿态、体型、接触、人-物交互和情绪。这些方法虽然在视觉及其他领域广泛适用,但需要专业知识才能选择、使用并解读其结果。为此,我们提出 ChatHuman,一个语言驱动的系统,将各种专用方法的能力整合到统一的框架中。ChatHuman 充当一名助手,擅长使用、分析并与三维人体任务的专用工具交互,能够讨论并解决相关难题。ChatHuman 构建于大语言模型(LLM)框架之上,经过训练后可根据用户输入自主选择、应用并解读多种工具。我们的方法克服了将 LLM 适配到三维人体任务时的重大障碍,包括对领域知识的需求以及解读复杂三维输出的能力。ChatHuman 的创新之处包括:利用学术论文指导 LLM 使用工具;采用检索增强生成模型为新工具构造上下文学习示例;以及通过把专用的三维输出转换为易于理解的格式,有效地区分并整合各工具的结果。实验表明,ChatHuman 在工具选择准确率和多种三维人体任务的整体性能上均超过现有模型,并支持与用户进行交互式对话。ChatHuman 是把多种分析方法整合为统一、稳健的三维人体任务系统的重要一步。
Article 171
Title@2025-05-29 (4): BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change
Title: BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change | BAH-Datensatz für Ambivalenz/Hesitanzerkennung in Videos für Verhaltensänderungen | BAH 行为变化视频中双向/隐私识别 BAH 数据集 2505.19328v2 |
Authors: Manuela González-González, Soufiane Belharbi, Muhammad Osama Zeeshan, Masoumeh Sharafi, Muhammad Haseeb Aslam, Marco Pedersoli, Alessandro Lameiras Koerich, Simon L Bacon, Eric Granger
Recognizing complex emotions linked to ambivalence and hesitancy (A/H) can play a critical role in the personalization and effectiveness of digital behaviour change interventions. These subtle and conflicting emotions are manifested by a discord between multiple modalities, such as facial and vocal expressions, and body language. Although experts can be trained to identify A/H, integrating them into digital interventions is costly and less effective. Automatic learning systems provide a cost-effective alternative that can adapt to individual users and operate seamlessly within real-time, resource-limited environments. However, there are currently no datasets available for the design of ML models to recognize A/H. This paper introduces the first Behavioural Ambivalence/Hesitancy (BAH) dataset, collected for subject-based multimodal recognition of A/H in videos. It contains videos from 224 participants of different ages and ethnicities, captured across 9 provinces in Canada. Through our web platform, we recruited participants to answer 7 questions, some of which were designed to elicit A/H, while recording themselves via webcam with microphone. BAH amounts to 1,118 videos for a total duration of 8.26 hours, with 1.5 hours of A/H. Our behavioural team annotated timestamp segments to indicate where A/H occurs, and provided frame- and video-level annotations with the A/H cues. Video transcripts and their timestamps are also included, along with cropped and aligned faces in each frame and a variety of participant metadata. We include baseline results for BAH at frame- and video-level recognition in multi-modal setups, in addition to zero-shot prediction, and for personalization using unsupervised domain adaptation. The limited performance of baseline models highlights the challenges of recognizing A/H in real-world videos. The data, code, and pretrained weights are available.
识别与矛盾心理和犹豫(A/H)相关的复杂情绪,对数字化行为改变干预的个性化和有效性至关重要。这些微妙而相互冲突的情绪表现为面部表情、声音表达和肢体语言等多种模态之间的不一致。虽然可以培训专家来识别 A/H,但把专家纳入数字化干预成本高且效果欠佳。自动学习系统提供了一种具成本效益的替代方案,能够适应个体用户,并在实时和资源受限的环境中顺畅运行。然而,目前还没有可用于设计 A/H 识别机器学习模型的数据集。本文提出首个行为矛盾/犹豫(BAH)数据集,用于视频中基于受试者的多模态 A/H 识别。该数据集包含来自加拿大9个省、不同年龄和族裔的224名参与者的视频。我们通过网络平台招募参与者回答7个问题,其中部分问题旨在引发 A/H,参与者同时用带麦克风的网络摄像头录制自己。BAH 共有1,118段视频,总时长8.26小时,其中 A/H 时长1.5小时。我们的行为学团队标注了时间戳片段以指明 A/H 出现的位置,并提供了帧级和视频级标注以及 A/H 线索。数据集还包括视频转录文本及其时间戳、每帧中裁剪并对齐的人脸,以及多种参与者元数据。我们给出了 BAH 在多模态设置下帧级和视频级识别的基线结果,以及零样本预测和使用无监督域适应进行个性化的基线结果。基线模型的性能有限,凸显了在真实世界视频中识别 A/H 的难度。数据、代码和预训练权重均已公开。
Article 172
Title@2025-05-29 (4): Towards Reward Fairness in RLHF: From a Resource Allocation Perspective
Title: Towards Reward Fairness in RLHF: From a Resource Allocation Perspective | Zur Belohnung Fairness in RLHF: Aus Ressourcenzuweisungsperspektive | 走向RLHF的奖励公平:从资源分配角度 2505.23349v1 |
Authors: Sheng Ouyang, Yulan Hu, Ge Chen, Qingyang Li, Fuzheng Zhang, Yong Liu
Rewards serve as proxies for human preferences and play a crucial role in Reinforcement Learning from Human Feedback (RLHF). However, if these rewards are inherently imperfect, exhibiting various biases, they can adversely affect the alignment of large language models (LLMs). In this paper, we collectively define the various biases present in rewards as the problem of reward unfairness. We propose a bias-agnostic method to address the issue of reward fairness from a resource allocation perspective, without specifically designing for each type of bias, yet effectively mitigating them. Specifically, we model preference learning as a resource allocation problem, treating rewards as resources to be allocated while considering the trade-off between utility and fairness in their distribution. We propose two methods, Fairness Regularization and Fairness Coefficient, to achieve fairness in rewards. We apply our methods in both verification and reinforcement learning scenarios to obtain a fairness reward model and a policy model, respectively. Experiments conducted in these scenarios demonstrate that our approach aligns LLMs with human preferences in a more fair manner.
奖赏是人类偏好的代言人,在加强从人类反馈中学习(RLHF)中发挥着关键作用。然而,如果这些奖赏本质上不完美,表现出各种偏见,则会对大型语言模式(LLMS)的匹配产生不利影响。在本文件中,我们共同将奖赏中存在的各种偏见定义为奖赏不公平的问题。我们建议一种不偏颇的方法,从资源分配的角度解决奖赏公平问题,而不具体设计每一种类型的偏见,而是有效地减轻这些偏见。具体地说,我们把优待学习作为一种资源分配问题,在考虑其分配的效用和公平之间取舍时,将奖赏作为应分配的资源处理。我们提出了两种方法,即公平性和公平性,以实现奖赏的公平性。我们用我们的方法来核查和加强学习情景,分别获得公平奖赏模式和政策模式。在这些情景中进行的实验表明,我们的做法以更公平的方式使LMs与人类的偏好相一致。
Article 173
Title@2025-05-29 (4): Sentinel: Scheduling Live Streams with Proactive Anomaly Detection in Crowdsourced Cloud-Edge Platforms
Title: Sentinel: Scheduling Live Streams with Proactive Anomaly Detection in Crowdsourced Cloud-Edge Platforms | Sentinel: Planung von Livestreams mit proaktiver Anomalieerkennung in Crowdsourced Cloud-Edge-Plattformen | 哨兵:将现场流排成日程,在人源云源云源平台上进行主动异常探测 2505.23347v1 |
Authors: Yuting Li, Shaoyuan Huang, Tengwen Zhang, Cheng Zhang, Xiaofei Wang, Victor C. M. Leung
With the rapid growth of live streaming services, Crowdsourced Cloud-edge service Platforms (CCPs) are playing an increasingly important role in meeting the increasing demand. Although stream scheduling plays a critical role in optimizing CCPs’ revenue, most optimization strategies struggle to achieve practical results due to various anomalies in unstable CCPs. Additionally, the substantial scale of CCPs magnifies the difficulties of anomaly detection in time-sensitive scheduling. To tackle these challenges, this paper proposes Sentinel, a proactive anomaly detection-based scheduling framework. Sentinel models the scheduling process as a two-stage Pre-Post-Scheduling paradigm: in the pre-scheduling stage, Sentinel conducts anomaly detection and constructs a strategy pool; in the post-scheduling stage, upon request arrival, it triggers an appropriate scheduling based on a pre-generated strategy to implement the scheduling process. Extensive experiments on realistic datasets show that Sentinel significantly reduces anomaly frequency by 70%, improves revenue by 74%, and doubles the scheduling speed.
随着现场流流服务的迅速增长,众源云端服务平台(CCP)在满足不断增长的需求方面发挥着越来越重要的作用。虽然流流时间安排在优化CP收入方面发挥着关键作用,但大多数优化战略都因不稳定的CP的异常而难以取得实际成果。此外,大量CCP扩大了在时间敏感时间安排中发现异常情况的困难。为了应对这些挑战,本文件建议哨兵,这是一个积极主动的异常检测列表框架。 哨兵将时间安排过程作为两个阶段的排期前模式:在排期前阶段,Sentinel进行异常检测,并建立一个战略集合;在排期后阶段,在接到要求后阶段,根据事先制定的战略,启动适当的时间安排,以实施排期进程。关于现实数据集的广泛实验显示,Sentinel显著减少异常频率70%,增加收入74%,并增加时间安排速度的两倍。
Article 174
Title@2025-05-29 (4): Graph Positional Autoencoders as Self-supervised Learners
Title: Graph Positional Autoencoders as Self-supervised Learners | Graphische Positionale Autoencoder als selbstüberwachte Lernende | 作为自监管学习者进行定位自动校对的图形图 2505.23345v1 |
Authors: Yang Liu, Deyu Bo, Wenxuan Cao, Yuan Fang, Yawen Li, Chuan Shi
Graph self-supervised learning seeks to learn effective graph representations without relying on labeled data. Among various approaches, graph autoencoders (GAEs) have gained significant attention for their efficiency and scalability. Typically, GAEs take incomplete graphs as input and predict missing elements, such as masked nodes or edges. While effective, our experimental investigation reveals that traditional node or edge masking paradigms primarily capture low-frequency signals in the graph and fail to learn the expressive structural information. To address these issues, we propose Graph Positional Autoencoders (GraphPAE), which employs a dual-path architecture to reconstruct both node features and positions. Specifically, the feature path uses positional encoding to enhance the message-passing processing, improving GAE’s ability to predict the corrupted information. The position path, on the other hand, leverages node representations to refine positions and approximate eigenvectors, thereby enabling the encoder to learn diverse frequency information. We conduct extensive experiments to verify the effectiveness of GraphPAE, including heterophilic node classification, graph property prediction, and transfer learning. The results demonstrate that GraphPAE achieves state-of-the-art performance and consistently outperforms baselines by a large margin.
图自监督学习旨在不依赖标注数据的情况下学习有效的图表示。在各种方法中,图自编码器(GAE)因其效率和可扩展性而备受关注。GAE 通常以不完整的图作为输入,并预测缺失的元素,例如被掩码的节点或边。我们的实验研究表明,传统的节点或边掩码范式虽然有效,但主要捕获图中的低频信号,未能学到富有表达力的结构信息。为解决这些问题,我们提出图位置自编码器(GraphPAE),采用双路径架构同时重建节点特征和位置。具体而言,特征路径利用位置编码增强消息传递过程,提高 GAE 预测被破坏信息的能力;位置路径则利用节点表示来细化位置并逼近特征向量,从而使编码器能够学习不同频率的信息。我们进行了大量实验来验证 GraphPAE 的有效性,包括异配节点分类、图属性预测和迁移学习。结果表明,GraphPAE 达到了最先进的性能,并始终大幅领先于各基线方法。
Article 175
Title@2025-05-29 (4): A Descriptor Is All You Need: Accurate Machine Learning of Nonadiabatic Coupling Vectors
Title: A Descriptor Is All You Need: Accurate Machine Learning of Nonadiabatic Coupling Vectors | Ein Deskriptor ist alles, was Sie brauchen: Genaues maschinelles Lernen von nichtadiabatischen Kupplungsvektoren | 描述符是你需要的:非非异相叠合矢量的精确机器学习 2505.23344v1 |
Authors: Jakub Martinka, Lina Zhang, Yi-Fan Hou, Mikołaj Martyka, Jiří Pittner, Mario Barbatti, Pavlo O. Dral
Nonadiabatic couplings (NACs) play a crucial role in modeling photochemical and photophysical processes with methods such as the widely used fewest-switches surface hopping (FSSH). There is therefore a strong incentive to machine learn NACs for accelerating simulations. However, this is challenging due to NACs’ vectorial, double-valued character and the singularity near a conical intersection seam. For the first time, we design NAC-specific descriptors based on our domain expertise and show that they allow learning NACs with never-before-reported accuracy of $R^2$ exceeding 0.99. The key to success is also our new ML phase-correction procedure. We demonstrate the efficiency and robustness of our approach on a prototypical example of fully ML-driven FSSH simulations of fulvene targeting the SA-2-CASSCF(6,6) electronic structure level. This ML-FSSH dynamics leads to an accurate description of $S_1$ decay while reducing error bars by allowing the execution of a large ensemble of trajectories. Our implementations are available in open-source MLatom.
在模拟光化学和光物理过程方面,非非异性联结(NACs)在模拟光化学和光物理过程方面发挥着关键作用,使用的方法包括广泛使用的最少开关的表面购物(FSSH)等。因此,有强大的动力为加速模拟而机械学习NACs学习NACs。然而,由于NACs的矢量性、双值性格和在锥形交叉接合处附近的独特性,这具有挑战性。我们首次根据我们的领域专长设计了针对NAC的专用描述器,并表明它们允许以从未报告过的精确度超过0.99美元为单位学习NACs,而从未报告精确度超过0.99美元。成功的关键也是我们新的ML阶段校正程序。我们展示了我们在完全ML驱动的FSSHSH模拟针对SA-2-CSSCSF(6,6)电子结构水平的原型例子方面的做法的效率和稳健。ML-FSSHSH的动态导致准确描述$_1美元的腐烂度,同时减少误差条,允许执行大型串径。我们可以在开放的MLS-L的操作中进行。
Article 176
Title@2025-05-29 (4): Matryoshka Model Learning for Improved Elastic Student Models
Title: Matryoshka Model Learning for Improved Elastic Student Models | Matryoshka Model Learning für verbesserte elastische Studentenmodelle | Matryoshka 改进弹性学生模式示范学习模式 2505.23337v1 |
Authors: Chetan Verma, Aditya Srinivas Timmaraju, Cho Jui-Hsieh, Suyash Damle, Ngot Bui, Yang Zhang, Wen Chen, Xin Liu, Prateek Jain, Inderjit S Dhillon
Industry-grade ML models are carefully designed to meet rapidly evolving serving constraints, which requires significant resources for model development. In this paper, we propose MatTA, a framework for training multiple accurate Student models using a novel Teacher-TA-Student recipe. TA models are larger versions of the Student models with higher capacity, and thus allow Student models to better relate to the Teacher model and also bring in more domain-specific expertise. Furthermore, multiple accurate Student models can be extracted from the TA model. Therefore, despite only one training run, our methodology provides multiple servable options to trade off accuracy for lower serving cost. We demonstrate the proposed method, MatTA, on proprietary datasets and models. Its practical efficacy is underscored by live A/B tests within a production ML system, demonstrating 20% improvement on a key metric. We also demonstrate our method on GPT-2 Medium, a public model, and achieve relative improvements of over 24% on SAT Math and over 10% on the LAMBADA benchmark.
工业级ML模型经过仔细设计,以适应迅速变化的服务限制,这需要大量资源用于模型开发。在本文件中,我们提议马特塔(MatTA),这是一个使用新型教师-TA-学生食谱培训多种准确学生模型的框架。TA模型是能力较高的学生模型的较大版本,从而使学生模型能够更好地与教师模型相联系,并带来更多的具体领域的专门知识。此外,可以从TA模型中提取多种准确的学生模型。因此,尽管只有一次培训,我们的方法为降低服务成本的准确性提供了多种易用选项。我们展示了拟议的方法,即专有数据集和模型的MatTA,其实际效力体现在生产ML系统内的活A/B测试中,显示关键指标的20%的改进。我们还展示了我们在GPT-2中度(公共模型)上的方法,并在SAT数学基准上实现了超过24%的相对改进,在LAMBADA基准上则超过10%。
Article 177
Title@2025-05-29 (4): X2Graph for Cancer Subtyping Prediction on Biological Tabular Data
Title: X2Graph for Cancer Subtyping Prediction on Biological Tabular Data | X2Graph für Krebs Subtyping Vorhersage auf biologische Tabellendaten | 用于对生物表表数据进行癌症子图谱预测的X2Graph 2505.23334v1 |
Authors: Tu Bui, Mohamed Suliman, Aparajita Haldar, Mohammed Amer, Serban Georgescu
Despite the transformative impact of deep learning on text, audio, and image datasets, its dominance in tabular data, especially in the medical domain where data are often scarce, remains less clear. In this paper, we propose X2Graph, a novel deep learning method that achieves strong performance on small biological tabular datasets. X2Graph leverages external knowledge about the relationships between table columns, such as gene interactions, to convert each sample into a graph structure. This transformation enables the application of standard message passing algorithms for graph modeling. Our X2Graph method demonstrates superior performance compared to existing tree-based and deep learning methods across three cancer subtyping datasets.
尽管深入学习对文本、音频和图像数据集产生了变革性影响,但其在表格数据中的主导地位,特别是在数据往往稀缺的医疗领域,仍然不那么清楚。在本论文中,我们提出了X2Graph,这是在小型生物表格数据集上取得强效的新颖的深层次学习方法。X2Graph利用关于表格列之间关系的外部知识,例如基因互动,将每个样本转换成图表结构。这一转变使得能够应用标准的信息传递算法进行图形建模。我们的X2Graph方法显示,与三个癌症子型数据集相比,与现有的基于树的深层次学习方法相比,其性能更高。
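The conversion described in the X2Graph abstract, turning one table row into a graph using known column relationships, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the column names, the interaction list, and the mean-aggregation step are all hypothetical.

```python
import numpy as np

def sample_to_graph(sample, columns, interactions):
    """Build (node_features, adjacency) for one table row.

    Each column becomes a node whose feature is the cell value; each known
    column-to-column relation becomes an undirected edge.
    """
    index = {name: i for i, name in enumerate(columns)}
    n = len(columns)
    node_features = np.asarray(sample, dtype=float).reshape(n, 1)
    adj = np.zeros((n, n))
    for a, b in interactions:
        if a in index and b in index:  # ignore relations outside the table
            adj[index[a], index[b]] = adj[index[b], index[a]] = 1.0
    return node_features, adj

def message_passing_step(x, adj):
    """One round of mean aggregation over each node and its neighbours."""
    a_hat = adj + np.eye(len(adj))  # add self-loops
    return (a_hat @ x) / a_hat.sum(axis=1, keepdims=True)

# Illustrative data only: three hypothetical gene columns, one interaction.
columns = ["gene_a", "gene_b", "gene_c"]
interactions = [("gene_a", "gene_b")]
x, adj = sample_to_graph([1.0, 3.0, 5.0], columns, interactions)
h = message_passing_step(x, adj)
```

Because the graph topology comes from external knowledge rather than from the data, every sample shares the same adjacency and differs only in node features.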
Article 178
Title@2025-05-29 (4): Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization
Title: Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization | Feintuning Next-Scale Visual Autoregressive Modelle mit gruppenrelativer Politikoptimierung | 采用群体相对政策优化优化的 下尺度视觉自动递减模型 2505.23331v1 |
Authors: Matteo Gallici, Haitz Sáez de Ocáriz Borde
Fine-tuning pre-trained generative models with Reinforcement Learning (RL) has emerged as an effective approach for aligning outputs more closely with nuanced human preferences. In this paper, we investigate the application of Group Relative Policy Optimization (GRPO) to fine-tune next-scale visual autoregressive (VAR) models. Our empirical results demonstrate that this approach enables alignment to intricate reward signals derived from aesthetic predictors and CLIP embeddings, significantly enhancing image quality and enabling precise control over the generation style. Interestingly, by leveraging CLIP, our method can help VAR models generalize beyond their initial ImageNet distribution: through RL-driven exploration, these models can generate images aligned with prompts referencing image styles that were absent during pre-training. In summary, we show that RL-based fine-tuning is both efficient and effective for VAR models, benefiting particularly from their fast inference speeds, which are advantageous for online sampling, an aspect that poses significant challenges for diffusion-based alternatives.
通过强化学习(RL),微调培训前的基因化模型已成为使产出更接近于细微人类偏好的一种有效方法。在本文中,我们调查了群体相对政策优化(GROP)模型的应用,以微调下一个规模的视觉自动递减(VAR)模型。我们的实证结果表明,这一方法使得与从美学预测器和CLIP嵌入中得出的复杂奖赏信号相匹配,大大提高了图像质量,并使得能够精确控制生成风格。 有趣的是,通过利用CLIP,我们的方法可以帮助VAR模型在最初的图像网络分布之外加以普及:通过RL驱动的探索,这些模型能够产生与预训练期间缺少的提示图像样式相匹配的速效图像。 总之,我们表明,基于RL的微调对于VAR模型既高效又有效,特别是从其快速推断速度中获益,这对在线取样有利,这是对基于传播的替代品构成重大挑战的一个方面。
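The core of GRPO is that each sampled output is scored relative to the other outputs drawn for the same prompt, so no learned value function is needed. The sketch below shows that normalization step only. The reward values are made up, and how rewards are derived from aesthetic predictors or CLIP is not specified in the abstract, so it is left out.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within each group (one row per prompt).

    advantage = (reward - group mean) / (group std + eps)
    """
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Illustrative rewards: 2 prompts, 3 sampled images each.
rewards = [[1.0, 2.0, 3.0],
           [0.5, 0.5, 0.5]]
adv = group_relative_advantages(rewards)
```

A group whose samples all receive the same reward yields zero advantages and therefore contributes no gradient signal, which is why reward functions with some spread inside each group matter in practice.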
Article 179
Title@2025-05-29 (4): Error Broadcast and Decorrelation as a Potential Artificial and Natural Learning Mechanism
Title: Error Broadcast and Decorrelation as a Potential Artificial and Natural Learning Mechanism | Fehlerübertragung und Decorrelation als potenzieller künstlicher und natürlicher Lernmechanismus | 错误 广播和装饰关系作为一种潜在的人工和自然学习机制 2504.11558v2 |
Authors: Mete Erdogan, Cengiz Pehlevan, Alper T. Erdogan
We introduce Error Broadcast and Decorrelation (EBD), a novel learning framework for neural networks that addresses credit assignment by directly broadcasting output errors to individual layers, circumventing weight transport of backpropagation. EBD is rigorously grounded in the stochastic orthogonality property of Minimum Mean Square Error estimators. This fundamental principle states that the error of an optimal estimator is orthogonal to functions of the input. Guided by this insight, EBD defines layerwise loss functions that directly penalize correlations between layer activations and output errors, thereby establishing a principled foundation for error broadcasting. This theoretically sound mechanism naturally leads to the experimentally observed three-factor learning rule and integrates with biologically plausible frameworks to enhance performance and plausibility. Numerical experiments demonstrate EBD’s competitive or better performance against other error-broadcast methods on benchmark datasets. Our findings establish EBD as an efficient, biologically plausible, and principled alternative for neural network training.
我们引入了错误广播和礼节关系(EBD),这是一个神经网络的新学习框架,它通过将输出错误直接广播到各个层,从而绕过反向传播的重量迁移,解决信用分配问题。EBD严格地植根于最低中平方错误估测器的随机或纵深属性。这一基本原则表明,最佳估测器的错误与输入的函数是正交错的。根据这一洞察,EBD定义了层值损失功能,直接惩罚层激活和输出错误之间的相互关系,从而为错误广播奠定了原则基础。这一理论上的健全机制自然导致实验性观测到的三要素学习规则,并与生物上可信的框架相结合,以提高性能和可信赖性。数字实验表明,EBD相对于基准数据集上的其他错误-路标方法具有竞争性或更好的性能。我们的调查结果将EBD确定为神经网络培训的一种高效、生物上合理和有原则的替代方法。
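To make the idea of a layerwise decorrelation loss concrete, here is a minimal sketch that penalizes the empirical cross-correlation between a layer's activations and the output errors over a batch. It assumes a plain squared Frobenius norm and raw activations; the abstract does not give the exact loss, so the function name and form are illustrative.

```python
import numpy as np

def decorrelation_loss(activations, errors):
    """Squared Frobenius norm of the batch cross-correlation matrix.

    activations: (batch, hidden) layer outputs
    errors:      (batch, out) output errors broadcast to this layer
    """
    h = np.asarray(activations, dtype=float)
    e = np.asarray(errors, dtype=float)
    cross = h.T @ e / h.shape[0]  # (hidden, out) correlation estimate
    return float(np.sum(cross ** 2))

# Illustrative batches of size 2.
loss_orthogonal = decorrelation_loss([[1.0], [-1.0]], [[1.0], [1.0]])
loss_correlated = decorrelation_loss([[1.0], [1.0]], [[1.0], [1.0]])
```

The loss is zero exactly when the error is uncorrelated with every hidden unit in the batch, which mirrors the orthogonality property of MMSE estimators that the abstract cites as motivation.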
Article 180
Title@2025-05-29 (4): Combinatorial Rising Bandit
Title: Combinatorial Rising Bandit | Kombinatorial Rising Bandit | 混合崛起强盗 2412.00798v3 |
Authors: Seockbean Song, Youngsik Yoon, Siwei Wang, Wei Chen, Jungseul Ok
Combinatorial online learning is a fundamental task for selecting the optimal action (or super arm) as a combination of base arms in sequential interactions with systems providing stochastic rewards. It is applicable to diverse domains such as robotics, social advertising, network routing, and recommendation systems. In many real-world scenarios, we often encounter rising rewards, where playing a base arm not only provides an instantaneous reward but also contributes to the enhancement of future rewards, e.g., robots improving their proficiency through practice, or social influence strengthening over a history of successful recommendations. Moreover, the enhancement of a single base arm may affect multiple super arms that include it, introducing complex dependencies that are not captured by existing rising bandit models. To address this, we introduce the Combinatorial Rising Bandit (CRB) framework and propose a provably efficient algorithm, Combinatorial Rising Upper Confidence Bound (CRUCB). We establish an upper bound on the regret of CRUCB and show that it is nearly tight by deriving a matching lower bound. In addition, we empirically demonstrate the effectiveness of CRUCB not only in synthetic environments but also in realistic applications of deep reinforcement learning.
混合在线学习是选择最佳行动(或超级臂)的一项基本任务,是选择最佳行动(或超级臂)作为基础臂与提供随机奖励的系统相继互动的一种组合。它适用于机器人、社会广告、网络路由和建议系统等不同领域。在许多现实世界情景中,我们经常遇到不断上升的奖励,玩一个基臂不仅能瞬间提供奖励,而且有助于提高今后的奖励,例如机器人通过实践和社会影响在成功建议的历史中强化了熟练程度。此外,加强一个单一基臂可能会影响多个超级臂,包括它,引入现有的不断上升的强盗模式无法捕捉到的复杂依赖性。为了解决这个问题,我们引入了组合上升强盗(CRB)框架,并提出了一种非常高效的算法,即组合式提高高度自信(CRUCB) 。我们建立了对CRUCB的上层约束,并表明通过得出相匹配的更低约束几乎是紧要紧的。此外,我们从经验上表明CRUCB不仅在合成环境中,而且还在深度强化学习的现实应用中证明了其有效性。
Article 181
Title@2025-05-29 (4): Efficient Parameter Estimation for Bayesian Network Classifiers using Hierarchical Linear Smoothing
Title: Efficient Parameter Estimation for Bayesian Network Classifiers using Hierarchical Linear Smoothing | Effiziente Parameterschätzung für Bayesian Network Klassifikatoren mit Hierarchical Linear Glättung | Bayesian 网络分类器使用等级线性线性平滑法的高效参数参数估测 2505.23320v1 |
Authors: Connor Cooper, Geoffrey I. Webb, Daniel F. Schmidt
Bayesian network classifiers (BNCs) possess a number of properties desirable for a modern classifier: They are easily interpretable, highly scalable, and offer adaptable complexity. However, traditional methods for learning BNCs have historically underperformed when compared to leading classification methods such as random forests. Recent parameter smoothing techniques using hierarchical Dirichlet processes (HDPs) have enabled BNCs to achieve performance competitive with random forests on categorical data, but these techniques are relatively inflexible, and require a complicated, specialized sampling process. In this paper, we introduce a novel method for parameter estimation that uses a log-linear regression to approximate the behaviour of HDPs. As a linear model, our method is remarkably flexible and simple to interpret, and can leverage the vast literature on learning linear models. Our experiments show that our method can outperform HDP smoothing while being orders of magnitude faster, remaining competitive with random forests on categorical data.
Bayesian网络分类(BNCs)拥有一些适合现代分类器的特性:这些特性易于解释,可高度缩放,具有适应性复杂性。然而,与随机森林等主要分类方法相比,传统的学习BNCs的方法在历史上表现不佳。最近使用分级Dirichlet工艺的参数平滑技术使Benesian网络分类器在绝对数据上实现了与随机森林的性能竞争,但这些技术相对不灵活,需要复杂的专门取样程序。在本文中,我们引入了一种新颖的参数估计方法,该方法使用日志线状回归法来近似HDPs的行为。作为一个线性模型,我们的方法非常灵活和简单,可以利用大量文献来解释线性模型。我们的实验表明,我们的方法可以超越HDP的光滑,同时速度更快,与任意森林的绝对数据保持竞争力。
Article 182
Title@2025-05-29 (4): A Straightforward Gradient-Based Approach for High-Tc Superconductor Design: Leveraging Domain Knowledge via Adaptive Constraints
Title: A Straightforward Gradient-Based Approach for High-Tc Superconductor Design: Leveraging Domain Knowledge via Adaptive Constraints | Ein einfacher gradient-basierter Ansatz für High-Tc-Supraleiter-Design: Nutzung von Domain-Wissen über adaptive Einschränkungen | 高Tc超级导体设计的直向渐进式高超导体设计方法:通过适应性制约因素利用域知识 2403.13627v2 |
Authors: Akihiro Fujii, Anh Khoa Augustin Lu, Koji Shimizu, Satoshi Watanabe
Materials design aims to discover novel compounds with desired properties. However, prevailing strategies face critical trade-offs. Conventional element-substitution approaches readily and adaptively incorporate various domain knowledge but remain confined to a narrow search space. In contrast, deep generative models efficiently explore vast compositional landscapes, yet they struggle to flexibly integrate domain knowledge. To address these trade-offs, we propose a gradient-based material design framework that combines these strengths, offering both efficiency and adaptability. In our method, chemical compositions are optimised to achieve target properties by using property prediction models and their gradients. In order to seamlessly enforce diverse constraints, including those reflecting domain insights such as oxidation states, discretised compositional ratios, types of elements, and their abundance, we apply masks and employ a special loss function, namely the integer loss. Furthermore, we initialise the optimisation using promising candidates from an existing dataset, effectively guiding the search away from unfavourable regions and thus helping to avoid poor solutions. Our approach demonstrates a more efficient exploration of superconductor candidates, uncovering candidate materials with higher critical temperatures than those found by conventional element-substitution and generative models. Importantly, it could propose new compositions beyond those found in existing databases, including new hydride superconductors that are absent from the training dataset but share compositional similarities with materials found in the literature. This synergy of domain knowledge and machine-learning-based scalability provides a robust foundation for rapid, adaptive, and comprehensive materials design for superconductors and beyond.
然而,主流战略面临着关键的权衡。常规元素替代方法很容易和适应性地融合了各种领域知识,但仍然局限于狭小的搜索空间。相比之下,深基因模型有效地探索了广泛的构成景观,但却努力灵活整合领域知识。为了解决这些权衡,我们提议了一个基于梯度的材料设计框架,将这些优势结合起来,既提供效率和适应性,又提供效率和适应性。在我们的方法中,化学成分最理想地通过使用财产预测模型及其梯度来实现目标属性。为了无缝地执行各种限制,包括反映诸如氧化状态、分解的构成比率、元素类型及其丰度等领域洞察力的制约。相比之下,我们应用了隐蔽式模型,并运用了特殊损失功能,即全方位损失。此外,我们更倾向于利用有希望的候选者,有效地指导从不利区域搜索,从而帮助避免问题。我们的方法表明,对超级导体候选者进行了更高效的探索,发现候选材料的温度比常规元素替代和基因化模型要高得多。重要的是,它可以提出超越快速设计模型的适应性结构,而从快速设计数据库中找到的弹性结构,其中包括在现有结构中找到的高级研究数据库中找到的弹性数据。
Article 183
Title@2025-05-29 (4): Enhancing Marker Scoring Accuracy through Ordinal Confidence Modelling in Educational Assessments
Title: Enhancing Marker Scoring Accuracy through Ordinal Confidence Modelling in Educational Assessments | Verbesserung der Genauigkeit der Markerbewertung durch ordinelles Vertrauensmodellierung in Bildungsbewertungen | 通过在教育评估中建立常规信任模型,加强标标码的准确度 2505.23315v1 |
Authors: Abhirup Chakravarty, Mark Brenchley, Trevor Breakspear, Ian Lewin, Yan Huang
A key ethical challenge in Automated Essay Scoring (AES) is ensuring that scores are only released when they meet high reliability standards. Confidence modelling addresses this by assigning a reliability estimate, in the form of a confidence score, to each automated score. In this study, we frame confidence estimation as a classification task: predicting whether an AES-generated score correctly places a candidate in the appropriate CEFR level. While this is a binary decision, we leverage the inherent granularity of the scoring domain in two ways. First, we reformulate the task as an n-ary classification problem using score binning. Second, we introduce a set of novel Kernel Weighted Ordinal Categorical Cross Entropy (KWOCCE) loss functions that incorporate the ordinal structure of CEFR labels. Our best-performing model achieves an F1 score of 0.97, and enables the system to release 47% of scores with 100% CEFR agreement and 99% with at least 95% CEFR agreement, compared to approximately 92% CEFR agreement from the standalone AES model where we release all AM predicted scores.
自动读取系统( AES) 中的一个关键道德挑战是确保分数只有在达到高可靠性标准时才释放出来。 信任建模通过给每个自动分分分配一个可靠性估计尺度, 以信任分的形式对每个自动分进行。 在本研究中, 我们将信任估测设定为分类任务: 预测 AES 生成的得分是否正确地将候选人置于适当的 CEFR 级别上。 虽然这是一个二进制决定, 我们以两种方式利用评分域固有的颗粒性。 首先, 我们使用分数宾点将任务重新表述为n- 分类问题。 第二, 我们推出一套包含 CEFR 标签的方形结构的新型 Kernelweighted Ordinal Categorical Entropy (KWOCCE) 损失函数。 我们最优秀的模型达到F1 0.97 分, 使系统能够以100% CEFR协议和99%的得分数发放47%, 至少95%的CEFRFR 协议 — 约92% (approxx) 。
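The release figures quoted in this abstract come from a trade-off between how many scores are released and how reliable the released scores are. A minimal sketch of that trade-off follows, assuming a simple confidence threshold as the release rule. The confidence values and correctness flags are made-up illustration data, and `release_stats` is a hypothetical helper rather than part of the paper.

```python
import numpy as np

def release_stats(confidence, correct, threshold):
    """Return (share of scores released, agreement among released scores)."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    released = confidence >= threshold
    coverage = float(released.mean())
    # Agreement is undefined when nothing is released.
    agreement = float(correct[released].mean()) if released.any() else float("nan")
    return coverage, agreement

# Illustrative data: 4 automated scores with confidence and CEFR correctness.
confidence = [0.90, 0.80, 0.40, 0.95]
correct = [1, 1, 0, 1]
coverage, agreement = release_stats(confidence, correct, threshold=0.85)
```

Sweeping `threshold` from high to low traces out the kind of curve summarized in the abstract: strict thresholds release fewer scores at higher agreement, and looser ones release more at lower agreement.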
Article 184
Title@2025-05-29 (4): Adversarial Semantic and Label Perturbation Attack for Pedestrian Attribute Recognition
Title: Adversarial Semantic and Label Perturbation Attack for Pedestrian Attribute Recognition | Adversariale Semantische und Label-Störung Angriff für Fußgänger Attribute Anerkennung | 对抗性语义和Label干扰攻击,以确认佩德斯特属性 2505.23313v1 |
Authors: Weizhe Kong, Xiao Wang, Ruichong Gao, Chenglong Li, Yu Zhang, Xing Yang, Yaowei Wang, Jin Tang
Pedestrian Attribute Recognition (PAR) is an indispensable task in human-centered research and has made great progress in recent years with the development of deep neural networks. However, the potential vulnerability and anti-interference ability have still not been fully explored. To bridge this gap, this paper proposes the first adversarial attack and defense framework for pedestrian attribute recognition. Specifically, we exploit both global- and patch-level attacks on the pedestrian images, based on the pre-trained CLIP-based PAR framework. It first divides the input pedestrian image into non-overlapping patches and embeds them into feature embeddings using a projection layer. Meanwhile, the attribute set is expanded into sentences using prompts and embedded into attribute features using a pre-trained CLIP text encoder. A multi-modal Transformer is adopted to fuse the obtained vision and text tokens, and a feed-forward network is utilized for attribute recognition. Based on the aforementioned PAR framework, we adopt the adversarial semantic and label-perturbation to generate the adversarial noise, termed ASL-PAR. We also design a semantic offset defense strategy to suppress the influence of adversarial attacks. Extensive experiments conducted on both digital domains (i.e., PETA, PA100K, MSP60K, RAPv2) and physical domains fully validated the effectiveness of our proposed adversarial attack and defense strategies for the pedestrian attribute recognition. The source code of this paper will be released on https://github.com/Event-AHU/OpenPAR.
Pedestrian 属性识别(PAR)是人类核心研究中不可或缺的一项任务,近年来随着深层神经网络的发展取得了巨大进展。然而,潜在的脆弱性和反干预能力仍未得到充分探讨。为弥合这一差距,本文件提出了第一个行人属性识别对抗攻击和防御框架。具体地说,我们利用预先培训的 CLIP PAR 框架对行人图像进行全球和补丁攻击,首先将行人图像分为非重叠的补丁,然后将其嵌入投影层的特征嵌入。与此同时,将属性组扩大为使用预培训的 CLIP 文本编码的提示和内嵌入属性特征。采用了多式变形变形变形器,将获得的愿景和文字符号结合起来,并使用进路图网络进行感化识别。基于上述PAR框架,我们采用对抗性隐性图和标签性变形图解来生成对抗性噪音,称为 ASL-PAR 。我们还在设计了100个直径攻击的直径防御战略。我们为SLIBRial-realal-Destral Arealalal restral Areal revistrational destrational restistraction prep press press 。我们进行了了数字式的磁性攻击。我们进行了对立体攻击。
Article 185
Title@2025-05-29 (4): Rethinking Gradient-Based Methods: Multi-Property Materials Design Beyond Differentiable Targets
Title: Rethinking Gradient-Based Methods: Multi-Property Materials Design Beyond Differentiable Targets | Rethinking Gradient-Based Methods: Multi-Property Materials Design Beyond Differentiable Targets | 重新思考渐进方法:超出可区别目标的多财产材料设计 2410.08562v4 |
Authors: Akihiro Fujii, Yoshitaka Ushiku, Koji Shimizu, Anh Khoa Augustin Lu, Satoshi Watanabe
Gradient-based methods offer a simple, efficient strategy for materials design by directly optimizing candidates using gradients from pretrained property predictors. However, their use in crystal structure optimization is hindered by two key challenges: handling non-differentiable constraints, such as charge neutrality and structural fidelity, and susceptibility to poor local minima. We revisit and extend the gradient-based methods to address these issues. We propose Simultaneous Multi-property Optimization using Adaptive Crystal Synthesizer (SMOACS), which integrates oxidation-number masks and template-based initialization to enforce non-differentiable constraints, avoid poor local minima, and flexibly incorporate additional constraints without retraining. SMOACS enables multi-property optimization, including exceptional targets such as high-temperature superconductivity, and scales to large crystal systems, both persistent challenges for generative models, even those enhanced with gradient-based guidance from property predictors. In experiments on five target properties and three datasets, SMOACS outperforms generative models and Bayesian optimization methods, successfully designing 135-atom perovskite structures that satisfy multiple property targets and constraints, a task at which the other methods fail entirely.
以梯度为基础的方法为材料设计提供了一种简单、有效的战略,即直接优化候选人使用来自预先培训的财产预测器的梯度来优化材料设计;然而,晶体结构优化中的使用却受到两大挑战的阻碍:处理非差别的限制,如收费中立性和结构忠诚性,以及易受当地低劣微粒的影响。我们重新审视并推广基于梯度的方法,以解决这些问题。我们提议使用适应性水晶合成器(SMOACS),即结合氧化数字面罩和基于模板的初始化,以实施不可区别的限制,避免当地微小的差,灵活地纳入额外的限制,而不进行再培训。SMOACS能够实现多丙型优化,包括高温超导力和尺度等特殊目标,将其推广到大型晶体系统,两者都对基因化模型构成持续的挑战,即使是用基于梯度的预测器(SMOACS)对五个目标特性和三个数据集进行实验,SMOACS优于基因化模型和Bayes最优化方法,成功地设计了135-Atomat系统的其他限制,从而完全满足了多重目标。
Article 186
Title@2025-05-29 (4): Score-based Generative Modeling for Conditional Independence Testing
Title: Score-based Generative Modeling for Conditional Independence Testing | Score-basierte Generative Modellierung für die Prüfung der bedingten Unabhängigkeit | 有条件独立测试基于记分率生成模型 2505.23309v1 |
Authors: Yixin Ren, Chenghou Jin, Yewei Xia, Li Ke, Longtao Huang, Hui Xue, Hao Zhang, Jihong Guan, Shuigeng Zhou
Determining conditional independence (CI) relationships between random variables is a fundamental yet challenging task in machine learning and statistics, especially in high-dimensional settings. Existing generative model-based CI testing methods, such as those utilizing generative adversarial networks (GANs), often struggle with undesirable modeling of conditional distributions and training instability, resulting in subpar performance. To address these issues, we propose a novel CI testing method via score-based generative modeling, which achieves precise Type I error control and strong testing power. Concretely, we first employ a sliced conditional score matching scheme to accurately estimate the conditional score and use Langevin dynamics conditional sampling to generate null hypothesis samples, ensuring precise Type I error control. Then, we incorporate a goodness-of-fit stage into the method to verify generated samples and enhance interpretability in practice. We theoretically establish the error bound of conditional distributions modeled by score-based generative models and prove the validity of our CI tests. Extensive experiments on both synthetic and real-world datasets show that our method significantly outperforms existing state-of-the-art methods, providing a promising way to revitalize generative model-based CI testing.
确定随机变量之间的有条件独立(CI)关系,是机器学习和统计方面,特别是在高维环境中,一项根本性但具有挑战性的任务。现有的基于基因模型的CI测试方法,例如使用基因对抗网络(GANs),常常与不可取的有条件分布模式和培训不稳定性模型进行斗争,从而导致低级性能。为了解决这些问题,我们提议通过基于分数的基因化模型进行新的CI测试方法,实现精确的I型错误控制和强力测试能力。具体地说,我们首先采用切片有条件得分比对方案,准确估计有条件得分,并利用Langevin动态的有条件抽样来生成完全的假设样本,确保准确的I型错误控制。然后,我们把一个合适的阶段纳入核实生成样本和加强实际解释性的方法中。我们理论上确定由基于分谱的基因化模型模型模型模型模型进行有条件分布的错误,并证明我们的CI测试的有效性。对合成和真实世界数据集进行的广泛实验表明,我们的方法大大超越了现有的状态方法,提供了振兴基于基因模型的有希望的方法。
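Langevin dynamics conditional sampling, as mentioned in the abstract, draws samples of X given Z by following the conditional score with added noise. The sketch below uses the unadjusted Langevin update on a toy problem where the conditional score is known in closed form (X given Z=z is Gaussian with mean 2z and unit variance). That closed-form score is an assumption made for illustration; the paper estimates the score with sliced conditional score matching instead.

```python
import numpy as np

def langevin_conditional_sample(score_fn, z, n_steps=400, step=0.05, rng=None):
    """Unadjusted Langevin dynamics targeting p(x | z), one chain per z.

    Update: x <- x + (step / 2) * score(x, z) + sqrt(step) * noise
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(z.shape)  # arbitrary initialization
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        x = x + 0.5 * step * score_fn(x, z) + np.sqrt(step) * noise
    return x

def toy_score(x, z):
    """Score of N(2z, 1) in x: d/dx log p(x | z) = -(x - 2z)."""
    return -(x - 2.0 * z)

rng = np.random.default_rng(0)
z = rng.standard_normal(5000)
x = langevin_conditional_sample(toy_score, z, rng=rng)
residual = x - 2.0 * z  # should look approximately standard normal
```

Samples produced this way depend on Z only through the conditional distribution, which is what makes them usable as draws under the null hypothesis of conditional independence. The finite step size introduces a small bias in the variance, so the step and the number of iterations are tuning choices.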
Article 187
Title@2025-05-29 (4): MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction
Title: MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction | MGE-LDM: Gemeinsame Latente Diffusion für simultane Musikgeneration und Quellenextraktion | MGE-LDM:同时制作音乐和来源采掘联合前期传播 2505.23305v1 |
Authors: Yunkee Chae, Kyogu Lee
We present MGE-LDM, a unified latent diffusion framework for simultaneous music generation, source imputation, and query-driven source separation. Unlike prior approaches constrained to fixed instrument classes, MGE-LDM learns a joint distribution over full mixtures, submixtures, and individual stems within a single compact latent diffusion model. At inference, MGE-LDM enables (1) complete mixture generation, (2) partial generation (i.e., source imputation), and (3) text-conditioned extraction of arbitrary sources. By formulating both separation and imputation as conditional inpainting tasks in the latent space, our approach supports flexible, class-agnostic manipulation of arbitrary instrument sources. Notably, MGE-LDM can be trained jointly across heterogeneous multi-track datasets (e.g., Slakh2100, MUSDB18, MoisesDB) without relying on predefined instrument categories. Audio samples are available at our project page: https://yoongi43.github.io/MGELDM_Samples/.
我们提出MGE-LDM,这是同步音乐生成、源估算和查询源分离的统一潜在扩散框架。与以往限制固定仪器类别的做法不同,MGE-LDM学会了在单一紧凑潜在扩散模型中对全部混合物、亚混合物和单个源头进行联合分布。根据推断,MGE-LDM使(1) 完全的混合物生成,(2) 部分生成(即源估算)和(3) 任意源的文字提取。通过在潜在空间中将分离和估算作为有条件的油漆任务,我们的方法支持对任意仪器源进行灵活、类分类处理。值得注意的是,MGE-LDM可以在不依赖预定仪器类别的情况下,在多轨数据集(例如,Slakh2100, MUSDB18, MoisesDBDB)之间联合进行培训。我们的项目网页:https://yogi43.github.io/MGELDM_Samples/。
Article 188
Title@2025-05-29 (4): Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models
Title: Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models | Verstehen und Abmildern von Fehlkalibrierung bei sofortiger Tuning für Vision-Language-Modelle | 理解和减缓视觉语言模型快速开票时的误差 2410.02681v4 |
Authors: Shuoyuan Wang, Yixuan Li, Hongxin Wei
Confidence calibration is critical for the safe deployment of machine learning models in the real world. However, this issue has not been fully addressed in vision-language models like CLIP, particularly after fine-tuning. In this work, we demonstrate that existing prompt tuning methods usually lead to a trade-off of calibration between base and new classes: the cross-entropy loss in CoOp causes overconfidence in new classes by increasing textual label divergence, whereas the regularization of KgCoOp maintains the confidence level but results in underconfidence in base classes due to the improved accuracy. Inspired by these observations, we introduce Dynamic Outlier Regularization (DOR) to ensure the confidence calibration on both base and new classes after fine-tuning. In particular, we propose to minimize the feature deviation of novel textual labels (instead of base classes) sampled from a large vocabulary. In effect, DOR prevents the increase in textual divergence for new labels while easing restrictions on base classes. Extensive experiments demonstrate that DOR can enhance the calibration performance of current fine-tuning methods on base and new classes.
信任度校准对于在现实世界中安全部署机器学习模型至关重要,然而,CLIP等视觉语言模型中的这类问题,特别是在微调后,还没有得到充分解决。在这项工作中,我们证明现有的快速调试方法通常导致基准和新等级之间校准的权衡:Coop中的交叉有机物损失通过增加文字标签差异造成新类别中的不信任过大,而KgCoOOp的正规化则维持了信任水平,但由于基础等级的准确性提高而导致信任度不足。在观察的启发下,我们引入了动态外部常规化(DOR)以确保在微调后对基础和新等级的校准。特别是,我们建议尽可能减少从大词汇中抽样的新文本标签(而不是基础等级)的特征偏差。实际上,DOR防止新标签的文字差异增加,同时放宽对基础等级的限制。广泛的实验表明,DOR可以提高目前在基础和新等级的校准方法的校准性。
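Overconfidence and underconfidence, as discussed in this abstract, are usually quantified with the expected calibration error (ECE), which compares average confidence with accuracy inside confidence bins. The sketch below is the standard equal-width binning version. The abstract does not state which calibration metric or bin count the authors use, so both are assumptions here, and the inputs are illustrative.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins over (0, 1].

    Sums |accuracy - mean confidence| per bin, weighted by bin frequency.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)

# Illustrative data: the model claims 85% confidence but is right 75% of the time.
ece = expected_calibration_error([0.85, 0.85, 0.85, 0.85], [1, 1, 1, 0])
```

ECE alone does not reveal the direction of miscalibration. Comparing mean confidence with accuracy separately on base and new classes is what distinguishes the overconfident and underconfident cases the abstract describes.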
Article 189
Title@2025-05-29 (4): How Does Response Length Affect Long-Form Factuality
Title: How Does Response Length Affect Long-Form Factuality | Wie wirkt sich die Response-Länge auf die Langform-Faktizität aus? | 反应时间长度如何影响长期事实质量 2505.23295v1 |
Authors: James Xu Zhao, Jimmy Z. J. Liu, Bryan Hooi, See-Kiong Ng
Large language models (LLMs) are widely used for long-form text generation. However, factual errors in the responses would undermine their reliability. Despite growing attention to LLM factuality, the effect of response length on factuality remains underexplored. In this work, we systematically investigate this relationship by first introducing an automatic and bi-level long-form factuality evaluation framework, which achieves high agreement with human annotations while being cost-effective. Using this framework, we conduct controlled experiments and find that longer responses exhibit lower factual precision, confirming the presence of length bias. To explain this phenomenon, we empirically examine three hypotheses: error propagation, long context, and facts exhaustion. Our results reveal that facts exhaustion, where the model gradually exhausts more reliable knowledge, is the primary cause of factual degradation, rather than the other two hypotheses.
大型语言模型(LLMs)被广泛用于长式文本的生成,但是,答复中的事实错误会损害其可靠性。尽管人们日益关注LLM的实际情况,但答复时间长度对事实质量的影响仍未得到充分探讨。在这项工作中,我们系统地调查这种关系,首先采用自动和双级长式事实质量评估框架,在符合成本效益的情况下与人的注释取得高度一致。我们利用这一框架,进行有控制的实验,发现较长的答复显示事实准确性较低,证实存在时间偏差。为了解释这一现象,我们从经验上研究了三种假设:错误传播、长背景和事实耗竭。我们的结果显示,在模型逐渐耗尽更可靠的知识的情况下,事实用尽是造成实际退化的主要原因,而不是其他两种假设。
Article 190
Title@2025-05-29 (4): Multi-Modal Framing Analysis of News
Title: Multi-Modal Framing Analysis of News | Multi-Modal Framing Analyse der Nachrichten | 新闻多模式结构分析 2503.20960v3 |
Authors: Arnav Arora, Srishti Yadav, Maria Antoniak, Serge Belongie, Isabelle Augenstein
Automated frame analysis of political communication is a popular task in computational social science that is used to study how authors select aspects of a topic to frame its reception. So far, such studies have been narrow, in that they use a fixed set of pre-defined frames and focus only on the text, ignoring the visual contexts in which those texts appear. Especially for framing in the news, this leaves out valuable information about editorial choices, which include not just the written article but also accompanying photographs. To overcome such limitations, we present a method for conducting multi-modal, multi-label framing analysis at scale using large (vision-) language models. Grounding our work in framing theory, we extract latent meaning embedded in images used to convey a certain point and contrast that to the text by comparing the respective frames used. We also identify highly partisan framing of topics with issue-specific frame analysis found in prior qualitative work. We demonstrate a method for doing scalable integrative framing analysis of both text and image in news, providing a more complete picture for understanding media bias.
在计算社会科学中,政治传播的自动框架分析是一项流行的任务,用于研究作者如何选择一个专题的方方面面来设计其接受范围。迄今为止,这种研究范围很窄,使用一套固定的预设框架,只注重文本,忽视了文本的视觉背景。特别是为了在新闻中进行设计,这留下了关于编辑选择的宝贵信息,其中不仅包括书面文章,也包括相附照片。为了克服这些限制,我们提出了一个方法,用大型(视觉)语言模型进行规模的多式多标签框架分析。我们用构思理论作为我们工作的基础,我们从图像中提取潜含的含义,用来传递一个特定点,并通过比较所使用的相应框架来对比文本。我们还确定了高度偏向性的主题框架,在先前的质量工作中发现了针对具体问题的框架分析。我们展示了对文本和新闻图像进行可扩展的综合框架分析的方法,为理解媒体偏见提供了更完整的图片。
Article 191
Title@2025-05-29 (4): Comparative Analysis of the Land Use and Land Cover Changes in Different Governorates of Oman using Spatiotemporal Multi-spectral Satellite Data
Title: Comparative Analysis of the Land Use and Land Cover Changes in Different Governorates of Oman using Spatiotemporal Multi-spectral Satellite Data | Vergleichende Analyse der Bodennutzungs- und Bodenbedeckungsänderungen in verschiedenen Gouvernements von Oman unter Verwendung spatiotemporaler multispektraler Satellitendaten | 利用斯帕蒂多光谱多谱段卫星数据对阿曼不同省份土地利用和土地覆盖变化的比较分析 2505.23285v1 |
Authors: Muhammad Shafi, Syed Mohsin Bokhari
Monitoring land use and land cover (LULC) changes is a key application of satellite imagery, and it plays a critical role in resource management, urbanization, protection of soils and the environment, and enhancing sustainable development. The literature has heavily utilized multispectral spatiotemporal satellite data alongside advanced machine learning algorithms to monitor and predict LULC changes. This study analyzes and compares LULC changes across various governorates (provinces) of the Sultanate of Oman from 2016 to 2021 using annual time steps. For the chosen region, multispectral spatiotemporal data were acquired from the open-source Sentinel-2 satellite dataset. Supervised machine learning algorithms were used to train and classify different land covers, such as water bodies, crops, and urban areas. The constructed model was subsequently applied within the study region, allowing for an effective comparative evaluation of LULC changes within the given timeframe.
土地覆盖和土地利用变化是卫星图像的关键应用,在资源管理、城市化、土壤和环境保护以及促进可持续发展方面发挥着关键作用;文献大量利用多光谱空间卫星数据以及先进的机器学习算法来监测和预测土地覆盖和土地利用变化;这项研究利用年度时间步骤分析和比较了阿曼苏丹国各省(省)从2016年至2021年的土地覆盖和土地利用变化;对于选定的区域,从开放源码Sentinel-2卫星数据集获取了多谱波段时数据;利用了超导机学习算法来培训和分类不同的土地覆盖,如水体、作物、城市等;随后在研究区域内应用了所构建的模式,以便能够在规定的时限内对土地覆盖和土地利用变化进行有效的比较评估。
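Once each year's imagery has been classified, comparing LULC between two dates reduces to counting pixel transitions between classes. The sketch below builds such a transition (change) matrix from two classified rasters. The class ids and the tiny 2x2 rasters are made up for illustration, and the abstract does not describe the authors' exact comparison procedure.

```python
import numpy as np

def change_matrix(labels_t0, labels_t1, n_classes):
    """Count pixels moving from class i (rows, earlier date) to class j (columns)."""
    t0 = np.asarray(labels_t0, dtype=np.int64).ravel()
    t1 = np.asarray(labels_t1, dtype=np.int64).ravel()
    pair_index = t0 * n_classes + t1  # encode each (i, j) pair as one integer
    counts = np.bincount(pair_index, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes)

# Illustrative rasters with hypothetical class ids: 0=water, 1=urban, 2=crops.
map_2016 = [[0, 0],
            [1, 2]]
map_2021 = [[0, 1],
            [1, 1]]
matrix = change_matrix(map_2016, map_2021, n_classes=3)
changed_fraction = 1.0 - np.trace(matrix) / matrix.sum()
```

The diagonal holds unchanged pixels, so off-diagonal entries such as `matrix[2, 1]` (crops to urban) give the per-class conversions that a governorate-level comparison would aggregate.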
Article 192
Title@2025-05-29 (4): Improving Continual Learning Performance and Efficiency with Auxiliary Classifiers
Title: Improving Continual Learning Performance and Efficiency with Auxiliary Classifiers | Verbesserung der kontinuierlichen Lernleistung und Effizienz mit Hilfsklassifikatoren | 提高持续学习成绩和效率,辅级分级 2403.07404v4 |
Authors: Filip Szatkowski, Yaoyue Zheng, Fei Yang, Bartłomiej Twardowski, Tomasz Trzciński, Joost van de Weijer
Continual learning is crucial for applying machine learning in challenging, dynamic, and often resource-constrained environments. However, catastrophic forgetting (overwriting previously learned knowledge when new information is acquired) remains a major challenge. In this work, we examine the intermediate representations in neural network layers during continual learning and find that such representations are less prone to forgetting, highlighting their potential to accelerate computation. Motivated by these findings, we propose to use auxiliary classifiers (ACs) to enhance performance and demonstrate that integrating ACs into various continual learning methods consistently improves accuracy across diverse evaluation settings, yielding an average 10% relative gain. We also leverage the ACs to reduce the average cost of inference by 10-60% without compromising accuracy, enabling the model to return predictions before computing all the layers. Our approach provides a scalable and efficient solution for continual learning.
持续学习对于在具有挑战性、动态性和经常受到资源制约的环境中应用机器学习至关重要。然而,灾难性的遗忘 — — 在获得新信息时超过先前学到的知识 — — 仍然是一个重大挑战。在这项工作中,我们检查了神经网络层在持续学习过程中的中间代表,发现这种代表不太容易忘记,强调其加速计算的潜力。我们建议利用辅助分类器提高业绩,并表明将ACs纳入各种持续学习方法,不断提高不同评价环境的准确性,平均产生10%的相对收益。我们还利用ACs将推断的平均成本降低10-60%,同时不损害准确性,使模型能够在计算所有层次之前返回预测。我们的方法为持续学习提供了可扩展和高效的解决方案。
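The early-return behaviour described in this abstract can be illustrated with a minimal sketch. The exit rule below (stop at the first auxiliary classifier whose top softmax probability reaches a threshold) is an assumption made for illustration; the abstract does not state the rule the authors use, and `early_exit_predict`, `logit_stream`, and `threshold` are hypothetical names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(logit_stream, threshold=0.9):
    """Return (predicted_class, exit_index) for one input.

    logit_stream yields the logits of each classifier in depth order
    (auxiliary classifiers first, the final classifier last). Passing a
    generator means deeper layers are never computed once an exit fires.
    The confidence-threshold rule is an illustrative assumption.
    """
    prediction, exit_index = None, -1
    for exit_index, logits in enumerate(logit_stream):
        probs = softmax(logits)
        confidence = max(probs)
        prediction = probs.index(confidence)
        if confidence >= threshold:
            break  # confident enough: skip the remaining layers
    return prediction, exit_index
```

If no classifier reaches the threshold, the sketch falls back to the prediction of the last classifier, so the threshold trades inference cost against how often the full network is evaluated.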
Article 193
Title@2025-05-29 (4): Optimal Protocols for Continual Learning via Statistical Physics and Control Theory
Title: Optimal Protocols for Continual Learning via Statistical Physics and Control Theory | Optimale Protokolle für kontinuierliches Lernen über statistische Physik und Steuerungstheorie | 通过统计物理和控制理论不断学习的最佳最佳协议 2409.18061v3 |
Authors: Francesco Mori, Stefano Sarao Mannelli, Francesca Mignacco
Artificial neural networks often struggle with catastrophic forgetting when learning multiple tasks sequentially, as training on new tasks degrades the performance on previously learned tasks. Recent theoretical work has addressed this issue by analysing learning curves in synthetic frameworks under predefined training protocols. However, these protocols relied on heuristics and lacked a solid theoretical foundation assessing their optimality. In this paper, we fill this gap by combining exact equations for training dynamics, derived using statistical physics techniques, with optimal control methods. We apply this approach to teacher-student models for continual learning and multi-task problems, obtaining a theory for task-selection protocols maximising performance while minimising forgetting. Our theoretical analysis offers non-trivial yet interpretable strategies for mitigating catastrophic forgetting, shedding light on how optimal learning protocols modulate established effects, such as the influence of task similarity on forgetting. Finally, we validate our theoretical findings with experiments on real-world data.
人工神经网络在连续学习多重任务时往往与灾难性的遗忘作斗争,因为关于新任务的培训会降低以前学到的任务的绩效。最近的理论工作通过分析根据预先确定的培训规程在合成框架中的学习曲线来解决这个问题。然而,这些规程依赖超自然学,缺乏坚实的理论基础来评估其最佳性能。在本文中,我们通过将培训动态的精确方程式、利用统计物理技术的推算和最佳控制方法来填补这一差距。我们将这种方法应用于师生模式,用于持续学习和多任务问题,获取任务选择协议优化绩效的理论,同时尽量减少遗忘。我们的理论分析为减轻灾难性的遗忘提供了非边际但可解释的战略,并展示了最佳学习规程如何调整既定效果,例如任务相似对遗忘的影响。最后,我们用真实世界数据实验来验证我们的理论发现。
Article 194
Title@2025-05-29 (4): LADA: Scalable Label-Specific CLIP Adapter for Continual Learning
Title: LADA: Scalable Label-Specific CLIP Adapter for Continual Learning | LADA: Skalierbarer Label-Spezifischer CLIP Adapter für kontinuierliches Lernen | LADA:用于持续学习的可扩展标签特定CLIP适配器 2505.23271v1 |
Authors: Mao-Lin Luo, Zi-Hao Zhou, Tong Wei, Min-Ling Zhang
Continual learning with vision-language models like CLIP offers a pathway toward scalable machine learning systems by leveraging its transferable representations. Existing CLIP-based methods adapt the pre-trained image encoder by adding multiple sets of learnable parameters, with each task using a partial set of parameters. This requires selecting the expected parameters for input images during inference, which is prone to error that degrades performance. To address this problem, we introduce LADA (Label-specific ADApter). Instead of partitioning parameters across tasks, LADA appends lightweight, label-specific memory units to the frozen CLIP image encoder, enabling discriminative feature generation by aggregating task-agnostic knowledge. To prevent catastrophic forgetting, LADA employs feature distillation for seen classes, preventing their features from being interfered with by new classes. Positioned after the image encoder, LADA prevents gradient flow to the frozen CLIP parameters, ensuring efficient training. Extensive results show that LADA achieves state-of-the-art performance in continual learning settings. The implementation code is available at https://github.com/MaolinLuo/LADA.
利用CLIP等视觉语言模型进行持续学习,通过利用其可迁移的表示,为可扩展的机器学习系统提供了一条路径。基于CLIP的现有方法通过添加多组可学习参数来调整预训练的图像编码器,每项任务使用其中一部分参数。这要求在推断过程中为输入图像选择预期的参数,这容易出错并导致性能下降。为了解决这一问题,我们引入了LADA(Label-specific ADApter,标签特定适配器)。LADA不是按任务划分参数,而是在冻结的CLIP图像编码器之后附加轻量级、标签特定的记忆单元,通过汇总任务无关的知识来生成判别性特征。为了防止灾难性遗忘,LADA对已见类别使用特征蒸馏,防止其特征受到新类别的干扰。LADA位于图像编码器之后,可阻止梯度流向冻结的CLIP参数,从而确保高效的训练。大量结果表明,LADA在持续学习环境中达到了最先进的性能。实现代码可在 https://github.com/MaolinLuo/LADA 查阅。
Article 195
Title@2025-05-29 (4): Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs
Title: Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs | Entfernt Machine Unlearning wirklich Modellwissen? Ein Rahmen für die Prüfung von Unlearning in LLMs | 机器取消学习是否真正删除了示范知识? 审计框架是否在LLMM中取消学习? 2505.23270v1 |
Authors: Haokun Chen, Yueqi Zhang, Yuan Bi, Yao Zhang, Tong Liu, Jinhe Bi, Jian Lan, Jindong Gu, Claudia Grosser, Denis Krompass, Nassir Navab, Volker Tresp
In recent years, Large Language Models (LLMs) have achieved remarkable advancements, drawing significant attention from the research community. Their capabilities are largely attributed to large-scale architectures, which require extensive training on massive datasets. However, such datasets often contain sensitive or copyrighted content sourced from the public internet, raising concerns about data privacy and ownership. Regulatory frameworks, such as the General Data Protection Regulation (GDPR), grant individuals the right to request the removal of such sensitive information. This has motivated the development of machine unlearning algorithms that aim to remove specific knowledge from models without the need for costly retraining. Despite these advancements, evaluating the efficacy of unlearning algorithms remains a challenge due to the inherent complexity and generative nature of LLMs. In this work, we introduce a comprehensive auditing framework for unlearning evaluation, comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. By using various auditing algorithms, we evaluate the effectiveness and robustness of different unlearning strategies. To explore alternatives beyond prompt-based auditing, we propose a novel technique that leverages intermediate activation perturbations, addressing the limitations of auditing methods that rely solely on model inputs and outputs.
近年来,大语言模型(LLMS)取得了显著进步,引起了研究界的极大关注,其能力主要归功于大型结构,需要大规模数据集的广泛培训,然而,这类数据集往往包含来自公共互联网的敏感或版权内容,引起对数据隐私和所有权的关切。《一般数据保护条例》(GDPR)等监管框架赋予个人要求删除这类敏感信息的权利。这促使开发了旨在将特定知识从模型中去除而无需花费昂贵的再培训的机读算法。尽管取得了这些进步,但由于LLMS的内在复杂性和基因性质,评价未学习算法的功效仍然是一项挑战。在这项工作中,我们引入了一个非学习评价综合审计框架,由三个基准数据集、六个未学习算法和五个快速审计方法组成。我们通过使用各种审计算法,评估不同不学习战略的有效性和稳健性。为了探索超越快速审计的替代方法,我们提出了一种创新技术,即利用中间启动过动,解决仅依赖投入和产出的审计方法的局限性。
Article 196
Title@2025-05-29 (4): Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning
Title: Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning | Behavior-Regularized Diffusion Policy Optimierung für Offline-Verstärkung Lernen | 离线强化学习的传播政策优化 2502.04778v2 |
Authors: Chen-Xiao Gao, Chenyang Wu, Mingjun Cao, Chenjun Xiao, Yang Yu, Zongzhang Zhang
Behavior regularization, which constrains the policy to stay close to some behavior policy, is widely used in offline reinforcement learning (RL) to manage the risk of hazardous exploitation of unseen actions. Nevertheless, existing literature on behavior-regularized RL primarily focuses on explicit policy parameterizations, such as Gaussian policies. Consequently, it remains unclear how to extend this framework to more advanced policy parameterizations, such as diffusion models. In this paper, we introduce BDPO, a principled behavior-regularized RL framework tailored for diffusion-based policies, thereby combining the expressive power of diffusion policies and the robustness provided by regularization. The key ingredient of our method is to calculate the Kullback-Leibler (KL) regularization analytically as the accumulated discrepancies in reverse-time transition kernels along the diffusion trajectory. By integrating the regularization, we develop an efficient two-time-scale actor-critic RL algorithm that produces the optimal policy while respecting the behavior constraint. Comprehensive evaluations conducted on synthetic 2D tasks and continuous control tasks from the D4RL benchmark validate its effectiveness and superior performance.
行为正规化政策限制了政策接近某些行为政策,在非在线强化学习(RL)中广泛用于管理危险利用无形行动的风险,然而,关于行为正规化的RL的现有文献主要侧重于明确的政策参数化,如高斯政策,因此仍不清楚如何将这一框架扩展至更先进的政策参数化,如推广模式。在本文件中,我们引入了BDPO,这是为基于传播的政策量身定制的基于行为正规化的原则性RL框架,从而结合了传播政策的表达力和规范化所提供的稳健性。我们方法的关键要素是分析计算Kullback-Leiberr(KL)的正规化,作为沿传播轨道的逆时过渡内圈的累积差异。通过整合正规化,我们开发了一种高效的双时制的演员-critict RL算法,在尊重行为约束的同时产生最佳政策。对合成2D任务进行了全面评价,D4RL基准的连续控制任务验证了其有效性和优劣性。
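The "accumulated discrepancies in reverse-time transition kernels" mentioned in this abstract can be read through a standard identity for Markov chains. The notation below is assumed for illustration and is not taken from the paper: s is the state, a^T, ..., a^0 is the reverse diffusion trajectory over actions, p^pi is the learned policy, and p^nu is the behavior policy. The first line holds when both chains start from the same noise distribution over a^T; the second line is the closed form for two Gaussian kernels that share the covariance sigma_t^2 I.

```latex
% Assumed notation (not from the abstract): p^{\pi} learned policy,
% p^{\nu} behavior policy, both with the same initial distribution over a^{T}.
\begin{align}
D_{\mathrm{KL}}\!\left(p^{\pi}(a^{0:T}\mid s)\,\big\|\,p^{\nu}(a^{0:T}\mid s)\right)
 &= \mathbb{E}_{p^{\pi}}\!\left[\sum_{t=1}^{T}
    D_{\mathrm{KL}}\!\left(p^{\pi}(a^{t-1}\mid a^{t},s)\,\big\|\,p^{\nu}(a^{t-1}\mid a^{t},s)\right)\right], \\
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu^{\pi}_{t},\sigma_{t}^{2}I)\,\big\|\,\mathcal{N}(\mu^{\nu}_{t},\sigma_{t}^{2}I)\right)
 &= \frac{\lVert \mu^{\pi}_{t}-\mu^{\nu}_{t}\rVert^{2}}{2\sigma_{t}^{2}}.
\end{align}
```

Under these assumptions each per-step term reduces to a scaled squared distance between the two predicted means, which is one way a trajectory-level KL penalty can be evaluated analytically rather than estimated from samples.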
Article 197
Title@2025-05-29 (4): Efficiently Access Diffusion Fisher: Within the Outer Product Span Space
Title: Efficiently Access Diffusion Fisher: Within the Outer Product Span Space | Effizienter Zugriff auf Diffusion Fisher: Innerhalb des Outer Product Span Space | 有效获取扩散渔渔场:在外生产品空间内 2505.23264v1 |
Authors: Fangyikang Wang, Hubery Yin, Shaobin Zhuang, Huminhao Zhu, Yinan Li, Lei Qian, Chao Zhang, Hanbin Zhao, Hui Qian, Chen Li
Recent advancements in diffusion models (DMs) have explored incorporating the second-order diffusion Fisher information (DF), defined as the negative Hessian of log density, into various downstream tasks and theoretical analysis. However, current practices typically approximate the diffusion Fisher by applying auto-differentiation to the learned score network. This black-box method, though straightforward, lacks any accuracy guarantee and is time-consuming. In this paper, we show that the diffusion Fisher actually resides within a space spanned by the outer products of score and initial data. Based on the outer-product structure, we develop two efficient approximation algorithms to access the trace and matrix-vector multiplication of DF, respectively. These algorithms bypass the auto-differentiation operations with time-efficient vector-product calculations. Furthermore, we establish the approximation error bounds for the proposed algorithms. Experiments in likelihood evaluation and adjoint optimization demonstrate the superior accuracy and reduced computational cost of our proposed algorithms. Additionally, based on the novel outer-product formulation of DF, we design the first numerical verification experiment for the optimal transport property of the general PF-ODE deduced map.
最近的传播模型(DMs)已经探索了将第二阶扩散渔业信息(DF)纳入各种下游任务和理论分析中,目前的做法通常通过对学分网络应用自动差异来接近Fisher的传播。这种黑箱方法虽然简单,缺乏准确性保证,而且耗时。在本文中,我们表明,扩散渔业者实际上生活在分数外产品和初始数据所覆盖的空间之内。根据外产品结构,我们分别开发了两种高效近似算法,以获取DF的痕量和矩阵矢量乘数乘数。这些算法绕过自动差异操作,以具有时间效率的矢量产品计算。此外,我们为拟议的算法确定了近似误差界限。在可能性评估和联合优化方面进行的实验表明,我们提议的算法的精确性和计算成本已经降低。此外,根据DF的新型外产品配方,我们设计了第一次数字核查实验,用于PFO-OD推算的通用地图的最佳运输特性。
Article 198
Title@2025-05-29 (4): Stable Thompson Sampling: Valid Inference via Variance Inflation
Title: Stable Thompson Sampling: Valid Inference via Variance Inflation | Stabile Thompson-Probenahme: Gültige Schlussfolgerung durch Varianz-Inflation | 稳定汤普森抽样:因通货膨胀差异而得出的有效推论 2505.23260v1 |
Authors: Budhaditya Halder, Shubhayan Pan, Koulik Khamaru
We consider the problem of statistical inference when the data is collected via a Thompson Sampling-type algorithm. While Thompson Sampling (TS) is known to be both asymptotically optimal and empirically effective, its adaptive sampling scheme poses challenges for constructing confidence intervals for model parameters. We propose and analyze a variant of TS, called Stable Thompson Sampling, in which the posterior variance is inflated by a logarithmic factor. We show that this modification leads to asymptotically normal estimates of the arm means, despite the non-i.i.d. nature of the data. Importantly, this statistical benefit comes at a modest cost: the variance inflation increases regret by only a logarithmic factor compared to standard TS. Our results reveal a principled trade-off: by paying a small price in regret, one can enable valid statistical inference for adaptive decision-making algorithms.
在通过汤普森抽样类型算法收集数据时,我们考虑统计推论问题。尽管汤普森抽样(TS)已知是非现成的最佳和实证有效的,但其适应性抽样办法对建立模型参数的信任间隔提出了挑战。我们提出并分析了TS的变种,称为Stabable Thompson抽样,其中后方差异因对数因素而膨胀。我们表明,这一修改导致对手臂手段的无现性正常估计,尽管数据是非i.d.性质。重要的是,这一统计效益的代价不大:差价通胀因与标准TS相比的逻辑因素而增加遗憾。我们的结果揭示了一种有原则的权衡:如果支付少量的价钱,人们就可以为适应性决策算法提供有效的统计推论。
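A minimal sketch of the idea in this abstract, for Gaussian rewards with unit variance: standard Thompson Sampling draws each arm's sample from a posterior with variance 1/n, and the stabilized variant multiplies that variance by an inflation factor. The factor used here, max(1, log T), and every name in the code are assumptions for illustration; the abstract only says the inflation is logarithmic.

```python
import math
import random


def stable_thompson_sampling(true_means, horizon, inflation=None, seed=0):
    """Gaussian Thompson Sampling with an inflated posterior variance.

    true_means: mean reward of each arm (rewards are N(mean, 1)).
    inflation:  multiplier on the posterior variance 1/n; defaults to
                max(1, log(horizon)), an illustrative assumption.
    Returns (pull counts per arm, sample-mean estimates per arm).
    """
    k = len(true_means)
    if horizon < k:
        raise ValueError("horizon must allow one pull per arm")
    if inflation is None:
        inflation = max(1.0, math.log(horizon))
    rng = random.Random(seed)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # initialise: pull every arm once
        else:
            # Posterior sample per arm, with variance inflated by `inflation`.
            samples = [
                rng.gauss(sums[a] / counts[a], math.sqrt(inflation / counts[a]))
                for a in range(k)
            ]
            arm = max(range(k), key=lambda a: samples[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
    estimates = [sums[a] / counts[a] for a in range(k)]
    return counts, estimates
```

Setting `inflation=1.0` recovers plain Gaussian Thompson Sampling in this sketch, so the two variants can be compared on the same seed.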
Article 199
Title@2025-05-29 (4): BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL
Title: BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL | BOFormer: Lernen, Multi-Objektive Bayesian Optimierung über nicht-Markovian RL zu lösen | BOFormer: 学会通过非马尔科维安RL解决多目标巴耶斯最佳利用 2505.21974v2 |
Authors: Yu-Heng Hung, Kai-Jie Lin, Yu-Heng Lin, Chien-Yi Wang, Cheng Sun, Ping-Chun Hsieh
Bayesian optimization (BO) offers an efficient pipeline for optimizing black-box functions with the help of a Gaussian process prior and an acquisition function (AF). Recently, in the context of single-objective BO, learning-based AFs have shown promising empirical results given their favorable non-myopic nature. Despite this, the direct extension of these approaches to multi-objective Bayesian optimization (MOBO) suffers from the hypervolume identifiability issue, which results from the non-Markovian nature of MOBO problems. To tackle this, inspired by the non-Markovian RL literature and the success of Transformers in language modeling, we present a generalized deep Q-learning framework and propose BOFormer, which substantiates this framework for MOBO via sequence modeling. Through extensive evaluation, we demonstrate that BOFormer consistently outperforms the benchmark rule-based and learning-based algorithms in various synthetic MOBO and real-world multi-objective hyperparameter optimization problems. We have made the source code publicly available to encourage further research in this direction.
贝叶斯优化(BO)为优化黑箱功能提供了高效管道,在Gaussian进程之前和获取功能(AF)的帮助下,优化黑箱功能提供了高效管道。最近,在单一目标BO的背景下,基于学习的AF公司由于其有利的非微型性质,见证了有希望的经验结果。尽管如此,这些方法直接扩展到多目标巴伊西亚优化(MOBO),这得益于MOBO问题的非马尔科维尼亚性质。为了解决这个问题,在非马尔科维尼亚RL文献和变异者在语言建模方面的成功启发下,我们提出了一个普遍深入的Q学习框架,并提议通过序列建模为MOBO提供这种框架。我们通过广泛的评估表明,BOFormer公司不断超越各种合成MOBO和实体-世界多功能化的基于学习的算法。我们公开提供了源代码,以鼓励在这方面进行进一步的研究。
Article 200
Title@2025-05-29 (4): Skywork Open Reasoner 1 Technical Report
Title: Skywork Open Reasoner 1 Technical Report | Skywork Open Reasoner 1 Technischer Bericht | Skywork Open Reasoner 1 技术报告 2505.22312v2 |
Authors: Jujie He, Jiacai Liu, Chris Yuhao Liu, Rui Yan, Chaojie Wang, Peng Cheng, Xiaoyu Zhang, Fuxiang Zhang, Jiacheng Xu, Wei Shen, Siyuan Li, Liang Zeng, Tianwen Wei, Cheng Cheng, Bo An, Yang Liu, Yahui Zhou
The success of DeepSeek-R1 underscores the significant role of reinforcement learning (RL) in enhancing the reasoning capabilities of large language models (LLMs). In this work, we present Skywork-OR1, an effective and scalable RL implementation for long Chain-of-Thought (CoT) models. Building on the DeepSeek-R1-Distill model series, our RL approach achieves notable performance gains, increasing average accuracy across AIME24, AIME25, and LiveCodeBench from 57.8% to 72.8% (+15.0%) for the 32B model and from 43.6% to 57.5% (+13.9%) for the 7B model. Our Skywork-OR1-32B model surpasses both DeepSeek-R1 and Qwen3-32B on the AIME24 and AIME25 benchmarks, while achieving comparable results on LiveCodeBench. The Skywork-OR1-7B and Skywork-OR1-Math-7B models demonstrate competitive reasoning capabilities among models of similar size. We perform comprehensive ablation studies on the core components of our training pipeline to validate their effectiveness. Additionally, we thoroughly investigate the phenomenon of entropy collapse, identify key factors affecting entropy dynamics, and demonstrate that mitigating premature entropy collapse is critical for improved test performance. To support community research, we fully open-source our model weights, training code, and training datasets.
DeepSeek-R1的成功突显了加强学习(RL)在提高大型语言模型(LLMs)推理能力方面的重要作用。 在这项工作中,我们展示了Skywork-OR1,这是长搜索链(COT)模型的一种有效和可扩展的RL执行。在DeepSeek-R1-Distry模型系列的基础上,我们的REL方法取得了显著的绩效收益,使32B模型的AME24、AIME25和LiveCodeBench的平均准确率从57.8%提高到72.8%(+15.0%),7B模型的推理能力从43.6%提高到57.5%(+13.9%)。我们的Skywork-OR1-32B模型在AIME24和AIME25基准方面超过了DeepStual-R1和Qwen3-32B,同时在LiveCode Bench、Skywork-OR1-7B和Skywork-OR1-Math-7B模型中, 展示了类似规模的竞争性推理推论能力。我们进行了全面的推算研究,我们进行了全面研究,并验证了对关键数据流数据流流流数据流的精度研究,并验证了基础的精度的精度研究。
Article 201
Title@2025-05-29 (4): Tensor Product Attention Is All You Need
Title: Tensor Product Attention Is All You Need | Tensor Produkt-Achtung ist alles, was Sie brauchen | 色素产品 关注是所有你需要的 2501.06425v4 |
Authors: Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Zhen Qin, Yang Yuan, Quanquan Gu, Andrew C Yao
Scaling language models to handle longer input sequences typically necessitates large key-value (KV) caches, resulting in substantial memory overhead during inference. In this paper, we propose Tensor Product Attention (TPA), a novel attention mechanism that uses tensor decompositions to represent queries, keys, and values compactly, substantially shrinking the KV cache size at inference time. By factorizing these representations into contextual low-rank components and seamlessly integrating with Rotary Position Embedding (RoPE), TPA achieves improved model quality alongside memory efficiency. Based on TPA, we introduce the Tensor Product Attention Transformer (T6), a new model architecture for sequence modeling. Through extensive empirical evaluation on language modeling tasks, we demonstrate that T6 surpasses or matches the performance of standard Transformer baselines, including Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped-Query Attention (GQA), and Multi-Head Latent Attention (MLA) across various metrics, including perplexity and a range of established evaluation benchmarks. Notably, TPA's memory efficiency and computational efficiency at the decoding stage enable processing longer sequences under fixed resource constraints, addressing a critical scalability challenge in modern language models. The code is available at https://github.com/tensorgi/T6.
扩展语言模型以处理较长的输入序列通常需要大量键值(KV)缓存,从而在推断过程中产生大量的内存开销。在本文中,我们提出张量积注意力(Tensor Product Attention,TPA),这是一种新的注意力机制,它使用张量分解来紧凑地表示查询、键和值,在推断时大大缩小了KV缓存的大小。通过将这些表示分解为上下文相关的低秩分量,并与旋转位置嵌入(RoPE)无缝结合,TPA在提高内存效率的同时提高了模型质量。在TPA的基础上,我们引入了Tensor Product Attention Transformer(T6),这是一个用于序列建模的新模型架构。通过对语言建模任务进行广泛的实证评估,我们证明T6在各种指标上超过或匹配标准Transformer基线的性能,包括多头注意力(MHA)、多查询注意力(MQA)、分组查询注意力(GQA)和多头潜在注意力(MLA),指标包括困惑度和一系列既定评价基准。值得注意的是,TPA在解码阶段的内存效率和计算效率使其能够在固定资源限制下处理更长的序列,解决了现代语言模型中的一个关键可扩展性挑战。代码可在 https://github.com/tensorgi/T6 查阅。
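A back-of-the-envelope comparison shows why a low-rank factorization of keys and values shrinks the cache. The sketch assumes each token's key (and value) is an h x d_h matrix over heads and head dimensions, stored either in full or as R rank-one factor pairs (one length-h vector and one length-d_h vector per rank). The function names and the example sizes are hypothetical, and the exact quantities cached by TPA are not given in the abstract.

```python
def full_kv_entries_per_token(num_heads, head_dim):
    """Cache entries per token when keys and values are stored in full."""
    return 2 * num_heads * head_dim


def factorized_kv_entries_per_token(num_heads, head_dim, rank_k, rank_v):
    """Cache entries per token when keys and values are stored as
    rank-one factor pairs (assumed layout: one length-num_heads vector
    plus one length-head_dim vector per rank)."""
    return (rank_k + rank_v) * (num_heads + head_dim)


# Hypothetical model sizes chosen only to make the arithmetic concrete.
full = full_kv_entries_per_token(num_heads=32, head_dim=128)
factored = factorized_kv_entries_per_token(32, 128, rank_k=2, rank_v=2)
print(full, factored, full / factored)
```

With these illustrative sizes the factorized cache holds 640 entries per token against 8192 for the full cache, a 12.8x reduction; the saving shrinks as the ranks grow.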
Article 202
Title@2025-05-29 (4): Sparseformer: a Transferable Transformer with Multi-granularity Token Sparsification for Medical Time Series Classification
Title: Sparseformer: a Transferable Transformer with Multi-granularity Token Sparsification for Medical Time Series Classification | Sparseformer: ein übertragbarer Transformer mit Multigranularitäts-Tokensparsifikation für die Klassifizierung medizinischer Zeitreihen | 分散式分析器:医疗时间序列分类的可转让变异器,具有多管质质调分法 2503.15578v2 |
Authors: Jiexia Ye, Weiqi Zhang, Ziyue Li, Jia Li, Fugee Tsung
Medical time series (MedTS) classification is crucial for improved diagnosis in healthcare, and yet it is challenging due to the varying granularity of patterns, intricate inter-channel correlation, information redundancy, and label scarcity. While existing transformer-based models have shown promise in time series analysis, they mainly focus on forecasting and fail to fully exploit the distinctive characteristics of MedTS data. In this paper, we introduce Sparseformer, a transformer specifically designed for MedTS classification. We propose a sparse token-based dual-attention mechanism that enables global modeling and token compression, allowing dynamic focus on the most informative tokens while distilling redundant features. This mechanism is then applied to the multi-granularity, cross-channel encoding of medical signals, capturing intra- and inter-granularity correlations and inter-channel connections. The sparsification design allows our model to handle heterogeneous inputs of varying lengths and channels directly. Further, we introduce an adaptive label encoder to address label space misalignment across datasets, equipping our model with cross-dataset transferability to alleviate the medical label scarcity issue. Our model outperforms 12 baselines across seven medical datasets under supervised learning. In the few-shot learning experiments, our model also achieves superior average results. In addition, the in-domain and cross-domain experiments among three diagnostic scenarios demonstrate our model’s zero-shot learning capability. Collectively, these findings underscore the robustness and transferability of our model in various medical applications.
医疗时间序列(MedTS)分类对于改善医疗诊断至关重要,然而,由于模式的颗粒性、渠道间关联的复杂、信息冗余和标签稀缺等不同,这种分类具有挑战性。虽然基于变压器的现有模型在时间序列分析中显示出希望,但主要侧重于预测,未能充分利用MedTS数据的独特特性。在本文中,我们引入了专门为MedTS分类设计的变压器Sprasserector(Sparseexor),这是专门为MedTS分类设计的变压器。我们建议了一种稀疏的象征性双向定位机制,可以进行全球建模和象征性压缩,允许动态地关注信息最丰富的代号,同时蒸馏多余的特性。这个机制随后应用到医疗信号的多频谱性、跨通道编码,捕捉到内部和群体间关联和通道间连接。在模型中,我们模型外加固的标签模型模型,在三个医学标签短缺问题上的交叉传输能力。我们模型外演化了12个实验,在模型中学习了我们的标准级模型。
Article 203
Title@2025-05-29 (4): RiverMamba: A State Space Model for Global River Discharge and Flood Forecasting
Title: RiverMamba: A State Space Model for Global River Discharge and Flood Forecasting | RiverMamba: Ein staatliches Weltraummodell für globale Flussentladung und Hochwasserprognose | RiverMamba:全球河流排泄和洪水预报国家空间模型 2505.22535v2 |
Authors: Mohamad Hakam Shams Eddin, Yikui Zhang, Stefan Kollet, Juergen Gall
Recent deep learning approaches for river discharge forecasting have improved the accuracy and efficiency in flood forecasting, enabling more reliable early warning systems for risk management. Nevertheless, existing deep learning approaches in hydrology remain largely confined to local-scale applications and do not leverage the inherent spatial connections of bodies of water. Thus, there is a strong need for new deep learning methodologies that are capable of modeling spatio-temporal relations to improve river discharge and flood forecasting for scientific and operational applications. To address this, we present RiverMamba, a novel deep learning model that is pretrained with long-term reanalysis data and that can forecast global river discharge and floods on a $0.05^\circ$ grid up to 7 days lead time, which is of high relevance in early warning. To achieve this, RiverMamba leverages efficient Mamba blocks that enable the model to capture global-scale channel network routing and enhance its forecast capability for longer lead times. The forecast blocks integrate ECMWF HRES meteorological forecasts, while accounting for their inaccuracies through spatio-temporal modeling. Our analysis demonstrates that RiverMamba delivers reliable predictions of river discharge, including extreme floods across return periods and lead times, surpassing both operational AI- and physics-based models.
最近,河流排放预测的深层次学习方法提高了洪水预报的准确性和效率,使更可靠的风险管理预警系统得以建立,然而,水文学的现有深层次学习方法仍主要局限于局部应用,没有利用水体固有的空间联系,因此,迫切需要采用新的深层次学习方法,能够模拟地表-时际关系,以改进河流排放和洪水预报,用于科学和业务应用;为此,我们介绍河马巴,这是一个具有新颖的深层次学习模式,经过长期再分析数据培训,可以预测全球河流排放和洪水,耗资0.05 circ$的电网,最多7天,这在预警方面具有高度相关性。为了实现这一点,河曼巴利用高效的曼巴区块,使该模型能够捕捉全球规模的水道网络路线,并加强其预测较长的准备时间。为了解决这个问题,预报区将ECMWFFH HRES气象预报综合起来,同时通过基于地段的模型计算出其不准确性。我们的分析表明,河曼巴提供了可靠的河流排放预测,包括跨回回段的极端洪水。
Article 204
Title@2025-05-29 (4): Accelerating RLHF Training with Reward Variance Increase
Title: Accelerating RLHF Training with Reward Variance Increase | Beschleunigung des RLHF-Trainings mit Belohnungsvarianzsteigerung | 加快RLHF培训,增加奖励差异 2505.23247v1 |
Authors: Zonglin Yang, Zhexuan Gu, Houduo Qi, Yancheng Yuan
Reinforcement learning from human feedback (RLHF) is an essential technique for ensuring that large language models (LLMs) are aligned with human values and preferences during the post-training phase. As an effective RLHF approach, group relative policy optimization (GRPO) has demonstrated success in many LLM-based applications. However, efficient GRPO-based RLHF training remains a challenge. Recent studies reveal that a higher reward variance of the initial policy model leads to faster RLHF training. Inspired by this finding, we propose a practical reward adjustment model to accelerate RLHF training by provably increasing the reward variance and preserving the relative preferences and reward expectation. Our reward adjustment method inherently poses a nonconvex optimization problem, which is NP-hard to solve in general. To overcome the computational challenges, we design a novel $O(n \log n)$ algorithm to find a global solution of the nonconvex reward adjustment model by explicitly characterizing the extreme points of the feasible set. As an important application, we naturally integrate this reward adjustment model into the GRPO algorithm, leading to a more efficient GRPO with reward variance increase (GRPOVI) algorithm for RLHF training. As an interesting byproduct, we provide an indirect explanation for the empirical effectiveness of GRPO with rule-based reward for RLHF training, as demonstrated in DeepSeek-R1. Experiment results demonstrate that the GRPOVI algorithm can significantly improve the RLHF training efficiency compared to the original GRPO algorithm.
从人类反馈中强化学习(RLHF)是确保大型语言模式(LLMS)在培训后阶段与人的价值和偏好相一致的一项必要技术。作为一种有效的RLHF方法,集体相对政策优化(GROP)在许多基于LLM的应用中证明是成功的。然而,基于GROP的高效RLHF培训仍是一项挑战。最近的研究显示,初步政策模式的奖励差异较大,导致更快的RLHF培训。根据这一发现,我们提出了一个实际的奖励调整模式,以加快RLHF培训的进度,具体地增加奖励差异,保持相对的偏好和预期。我们的奖励调整方法必然造成一个非convex优化问题,而这个问题一般难以解决。为了克服计算方面的挑战,我们设计了一个新的$O(n\log)算法,以寻找全球办法解决非Convex奖励调整模式,明确描述可行的RLHF培训的极端点。作为一个重要的应用,我们自然地将这一奖励调整模式纳入GROPO的算法,从而实现更高效的REGRO-LLLLL培训的深度解释,通过令人感兴趣的RGROGV规则,为我们提供了一种令人感兴趣的标准的升级的增值培训结果。
Article 205
Title@2025-05-29 (4): Measuring Participant Contributions in Decentralized Federated Learning
Title: Measuring Participant Contributions in Decentralized Federated Learning | Messung der Teilnehmerbeiträge im dezentralisierten Föderierten Lernen | 分权联邦学习中的衡量参与者贡献 2505.23246v1 |
Authors: Honoka Anada, Tatsuya Kaneko, Shinya Takamaeda-Yamazaki
Federated learning (FL) enables multiple clients to collaboratively train models without sharing their data. Measuring participant contributions in FL is crucial for incentivizing clients and ensuring transparency. While various methods have been proposed for contribution measurement, they are designed exclusively for centralized federated learning (CFL), where a central server collects and aggregates client models, along with evaluating their contributions. Meanwhile, decentralized federated learning (DFL), in which clients exchange models directly without a central server, has gained significant attention for mitigating communication bottlenecks and eliminating a single point of failure. However, applying existing contribution measurement methods to DFL is challenging due to the presence of multiple global models and the absence of a central server. In this study, we present novel methodologies for measuring participant contributions in DFL. We first propose DFL-Shapley, an extension of the Shapley value tailored for DFL, adapting this widely used CFL metric to decentralized settings. Given the impracticality of computing the ideal DFL-Shapley in real-world systems, we introduce DFL-MR, a computable approximation that estimates overall contributions by accumulating round-wise Shapley values. We evaluate DFL-Shapley and DFL-MR across various FL scenarios and compare them with existing CFL metrics. The experimental results confirm DFL-Shapley as a valid ground-truth metric and demonstrate DFL-MR’s proximity to DFL-Shapley across various settings, highlighting their effectiveness as contribution metrics in DFL.
联邦学习(FL)使多个客户能够在不分享数据的情况下合作培训模型。衡量FL中的参与者贡献对于激励客户和确保透明度至关重要。虽然提出了各种衡量贡献的方法,但是这些方法只用于中央联合学习(CFL),中央服务器收集并汇总客户模式,同时评价其贡献。与此同时,分散化的联邦学习(DFL),客户在没有中央服务器的情况下直接交换模式,从而在减少通信瓶颈和消除单一的失败点方面得到了极大关注。然而,由于存在多种全球模式和缺乏中央服务器,将现有捐款计量方法应用于DLFL是具有挑战性的。我们在本研究中提出了衡量参与者贡献的新方法。我们首先提议DFL-Shapley,这是为DFLD定制的“普利值”的延伸,将这一广泛使用的CFLLM(D-Shapley-Shapley)指标应用于分散的环境。鉴于在现实世界系统中计算理想的DFLFL-S-Shaplay(DFL-D-SL)指标不切实际的难度,我们引入DL-ML-MR,通过不断积累的莎平面的莎平面和CL(C-R)和直径(BL) 将现有成本(我们评估)和直径(DFLFLFL-R-R-R-R-R-R-R)的模型的模型(我们评估)和各种)的模拟的模拟的模型,评估,评估,将现有的模型作为整个的模拟的模拟的模拟的模拟的模拟的模型作为对各种结果。
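For readers unfamiliar with the Shapley value that DFL-Shapley extends, the following sketch computes the classical (centralized) Shapley value exactly by enumerating coalitions. It is not DFL-Shapley or DFL-MR; `utility` is a hypothetical stand-in for something like the validation accuracy of a model trained by a coalition of clients, and `toy_utility` is invented purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley value of each player.

    utility maps a frozenset of players to a real number. The cost is
    exponential in len(players), which is why approximations are needed
    in practice.
    """
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability that exactly `size` specific others precede p
            # in a uniformly random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


def toy_utility(coalition):
    """Illustrative utility: 0.1 per client, plus 0.2 if a and b cooperate."""
    bonus = 0.2 if {"a", "b"} <= coalition else 0.0
    return 0.1 * len(coalition) + bonus


print(shapley_values(["a", "b", "c"], toy_utility))
```

In the toy game, clients a and b split the cooperation bonus equally and the values sum to the utility of the full coalition, which is the efficiency property that makes the Shapley value attractive as a contribution metric.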
Article 206
Title@2025-05-29 (4): Are You Using Reliable Graph Prompts? Trojan Prompt Attacks on Graph Neural Networks
Title: Are You Using Reliable Graph Prompts? Trojan Prompt Attacks on Graph Neural Networks | Verwenden Sie zuverlässige Graph-Prompts? Trojanische Prompt-Angriffe auf Graph-Neural-Netzwerke | 你用的是可靠图形提示吗? Trojan对图形神经网络的迅速攻击 2410.13974v2 |
Authors: Minhua Lin, Zhiwei Zhang, Enyan Dai, Zongyu Wu, Yilong Wang, Xiang Zhang, Suhang Wang
Graph Prompt Learning (GPL) has been introduced as a promising approach that uses prompts to adapt pre-trained GNN models to specific downstream tasks without requiring fine-tuning of the entire model. Despite the advantages of GPL, little attention has been given to its vulnerability to backdoor attacks, where an adversary can manipulate the model’s behavior by embedding hidden triggers. Existing graph backdoor attacks rely on modifying model parameters during training, but this approach is impractical in GPL as GNN encoder parameters are frozen after pre-training. Moreover, downstream users may fine-tune their own task models on clean datasets, further complicating the attack. In this paper, we propose TGPA, a backdoor attack framework designed specifically for GPL. TGPA injects backdoors into graph prompts without modifying pre-trained GNN encoders and ensures high attack success rates and clean accuracy. To address the challenge of model fine-tuning by users, we introduce a finetuning-resistant poisoning approach that maintains the effectiveness of the backdoor even after downstream model adjustments. Extensive experiments on multiple datasets under various settings demonstrate the effectiveness of TGPA in compromising GPL models with fixed GNN encoders.
快速化图形学习(GPL)已作为一种很有希望的方法被引入为一种很有希望的方法,它使用快速的来使经过预先训练的GNN模型适应具体的下游任务,而无需对整个模型进行微调。尽管GPL的优点,它却很少注意其易受幕后攻击的脆弱性,即对手可以通过嵌入隐藏的触发器来操纵模型的行为。现有的图形后门攻击依靠的是修改培训期间的模型参数,但在GPL中这种做法是不切实际的,因为GNN编码参数在培训前被冻结。此外,下游用户可能会在清洁数据集方面微调自己的任务模型,使攻击进一步复杂化。在本文件中,我们提议TGPA是专门为GPL设计的后门攻击框架。TGPA将后门注入图形提示,而不修改经过预先训练的GNNNC编码器,确保高攻击成功率和干净的准确性。为了应对模型用户微调的挑战,我们引入了微调耐毒药的方法,即使在下游模型调整之后仍后门的功效。在各种环境下对多个数据集进行广泛的试验,展示了GGPL固定模型。
Article 207
Title@2025-05-29 (4): Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts
Title: Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Autonome Datenauswahl mit Zero-shot Generative Klassifikatoren für mathematische Texte | 具有数学文本零光生成分类器的自动数据选择 2402.07625v6 |
Authors: Yifan Zhang, Yifan Luo, Yang Yuan, Andrew C Yao
We present Autonomous Data Selection (AutoDS), a method that leverages base language models themselves as zero-shot “generative classifiers” to automatically curate high-quality mathematical texts. Unlike prior approaches that require human annotations or training a dedicated data filter, AutoDS relies solely on a model’s logits to determine whether a given passage is mathematically informative and educational. By integrating AutoDS into a continual pretraining pipeline, we substantially boost downstream performance on challenging math benchmarks (MATH, GSM8K, and BBH) while using far fewer tokens than previous methods. Empirically, our approach achieves roughly a twofold improvement in pretraining token efficiency over strong baselines, underscoring the potential of self-directed data selection in enhancing mathematical reasoning. We release our curated AutoMathText dataset to facilitate future research in automated domain-specific data curation. The AutoMathText dataset is available at https://huggingface.co/datasets/math-ai/AutoMathText. The code is available at https://github.com/yifanzhang-pro/AutoMathText.
我们推出自动数据选择(AutoDS) , 这是一种将基本语言模型本身作为零光“ 遗传分类器” 来自动翻译高质量数学文本的方法。 与以前要求人为说明或培训专用数据过滤器的方法不同, AutoDS 完全依靠模型的登录来确定某一特定通道是否具有数学上的信息和教育性。 通过将AutoDS纳入持续的培训前管道,我们大大提升了具有挑战性的数学基准(MATH、GSM8K和BBH)的下游性能,同时使用远比以往少得多的符号。 从目前来看,我们的方法在强化基线的预培训标语效率上取得了双重改进,强调了在加强数学推理过程中自行选择数据的潜力。 我们发行了我们经过校准的AutoMathText数据集,以促进未来在自动特定域数据曲线上的研究。 AutoMathText数据集可在 https://huggingface.co/datasets/math-ai/AutoMathText 查阅。 代码可在 https://github.com/yifanzhang-pro/AutoMathText 查阅。
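The AutoDS abstract above says only that selection relies on the model's logits. A minimal sketch of that idea follows, assuming the score is the softmax probability of a "YES" answer token against a "NO" token. The function names, the threshold, and the example logits are hypothetical and not taken from the paper or its code.

```python
import math

def lm_score(logit_yes: float, logit_no: float) -> float:
    """Softmax probability of "YES", restricted to the two answer tokens."""
    m = max(logit_yes, logit_no)  # subtract the max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)

def select_passages(passages, threshold=0.6):
    """Keep passages whose score meets a (hypothetical) threshold.

    `passages` is a list of (text, logit_yes, logit_no) tuples. In practice the
    logits would come from a base language model prompted with a yes/no
    question about the passage; here they are made-up numbers.
    """
    return [text for text, ly, ln in passages if lm_score(ly, ln) >= threshold]

examples = [
    ("Proof that sqrt(2) is irrational ...", 3.1, 0.4),
    ("Top ten celebrity haircuts ...", -1.2, 2.5),
]
kept = select_passages(examples)
print(kept)
```

Restricting the softmax to two tokens makes the score independent of the rest of the vocabulary, so no classifier head or human labels are needed.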
Article 208
Title@2025-05-29 (4): Equivalence of stochastic and deterministic policy gradients
Title: Equivalence of stochastic and deterministic policy gradients | Gleichwertigkeit stochastischer und deterministischer politischer Gradienten | 政策梯度和确定性政策梯度等同 2505.23244v1 |
Authors: Emo Todorov
Policy gradients in continuous control have been derived for both stochastic and deterministic policies. Here we study the relationship between the two. In a widely-used family of MDPs involving Gaussian control noise and quadratic control costs, we show that the stochastic and deterministic policy gradients, natural gradients, and state value functions are identical, while the state-control value functions are different. We then develop a general procedure for constructing an MDP with a deterministic policy that is equivalent to a given MDP with a stochastic policy. The controls of this new MDP are the sufficient statistics of the stochastic policy in the original MDP. Our results suggest that policy gradient methods can be unified by approximating state value functions rather than state-control value functions.
连续控制的政策梯度是用于随机和确定性政策的政策梯度。 我们在这里研究两者之间的关系。 在涉及高山控制噪音和二次控制成本的多用途产品系列中,我们显示,随机和确定性政策梯度、自然梯度和国家价值功能是相同的;虽然国家控制值功能不同。 然后,我们制定了一个一般程序,用以构建一个具有确定性政策的多用途产品多用途产品多用途产品多用途产品,该政策相当于具有随机政策的某个多用途产品多用途产品多用途产品。 这一新多用途产品多用途产品的控制是原始多用途产品多用途产品多用途产品的充分统计。 我们的结果表明,政策梯度方法可以通过近似国家价值功能而不是国家控制值功能来统一。
Article 209
Title@2025-05-29 (4): Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
Title: Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game | Sprachagenten mit Verstärkung Lernen für strategisches Spiel im Werwolf Spiel | 在狼人游戏中进行战略游戏强化学习的语文代理 2310.18940v4 |
Authors: Zelai Xu, Chao Yu, Fei Fang, Yu Wang, Yi Wu
Agents built with large language models (LLMs) have shown great potential across a wide range of domains. However, in complex decision-making tasks, pure LLM-based agents tend to exhibit intrinsic bias in their choice of actions, which is inherited from the model’s training data and results in suboptimal performance. To develop strategic language agents, i.e., agents that generate flexible language actions and possess strong decision-making abilities, we propose a novel framework that powers LLM-based agents with reinforcement learning (RL). We consider Werewolf, a popular social deduction game, as a challenging testbed that emphasizes versatile communication and strategic gameplay. To mitigate the intrinsic bias in language actions, our agents use an LLM to perform deductive reasoning and generate a diverse set of action candidates. Then an RL policy trained to optimize the decision-making ability chooses an action from the candidates to play in the game. Extensive experiments show that our agents overcome the intrinsic bias and outperform existing LLM-based agents in the Werewolf game. We also conduct human-agent experiments and find that our agents achieve human-level performance and demonstrate strong strategic play.
然而,在复杂的决策任务中,纯粹的LLM代理商往往在选择行动时表现出内在的偏见,这种偏见是从该模式的培训数据所继承的,其结果不尽人意。为了发展战略语言代理商,即产生灵活语言行动和拥有强大决策能力的代理商,我们提议了一个赋予LLM代理商以强化学习能力的新框架。我们认为Wrewolf是一种流行的社会推理游戏,是一种具有挑战性的试金,它强调多功能的沟通和战略游戏。为了减轻语言行动的内在偏见,我们的代理商利用LLM进行推理推理和产生一套不同的行动候选人。然后,为优化决策能力而培训的RL政策从候选人中选择了在游戏中玩的动作。广泛的实验表明,我们的代理商克服了内在的偏见,超越了在Werewolf游戏中现有的LM代理商。我们还进行人力代理实验,发现我们的代理商取得了人的水平表现并展示了强有力的战略游戏。
Article 210
Title@2025-05-29 (4): Joint estimation of smooth graph signals from partial linear measurements
Title: Joint estimation of smooth graph signals from partial linear measurements | Gemeinsame Schätzung glatter Graphensignale aus partiellen linearen Messungen | 对部分线性测量得出的平滑图示信号的联合估计 2505.23240v1 |
Authors: Hemant Tyagi
Given an undirected and connected graph $G$ on $T$ vertices, suppose each vertex $t$ has a latent signal $x_t \in \mathbb{R}^n$ associated to it. Given partial linear measurements of the signals, for a potentially small subset of the vertices, our goal is to estimate the $x_t$’s. Assuming that the signals are smooth w.r.t. $G$, in the sense that the quadratic variation of the signals over the graph is small, we obtain non-asymptotic bounds on the mean squared error for jointly recovering the $x_t$’s, for the smoothness-penalized least squares estimator. In particular, this implies for certain choices of $G$ that this estimator is weakly consistent (as $T \rightarrow \infty$) under potentially very stringent sampling, where only one coordinate is measured per vertex for a vanishingly small fraction of the vertices. The results are extended to a "multi-layer" ranking problem where $x_t$ corresponds to the latent strengths of a collection of $n$ items, and noisy pairwise difference measurements are obtained at each "layer" $t$ via a measurement graph $G_t$. Weak consistency is established for certain choices of $G$ even when the individual $G_t$’s are very sparse and disconnected.
假设一个未引导且连接的图形$G$美元对美元的顶端值值, 假设每个顶端美元都有一个隐性信号 $x_ t $ $ 美元 $ mathbb{Rn 美元 。 如果对信号进行部分线性测量, 对于潜在的一小部分顶端, 我们的目标是估算$x t$ 美元 。 假设信号是平滑的 w.r. t G$ 美元 , 也就是说, 图形上的信号的四面形变化很小, 我们得到的是每个顶端的平方误差上的非防线, 以共同恢复$x t$, 以平整平坦的方式处罚最小的正方位数 。 特别是, 对于某些G$的选择, 这个顶端值可能不太一致( 如$T r.r. t. t. g. 美元) , 假设在可能非常严格的取样中, 只能测量每个顶端值的顶端值只有一个坐标, 以消失的每平坦度选择。 结果被扩展为每平方平方平方平方美元, 美元, 每平方平方平方平方平方平方平方平方平方平方平方的测量问题, 美元。
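The smoothness-penalized least squares estimator named in the abstract above can be written out as a sketch. The notation is assumed, since the abstract does not fix it: $S \subseteq \{1,\dots,T\}$ is the set of measured vertices, $y_t = A_t x_t + \eta_t$ is the partial linear measurement at vertex $t \in S$, $E$ is the edge set of $G$ with Laplacian $L$, and $\lambda > 0$ is a regularization parameter.

$$
(\hat{x}_1,\dots,\hat{x}_T) \in \arg\min_{z_1,\dots,z_T \in \mathbb{R}^n} \; \sum_{t \in S} \| y_t - A_t z_t \|_2^2 \; + \; \lambda \sum_{\{t,t'\} \in E} \| z_t - z_{t'} \|_2^2 ,
\qquad
\sum_{\{t,t'\} \in E} \| z_t - z_{t'} \|_2^2 = \operatorname{tr}\!\left( Z L Z^\top \right), \quad Z = [z_1 \,\cdots\, z_T] \in \mathbb{R}^{n \times T}.
$$

The second term is the quadratic variation of the signals over the graph. Under this reading, vertices without measurements are estimated purely through the coupling with their neighbours.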
Article 211
Title@2025-05-29 (4): Learn Singularly Perturbed Solutions via Homotopy Dynamics
Title: Learn Singularly Perturbed Solutions via Homotopy Dynamics | Singulär perturbed Lösungen über Homotopy Dynamics lernen | 通过智多基动力学学习单点受扰动的解决方案 2502.00488v3 |
Authors: Chuqi Chen, Yahong Yang, Yang Xiang, Wenrui Hao
Solving partial differential equations (PDEs) using neural networks has become a central focus in scientific machine learning. Training neural networks for singularly perturbed problems is particularly challenging due to certain parameters in the PDEs that introduce near-singularities in the loss function. In this study, we overcome this challenge by introducing a novel method based on homotopy dynamics to effectively manipulate these parameters. From a theoretical perspective, we analyze the effects of these parameters on training difficulty in these singularly perturbed problems and establish the convergence of the proposed homotopy dynamics method. Experimentally, we demonstrate that our approach significantly accelerates convergence and improves the accuracy of these singularly perturbed problems. These findings present an efficient optimization strategy leveraging homotopy dynamics, offering a robust framework to extend the applicability of neural networks for solving singularly perturbed differential equations.
使用神经网络解决部分差异方程式(PDEs)已成为科学机器学习的中心焦点。由于PDEs中的某些参数在损失函数中引入了近同质元素,因此,为奇特受扰动的问题培训神经网络尤其具有挑战性。在本研究中,我们通过采用基于同质动态的新颖方法来有效操控这些参数,克服了这一挑战。从理论角度看,我们分析了这些参数对这些奇特受扰动问题培训难度的影响,并建立了拟议的同质动态方法的趋同。实验来看,我们证明我们的方法大大加快了这些奇异受扰的问题的趋同,提高了这些问题的准确性。这些研究结果展示了利用同质动态的高效优化战略,提供了一个强大的框架,以扩大神经网络在解决奇受扰动差异方程式方面的适用性。
Article 212
Title@2025-05-29 (4): HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
Title: HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model | HiDe-LlaVA: Hierarchische Entkopplung zur kontinuierlichen Instruktionstuning von multimodalen Großsprachenmodellen | HIDE-LLALAVA:多式大语言模式连续教学制导的等级脱钩 2503.12941v2 |
Authors: Haiyang Guo, Fanhu Zeng, Ziwei Xiang, Fei Zhu, Da-Han Wang, Xu-Yao Zhang, Cheng-Lin Liu
Instruction tuning is widely used to improve a pre-trained Multimodal Large Language Model (MLLM) by training it on curated task-specific datasets, enabling better comprehension of human instructions. However, it is infeasible to collect all possible instruction datasets simultaneously in real-world scenarios. Thus, equipping MLLMs with continual instruction tuning is essential for maintaining their adaptability. However, existing methods often trade off memory efficiency for performance gains, significantly compromising overall efficiency. In this paper, we propose a task-specific expansion and task-general fusion framework based on the variations in Centered Kernel Alignment (CKA) similarity across different model layers when trained on diverse datasets. Furthermore, we analyze the information leakage present in the existing benchmark and propose a new and more challenging benchmark to rationally evaluate the performance of different methods. Comprehensive experiments showcase a significant performance improvement of our method compared to existing state-of-the-art methods. Code and dataset are released at https://github.com/Ghy0501/HiDe-LLaVA.
教学调整被广泛用来改进经过事先训练的多式大语言模型(MLLM),方法是对它进行关于具体任务数据集的培训,以便更好地了解人类的指令;然而,在现实世界情景中,不可能同时收集所有可能的指令数据集;因此,使MLLM能够不断进行教学调整,对于保持其适应性至关重要;然而,现有方法往往以业绩增益来交换记忆效率,从而大大降低总体效率;在本文件中,我们建议,根据在就不同数据集进行的培训中枢 Kernel对齐(CKA) 不同模型层的变异,建立任务扩展和任务一般融合框架;此外,我们分析现有基准中的信息渗漏情况,并提出新的、更具挑战性的基准,以合理评估不同方法的绩效;全面试验显示,我们的方法与现有“最先进”方法相比,业绩有显著改进。代码和数据集公布在 https://github.com/Ghy0501/HiDe-LLaVA。
Article 213
Title@2025-05-29 (4): Graph Random Walk with Feature-Label Space Alignment: A Multi-Label Feature Selection Method
Title: Graph Random Walk with Feature-Label Space Alignment: A Multi-Label Feature Selection Method | Graph Random Walk mit Feature-Label-Raumausrichtung: Eine Multi-Label-Feature-Auswahlmethode | 带有地貌标签空间对齐的任意漫步图图 : 多标签特征选择方法 2505.23228v1 |
Authors: Wanfu Gao, Jun Gao, Qingqi Han, Hanlin Pan, Kunpeng Liu
The rapid growth in feature dimension may introduce implicit associations between features and labels in multi-label datasets, making the relationships between features and labels increasingly complex. Moreover, existing methods often adopt low-dimensional linear decomposition to explore the associations between features and labels. However, linear decomposition struggles to capture complex nonlinear associations and may lead to misalignment between the feature space and the label space. To address these two critical challenges, we propose innovative solutions. First, we design a random walk graph that integrates feature-feature, label-label, and feature-label relationships to accurately capture nonlinear and implicit indirect associations, while optimizing the latent representations of associations between features and labels after low-rank decomposition. Second, we align the variable spaces by leveraging low-dimensional representation coefficients, while preserving the manifold structure between the original high-dimensional multi-label data and the low-dimensional representation space. Extensive experiments and ablation studies conducted on seven benchmark datasets and three representative datasets using various evaluation metrics demonstrate the superiority of the proposed method. Code: https://github.com/Heilong623/-GRW-
多标签数据集的特征和标签的迅速增长可能会导致多标签数据集的特征和标签之间的隐含关联,使特征和标签之间的关系日益复杂;此外,现有方法往往采用低维线性分解法来探索特征和标签之间的关联;然而,线性分解挣扎以捕捉复杂的非线性关联,并可能导致特征空间和标签空间之间的错位。为了应对这两个关键挑战,我们提出了创新的解决办法。首先,我们设计一个随机行进图,将特征-特点、标签-标签和特征-标签关系结合起来,以准确捕捉非线性和非线性间接关联,同时在低级别脱压缩后优化特征和标签之间的潜在关联。第二,我们通过利用低度代表系数来调整变量空间,同时保持原高维多标签数据和低维度代表空间之间的多重结构。我们利用各种评价指标对七个基准数据集和三个具有代表性的数据集进行了广泛的实验和对比研究,展示了拟议方法的优越性。代码:https://github.com/Heilong623/-GRW-
Article 214
Title@2025-05-29 (4): am-ELO: A Stable Framework for Arena-based LLM Evaluation
Title: am-ELO: A Stable Framework for Arena-based LLM Evaluation | am-ELO: Ein stabiles Rahmenwerk für Arena-basierte LLM-Evaluierung | AM-ELO:基于竞技场的LLM评价稳定框架 2505.03475v2 |
Authors: Zirui Liu, Jiatong Li, Yan Zhuang, Qi Liu, Shuanghong Shen, Jie Ouyang, Mingyue Cheng, Shijin Wang
Arena-based evaluation is a fundamental yet significant evaluation paradigm for modern AI models, especially large language models (LLMs). The existing framework based on the ELO rating system suffers from an inevitable instability problem due to ranking inconsistency and a lack of attention to the varying abilities of annotators. In this paper, we introduce a novel stable arena framework that addresses these issues by enhancing the ELO rating system. Specifically, we replace the iterative update method with a Maximum Likelihood Estimation (MLE) approach, m-ELO, and provide theoretical proof of the consistency and stability of the MLE approach for model ranking. Additionally, we propose am-ELO, which modifies the Elo rating’s probability function to incorporate annotator abilities, enabling the simultaneous estimation of model scores and annotator reliability. Experiments demonstrate that this method ensures stability, indicating that this framework offers a more robust, accurate, and stable evaluation method for LLMs.
以Arena为基础的评价是现代AI模型,特别是大型语言模型的一个基本而重要的评价范例。基于ELO评级制度的现有框架由于排名不一致和对说明者能力的不同缺乏重视而不可避免地面临不稳定问题。在本文件中,我们引入了一个新的稳定的舞台框架,通过加强ELO评级制度解决这些问题。具体地说,我们用最大相似估计法(MLE)取代迭代更新方法,m-ELO, 并提供理论证据,证明模型排名MLE方法的一致性和稳定性。此外,我们提议了修改Elo Raiting概率功能的AM-ELO, 以纳入说明者能力,使模型分数和说明者可靠性能够同时估算。实验表明,这一方法确保了稳定性,证明这一框架为LLMMS提供了更加可靠、准确和稳定的评价方法。
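The am-ELO abstract above describes replacing iterative Elo updates with maximum likelihood estimation and adding annotator abilities to the win-probability function. A minimal sketch follows, assuming the win probability is a logistic function of the annotator ability times the rating difference. That functional form, the gradient-ascent solver, the mean-centering step, and all names are assumptions for illustration, not the paper's exact formulation.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def fit_am_elo(records, n_models, n_annotators, lr=0.05, steps=2000):
    """Jointly fit model ratings and annotator abilities by gradient ascent.

    records: list of (annotator, model_i, model_j, y), with y = 1 if model_i
    was preferred and y = 0 otherwise.
    Assumed model: P(i beats j | annotator k) = sigmoid(ability[k] * (R_i - R_j)).
    """
    ratings = [0.0] * n_models
    ability = [1.0] * n_annotators
    for _ in range(steps):
        g_r = [0.0] * n_models
        g_a = [0.0] * n_annotators
        for k, i, j, y in records:
            diff = ratings[i] - ratings[j]
            err = y - sigmoid(ability[k] * diff)  # d log-likelihood / d logit
            g_r[i] += err * ability[k]
            g_r[j] -= err * ability[k]
            g_a[k] += err * diff
        ratings = [r + lr * g for r, g in zip(ratings, g_r)]
        ability = [a + lr * g for a, g in zip(ability, g_a)]
        mean = sum(ratings) / n_models  # ratings are only defined up to a shift
        ratings = [r - mean for r in ratings]
    return ratings, ability

# Annotator 0 prefers model 0 in 8 of 10 comparisons; annotator 1 is a coin flip.
records = ([(0, 0, 1, 1)] * 8 + [(0, 0, 1, 0)] * 2
           + [(1, 0, 1, 1)] * 5 + [(1, 0, 1, 0)] * 5)
ratings, ability = fit_am_elo(records, n_models=2, n_annotators=2)
print(ratings, ability)
```

Because the likelihood depends on the whole record set at once, the fitted ratings do not depend on the order in which comparisons arrive, unlike the iterative Elo update. In this toy data the uninformative annotator's ability is driven toward zero.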
Article 215
Title@2025-05-29 (4): Generalizability vs. Counterfactual Explainability Trade-Off
Title: Generalizability vs. Counterfactual Explainability Trade-Off | Generalisierbarkeit vs. gegenfaktische Erklärbarkeit Trade-Off | 通用与反事实解释 2505.23225v1 |
Authors: Fabiano Veglianti, Flavio Giorgi, Fabrizio Silvestri, Gabriele Tolomei
In this work, we investigate the relationship between model generalization and counterfactual explainability in supervised learning. We introduce the notion of $\varepsilon$-valid counterfactual probability ($\varepsilon$-VCP) – the probability of finding perturbations of a data point within its $\varepsilon$-neighborhood that result in a label change. We provide a theoretical analysis of $\varepsilon$-VCP in relation to the geometry of the model’s decision boundary, showing that $\varepsilon$-VCP tends to increase with model overfitting. Our findings establish a rigorous connection between poor generalization and the ease of counterfactual generation, revealing an inherent trade-off between generalization and counterfactual explainability. Empirical results validate our theory, suggesting $\varepsilon$-VCP as a practical proxy for quantitatively characterizing overfitting.
在这项工作中,我们调查了模型的概括化和在监督的学习中反事实解释之间的关系。我们引入了 $\ varepsilon$-valid evactial objective (\ varepsilon$-VCP) 的概念 – – 在其 $\ varepsilon$-neiborbority内找到一个数据点的扰动的概率,从而导致标签的改变。我们在模型决定边界的几何学上对$\ varepsilon$-VCP 提供了理论分析,表明美元-VCP 往往随着模型的安装而增加。我们的调查结果在差的概括化和反事实生成的容易性之间建立了严格的联系,揭示了一般化和反事实解释之间的内在权衡。 实证结果证实了我们的理论,认为 $\ varepslon$-VCP 是量化过度配置的实用代言。
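The $\varepsilon$-VCP defined in the abstract above is a probability over perturbations inside an $\varepsilon$-neighborhood, so it can be approximated by sampling. A minimal Monte Carlo sketch follows, assuming perturbations are drawn uniformly from an $L_\infty$ ball. The sampling distribution, the toy classifier, and the function names are assumptions, not the paper's procedure.

```python
import random

def estimate_vcp(predict, x, eps, n_samples=2000, seed=0):
    """Fraction of uniform L-infinity perturbations of x that flip the label."""
    rng = random.Random(seed)
    base = predict(x)
    flips = 0
    for _ in range(n_samples):
        z = [xi + rng.uniform(-eps, eps) for xi in x]
        if predict(z) != base:
            flips += 1
    return flips / n_samples

def linear_clf(p):
    """Toy classifier (hypothetical): label 1 above the line p0 + p1 = 1."""
    return int(p[0] + p[1] > 1.0)

far = estimate_vcp(linear_clf, [0.0, 0.0], eps=0.1)    # far from the boundary
near = estimate_vcp(linear_clf, [0.5, 0.45], eps=0.1)  # close to the boundary
print(far, near)
```

A point far from the decision boundary yields an estimate of zero, while a point near it yields a positive estimate. This matches the abstract's link between the quantity and the geometry of the decision boundary.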
Article 216
Title@2025-05-29 (4): JANET: Joint Adaptive predictioN-region Estimation for Time-series
Title: JANET: Joint Adaptive predictioN-region Estimation for Time-series | JANET: Gemeinsame adaptive Vorhersage-Region Schätzung für Zeitreihen | JANET: 时间序列联合适应性预测N-区域估算 2407.06390v2 |
Authors: Eshant English, Eliot Wong-Toi, Matteo Fontana, Stephan Mandt, Padhraic Smyth, Christoph Lippert
Conformal prediction provides machine learning models with prediction sets that offer theoretical guarantees, but the underlying assumption of exchangeability limits its applicability to time series data. Furthermore, existing approaches struggle to handle multi-step ahead prediction tasks, where uncertainty estimates across multiple future time points are crucial. We propose JANET (Joint Adaptive predictioN-region Estimation for Time-series), a novel framework for constructing conformal prediction regions that are valid for both univariate and multivariate time series. JANET generalises the inductive conformal framework and efficiently produces joint prediction regions with controlled K-familywise error rates, enabling flexible adaptation to specific application needs. Our empirical evaluation demonstrates JANET’s superior performance in multi-step prediction tasks across diverse time series datasets, highlighting its potential for reliable and interpretable uncertainty quantification in sequential data.
非正式预测为机器学习模型提供了提供理论保障的预测数据集,但互换性的基本假设限制了其对时间序列数据的适用性。此外,现有方法在努力处理多步前的预测任务,而今后多个时间点的不确定性估计至关重要。我们提议JANET(联合适应性预测-N-区域对时间序列的估算),这是构建适用于单体和多变时间序列的符合性预测区域的新框架。JANET概括了进化符合性框架,并有效地生成了带有可控K-家庭误差率的联合预测区域,使得能够灵活适应具体应用需要。我们的经验评估表明,JANET在跨不同时间序列数据集的多步预测任务中表现优异,突出了其在连续数据中可靠和可解释的不确定性量化的潜力。
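The JANET abstract above builds on the inductive (split) conformal framework to obtain joint prediction regions over several future steps. The sketch below is not JANET itself. It is plain split conformal prediction with a maximum-absolute-residual score over the horizon, it assumes exchangeable calibration series, and it does not implement K-familywise error control. All names are hypothetical.

```python
import math

def joint_radius(cal_true, cal_pred, alpha=0.1):
    """Calibrate one radius that covers all horizon steps jointly.

    cal_true, cal_pred: lists of equal-length sequences (one per calibration
    series). The score of a series is its largest absolute residual.
    """
    scores = sorted(
        max(abs(y - p) for y, p in zip(yt, yp))
        for yt, yp in zip(cal_true, cal_pred)
    )
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample conformal quantile
    if rank > n:
        return math.inf  # too few calibration series for this alpha
    return scores[rank - 1]

def joint_region(pred, radius):
    """Band of constant half-width around a multi-step forecast."""
    return [(p - radius, p + radius) for p in pred]

cal_pred = [[0, 0, 0]] * 4
cal_true = [[1, 0, 0], [0, -2, 1], [3, 1, 0], [0, 0, 4]]
radius = joint_radius(cal_true, cal_pred, alpha=0.5)
region = joint_region([10, 20, 30], radius)
print(radius, region)
```

Taking the maximum over the horizon is what makes the region joint: a series counts as covered only if every step falls inside the band.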
Article 217
Title@2025-05-29 (4): A Signed Graph Approach to Understanding and Mitigating Oversmoothing in GNNs
Title: A Signed Graph Approach to Understanding and Mitigating Oversmoothing in GNNs | Ein signierter Graphansatz zum Verständnis und zur Milderung von Übersäuerung in GNNs | 签署《理解和减缓全球NNNs中过度过度使用问题图表方法》 2502.11394v2 |
Authors: Jiaqi Wang, Xinyi Wu, James Cheng, Yifei Wang
Deep graph neural networks (GNNs) often suffer from oversmoothing, where node representations become overly homogeneous with increasing depth. While techniques like normalization, residual connections, and edge dropout have been proposed to mitigate oversmoothing, they are typically developed independently, with limited theoretical understanding of their underlying mechanisms. In this work, we present a unified theoretical perspective based on the framework of signed graphs, showing that many existing strategies implicitly introduce negative edges that alter message-passing to resist oversmoothing. However, we show that merely adding negative edges in an unstructured manner is insufficient: the asymptotic behavior of signed propagation depends critically on the strength and organization of positive and negative edges. To address this limitation, we leverage the theory of structural balance, which promotes stable, cluster-preserving dynamics by connecting similar nodes with positive edges and dissimilar ones with negative edges. We propose Structural Balanced Propagation (SBP), a plug-and-play method that assigns signed edges based on either labels or feature similarity to explicitly enhance structural balance in the constructed signed graphs. Experiments on nine benchmarks across both homophilic and heterophilic settings demonstrate that SBP consistently improves classification accuracy and mitigates oversmoothing, even at depths of up to 300 layers. Our results provide a principled explanation for prior oversmoothing remedies and introduce a new direction for signed message-passing design in deep GNNs.
深图形内心网络(GNNS)往往受到过度透透的困扰,因为节点表现随着深度的提高而变得过于相似。 虽然有人提议采用正常化、剩余连接和边缘辍学等技术来缓解过度透透析,但它们通常是独立开发的,其基本机制的理论理解有限。 在这项工作中,我们以签名图表框架为基础提出一个统一的理论观点,表明许多现有战略隐含着改变信息传递以抵制过度透析的负面边缘。然而,我们表明,仅仅以非结构化方式增加负边缘是不够的,即签名传播的无弹性行为严重依赖正面和负边的力度和组织。为了应对这一局限性,我们利用结构平衡理论,通过将类似节点与正边和反面的偏差联系起来,促进稳定、集束保留动态。我们提出了结构平衡(SBP ) , 一种基于标签或特征为明确加强已签名的图表结构平衡而增加的深度偏差( ) 不够充分。
Article 218
Title@2025-05-29 (4): Daunce: Data Attribution through Uncertainty Estimation
Title: Daunce: Data Attribution through Uncertainty Estimation | Daunce: Datenzuweisung durch Unsicherheitsabschätzung | Daunce:通过不确定性估计数据归属 2505.23223v1 |
Authors: Xingyuan Pan, Chenlu Ye, Joseph Melkonian, Jiaqi W. Ma, Tong Zhang
Training data attribution (TDA) methods aim to identify which training examples most influence a model’s predictions on specific test data. By quantifying these influences, TDA supports critical applications such as data debugging, curation, and valuation. Gradient-based TDA methods rely on gradients and second-order information, limiting their applicability at scale. While recent random projection-based methods improve scalability, they often suffer from degraded attribution accuracy. Motivated by connections between uncertainty and influence functions, we introduce Daunce, a simple yet effective data attribution approach through uncertainty estimation. Our method operates by fine-tuning a collection of perturbed models and computing the covariance of per-example losses across these models as the attribution score. Daunce is scalable to large language models (LLMs) and achieves more accurate attribution compared to existing TDA methods. We validate Daunce on tasks ranging from vision tasks to LLM fine-tuning, and further demonstrate its compatibility with black-box model access. Applied to OpenAI’s GPT models, our method achieves, to our knowledge, the first instance of data attribution on proprietary LLMs.
培训数据归属(TDA)方法旨在确定哪些培训范例影响具体测试数据模型的预测。通过量化这些影响,TDA支持关键应用,如数据调试、整理和估值。基于梯度的TDA方法依赖梯度和二级信息,限制其规模的适用性。虽然最近的随机预测方法提高了可缩放性,但它们往往会受到可缩放性差的准确性。受不确定性和影响功能之间联系的驱动,我们引入了Daunce(一种简单而有效的数据归属方法,通过不确定性估计。我们的方法是对这些模型的渗透模型进行微调,并计算这些模型中每项损失的共差值,作为属性分。Daunce对大语言模型(LLLMS)具有可缩放性,并比现有的TDA方法实现更准确的归属。我们验证Daunce(Daunce)的任务从愿景任务到LM微调,并进一步证明它与黑箱模型访问的兼容性。我们的方法适用于OpenAI的GPTM模型, 我们的方法达到我们的知识,即拥有LM的数据归属的第一个例子。
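The Daunce abstract above defines the attribution score as the covariance of per-example losses across a collection of perturbed models. A minimal sketch of that final step follows, assuming the losses have already been computed. How the perturbed models are produced is not shown. The function names, the data layout, and the numbers are hypothetical.

```python
def covariance(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)

def attribution_scores(train_losses, test_losses):
    """Score each training example against one test example.

    train_losses[m][i]: loss of perturbed model m on training example i.
    test_losses[m]:     loss of perturbed model m on the test example.
    """
    n_train = len(train_losses[0])
    return [
        covariance([row[i] for row in train_losses], test_losses)
        for i in range(n_train)
    ]

# Three perturbed models, two training examples (made-up numbers).
train_losses = [[1.0, 0.5], [2.0, 0.5], [3.0, 0.5]]
test_losses = [0.1, 0.2, 0.3]
scores = attribution_scores(train_losses, test_losses)
print(scores)
```

Only loss values enter the computation, which is consistent with the abstract's claim of compatibility with black-box model access. A training example whose loss does not move across the perturbed models receives a score of zero.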
Article 219
Title@2025-05-29 (4): Trajectory Generator Matching for Time Series
Title: Trajectory Generator Matching for Time Series | Trajektorie Generator passend für Zeitreihen | 时间序列匹配轨迹生成器 2505.23215v1 |
Authors: T. Jahn, J. Chemseddine, P. Hagemann, C. Wald, G. Steidl
Accurately modeling time-continuous stochastic processes from irregular observations remains a significant challenge. In this paper, we leverage ideas from generative modeling of image data to push the boundary of time series generation. For this, we find new generators of SDEs and jump processes, inspired by trajectory flow matching, that have the marginal distributions of the time series of interest. Specifically, we can handle discontinuities of the underlying processes by parameterizing the jump kernel densities by scaled Gaussians that allow for closed form formulas of the corresponding Kullback-Leibler divergence in the loss. Unlike most other approaches, we are able to handle irregularly sampled time series.
从非正常观测中精确地模拟持续时间的随机过程仍是一项重大挑战。 在本文中,我们利用图像数据基因模型模型的创意来推动时间序列生成的界限。 为此,我们发现由轨迹流量匹配所启发的SDEs和跳跃过程的新生成器,这些生成器具有时间序列的边际分布。 具体地说,我们可以通过将跳动内核密度作为参数来应对基本过程的不连续性, 由比例尺的高斯人来参数, 从而允许相应的 Kullback- Leibler 差异的封闭形式公式。 与大多数其他方法不同, 我们能够处理不规则的抽样时间序列 。
Article 220
Title@2025-05-29 (4): Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model
Title: Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model | Engere Datenschutzprüfung von DP-SGD im Hidden State Threat Model | 对隐藏国家威胁模式DP-SGD的更严格隐私审计 2405.14457v3 |
Authors: Tudor Cebere, Aurélien Bellet, Nicolas Papernot
Machine learning models can be trained with formal privacy guarantees via differentially private optimizers such as DP-SGD. In this work, we focus on a threat model where the adversary has access only to the final model, with no visibility into intermediate updates. In the literature, this hidden state threat model exhibits a significant gap between the lower bound from empirical privacy auditing and the theoretical upper bound provided by privacy accounting. To challenge this gap, we propose to audit this threat model with adversaries that craft a gradient sequence designed to maximize the privacy loss of the final model without relying on intermediate updates. Our experiments show that this approach consistently outperforms previous attempts at auditing the hidden state model. Furthermore, our results advance the understanding of achievable privacy guarantees within this threat model. Specifically, when the crafted gradient is inserted at every optimization step, we show that concealing the intermediate model updates in DP-SGD does not enhance the privacy guarantees. The situation is more complex when the crafted gradient is not inserted at every step: our auditing lower bound matches the privacy upper bound only for an adversarially-chosen loss landscape and a sufficiently large batch size. This suggests that existing privacy upper bounds can be improved in certain regimes.
通过DP-SGD等不同的私人优化设备,可以对机器学习模式进行正式的隐私保障培训。 在这项工作中,我们侧重于一个威胁模式,对手只能使用最后模式,中间更新没有可见度。在文献中,这种隐蔽的国家威胁模式在经验隐私审计的较低约束与隐私会计提供的理论上限之间存在巨大差距。为了挑战这一差距,我们提议与对手一起审计这一威胁模式,这些对手设计了一个梯度序列,目的是在不依赖中间更新的情况下最大限度地减少最后模式的隐私损失。我们的实验表明,这一方法始终优于先前审计隐蔽状态模式的尝试。此外,我们的成果促进了对这一威胁模式中可实现的隐私保障的理解。具体地说,在每次优化步骤插入精心设计的梯度时,我们表明,隐藏DP-SGD的中间模型更新并不增强隐私保障。如果不是每一步都插入精心设计的梯度,那么情况就更加复杂:我们的审计就更低约束了隐私上限,只有对敌对式混合损失场景和足够大批量尺寸。这表明,现有的隐私上限在某些制度中是可以改进的。
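The abstract above audits DP-SGD, whose update clips each per-example gradient and adds Gaussian noise before averaging. A minimal sketch of one such step follows, to make concrete where a crafted gradient would enter. This is the standard DP-SGD update written with plain lists, not the authors' auditing code, and all names and numbers are hypothetical.

```python
import math
import random

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each gradient, sum, add noise, average, step."""
    dim = len(params)
    total = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(v * v for v in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for d in range(dim):
            total[d] += g[d] * scale  # each example contributes at most clip_norm
    sigma = noise_multiplier * clip_norm
    batch = len(per_example_grads)
    noisy_mean = [(total[d] + rng.gauss(0.0, sigma)) / batch for d in range(dim)]
    return [p - lr * g for p, g in zip(params, noisy_mean)]

# With the noise switched off the step is deterministic, which makes the
# effect of clipping easy to inspect.
updated = dp_sgd_step(
    params=[0.0, 0.0],
    per_example_grads=[[3.0, 4.0], [0.3, 0.4]],
    lr=1.0,
    clip_norm=1.0,
    noise_multiplier=0.0,
    rng=random.Random(0),
)
print(updated)
```

In the hidden state threat model described above, the adversary sees only the parameters after the final step. A crafted gradient would be one of the entries in `per_example_grads`, and clipping bounds its influence on any single step.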
Article 221
Title@2025-05-29 (4): Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces
Title: Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces | Verbesserung der parallelen Programmleistung mit LLM-Optimierern über Agent-System-Schnittstellen | 通过代理-系统接口改进与LLM优化器的平行方案绩效 2410.15625v3 |
Authors: Anjiang Wei, Allen Nie, Thiago S. F. X. Teixeira, Rohan Yadav, Wonchan Lee, Ke Wang, Alex Aiken
Modern scientific discovery increasingly relies on high-performance computing for complex modeling and simulation. A key challenge in improving parallel program performance is efficiently mapping tasks to processors and data to memory, a process dictated by intricate, low-level system code known as mappers. Developing high-performance mappers demands days of manual tuning, posing a significant barrier for domain scientists without systems expertise. We introduce a framework that automates mapper development with generative optimization, leveraging richer feedback beyond scalar performance metrics. Our approach features the Agent-System Interface, which includes a Domain-Specific Language (DSL) to abstract away the low-level complexity of system code and define a structured search space, as well as AutoGuide, a mechanism that interprets raw execution output into actionable feedback. Unlike traditional reinforcement learning methods such as OpenTuner, which rely solely on scalar feedback, our method finds superior mappers in far fewer iterations. With just 10 iterations, it outperforms OpenTuner even after 1000 iterations, achieving 3.8X faster performance. Our approach finds mappers that surpass expert-written mappers by up to 1.34X speedup across nine benchmarks while reducing tuning time from days to minutes.
现代科学发现日益依赖高性能计算来进行复杂的建模和模拟。 改进平行程序性能的一个关键挑战是高效地绘制处理器和数据到记忆的处理器和数据的工作,这一过程由复杂、低层次的系统代码(即映射器)所决定。 开发高性能绘图师需要数日人工调整,这对没有系统专长的域科学家构成了巨大的障碍。 我们引入了一个框架,使成像开发自动成像,使其具有基因化优化,使更丰富的反馈超过缩微性能度量度尺度。 我们的方法特征是代理系统-系统界面,包括一个DSL(DSL)来抽取系统代码的低度复杂度,并定义结构搜索空间,以及AutoGuide(一个将原始执行输出解释为可操作反馈的机制) 。 与OpenTuner等仅依靠标量反馈的传统强化学习方法不同, 我们的方法以少得多的迭代次数找到更优的映射器。 仅用10次迭代, 它就超过了迭代1000次后的OpenTuner, 性能快3.8倍。 我们的方法找到的映射器在九个基准上比专家编写的映射器最多快1.34倍, 同时将调优时间从数天缩短到数分钟。
Article 222
Title@2025-05-29 (4): On the performance of machine-learning-assisted Monte Carlo in sampling from simple statistical physics models
Title: On the performance of machine-learning-assisted Monte Carlo in sampling from simple statistical physics models | Über die Leistung von Monte Carlo mit maschinellem Lernen bei der Probenahme von einfachen Modellen der statistischen Physik | 关于机械学习辅助蒙特卡洛利用简单统计物理模型取样的 2505.22598v2 |
Authors: Luca Maria Del Bono, Federico Ricci-Tersenghi, Francesco Zamponi
Recent years have seen a rise in the application of machine learning techniques to aid the simulation of hard-to-sample systems that cannot be studied using traditional methods. Despite the introduction of many different architectures and procedures, a wide theoretical understanding is still lacking, with the risk of suboptimal implementations. As a first step to address this gap, we provide here a complete analytic study of the widely-used Sequential Tempering procedure applied to a shallow MADE architecture for the Curie-Weiss model. The contribution of this work is twofold: firstly, we give a description of the optimal weights and of the training under Gradient Descent optimization. Secondly, we compare what happens in Sequential Tempering with and without the addition of local Metropolis Monte Carlo steps. We are thus able to give theoretical predictions on the best procedure to apply in this case. This work establishes a clear theoretical basis for the integration of machine learning techniques into Monte Carlo sampling and optimization.
近年来,在应用机器学习技术协助模拟无法使用传统方法研究的难以取样的系统方面,机器学习技术的运用有所增加。尽管采用了许多不同的结构和程序,但仍然缺乏广泛的理论理解,存在执行不理想的风险。作为缩小这一差距的第一步,我们在此对适用于Curie-Weiss模型浅层陶瓷结构的广泛使用的序列诱惑程序进行了全面的分析研究。这项工作的贡献有两个方面:第一,我们描述了最佳重量和在梯层源优化下进行的培训。第二,我们比较了在序列式诱惑中发生的情况,而没有加上Metopolis Monte Carlo的当地步骤。因此,我们能够对本案应用的最佳程序作出理论预测。这项工作为将机器学习技术纳入Monte Carlo采样和优化提供了一个明确的理论基础。
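The abstract above compares Sequential Tempering with and without local Metropolis Monte Carlo steps on the Curie-Weiss model. A minimal sketch of such local steps follows, assuming the energy $H = -\frac{J}{2N}\big(\sum_i s_i\big)^2$. It covers only the Metropolis part, not the MADE network or the tempering schedule, and the names and parameter values are hypothetical.

```python
import math
import random

def metropolis_sweep(spins, beta, rng, coupling=1.0):
    """N single-spin-flip Metropolis proposals on the Curie-Weiss model.

    Flipping spin i changes the magnetization M to M - 2*s_i, so the energy
    change is dE = (2J/N) * (s_i * M - 1). Spins are modified in place.
    """
    n = len(spins)
    mag = sum(spins)
    for _ in range(n):
        i = rng.randrange(n)
        delta_e = (2.0 * coupling / n) * (spins[i] * mag - 1.0)
        if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
            mag -= 2 * spins[i]  # keep the running magnetization in sync
            spins[i] = -spins[i]
    return mag

rng = random.Random(0)
cold = [1] * 50
cold_mag = metropolis_sweep(cold, beta=50.0, rng=rng)  # deep in the ordered phase
hot = [rng.choice((-1, 1)) for _ in range(40)]
hot_mag = metropolis_sweep(hot, beta=0.5, rng=rng)     # disordered phase
print(cold_mag, hot_mag)
```

Because the model is fully connected, the energy change depends on the configuration only through the total magnetization, so each proposal costs constant time.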
Article 223
Title@2025-05-29 (4): Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM
Title: Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM | Auf dem Weg zu einer robusten, überlappenden Spracherkennung: Ein Lautsprecher-Bewusst-Progressiver Ansatz mit WavLM | 争取强劲的超重叠语音探测:使用WavLM 的演讲者-警示渐进方法 2505.23207v1 |
Authors: Zhaokai Sun, Li Zhang, Qing Wang, Pan Zhou, Lei Xie
Overlapping Speech Detection (OSD) aims to identify regions where multiple speakers overlap in a conversation, a critical challenge in multi-party speech processing. This work proposes a speaker-aware progressive OSD model that leverages a progressive training strategy to enhance the correlation between subtasks such as voice activity detection (VAD) and overlap detection. To improve acoustic representation, we explore the effectiveness of state-of-the-art self-supervised learning (SSL) models, including WavLM and wav2vec 2.0, while incorporating a speaker attention module to enrich features with frame-level speaker information. Experimental results show that the proposed method achieves state-of-the-art performance, with an F1 score of 82.76% on the AMI test set, demonstrating its robustness and effectiveness in OSD.
迭代语音探测(OSD)旨在确定多个发言者在对话中重叠的区域,这是多党语言处理中的一项关键挑战,这项工作提议采用一个语音意识渐进式OSD模式,利用渐进式培训战略,加强语音活动探测(VAD)和重叠探测等子任务之间的相互关系。为了改善声学表现,我们探索最先进的自我监督学习(SSL)模式的有效性,包括WavLM 和 wav2vec 2.0,同时纳入一个语音关注模块,以丰富框架级演讲者信息的特征。实验结果显示,拟议方法取得了最新业绩,在AMI测试集上获得了82.76%的F1分,表明其在OSD中非常健全和有效。
Article 224
Title@2025-05-29 (4): Disentangled Multi-span Evolutionary Network against Temporal Knowledge Graph Reasoning
Title: Disentangled Multi-span Evolutionary Network against Temporal Knowledge Graph Reasoning | Disentangled Multi-Span Evolutionary Network gegen Temporal Knowledge Graph Reasoning | 对抗时间知识图表推理的多空间演进网络 2505.14020v2 |
Authors: Hao Dong, Ziyue Qiao, Zhiyuan Ning, Qi Hao, Yi Du, Pengyang Wang, Yuanchun Zhou
Temporal Knowledge Graphs (TKGs), as an extension of static Knowledge Graphs (KGs), incorporate the temporal feature to express the transience of knowledge by describing when facts occur. TKG extrapolation aims to infer possible future facts based on known history, which has garnered significant attention in recent years. Some existing methods treat TKG as a sequence of independent subgraphs to model temporal evolution patterns, demonstrating impressive reasoning performance. However, they still have limitations: 1) In modeling subgraph semantic evolution, they usually neglect the internal structural interactions between subgraphs, which are actually crucial for encoding TKGs. 2) They overlook the potential smooth features that do not lead to semantic changes, which should be distinguished from the semantic evolution process. Therefore, we propose a novel Disentangled Multi-span Evolutionary Network (DiMNet) for TKG reasoning. Specifically, we design a multi-span evolution strategy that captures local neighbor features while perceiving historical neighbor semantic information, thus enabling internal interactions between subgraphs during the evolution process. To maximize the capture of semantic change patterns, we design a disentangle component that adaptively separates nodes’ active and stable features, used to dynamically control the influence of historical semantics on future evolution. Extensive experiments conducted on four real-world TKG datasets show that DiMNet demonstrates substantial performance in TKG reasoning, and outperforms the state-of-the-art up to 22.7% in MRR.
时间知识图谱(TKG)是静态知识图谱(KG)的延伸,它引入时间特征,通过描述事实发生的时间来表达知识的瞬时性。TKG外推旨在根据已知历史推断未来可能发生的事实,近年来引起了极大关注。一些现有方法将TKG视为一系列独立的子图来建模时间演化模式,展示了令人印象深刻的推理性能。然而,它们仍然有局限性:(1)在建模子图语义演化时,它们通常忽视子图之间的内部结构交互,而这种交互实际上对于编码TKG至关重要。(2)它们忽略了不会导致语义变化的潜在平稳特征,而这些特征应当与语义演化过程区分开来。因此,我们提出了一种新颖的解耦多跨度演化网络(DiMNet),用于TKG推理。具体而言,我们设计了一种多跨度演化策略,在捕获局部邻居特征的同时感知历史邻居的语义信息,从而在演化过程中实现子图之间的内部交互。为了最大限度地捕获语义变化模式,我们设计了一个解耦组件,自适应地分离节点的活跃特征和稳定特征,用于动态控制历史语义对未来演化的影响。在四个真实世界TKG数据集上进行的大量实验表明,DiMNet在TKG推理中表现出色,MRR最高比现有最优方法提升22.7%。
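The abstract above reports its headline gain in MRR (mean reciprocal rank), the usual ranking metric for TKG extrapolation. As a minimal sketch of how that metric is computed, assuming hypothetical candidate scores and helper names that do not come from the paper:

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries (ranks are 1-indexed)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def rank_of_target(scores, target):
    """1-indexed rank of `target` when candidates are sorted by descending score."""
    target_score = scores[target]
    return 1 + sum(1 for s in scores.values() if s > target_score)


# Hypothetical model scores for three queries of the form (subject, relation, ?, time).
queries = [
    ({"a": 0.9, "b": 0.5, "c": 0.1}, "a"),  # correct entity is ranked 1st
    ({"a": 0.2, "b": 0.7, "c": 0.4}, "c"),  # correct entity is ranked 2nd
    ({"a": 0.6, "b": 0.3, "c": 0.1}, "c"),  # correct entity is ranked 3rd
]
ranks = [rank_of_target(scores, target) for scores, target in queries]
mrr = mean_reciprocal_rank(ranks)
```

Because each query contributes 1/rank, improvements near the top of the ranking move MRR far more than improvements deep in the list.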
Article 225
Title@2025-05-29 (4): Aligning Text to Image in Diffusion Models is Easier Than You Think
Title: Aligning Text to Image in Diffusion Models is Easier Than You Think | Text an Bild in Diffusions-Modellen ausrichten ist einfacher, als Sie denken | 在传播模型中将文本对齐到图像比您想象的容易 2503.08250v4 |
Authors: Jaa-Yeon Lee, Byunghee Cha, Jeongsol Kim, Jong Chul Ye
While recent advancements in generative modeling have significantly improved text-image alignment, some residual misalignment between text and image representations remains. Some approaches address this issue by fine-tuning models with preference optimization and similar techniques, which require tailored datasets. Orthogonal to these methods, we revisit the challenge from the perspective of representation alignment, an approach that has gained popularity with the success of REPresentation Alignment (REPA). We first argue that conventional text-to-image (T2I) diffusion models, typically trained on paired image and text data (i.e., positive pairs) by minimizing score matching or flow matching losses, are suboptimal from the standpoint of representation alignment. Instead, a better alignment can be achieved through contrastive learning that leverages the existing dataset as both positive and negative pairs. To enable efficient alignment with pretrained models, we propose SoftREPA, a lightweight contrastive fine-tuning strategy that leverages soft text tokens for representation alignment. This approach improves alignment with minimal computational overhead by adding fewer than 1M trainable parameters to the pretrained model. Our theoretical analysis demonstrates that our method explicitly increases the mutual information between text and image representations, leading to enhanced semantic consistency. Experimental results across text-to-image generation and text-guided image editing tasks validate the effectiveness of our approach in improving the semantic consistency of T2I generative models.
虽然最近在变形模型方面的进步大大改善了文字形象的调整,但文本和图像表示形式之间还存在一些残留的不匹配现象。有些方法通过在偏好优化方面微调模型来解决这个问题,这需要量制数据集。对方法的调整,我们从代表调整方法的角度重新审视挑战,这种方法随着降级调整的成功而获得普遍欢迎。我们首先认为,传统文本对比图像(T2I)传播模式,通常通过尽量减少得分匹配或流动匹配损失来培训成对图像和文本数据(即正对对对),从代表调整的角度来说,这种方法并不理想。相反,通过对比性学习,将现有数据集作为正对和负对等工具,可以实现更好的调整。为了能够有效地与预先接受的模型保持一致,我们提议SoftREPA-一种较轻量的对比微调整战略,利用软文本标记来调整代表比例。这一方法通过将低于1M的可培训参数添加到预升级的图象化模型,从而改进了我们的正统性。我们之间的理论分析方法明确地显示了我们改进了在生成图像格式上的一致性。
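The contrast between training on positive pairs only and using the same batch as both positive and negative pairs can be illustrated with a generic InfoNCE-style loss. This is a sketch under assumptions: the similarity matrices are hypothetical, and the loss is the standard InfoNCE form, not necessarily the exact objective used by SoftREPA.

```python
import math


def info_nce(similarity, temperature=0.1):
    """Generic InfoNCE loss over a square image-by-text similarity matrix.

    Entry [i][i] is the matched (positive) pair; every other entry in
    row i is treated as a negative pair drawn from the same batch.
    """
    n = len(similarity)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in similarity[i]]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_denominator - logits[i]
    return total / n


# Hypothetical similarities: matched pairs clearly stand out from mismatched ones.
aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical similarities: every text looks equally similar to every image.
uninformative = [[0.5, 0.5, 0.5] for _ in range(3)]

loss_aligned = info_nce(aligned)
loss_uninformative = info_nce(uninformative)
```

When all similarities are equal, the loss equals log(batch size). It falls toward zero only when matched pairs score higher than mismatched ones, which is the signal a positive-pair-only objective lacks.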
Article 226
Title@2025-05-29 (4): JAPAN: Joint Adaptive Prediction Areas with Normalising-Flows
Title: JAPAN: Joint Adaptive Prediction Areas with Normalising-Flows | JAPAN: Gemeinsame adaptive Vorhersagebereiche mit Normalisierungs-Flows | JAPAN: 联合适应性预测区与标准化花束 2505.23196v1 |
Authors: Eshant English, Christoph Lippert
Conformal prediction provides a model-agnostic framework for uncertainty quantification with finite-sample validity guarantees, making it an attractive tool for constructing reliable prediction sets. However, existing approaches commonly rely on residual-based conformity scores, which impose geometric constraints and struggle when the underlying distribution is multimodal. In particular, they tend to produce overly conservative prediction areas centred around the mean, often failing to capture the true shape of complex predictive distributions. In this work, we introduce JAPAN (Joint Adaptive Prediction Areas with Normalising-Flows), a conformal prediction framework that uses density-based conformity scores. By leveraging flow-based models, JAPAN estimates the (predictive) density and constructs prediction areas by thresholding on the estimated density scores, enabling compact, potentially disjoint, and context-adaptive regions that retain finite-sample coverage guarantees. We theoretically motivate the efficiency of JAPAN and empirically validate it across multivariate regression and forecasting tasks, demonstrating good calibration and tighter prediction areas compared to existing baselines. We also provide several extensions adding flexibility to our proposed framework.
综合预测为不确定性的量化提供了一个具有有限抽样有效性保证的模型 – – 不可知性框架,使它成为构建可靠预测数据集的有吸引力的工具;然而,现有方法通常依赖基于剩余值的符合性评分,在基本分布为多式联运时,这些评分会施加几何限制和困难;特别是,它们往往产生以平均值为中心的过于保守的预测区,往往不能捕捉复杂预测分布的真实形状;在这项工作中,我们引入了日本航空航天研究所(联合适应性预测区,具有标准化-花样),一个使用基于密度的符合性评分的符合性预测框架。日本航空航天研究所通过利用基于流量的模型,估算(预测性)密度和构建预测区,方法是对估计密度评分进行阈值,使保持有限抽样覆盖的紧凑、可能不连贯和背景适应性区域得以维持。我们从理论上鼓励日本航空航天研究所的效率,并在多重回归和预测任务中进行实证,表明良好的校准和较现有基线更为严格的预测区。我们还提供数个基于流基模型的预测区,以增加我们提议的框架的灵活性。
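Thresholding an estimated density to obtain a prediction area can be sketched with plain split conformal prediction. The sketch below makes loud assumptions: a fixed two-component Gaussian mixture stands in for the fitted normalising flow, the data are one-dimensional and unconditional, and all names are hypothetical rather than taken from JAPAN.

```python
import math
import random


def log_density(y):
    """Stand-in for a fitted flow: equal mixture of N(-3, 0.5^2) and N(3, 0.5^2)."""
    def normal_pdf(v, mu, sigma):
        return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return math.log(0.5 * normal_pdf(y, -3.0, 0.5) + 0.5 * normal_pdf(y, 3.0, 0.5))


def conformal_threshold(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: the region is the whole space
    return sorted(scores)[k - 1]


rng = random.Random(0)


def sample():
    """Draw one point from the bimodal data distribution."""
    return rng.gauss(rng.choice([-3.0, 3.0]), 0.5)


# Conformity score = negative log density, so low-density points score high.
calibration = [sample() for _ in range(500)]
alpha = 0.1
threshold = conformal_threshold([-log_density(y) for y in calibration], alpha)


def in_region(y):
    """A point belongs to the prediction area if its density is high enough."""
    return -log_density(y) <= threshold


test_points = [sample() for _ in range(2000)]
coverage = sum(in_region(y) for y in test_points) / len(test_points)
```

In this toy setting the region consists of two separate intervals around the modes and excludes the mean at 0, whereas a residual-based interval centred on the mean would have to span the empty gap between them.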
Article 227
Title@2025-05-29 (4): Less is More: Unlocking Specialization of Time Series Foundation Models via Structured Pruning
Title: Less is More: Unlocking Specialization of Time Series Foundation Models via Structured Pruning | Weniger ist mehr: Unlocking Spezialisierung von Time Series Foundation Models über strukturiertes Pruning | 较少是更多:通过结构式普鲁宁解锁时间序列基础模型的专业化 2505.23195v1 |
Authors: Lifan Zhao, Yanyan Shen, Zhaoyang Liu, Xue Wang, Jiaji Deng
Scaling laws motivate the development of Time Series Foundation Models (TSFMs) that pre-train vast parameters and achieve remarkable zero-shot forecasting performance. Surprisingly, even after fine-tuning, TSFMs cannot consistently outperform smaller, specialized models trained on full-shot downstream data. A key question is how to realize effective adaptation of TSFMs for a target forecasting task. Through empirical studies on various TSFMs, we find that the pre-trained models often exhibit inherent sparsity and redundancy in computation, suggesting that TSFMs have learned to activate task-relevant network substructures to accommodate diverse forecasting tasks. To preserve this valuable prior knowledge, we propose a structured pruning method to regularize the subsequent fine-tuning process by focusing it on a more relevant and compact parameter space. Extensive experiments on seven TSFMs and six benchmarks demonstrate that fine-tuning a smaller, pruned TSFM significantly improves forecasting performance compared to fine-tuning original models. This “prune-then-finetune” paradigm often enables TSFMs to achieve state-of-the-art performance and surpass strong specialized baselines.
令人惊讶的是,即使是在微调后,高科技模型也不能始终优于全速下游数据专门模型。一个关键问题是如何使高科技模型有效地适应目标预测任务。通过对各种高科技模型的实证研究,预先培训的模型往往在计算中表现出固有的空间和冗余,表明高科技模型已经学会了启动与任务相关的网络子结构以适应不同的预测任务。为了保存这一宝贵的先前知识,我们建议了一种结构化的调整方法,以规范随后的微调过程,将重点置于一个更相关和紧凑的参数空间上。对7个高科技模型和6个基准的广泛实验表明,微调一个较小的、精细的TSFM模型与微调原型模型相比,大大改进了预测业绩。这种“春-正时-菲涅纳”模型常常使高科技模型能够实现最先进的业绩和超强的专业化基线。
Article 228
Title@2025-05-29 (4): Multimodal Inverse Attention Network with Intrinsic Discriminant Feature Exploitation for Fake News Detection
Title: Multimodal Inverse Attention Network with Intrinsic Discriminant Feature Exploitation for Fake News Detection | Multimodale Inverse Aufmerksamkeit Netzwerk mit Intrinsic Discriminant Feature Exploitation für gefälschte Nachrichten Erkennung | 多式反向关注网络,利用内在差异性地貌特征利用假新闻探测 2502.01699v2 |
Authors: Tianlin Zhang, En Yu, Yi Shao, Jiande Sun
Multimodal fake news detection has garnered significant attention due to its profound implications for social security. While existing approaches have contributed to understanding cross-modal consistency, they often fail to leverage modal-specific representations and explicit discrepant features. To address these limitations, we propose a Multimodal Inverse Attention Network (MIAN), a novel framework that explores intrinsic discriminative features based on news content to advance fake news detection. Specifically, MIAN introduces a hierarchical learning module that captures diverse intra-modal relationships through local-to-global and local-to-local interactions, thereby generating enhanced unimodal representations to improve the identification of fake news at the intra-modal level. Additionally, a cross-modal interaction module employs a co-attention mechanism to establish and model dependencies between the refined unimodal representations, facilitating seamless semantic integration across modalities. To explicitly extract inconsistency features, we propose an inverse attention mechanism that effectively highlights the conflicting patterns and semantic deviations introduced by fake news in both intra- and inter-modality. Extensive experiments on benchmark datasets demonstrate that MIAN significantly outperforms state-of-the-art methods, underscoring its pivotal contribution to advancing social security through enhanced multimodal fake news detection.
由于对社会保障的深刻影响,多式假新闻探测已经引起人们的极大关注。虽然现有办法有助于理解跨式一致性,但往往未能利用模式特定的表现方式和明显的差异性。为了解决这些限制,我们提议建立一个多式反向关注网络(MIAN),这是一个新颖的框架,根据新闻内容探索内在的歧视性特征,以推动假新闻探测。具体地说,MIAN引入了一个等级学习模块,通过地方对全球和地方对地方的互动,通过地方对地方对地方对地方对地方对地方对地方对地方的互动,从而产生强化的单一形式表现方式,改进对假新闻的识别。此外,跨式互动模块采用共同注意机制,在完善的单式表达方式之间建立和模式依赖性,促进各模式之间无缝的相互融合。为了明确提取不一致特征,我们提议一个反向关注机制,有效地突出假消息在内部和现代新闻中出现的相互冲突的模式和语义偏差。关于基准数据集的广泛实验表明,MIAN大大超越了其通过改进的多式联运方式对改进其关键信息探测方式的贡献。
Article 229
Title@2025-05-29 (4): Beyond Zero Initialization: Investigating the Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics
Title: Beyond Zero Initialization: Investigating the Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics | Beyond Zero Initialization: Untersuchung der Auswirkungen von Non-Zero Initialization auf LoRA Fine-Tuning Dynamics | 零启动后零启动后:调查非零初始化对LORA微调动力学的影响 2505.23194v1 |
Authors: Shiwei Li, Xiandi Luo, Xing Tang, Haozhao Wang, Hao Chen, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method. In standard LoRA layers, one of the matrices, $A$ or $B$, is initialized to zero, ensuring that fine-tuning starts from the pretrained model. However, there is no theoretical support for this practice. In this paper, we investigate the impact of non-zero initialization on LoRA’s fine-tuning dynamics from an infinite-width perspective. Our analysis reveals that, compared to zero initialization, simultaneously initializing $A$ and $B$ to non-zero values improves LoRA’s robustness to suboptimal learning rates, particularly smaller ones. Further analysis indicates that although the non-zero initialization of $AB$ introduces random noise into the pretrained weight, it generally does not affect fine-tuning performance. In other words, fine-tuning does not need to strictly start from the pretrained model. The validity of our findings is confirmed through extensive experiments across various models and datasets. The code is available at https://github.com/Leopold1423/non_zero_lora-icml25.
低级别适应(LORA)是一种广泛使用的参数效率微调方法。 在标准的LORA层中,一个矩阵,即$A美元或$B$,被初始化为零,确保微调从预培训模式开始。然而,这种做法没有理论上的支持。在本文中,我们从无限宽的角度调查非零初始化对LORA微调动态的影响。我们的分析表明,与零初始化相比,同时初始化美元和美元B$至非零值提高了LORA对亚最佳学习率的稳健性,特别是较小的。进一步的分析表明,尽管美元的非零初始化将随机噪音引入预培训的重量,但通常不会影响微调性能。换句话说,微调不需要严格地从预培训模式开始。我们的调查结果的有效性通过各种模型和数据集的广泛实验得到确认。代码见https://github.com/Leopoldnon_ze_zerola-rica-icm25。
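The difference between the two initialization schemes discussed above can be made concrete with a small sketch. It assumes the common convention that the adapted weight is W0 + BA; the matrix sizes, standard deviation, and helper names are hypothetical and not taken from the paper's code.

```python
import random


def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def gaussian_matrix(rows, cols, std, rng):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def max_abs(matrix):
    return max(abs(v) for row in matrix for v in row)


rng = random.Random(0)
d_out, d_in, rank = 6, 8, 2

# Standard LoRA: A is random and B is zero, so the update B @ A vanishes
# at step 0 and fine-tuning starts exactly from the pretrained weights.
A = gaussian_matrix(rank, d_in, 0.1, rng)
B_zero = zeros(d_out, rank)
delta_standard = matmul(B_zero, A)

# Non-zero initialization: both factors are random, so B @ A adds a small
# random perturbation to the pretrained weights before any training step.
B_nonzero = gaussian_matrix(d_out, rank, 0.1, rng)
delta_nonzero = matmul(B_nonzero, A)
```

The abstract's claim is that the perturbation in the second case generally does not hurt fine-tuning, while making training less sensitive to learning rates that are too small.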
Article 230
Title@2025-05-29 (4): DeepRTE: Pre-trained Attention-based Neural Network for Radiative Transfer
Title: DeepRTE: Pre-trained Attention-based Neural Network for Radiative Transfer | DeepRTE: Pre-trained Aufmerksamkeit-basiertes Neural-Netzwerk für Radiative Transfer | DeepRTE: 用于辐射传输的基于注意力的预训练神经网络 2505.23190v1 |
Authors: Yekun Zhu, Min Tang, Zheng Ma
In this study, we propose a novel neural network approach, termed DeepRTE, to address the steady-state Radiative Transfer Equation (RTE). The RTE is a differential-integral equation that governs the propagation of radiation through a participating medium, with applications spanning diverse domains such as neutron transport, atmospheric radiative transfer, heat transfer, and optical imaging. Our proposed DeepRTE framework leverages pre-trained attention-based neural networks to solve the RTE with high accuracy and computational efficiency. The efficacy of the proposed approach is substantiated through comprehensive numerical experiments.
在这次研究中,我们提出了一个新的神经网络方法,称为“深线网络”,以解决稳定状态的辐射转移赤道(RTE)问题。 RETE是一个差异整体方程式,它通过一个参与媒介管理辐射的传播,其应用范围涵盖多个领域,如中子传输、大气辐射转移、热传输和光学成像。我们提议的DeepRTE框架利用预先训练的以注意力为基础的神经网络,以高精确度和计算效率解决RETE。 拟议的方法的效力通过全面的数值实验得到证实。
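For reference, a common textbook form of the steady-state radiative transfer equation is given below. The notation is an assumption made for illustration and may differ from the paper's: $I(r,\Omega)$ is the radiation intensity at position $r$ in direction $\Omega$, $\mu_t$ and $\mu_s$ are the total and scattering coefficients, $p$ is the scattering phase function, and $q$ is a source term.

```latex
\Omega \cdot \nabla_r I(r, \Omega) + \mu_t(r)\, I(r, \Omega)
  = \mu_s(r) \int_{\mathbb{S}^{d-1}} p(\Omega, \Omega')\, I(r, \Omega')\, \mathrm{d}\Omega' + q(r, \Omega)
```

The derivative on the left and the integral over directions on the right are what make the equation integro-differential, and the angular integral is the part an attention mechanism could plausibly approximate.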
Article 231
Title@2025-05-29 (4): Plug In and Learn: Federated Intelligence over a Smart Grid of Models
Title: Plug In and Learn: Federated Intelligence over a Smart Grid of Models | Plug In and Learn: Federated Intelligence über ein Smart Grid aus Modellen | 插插插和学习:对智能模型网的联邦情报 2302.04363v4 |
Authors: S. Abdurakhmanova, Y. SarcheshmehPour, A. Jung
We present a model-agnostic federated learning method that mirrors the operation of a smart power grid: diverse local models, like energy prosumers, train independently on their own data while exchanging lightweight signals to coordinate with statistically similar peers. This coordination is governed by a graph-based regularizer that encourages connected models to produce similar predictions on a shared, public unlabeled dataset. The resulting method is a flexible instance of regularized empirical risk minimization and supports a wide variety of local models - both parametric and non-parametric - provided they can be trained via regularized loss minimization. Such training is readily supported by standard ML libraries including scikit-learn, Keras, and PyTorch.
我们提出了一个示范的、不可知的联邦学习方法,它反映了智能电网的运作:各种当地模型,如能源制造者,独立地以自己的数据进行训练,同时交换轻量的信号,以便与统计上类似的同行进行协调。这种协调由一个基于图表的正规化器管理,它鼓励连接模型对共用的、公开的、没有标签的数据集作出类似的预测。由此产生的方法是一个将经验风险降到最低的正规化的灵活实例,它支持各种各样的当地模型,包括参数和非参数模型,只要它们能够通过将损失降到最低的方式加以培训。这种培训很容易得到标准ML图书馆的支持,包括Scikit-learn、Keras和PyTorch。
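The graph-based regularizer described above, which couples connected models through their predictions on a shared unlabeled dataset, can be sketched as follows. The function names, edge weights, and toy models are hypothetical, and the squared-difference penalty is one plausible choice rather than the paper's confirmed formulation.

```python
def prediction_discrepancy(model_i, model_j, public_inputs):
    """Mean squared difference between two models' predictions on shared inputs."""
    return sum((model_i(x) - model_j(x)) ** 2 for x in public_inputs) / len(public_inputs)


def regularized_objective(models, local_losses, edges, public_inputs, strength):
    """Sum of local losses plus a weighted penalty for disagreeing neighbours."""
    coupling = sum(
        weight * prediction_discrepancy(models[i], models[j], public_inputs)
        for (i, j), weight in edges.items()
    )
    return sum(local_losses) + strength * coupling


def linear_model(x):
    """Node 0: a parametric model."""
    return 2.0 * x


# Node 1: a non-parametric 1-nearest-neighbour predictor over its private data.
node1_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]


def nearest_neighbour_model(x):
    return min(node1_data, key=lambda pair: abs(pair[0] - x))[1]


models = [linear_model, nearest_neighbour_model]
public_inputs = [0.0, 1.0, 2.0]  # shared, unlabeled dataset
edges = {(0, 1): 1.0}            # nodes 0 and 1 are statistically similar peers
local_losses = [0.5, 0.25]       # hypothetical local training losses

objective = regularized_objective(models, local_losses, edges, public_inputs, strength=2.0)
```

Since the penalty only compares outputs, the two nodes never exchange parameters, which is why a linear model and a nearest-neighbour model can be coupled in the same graph.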
Article 232
Title@2025-05-29 (4): Dequantified Diffusion-Schrödinger Bridge for Density Ratio Estimation
Title: Dequantified Diffusion-Schrödinger Bridge for Density Ratio Estimation | Dequantifizierte Diffusion-Schrödinger-Brücke für Dichte-Verhältnis-Schätzung | 用于密度比率估计的去量化扩散-Schrödinger桥 2505.05034v3 |
Authors: Wei Chen, Shigui Li, Jiacheng Li, Junmei Yang, John Paisley, Delu Zeng
Density ratio estimation is fundamental to tasks involving f-divergences, yet existing methods often fail under significantly different distributions or inadequately overlapping supports – the density-chasm and the support-chasm problems. Additionally, prior approaches yield divergent time scores near boundaries, leading to instability. We design $\textbf{D}^3\textbf{RE}$, a unified framework for robust, stable and efficient density ratio estimation. We propose the dequantified diffusion bridge interpolant (DDBI), which expands support coverage and stabilizes time scores via diffusion bridges and Gaussian dequantization. Building on DDBI, the proposed dequantified Schrödinger bridge interpolant (DSBI) incorporates optimal transport to solve the Schrödinger bridge problem, enhancing accuracy and efficiency. Our method offers uniform approximation and bounded time scores in theory, and outperforms baselines empirically in mutual information and density estimation tasks.
密度比率估计对于涉及裂变的任务来说至关重要,然而,现有方法往往在分布差异很大或重叠不足的情况下失败 – – 密度和支点问题。此外,先前的办法在边界附近产生不同的时间分数,导致不稳定。我们设计了$textbf{D3\textbf{{RE}$,这是一个稳健、稳定、高效的密度比率估计的统一框架。我们提议了分解扩散桥间插点(DDBI),它通过传播桥和高斯断层来扩大支持覆盖面和稳定时间分数。在DBI的基础上,拟议的取消的Schr}o}dinger桥间连接(DSBI)将最佳运输纳入解决Schr_‘o}dinger桥问题,提高准确性和效率。我们的方法在理论上提供了统一的近似和捆绑时间分数,在相互信息和密度估计任务方面超越了基线。
Article 233
Title@2025-05-29 (4): Unsupervisedly Learned Representations: Should the Quest be Over?
Title: Unsupervisedly Learned Representations: Should the Quest be Over? | Unüberwacht gelernte Repräsentationen: Sollte die Suche vorbei sein? | 无人监督的派任代表:调查是否应该结束? 2001.07495v6 |
Authors: Daniel N. Nissani
After four decades of research there still exists a Classification accuracy gap of about 20% between our best Unsupervisedly Learned Representations methods and the accuracy rates achieved by intelligent animals. It thus may well be that we are looking in the wrong direction. A possible solution to this puzzle is presented. We demonstrate that Reinforcement Learning can learn representations which achieve the same accuracy as that of animals. Our main modest contribution lies in the observations that: a. when applied to a real world environment Reinforcement Learning does not require labels, and thus may be legitimately considered as Unsupervised Learning, and b. in contrast, when Reinforcement Learning is applied in a simulated environment it does inherently require labels and should thus generally be considered as Supervised Learning. The corollary of these observations is that further search for Unsupervised Learning competitive paradigms which may be trained in simulated environments may be futile.
经过40年的研究,我们的最佳未经监督的教学方法与智能动物所达到的精确率之间仍然存在着约20%的分类准确性差距。 因此,我们很可能在寻找错误的方向。 我们展示了这个谜题的可能解决办法。 我们证明,加强学习可以学习与动物一样精确的表达方式。 我们的主要微小贡献在于以下观察:a.在应用到真实的世界环境中,加强学习并不需要标签,因此可以被合法地视为不受监督的学习,b. 相比之下,当强化学习在模拟环境中应用时,它的确需要标签,因此一般应被视为监督学习。 这些观察的必然结果是,进一步寻找可能在模拟环境中培训的不受监督的学习竞争性模式可能徒劳无益。
Article 234
Title@2025-05-29 (4): Rethinking Positive Pairs in Contrastive Learning
Title: Rethinking Positive Pairs in Contrastive Learning | Positive Paare im kontrastistischen Lernen neu denken | 在反竞争学习中重新思考正对对 2410.18200v2 |
Authors: Jiantao Wu, Sara Atito, Zhenhua Feng, Shentong Mo, Josef Kittler, Muhammad Awais
The training methods in AI do involve semantically distinct pairs of samples. However, their role typically is to enhance the between-class separability. The actual notion of similarity is normally learned from semantically identical pairs. This paper presents SimLAP: a simple framework for learning visual representation from arbitrary pairs. SimLAP explores the possibility of learning similarity from semantically distinct sample pairs. The approach is motivated by the observation that for any pair of classes there exists a subspace in which semantically distinct samples exhibit similarity. This phenomenon can be exploited for a novel method of learning, which optimises the similarity of an arbitrary pair of samples, while simultaneously learning the enabling subspace. The feasibility of the approach will be demonstrated experimentally and its merits discussed.
AI中的培训方法确实涉及分解的样本,但是,它们的作用通常是加强分级的分离性。通常,相似性的实际概念是从同义的对等中学习的。本文介绍了SimLAP:从任意的对等中学习视觉表现的简单框架。SimLAP探讨了从分立的样本对对等中学习相似性的可能性。这种方法的动因是观测到,对于任何一对类别中存在一个分空间,在分层中,分立的样本表现出相似性。这个现象可以被用于一种新型的学习方法,这种方法在选择任意的对等样本的相似性的同时学习赋能的子空间。该方法的可行性将在实验中加以展示,并讨论其优点。
Article 235
Title@2025-05-29 (4): Improving the Effective Receptive Field of Message-Passing Neural Networks
Title: Improving the Effective Receptive Field of Message-Passing Neural Networks | Verbesserung des effektiven Empfangsfeldes von message-passing Neural Networks | 改进信息传送神经网络的有效接收领域 2505.23185v1 |
Authors: Shahaf E. Finder, Ron Shapira Weber, Moshe Eliasof, Oren Freifeld, Eran Treister
Message-Passing Neural Networks (MPNNs) have become a cornerstone for processing and analyzing graph-structured data. However, their effectiveness is often hindered by phenomena such as over-squashing, where long-range dependencies or interactions are inadequately captured and expressed in the MPNN output. This limitation mirrors the challenges of the Effective Receptive Field (ERF) in Convolutional Neural Networks (CNNs), where the theoretical receptive field is underutilized in practice. In this work, we show and theoretically explain the limited ERF problem in MPNNs. Furthermore, inspired by recent advances in ERF augmentation for CNNs, we propose an Interleaved Multiscale Message-Passing Neural Networks (IM-MPNN) architecture to address these problems in MPNNs. Our method incorporates a hierarchical coarsening of the graph, enabling message-passing across multiscale representations and facilitating long-range interactions without excessive depth or parameterization. Through extensive evaluations on benchmarks such as the Long-Range Graph Benchmark (LRGB), we demonstrate substantial improvements over baseline MPNNs in capturing long-range dependencies while maintaining computational efficiency.
信息传递神经网络已成为处理和分析图表结构数据的基石,但其效力往往受到过度夸大等现象的阻碍,因为长期依赖性或相互作用在MPNN输出中没有得到充分的反映和表达。这一限制反映了动态神经网络中有效接收域(ERF)的挑战,理论可接受域在实践中没有得到充分利用。在这项工作中,我们展示和理论上解释MPNN的有限ERF问题。此外,在CNN ER扩增最近的进展的启发下,我们提议在MPNNN建立跨离式多级信息传递神经网络(IM-MPNNN)结构,以解决这些问题。我们的方法包括图的等级分解,使信息能够跨越多尺度的表达方式,便利远程互动,而没有过度深度或参数化。我们通过对远程图像基准(LRGBN)等基准的广泛评价,显示在计算效率的同时,在捕获远程依赖性基准(IM-MPN)方面大大改进了基线。
Article 236
Title@2025-05-29 (4): Two Is Better Than One: Rotations Scale LoRAs
Title: Two Is Better Than One: Rotations Scale LoRAs | Zwei ist besser als eins: Rotationsskala LoRAs | 二比一好:轮作规模LORAs 2505.23184v1 |
Authors: Hongcan Guo, Guoshun Nan, Yuan Yang, Diyang Zhang, Haotian Li, Zhican Chen, Qinchuan Zhou, Yuhan Ran, Xinye Cao, Sicong Leng, Xiaofeng Tao, Xudong Jiang
Scaling Low-Rank Adaptation (LoRA)-based Mixture-of-Experts (MoE) helps large language models (LLMs) adapt efficiently to diverse tasks. However, traditional gating mechanisms that route inputs to the best experts may fundamentally hinder LLMs’ scalability, leading to poor generalization and underfitting issues. We identify that the root cause lies in the restricted expressiveness of existing weighted-sum mechanisms, both within and outside the convex cone of LoRA representations. This motivates us to propose RadarGate, a novel geometrically inspired gating method that introduces rotational operations on LoRA representations to boost expressiveness and facilitate richer feature interactions among multiple LoRAs for scalable LLMs. Specifically, we first fuse each LoRA representation to other LoRAs using a learnable component and then feed the output to a rotation matrix. This matrix involves learnable parameters that define the relative angular relationship between LoRA representations. Such a simple yet effective mechanism provides an extra degree of freedom, facilitating the learning of cross-LoRA synergies and properly tackling the challenging poor generalization and underfitting issues as the number of LoRAs grows. Extensive experiments on 6 public benchmarks across 21 tasks show the effectiveness of our RadarGate for scaling LoRAs. We also provide valuable insights, revealing that the rotations applied to each pair of representations are contrastive, encouraging closer alignment of semantically similar representations during geometrical transformation while pushing distant ones further apart. We will release our code to the community.
低朗适应(LORA)基于低朗适应(LORA)的低朗适应(LOE)的混合物(MOE)有助于大型语言模型(LLMS)有效适应各种任务;然而,将投入投入输送给最佳专家的传统机制可能会从根本上阻碍LLMS的伸缩性,导致LLMS的简化和不适当问题;我们发现,根源在于现有加权和加权机制在LORA的表层内和外的表达方式的清晰度有限;这促使我们提出雷达Gate(RadarGate),这是一种具有地貌灵感的新型定位方法,引入LORA代表方式的旋转性操作,以提升其清晰度,便利多个LORA的伸缩性,促进多个LLOMS之间的更丰富性特征互动。具体地说,我们首先将每个LORA代表方式与其他LAM的伸缩性整合起来,然后将输出到轮值矩阵中。这种简单但有效的机制提供了额外的自由度,有助于学习LARA的交叉互动协作,并正确跟踪具有挑战性的缩缩缩略缩缩缩缩的缩缩缩缩缩缩图表。
Article 237
Title@2025-05-29 (4): MADCluster: Model-agnostic Anomaly Detection with Self-supervised Clustering Network
Title: MADCluster: Model-agnostic Anomaly Detection with Self-supervised Clustering Network | MADCluster: Modell-agnostische Anomalieerkennung mit selbstüberwachtem Clustering-Netzwerk | MADCluster:使用自监管的集群网进行模型-不可知异常探测 2505.16223v2 |
Authors: Sangyong Lee, Subo Hwang, Dohoon Kim
In this paper, we propose MADCluster, a novel model-agnostic anomaly detection framework utilizing self-supervised clustering. MADCluster is applicable to various deep learning architectures and addresses the ‘hypersphere collapse’ problem inherent in existing deep learning-based anomaly detection methods. The core idea is to cluster normal pattern data into a ‘single cluster’ while simultaneously learning the cluster center and mapping data close to this center. Also, to improve expressiveness and enable effective single clustering, we propose a new ‘One-directed Adaptive loss’. The optimization of this loss is mathematically proven. MADCluster consists of three main components: Base Embedder capturing high-dimensional temporal dynamics, Cluster Distance Mapping, and Sequence-wise Clustering for continuous center updates. Its model-agnostic characteristics are achieved by applying various architectures to the Base Embedder. Experiments on four time series benchmark datasets demonstrate that applying MADCluster improves the overall performance of comparative models. In conclusion, the compatibility of MADCluster shows potential for enhancing model performance across various architectures.
在本文中,我们提出MADCluster, 这是一种利用自我监督的集群, 新的模型- 不可知异常检测框架。 MADCluster 适用于各种深层学习结构, 并解决现有深层学习异常检测方法中固有的“ 整体崩溃” 问题。 核心思想是将正常模式数据分组成“ 单组群集” , 同时学习集束中心, 绘制与此中心相近的数据。 另外, 为了提高表达性, 并能够进行有效的单一集束, 我们提出了一个新的“ 单向适应损失” 。 对这种损失的优化得到了数学上的证明。 MADCluster 由三个主要组成部分组成: 底嵌入器捕捉高度时间动态, 集群距离绘图, 以及连续中心更新的顺序组合。 其模型- 数学特征是通过对基底嵌入器应用各种结构实现的。 对四个时间序列基准数据集的实验表明, 应用MADCluster 将改善比较模型的总体性。 总之, MADCluster 的兼容性展示了各种结构中增强模型性的潜力 。
Article 238
Title@2025-05-29 (4): FSL-SAGE: Accelerating Federated Split Learning via Smashed Activation Gradient Estimation
Title: FSL-SAGE: Accelerating Federated Split Learning via Smashed Activation Gradient Estimation | FSL-SAGE: Beschleunigung des Federated Split Learning durch Smashed Activation Gradient Abschätzung | FSL-SAGE:通过分散的激励加速渐进式估算,加速联邦分化学习 2505.23182v1 |
Authors: Srijith Nair, Michael Lin, Amirreza Talebi, Peizhong Ju, Elizabeth Bentley, Jia Liu
Collaborative training methods like Federated Learning (FL) and Split Learning (SL) enable distributed machine learning without sharing raw data. However, FL assumes clients can train entire models, which is infeasible for large-scale models. In contrast, while SL alleviates the client memory constraint in FL by offloading most training to the server, it increases network latency due to its sequential nature. Other methods address the conundrum by using local loss functions for parallel client-side training to improve efficiency, but they lack server feedback and potentially suffer poor accuracy. We propose FSL-SAGE (Federated Split Learning via Smashed Activation Gradient Estimation), a new federated split learning algorithm that estimates server-side gradient feedback via auxiliary models. These auxiliary models periodically adapt to emulate server behavior on local datasets. We show that FSL-SAGE achieves a convergence rate of $\mathcal{O}(1/\sqrt{T})$, where $T$ is the number of communication rounds. This result matches FedAvg, while significantly reducing communication costs and client memory requirements. Our empirical results also verify that it outperforms existing state-of-the-art FSL methods, offering both communication efficiency and accuracy.
合作培训方法,如Federal Learning(FL)和Splet Learning(SL)等合作培训方法,使得分散的机器学习无需共享原始数据。然而,FL假设客户可以培训整个模型,这对于大型模型来说是行不通的。相比之下,虽然SL通过将大多数培训卸载到服务器,减轻了FL客户的记忆限制,但由于其相继性质,它增加了网络的延迟性。其他方法通过利用当地损失功能进行平行客户方培训来解决难题,提高效率,但是它们缺乏服务器反馈,并可能受到错误的准确性。我们提议FSL-SAG(通过超速动作快速动画快速动画快速动)可以培训整个模型(FSL-SAGAGE ) , 一种新的联合分离学习算法,通过辅助模型来估计服务器-侧梯度反馈。这些辅助模型定期适应当地数据集的服务器行为。我们显示FSAL-SAGEGA达到$math cal {O}(1/ sqrt{T}) $, 其中的通信回合数为$T}。这个结果与FAVAvg,同时大幅降低通信成本和客户记忆要求。我们的经验结果也提供了。
Article 239
Title@2025-05-29 (4): FreRA: A Frequency-Refined Augmentation for Contrastive Learning on Time Series Classification
Title: FreRA: A Frequency-Refined Augmentation for Contrastive Learning on Time Series Classification | FreRA: Eine frequenzrefinierte Augmentation für kontrastives Lernen in der Zeitreihenklassifikation | FreRA:关于时间序列分类的校对性学习频率改进 2505.23181v1 |
Authors: Tian Tian, Chunyan Miao, Hangwei Qian
Contrastive learning has emerged as a competent approach for unsupervised representation learning. However, the design of an optimal augmentation strategy, although crucial for contrastive learning, is less explored for time series classification tasks. Existing predefined time-domain augmentation methods are primarily adopted from vision and are not specific to time series data. Consequently, this cross-modality incompatibility may distort the semantically relevant information of time series by introducing mismatched patterns into the data. To address this limitation, we present a novel perspective from the frequency domain and identify three advantages for downstream classification: global, independent, and compact. To fully utilize the three properties, we propose the lightweight yet effective Frequency Refined Augmentation (FreRA) tailored for time series contrastive learning on classification tasks, which can be seamlessly integrated with contrastive learning frameworks in a plug-and-play manner. Specifically, FreRA automatically separates critical and unimportant frequency components. Accordingly, we propose semantic-aware Identity Modification and semantic-agnostic Self-adaptive Modification to protect semantically relevant information in the critical frequency components and infuse variance into the unimportant ones respectively. Theoretically, we prove that FreRA generates semantic-preserving views. Empirically, we conduct extensive experiments on two benchmark datasets, including UCR and UEA archives, as well as five large-scale datasets on diverse applications. FreRA consistently outperforms ten leading baselines on time series classification, anomaly detection, and transfer learning tasks, demonstrating superior capabilities in contrastive representation learning and generalization in transfer learning scenarios across diverse datasets.
相互抵触的学习已成为不受监督的代表制学习的一种合格方法。然而,设计最佳增强战略虽然对对比学习至关重要,但对于时间序列分类任务而言,探索得较少。现有的预先定义的时间域增强方法主要从视觉角度采用,而并非针对时间序列数据。因此,这种交叉现代不兼容性可能会通过在数据中引入不匹配的模式,扭曲时间序列的语义信息。为了应对这一限制,我们从频率域提出一个新的视角,并找出下游分类的三个优势:全球、独立和紧凑。要充分利用这三个属性,我们建议为时间序列任务专门设计为时间序列对比学习而设计的轻重但有效的频率更新的增强(FreRA)方法,这些方法可以以插接和播放的方式与对比性学习框架密切结合。具体地说,FreRA自动将关键和不重要的频率组成部分分开。因此,我们提议从频率域域角度认识身份的修改和语义识别自定义自适应性自我调整,以保护关键频率组件中的语系相关信息,并且不精确地将精确的递增缩的变校程,在常规的档案中分别生成数据。
Article 240
Title@2025-05-29 (4): The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning
Title: The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning | Die Panaceas zur Verbesserung der Zersetzung mit geringem Rank im kommunikativ-effizienten Federated Learning | 改善通信-高效联邦学习中低-兰克分解的全景 2505.23176v1 |
Authors: Shiwei Li, Xiandi Luo, Haozhao Wang, Xing Tang, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li
To improve the training efficiency of federated learning (FL), previous research has employed low-rank decomposition techniques to reduce communication overhead. In this paper, we seek to enhance the performance of these low-rank decomposition methods. Specifically, we focus on three key issues related to decomposition in FL: what to decompose, how to decompose, and how to aggregate. Subsequently, we introduce three novel techniques: Model Update Decomposition (MUD), Block-wise Kronecker Decomposition (BKD), and Aggregation-Aware Decomposition (AAD), each targeting a specific issue. These techniques are complementary and can be applied simultaneously to achieve optimal performance. Additionally, we provide a rigorous theoretical analysis to ensure the convergence of the proposed MUD. Extensive experimental results show that our approach achieves faster convergence and superior accuracy compared to relevant baseline methods. The code is available at https://github.com/Leopold1423/fedmud-icml25.
为了提高联邦学习的培训效率,先前的研究采用了低级分解技术,以减少通信管理费用。在本文中,我们力求提高这些低级分解方法的绩效。具体地说,我们侧重于与FL分解有关的三个关键问题:分解什么,如何分解,如何分解,以及如何综合。随后,我们引入了三种新颖技术:模范更新分解技术(MUD),布洛克-中克罗内克分解技术(BKD),以及聚合-Aware分解技术(AAAD),这些技术都是针对一个具体问题的。这些技术是相辅相成的,可以同时应用,以实现最佳绩效。此外,我们提供了严格的理论分析,以确保拟议的MUD的趋同。广泛的实验结果表明,我们的方法与相关的基线方法相比,更快地趋同和更加精确。该代码可在https://github.com/Leopold1423Fedmud-icml25上查阅。
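The ideas of decomposing the model update (rather than the weights) and of using a Kronecker product as the low-parameter factorization can be sketched together. This is a single-block illustration under assumptions: the factor values, the 4x4 layer size, and all names are hypothetical, and the paper's block-wise scheme and aggregation rule are not reproduced here.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def add(x, y):
    """Element-wise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]


def count_params(matrix):
    return sum(len(row) for row in matrix)


# Hypothetical factors a client trains and transmits instead of a full 4x4 update.
factor_a = [[1, 2], [3, 4]]
factor_b = [[0, 1], [1, 0]]

# The server rebuilds the full update from the two small factors...
update = kron(factor_a, factor_b)

# ...and applies it to the current global weights (all zeros here for clarity).
global_weights = [[0] * 4 for _ in range(4)]
new_weights = add(global_weights, update)

params_sent = count_params(factor_a) + count_params(factor_b)
params_full = count_params(update)
```

In this toy case the client transmits 8 numbers instead of 16. Unlike a product of two thin matrices, a Kronecker product is not limited to low rank, which is one plausible motivation for using it as the decomposition.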
Article 241
Title@2025-05-29 (4): Contrastive Learning and Abstract Concepts: The Case of Natural Numbers
Title: Contrastive Learning and Abstract Concepts: The Case of Natural Numbers | Kontrastives Lernen und abstrakte Konzepte: Der Fall natürlicher Zahlen | 差异学习和抽象概念:自然数字案例 2408.02247v6 |
Authors: Daniel N. Nissani
Contrastive Learning (CL) has been successfully applied to classification and other downstream tasks related to concrete concepts, such as objects contained in the ImageNet dataset. No attempts seem to have been made so far to apply this promising scheme to more abstract entities. A prominent example of these could be the concept of (discrete) Quantity. CL can frequently be interpreted as a self-supervised scheme guided by some profound and ubiquitous conservation principle (e.g. conservation of identity in object classification tasks). In this introductory work we apply a suitable conservation principle to the semi-abstract concept of natural numbers, by which discrete quantities can be estimated or predicted. We experimentally show, by means of a toy problem, that contrastive learning can be trained to count at a glance with high accuracy at both human and super-human ranges. We compare this with the results of a supervised learning (SL) neural network of similar architecture that is also trained to count at a glance. We show that both schemes exhibit similarly good performance on baseline experiments, where the distributions of the training and testing stages are equal. Importantly, we demonstrate that in some generalization scenarios, where training and testing distributions differ, CL boasts more robust and much better error performance.
在与具体概念有关的分类和其他下游任务方面,例如图像网络数据集中包含的物体,已经成功地应用了对比学习(CL),在分类和其他与具体概念相关的下游任务方面,例如图像网络数据集中包含的物体。在将这一有希望的计划应用于更抽象的实体方面,迄今似乎没有尝试过任何尝试。其中的一个突出的例子可能是(分辨)数量的概念。CL可以经常被解释为一种由某些深刻和无处不在的保存原则(例如,在物体分类任务中保存身份)指导的自我监督计划。在这项介绍性工作中,我们对自然数字的半抽取概念适用了适当的保护原则,通过这种概念可以估计或预测离散的数量。我们实验性地表明,通过一个微小的问题,对比学习可以被训练在人类和超人范围内以高精度的眼光进行计算。我们把这个结果与经过培训后计数的类似结构的视觉学习(SL)神经网络计划的结果进行比较。我们表明,这两个计划在基线实验中都表现出类似的良好业绩表现,在那里,培训和测试阶段的分布是相同的,我们在一般的判断中显示比较精确的成绩。
Article 242
Title@2025-05-29 (4): Pseudo Multi-Source Domain Generalization: Bridging the Gap Between Single and Multi-Source Domain Generalization
Title: Pseudo Multi-Source Domain Generalization: Bridging the Gap Between Single and Multi-Source Domain Generalization | Pseudo-Multi-Source-Domain-Verallgemeinerung: Die Lücke zwischen Single- und Multi-Source-Domain-Verallgemeinerung überbrücken | Pseudo多源多源通用化:缩小单一源和多源通用化之间的差距 2505.23173v1 |
Authors: Shohei Enomoto
Deep learning models often struggle to maintain performance when deployed on data distributions different from their training data, particularly in real-world applications where environmental conditions frequently change. While Multi-source Domain Generalization (MDG) has shown promise in addressing this challenge by leveraging multiple source domains during training, its practical application is limited by the significant costs and difficulties associated with creating multi-domain datasets. To address this limitation, we propose Pseudo Multi-source Domain Generalization (PMDG), a novel framework that enables the application of sophisticated MDG algorithms in more practical Single-source Domain Generalization (SDG) settings. PMDG generates multiple pseudo-domains from a single source domain through style transfer and data augmentation techniques, creating a synthetic multi-domain dataset that can be used with existing MDG algorithms. Through extensive experiments with PseudoDomainBed, our modified version of the DomainBed benchmark, we analyze the effectiveness of PMDG across multiple datasets and architectures. Our analysis reveals several key findings, including a positive correlation between MDG and PMDG performance and the potential of pseudo-domains to match or exceed actual multi-domain performance with sufficient data. These comprehensive empirical results provide valuable insights for future research in domain generalization. Our code is available at https://github.com/s-enmt/PseudoDomainBed.
深度学习模型在部署到与训练数据分布不同的数据上时往往难以保持性能,尤其是在环境条件经常变化的实际应用中。多源域泛化(MDG)通过在训练中利用多个源域,在应对这一挑战方面显示出前景,但其实际应用受限于构建多域数据集的高昂成本和困难。为了解决这一限制,我们提出伪多源域泛化(PMDG),这是一个新颖的框架,使复杂的MDG算法能够应用于更实用的单源域泛化(SDG)场景。PMDG通过风格迁移和数据增强技术从单一源域生成多个伪域,构建可与现有MDG算法配合使用的合成多域数据集。通过在PseudoDomainBed(我们对DomainBed基准的修改版本)上进行大量实验,我们分析了PMDG在多个数据集和架构上的有效性。我们的分析揭示了若干关键发现,包括MDG与PMDG性能之间的正相关,以及在数据充足的情况下伪域达到或超过真实多域性能的潜力。这些全面的实证结果为域泛化的未来研究提供了有价值的见解。我们的代码可在https://github.com/s-enmt/PseudoDomainBed获取。
Article 243
Title@2025-05-29 (4): Global Tensor Motion Planning
Title: Global Tensor Motion Planning | Globale Tensor-Bewegungsplanung | 全球时势规划 2411.19393v3 |
Authors: An T. Le, Kay Hansel, João Carvalho, Joe Watson, Julen Urain, Armin Biess, Georgia Chalvatzaki, Jan Peters
Batch planning is increasingly necessary to quickly produce diverse and quality motion plans for downstream learning applications, such as distillation and imitation learning. This paper presents Global Tensor Motion Planning (GTMP) – a sampling-based motion planning algorithm comprising only tensor operations. We introduce a novel discretization structure represented as a random multipartite graph, enabling efficient vectorized sampling, collision checking, and search. We provide a theoretical investigation showing that GTMP exhibits probabilistic completeness while supporting modern GPU/TPU. Additionally, by incorporating smooth structures into the multipartite graph, GTMP directly plans smooth splines without requiring gradient-based optimization. Experiments on lidar-scanned occupancy maps and the MotionBenchMarker dataset demonstrate GTMP’s computation efficiency in batch planning compared to baselines, underscoring GTMP’s potential as a robust, scalable planner for diverse applications and large-scale robot learning tasks.
批量规划对于迅速为下游学习应用(如蒸馏和模仿学习)制定多样化和高质量的运动计划越来越有必要。本文介绍了全球电锯运动规划(GTMP) – – 一种基于取样的运动规划算法,仅包含高温操作。我们引入了一种新型的离散结构,作为随机的多部分图,使高效的矢量取样、碰撞检查和搜索成为可能。我们提供了一项理论调查,表明GTMP在支持现代GPU/TPU的同时,表现出了概率性的完整性。此外,通过将平滑的结构纳入多面图,GTMP直接计划平滑的样条,而不需要基于梯度的优化。关于Lidar扫描占用图和Mtion BenchMarker数据集的实验显示了GTMP在批量规划中与基线相比的计算效率,强调GTMP作为各种应用和大规模机器人学习任务的强大、可扩展的规划员的潜力。
Article 244
Title@2025-05-29 (4): Pre-training for Recommendation Unlearning
Title: Pre-training for Recommendation Unlearning | Vorschulung für Empfehlung Unlearning | 建议培训前培训 2505.22649v2 |
Authors: Guoxuan Chen, Lianghao Xia, Chao Huang
Modern recommender systems powered by Graph Neural Networks (GNNs) excel at modeling complex user-item interactions, yet increasingly face scenarios requiring selective forgetting of training data. Beyond user requests to remove specific interactions due to privacy concerns or preference changes, regulatory frameworks mandate recommender systems’ ability to eliminate the influence of certain user data from models. This recommendation unlearning challenge presents unique difficulties, as removing connections within interaction graphs creates ripple effects throughout the model, potentially impacting recommendations for numerous users. Traditional approaches suffer from significant drawbacks: fragmentation methods damage graph structure and diminish performance, while influence function techniques make assumptions that may not hold in complex GNNs, particularly with self-supervised or random architectures. To address these limitations, we propose UnlearnRec, a novel model-agnostic pre-training paradigm that prepares systems for efficient unlearning operations. Our Influence Encoder takes unlearning requests together with existing model parameters and directly produces updated parameters of the unlearned model with little fine-tuning, avoiding complete retraining while preserving model performance characteristics. Extensive evaluation on public benchmarks demonstrates that our method delivers exceptional unlearning effectiveness while providing more than 10x speedup compared to retraining approaches. We release our method implementation at: https://github.com/HKUDS/UnlearnRec.
由图形神经网络(GNNS)推动的现代推荐系统在模拟复杂的用户-项目互动方面十分出色,但日益面临需要选择性地忘记培训数据的各种情景。除了用户要求消除因隐私问题或偏好变化而产生的特定互动外,监管框架授权用户要求删除特定用户数据的影响。这个建议不学习的挑战带来了独特的困难,因为删除互动图中的连接在整个模型中产生连结效应,可能对许多用户产生潜在影响。传统方法存在重大缺陷:碎裂方法损坏图表结构并降低性能,而影响功能技术则在复杂的GNNS中,特别是自我监督或随机结构中做出可能无法维持的假设。为了解决这些限制,我们提议建立一个新型的模型-认知前模式UnlearnRec,为高效的不学习操作准备系统。我们的影响编码器与现有的模型参数一起,直接生成了不学习模型的最新参数,很少进行微调,避免完全再培训,同时保持模型性能特性。对公共基准进行广泛的评估表明,我们的方法在提供超乎寻常的不学习有效性,同时提供10x/REUDRUDS对比再培训方法。
Article 245
Title@2025-05-29 (4): Best Arm Identification with Possibly Biased Offline Data
Title: Best Arm Identification with Possibly Biased Offline Data | Best Arm Identification mit möglicherweise Biased Offline Daten | 最佳武器标识(可能附带的离线数据) 2505.23165v1 |
Authors: Le Yang, Vincent Y. F. Tan, Wang Chi Cheung
We study the best arm identification (BAI) problem with potentially biased offline data in the fixed confidence setting, which commonly arises in real-world scenarios such as clinical trials. We prove an impossibility result for adaptive algorithms without prior knowledge of the bias bound between online and offline distributions. To address this, we propose the LUCB-H algorithm, which introduces adaptive confidence bounds by incorporating an auxiliary bias correction to balance offline and online data within the LUCB framework. Theoretical analysis shows that LUCB-H matches the sample complexity of standard LUCB when offline data is misleading and significantly outperforms it when offline data is helpful. We also derive an instance-dependent lower bound that matches the upper bound of LUCB-H in certain scenarios. Numerical experiments further demonstrate the robustness and adaptability of LUCB-H in effectively incorporating offline data.
我们研究固定信心环境下可能偏差的离线数据的最佳手臂识别问题,这通常出现在临床试验等现实世界情景中。我们证明,在没有事先了解在线和离线分布之间的偏差的情况下,适应性算法不可能产生结果。为了解决这个问题,我们提议采用LUCB-H算法,通过在 LUCB框架内纳入辅助性偏差校正以平衡离线和在线数据,引入适应性信任界限。理论分析表明,LUCB-H与标准的LUCB的样本复杂性相匹配,因为离线数据具有误导性,在离线数据有帮助时大大超过它。我们还得出了一个与LUCB-H的上界相匹配的低实例约束。数字实验进一步证明了LUCB-H在有效纳入离线数据方面的稳健性和适应性。
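How a known bias bound lets offline data tighten or leave unchanged an arm's confidence interval can be sketched as follows. This is a minimal illustration only, assuming rewards in [0, 1], a Hoeffding-style radius, and a known bound `bias` on the gap between offline and online means. The function names are hypothetical and the bounds are not the ones used by LUCB-H.

```python
import math

def radius(n, delta):
    """Hoeffding-style confidence radius for n samples in [0, 1]."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def confidence_interval(on_sum, n_on, off_sum, n_off, bias, delta):
    """Return (lower, upper) for an arm's online mean.

    Uses the narrower of two candidate intervals: online-only, or pooled
    online+offline data widened by the worst-case bias contribution.
    """
    on_mean = on_sum / n_on
    online = (on_mean - radius(n_on, delta), on_mean + radius(n_on, delta))
    if n_off == 0:
        return online
    n = n_on + n_off
    pooled_mean = (on_sum + off_sum) / n
    width = radius(n, delta) + (n_off / n) * bias
    pooled = (pooled_mean - width, pooled_mean + width)
    # Keep whichever interval is narrower.
    return min(online, pooled, key=lambda iv: iv[1] - iv[0])

# Hypothetical counts: 10 online pulls, 100 offline samples, both averaging 0.5.
small_bias = confidence_interval(5.0, 10, 50.0, 100, bias=0.01, delta=0.05)
large_bias = confidence_interval(5.0, 10, 50.0, 100, bias=0.5, delta=0.05)
```

With a small bias bound the pooled interval is much narrower than the online-only one. With a large bias bound the sketch falls back to the online-only interval, mirroring the abstract's claim that misleading offline data should not hurt.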
Article 246
Title@2025-05-29 (4): Temporal Relation Extraction in Clinical Texts: A Span-based Graph Transformer Approach
Title: Temporal Relation Extraction in Clinical Texts: A Span-based Graph Transformer Approach | Temporale Beziehungsextraktion in klinischen Texten: Ein Span-basierter Graph Transformer-Ansatz | 临床文本中的时间关系抽取时间关系:基于泛泛面的图形变形器方法 2503.18085v2 |
Authors: Rochana Chaturvedi, Peyman Baghershahi, Sourav Medya, Barbara Di Eugenio
Temporal information extraction from unstructured text is essential for contextualizing events and deriving actionable insights, particularly in the medical domain. We address the task of extracting clinical events and their temporal relations using the well-studied I2B2 2012 Temporal Relations Challenge corpus. This task is inherently challenging due to complex clinical language, long documents, and sparse annotations. We introduce GRAPHTREX, a novel method integrating span-based entity-relation extraction, clinical large pre-trained language models (LPLMs), and Heterogeneous Graph Transformers (HGT) to capture local and global dependencies. Our HGT component facilitates information propagation across the document through innovative global landmarks that bridge distant entities. Our method improves the state-of-the-art with 5.5% improvement in the tempeval $F_1$ score over the previous best and up to 8.9% improvement on long-range relations, which presents a formidable challenge. We further demonstrate generalizability by establishing a strong baseline on the E3C corpus. This work not only advances temporal information extraction but also lays the groundwork for improved diagnostic and prognostic models through enhanced temporal reasoning.
从非结构化文本中抽取时空信息对于使事件背景化和产生可操作的洞察力至关重要,特别是在医疗领域。我们利用研究周密的2012年I2B2《时际关系挑战》来应对提取临床事件及其时间关系的任务。由于复杂的临床语言、长的文件和稀疏的注释,这项任务具有内在挑战性。我们引入了GRAPHTREX,这是将基于跨实体关系的提取、临床预先培训的大型语言模型(LPLMS)和异质图形变异器(HGT)整合在一起的新方法,以捕捉本地和全球依赖性。我们HGT部分不仅通过连接遥远实体的创新的全球里程碑促进在文件中的信息传播,而且通过强化的时空推理推理为改进诊断和预测模型打下基础。
Article 247
Title@2025-05-29 (4): Implicit Inversion turns CLIP into a Decoder
Title: Implicit Inversion turns CLIP into a Decoder | Implizite Inversion macht CLIP zu einem Decoder | 隐隐性 Indicide Inversion 将 CLIP 转换为解码器 2505.23161v1 |
Authors: Antonio D’Orazio, Maria Rosaria Briglia, Donato Crisostomi, Dario Loi, Emanuele Rodolà, Iacopo Masi
CLIP is a discriminative model trained to align images and text in a shared embedding space. Due to its multimodal structure, it serves as the backbone of many generative pipelines, where a decoder is trained to map from the shared space back to images. In this work, we show that image synthesis is nevertheless possible using CLIP alone – without any decoder, training, or fine-tuning. Our approach optimizes a frequency-aware implicit neural representation that encourages coarse-to-fine generation by stratifying frequencies across network layers. To stabilize this inverse mapping, we introduce adversarially robust initialization, a lightweight Orthogonal Procrustes projection to align local text and image embeddings, and a blending loss that anchors outputs to natural image statistics. Without altering CLIP’s weights, this framework unlocks capabilities such as text-to-image generation, style transfer, and image reconstruction. These findings suggest that discriminative models may hold untapped generative potential, hidden in plain sight.
CLIP是一种歧视性模型,在共同嵌入空间对图像和文字进行匹配。 由于其多式结构, 它是许多基因管道的主干, 在那里, 解码器经过培训, 从共享空间映射回到图像。 在这项工作中, 我们显示图像合成仍然有可能单独使用 CLIP – – 无需任何解码、 培训或微调。 我们的方法优化了频觉隐性神经代表, 从而通过对网络各层的频率进行分解来鼓励粗化到纤维的生成。 为了稳定这一反向映射, 我们引入了对抗性强的初始化, 轻量的 Orthogonal Procrustes 投影, 以对本地文本和图像嵌入进行匹配, 以及将输出锁定到自然图像统计数据的混合损失。 在不改变 CLIP 的重量的情况下, 这个框架释放了文本到图像生成、 风格传输和图像重建等能力。 这些发现显示, 歧视模式可能具有未开发的基因化潜力, 隐藏在普通的视野中 。
Article 248
Title@2025-05-29 (4): Topological Adaptive Least Mean Squares Algorithms over Simplicial Complexes
Title: Topological Adaptive Least Mean Squares Algorithms over Simplicial Complexes | Topologische Adaptive Least Mean Squares Algorithmen über Simplicial Complexes | 简单综合体的地形适应性最低中度平方平方平方平方平方平方平方平 2505.23160v1 |
Authors: Lorenzo Marinucci, Claudio Battiloro, Paolo Di Lorenzo
This paper introduces a novel adaptive framework for processing dynamic flow signals over simplicial complexes, extending classical least-mean-squares (LMS) methods to high-order topological domains. Building on discrete Hodge theory, we present a topological LMS algorithm that efficiently processes streaming signals observed over time-varying edge subsets. We provide a detailed stochastic analysis of the algorithm, deriving its stability conditions, steady-state mean-square-error, and convergence speed, while exploring the impact of edge sampling on performance. We also propose strategies to design optimal edge sampling probabilities, minimizing rate while ensuring desired estimation accuracy. Assuming partial knowledge of the complex structure (e.g., the underlying graph), we introduce an adaptive topology inference method that integrates with the proposed LMS framework. Additionally, we propose a distributed version of the algorithm and analyze its stability and mean-square-error properties. Empirical results on synthetic and real-world traffic data demonstrate that our approach, in both centralized and distributed settings, outperforms graph-based LMS methods by leveraging higher-order topological features.
本文介绍一个新的适应框架,用于处理简单复合物的动态流信号,将典型的最小比例(LMS)方法扩大到高阶地形领域。根据离散的Hodge理论,我们提出一种地貌式LMS算法,高效处理在时间变化边缘子集中观测到的流信号。我们对算法进行详细的随机分析,得出其稳定性条件、稳定状态平均比例值和趋同速度,同时探索边缘取样对性能的影响。我们还提出了设计最佳边缘取样概率的战略,最大限度地降低比率,同时确保预期的估计准确性。假设对复杂结构(例如基本图)有部分了解,我们采用适应性地貌推论方法,与拟议的LMS框架相结合。此外,我们提出一个分布式的算法,分析其稳定性和平均比例值特性。关于合成和真实世界交通数据的实证结果表明,我们在中央和分布式环境中采用的方法,超越了基于图表的精确度。
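For readers unfamiliar with the classical method being extended, the standard LMS update can be sketched in a few lines. This is a minimal illustration of ordinary vector LMS on a hypothetical noiseless stream. It does not include the Hodge-theoretic filtering, edge sampling, or topology inference described in the abstract.

```python
def lms_step(w, x, d, mu):
    """One LMS update: w <- w + mu * e * x, with e = d - w.x."""
    y = sum(wi * xi for wi, xi in zip(w, x))   # current prediction
    e = d - y                                  # instantaneous error
    return [wi + mu * e * xi for wi, xi in zip(w, x)], e

# Hypothetical noiseless stream generated by a fixed target filter.
target = [0.5, -1.0]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]] * 50
w = [0.0, 0.0]
for x in inputs:
    d = sum(t * xi for t, xi in zip(target, x))
    w, err = lms_step(w, x, d, mu=0.2)
```

The step size `mu` controls the trade-off between convergence speed and steady-state error, which is the kind of quantity the paper's stochastic analysis characterizes for the topological setting.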
Article 249
Title@2025-05-29 (4): Privacy-Aware Joint DNN Model Deployment and Partitioning Optimization for Collaborative Edge Inference Services
Title: Privacy-Aware Joint DNN Model Deployment and Partitioning Optimization for Collaborative Edge Inference Services | Privacy-Aware Joint DNN Model Bereitstellung und Partitionierung Optimierung für kollaborative Edge Inferenz Services | DNN 联合DNN 合作边缘推断服务示范部署和分离优化优化模式 2502.16091v3 |
Authors: Zhipeng Cheng, Xiaoyu Xia, Hong Wang, Minghui Liwang, Ning Chen, Xuwei Fan, Xianbin Wang
Edge inference (EI) has emerged as a promising paradigm to address the growing limitations of cloud-based Deep Neural Network (DNN) inference services, such as high response latency, limited scalability, and severe data privacy exposure. However, deploying DNN models on resource-constrained edge devices introduces additional challenges, including limited computation/storage resources, dynamic service demands, and heightened privacy risks. To tackle these issues, this paper presents a novel privacy-aware optimization framework that jointly addresses DNN model deployment, user-server association, and model partitioning, with the goal of minimizing long-term average inference delay under resource and privacy constraints. The problem is formulated as a complex, NP-hard stochastic optimization. To efficiently handle system dynamics and computational complexity, we employ a Lyapunov-based approach to transform the long-term objective into tractable per-slot decisions. Furthermore, we introduce a coalition formation game to enable adaptive user-server association and design a greedy algorithm for model deployment within each coalition. Extensive simulations demonstrate that the proposed algorithm significantly reduces inference delay and consistently satisfies privacy constraints, outperforming state-of-the-art baselines across diverse scenarios.
为解决这些问题,本文件提出了一个新的隐私优化框架,共同解决基于云的深神经网络(DNN)的测算服务(DNN)的日益局限性,如高反应延迟、缩放有限和数据隐私暴露严重等。然而,在资源紧缺的边缘装置中部署DNN模型带来了额外的挑战,包括有限的计算/存储资源、动态服务需求和增加隐私风险。为解决这些问题,本文件提出了一个新的隐私意识优化框架,共同解决DNN模型部署、用户-服务器关联和模型分割,目标是在资源和隐私限制下尽量减少长期平均推论延迟。问题被表述为复杂、硬的系统优化。为高效处理系统动态和计算复杂性,我们采用了基于Lyapunov的方法将长期目标转化为可移动的人均决定。此外,我们引入了一个联盟形成游戏,以便能够适应用户-服务器的组合,并为每个联盟内的模型部署设计一种贪婪的算法。广泛的模拟表明,拟议的算法大大降低了各种隐私基线限制之间的误差,并持续履行各种要求。
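The idea of turning a long-term constrained objective into per-slot decisions can be sketched with the standard Lyapunov drift-plus-penalty pattern. This is a minimal illustration, assuming a single virtual queue that tracks a long-term privacy budget. The action set, the numbers, and the function names are all hypothetical, and the paper's coalition formation game and greedy deployment step are not modeled.

```python
def choose_action(actions, queue, v):
    """Pick the per-slot action minimizing V * delay + Q * privacy_cost."""
    return min(actions, key=lambda a: v * a["delay"] + queue * a["privacy_cost"])

def update_queue(queue, privacy_cost, budget):
    """Virtual queue: grows when a slot's cost exceeds the per-slot budget."""
    return max(queue + privacy_cost - budget, 0.0)

# Hypothetical partitioning options (assumption: keeping more layers local
# costs more delay but exposes less data). Numbers are illustrative only.
actions = [
    {"name": "offload_all", "delay": 1.0, "privacy_cost": 1.0},
    {"name": "split_mid", "delay": 2.0, "privacy_cost": 0.4},
    {"name": "local_all", "delay": 4.0, "privacy_cost": 0.0},
]

budget, slots = 0.5, 200
queue, history = 0.0, []
for _ in range(slots):
    a = choose_action(actions, queue, v=2.0)
    queue = update_queue(queue, a["privacy_cost"], budget)
    history.append(a)

avg_cost = sum(a["privacy_cost"] for a in history) / slots
```

Because the queue grows whenever the budget is exceeded, costly actions become progressively less attractive, so the time-averaged privacy cost stays within the budget plus a term that vanishes as the number of slots grows. The weight `v` trades delay against how quickly the constraint is enforced.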
Article 250
Title@2025-05-29 (4): Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners
Title: Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners | Größer, regularisiert, kategorisch: High-Kapacity-Wert-Funktionen sind effiziente Multi-Task-Lerner | 大型、正规、分类:高能力价值功能是高效多任务学习者 2505.23150v1 |
Authors: Michal Nauman, Marek Cygan, Carmelo Sferrazza, Aviral Kumar, Pieter Abbeel
Recent advances in language modeling and vision stem from training large models on diverse, multi-task data. This paradigm has had limited impact in value-based reinforcement learning (RL), where improvements are often driven by small models trained in a single-task context. This is because, in multi-task RL, sparse rewards and gradient conflicts make temporal-difference optimization brittle. Practical workflows for generalist policies therefore avoid online training, instead cloning expert trajectories or distilling collections of single-task policies into one agent. In this work, we show that the use of high-capacity value models trained via cross-entropy and conditioned on learnable task embeddings addresses the problem of task interference in online RL, allowing for robust and scalable multi-task training. We test our approach on 7 multi-task benchmarks with over 280 unique tasks, spanning high degree-of-freedom humanoid control and discrete vision-based RL. We find that, despite its simplicity, the proposed approach leads to state-of-the-art single and multi-task performance, as well as sample-efficient transfer to new tasks.
语言建模和愿景方面的近期进展来自对多种多任务数据大型模型的培训,这种模式在基于价值的强化学习(RL)方面影响有限,因为改进往往由在单一任务背景下培训的小型模型驱动。这是因为在多任务RL稀薄的奖励和梯度冲突中,时间差的优化使时间差变得微不足道。一般政策的实际工作流程因此避免了在线培训,而避免了克隆专家轨迹或将单任务政策集成成一个代理物。在这项工作中,我们发现,使用通过交叉渗透和以可学习任务嵌入为条件而培训的高能力值模型,解决了在线RL的任务干扰问题,从而能够进行稳健和可扩展的多任务培训。我们测试了7项多任务基准的方法,其任务超过280项,涵盖高程度自由类人类控制和离散的视野RL。我们发现,尽管拟议方法简单,但最终导致采用最先进的单一和多任务状态,并且具有抽样效率地转移到新任务。
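Training a value function with cross-entropy requires turning scalar targets into distributions over bins. The sketch below shows one common way to do that, a two-hot encoding over evenly spaced bin centers. The encoding choice, the bin range, and the function names are assumptions made for illustration; the abstract does not specify which categorical target the authors use.

```python
import math

def two_hot(value, v_min, v_max, num_bins):
    """Encode a scalar as a distribution over evenly spaced bin centers."""
    value = min(max(value, v_min), v_max)          # clip to the supported range
    step = (v_max - v_min) / (num_bins - 1)
    pos = (value - v_min) / step
    lo = min(int(math.floor(pos)), num_bins - 2)   # index of the lower neighbor
    frac = pos - lo
    probs = [0.0] * num_bins
    probs[lo] = 1.0 - frac
    probs[lo + 1] = frac
    return probs

def cross_entropy(target_probs, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -sum(p * (l - log_z) for p, l in zip(target_probs, logits))

def expected_value(probs, v_min, v_max):
    """Decode a bin distribution back to a scalar value estimate."""
    step = (v_max - v_min) / (len(probs) - 1)
    return sum(p * (v_min + i * step) for i, p in enumerate(probs))

target = two_hot(0.3, 0.0, 1.0, 5)
loss = cross_entropy(target, [0.0] * 5)   # loss against uniform predictions
```

Decoding the two-hot target with `expected_value` recovers the original scalar, so the classification loss stays consistent with the regression target it replaces.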
Article 251
Title@2025-05-29 (4): FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing
Title: FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing | FlowAlign: Trajektorie-regularisierte, inversionsfreie Fluss-basierte Bildbearbeitung | 流动对等: 轨迹- 重新分类、 转换- 无流动图像编辑 2505.23145v1 |
Authors: Jeongsol Kim, Yeobin Hong, Jong Chul Ye
Recent inversion-free, flow-based image editing methods such as FlowEdit leverage a pre-trained noise-to-image flow model such as Stable Diffusion 3, enabling text-driven manipulation by solving an ordinary differential equation (ODE). While the lack of exact latent inversion is a core advantage of these methods, it often results in unstable editing trajectories and poor source consistency. To address this limitation, we propose FlowAlign, a novel inversion-free flow-based framework for consistent image editing with principled trajectory control. FlowAlign introduces a flow-matching loss as a regularization mechanism to promote smoother and more stable trajectories during the editing process. Notably, the flow-matching loss is shown to explicitly balance semantic alignment with the edit prompt and structural consistency with the source image along the trajectory. Furthermore, FlowAlign naturally supports reverse editing by simply reversing the ODE trajectory, highlighting the reversible and consistent nature of the transformation. Extensive experiments demonstrate that FlowAlign outperforms existing methods in both source preservation and editing controllability.
最近的无反向、无流动的图像编辑方法,如FlowEdit 等,利用事先训练的噪音到图像流模式,如Snable Difil 3,通过解决普通的差别方程式(ODE)进行文本驱动的操纵。虽然缺乏确切的潜在反向是这些方法的核心优势,但往往导致编辑轨迹不稳定和源源一致性差。为解决这一限制,我们提议了FlowAlign,这是一个新的无逆向流动框架,用于与有原则的轨迹控制进行一致的图像编辑。FlowAlign 引入了流动匹配损失,作为在编辑过程中促进更顺畅和更稳定的轨迹的正规化机制。值得注意的是,流程匹配损失明显平衡了与按轨迹与源图像编辑的快速和结构一致性之间的平衡。此外,FlowAlign 自然支持反向编辑,只是颠倒了ODE的轨迹,强调变换的可逆转性和一致性。广泛的实验表明,FlowAltal超越了在源保护和编辑可控性两方面的现有方法。
Article 252
Title@2025-05-29 (4): OmniArch: Building Foundation Model For Scientific Computing
Title: OmniArch: Building Foundation Model For Scientific Computing | OmniArch: Building Foundation Model for Scientific Computing | OmniArch:建筑基金会科学计算模型 2402.16014v3 |
Authors: Tianyu Chen, Haoyi Zhou, Ying Li, Hao Wang, Chonghan Gao, Rongye Shi, Shanghang Zhang, Jianxin Li
Foundation models have revolutionized language modeling, but whether this success can be replicated in scientific computing remains unexplored. We present OmniArch, the first prototype aiming at solving multi-scale and multi-physics scientific computing problems with physical alignment. We address all three challenges with one unified architecture. Its pre-training stage contains a Fourier Encoder-decoder fading out the disharmony across separated dimensions and a Transformer backbone integrating quantities through temporal dynamics, and the novel PDE-Aligner performs physics-informed fine-tuning under flexible conditions. To the best of our knowledge, we are the first to conduct unified 1D-2D-3D pre-training on PDEBench; it not only sets new performance benchmarks for 1D, 2D, and 3D PDEs but also demonstrates exceptional adaptability to new physics via in-context and zero-shot learning approaches, which supports realistic engineering applications and foresight physics discovery.
基金会模型已经使语言建模发生了革命性的变化,而科学计算中是否复制了这一成功,至今仍未探索。我们展示了旨在解决多规模和多物理科学计算问题的第一个原型OmniArch,这是在物理一致性方面解决多规模和多物理科学计算问题的首个原型。我们用一个统一的架构应对了所有这三项挑战。其培训前阶段包含一个Fourier Eccder-decoder ,它通过不同维度的不和谐和通过时间动态将数量整合在一起的变异主干柱,而新颖的PDE-Aligner则在灵活条件下进行物理知情的微调。据我们所知,我们首先在PDEBench上进行了1D-2D-3D联合培训,它不仅为1D、2D和3D PDEs设定了新的性能基准,而且还展示了通过文字和零光学方法对新物理学的特殊适应性,支持现实的工程应用和展望物理发现。
Article 253
Title@2025-05-29 (4): Policy Filtration for RLHF to Mitigate Noise in Reward Models
Title: Policy Filtration for RLHF to Mitigate Noise in Reward Models | Politische Filtration für RLHF zur Mititation von Lärm in Prämienmodellen | 将RLHF政策归类为奖励模型中最小噪音的政策 2409.06957v4 |
Authors: Chuheng Zhang, Wei Shen, Li Zhao, Xuyun Zhang, Xiaolong Xu, Wanchun Dou, Jiang Biang
While direct policy optimization methods exist, pioneering LLMs are fine-tuned with reinforcement learning from human feedback (RLHF) to generate better responses under the supervision of a reward model learned from preference data. One major challenge of RLHF is the inaccuracy of the intermediate reward model, especially in tasks that require complex reasoning for the reward model to score a response. We find that the reliability of the reward model varies across responses assigned different rewards. This motivates us to filter out the samples whose rewards may be unreliable in order to improve the signal-to-noise ratio during policy learning, resulting in Policy Filtration for Proximal Policy Optimization (PF-PPO). To choose a proper policy filtering strategy, we use the coefficient of determination ($R^2$) between the rewards and actual scores on filtered samples as the metric to help us find promising strategies, since it measures how well the rewards filtered by PF-PPO indicate real performance. We provide extensive experiments to validate the effectiveness of PF-PPO in code generation and math reasoning tasks. In code generation, PF-PPO achieves the state-of-the-art performance of 7-billion-parameter models on HumanEval (+7.9%), MBPP (+0.7%), and LeetCode Contest (+10.0%), which is a more challenging benchmark created by us. In math reasoning, PF-PPO yields performance gains using different reward models and benchmarks (Ape210K and CMATH). Code is available on https://github.com/DtYXs/verl/tree/pf-ppo.
虽然存在直接策略优化方法,但领先的大语言模型仍通过基于人类反馈的强化学习(RLHF)进行微调,在从偏好数据中学得的奖励模型的监督下生成更好的回答。RLHF的一个主要挑战是中间奖励模型的不准确性,尤其是在奖励模型需要复杂推理才能为回答打分的任务中。我们发现,奖励模型的可靠性随回答所获奖励的不同而变化。这促使我们过滤掉奖励可能不可靠的样本,以提高策略学习过程中的信噪比,由此得到近端策略优化的策略过滤方法(PF-PPO)。为了选择合适的策略过滤策略,我们以过滤后样本上奖励与实际得分之间的决定系数($R^2$)作为指标来帮助寻找有前景的策略,因为它衡量了经PF-PPO过滤的奖励在多大程度上反映真实性能。我们通过大量实验验证了PF-PPO在代码生成和数学推理任务中的有效性。在代码生成方面,PF-PPO在HumanEval(+7.9%),MBPP(+0.7%)以及我们创建的更具挑战性的基准LeetCode Contest(+10.0%)上取得了70亿参数模型的最先进性能。在数学推理方面,PF-PPO在不同的奖励模型和基准(Ape210K和CMATH)上均带来了性能提升。代码可在https://github.com/DtYXs/verl/tree/pf-ppo获取。
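The role of the coefficient of determination in comparing filtering strategies can be sketched as follows. This is a minimal illustration, assuming R² is computed as the squared Pearson correlation between rewards and actual scores (which equals the R² of a simple linear fit). The filter `keep_best_and_worst` and the sample data are hypothetical and made up for this example; they are not taken from the paper.

```python
def r_squared(rewards, scores):
    """Squared Pearson correlation between rewards and actual scores."""
    n = len(rewards)
    mr, ms = sum(rewards) / n, sum(scores) / n
    cov = sum((r - mr) * (s - ms) for r, s in zip(rewards, scores))
    var_r = sum((r - mr) ** 2 for r in rewards)
    var_s = sum((s - ms) ** 2 for s in scores)
    if var_r == 0 or var_s == 0:
        return 0.0
    return cov * cov / (var_r * var_s)

def keep_best_and_worst(samples, k):
    """Hypothetical filter: keep the k lowest- and k highest-reward samples."""
    ranked = sorted(samples, key=lambda s: s[0])
    return ranked[:k] + ranked[-k:]

# Made-up (reward, actual_score) pairs: rewards are informative at the
# extremes and noisy in the middle range.
samples = [(0.1, 0.0), (0.2, 0.0), (0.4, 1.0), (0.5, 0.0),
           (0.6, 1.0), (0.55, 0.0), (0.9, 1.0), (0.95, 1.0)]

r2_all = r_squared([r for r, _ in samples], [s for _, s in samples])
filtered = keep_best_and_worst(samples, k=2)
r2_filtered = r_squared([r for r, _ in filtered], [s for _, s in filtered])
```

On this toy data the filtered subset has a higher R² than the full set, which is the kind of signal the abstract describes using to pick a filtering strategy.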
Article 254
Title@2025-05-29 (4): Learning to Reason under Off-Policy Guidance
Title: Learning to Reason under Off-Policy Guidance | Unter außerpolitischer Anleitung zur Vernunft lernen | 根据非政策指导学习理由 2504.14945v4 |
Authors: Jianhao Yan, Yafu Li, Zican Hu, Zhi Wang, Ganqu Cui, Xiaoye Qu, Yu Cheng, Yue Zhang
Recent advances in large reasoning models (LRMs) demonstrate that sophisticated behaviors such as multi-step reasoning and self-reflection can emerge via reinforcement learning with verifiable rewards (RLVR). However, existing RLVR approaches are inherently “on-policy”, limiting learning to a model’s own outputs and failing to acquire reasoning abilities beyond its initial capabilities. To address this issue, we introduce LUFFY (Learning to reason Under oFF-policY guidance), a framework that augments RLVR with off-policy reasoning traces. LUFFY dynamically balances imitation and exploration by combining off-policy demonstrations with on-policy rollouts during training. Specifically, LUFFY combines the Mixed-Policy GRPO framework, which has a theoretically guaranteed convergence rate, with policy shaping via regularized importance sampling to avoid superficial and rigid imitation during mixed-policy training. Compared with previous RLVR methods, LUFFY achieves an average gain of over +6.4 across six math benchmarks and an advantage of over +6.2 points in out-of-distribution tasks. Most significantly, we show that LUFFY successfully trains weak models in scenarios where on-policy RLVR completely fails. These results provide compelling evidence that LUFFY transcends the fundamental limitations of on-policy RLVR and demonstrates the great potential of utilizing off-policy guidance in RLVR.
大型推理模型(LRMs)的近期进步表明,多步推理和自我反省等复杂行为可以通过以可核查的回报来强化学习~(\ textit{RLVR}) 。 但是,现有的\ textit{RLVR} 方法本质上是“ 政策性” , 将学习限制在模型自己的产出上, 并且没有获得超出其初始能力的推理能力。 为了解决这个问题, 我们引入了\ textbf{LUFFY} (\ textb{L}学习到理性的多步推理( textbf{U} ) 和自我反动( textb} ) 自我反动学习 。 但是, 现有的\ textitleitle{RLLLLVRRRRR} 方法本身, 通过常规性取样避免表面和僵硬性地模仿( RFF) 基础性分析结果, 与以往的RFF 平均的RFF 方法相比, 成功地展示了以往的RFF 水平优势。
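Why purely on-policy training can stall, and how adding an off-policy trace restores a learning signal, can be sketched with group-normalized advantages. This is a minimal illustration, assuming the usual GRPO-style normalization (reward minus group mean, divided by group standard deviation). The reward values and names are hypothetical, and the regularized importance sampling used for policy shaping is not modeled.

```python
import math

def group_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r - mean) / (std + eps) for one prompt."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical verifiable rewards (1 = correct, 0 = incorrect) for one prompt.
on_policy_rewards = [0.0, 0.0, 0.0, 0.0]   # the model's own rollouts all fail
off_policy_rewards = [1.0]                 # one trace from a stronger model

on_only = group_advantages(on_policy_rewards)
mixed = group_advantages(on_policy_rewards + off_policy_rewards)
```

When every on-policy rollout fails, all advantages are zero and the policy gradient vanishes. Mixing in a single successful off-policy trace yields a positive advantage for that trace and negative ones for the failed rollouts, so there is something to learn from.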
Article 255
Title@2025-05-29 (4): VERINA: Benchmarking Verifiable Code Generation
Title: VERINA: Benchmarking Verifiable Code Generation | VERINA: Benchmarking der überprüfbaren Code-Generierung | VERINA:可核实代码生成基准 2505.23135v1 |
Authors: Zhe Ye, Zhengxu Yan, Jingxuan He, Timothe Kasriel, Kaiyu Yang, Dawn Song
Large language models (LLMs) are increasingly integrated in software development, but ensuring correctness in LLM-generated code remains challenging and often requires costly manual review. Verifiable code generation – jointly generating code, specifications, and proofs of code-specification alignment – offers a promising path to address this limitation and further unleash LLMs’ benefits in coding. Yet, there exists a significant gap in evaluation: current benchmarks often lack support for end-to-end verifiable code generation. In this paper, we introduce Verina (Verifiable Code Generation Arena), a high-quality benchmark enabling a comprehensive and modular evaluation of code, specification, and proof generation as well as their compositions. Verina consists of 189 manually curated coding tasks in Lean, with detailed problem descriptions, reference implementations, formal specifications, and extensive test suites. Our extensive evaluation of state-of-the-art LLMs reveals significant challenges in verifiable code generation, especially in proof generation, underscoring the need for improving LLM-based theorem provers in verification domains. The best model, OpenAI o4-mini, generates only 61.4% correct code, 51.0% sound and complete specifications, and 3.6% successful proofs, with one trial per task. We hope Verina will catalyze progress in verifiable code generation by providing a rigorous and comprehensive benchmark. We release our dataset on https://huggingface.co/datasets/sunblaze-ucb/verina and our evaluation code on https://github.com/sunblaze-ucb/verina.
大型语言模型(LLMS)日益融入软件开发,但确保LLM生成的代码的正确性仍具有挑战性,而且往往需要花费昂贵的人工审查。可验证代码的生成 – – 共同生成代码、规格和具体编码协调的证明 – – 为解决这一限制和进一步释放LLMS的编码好处提供了一条充满希望的道路。然而,在评价方面存在着巨大的差距:目前的基准往往缺乏对端至端可核查代码生成的支持。在本文件中,我们引入了一个高质量基准,从而能够对代码、规格和证据生成及其构成进行全面和模块化评价。Verina由189个手工拼凑的编码任务组成,其中有详细的问题描述、参考执行、正式规格和广泛的测试套件。我们对目前最先进的LLMSM(可验证代码生成的DLMSUCSDS/Arencrearetures)的生成存在重大挑战。我们的最佳模型(OO4minirea)只能生成61.4%的代码,51.0%的硬度和3.6%的精确度的代码,我们将提供我们精确度数据生成的进度和3.6%的数据。
Article 256
Title@2025-05-29 (4): DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs
Title: DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs | DOPPLER: Dual-Policy-Lernen für die Gerätezuordnung in asynchronen Datenflussgraphen | DOPPLER: 同步数据流图表中设备分配的双政策学习 2505.23131v1 |
Authors: Xinyu Yao, Daniel Bourgeois, Abhinav Jain, Yuxin Tang, Jiawen Yao, Zhimin Ding, Arlei Silva, Chris Jermaine
We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system, with emphasis on complex machine learning workloads. Prior learning-based methods often struggle due to three key limitations: (1) reliance on bulk-synchronous systems like TensorFlow, which under-utilize devices due to barrier synchronization; (2) lack of awareness of the scheduling mechanism of underlying systems when designing learning-based methods; and (3) exclusive dependence on reinforcement learning, ignoring the structure of effective heuristics designed by experts. In this paper, we propose Doppler, a three-stage framework for training dual-policy networks consisting of 1) a SEL policy for selecting operations and 2) a PLC policy for placing chosen operations on devices. Our experiments show that Doppler outperforms all baseline methods across tasks by reducing system execution time and additionally demonstrates sampling efficiency by reducing per-episode training time.
我们研究在数据流图中将操作分配到在工作保护系统中最大限度地减少执行时间的设备上的问题,重点是复杂的机器学习工作量。先前的学习方法往往由于三个关键限制而困难重重:(1) 依赖诸如TensorFlow这样的散装同步系统,这些系统由于障碍同步而未充分利用设备;(2) 在设计学习方法时对基础系统的时间安排机制缺乏认识;(3) 完全依赖强化学习,忽视专家设计的有效超常结构。在本文中,我们提议为培训双政策网络建立一个三阶段框架,包括:1) $\mathsf{SEL} 业务选择政策;和(2) 将选定操作安装在设备上的政策。我们的实验表明, ktextsc{Doppler} 通过减少系统执行时间和通过减少人均培训时间来进一步展示取样效率,从而超越了所有任务的基线方法。
Article 257
Title@2025-05-29 (4): Developing Cryptocurrency Trading Strategy Based on Autoencoder-CNN-GANs Algorithms
Title: Developing Cryptocurrency Trading Strategy Based on Autoencoder-CNN-GANs Algorithms | Entwicklung einer Cryptowährungs-Handelsstrategie auf der Grundlage von Autoencoder-CNN-GAN-Algorithmen | 制定基于自动编码器-CNN-GANs算法的加密货币交易战略 2412.18202v5 |
Authors: Zhuohuan Hu, Richard Yu, Zizhou Zhang, Haoran Zheng, Qianying Liu, Yining Zhou
This paper leverages machine learning algorithms to forecast and analyze financial time series. The process begins with a denoising autoencoder that filters random noise fluctuations out of the main contract price data. Then, one-dimensional convolution reduces the dimensionality of the filtered data and extracts key information. The filtered and dimensionality-reduced price data is fed into a GANs network, and its output serves as the input to a fully connected network. Through cross-validation, a model is trained to capture features that precede large price fluctuations. The model predicts the likelihood and direction of significant price changes in real-time price sequences, placing trades at moments of high prediction accuracy. Empirical results demonstrate that using autoencoders and convolution to filter and denoise financial data, combined with GANs, achieves a certain level of predictive performance, validating the capabilities of machine learning algorithms to discover underlying patterns in financial sequences. Keywords: CNN; GANs; Cryptocurrency; Prediction.
本文利用机器学习算法来预测和分析财务时间序列。 这一过程始于从主合同价格数据中过滤随机噪音波动的自定义自动编码器。 然后, 单维演化会降低过滤数据维度并提取关键信息。 过滤和维度降低的价格数据被输入GANs网络, 其输出作为完全连接网络的输入。 通过交叉校验, 一个模型被训练来捕捉价格大幅波动之前的特征。 该模型预测实时价格序列中重大价格变化的可能性和方向, 将交易置于高预测准确度的时刻。 光学结果显示, 使用自动编码器和组合过滤和嵌入金融数据, 与 GANs 相结合, 达到一定水平的预测性能, 验证机器学习算法在财务序列中发现基本模式的能力。 关键词 - CNNN; GANs; Cryptocalcument; 预测性能。
Article 258
Title@2025-05-29 (4): Surrogate-Assisted Evolutionary Reinforcement Learning Based on Autoencoder and Hyperbolic Neural Network
Title: Surrogate-Assisted Evolutionary Reinforcement Learning Based on Autoencoder and Hyperbolic Neural Network | Surrogate-Assisted Evolutionary Verstärkung Lernen auf der Grundlage von Autoencoder und Hyperbolic Neural Network | 基于自动编码器和双曲神经网络的代用辅助辅助进化辅助进化强化学习 2505.19423v2 |
Authors: Bingdong Li, Mei Jiang, Hong Qian, Ke Tang, Aimin Zhou, Peng Yang
Evolutionary Reinforcement Learning (ERL), which trains Reinforcement Learning (RL) policies with Evolutionary Algorithms (EAs), has demonstrated enhanced exploration capabilities and greater robustness than traditional policy-gradient methods. However, ERL suffers from high computational costs and low search efficiency, as EAs require evaluating numerous candidate policies with expensive simulations, many of which are ineffective and do not contribute meaningfully to the training. One intuitive way to reduce the ineffective evaluations is to adopt surrogates. Unfortunately, existing ERL policies are often modeled as deep neural networks (DNNs) and thus naturally represented as high-dimensional vectors containing millions of weights, which makes building effective surrogates for ERL policies extremely challenging. This paper proposes a novel surrogate-assisted ERL that integrates Autoencoders (AE) and Hyperbolic Neural Networks (HNN). Specifically, the AE compresses high-dimensional policies into low-dimensional representations while extracting key features as the inputs for the surrogate. The HNN, functioning as a classification-based surrogate model, can learn complex nonlinear relationships from sampled data and enable more accurate pre-selection of the sampled policies without real evaluations. Experiments on 10 Atari and 4 MuJoCo games have verified that the proposed method outperforms previous approaches significantly. The search trajectories guided by the AE and HNN are also visually demonstrated to be more effective, in terms of both exploration and convergence. This paper not only presents the first learnable policy embedding and surrogate-modeling modules for high-dimensional ERL policies, but also empirically reveals when and why they can be successful.
强化进化强化学习(ERL) , 培训强化学习(RL) 政策, 以进化算法( EAs) 培训强化学习( RL) 政策, 展示出比传统政策梯度( EAs) 更高的探索能力和更强。 然而, ERL 受到高计算成本和低搜索效率的影响, 因为 EAs 需要用昂贵的模拟来评估众多候选政策, 其中很多模拟无效, 并且没有为培训做出有意义的贡献。 减少无效评价的一种直观方法是采用代孕。 不幸的是, 现有的ERL 政策往往以深层神经网络( DNNS) 的模式为模型, 因而自然地代表着包含数百万重量的高维度矢量的高度矢量矢量矢量矢量矢量矢量矢量矢量矢量矢量矢量矢量矢量, 使得为ERL政策制定有效的有效代谢( ER) 。 本文提出一个新的代谢辅助ERL , 将A( AE) 和 Syblicol Nealcol Neal) 的演示政策纳入了先前的演示模型, 。
Article 259
Title@2025-05-29 (4): Learning to Incentivize in Repeated Principal-Agent Problems with Adversarial Agent Arrivals
Title: Learning to Incentivize in Repeated Principal-Agent Problems with Adversarial Agent Arrivals | Lernen, in wiederholten Hauptagenten-Problemen mit Adversarial Agent Ankunft zu fördern | 学习鼓励与抵达时的对冲代理人员重复发生主要问题 2505.23124v1 |
Authors: Junyan Liu, Arnab Maiti, Artin Tajdini, Kevin Jamieson, Lillian J. Ratliff
We initiate the study of a repeated principal-agent problem over a finite horizon $T$, where a principal sequentially interacts with $K\geq 2$ types of agents arriving in an adversarial order. At each round, the principal strategically chooses one of the $N$ arms to incentivize for an arriving agent of unknown type. The agent then chooses an arm based on its own utility and the provided incentive, and the principal receives a corresponding reward. The objective is to minimize regret against the best incentive in hindsight. Without prior knowledge of agent behavior, we show that the problem becomes intractable, leading to linear regret. We analyze two key settings where sublinear regret is achievable. In the first setting, the principal knows the arm each agent type would select greedily for any given incentive. Under this setting, we propose an algorithm that achieves a regret bound of $O(\min\{\sqrt{KT\log N}, K\sqrt{T}\})$ and provide a matching lower bound up to a $\log K$ factor. In the second setting, an agent’s response varies smoothly with the incentive and is governed by a Lipschitz constant $L\geq 1$. Under this setting, we show that there is an algorithm with a regret bound of $\tilde{O}((LN)^{1/3}T^{2/3})$ and establish a matching lower bound up to logarithmic factors. Finally, we extend our algorithmic results for both settings by allowing the principal to incentivize multiple arms simultaneously in each round.
我们开始研究一个在一定范围内反复发生的主要代理人问题, 即 $T$, 一位主要代理人与以对抗性命令到达的 $K\geq 2$ 类代理人依次互动。 在每一回合中, 负责人从战略上选择 $N$ 个臂中的一个来激励一个身份不明的代理人。 然后, 代理根据其自身的效用和所提供的奖励选择一个手臂, 并且委托人得到相应的奖励。 目标是在事后看到的最佳激励下, 最大限度地减少遗憾。 在不事先了解代理人行为的情况下, 我们显示问题变得棘手, 导致线性遗憾。 我们分析两个关键设置, 亚线性遗憾是可以实现的。 在第一回合中, 委托人知道每种武器类型的手臂会贪婪地选择任何给予的奖励。 在此情况下, 我们提出一种算法, 实现 $O(\min\{\sqrt{KT\log N}, K\sqrt{T}\})$ 的遗憾界, 并给出相差至多 $\log K$ 因子的匹配下界。
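As a reading aid for the first regret bound in the abstract above, the following worked comparison shows which term of the minimum is active. It is a direct algebraic consequence of the stated bound, not a result quoted from the paper, and it assumes $K, T, N \geq 1$ so that both terms are non-negative and squaring preserves the inequality.

```latex
\begin{align*}
\sqrt{KT\log N} \le K\sqrt{T}
  &\iff KT\log N \le K^{2}T \\
  &\iff \log N \le K, \\[4pt]
\min\{\sqrt{KT\log N},\, K\sqrt{T}\} &=
  \begin{cases}
    \sqrt{KT\log N}, & \log N \le K, \\
    K\sqrt{T},       & \log N > K.
  \end{cases}
\end{align*}
```

In words: when the number of arms is at most exponential in the number of agent types, the $\sqrt{KT\log N}$ term is the smaller one; otherwise the bound no longer depends on $N$.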
Article 260
Title@2025-05-29 (4): BroadGen: A Framework for Generating Effective and Efficient Advertiser Broad Match Keyphrase Recommendations
Title: BroadGen: A Framework for Generating Effective and Efficient Advertiser Broad Match Keyphrase Recommendations | BroadGen: Ein Framework zur Generierung effektiver und effizienter Advertiser Broad Match Keyphrase-Empfehlungen | BloadGen:一个产生有效和高效广告的高效和高效广告大匹配关键词句建议的框架 2505.19164v2 |
Authors: Ashirbad Mishra, Jinyu Zhao, Soumik Dey, Hansi Wu, Binbin Li, Kamesh Madduri
In the domain of sponsored search advertising, the focus of keyphrase recommendation has largely been on exact match types, which pose issues such as high management expenses, limited targeting scope, and evolving search query patterns. Alternatives like broad match types can alleviate certain drawbacks of exact matches but present challenges like poor targeting accuracy and minimal supervisory signals owing to limited advertiser usage. This research defines the criteria for an ideal broad match, emphasizing both efficiency and effectiveness, ensuring that a significant portion of matched queries are relevant. We propose BroadGen, an innovative framework that recommends efficient and effective broad match keyphrases by utilizing historical search query data. Additionally, we demonstrate that BroadGen, through token correspondence modeling, maintains better query stability over time. BroadGen’s capabilities allow it to serve millions of sellers at eBay daily, covering over 2.3 billion items.
在受赞助的搜索广告领域,Keyphone建议的重点主要放在精确匹配类型上,这提出了高管理费用、有限目标选择范围和不断变化的搜索查询模式等问题。像Bload匹配类型这样的替代办法可以减轻某些准确匹配的缺点,但由于广告用户使用有限而带来的目标选择准确性差和监管信号少等挑战。这项研究界定了理想广泛匹配的标准,同时强调效率和有效性,确保大量匹配的查询具有相关性。我们提议BroadGen,这是一个创新框架,通过利用历史搜索查询数据,建议高效率和有成效的广泛匹配关键词。此外,我们证明BroadGen通过象征性通信模型,在一段时间内保持了更好的查询稳定性。BloadGen的能力允许它每天为超过23亿项的eBay的数百万卖主提供服务。
Article 261
Title@2025-05-29 (4): CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark
Title: CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark | CASS: Nvidia zu AMD Transpilation mit Daten, Modellen und Benchmark | CASS: Nvidia 到AMD 传输数据、模型和基准 2505.16968v3 |
Authors: Ahmed Heakl, Sarim Hashmi, Gustavo Bertolo Stahl, Seung Hun Eddie Han, Salman Khan, Abdulrahman Mahmoud
We introduce CASS, the first large-scale dataset and model suite for cross-architecture GPU code transpilation, targeting both source-level (CUDA <–> HIP) and assembly-level (Nvidia SASS <–> AMD RDNA3) translation. The dataset comprises 70k verified code pairs across host and device, addressing a critical gap in low-level GPU code portability. Leveraging this resource, we train the CASS family of domain-specific language models, achieving 95% source translation accuracy and 37.5% assembly translation accuracy, substantially outperforming commercial baselines such as GPT-4o, Claude, and Hipify. Our generated code matches native performance in over 85% of test cases, preserving runtime and memory behavior. To support rigorous evaluation, we introduce CASS-Bench, a curated benchmark spanning 16 GPU domains with ground-truth execution. All data, models, and evaluation tools are released as open source to foster progress in GPU compiler tooling, binary compatibility, and LLM-guided hardware translation.
我们引入了CASS, 这是首个用于跨建筑化 GPU 代码转换的大型数据集和模型套件, 针对源级( CUDA < - > HIP) 和组装级( Nvidia SASSS < - > AMD RDNA3) 翻译。 该数据集由70k经核实的对数组成, 跨越主机和装置, 解决低级别 GPU 代码可移植性的重大差距。 利用此资源, 我们培训 CASS 群域域语言模型, 实现95% 源翻译准确性和37.5% 组装翻译准确性, 大大超过 GPT-4o、 Claude 和 Hipifify等商业基线。 我们生成的代码匹配了85%以上测试案例的本地性能, 保存运行时间和记忆行为。 为了支持严格的评估, 我们引入了 CASS- Bench, 一个覆盖16 GPU 域域的曲线基准, 并带有地标执行。 所有的数据、 模型和评价工具都作为公开来源发布, 以促进 GPUPU 工具的编译、 和 LLM 制硬件翻译的进展 。
Article 262
Title@2025-05-29 (4): To Judge or not to Judge: Using LLM Judgements for Advertiser Keyphrase Relevance at eBay
Title: To Judge or not to Judge: Using LLM Judgements for Advertiser Keyphrase Relevance at eBay | Zu richten oder nicht zu richten: LLM-Richtungen für Werbetreibende Keyphrase Relevanz bei eBay verwenden | 法官或非法官:在eBay使用LLM判决来作广告 2505.04209v2 |
Authors: Soumik Dey, Hansi Wu, Binbin Li
E-commerce sellers are recommended keyphrases, based on their inventory, on which they can advertise to increase buyer engagement (clicks/sales). The relevance of advertiser keyphrases plays an important role in preventing the inundation of search systems with numerous irrelevant items that compete for attention in auctions, in addition to maintaining a healthy seller perception. In this work, we describe the shortcomings of training Advertiser keyphrase relevance filter models on click/sales/search relevance signals and the importance of aligning with human judgment, as sellers have the power to adopt or reject said keyphrase recommendations. In this study, we frame Advertiser keyphrase relevance as a complex interaction between 3 dynamical systems – seller judgment, which influences seller adoption of our product; Advertising, which provides the keyphrases to bid on; and Search, which holds the auctions for the same keyphrases. This study discusses the practicalities of using human judgment via a case study at eBay Advertising and demonstrates that using LLM-as-a-judge en masse as a scalable proxy for seller judgment to train our relevance models achieves a better harmony across the three systems – provided that they are bound by a meticulous evaluation framework grounded in business metrics.
在这项工作中,我们描述了培训广告商关键词相关性过滤模型在点击/销售/搜索相关性信号方面的缺点,以及与人类判断保持一致的重要性,因为卖方有权采纳或拒绝上述关键词句建议。在本研究中,我们将广告关键词句的关联性作为三个动态系统 – – 卖方判决 – – 之间的复杂互动关系来设置。 卖方判决影响卖方采用我们的产品 “ 广告 “ ,该判决提供了出价的关键词,而搜索公司则为同一关键词句进行拍卖。本研究讨论了通过eBay广告的案例研究使用人类判断的实用性,并表明使用LM-as-a-judge en-massassess作为卖方判断的可升级代言人,以培训我们的关联性模型,从而在三个系统实现更好的和谐 – – 前提是它们以严格的评价框架为基础。
Article 263
Title@2025-05-29 (4): Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking
Title: Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking | Dekom-Renorm-Merge: Modellzusammenführung auf dem richtigen Raum verbessert Multitasking | Decom-Renorm-Meorge:正确空间的模型合并改进多重任务 2505.23117v1 |
Authors: Yuatyong Chaichana, Thanapat Trachu, Peerat Limkonchotiwat, Konpat Preechakul, Tirasan Khandhawit, Ekapol Chuangsuwanich
In the era of large-scale training, model merging has evolved into a tool for creating multitasking models efficiently. It enables the knowledge of models to be fused, without the need for heavy computation as required in traditional multitask learning. Existing merging methods often assume that entries at identical positions in weight matrices serve the same function, enabling straightforward entry-wise comparison and merging. However, this assumption overlooks the complexity of finetuned neural networks, where neurons may develop distinct feature compositions, making direct entry-wise merging problematic. We present Decom-Renorm-Merge (DRM), a simple yet effective approach that leverages Singular Value Decomposition to decompose and coordinate weight matrices into an aligned joint space, where entry-wise merging becomes possible. We showcase the effectiveness of DRM across various settings, ranging from smaller encoder-based models such as ViT and DeBERTa, to encoder-decoder-based models such as T5, to larger decoder-based models such as Llama3.1-8B. Our experimental results show that DRM outperforms several state-of-the-art merging techniques across full finetuning and low-rank adaptation settings. Moreover, our analysis reveals renormalization as the crucial component for creating a robust and even joint space for merging, significantly contributing to the method’s performance.
在大规模培训的时代,模型合并已经演变成一个高效创建多任务模型的工具,使模型知识得以融合,而无需像传统多任务学习所要求的那样进行大量计算。现有的合并方法往往假设重力矩阵中相同位置的条目功能相同,能够直接进行入门比较和合并。然而,这一假设忽略了微调神经网络的复杂性,其中神经元可能形成独特的特征构成,使直接进入的合并成为问题。我们提出了脱子-再调节-Merge(DRM),这是一种简单而有效的方法,利用Singulal值分解法将重量矩阵转换和协调成一个统一的联合空间,从而有可能进行入轨合并。我们展示了DRM在各种环境中的效能,这些环境包括基于小编码器的ViT和DeBERTA,基于T5的编码的编码-decoder-decoder网络,以及Llama3.1-8B等较大的分解器。我们的实验结果表明,DRM优于若干个状态的合并技术,在全面调整和低级的空间调整后,将展示了我们的关键的调整和低级的调整方法。
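To make the "decompose, renormalize, merge" idea in the abstract above more concrete, here is a toy NumPy sketch. It is one plausible reading of the abstract, not the authors' procedure: stacking the task deltas side by side, moving row norms into per-task scales, and merging by plain averaging are all assumptions, and the names `decompose_joint`, `renormalize`, and `merge` are hypothetical.

```python
import numpy as np

def decompose_joint(deltas):
    """Stack task deltas side by side and take a single SVD, so every task
    is expressed in the same left basis U (assumed form of the joint space)."""
    stacked = np.concatenate(deltas, axis=1)            # shape (out, T * in)
    U, S, Vt = np.linalg.svd(stacked, full_matrices=False)
    blocks = np.split(Vt, len(deltas), axis=1)          # T blocks of shape (r, in)
    return U, S, blocks

def renormalize(S, blocks, eps=1e-12):
    """Rescale every row of every task block to (near) unit norm and move the
    removed norm into a per-task scale, so that blocks are comparable."""
    scales, units = [], []
    for block in blocks:
        norms = np.linalg.norm(block, axis=1)           # one norm per joint direction
        units.append(block / (norms[:, None] + eps))
        scales.append(S * norms)
    return scales, units

def merge(deltas):
    """Entry-wise merge in the joint space. Simple averaging is used here
    purely as a placeholder for whatever merging rule is actually applied."""
    U, S, blocks = decompose_joint(deltas)
    scales, units = renormalize(S, blocks)
    merged_scale = np.mean(scales, axis=0)              # shape (r,)
    merged_unit = np.mean(units, axis=0)                # shape (r, in)
    return U @ (merged_scale[:, None] * merged_unit)    # back to shape (out, in)
```

A sanity check on this sketch: merging two identical deltas returns that delta, and each task delta can be recovered exactly from `U`, `S`, and its own block before renormalization.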
Article 264
Title@2025-05-29 (4): Learning to Reason from Feedback at Test-Time
Title: Learning to Reason from Feedback at Test-Time | Von Feedback bei Test-Time zur Vernunft lernen | 从测试时的反馈中学习到理由 2502.15771v2 |
Authors: Yanyang Li, Michael Lyu, Liwei Wang
Solving complex tasks in a single attempt is challenging for large language models (LLMs). Iterative interaction with the environment and feedback is often required to achieve success, making effective feedback utilization a critical topic. Existing approaches either struggle with length generalization or rely on naive retries without leveraging prior information. In this paper, we introduce FTTT, a novel paradigm that formulates feedback utilization as an optimization problem at test time. Additionally, we propose a learnable test-time optimizer, OpTune, to effectively exploit feedback. Experiments on two LLMs across four reasoning datasets demonstrate that FTTT and OpTune achieve superior scalability and performance.
在一次尝试中解决复杂任务对大型语言模型(LLMs)来说具有挑战性,要取得成功,往往需要与环境的迭代互动和反馈,使有效的反馈利用成为一个关键议题。现有办法要么与时间的概括斗争,要么依靠天真重整而不利用先前的信息。在本文中,我们引入了FTTT,这是一个创新的范例,将反馈利用作为测试时的一个优化问题。此外,我们提议了一个可学习的测试-时间优化器OpTune,以有效利用反馈。在四个推理数据集中对两个LMs的实验表明,FTTT和OpTune实现了更高的可扩展性和性。
Article 265
Title@2025-05-29 (4): CrossLinear: Plug-and-Play Cross-Correlation Embedding for Time Series Forecasting with Exogenous Variables
Title: CrossLinear: Plug-and-Play Cross-Correlation Embedding for Time Series Forecasting with Exogenous Variables | CrossLinear: Plug-and-Play-Cross-Korrelation für Zeitreihenvorhersage mit exogenen Variablen einbetten | Crossliear: 用外源变量预测时间序列的插件和插件交叉校正嵌入 2505.23116v1 |
Authors: Pengfei Zhou, Yunlong Liu, Junli Liang, Qi Song, Xiangyang Li
Time series forecasting with exogenous variables is a critical emerging paradigm that presents unique challenges in modeling dependencies between variables. Traditional models often struggle to differentiate between endogenous and exogenous variables, leading to inefficiencies and overfitting. In this paper, we introduce CrossLinear, a novel Linear-based forecasting model that addresses these challenges by incorporating a plug-and-play cross-correlation embedding module. This lightweight module captures the dependencies between variables with minimal computational cost and seamlessly integrates into existing neural networks. Specifically, it captures time-invariant and direct variable dependencies while disregarding time-varying or indirect dependencies, thereby mitigating the risk of overfitting in dependency modeling and contributing to consistent performance improvements. Furthermore, CrossLinear employs patch-wise processing and a global linear head to effectively capture both short-term and long-term temporal dependencies, further improving its forecasting precision. Extensive experiments on 12 real-world datasets demonstrate that CrossLinear achieves superior performance in both short-term and long-term forecasting tasks. The ablation study underscores the effectiveness of the cross-correlation embedding module. Additionally, the generalizability of this module makes it a valuable plug-in for various forecasting tasks across different domains. Codes are available at https://github.com/mumiao2000/CrossLinear.
使用外源变量进行时间序列预测是一个新出现的重要范例,它给变量之间的依赖性建模带来了独特的挑战。传统模型往往难以区分内生变量和外生变量,导致效率低下和过度适应。在本文中,我们引入了CrossLineear,这是一个全新的线性预测模型,这是一个基于线性预测的新颖模型,它通过纳入插插插和边交叉关系嵌入模块来应对这些挑战。这一轻量级模块捕捉了计算成本最低的变量之间的依赖性,并且无缝地融入了现有的神经网络。具体地说,它捕捉了时间差异性和直接差异性依赖性,同时忽略了时间差异性或间接依赖性,从而减少了过度依赖性建模的风险,并有助于不断改进绩效。此外,Crosleinear采用对齐的处理和全球线性头,以有效捕捉短期和长期的跨时间依赖性,进一步提高其预测精确性。在12个真实世界数据集上的广泛实验表明,CrossLinaltare在短期和长期预测任务中都取得了较高的业绩。Crelationalimations asiming the greabilityal-labilizational
Article 266
Title@2025-05-29 (4): Instance-dependent Convergence Theory for Diffusion Models
Title: Instance-dependent Convergence Theory for Diffusion Models | Instanz-abhängige Konvergenztheorie für Diffusionsmodelle | 扩散模型集成模型理论 2410.13738v2 |
Authors: Yuchen Jiao, Gen Li
Score-based diffusion models have demonstrated outstanding empirical performance in machine learning and artificial intelligence, particularly in generating high-quality new samples from complex probability distributions. Improving the theoretical understanding of diffusion models, with a particular focus on the convergence analysis, has attracted significant attention. In this work, we develop a convergence rate that is adaptive to the smoothness of different target distributions, referred to as an instance-dependent bound. Specifically, we establish an iteration complexity of $\min\{d, d^{2/3}L^{1/3}, d^{1/3}L\}\,\varepsilon^{-2/3}$ (up to logarithmic factors), where $d$ denotes the data dimension, and $\varepsilon$ quantifies the output accuracy in terms of total variation (TV) distance. In addition, $L$ represents a relaxed Lipschitz constant, which, in the case of Gaussian mixture models, scales only logarithmically with the number of components, the dimension and iteration number, demonstrating broad applicability.
基于分数的传播模型在机器学习和人工智能方面,特别是在从复杂概率分布中产生高质量的新样本方面,表现出杰出的经验性表现。改进对扩散模型的理论理解,特别侧重于趋同分析,已经引起极大关注。在这项工作中,我们发展了适应不同目标分布的顺利性的统一率,称之为依赖实例的束缚。具体地说,我们确立了美元(mind,d2/3}L1/3},d1/3}Lvarepsilon2/3}$(最高为对数系数)的迭代复杂性,其中美元表示数据维度,而$\varepsilon 美元则按总变异(TV)距离计算产出精度。此外,美元代表一个松动的Lipschitz常数。在高斯混合模型中,只有对数尺度与组件数量、尺寸和相异数的比值,显示出广泛适用性。
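The three-way minimum in the iteration complexity above can be unpacked by comparing its terms pairwise. The regime split below follows from elementary algebra on the stated bound (assuming $d \geq 1$ and $L > 0$); it is offered as a reading aid rather than a statement taken from the paper.

```latex
\begin{align*}
d^{1/3}L \le d^{2/3}L^{1/3} &\iff L^{2/3} \le d^{1/3} \iff L \le \sqrt{d}, \\
d^{2/3}L^{1/3} \le d        &\iff L^{1/3} \le d^{1/3} \iff L \le d, \\[4pt]
\min\{d,\ d^{2/3}L^{1/3},\ d^{1/3}L\} &=
  \begin{cases}
    d^{1/3}L,       & L \le \sqrt{d}, \\
    d^{2/3}L^{1/3}, & \sqrt{d} \le L \le d, \\
    d,              & L \ge d.
  \end{cases}
\end{align*}
```

So the bound improves on the dimension-only rate $d\,\varepsilon^{-2/3}$ exactly when $L < d$, and the gain is largest for smooth targets with $L \le \sqrt{d}$.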
Article 267
Title@2025-05-29 (4): FutureGen: LLM-RAG Approach to Generate the Future Work of Scientific Article
Title: FutureGen: LLM-RAG Approach to Generate the Future Work of Scientific Article | FutureGen: LLM-RAG Ansatz zur Generierung der zukünftigen Arbeit des wissenschaftlichen Artikels | FutureGen:LLM-RAG 产生科学条款未来工作的方法 2503.16561v2 |
Authors: Ibrahim Al Azher, Miftahul Jannat Mokarrama, Zhishuai Guo, Sagnik Ray Choudhury, Hamed Alhoori
The future work section of a scientific article outlines potential research directions by identifying gaps and limitations of a current study. This section serves as a valuable resource for early-career researchers seeking unexplored areas and experienced researchers looking for new projects or collaborations. In this study, we generate future work suggestions from key sections of a scientific article alongside related papers and analyze how the trends have evolved. We experiment with various Large Language Models (LLMs) and integrate Retrieval-Augmented Generation (RAG) to enhance the generation process. We incorporate an LLM feedback mechanism to improve the quality of the generated content and propose an LLM-as-a-judge approach for evaluation. Our results demonstrate that the RAG-based approach with LLM feedback outperforms other methods evaluated through qualitative and quantitative metrics. Moreover, we conduct a human evaluation to assess the LLM as an extractor and judge. The code and dataset for this project are here, code: HuggingFace
科学文章的未来工作章节通过查明当前研究的差距和局限性,概述了潜在的研究方向,概述了未来研究方向。本节是早期职业研究人员寻找未探索领域和有经验的研究人员寻找新项目或协作的宝贵资源。在本研究报告中,我们从科学文章的关键部分提出未来工作建议,并结合相关论文分析趋势如何演变。我们试验了各种大语言模型和综合检索-启动一代(RAG),以加强生成过程。我们采用了LLLM反馈机制,以提高生成内容的质量,并提出LLM-as-a-judge-评价方法。我们的成果表明,以LLM反馈为基础的RAG方法超越了通过定性和定量指标评估的其他方法。此外,我们进行了人类评估,以评估LLM作为提取器和评判器。这个项目的代码和数据集在这里,代码是:HuggingFace:HuggingFace。
Article 268
Title@2025-05-29 (4): Neural Interpretable PDEs: Harmonizing Fourier Insights with Attention for Scalable and Interpretable Physics Discovery
Title: Neural Interpretable PDEs: Harmonizing Fourier Insights with Attention for Scalable and Interpretable Physics Discovery | Neural Interpretable PDEs: Harmonisierung Fourier Insights mit Aufmerksamkeit für skalierbare und Interpretierbare Physik Discovery | 神经可解释的PDEs:协调Fourier Insights,注意可缩放和可解释的物理发现 2505.23106v1 |
Authors: Ning Liu, Yue Yu
Attention mechanisms have emerged as transformative tools in core AI domains such as natural language processing and computer vision. Yet, their largely untapped potential for modeling intricate physical systems presents a compelling frontier. Learning such systems often entails discovering operators that map between functional spaces using limited instances of function pairs – a task commonly framed as a severely ill-posed inverse PDE problem. In this work, we introduce Neural Interpretable PDEs (NIPS), a novel neural operator architecture that builds upon and enhances Nonlocal Attention Operators (NAO) in both predictive accuracy and computational efficiency. NIPS employs a linear attention mechanism to enable scalable learning and integrates a learnable kernel network that acts as a channel-independent convolution in Fourier space. As a consequence, NIPS eliminates the need to explicitly compute and store large pairwise interactions, effectively amortizing the cost of handling spatial interactions into the Fourier transform. Empirical evaluations demonstrate that NIPS consistently surpasses NAO and other baselines across diverse benchmarks, heralding a substantial leap in scalable, interpretable, and efficient physics learning. Our code and data accompanying this paper are available at https://github.com/fishmoon1234/Nonlocal-Attention-Operator.
在诸如自然语言处理和计算机愿景等核心AI领域,关注机制已成为变革性工具,但在自然语言处理和计算机愿景等核心AI领域,它们基本上尚未开发的建立复杂物理系统模型的潜力是一个令人瞩目的前沿。学习这类系统往往需要发现操作者,在功能空间之间绘制使用有限功能对等实例的分布图 – – 这项任务通常被描绘成一个严重错误的反PDE问题。在这项工作中,我们引入了神经可解释的PDE(NIPS)(NIPS),这是一个新的神经操作者结构,在预测准确性和计算效率两方面都建立在并加强了非本地关注操作员(NAO)和其他基准上。NIPS使用了线性关注机制,以便能够进行可缩放的学习并整合一个可学习的内核网络,在Fourier空间中作为一条视通道独立的共动。因此,NIPS消除了明确配置和储存大量双向互动的必要性,有效地将处理空间互动的成本分摊到Fourier的转变中。实证性评估表明,NIPS始终超过NAO(NAO)和其他基准,在可扩展性、可解释和高效物理学学习方面出现大幅度的飞跃式飞跃式飞跃。我们的代码和数据在http上。
Article 269
Title@2025-05-29 (4): LUMION: Fast Fault Recovery for ML Jobs Using Programmable Optical Fabrics
Title: LUMION: Fast Fault Recovery for ML Jobs Using Programmable Optical Fabrics | LUMION: Schnelle Fehlerwiederherstellung für ML-Jobs mit programmierbaren optischen Stoffen | LUMION: 使用可编程光学制造器快速回收 ML 工作 2505.23105v1 |
Authors: Abhishek Vijaya Kumar, Eric Ding, Arjun Devraj, Darius Bunandar, Rachee Singh
When accelerators fail in modern ML datacenters, operators migrate the affected ML training or inference jobs to entirely new racks. This approach, while preserving network performance, is highly inefficient, requiring datacenters to reserve full racks of idle accelerators for fault tolerance. In this paper, we address this resource inefficiency by introducing LUMION, a novel reconfigurable optical fabric for connecting accelerators within a datacenter rack. Instead of migrating entire ML jobs, LUMION dynamically integrates spare accelerators into ongoing workloads as failures occur, thereby maintaining consistent performance without costly migrations. We show the benefits of LUMION by building an end-to-end hardware prototype. Our experiments fine-tune Llama 3.2 and show that LUMION swaps a failed GPU with a healthy one and restarts the ML job within about 1 second of the failure. After replacing failed accelerators with spare ones, LUMION achieves higher inter-GPU bandwidth than traditional electrical racks, leading to a nearly 2x improvement in fine-tuning throughput.
当加速器在现代 ML 数据中心失败时, 操作员会将受影响的 ML 培训或推断工作迁移到全新的工作架上。 这种方法在保持网络性能的同时, 效率极低, 要求数据中心保留闲置加速器的完整括号, 以防故障。 在本文中, 我们通过引入LUMION来解决资源效率低下问题。 LUMION是一种新型的可重新配置的光纤结构, 用于将加速器连接到一个数据中心架内。 LUMION 不仅没有将整个 ML 工作迁移, 反而将备用加速器动态地整合到持续的工作量中, 从而保持连续的性能, 而不花费昂贵的迁移。 我们通过建立一个端到端的硬件原型来显示 LUMION 的好处。 我们的实验微调 Llama 3. 2 并显示 LUMION 将一个失败的 GPU 转换为健康的 GPU, 并在故障的1 秒内重新启动 MLL 工作。 LUMION 在用备用加速器替换失败的失败的加速器后, 实现微调的近2X 改进后, 。
Article 270
Title@2025-05-29 (4): Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret
Title: Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret | Ungefähre Thompson-Probenahme für das Lernen linearer quadratischer Regulatoren mit $O(\sqrt{T})$ Bedauern | Thompson 学习线性赤道调节器的近似 Thompson 抽样 以 $(\ sqrt{T}) regret $(\ sqrt{T}) 为学习线性赤道调节器 2405.19380v2 |
Authors: Yeoneung Kim, Gihun Kim, Jiwhan Park, Insoon Yang
We propose a novel Thompson sampling algorithm that learns linear quadratic regulators (LQR) with a Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a carefully designed preconditioner and incorporates a simple excitation mechanism. We show that the excitation signal drives the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Furthermore, we establish nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an $O(\sqrt{T})$ regret bound without relying on the restrictive assumptions that are often used in the literature.
我们提出一种新的汤普森采样算法,学习线性二次调控器(LQR) , 学习贝叶西亚人的遗憾为$O(\\ sqrt{T}) 。 我们的方法利用精心设计的前提条件来利用兰杰文的动态, 并包含一个简单的引言机制。 我们显示, 刺激信号驱动了先决条件的最低值随时间增长, 从而加快了近似后方采样过程。 此外, 我们建立了由我们算法产生的近似后方的非三角集中特性 。 这些特性使我们能够约束系统状态的瞬间, 并在不依赖文献中经常使用的限制性假设的情况下, 获得$O(\ sqrt{T} ) 的遗憾约束 。
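For readers unfamiliar with the sampler the abstract above builds on, the following is a minimal sketch of a generic preconditioned Langevin step used for approximate posterior sampling. It is the textbook update, not the paper's algorithm: the specific preconditioner and the excitation mechanism described in the abstract are not reproduced, and `preconditioned_langevin_step`, `grad_U`, and `P` are hypothetical names.

```python
import numpy as np

def preconditioned_langevin_step(theta, grad_U, P, step, noise):
    """One discretized Langevin step with a fixed preconditioner P:

        theta' = theta - step * P @ grad_U(theta) + sqrt(2 * step) * P^{1/2} @ noise

    grad_U is the gradient of the negative log-posterior, P must be symmetric
    positive definite, and noise is a standard normal draw supplied by the caller.
    """
    chol = np.linalg.cholesky(P)          # P = chol @ chol.T, so chol @ noise ~ N(0, P)
    drift = step * P @ grad_U(theta)
    diffusion = np.sqrt(2.0 * step) * chol @ noise
    return theta - drift + diffusion

# Usage on a Gaussian posterior N(mu, A^{-1}), where grad U(theta) = A (theta - mu).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
mu = np.array([1.0, -2.0])
grad_U = lambda theta: A @ (theta - mu)
P = np.linalg.inv(A)                      # ideal preconditioner for this toy target

rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(200):
    theta = preconditioned_langevin_step(theta, grad_U, P, 0.1, rng.standard_normal(2))
```

With `P` equal to the inverse curvature, the drift contracts every direction at the same rate, which is the usual motivation for preconditioning; passing the noise in explicitly keeps the step deterministic and easy to check.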
Article 271
Title@2025-05-29 (4): Weight Spectra Induced Efficient Model Adaptation
Title: Weight Spectra Induced Efficient Model Adaptation | Gewicht Spectra Induzierte effiziente Modellanpassung | 引导有效模型适应 2505.23099v1 |
Authors: Chongjie Si, Xuankun Yang, Muqing Liu, Yadao Wang, Xiaokang Yang, Wenbo Su, Bo Zheng, Wei Shen
Large-scale foundation models have demonstrated remarkable versatility across a wide range of downstream tasks. However, fully fine-tuning these models incurs prohibitive computational costs, motivating the development of Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA, which introduces low-rank updates to pre-trained weights. Despite their empirical success, the underlying mechanisms by which PEFT modifies model parameters remain underexplored. In this work, we present a systematic investigation into the structural changes of weight matrices during full fine-tuning. Through singular value decomposition (SVD), we reveal that fine-tuning predominantly amplifies the top singular values while leaving the remainder largely intact, suggesting that task-specific knowledge is injected into a low-dimensional subspace. Furthermore, we find that the dominant singular vectors are reoriented in task-specific directions, whereas the non-dominant subspace remains stable. Building on these insights, we propose a novel method that leverages learnable rescaling of top singular directions, enabling precise modulation of the most influential components without disrupting the global structure. Our approach achieves consistent improvements over strong baselines across multiple tasks, highlighting the efficacy of structurally informed fine-tuning.
大型基础模型在一系列广泛的下游任务中表现出了显著的多功能性。然而,全面微调这些模型带来了令人望而却步的计算成本,鼓励开发Parater-Efficent Fine-Turning(PEFT)方法,如LORA(PERA)方法,该方法采用低级别更新预培训重量。尽管取得了经验上的成功,但PEFT修改模型参数所依据的基本机制仍然未得到充分探讨。在这项工作中,我们提出了对全面微调过程中重量矩阵结构变化的系统调查。我们通过单值分解(SVD),发现微调主要扩大了顶级单值,而其余部分基本保持不变,这表明任务特定知识被注入到一个低维的子空间。此外,我们发现占主导地位的单向矢量矢量在特定任务方向上重新定位,而非占支配地位的子空间则保持稳定。我们提出了一种新的方法,利用最高单级方向的可调整,使最有影响力的部件得以精确调整,同时又不打乱全球结构结构结构结构。我们的方法是在多个任务上实现一致的改进。
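The "learnable rescaling of top singular directions" mentioned in the abstract above can be illustrated with a small NumPy sketch of the forward computation only. This is an assumption-laden illustration rather than the paper's method: the exponential parameterization, the function name `rescale_top_singular`, and the parameter `log_scale` are hypothetical, and no training loop is shown.

```python
import numpy as np

def rescale_top_singular(W, log_scale):
    """Return W with its top-k singular values multiplied by exp(log_scale),
    where k = len(log_scale). The singular vectors and the remaining singular
    values are left untouched, so only k numbers would need to be trained."""
    k = len(log_scale)
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    S_new = S.copy()
    S_new[:k] = S[:k] * np.exp(log_scale)   # exp keeps every scale positive
    return U @ (S_new[:, None] * Vt)

# Usage: amplify the leading direction of a random weight matrix by 2x.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
W_adapted = rescale_top_singular(W, np.array([np.log(2.0)]))
```

Initializing `log_scale` at zero reproduces the pre-trained weight exactly, which mirrors the common PEFT convention of starting from an identity update.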
Article 272
Title@2025-05-29 (4): Learning to Search for Vehicle Routing with Multiple Time Windows
Title: Learning to Search for Vehicle Routing with Multiple Time Windows | Lernen, nach Fahrzeug Routing mit mehreren Zeitfenstern zu suchen | 学习搜索多时间窗口运行的车辆 2505.23098v1 |
Authors: Kuan Xu, Zhiguang Cao, Chenlong Zheng, Linong Liu
In this study, we propose a reinforcement learning-based adaptive variable neighborhood search (RL-AVNS) method designed for effectively solving the Vehicle Routing Problem with Multiple Time Windows (VRPMTW). Unlike traditional adaptive approaches that rely solely on historical operator performance, our method integrates a reinforcement learning framework to dynamically select neighborhood operators based on real-time solution states and learned experience. We introduce a fitness metric that quantifies customers’ temporal flexibility to improve the shaking phase, and employ a transformer-based neural policy network to intelligently guide operator selection during the local search. Extensive computational experiments are conducted on realistic scenarios derived from the replenishment of unmanned vending machines, characterized by multiple clustered replenishment windows. Results demonstrate that RL-AVNS significantly outperforms traditional variable neighborhood search (VNS), adaptive VNS (AVNS), and state-of-the-art learning-based heuristics, achieving substantial improvements in solution quality and computational efficiency across various instance scales and time window complexities. Particularly notable is the algorithm’s capability to generalize effectively to problem instances not encountered during training, underscoring its practical utility for complex logistics scenarios.
在这项研究中,我们提出了一种基于学习的强化适应性可变邻里搜索(RL-AVNS)方法,该方法旨在有效地解决多时视窗车辆流动问题。 与仅依赖历史运营者性能的传统适应性方法不同,我们的方法将强化学习框架整合到动态地选择基于实时解决方案状态和所学经验的邻里运营者中。 我们引入了一种健康度量标准,对客户的时间灵活性进行量化,以改进摇动阶段,并使用基于变压器的神经政策网络,以明智地指导当地搜索过程中的操作者选择。 进行了广泛的计算实验,其依据是补充无人驾驶自动售货机(以多个集群补充窗口为特征)所产生的现实情景。 结果显示,RL-AVNS大大超越了传统的可变邻里搜索(VNS)、适应性VNS(ANS)和以学习为主的状态的超常态功能,在解决方案质量和计算效率方面在各个实例和时间窗口复杂性方面取得重大改进。特别值得注意的是,算法能力将培训中未遇到的问题有效地归纳到培训中,强调其对复杂物流假设的实用用途。
Article 273
Title@2025-05-29 (4): Stochastic Diffusion: A Diffusion Based Model for Stochastic Time Series Forecasting
Title: Stochastic Diffusion: A Diffusion Based Model for Stochastic Time Series Forecasting | Stochastische Diffusion: Ein diffusionsbasiertes Modell für stochastische Zeitreihen | 斯托卡扩散:以传播为基础的斯托卡时间序列预测模型 2406.02827v2 |
Authors: Yuansan Liu, Sudanthi Wijewickrema, Dongting Hu, Christofer Bester, Stephen O’Leary, James Bailey
Recent innovations in diffusion probabilistic models have paved the way for significant progress in image, text and audio generation, leading to their applications in generative time series forecasting. However, leveraging such abilities to model highly stochastic time series data remains a challenge. In this paper, we propose a novel Stochastic Diffusion (StochDiff) model which learns data-driven prior knowledge at each time step by utilizing the representational power of the stochastic latent spaces to model the variability of the multivariate time series data. The learnt prior knowledge helps the model to capture complex temporal dynamics and the inherent uncertainty of the data. This improves its ability to model highly stochastic time series data. Through extensive experiments on real-world datasets, we demonstrate the effectiveness of our proposed model on stochastic time series forecasting. Additionally, we showcase an application of our model for real-world surgical guidance, highlighting its potential to benefit the medical community.
最近在传播概率模型方面的创新为图像、文本和音频生成方面的重大进步铺平了道路,从而导致其在基因时间序列预测中的应用。然而,利用这种能力模拟高度随机时间序列数据仍然是一个挑战。在本文中,我们提出一个新型的Stochacistic扩散(StochDiff)模型,通过利用随机潜伏空间的代表性能力,在每一个阶段学习数据驱动的先前知识,以模拟多变时间序列数据的变异性。所学的先前知识有助于模型捕捉复杂的时间动态和数据固有的不确定性。这提高了模型模拟高度随机时间序列数据的能力。通过在现实世界数据集上的广泛实验,我们展示了我们提议的模型在随机时间序列预测方面的有效性。此外,我们展示了我们模型在现实世界外科指导方面的应用,突出了它有利于医学界的潜力。
Article 274
Title@2025-05-29 (4): Constraints and Variables Reduction for Optimal Power Flow Using Hierarchical Graph Neural Networks with Virtual Node-Splitting
Title: Constraints and Variables Reduction for Optimal Power Flow Using Hierarchical Graph Neural Networks with Virtual Node-Splitting | Einschränkungen und Variablen-Reduktion für optimalen Stromfluss mittels Hierarchischer Graphen-Neural-Netzwerke mit virtuellem Knoten-Splitting | 利用具有虚拟节点切除功能的等级形图形神经网络减少最佳电力流动的制约因素和变数 2411.06268v2 |
Authors: Thuan Pham, Xingpeng Li
Power system networks are often modeled as homogeneous graphs, which limits the ability of graph neural networks (GNNs) to capture the features of individual generators located at the same node. With the proposed virtual node-splitting strategy, generator-level attributes such as costs, limits, and ramp rates can be fully captured by GNN models, improving their learning capacity and prediction accuracy. The optimal power flow (OPF) problem is solved in real-time grid operations, and the limited timeframe motivates studies that create size-reduced OPF (ROPF) models to relieve the computational burden. In this paper, with virtual node-splitting, a novel two-stage adaptive hierarchical GNN is developed to (i) predict critical lines that would be congested, and then (ii) predict base generators that would operate at maximum capacity. This substantially reduces the constraints and variables needed for OPF, yielding the proposed ROPFLG model with fewer monitored lines and fewer generator-specific variables and constraints. Two ROPF models, ROPFL and ROPFG, which reduce only lines or only generators respectively, are also implemented as additional benchmark models. Case studies show that the proposed ROPFLG consistently outperforms the benchmark full OPF (FOPF) and the other two ROPF methods, achieving significant computational time savings while reliably finding optimal solutions.
电源系统网络往往以同质图制成,限制了图形神经网络(GNN)在同一节点上捕捉单个发电机功能的能力。通过采用拟议的虚拟节点分割战略,GNN模型可以充分捕捉到发电机一级的特性,如成本、限值和坡度等,提高GNN的学习能力和预测准确性。最佳电流(OPF)问题用于实时电网操作。有限的时间框架促使研究创建缩小电流(ROPF)模型以减轻计算复杂性。在本文件中,通过虚拟节点分割,开发了一个新型的两阶段适应性级GNNNN,以(一) 预测将凝固的关键线,然后(二) 预测最大容量运行的基发电机。这将大大减少对OPF的制约和变数,创建监测线减少和发电机特定变数和制约因素的拟议ROPFLG模型。两种模型,即ROPFL和ROPFG,分别缩小线或发电机的两种模型,也作为补充基准模型。案例研究表明,将持续地实现ROPFFS的其它重要计算方法。
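The virtual node-splitting idea in the abstract above can be illustrated with a minimal sketch. It assumes a simple dictionary-based graph; the function name `split_generator_nodes`, the `"virtual"` edge label, and the attribute keys are hypothetical and not taken from the paper. Each generator gets its own virtual node attached to its bus, so generator-level attributes no longer have to be merged into one bus feature vector.

```python
from collections import defaultdict


def split_generator_nodes(buses, lines, generators):
    """Give every generator its own virtual node attached to its bus.

    buses: list of bus ids
    lines: list of (from_bus, to_bus) pairs
    generators: list of dicts with keys "bus", "cost", "p_max", "ramp"
    Returns (nodes, edges); the naming scheme is an illustrative assumption.
    """
    nodes = {bus: {"kind": "bus"} for bus in buses}
    edges = [(u, v, "line") for u, v in lines]
    per_bus_count = defaultdict(int)
    for gen in generators:
        bus = gen["bus"]
        virtual_id = f"{bus}_g{per_bus_count[bus]}"
        per_bus_count[bus] += 1
        # Generator-level attributes live on the virtual node.
        nodes[virtual_id] = {
            "kind": "generator",
            "cost": gen["cost"],
            "p_max": gen["p_max"],
            "ramp": gen["ramp"],
        }
        edges.append((virtual_id, bus, "virtual"))
    return nodes, edges
```

With two generators on the same bus, the result contains two distinct generator nodes, each carrying its own cost, limit, and ramp rate.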
Article 275
Title@2025-05-29 (4): MAP: Revisiting Weight Decomposition for Low-Rank Adaptation
Title: MAP: Revisiting Weight Decomposition for Low-Rank Adaptation | MAP: Neubetrachtung der Gewichtszerlegung für Low-Rank-Anpassung | MAP: 重新审视低秩适应的权重分解 2505.23094v1 |
Authors: Chongjie Si, Zhiyi Shi, Yadao Wang, Xiaokang Yang, Susanto Rahardja, Wei Shen
The rapid development of large language models has revolutionized natural language processing, but their fine-tuning remains computationally expensive, hindering broad deployment. Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, have emerged as solutions. Recent work such as DoRA attempts to further decompose weight adaptation into direction and magnitude components. However, existing formulations often define direction heuristically at the column level, lacking a principled geometric foundation. In this paper, we propose MAP, a novel framework that reformulates weight matrices as high-dimensional vectors and decouples their adaptation into direction and magnitude in a rigorous manner. MAP normalizes the pre-trained weights, learns a directional update, and introduces two scalar coefficients to independently scale the magnitudes of the base and update vectors. This design enables more interpretable and flexible adaptation and can be seamlessly integrated into existing PEFT methods. Extensive experiments show that MAP significantly improves performance when coupled with existing methods, offering a simple yet powerful enhancement to them. Given the universality and simplicity of MAP, we hope it can serve as a default setting for designing future PEFT methods.
大型语言模型的迅速发展使自然语言处理发生了革命性的变化,但是它们的微调仍然在计算上昂贵,阻碍了广泛的部署。参数效率微调方法,如LORA,已经成为一种解决办法。最近的工作,例如DoRA试图将重量调整进一步分解成方向和量级组成部分。然而,现有的配方往往在柱级上以超自然方式界定方向,缺乏一个原则的几何基础。在本文件中,我们建议MAP这个新框架将重量矩阵重新作为高维矢量矢量,并严格地将其调整为方向和规模。MAP使预先训练的重量正常化,学习方向性更新,并引入两个标量系数,独立地测量基的大小,并更新矢量。这种设计可以使更可解释和灵活地适应,并且可以顺利地融入现有的PEFT方法。广泛的实验表明,MAP在与现有方法结合时大大改进了业绩,为现有的PEFT方法提供了简单而有力的改进。鉴于MAP的普及性和简洁性,我们希望它能够作为未来的默认设置PEFT方法。
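The decomposition described in the abstract above can be sketched as follows. This is only one plausible reading: the exact parameterization, the function name `map_adapt`, and the coefficient names `alpha` and `beta` are assumptions for illustration, not the authors' implementation. The weight matrix is flattened into a single vector, the base and the update are each normalized to unit length, and two scalars set their magnitudes independently.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def map_adapt(w0, delta, alpha, beta):
    """Combine a normalized base weight and a normalized update.

    w0, delta: matrices of the same shape, given as lists of rows.
    alpha, beta: scalar magnitudes for the base and update directions
    (hypothetical names for the two coefficients).
    """
    rows, cols = len(w0), len(w0[0])
    flat_w = [x for row in w0 for x in row]      # treat the matrix as one vector
    flat_d = [x for row in delta for x in row]
    norm_w, norm_d = l2_norm(flat_w), l2_norm(flat_d)
    unit_w = [x / norm_w for x in flat_w]
    # A zero update has no direction; leave it at zero.
    unit_d = [x / norm_d if norm_d > 0 else 0.0 for x in flat_d]
    merged = [alpha * a + beta * b for a, b in zip(unit_w, unit_d)]
    return [merged[r * cols:(r + 1) * cols] for r in range(rows)]
```

Setting `alpha` to the norm of the pre-trained weights and `beta` to zero recovers the original matrix, which is a natural starting point for fine-tuning in this sketch.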
Article 276
Title@2025-05-29 (4): Equivariant Spherical Transformer for Efficient Molecular Modeling
Title: Equivariant Spherical Transformer for Efficient Molecular Modeling | Equivarianter Spherical Transformer für effiziente molekulare Modellierung | 高效分子建模的等同球质变变变器 2505.23086v1 |
Authors: Junyi An, Xinyu Lu, Chao Qu, Yunfei Shi, Peijia Lin, Qianwei Tang, Licheng Xu, Fenglei Cao, Yuan Qi
SE(3)-equivariant Graph Neural Networks (GNNs) have significantly advanced molecular system modeling by employing group representations. However, their message passing processes, which rely on tensor product-based convolutions, are limited by insufficient non-linearity and incomplete group representations, thereby restricting expressiveness. To overcome these limitations, we introduce the Equivariant Spherical Transformer (EST), a novel framework that leverages a Transformer structure within the spatial domain of group representations after Fourier transform. We theoretically and empirically demonstrate that EST can encompass the function space of tensor products while achieving superior expressiveness. Furthermore, EST’s equivariant inductive bias is guaranteed through a uniform sampling strategy for the Fourier transform. Our experiments demonstrate state-of-the-art performance by EST on various molecular benchmarks, including OC20 and QM9.
SE(3) 等变图神经网络(GNN)通过采用群表示,显著推动了分子系统建模的发展。然而,其消息传递过程依赖基于张量积的卷积,受限于非线性不足和群表示不完整,从而限制了表达能力。为克服这些局限,我们提出等变球面 Transformer(Equivariant Spherical Transformer,EST),这是一个新框架,在对群表示进行傅里叶变换后的空间域中使用 Transformer 结构。我们从理论和实验两方面证明,EST 能够涵盖张量积的函数空间,同时获得更强的表达能力。此外,EST 的等变归纳偏置通过傅里叶变换的均匀采样策略得到保证。实验表明,EST 在包括 OC20 和 QM9 在内的多个分子基准上达到了最先进的性能。
Article 277
Title@2025-05-29 (4): Gradient Boosting Decision Tree with LSTM for Investment Prediction
Title: Gradient Boosting Decision Tree with LSTM for Investment Prediction | Gradienten Auftrieb Entscheidungsbaum mit LSTM für Investitionsvorhersage | 与 LSTM 一起逐步促进投资预测决策树 2505.23084v1 |
Authors: Chang Yu, Fang Liu, Jie Zhu, Shaobo Guo, Yifan Gao, Zhongheng Yang, Meiwei Liu, Qianwen Xing
This paper proposes a hybrid framework combining LSTM (Long Short-Term Memory) networks with LightGBM and CatBoost for stock price prediction. The framework processes time-series financial data and evaluates performance using seven models: Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), Bidirectional LSTM (BiLSTM), vanilla LSTM, XGBoost, LightGBM, and standard Neural Networks (NNs). Key metrics, including MAE, R-squared, MSE, and RMSE, are used to establish benchmarks across different time scales. Building on these benchmarks, we develop an ensemble model that combines the strengths of sequential and tree-based approaches. Experimental results show that the proposed framework improves accuracy by 10 to 15 percent compared to individual models and reduces error during market changes. This study highlights the potential of ensemble methods for financial forecasting and provides a flexible design for integrating new machine learning techniques.
本文件提出一个混合框架,将LSTM(长期短期内存)网络与LightGBM和CatBoost(用于股票价格预测)结合起来,该框架处理时间序列财务数据,并使用七个模型评估业绩:人工神经网络、进化神经网络、双向LSTM(BILSTM)、Vanilla LSTM、XGBost、LightGBM和标准神经网络。主要指标,包括MAE、R-qured、MSE和RMSE,用于制定不同时间尺度的基准。我们以这些基准为基础,开发了将连续和基于树木的方法的优势结合起来的全套模型。实验结果表明,拟议的框架比单个模型的准确性提高了10%至15%,并减少了市场变化期间的错误。本研究报告强调了组合方法在财务预测方面的潜力,为整合新的机器学习技术提供了灵活的设计。
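The four benchmark metrics named in the abstract above (MAE, MSE, RMSE, R-squared) have standard definitions, which the following minimal sketch computes in plain Python. The function name `regression_metrics` is illustrative and not from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R-squared for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}
```

Note that R-squared is undefined when all true values are identical, because the total sum of squares is then zero; the sketch does not guard against that case.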
Article 278
Title@2025-05-29 (4): Gradient Methods with Online Scaling Part I. Theoretical Foundations
Title: Gradient Methods with Online Scaling Part I. Theoretical Foundations | Gradient Methoden mit Online-Skalierung Teil I. Theoretische Grundlagen | 在线扩展第一部分的渐进方法 理论基础 2505.23081v1 |
Authors: Wenzhi Gao, Ya-Chi Chu, Yinyu Ye, Madeleine Udell
This paper establishes the theoretical foundations of online scaled gradient methods (OSGM), a framework that uses online learning to adapt stepsizes and provably accelerate first-order methods. OSGM quantifies the effectiveness of a stepsize through a feedback function motivated by a convergence measure and uses this feedback to adjust the stepsize via an online learning algorithm. Consequently, instantiations of OSGM achieve convergence rates that are asymptotically no worse than that of the optimal stepsize. OSGM yields desirable convergence guarantees on smooth convex problems, including 1) trajectory-dependent global convergence on smooth convex objectives; 2) an improved complexity result on smooth strongly convex problems; and 3) local superlinear convergence. Notably, OSGM constitutes a new family of first-order methods with non-asymptotic superlinear convergence, joining the celebrated quasi-Newton methods. Finally, OSGM explains the empirical success of the popular hypergradient-descent heuristic in optimization for machine learning.
本文确立了在线缩放梯度方法(OSGM)的理论基础,这一框架利用在线学习来调整阶梯化和可以想象地加速一级方法。OSGM量化了由趋同措施驱动的反馈功能所推动的阶梯化步骤的有效性,并使用反馈来通过在线学习算法调整阶梯化步骤。因此,OSGM的即时趋同率并不比最佳步骤更差。OSGM在顺流的锥形问题上提供了理想的趋同保证,包括:(1) 顺流的锥形目标方面取决于轨迹的全球趋同;(2) 顺流的强烈螺旋问题提高了复杂性,(3) 地方超线性趋同。 值得注意的是,OSGM构成一种由非自动超线性趋同法组成的新一流方法组合,加入了庆祝的准纽顿方法。 最后,OSGM解释了在优化机器学习方面流行的超高级日光速超光谱法的成功经验。
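The feedback loop described in the abstract above can be sketched on a toy problem. Everything specific here is an assumption made for illustration: a scalar stepsize `p`, a diagonal quadratic objective, the feedback h(p) = (f(x - p g) - f(x)) / ||g||^2, plain online gradient descent as the online learner, and a monotone safeguard that only accepts steps which decrease f. The abstract does not specify these details.

```python
def f(x, a):
    """Diagonal quadratic 0.5 * sum(a_i * x_i^2)."""
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))


def grad(x, a):
    return [ai * xi for ai, xi in zip(a, x)]


def osgm_scalar(x, a, p=0.0, eta=0.25, steps=100):
    """Gradient descent whose scalar stepsize p is learned online.

    eta is the learning rate of the online learner (assumed <= 1/L).
    """
    for _ in range(steps):
        g = grad(x, a)
        gg = sum(gi * gi for gi in g)
        if gg == 0.0:
            break  # already at a stationary point
        trial = [xi - p * gi for xi, gi in zip(x, g)]
        g_trial = grad(trial, a)
        # Derivative of the assumed feedback h(p) with respect to p.
        dh = -sum(gt * gi for gt, gi in zip(g_trial, g)) / gg
        if f(trial, a) < f(x, a):  # monotone safeguard (an added assumption)
            x = trial
        p = p - eta * dh           # online gradient descent on the feedback
    return x, p
```

The stepsize update coincides with the hypergradient-descent heuristic mentioned in the abstract, which is why this particular feedback was chosen for the sketch.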
Article 279
Title@2025-05-29 (4): Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble
Title: Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble | Zweite Meinungsfrage: Auf dem Weg zu adaptiver klinischer KI über den Konsens des Expert Model Ensembles | 第二意见事项:通过专家示范组共识实现适应性临床AI 2505.23075v1 |
Authors: Amit Kumthekar, Zion Tilley, Henry Duong, Bhargav Patel, Michael Magnoli, Ahmed Omar, Ahmed Nasser, Chaitanya Gharpure, Yevgen Reztzov
Despite the growing clinical adoption of large language models (LLMs), current approaches rely heavily on single-model architectures. To overcome risks of obsolescence and rigid dependence on single-model systems, we present a novel framework, termed the Consensus Mechanism. Mimicking clinical triage and multidisciplinary clinical decision-making, the Consensus Mechanism implements an ensemble of specialized medical expert agents, enabling improved clinical decision-making while maintaining robust adaptability. This architecture enables the Consensus Mechanism to be optimized for cost, latency, or performance, purely based on its interior model configuration. To rigorously evaluate the Consensus Mechanism, we employed three medical evaluation benchmarks: MedMCQA, MedQA, and MedXpertQA Text, and the differential diagnosis dataset, DDX+. On MedXpertQA, the Consensus Mechanism achieved an accuracy of 61.0%, compared to 53.5% and 45.9% for OpenAI’s O3 and Google’s Gemini 2.5 Pro, respectively. Improvement was consistent across benchmarks, with an increase in accuracy on MedQA ($\Delta\mathrm{Accuracy}_{\mathrm{consensus\text{-}O3}} = 3.4\%$) and MedMCQA ($\Delta\mathrm{Accuracy}_{\mathrm{consensus\text{-}O3}} = 9.1\%$). These accuracy gains extended to differential diagnosis generation, where our system demonstrated improved recall and precision ($\mathrm{F1}_{\mathrm{consensus}} = 0.326$ vs. $\mathrm{F1}_{\mathrm{O3\text{-}high}} = 0.2886$) and a higher top-1 accuracy for DDX ($\mathrm{Top1}_{\mathrm{consensus}} = 52.0\%$ vs. $\mathrm{Top1}_{\mathrm{O3\text{-}high}} = 45.2\%$).
尽管大型语言模型(LLM)在临床上的应用日益增多,目前的方法仍严重依赖单一模型架构。为克服过时风险以及对单一模型系统的僵化依赖,我们提出了一个新框架,称为共识机制(Consensus Mechanism)。共识机制模仿临床分诊和多学科临床决策,实现了一个由专业医学专家代理组成的集成系统,在保持稳健适应性的同时改进临床决策。这一架构使共识机制能够完全根据其内部模型配置,针对成本、延迟或性能进行优化。为严格评估共识机制,我们采用了三个医学评估基准:MedMCQA、MedQA 和 MedXpertQA Text,以及鉴别诊断数据集 DDX+。在 MedXpertQA 上,共识机制的准确率达到 61.0%,而 OpenAI 的 O3 和 Google 的 Gemini 2.5 Pro 分别为 53.5% 和 45.9%。各基准上的改进保持一致:相对于 O3,MedQA 的准确率提升为 3.4%,MedMCQA 的准确率提升为 9.1%。这些准确率提升也延伸到鉴别诊断生成任务,我们的系统表现出更好的召回率和精确率(共识机制的 F1 为 0.326,O3-high 为 0.2886),并在 DDX 上取得了更高的 top-1 准确率(共识机制为 52.0%,O3-high 为 45.2%)。
Article 280
Title@2025-05-29 (4): Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Title: Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts | Shortcut-verbundene Experten-Parallelität für die Beschleunigung von Mixture-of-Experts | 加速混合专家专家专家平行专家 2404.05019v3 |
Authors: Weilin Cai, Juyong Jiang, Le Qin, Junwei Cui, Sunghun Kim, Jiayi Huang
Expert parallelism has emerged as a key strategy for distributing the computational workload of sparsely-gated mixture-of-experts (MoE) models across multiple devices, enabling the processing of increasingly large-scale models. However, the All-to-All communication inherent to expert parallelism poses a significant bottleneck, limiting the efficiency of MoE models. Although existing optimization methods partially mitigate this issue, they remain constrained by the sequential dependency between communication and computation operations. To address this challenge, we propose ScMoE, a novel shortcut-connected MoE architecture integrated with an overlapping parallelization strategy. ScMoE decouples communication from its conventional sequential ordering, enabling up to 100% overlap with computation. Compared to the prevalent top-2 MoE baseline, ScMoE achieves speedups of 1.49 times in training and 1.82 times in inference. Moreover, our experiments and analyses indicate that ScMoE not only achieves comparable but in some instances surpasses the model quality of existing approaches.
专家的平行性已成为一种关键战略,用于通过多种装置分配分散的分散专家混合模型的计算工作量,从而能够处理越来越大规模的模型。然而,专家平行性所固有的 “ 人人交流 “ 构成了一个很大的瓶颈,限制了教育部模式的效率。虽然现有的优化方法在一定程度上缓解了这一问题,但它们仍然受到通信和计算操作之间依次依赖的制约。为了应对这一挑战,我们提议ScMoE,这是一个与重叠的平行战略相结合的新颖的、与捷径相连的教育部结构。ScMoE从常规顺序排序中解析通信,使计算重叠率达到100%。与普遍的上层-2教育部基线相比,ScMoE在培训中实现了1.49倍的加速率,在推断中实现了1.82倍的加速率。此外,我们的实验和分析表明,ScMoE不仅取得了可比较的结果,而且在某些情况下超过了现有方法的模型质量。
Article 281
Title@2025-05-29 (4): Multi-Modal Learning with Bayesian-Oriented Gradient Calibration
Title: Multi-Modal Learning with Bayesian-Oriented Gradient Calibration | Multi-Modal-Lernen mit Bayesian-Oriented Gradient Calibration | 多模式学习,以巴耶斯为主的梯度校准 2505.23071v1 |
Authors: Peizheng Guo, Jingyao Wang, Huijie Guo, Jiangmeng Li, Chuxiong Sun, Changwen Zheng, Wenwen Qiang
Multi-Modal Learning (MML) integrates information from diverse modalities to improve predictive accuracy. However, existing methods mainly aggregate gradients with fixed weights and treat all dimensions equally, overlooking the intrinsic gradient uncertainty of each modality. This may lead to (i) excessive updates in sensitive dimensions, degrading performance, and (ii) insufficient updates in less sensitive dimensions, hindering learning. To address this issue, we propose BOGC-MML, a Bayesian-Oriented Gradient Calibration method for MML that explicitly models the gradient uncertainty and guides the model optimization towards the optimal direction. Specifically, we first model each modality’s gradient as a random variable and derive its probability distribution, capturing the full uncertainty in the gradient space. Then, we propose an effective method that converts the precision (inverse variance) of each gradient distribution into a scalar evidence value. This evidence quantifies the confidence of each modality in every gradient dimension. Using this evidence, we explicitly quantify per-dimension uncertainties and fuse them via a reduced Dempster-Shafer rule. The resulting uncertainty-weighted aggregation produces a calibrated update direction that balances sensitivity and conservatism across dimensions. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness and advantages of the proposed method.
多模态学习(MML)整合来自不同模态的信息,以提高预测准确性。然而,现有方法主要以固定权重聚合梯度,并对所有维度一视同仁,忽略了每种模态固有的梯度不确定性。这可能导致(一)敏感维度上的更新过度,降低性能;(二)较不敏感维度上的更新不足,阻碍学习。为解决这一问题,我们提出 BOGC-MML,一种面向贝叶斯的多模态学习梯度校准方法,显式地对梯度不确定性建模,并引导模型优化朝最优方向进行。具体而言,我们首先将每种模态的梯度建模为随机变量并推导其概率分布,以刻画梯度空间中的全部不确定性。然后,我们提出一种有效方法,将每个梯度分布的精度(方差的倒数)转换为标量证据。该证据量化了每种模态在每个梯度维度上的置信度。利用这些证据,我们显式量化各维度的不确定性,并通过简化的 Dempster-Shafer 规则将其融合。由此得到的不确定性加权聚合产生经过校准的更新方向,在各维度上平衡敏感性与保守性。在多个基准数据集上的大量实验证明了所提方法的有效性和优势。
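The uncertainty-weighted aggregation in the abstract above can be approximated by a much simpler stand-in: per-dimension inverse-variance weighting. This sketch is explicitly not the reduced Dempster-Shafer rule of the paper; it only illustrates how a precision (inverse variance) can act as a per-dimension confidence when fusing gradients from several modalities. The function name `fuse_gradients` and the `eps` guard are illustrative assumptions.

```python
def fuse_gradients(means, variances, eps=1e-12):
    """Precision-weighted average of per-modality gradient estimates.

    means[m][d], variances[m][d]: gradient mean and variance for
    modality m in dimension d. Returns one fused gradient per dimension.
    """
    dims = len(means[0])
    fused = []
    for d in range(dims):
        # Precision acts as the confidence of each modality in this dimension.
        precisions = [1.0 / (var[d] + eps) for var in variances]
        total = sum(precisions)
        fused.append(
            sum(p * mu[d] for p, mu in zip(precisions, means)) / total
        )
    return fused
```

A modality with high variance in one dimension contributes little there while still contributing fully in dimensions where it is confident, which is the per-dimension behavior that fixed-weight aggregation lacks.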
Article 282
Title@2025-05-29 (4): Sparse Linear Bandits with Blocking Constraints
Title: Sparse Linear Bandits with Blocking Constraints | Sparse Linear Bandits mit Blockierung Einschränkungen | 带有阻塞限制的粗细线条强力 2410.20041v2 |
Authors: Adit Jain, Soumyabrata Pal, Sunav Choudhary, Ramasuri Narayanam, Harshita Chopra, Vikram Krishnamurthy
We investigate the high-dimensional sparse linear bandits problem in a data-poor regime where the time horizon is much smaller than the ambient dimension and number of arms. We study the setting under the additional blocking constraint where each unique arm can be pulled only once. The blocking constraint is motivated by practical applications in personalized content recommendation and identification of data points to improve annotation efficiency for complex learning tasks. With mild assumptions on the arms, our proposed online algorithm (BSLB) achieves a regret guarantee of $\widetilde{\mathsf{O}}((1+\beta_k)^2k^{\frac{2}{3}} \mathsf{T}^{\frac{2}{3}})$ where the parameter vector has an (unknown) relative tail $\beta_k$ – the ratio of $\ell_1$ norm of the top-$k$ and remaining entries of the parameter vector. To this end, we show novel offline statistical guarantees of the lasso estimator for the linear model that is robust to the sparsity modeling assumption. Finally, we propose a meta-algorithm (C-BSLB) based on corralling that does not need knowledge of optimal sparsity parameter $k$ at minimal cost to regret. Our experiments on multiple real-world datasets demonstrate the validity of our algorithms and theoretical framework.
我们研究数据匮乏情形下的高维稀疏线性老虎机问题,其中时间范围远小于环境维度和臂的数量。我们在额外的阻塞约束下研究这一设定,即每个不同的臂只能被拉动一次。阻塞约束的动机来自个性化内容推荐,以及为提高复杂学习任务的标注效率而进行的数据点识别等实际应用。在对臂作出温和假设的情况下,我们提出的在线算法(BSLB)实现了 $\widetilde{\mathsf{O}}((1+\beta_k)^2k^{\frac{2}{3}} \mathsf{T}^{\frac{2}{3}})$ 的遗憾保证,其中参数向量具有(未知的)相对尾部 $\beta_k$,即参数向量前 $k$ 个条目与其余条目的 $\ell_1$ 范数之比。为此,我们给出了线性模型 lasso 估计量的新的离线统计保证,该保证对稀疏性建模假设具有鲁棒性。最后,我们提出一种基于 corralling 的元算法(C-BSLB),它无需知道最优稀疏参数 $k$,且遗憾代价极小。我们在多个真实世界数据集上的实验验证了算法和理论框架的有效性。
Article 283
Title@2025-05-29 (4): GrokFormer: Graph Fourier Kolmogorov-Arnold Transformers
Title: GrokFormer: Graph Fourier Kolmogorov-Arnold Transformers | GrokFormer: Graph Fourier Kolmogorov-Arnold Transformer | GrokFormer:图示 Fourier Kolmogorov-Arnold变形器 2411.17296v3 |
Authors: Guoguo Ai, Guansong Pang, Hezhe Qiao, Yuan Gao, Hui Yan
Graph Transformers (GTs) have demonstrated remarkable performance in graph representation learning over popular graph neural networks (GNNs). However, self-attention, the core module of GTs, preserves only low-frequency signals in graph features, making it ineffective at capturing other important signals such as high-frequency ones. Some recent GT models help alleviate this issue, but their flexibility and expressiveness are still limited because the filters they learn are fixed to a predefined graph spectrum or spectral order. To tackle this challenge, we propose a Graph Fourier Kolmogorov-Arnold Transformer (GrokFormer), a novel GT model that learns highly expressive spectral filters with adaptive graph spectrum and spectral order through Fourier series modeling over learnable activation functions. We demonstrate theoretically and empirically that the proposed GrokFormer filter offers better expressiveness than other spectral methods. Comprehensive experiments on 10 real-world node classification datasets across various domains, scales, and graph properties, as well as 5 graph classification datasets, show that GrokFormer outperforms state-of-the-art GTs and GNNs. Our code is available at https://github.com/GGA23/GrokFormer
图形变形器(GTs)在广受欢迎的图形神经网络(GNNS)的图形显示学习中表现出了惊人的成绩。然而,GT的核心模块“自我注意”在图形特征中只保留低频信号,只保留低频信号,导致无法有效捕捉其他重要信号,如高频信号等。最近的一些GT模型帮助缓解了这一问题,但由于他们所学的过滤器固定在预定义的图形频谱或光谱顺序上,其灵活性和表达性仍然有限。为了应对这一挑战,我们提议了一个“Flyier Kolmogorov-Arnold变形器”(GrokFormer)这一新型GT模型,通过四重系列模型对适应性图形频谱和光谱顺序进行学习,从而导致无法有效捕捉到其他重要信号。我们从理论上和从经验上证明,拟议的GrokForformer过滤器比其他光谱方法更清晰。关于10个真实世界的无界分类数据集的全面实验,涵盖不同领域、尺度和图形属性,以及5个图表分类数据集,显示Grokformer overs State-Ost-GM23/GMAR_GMS/GNS可使用的代码和GMS。
Article 284
Title@2025-05-29 (4): Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling
Title: Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling | Skalierung von Flüssig-Resistenz-Netzwerken für eine effiziente Sequenzmodellierung | 增强增强流动性恢复力的流动性能力网络,以建立高效序列建模 2505.21717v2 |
Authors: Mónika Farsang, Ramin Hasani, Radu Grosu
We present LrcSSM, a $\textit{nonlinear}$ recurrent model that processes long sequences as fast as today’s linear state-space layers. By forcing the state-transition matrix to be diagonal and learned at every step, the full sequence can be solved in parallel with a single prefix-scan, giving $\mathcal{O}(TD)$ time and memory and only $\mathcal{O}(\log T)$ sequential depth, for input-sequence length $T$ and a state dimension $D$. Moreover, LrcSSM offers a formal gradient-stability guarantee that other input-varying systems such as Liquid-S4 and Mamba do not provide. Lastly, for network depth $L$, as the forward and backward passes cost $\Theta(T\,D\,L)$ FLOPs, with its low sequential depth and parameter count $\Theta(D\,L)$, the model follows the compute-optimal scaling law regime ($\beta \approx 0.42$) recently observed for Mamba, outperforming quadratic-attention Transformers at equal compute while avoiding the memory overhead of FFT-based long convolutions. We show that on a series of long-range forecasting tasks, LrcSSM outperforms LRU, S5 and Mamba.
我们提出 LrcSSM,一种非线性循环模型,其处理长序列的速度与当今的线性状态空间层一样快。通过强制状态转移矩阵为对角矩阵并在每一步进行学习,整个序列可以用一次前缀扫描并行求解,对于输入序列长度 $T$ 和状态维度 $D$,时间和内存为 $\mathcal{O}(TD)$,顺序深度仅为 $\mathcal{O}(\log T)$。此外,LrcSSM 提供了形式化的梯度稳定性保证,这是 Liquid-S4 和 Mamba 等其他输入相关系统所不具备的。最后,对于网络深度 $L$,由于前向和反向传播的计算量为 $\Theta(T\,D\,L)$ FLOPs,加之较低的顺序深度和 $\Theta(D\,L)$ 的参数量,该模型遵循最近在 Mamba 上观察到的计算最优缩放律($\beta \approx 0.42$),在相同计算量下优于二次注意力 Transformer,同时避免了基于 FFT 的长卷积的内存开销。我们表明,在一系列长程预测任务上,LrcSSM 优于 LRU、S5 和 Mamba。
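The prefix-scan claim in the abstract above rests on a standard fact: a recurrence h_t = a_t * h_{t-1} + b_t with a diagonal transition can be evaluated by an associative scan. The sketch below shows this for a single state dimension (a diagonal matrix means each dimension evolves independently). It is a generic illustration, not the authors' implementation; the function names are hypothetical, and the scan is simulated serially in Python, with each `while` round corresponding to one parallel step, so there are about log2(T) rounds.

```python
def combine(left, right):
    """Compose h -> a1*h + b1 followed by h -> a2*h + b2."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def scan_recurrence(a, b):
    """Return h_1..h_T for h_t = a_t * h_{t-1} + b_t with h_0 = 0.

    Uses a Hillis-Steele inclusive scan over the affine maps (a_t, b_t).
    """
    elems = list(zip(a, b))
    shift = 1
    while shift < len(elems):
        # Every position can be updated independently within a round.
        elems = [
            combine(elems[i - shift], elems[i]) if i >= shift else elems[i]
            for i in range(len(elems))
        ]
        shift *= 2
    return [bias for _, bias in elems]


def sequential_recurrence(a, b):
    """Reference implementation with T sequential steps."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out
```

Because `combine` is associative, the order in which adjacent maps are merged does not matter, which is what allows the sequential depth to drop from T to O(log T).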
Article 285
Title@2025-05-29 (4): SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models
Title: SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models | SORSA: Singuläre Werte und Orthonormale Regularisierte Singuläre Vektoren Anpassung großer Sprachmodelle | SORSA: 单项价值和正正正的正规化的单项矢量,以适应大语言模式 2409.00055v6 |
Authors: Yang Cao, Zhao Song
In this paper, we propose Singular Values and Orthonormal Regularized Singular Vectors Adaptation, or SORSA, a novel parameter-efficient fine-tuning (PEFT) method. Each SORSA adapter consists of two main parts: trainable principal singular weights $W_p = U_p \text{diag}(S_p) V^\top_p$, and frozen residual weights $W_r = U_r \text{diag}(S_r) V^\top_r$. These parts are initialized by performing singular value decomposition (SVD) on pre-trained weights. Moreover, we implement and analyze an orthonormal regularizer, which we prove can decrease the condition number of $W_p$ and make the optimization more efficient. SORSA adapters can be merged during inference, thus eliminating any inference latency. We also introduce a method to analyze the variation of the parameters by performing SVD, and we discuss and analyze SORSA’s superiority in minimizing the alteration in the SVD aspect. Overall, SORSA converges faster than LoRA and PiSSA in our experiments. On the GSM-8K benchmark, Llama 2 7B adapted using SORSA achieved 56.03% accuracy, surpassing LoRA (42.30%) and Full FT (49.05%). We conclude that SORSA offers a new perspective on parameter-efficient fine-tuning, demonstrating remarkable performance.
在本文中,我们提出奇异值与标准正交正则化奇异向量适应(Singular Values and Orthonormal Regularized Singular Vectors Adaptation,SORSA),一种新的参数高效微调(PEFT)方法。每个 SORSA 适配器由两个主要部分组成:可训练的主奇异权重 $W_p = U_p \text{diag}(S_p) V^\top_p$,以及冻结的残差权重 $W_r = U_r \text{diag}(S_r) V^\top_r$。这两部分通过对预训练权重进行奇异值分解(SVD)来初始化。此外,我们实现并分析了一个标准正交正则项,并证明它可以降低 $W_p$ 的条件数,使优化更高效。SORSA 适配器可以在推理时合并,从而消除任何推理延迟。我们还引入了一种通过 SVD 分析参数变化的方法,并讨论和分析了 SORSA 在最小化 SVD 层面改变方面的优势。在我们的实验中,SORSA 的收敛速度快于 LoRA 和 PiSSA。在 GSM-8K 基准上,使用 SORSA 适配的 Llama 2 7B 达到了 56.03% 的准确率,超过了 LoRA(42.30%)和全参数微调(49.05%)。我们的结论是,SORSA 为参数高效微调提供了新的视角,并展现了出色的性能。
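The initialization described in the abstract above, splitting a pre-trained weight into principal and residual parts via SVD, can be sketched with NumPy. The split follows the formulas for $W_p$ and $W_r$ in the abstract; the exact form of the orthonormal regularizer is an assumption here (squared Frobenius distance of the Gram matrices from the identity), and the function names are hypothetical.

```python
import numpy as np


def sorsa_split(w, rank):
    """Split w into trainable principal factors and a frozen residual.

    Returns ((U_p, S_p, Vt_p), W_r) such that
    U_p @ diag(S_p) @ Vt_p + W_r reconstructs w.
    """
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    principal = (u[:, :rank], s[:rank], vt[:rank, :])
    # Residual: the remaining singular triplets, stored as one dense matrix.
    residual = (u[:, rank:] * s[rank:]) @ vt[rank:, :]
    return principal, residual


def orthonormal_penalty(u_p, vt_p):
    """Assumed regularizer: how far U_p and V_p are from orthonormal."""
    eye = np.eye(u_p.shape[1])
    return (
        np.linalg.norm(u_p.T @ u_p - eye, "fro") ** 2
        + np.linalg.norm(vt_p @ vt_p.T - eye, "fro") ** 2
    )
```

Right after initialization the penalty is numerically zero because SVD returns orthonormal singular vectors; during training it would keep the updated factors close to that property.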
Article 286
Title@2025-05-29 (4): M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes
Title: M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes | M3Bench: Benchmarking Ganzkörper-Bewegungs-Generation für mobile Manipulation in 3D-Szenen | M3Bench:3D场景移动操纵基准全体运动生成 2410.06678v3 |
Authors: Zeyu Zhang, Sixu Yan, Muzhi Han, Zaijin Wang, Xinggang Wang, Song-Chun Zhu, Hangxin Liu
We propose M3Bench, a new benchmark for whole-body motion generation in mobile manipulation tasks. Given a 3D scene context, M3Bench requires an embodied agent to reason about its configuration, environmental constraints, and task objectives to generate coordinated whole-body motion trajectories for object rearrangement. M3Bench features 30,000 object rearrangement tasks across 119 diverse scenes, providing expert demonstrations generated by our newly developed M3BenchMaker, an automatic data generation tool that produces whole-body motion trajectories from high-level task instructions using only basic scene and robot information. Our benchmark includes various task splits to evaluate generalization across different dimensions and leverages realistic physics simulation for trajectory assessment. Extensive evaluation analysis reveals that state-of-the-art models struggle with coordinating base-arm motion while adhering to environmental and task-specific constraints, underscoring the need for new models to bridge this gap. By releasing M3Bench and M3BenchMaker we aim to advance robotics research toward more adaptive and capable mobile manipulation in diverse, real-world environments.
我们提出M3Bench,这是移动操纵任务中全体运动生成的新基准。在3D场景背景下,M3Bench要求一个内含的代理体对其配置、环境限制和任务目标进行解释,以产生协调的全体运动轨迹,用于物体重新排列。 M3Bench具有119个不同场景的30,000个物体重新排列任务的特点,提供我们新开发的M3Bench-Maker产生的专家演示,这是一个自动数据生成工具,仅使用基本场景和机器人信息,从高级任务指令中产生全体运动轨迹。我们的基准包括各种任务分割,以评价不同层面的通用,并利用现实物理模拟进行轨迹评估。广泛的评估分析表明,最先进的模型在坚持环境和特定任务的限制的同时,在协调基本运动方面挣扎,强调需要新的模型来弥合这一差距。我们通过释放M3Bench和M3Bench-M3Benker,目的是推动机器人研究,在多样化的现实世界环境中进行更适应和更有能力的移动操纵。
Article 287
Title@2025-05-29 (4): Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems
Title: Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems | Topologisches Strukturlernen sollte eine Forschungspriorität für LLM-basierte Multi-Agent-Systeme sein | 地形结构学习应成为以LLM为基础的多种机构系统的研究重点 2505.22467v2 |
Authors: Jiaxi Yang, Mengqi Zhang, Yiqiao Jin, Hao Chen, Qingsong Wen, Lu Lin, Yi He, Weijie Xu, James Evans, Jindong Wang
Large Language Model-based Multi-Agent Systems (MASs) have emerged as a powerful paradigm for tackling complex tasks through collaborative intelligence. Nevertheless, the question of how agents should be structurally organized for optimal cooperation remains largely unexplored. In this position paper, we aim to gently redirect the focus of the MAS research community toward this critical dimension: develop topology-aware MASs for specific tasks. Specifically, the system consists of three core components - agents, communication links, and communication patterns - that collectively shape its coordination performance and efficiency. To this end, we introduce a systematic, three-stage framework: agent selection, structure profiling, and topology synthesis. Each stage would trigger new research opportunities in areas such as language models, reinforcement learning, graph learning, and generative modeling; together, they could unleash the full potential of MASs in complicated real-world applications. Then, we discuss the potential challenges and opportunities in the evaluation of multiple systems. We hope our perspective and framework can offer critical new insights in the era of agentic AI.
大型语言模型多行为者系统(MAS)已成为通过协作情报处理复杂任务的有力范例,然而,关于应如何从结构上组织代理人以实现最佳合作的问题基本上尚未探讨。在本立场文件中,我们的目标是将MAS研究界的重点轻轻地转向这一关键方面:为具体任务开发具有地貌意识的MAS。具体地说,该系统由三个核心组成部分组成:代理、通信联系和通信模式,共同决定其协调性能和效率。为此,我们引入了一个系统化的、三阶段的框架:代理选择、结构特征分析和地形综合。每个阶段都将在语言模型、强化学习、图表学习和基因模型等领域触发新的研究机会;它们一起可以充分发挥MAS在复杂的现实世界应用中的潜力。然后,我们讨论在评估多种系统方面的潜在挑战和机遇。我们希望我们的观点和框架能够在代理性AI时代提供重要的新见解。
Article 288
Title@2025-05-29 (4): Efficient Quantum Approximate $k$NN Algorithm via Granular-Ball Computing
Title: Efficient Quantum Approximate $k$NN Algorithm via Granular-Ball Computing | Effiziente Quanten Ungefähre $k$NN-Algorithmus über Granular-Ball Computing | 通过颗粒球式计算机计算, 近于 $k$NN 的高效量量量 2505.23066v1 |
Authors: Shuyin Xia, Xiaojiang Tian, Suzhen Yuan, Jeremiah D. Deng
High time complexity is one of the biggest challenges faced by $k$-Nearest Neighbors ($k$NN). Although current classical and quantum $k$NN algorithms have made some improvements, they still face a speed bottleneck when handling large amounts of data. To address this issue, we propose an innovative algorithm called Granular-Ball based Quantum $k$NN (GB-Q$k$NN). This approach achieves higher efficiency by first employing granular-balls, which reduces the amount of data that needs to be processed. The search process is then accelerated by adopting a Hierarchical Navigable Small World (HNSW) method. Moreover, we optimize the time-consuming steps of HNSW, such as distance calculation, via quantization, further reducing the time complexity of the construction and search processes. By combining granular-balls with the quantization of the HNSW method, our approach takes advantage of both treatments and significantly reduces the time complexity of $k$NN-like algorithms, as revealed by a comprehensive complexity analysis.
高时间复杂度是 $k$ 近邻($k$NN)面临的最大挑战之一。尽管现有的经典和量子 $k$NN 算法已有一些改进,但在面对大量数据时仍存在速度瓶颈。为解决这一问题,我们提出一种创新算法,称为基于粒球的量子 $k$NN(GB-Q$k$NN)。该方法首先使用粒球来减少需要处理的数据规模,从而获得更高的效率。随后采用分层可导航小世界(HNSW)方法加速搜索过程。此外,我们通过量子化优化 HNSW 中距离计算等耗时步骤,进一步降低构建和搜索过程的时间复杂度。通过结合粒球和 HNSW 方法的量子化,我们的方法得以利用这些处理手段,显著降低类 $k$NN 算法的时间复杂度,全面的复杂度分析证实了这一点。
Article 289
Title@2025-05-29 (4): Machine Learning Framework for Characterizing Processing-Structure Relationship in Block Copolymer Thin Films
Title: Machine Learning Framework for Characterizing Processing-Structure Relationship in Block Copolymer Thin Films | Machine Learning Framework zur Charakterisierung von Verarbeitungs-Struktur-Beziehungen in Block Copolymer Thin Films | 确定胶合聚合薄薄膜加工-结构关系特征的机械学习框架 2505.23064v1 |
Authors: Bradley Lamb, Saroj Upreti, Yunfei Wang, Daniel Struble, Chenhui Zhu, Guillaume Freychet, Xiaodan Gu, Boran Ma
The morphology of block copolymers (BCPs) critically influences material properties and applications. This work introduces a machine learning (ML)-enabled, high-throughput framework for analyzing grazing incidence small-angle X-ray scattering (GISAXS) data and atomic force microscopy (AFM) images to characterize BCP thin film morphology. A convolutional neural network was trained to classify AFM images by morphology type, achieving 97% testing accuracy. Classified images were then analyzed to extract 2D grain size measurements from the samples in a high-throughput manner. ML models were developed to predict morphological features based on processing parameters such as solvent ratio, additive type, and additive ratio. GISAXS-based properties were predicted with strong performances ($R^2$ > 0.75), while AFM-based property predictions were less accurate ($R^2$ < 0.60), likely due to the localized nature of AFM measurements compared to the bulk information captured by GISAXS. Beyond model performance, interpretability was addressed using Shapley Additive exPlanations (SHAP). SHAP analysis revealed that the additive ratio had the largest impact on morphological predictions, where additive provides the BCP chains with increased volume to rearrange into thermodynamically favorable morphologies. This interpretability helps validate model predictions and offers insight into parameter importance. Altogether, the presented framework combining high-throughput characterization and interpretable ML offers an approach to exploring and optimizing BCP thin film morphology across a broad processing landscape.
这项工作引入了一个机器学习(ML)驱动的高通量框架,用于分析小角X射线散射(GISAXS)的放牧事件、小角X射线分散(GISAXS)的数据和原子力显微镜(AFM)图像,以描述BCP薄薄膜形态学。一个革命神经网络接受了培训,按形态类型对AFM图像进行分类,达到97%的测试精确度。然后对分类图像进行了分析,以高通量方式从样本中提取2D粒度测量数据。根据加工参数,如溶剂比率、添加剂类型和添加剂比率,开发了ML模型,以预测形态特征。基于GISAXS的特性预测有很强的性能(R%2美元 > 0.75),而基于FMM财产的预测则不那么准确(R2美元 < 0.60),可能是由于与GISAXS的定量评估方法相比,亚调度模型的局部性度测量,除模型性能外,还利用可理解性透性剖析性剖面图(SHAPAAP),将深度分析结果显示Brasslievildrolational Styalview的深度分析提供量。
Article 290
Title@2025-05-29 (4): Loss-Guided Model Sharing and Local Learning Correction in Decentralized Federated Learning for Crop Disease Classification
Title: Loss-Guided Model Sharing and Local Learning Correction in Decentralized Federated Learning for Crop Disease Classification | Loss-Guided Model Sharing und lokale Lernkorrektur bei dezentralisiertem Föderated Learning für die Klassifizierung von Crop Diseases | 关于作物疾病分类的分散化联邦学习中损失指导模式共享和地方学习校正 2505.23063v1 |
Authors: Denis Mamba Kabala, Adel Hafiane, Laurent Bobelin, Raphael Canals
Crop disease detection and classification is a critical challenge in agriculture, with major implications for productivity, food security, and environmental sustainability. While deep learning models such as CNN and ViT have shown excellent performance in classifying plant diseases from images, their large-scale deployment is often limited by data privacy concerns. Federated Learning (FL) addresses this issue, but centralized FL remains vulnerable to single-point failures and scalability limits. In this paper, we introduce a novel Decentralized Federated Learning (DFL) framework that uses validation loss (Loss_val) both to guide model sharing between peers and to correct local training via an adaptive loss function controlled by a weighting parameter. We conduct extensive experiments using PlantVillage datasets with three deep learning architectures (ResNet50, VGG16, and ViT_B16), analyzing the impact of the weighting parameter, the number of shared models, the number of clients, and the use of Loss_val versus Loss_train of other clients. Results demonstrate that our DFL approach not only improves accuracy and convergence speed, but also ensures better generalization and robustness across heterogeneous data environments, making it particularly well-suited for privacy-preserving agricultural applications.
作物疾病检测和分类是农业面临的一项重大挑战,对生产力、粮食安全和环境可持续性具有重大影响。CNN和VIT等深层次学习模式在将植物疾病从图像中分类方面表现良好,但其大规模部署往往受到数据隐私问题的限制。联邦学习(FL)处理这一问题,但中央化FL仍然易受单点失灵和可缩放限制的影响。在本文中,我们引入了一个新型的分散化联邦学习(DFL)框架,该框架使用验证损失(Loss_val)来指导同龄人之间分享模型,并通过由加权参数控制的适应性损失功能来纠正当地培训。我们用三种深层学习结构(ResNet50、VGG16和VIT_B16)进行广泛的实验,分析加权参数的影响、共享模型的数量、客户数量、以及使用Lost_val相对于其他客户的损失/损失。结果显示,我们的DFL方法不仅提高精确度和趋近速度,而且还确保更加普及和稳健地贯穿不同数据环境,使其特别适合于保护隐私的应用。
Article 291
Title@2025-05-29 (4): Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data
Title: Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data | Composite Flow passend zum Verstärkungslernen mit Shifted-Dynamics-Daten | 与上下动动量数据匹配的强化学习综合流程 2505.23062v1 |
Authors: Lingkai Kong, Haichuan Wang, Tonghan Wang, Guojun Xiong, Milind Tambe
Incorporating pre-collected offline data from a source environment can significantly improve the sample efficiency of reinforcement learning (RL), but this benefit is often challenged by discrepancies between the transition dynamics of the source and target environments. Existing methods typically address this issue by penalizing or filtering out source transitions in high dynamics-gap regions. However, their estimation of the dynamics gap often relies on KL divergence or mutual information, which can be ill-defined when the source and target dynamics have disjoint support. To overcome these limitations, we propose CompFlow, a method grounded in the theoretical connection between flow matching and optimal transport. Specifically, we model the target dynamics as a conditional flow built upon the output distribution of the source-domain flow, rather than learning it directly from a Gaussian prior. This composite structure offers two key advantages: (1) improved generalization for learning target dynamics, and (2) a principled estimation of the dynamics gap via the Wasserstein distance between source and target transitions. Leveraging our principled estimation of the dynamics gap, we further introduce an optimistic active data collection strategy that prioritizes exploration in regions of high dynamics gap, and theoretically prove that it reduces the performance disparity with the optimal policy. Empirically, CompFlow outperforms strong baselines across several RL benchmarks with shifted dynamics.
从源环境预先收集的离线数据可以大大提高强化学习(RL)的抽样效率,但这一效益往往受到源与目标环境过渡动态之间的差异的挑战。现有方法通常通过惩罚或过滤高动态差距区域源的过渡来解决这一问题。但是,它们对于动态差距的估计往往依赖KL差异或相互信息,而当源和目标动态得到不连贯的支持时,这种差异或相互信息可能定义不当。为了克服这些限制,我们提议CompFlow,这是基于流动匹配与最佳运输之间理论联系的一种方法。具体地说,我们将目标动态作为基于源-地流动产出分布的有条件流动模型,而不是直接从Gausian之前的区域学习。这一综合结构提供了两个主要优势:(1) 改进学习目标动态的通用化,(2) 通过源与目标转型之间的瓦瑟斯坦距离对动态差距进行有原则性的估计。我们利用我们对动态差距的原则性估计,我们进一步引入了一种乐观的积极数据收集战略,优先在高动态差距区域进行勘探,从理论上证明它能够减少最佳政策基线之间的强性差距。
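To illustrate why a Wasserstein-based gap is attractive when source and target dynamics have disjoint support, here is a minimal sketch (not CompFlow itself). The function name `wasserstein_1d` and the toy next-state samples are illustrative assumptions. For two equal-size one-dimensional empirical samples, the 1-Wasserstein distance reduces to the mean absolute difference of the sorted values, so it stays finite and informative even when the samples do not overlap, a case where KL divergence is ill-defined.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D empirical samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # Optimal transport in 1-D matches the i-th smallest value of each sample.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical next-state samples from a source and a target environment.
# Their supports are disjoint, yet the distance is still well defined.
source_next_states = [0.0, 0.1, 0.2, 0.3]
target_next_states = [1.0, 1.1, 1.2, 1.3]
gap = wasserstein_1d(source_next_states, target_next_states)
```

In this toy case every source sample has to be moved by about one unit, so `gap` is approximately 1.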
Article 292
Title@2025-05-29 (4): Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design
Title: Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design | Spekulative Dekodierung trifft auf Quantisierung: Kompatibilitätsbewertung und Hierarchisches Framework Design | 投机性下限符合量化:兼容性评价和等级框架设计 2505.22179v2 |
Authors: Yudi Zhang, Weilin Zhao, Xu Han, Tiejun Zhao, Wang Xu, Hailong Cao, Conghui Zhu
Speculative decoding and quantization effectively accelerate memory-bound inference of large language models. Speculative decoding mitigates the memory bandwidth bottleneck by verifying multiple tokens within a single forward pass, which increases computational effort. Quantization achieves this optimization by compressing weights and activations into lower bit-widths and also reduces computations via low-bit matrix multiplications. To further leverage their strengths, we investigate the integration of these two techniques. Surprisingly, experiments applying the advanced speculative decoding method EAGLE-2 to various quantized models reveal that the memory benefits from 4-bit weight quantization are diminished by the computational load from speculative decoding. Specifically, verifying a tree-style draft incurs significantly more time overhead than a single-token forward pass on 4-bit weight quantized models. This finding led to our new speculative decoding design: a hierarchical framework that employs a small model as an intermediate stage to turn tree-style drafts into sequence drafts, leveraging the memory access benefits of the target quantized model. Experimental results show that our hierarchical approach achieves a 2.78$\times$ speedup across various tasks for the 4-bit weight Llama-3-70B model on an A100 GPU, outperforming EAGLE-2 by 1.31$\times$. Code available at https://github.com/AI9Stars/SpecMQuant.
可能解码和量化能够有效加速大型语言模型的内存解析。 猜测解码可以减少记忆带宽瓶颈, 具体来说, 量化可以将重量压缩和激活到小位宽度, 并通过低位基数乘法减少计算。 为了进一步发挥这两个技术的优势, 我们调查了这两种技术的整合情况。 令人惊讶的是, 将先进的投机解码方法 EAGLE-2 应用到各种量化模型的实验表明, 4比位权重量化的记忆因投机解码的计算负荷而减少。 具体地说, 验证树型草案比四位基位基数的单端前端分流要多得多。 发现导致我们新的投机解码设计: 一个等级框架, 使用一个小型的模型将树型草案转换成序列草稿, 利用目标值值值值值EAGLE9- 3的重量量化, 具体地说, 树型草案将A78- ALS 的内存取收益收益, 一个等级方法在1个基数级平比值模型上, 4级分析结果显示, 级方法在1个基比重模型上, 4级方法达到E78_B级。 级方法, 。 4级。 实验结果结果, 我们的等级方法在1级方法在1个基级分级分级分级法方法, 在1比级法方法上, 在1比级法方法上, 在1比。
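As background for the draft-then-verify mechanism discussed above, the following is a minimal sketch of one greedy speculative-decoding step with a sequence-style (not tree-style) draft. It is not the authors' hierarchical framework; `draft_next` and `target_next` are hypothetical toy stand-ins for a small and a large model, and the verification loop stands in for what would be a single batched forward pass of the target model.

```python
def speculative_step(prefix, draft_next, target_next, k):
    """One greedy speculative-decoding step; returns the tokens appended."""
    # Draft phase: the small model proposes k tokens autoregressively.
    draft = []
    for _ in range(k):
        draft.append(draft_next(prefix + draft))
    # Verify phase: in a real system the target scores all positions in one
    # forward pass; here we query it position by position for clarity.
    accepted = []
    for i in range(k):
        expected = target_next(prefix + draft[:i])
        if draft[i] != expected:
            accepted.append(expected)  # correct the first mismatch and stop
            return accepted
        accepted.append(draft[i])
    accepted.append(target_next(prefix + draft))  # bonus token if all matched
    return accepted


def target_next(seq):
    """Toy 'large' model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def draft_next(seq):
    """Toy 'small' model: agrees with the target except after token 3."""
    return 0 if seq[-1] == 3 else (seq[-1] + 1) % 10


tokens = speculative_step([1], draft_next, target_next, k=4)
```

The output always equals what greedy decoding with the target alone would produce; the draft only changes how many target passes are needed, which is why the extra verification compute matters for quantized targets.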
Article 293
Title@2025-05-29 (4): DINGO: Constrained Inference for Diffusion LLMs
Title: DINGO: Constrained Inference for Diffusion LLMs | DINGO: Beschränkte Schlussfolgerung für Diffusion LLMs | DINGO: 扩散长效LMM的连续推论 2505.23061v1 |
Authors: Tarun Suresh, Debangshu Banerjee, Shubham Ugare, Sasa Misailovic, Gagandeep Singh
Diffusion LLMs have emerged as a promising alternative to conventional autoregressive LLMs, offering significant potential for improved runtime efficiency. However, existing diffusion models lack the ability to provably enforce user-specified formal constraints, such as regular expressions, which makes them unreliable for tasks that require structured outputs, such as fixed-schema JSON generation. Unlike autoregressive models that generate tokens sequentially, diffusion LLMs predict a block of tokens in parallel. This parallelism makes traditional constrained decoding algorithms, which are designed for sequential token prediction, ineffective at preserving the true output distribution. To address this limitation, we propose DINGO, a dynamic programming-based constrained decoding strategy that is both efficient and provably distribution-preserving. DINGO enables sampling of output strings with the highest probability under the model’s predicted distribution, while strictly satisfying any user-specified regular expression. On standard symbolic math and JSON generation benchmarks, DINGO achieves up to a 68 percentage point improvement over unconstrained inference.
与传统自动递减的LMS相比,LMS已成为一种大有希望的替代传统自动递减的LMS,它为提高运行时间效率提供了巨大的潜力;然而,现有的推广模式缺乏能力,无法对用户指定的正式限制,例如常规表达方式,使其不适于执行需要结构化产出的任务,例如固定的JSON 生成。与自动递减模式不同,扩散LMS同时预测一系列象征性。这种平行使得传统的受限制解码算法(这些算法是为按顺序进行象征性预测而设计的,在保存真正的产出分布方面无效)。为了应对这一限制,我们建议DINGO,这是一个动态的、基于程序化的受限解码战略,既高效又可可移动的分布保存。DINGO能够根据模型预测的分布,在严格满足用户指定的任何常规表达方式时,以最高概率取样产出字符。关于标准的象征性数学和JSONS生成基准,DINGO在未受限制的推算之外,实现了68个百分点的改进。
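The general idea of dynamic-programming constrained decoding over a block of parallel predictions can be sketched as follows. This is a simplified illustration, not the DINGO algorithm: it assumes independent per-position token distributions, a hand-written DFA for the regular expression `a+b+`, and one DFA transition per token. The names `constrained_argmax` and `delta` are hypothetical.

```python
def constrained_argmax(probs, start, accepting, delta):
    """Most probable token string of length len(probs) accepted by a DFA.

    probs: list of dicts token -> probability, one per block position.
    delta: dict (state, token) -> next state; missing pairs are rejected.
    Returns (probability, tokens), or (0.0, None) if nothing is accepted.
    """
    # best[state] = (probability, tokens) of the best prefix ending in state
    best = {start: (1.0, [])}
    for dist in probs:
        nxt = {}
        for state, (p, toks) in best.items():
            for tok, q in dist.items():
                key = (state, tok)
                if key not in delta or q == 0.0:
                    continue
                cand = p * q
                target = delta[key]
                if target not in nxt or cand > nxt[target][0]:
                    nxt[target] = (cand, toks + [tok])
        best = nxt
    finals = [best[s] for s in accepting if s in best]
    if not finals:
        return 0.0, None
    return max(finals, key=lambda pt: pt[0])


# DFA for the regular expression a+b+ (state 2 is accepting).
delta = {(0, "a"): 1, (1, "a"): 1, (1, "b"): 2, (2, "b"): 2}
block = [
    {"a": 0.2, "b": 0.8},
    {"a": 0.6, "b": 0.4},
    {"a": 0.1, "b": 0.9},
]
prob, tokens = constrained_argmax(block, start=0, accepting={2}, delta=delta)
```

Taking the per-position argmax would give `b a b`, which violates the pattern; the dynamic program instead returns the best valid string `a a b`. The cost is linear in block length times the number of DFA states times vocabulary size.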
Article 294
Title@2025-05-29 (4): Improved Last-Iterate Convergence of Shuffling Gradient Methods for Nonsmooth Convex Optimization
Title: Improved Last-Iterate Convergence of Shuffling Gradient Methods for Nonsmooth Convex Optimization | Verbesserte letzte Konvergenz der schrumpfenden Gradienten-Methoden für rauchfreie Convex-Optimierung | 优化非移动convex最佳化的渐进式打碎方法的改进后最后 2505.23056v1 |
Authors: Zijian Liu, Zhengyuan Zhou
We study the convergence of the shuffling gradient method, a popular algorithm for minimizing a finite-sum function with regularization, in which the component functions are passed to (Proximal) Gradient Descent (GD) one by one in an order determined by a permutation of the function indices. Despite its easy implementation and effective performance in practice, its theoretical understanding remains limited. A recent advance by (Liu & Zhou, 2024b) establishes the first last-iterate convergence results under various settings, in particular proving the optimal rates for smooth (strongly) convex optimization. However, their bounds for nonsmooth (strongly) convex functions are only as fast as Proximal GD. In this work, we provide the first improved last-iterate analysis for the nonsmooth case, demonstrating that the widely used Random Reshuffle ($\textsf{RR}$) and Single Shuffle ($\textsf{SS}$) strategies are both provably faster than Proximal GD, reflecting the benefit of randomness. As an important implication, we give the first (nearly) optimal convergence result for the suffix average under the $\textsf{RR}$ sampling scheme in the general convex case, matching the lower bound shown by (Koren et al., 2022).
我们研究的是折叠梯度方法的趋同性,这是一种常用的算法,目的是尽量减少与正规化的有限和总和功能,在这种算法中,各种功能被传递到应用(最接近的)渐变源(GD),其顺序由功能指数的变异决定。与其易于执行和实践中的有效运作相比,理论上的理解仍然有限。最近由(Liu & Zhou, 2024b) 和Sone Shuffle (textfsffsf{SS}$) 提出的最接近性结果在各种环境下首次确定,特别是证明最优的速率可以顺利(强的)调和同质优化。然而,这些功能对非mooth(强的)共融(GD) 功能的界限仅与最接近性GD(GD) 一样快。在这项工作中,我们第一次改进了对非曲线案例的最后一次计算率分析,表明广泛使用的随机再组合(textffffffs} 20美元) 和Shuffleshleshle shal 最接近结果,在平均的20x 和最接近结果之下。
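The two sampling schemes and the suffix average can be sketched on a toy nonsmooth problem. This is a minimal illustration under stated assumptions: the components are f_i(x) = |x - targets[i]|, the regularizer (and hence the proximal step) is omitted, and the step-size schedule is an arbitrary illustrative choice rather than the one analysed in the paper. All function names are hypothetical.

```python
import random


def epoch_orders(n, epochs, scheme, seed=0):
    """Index orders per epoch: 'RR' reshuffles every epoch, 'SS' shuffles once."""
    rng = random.Random(seed)
    base = list(range(n))
    rng.shuffle(base)
    orders = []
    for _ in range(epochs):
        if scheme == "RR":
            perm = list(range(n))
            rng.shuffle(perm)
        elif scheme == "SS":
            perm = list(base)
        else:
            raise ValueError("scheme must be 'RR' or 'SS'")
        orders.append(perm)
    return orders


def shuffling_subgradient(targets, epochs, step, scheme, x0=0.0, seed=0):
    """Shuffling subgradient method on the components |x - targets[i]|."""
    x, epoch_iterates = x0, []
    for k, perm in enumerate(epoch_orders(len(targets), epochs, scheme, seed)):
        eta = step / (k + 1) ** 0.5  # illustrative step-size schedule
        for i in perm:
            d = x - targets[i]
            g = (d > 0) - (d < 0)  # a subgradient of |x - targets[i]|
            x -= eta * g
        epoch_iterates.append(x)  # last iterate of this epoch
    return epoch_iterates


def suffix_average(values, m):
    """Average of the last m entries (m >= 1)."""
    tail = values[-m:]
    return sum(tail) / len(tail)


iterates = shuffling_subgradient([1, 2, 3, 4, 5], epochs=50, step=0.5, scheme="RR")
estimate = suffix_average(iterates, m=10)
```

Here `iterates[-1]` is the last iterate the abstract refers to, while `estimate` averages only the final epochs, which is the suffix average.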
Article 295
Title@2025-05-29 (4): CDR-Agent: Intelligent Selection and Execution of Clinical Decision Rules Using Large Language Model Agents
Title: CDR-Agent: Intelligent Selection and Execution of Clinical Decision Rules Using Large Language Model Agents | CDR-Agent: Intelligente Auswahl und Durchführung klinischer Entscheidungsregeln unter Verwendung von Large Language Model Agents | CDR-代理:明智选择和执行使用大语言示范物剂的临床决定规则 2505.23055v1 |
Authors: Zhen Xiang, Aliyah R. Hsu, Austin V. Zane, Aaron E. Kornblith, Margaret J. Lin-Martore, Jasmanpreet C. Kaur, Vasuda M. Dokiparthi, Bo Li, Bin Yu
Clinical decision-making is inherently complex and fast-paced, particularly in emergency departments (EDs) where critical, rapid and high-stakes decisions are made. Clinical Decision Rules (CDRs) are standardized evidence-based tools that combine signs, symptoms, and clinical variables into decision trees to make consistent and accurate diagnoses. CDR usage is often hindered by the clinician’s cognitive load, limiting their ability to quickly recall and apply the appropriate rules. We introduce CDR-Agent, a novel LLM-based system designed to enhance ED decision-making by autonomously identifying and applying the most appropriate CDRs based on unstructured clinical notes. To validate CDR-Agent, we curated two novel ED datasets: synthetic and CDR-Bench, although CDR-Agent is applicable to non-ED clinics. CDR-Agent achieves a 56.3% (synthetic) and 8.7% (CDR-Bench) accuracy gain relative to the standalone LLM baseline in CDR selection. Moreover, CDR-Agent significantly reduces computational overhead. Using these datasets, we demonstrated that CDR-Agent not only selects relevant CDRs efficiently, but also makes cautious yet effective imaging decisions by minimizing unnecessary interventions while successfully identifying most positively diagnosed cases, outperforming traditional LLM prompting approaches. Code for our work can be found at: https://github.com/zhenxianglance/medagent-cdr-agent
临床决策具有内在的复杂性,而且速度很快,特别是在紧急部门(急诊部门)尤其如此。临床决策规则(CDR)是标准化的循证工具,将迹象、症状和临床变量结合到决策树中,以便作出一致和准确的诊断。临床决策的使用往往受到临床医生认知负荷的阻碍,限制了他们迅速回忆和适用适当规则的能力。我们引入了CDR-Agency,这是一个以LLM为基础的创新系统,目的是通过自主地识别和应用基于非结构化临床说明的最适当的CDR(CDR-Agency)来加强ED决策。为了验证CDR-Agency,我们调整了两个新型的ED数据集:合成和CDR-Bench,尽管CDR-Agency适用于非ED诊所。CDR-Agentient在CDR的选择中取得了56.3(合成)和8.7(CDR-Ben-Bench)的准确度,而与CDRM的不透明性LM基准相对。此外,CDR-Agency 明显地降低了计算间接费用。我们成功地选择了CRDRDRDR的准确性案例。我们成功地选择了C-C-C-C-C-DRDRDRDRDRDRDRDR(成功),我们发现C-C-C-C-C-de)。
Article 296
Title@2025-05-29 (4): Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network
Title: Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network | Lernen von suboptimalen Daten in der kontinuierlichen Kontrolle über Auto-Regressive Soft Q-Network | 通过自动递减软软QNetwork, 从连续控制中的亚最佳数据中学习 2502.00288v2 |
Authors: Jijia Liu, Feng Gao, Qingmin Liao, Chao Yu, Yu Wang
Reinforcement learning (RL) for continuous control often requires large amounts of online interaction data. Value-based RL methods can mitigate this burden by offering relatively high sample efficiency. Some studies further enhance sample efficiency by incorporating offline demonstration data to “kick-start” training, achieving promising results in continuous control. However, they typically compute the Q-function independently for each action dimension, neglecting interdependencies and making it harder to identify optimal actions when learning from suboptimal data, such as non-expert demonstrations and data collected online during the training process. To address these issues, we propose Auto-Regressive Soft Q-learning (ARSQ), a value-based RL algorithm that models Q-values in a coarse-to-fine, auto-regressive manner. First, ARSQ decomposes the continuous action space into discrete spaces in a coarse-to-fine hierarchy, enhancing sample efficiency for fine-grained continuous control tasks. Next, it auto-regressively predicts dimensional action advantages within each decision step, enabling more effective decision-making in continuous control tasks. We evaluate ARSQ on two continuous control benchmarks, RLBench and D4RL, integrating demonstration data into online training. On D4RL, which includes non-expert demonstrations, ARSQ achieves an average $1.62\times$ performance improvement over the SOTA value-based baseline. On RLBench, which incorporates expert demonstrations, ARSQ surpasses various baselines, demonstrating its effectiveness in learning from suboptimal online-collected data. The project page is at https://sites.google.com/view/ar-soft-q
用于连续控制的强化学习(RL)往往需要大量的在线互动数据。基于价值的RL方法可以通过提供相对较高的样本效率来减轻这一负担。有些研究将离线演示数据纳入“启动”培训,从而进一步提高样本效率,从而在连续控制方面实现有希望的成果。然而,它们通常对每个行动层面独立计算Q功能,忽视相互依存关系,在学习非优化数据时更难确定最佳行动,例如培训过程中的非专家演示和在线收集的数据。为了解决这些问题,我们建议采用自动回归软软Q学习(ARSQ),一种基于价值的RL算法,将“离线”到“启动”培训,在连续控制方面,ARSQQQ 将基于价值的模型模型,将连续的操作空间放在一个离散的空间中,提高精细计量的连续控制任务的样本效率。此外,它自动递增地预测了每个决策步骤的维度行动优势,使得以粗略的R-Sft QQL 能够更有效地进行模拟决策,在连续控制中将SO-D-L任务纳入持续管理。
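The coarse-to-fine discretization of a continuous action can be sketched as follows. This is only an illustration of the hierarchy for a single action dimension, not ARSQ's networks or its auto-regressive advantage prediction; the function names, the bin count, and the number of levels are assumptions chosen for the example.

```python
def encode_action(a, low, high, bins, levels):
    """Coarse-to-fine bin indices locating a in [low, high]."""
    indices = []
    for _ in range(levels):
        width = (high - low) / bins
        idx = min(int((a - low) / width), bins - 1)  # clamp a == high into last bin
        indices.append(idx)
        # Zoom into the chosen bin; the next level subdivides it again.
        low, high = low + idx * width, low + (idx + 1) * width
    return indices


def decode_action(indices, low, high, bins):
    """Centre of the interval selected by the index sequence."""
    for idx in indices:
        width = (high - low) / bins
        low, high = low + idx * width, low + (idx + 1) * width
    return (low + high) / 2


codes = encode_action(0.3, -1.0, 1.0, bins=4, levels=3)
recovered = decode_action(codes, -1.0, 1.0, bins=4)
```

With 4 bins and 3 levels, the hierarchy resolves 4^3 = 64 intervals per dimension while each level only has to choose among 4 options, which is the kind of saving a coarse-to-fine scheme is after.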
Article 297
Title@2025-05-29 (4): DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration
Title: DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration | DenoiseRotator: Verbesserung der Beschneidungsfestigkeit für LLMs durch Bedeutungskonzentration | DenoisRotator:通过重视浓度提高LLMs的稳健力 2505.23049v1 |
Authors: Tianteng Gu, Bei Liu, Bo Xiao, Ke Zeng, Jiacheng Liu, Yanmin Qian
Pruning is a widely used technique to compress large language models (LLMs) by removing unimportant weights, but it often suffers from significant performance degradation, especially under semi-structured sparsity constraints. Existing pruning methods primarily focus on estimating the importance of individual weights, which limits their ability to preserve critical capabilities of the model. In this work, we propose a new perspective: rather than merely selecting which weights to prune, we first redistribute parameter importance to make the model inherently more amenable to pruning. By minimizing the information entropy of normalized importance scores, our approach concentrates importance onto a smaller subset of weights, thereby enhancing pruning robustness. We instantiate this idea through DenoiseRotator, which applies learnable orthogonal transformations to the model’s weight matrices. Our method is model-agnostic and can be seamlessly integrated with existing pruning techniques such as Magnitude, SparseGPT, and Wanda. Evaluated on LLaMA3, Qwen2.5, and Mistral models under 50% unstructured and 2:4 semi-structured sparsity, DenoiseRotator consistently improves perplexity and zero-shot accuracy. For instance, on LLaMA3-70B pruned with SparseGPT at 2:4 semi-structured sparsity, DenoiseRotator reduces the perplexity gap to the dense model by 58%, narrowing the degradation from 8.1 to 3.4 points. Code is available at https://github.com/Axel-gu/DenoiseRotator.
粗略是一种通过去除不重要的重量压缩大语言模型( LLMs) 的技术, 它被广泛使用, 以压缩大语言模型( LLMs) 。 但是, 我们的方法往往受到显著的性能退化的影响, 特别是在半结构化的宽度限制下。 现有的裁剪方法主要侧重于估算个体重量的重要性, 这限制了它们保存模型关键能力的能力。 在这项工作中, 我们提出了一个新视角: 我们不仅选择对纯度的权重, 我们首先重新分配参数的重要性, 以使模型本身更容易被剪裁。 通过将正常重要性分分数的信息最小化, 我们的方法将重要性集中在一个较小的重量组上, 特别是半结构化的缩略度强度。 我们通过DenoiseRototiator将这一想法快速化, 将可学习的或高度的变异度转换应用到模型的重量矩阵矩阵。 我们的方法是模范- 、 SpressGPT和Wanda 等现有调技术可以顺利地结合。 由LLA3 、 Quencommissionality 和 Mis- mission dealtialalalticalalality 在50 和 2: Drassalticalticalticality上持续地改进了50- deal- dealtial- deal- dealalality 。
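A two-dimensional toy case shows how an orthogonal transformation can concentrate importance and lower the entropy of normalized importance scores. This is a sketch under assumptions: importance is taken to be the squared weight magnitude, and the rotation angle is fixed by hand rather than learned; the paper's importance metric and training procedure may differ.

```python
import math


def importance_entropy(weights):
    """Shannon entropy (nats) of squared-magnitude importance, normalised to sum to 1."""
    scores = [w * w for w in weights]
    total = sum(scores)
    return -sum((s / total) * math.log(s / total) for s in scores if s > 0)


def rotate2d(weights, theta):
    """Apply a 2-D rotation, the simplest orthogonal transformation."""
    x, y = weights
    c, s = math.cos(theta), math.sin(theta)
    return [c * x - s * y, s * x + c * y]


w = [1.0, 1.0]                     # importance spread evenly over both weights
w_rot = rotate2d(w, math.pi / 4)   # importance concentrated on one weight
before = importance_entropy(w)
after = importance_entropy(w_rot)
```

The rotation preserves the vector's norm but moves essentially all of the importance onto one coordinate, so pruning the other coordinate removes almost nothing. In a real network the inverse rotation would have to be absorbed elsewhere so that the layer's function is unchanged.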
Article 298
Title@2025-05-29 (4): ProDiff: Prototype-Guided Diffusion for Minimal Information Trajectory Imputation
Title: ProDiff: Prototype-Guided Diffusion for Minimal Information Trajectory Imputation | ProDiff: Prototypen-geführte Diffusion für minimale Information Trajektorie Imputation | ProDiff: 用于最小信息轨迹截肢的原型类型辅助扩散 2505.23048v1 |
Authors: Tianci Bu, Le Zhou, Wenchuan Yang, Jianhong Mou, Kang Yang, Suoyi Tan, Feng Yao, Jingyuan Wang, Xin Lu
Trajectory data is crucial for various applications but often suffers from incompleteness due to device limitations and diverse collection scenarios. Existing imputation methods rely on sparse trajectory or travel information, such as velocity, to infer missing points. However, these approaches assume that sparse trajectories retain essential behavioral patterns, which places significant demands on data acquisition and overlooks the potential of large-scale human trajectory embeddings. To address this, we propose ProDiff, a trajectory imputation framework that uses only two endpoints as minimal information. It integrates prototype learning to embed human movement patterns and a denoising diffusion probabilistic model for robust spatiotemporal reconstruction. Joint training with a tailored loss function ensures effective imputation. ProDiff outperforms state-of-the-art methods, improving accuracy by 6.28% on FourSquare and 2.52% on WuXi. Further analysis shows a 0.927 correlation between generated and real trajectories, demonstrating the effectiveness of our approach.
对各种应用来说,轨迹数据至关重要,但由于装置限制和收集情况多种多样,数据往往不完全。现有的估算方法依靠稀少的轨迹或旅行信息,例如速度,来推断缺失点。然而,这些方法假定,稀疏的轨迹保留了基本的行为模式,对数据获取提出了重大要求,忽视了大规模人类轨迹嵌入的潜力。为解决这一问题,我们提议ProDiff,这是一个轨迹估算框架,仅使用两个端点作为最低限度的信息。它将原型学习纳入人类运动模式,并采用一个去消化的传播概率模型,以进行强大的波段重建。与特定损失函数的联合培训可以确保有效的估算。ProDiff超越了最新方法,通过6.28(FourSquare)和2.52(WuXi)提高了准确度。进一步分析显示,生成的轨迹与实际轨迹之间有0.927的关联,显示了我们的方法的有效性。
Article 299
Title@2025-05-29 (4): Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping
Title: Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping | Nicht konvexe stochastische Optimierung unter schwerfälligen Geräuschen: Optimale Konvergenz ohne gradientes Clipping | 在重困噪音下非convex 斯托卡优化: 没有梯度缩放的最佳趋同 2412.19529v4 |
Authors: Zijian Liu, Zhengyuan Zhou
Recently, the study of heavy-tailed noises in first-order nonconvex stochastic optimization has attracted considerable attention, since heavy-tailed noise is recognized as a more realistic condition, as suggested by many empirical observations. Specifically, the stochastic noise (the difference between the stochastic and true gradient) is considered to have only a finite $\mathfrak{p}$-th moment where $\mathfrak{p}\in\left(1,2\right]$ instead of assuming it always satisfies the classical finite variance assumption. To deal with this more challenging setting, people have proposed different algorithms and proved them to converge at an optimal $\mathcal{O}(T^{\frac{1-\mathfrak{p}}{3\mathfrak{p}-2}})$ rate for smooth objectives after $T$ iterations. Notably, all these newly designed algorithms are based on the same technique: gradient clipping. Naturally, one may want to know whether the clipping method is a necessary ingredient and the only way to guarantee convergence under heavy-tailed noises. In this work, by revisiting the existing Batched Normalized Stochastic Gradient Descent with Momentum (Batched NSGDM) algorithm, we provide the first convergence result under heavy-tailed noises but without gradient clipping. Concretely, we prove that Batched NSGDM can achieve the optimal $\mathcal{O}(T^{\frac{1-\mathfrak{p}}{3\mathfrak{p}-2}})$ rate even under the relaxed smooth condition. More interestingly, we also establish the first $\mathcal{O}(T^{\frac{1-\mathfrak{p}}{2\mathfrak{p}}})$ convergence rate in the case where the tail index $\mathfrak{p}$ is unknown in advance, which is arguably the common scenario in practice.
最近,对一级(mathfrak{p}p}in\left(1,2\right) $的重尾噪声的研究得到了很多关注,因为许多实证观察都认为这是一个更现实的条件。具体地说,在美元外转后,对重尾噪声(Stochac{matrak}p}p}$的差别)仅认为只有一定的$(mathfrak{p}in\left(1,2\right) 美元,而没有假设它总是符合传统的有限差异假设。要处理这个更具挑战性的设置,人们已经提出了不同的算法,并证明它们会以最佳的 $\ mathalcal{O} (T\\\ mathratch\\\\ markr\\\ translation) 的方式趋同, 平整的算法(现在的变压式曲子) 也可以先用平价缩的变压的变压法建立。
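A minimal sketch of a normalized-momentum update shows why normalization can play the role usually given to clipping. The abstract does not spell out the exact update, so the form below (exponential moving average of gradients, then a step of fixed length along the normalized momentum) is an assumption based on the common formulation of normalized SGD with momentum; `grad` stands for a mini-batch averaged stochastic gradient, and all names are hypothetical.

```python
import math


def nsgdm_step(x, momentum, grad, lr, beta):
    """One normalized-SGD-with-momentum step on plain Python lists."""
    # Exponential moving average of (mini-batch) stochastic gradients.
    momentum = [beta * m + (1.0 - beta) * g for m, g in zip(momentum, grad)]
    norm = math.sqrt(sum(m * m for m in momentum))
    if norm == 0.0:
        return list(x), momentum  # no direction to move in
    # The step has length lr regardless of how large the gradient was.
    x_new = [xi - lr * m / norm for xi, m in zip(x, momentum)]
    return x_new, momentum


# A heavy-tailed outlier gradient and a tiny gradient produce steps of the
# same length, so no explicit clipping threshold is needed.
x_big, _ = nsgdm_step([0.0, 0.0], [0.0, 0.0], [3e6, -4e6], lr=0.1, beta=0.9)
x_small, _ = nsgdm_step([0.0, 0.0], [0.0, 0.0], [3e-6, -4e-6], lr=0.1, beta=0.9)
```

Both calls move the iterate by exactly `lr` in norm, toward roughly `[-0.06, 0.08]`.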
Article 300
Title@2025-05-29 (4): From Theory to Application: Fine-Tuning Large EEG Model with Real-World Stress Data
Title: From Theory to Application: Fine-Tuning Large EEG Model with Real-World Stress Data | Von der Theorie zur Anwendung: Feintuning-Großes EEG-Modell mit realen Stressdaten | 从理论到应用:使用现实世界应激数据精美应用大型电子EEG模型 2505.23042v1 |
Authors: Siwen Wang, Shitou Zhang, Wan-Lin Chen, Dung Truong, Tzyy-Ping Jung
Recent advancements in Large Language Models have inspired the development of foundation models across various domains. In this study, we evaluate the efficacy of Large EEG Models (LEMs) by fine-tuning LaBraM, a state-of-the-art foundation EEG model, on a real-world stress classification dataset collected in a graduate classroom. Unlike previous studies that primarily evaluate LEMs using data from controlled clinical settings, our work assesses their applicability to real-world environments. We train a binary classifier that distinguishes between normal and elevated stress states using resting-state EEG data recorded from 18 graduate students during a class session. The best-performing fine-tuned model achieves a balanced accuracy of 90.47% with a 5-second window, significantly outperforming traditional stress classifiers in both accuracy and inference efficiency. We further evaluate the robustness of the fine-tuned LEM under random data shuffling and reduced channel counts. These results demonstrate the capability of LEMs to effectively process real-world EEG data and highlight their potential to revolutionize brain-computer interface applications by shifting the focus from model-centric to data-centric design.
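Since the headline result is reported as balanced accuracy, a short sketch of that metric may help; the labels below are made-up toy data, not the study's. Balanced accuracy is the mean of per-class recalls, which keeps a classifier from scoring well simply by predicting the majority class.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy labels: 0 = normal stress, 1 = elevated stress (imbalanced on purpose).
y_true = [0] * 8 + [1] * 2
always_normal = [0] * 10
score = balanced_accuracy(y_true, always_normal)
```

The always-normal predictor reaches 80% plain accuracy on these labels but only 0.5 balanced accuracy, the chance level for two classes.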
Article 301
Title@2025-05-29 (4): TINED: GNNs-to-MLPs by Teacher Injection and Dirichlet Energy Distillation
Title: TINED: GNNs-to-MLPs by Teacher Injection and Dirichlet Energy Distillation | TINED: GNNs-to-MLPs von Lehrerinjektion und Dirichlet Energy Destillation | TINED:通过教师注射和稀释能源蒸馏,将GNNs改为MLP 2412.11180v3 |
Authors: Ziang Zhou, Zhihao Ding, Jieming Shi, Qing Li, Shiqi Shen
Graph Neural Networks (GNNs) are pivotal in graph-based learning, particularly excelling in node classification. However, their scalability is hindered by the need for multi-hop data during inference, limiting their application in latency-sensitive scenarios. Recent efforts to distill GNNs into multi-layer perceptrons (MLPs) for faster inference often underutilize the layer-level insights of GNNs. In this paper, we present TINED, a novel approach that distills GNNs to MLPs on a layer-by-layer basis using Teacher Injection and Dirichlet Energy Distillation techniques. We focus on two key operations in GNN layers: feature transformation (FT) and graph propagation (GP). We recognize that FT is computationally equivalent to a fully-connected (FC) layer in MLPs. Thus, we propose directly transferring teacher parameters from an FT in a GNN to an FC layer in the student MLP, enhanced by fine-tuning. In TINED, the FC layers in an MLP replicate the sequence of FTs and GPs in the GNN. We also establish a theoretical bound for GP approximation. Furthermore, we note that FT and GP operations in GNN layers often exhibit opposing smoothing effects: GP is aggressive, while FT is conservative. Using Dirichlet energy, we develop a DE ratio to measure these effects and propose Dirichlet Energy Distillation to convey these characteristics from GNN layers to MLP layers. Extensive experiments show that TINED outperforms GNNs and leading distillation methods across various settings and seven datasets. Source code is available at https://github.com/scottjiao/TINED_ICML25/.
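The notion of measuring smoothing with Dirichlet energy can be sketched on a tiny graph. The abstract does not give the exact definition of the DE ratio, so the version below is an assumption: energy is the sum of squared feature differences over undirected edges (equivalently tr(XᵀLX) with the unnormalized Laplacian), and the ratio divides the energy after an operation by the energy before it. `mean_propagate` is a hypothetical stand-in for a graph propagation step.

```python
def dirichlet_energy(features, edges):
    """Sum over undirected edges (i, j) of ||x_i - x_j||^2."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
        for i, j in edges
    )


def de_ratio(inputs, outputs, edges):
    """Energy after an operation divided by energy before it."""
    return dirichlet_energy(outputs, edges) / dirichlet_energy(inputs, edges)


def mean_propagate(features, edges):
    """Toy graph propagation: average each node with its neighbours."""
    n = len(features)
    neigh = {i: [i] for i in range(n)}
    for i, j in edges:
        neigh[i].append(j)
        neigh[j].append(i)
    dim = len(features[0])
    return [
        [sum(features[k][d] for k in neigh[i]) / len(neigh[i]) for d in range(dim)]
        for i in range(n)
    ]


edges = [(0, 1), (1, 2)]          # path graph 0 - 1 - 2
x = [[0.0], [1.0], [2.0]]
ratio = de_ratio(x, mean_propagate(x, edges), edges)
```

A ratio below 1 means the operation smooths features across edges; here one round of neighbour averaging cuts the energy to a quarter.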
Article 302
Title@2025-05-29 (4): One Model for One Graph: A New Perspective for Pretraining with Cross-domain Graphs
Title: One Model for One Graph: A New Perspective for Pretraining with Cross-domain Graphs | Ein Modell für einen Graphen: Eine neue Perspektive für das Pretraining mit domänenübergreifenden Graphen | 一图一模型:带有跨领域图的训练前新视角 2412.00315v2 |
Authors: Jingzhe Liu, Haitao Mao, Zhikai Chen, Bingheng Li, Wenqi Fan, Mingxuan Ju, Tong Zhao, Neil Shah, Jiliang Tang
Graph Neural Networks (GNNs) have emerged as a powerful tool to capture intricate network patterns, achieving success across different domains. However, existing GNNs require careful domain-specific architecture designs and training from scratch on each dataset, leading to an expertise-intensive process with difficulty in generalizing across graphs from different domains. Therefore, it can be hard for practitioners to infer which GNN model can generalize well to graphs from their domains. To address this challenge, we propose a novel cross-domain pretraining framework, “one model for one graph,” which overcomes the limitations of previous approaches that failed to use a single GNN to capture diverse graph patterns across domains with significant gaps. Specifically, we pretrain a bank of expert models, with each one corresponding to a specific dataset. When inferring to a new graph, gating functions choose a subset of experts to effectively integrate prior model knowledge while avoiding negative transfer. Extensive experiments consistently demonstrate the superiority of our proposed method on both link prediction and node classification tasks.
Article 303
Title@2025-05-29 (4): Cross-modal RAG: Sub-dimensional Retrieval-Augmented Text-to-Image Generation
Title: Cross-modal RAG: Sub-dimensional Retrieval-Augmented Text-to-Image Generation | Cross-modal RAG: Sub-dimensionale Retrieval-Augmented Text-to-Image Generation | 跨模式RAG:次二维检索增强的文本到图像生成 2505.21956v2 |
Authors: Mengdan Zhu, Senhao Cheng, Guangji Bai, Yifei Zhang, Liang Zhao
Text-to-image generation increasingly demands access to domain-specific, fine-grained, and rapidly evolving knowledge that pretrained models cannot fully capture. Existing Retrieval-Augmented Generation (RAG) methods attempt to address this by retrieving globally relevant images, but they fail when no single image contains all desired elements from a complex user query. We propose Cross-modal RAG, a novel framework that decomposes both queries and images into sub-dimensional components, enabling subquery-aware retrieval and generation. Our method introduces a hybrid retrieval strategy - combining a sub-dimensional sparse retriever with a dense retriever - to identify a Pareto-optimal set of images, each contributing complementary aspects of the query. During generation, a multimodal large language model is guided to selectively condition on relevant visual features aligned to specific subqueries, ensuring subquery-aware image synthesis. Extensive experiments on MS-COCO, Flickr30K, WikiArt, CUB, and ImageNet-LT demonstrate that Cross-modal RAG significantly outperforms existing baselines in both retrieval and generation quality, while maintaining high efficiency.
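The idea of a Pareto-optimal image set can be sketched with made-up scores. This is not the paper's retrieval pipeline: the image names, the three subqueries, and the relevance scores are hypothetical, and each image is simply represented by one score per subquery. An image is kept if no other image is at least as good on every subquery and strictly better on one.

```python
def pareto_front(scores):
    """Names whose score vectors are not dominated by any other candidate.

    scores: dict name -> tuple of per-subquery relevance scores (higher is better).
    """
    def dominates(u, v):
        return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

    return [
        name for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]


# Hypothetical scores for the subqueries ("red car", "snowy mountain", "sunset").
candidates = {
    "img_a": (0.9, 0.1, 0.2),
    "img_b": (0.2, 0.8, 0.7),
    "img_c": (0.1, 0.5, 0.6),  # worse than img_b on every subquery
    "img_d": (0.5, 0.5, 0.5),
}
front = pareto_front(candidates)
```

No single image covers all three subqueries well, but the retained set does so jointly, with each member contributing a different aspect of the query.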
Article 304
Title@2025-05-29 (4): Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction
Title: Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction | Case-Based Reasoning verbessert die vorausschauende Kraft von LLMs in der Arzneimittel-Drogen-Interaktion | 以个案为依据的理由加强药物-药物相互作用LLMs的预测能力 2505.23034v1 |
Authors: Guangyi Liu, Yongqi Zhang, Xunyuan Liu, Quanming Yao
Drug-drug interaction (DDI) prediction is critical for treatment safety. While large language models (LLMs) show promise in pharmaceutical tasks, their effectiveness in DDI prediction remains limited. Inspired by the well-established clinical practice where physicians routinely reference similar historical cases to guide their decisions through case-based reasoning (CBR), we propose CBR-DDI, a novel framework that distills pharmacological principles from historical cases to improve LLM reasoning for DDI tasks. CBR-DDI constructs a knowledge repository by leveraging LLMs to extract pharmacological insights and graph neural networks (GNNs) to model drug associations. A hybrid retrieval mechanism and dual-layer knowledge-enhanced prompting allow LLMs to effectively retrieve and reuse relevant cases. We further introduce a representative sampling strategy for dynamic case refinement. Extensive experiments demonstrate that CBR-DDI achieves state-of-the-art performance, with a significant 28.7% accuracy improvement over both popular LLMs and the CBR baseline, while maintaining high interpretability and flexibility.
nan
Article 305
Title@2025-05-29 (4): Exploring the Limitations of Mamba in COPY and CoT Reasoning
Title: Exploring the Limitations of Mamba in COPY and CoT Reasoning | Erforschung der Grenzen von Mamba in COPY und CoT Reasoning | 探索COPY和COT理由解释中Mamba的局限性 2410.03810v3 |
Authors: Ruifeng Ren, Zhicong Li, Yong Liu
Transformers have become the backbone of modern Large Language Models (LLMs); however, their inference overhead grows linearly with the sequence length, posing challenges for modeling long sequences. In light of this, Mamba has attracted attention for maintaining a constant inference size, with empirical evidence demonstrating that it can match Transformer performance in sequence modeling while significantly reducing computational costs. However, an open question remains: can Mamba always bring savings while achieving performance comparable to Transformers? In this paper, we focus on analyzing the expressive ability of Mamba to perform our defined COPY operation and Chain of Thought (CoT) reasoning. First, inspired by the connection between Mamba and linear attention, we show that constant-sized Mamba may struggle to perform COPY operations while Transformers can handle them more easily. However, when the size of Mamba grows linearly with the input sequence length, it can accurately perform COPY, but in this case, Mamba no longer provides overhead savings. Based on this observation, we further analyze Mamba’s ability to tackle CoT tasks, which can be described by the Dynamic Programming (DP) problems. Our findings suggest that to solve arbitrary DP problems, the total cost of Mamba is still comparable to standard Transformers. However, similar to efficient Transformers, when facing DP problems with favorable properties such as locality, Mamba can provide savings in overhead. Our experiments on the copy and CoT tasks further demonstrate Mamba’s limitations compared to Transformers in learning these tasks.
nan
Article 306
Title@2025-05-29 (4): AntiLeakBench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
Title: AntiLeakBench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge | AntiLeakBench: Datenkontamination durch automatisches Konstruieren von Benchmarks mit aktualisiertem Real-World-Wissen verhindern | 防止泄漏:利用最新现实世界知识自动建立基准,防止数据污染 2412.13670v2 |
Authors: Xiaobao Wu, Liangming Pan, Yuxi Xie, Ruiwen Zhou, Shuai Zhao, Yubo Ma, Mingzhe Du, Rui Mao, Anh Tuan Luu, William Yang Wang
Data contamination hinders fair LLM evaluation by introducing test data into newer models’ training sets. Existing studies address this challenge by updating benchmarks with newly collected data. However, they fail to guarantee contamination-free evaluation, as the newly collected data may contain pre-existing knowledge, and their benchmark updates rely on intensive human labor. To address these issues, in this paper we propose AntiLeak-Bench, an automated anti-leakage benchmarking framework. Instead of simply using newly collected data, we construct samples with explicitly new knowledge absent from LLMs’ training sets, which ensures strictly contamination-free evaluation. We further design a fully automated workflow to build and update our benchmark without human labor. This significantly reduces the cost of benchmark maintenance to accommodate emerging LLMs. Through extensive experiments, we highlight that data contamination likely exists before LLMs’ cutoff time and demonstrate that AntiLeak-Bench effectively overcomes this challenge.
nan
Article 307
Title@2025-05-29 (4): Bayesian Neural Scaling Laws Extrapolation with Prior-Fitted Networks
Title: Bayesian Neural Scaling Laws Extrapolation with Prior-Fitted Networks | Bayesische Neural Scaling-Gesetze Extrapolation mit vormontierten Netzwerken | Bayesian神经扩增法与事先确定网络的外推法 2505.23032v1 |
Authors: Dongwoo Lee, Dong Bok Lee, Steven Adriaensen, Juho Lee, Sung Ju Hwang, Frank Hutter, Seon Joo Kim, Hae Beom Lee
Scaling has been a major driver of recent advancements in deep learning. Numerous empirical studies have found that scaling laws often follow a power law and have proposed several variants of power-law functions to predict the scaling behavior at larger scales. However, existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world applications involving decision-making problems such as determining the expected performance improvements achievable by investing additional computational resources. In this work, we explore a Bayesian framework based on Prior-data Fitted Networks (PFNs) for neural scaling law extrapolation. Specifically, we design a prior distribution that enables the sampling of infinitely many synthetic functions resembling real-world neural scaling laws, allowing our PFN to meta-learn the extrapolation. We validate the effectiveness of our approach on real-world neural scaling laws, comparing it against both the existing point estimation methods and Bayesian approaches. Our method demonstrates superior performance, particularly in data-limited scenarios such as Bayesian active learning, underscoring its potential for reliable, uncertainty-aware extrapolation in practical applications.
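For context, the point-estimation baseline that the abstract contrasts with can be sketched as a least-squares fit of a plain power law in log-log space. This is not the PFN method; the functional form `loss = a * n**(-b)`, the synthetic data, and the function name are assumptions made for illustration.

```python
import numpy as np


def fit_power_law(n: np.ndarray, loss: np.ndarray) -> tuple:
    """Fit loss ~ a * n**(-b) by linear regression on (log n, log loss); returns (a, b)."""
    slope, intercept = np.polyfit(np.log(n), np.log(loss), deg=1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic, noise-free observations generated from a = 5.0, b = 0.3.
n = np.array([1e3, 1e4, 1e5, 1e6])
loss = 5.0 * n ** -0.3

a, b = fit_power_law(n, loss)
# Extrapolate one order of magnitude beyond the observed range.
prediction = a * 1e7 ** -b
print(a, b, prediction)
```

The fit returns a single curve with no error bars, which is the gap a Bayesian treatment is meant to close.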
nan
Article 308
Title@2025-05-29 (4): Diverse Prototypical Ensembles Improve Robustness to Subpopulation Shift
Title: Diverse Prototypical Ensembles Improve Robustness to Subpopulation Shift | Unterschiedliche prototypische Ensembles verbessern die Robustheit der Subpopulationsverschiebung | 提高亚人口变换能力 2505.23027v1 |
Authors: Minh Nguyen Nhat To, Paul F. R. Wilson, Viet Nguyen, Mohamed Harmanani, Michael Cooper, Fahimeh Fooladgar, Purang Abolmaesumi, Parvin Mousavi, Rahul G. Krishnan
Subpopulation shift, characterized by a disparity in subpopulation distribution between the training and target datasets, can significantly degrade the performance of machine learning models. Current solutions to subpopulation shift involve modifying empirical risk minimization with re-weighting strategies to improve generalization. This strategy relies on assumptions about the number and nature of subpopulations and annotations on group membership, which are unavailable for many real-world datasets. Instead, we propose using an ensemble of diverse classifiers to adaptively capture risk associated with subpopulations. Given a feature extractor network, we replace its standard linear classification layer with a mixture of prototypical classifiers, where each member is trained to classify the data while focusing on different features and samples from other members. In empirical evaluation on nine real-world datasets, covering diverse domains and kinds of subpopulation shift, our method of Diverse Prototypical Ensembles (DPEs) often outperforms the prior state-of-the-art in worst-group accuracy. The code is available at https://github.com/minhto2802/dpe4subpop
nan
Article 309
Title@2025-05-29 (4): Graph Wave Networks
Title: Graph Wave Networks | Graphische Wellennetze | 图图波网络 2505.20034v2 |
Authors: Juwei Yue, Haikuo Li, Jiawei Sheng, Yihan Guo, Xinghua Zhang, Chuan Zhou, Tingwen Liu, Li Guo
Dynamics modeling has been introduced as a novel paradigm in message passing (MP) of graph neural networks (GNNs). Existing methods consider MP between nodes as a heat diffusion process and leverage the heat equation to model the temporal evolution of nodes in the embedding space. However, the heat equation can hardly depict the wave nature of graph signals in graph signal processing. Besides, the heat equation is essentially a partial differential equation (PDE) involving a first partial derivative of time, whose numerical solution usually has low stability and leads to inefficient model training. In this paper, we would like to depict more wave details in MP, since graph signals are essentially wave signals that can be seen as a superposition of a series of waves in the form of eigenvectors. This motivates us to consider MP as a wave propagation process to capture the temporal evolution of wave signals in the space. Based on the wave equation in physics, we develop a graph wave equation to model wave propagation on graphs. In detail, we demonstrate that the graph wave equation can be connected to traditional spectral GNNs, facilitating the design of graph wave networks based on various Laplacians and enhancing the performance of the spectral GNNs. Besides, the graph wave equation is a PDE involving a second partial derivative of time, which has stronger stability on graphs than the heat equation that involves a first partial derivative of time. Additionally, we theoretically prove that the numerical solution derived from the graph wave equation is constantly stable, which significantly enhances model efficiency while ensuring its performance. Extensive experiments show that GWNs achieve SOTA and efficient performance on benchmark datasets, and exhibit outstanding performance in addressing challenging graph problems, such as over-smoothing and heterophily.
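For orientation, the contrast between the two dynamics can be written in a standard textbook form. Here $L$ is a graph Laplacian, $X(t)$ the node features, $a$ a wave speed, and $\tau$ a step size; all of this notation is assumed for illustration, and the paper's own formulation may differ.

```latex
% Heat-type message passing: first derivative in time
\frac{\partial X}{\partial t} = -L X
\quad\Longrightarrow\quad
X^{(t+1)} = X^{(t)} - \tau\, L X^{(t)}

% Wave-type message passing: second derivative in time
\frac{\partial^2 X}{\partial t^2} = -a^2 L X
\quad\Longrightarrow\quad
X^{(t+1)} = 2 X^{(t)} - X^{(t-1)} - \tau^2 a^2\, L X^{(t)}
```

The updates on the right are the explicit forward-Euler and central-difference discretizations, respectively. The wave update depends on the two previous states rather than one.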
nan
Article 310
Title@2025-05-29 (4): Offline Learning for Combinatorial Multi-armed Bandits
Title: Offline Learning for Combinatorial Multi-armed Bandits | Offline-Lernen für kombinatorische Multi-Armed Bandits | 多武装混合强盗离线学习 2501.19300v2 |
Authors: Xutong Liu, Xiangxiang Dai, Jinhang Zuo, Siwei Wang, Carlee Joe-Wong, John C. S. Lui, Wei Chen
The combinatorial multi-armed bandit (CMAB) is a fundamental sequential decision-making framework, extensively studied over the past decade. However, existing work primarily focuses on the online setting, overlooking the substantial costs of online interactions and the readily available offline datasets. To overcome these limitations, we introduce Off-CMAB, the first offline learning framework for CMAB. Central to our framework is the combinatorial lower confidence bound (CLCB) algorithm, which combines pessimistic reward estimations with combinatorial solvers. To characterize the quality of offline datasets, we propose two novel data coverage conditions and prove that, under these conditions, CLCB achieves a near-optimal suboptimality gap, matching the theoretical lower bound up to a logarithmic factor. We validate Off-CMAB through practical applications, including learning to rank, large language model (LLM) caching, and social influence maximization, showing its ability to handle nonlinear reward functions, general feedback models, and out-of-distribution action samples that exclude optimal or even feasible actions. Extensive experiments on synthetic and real-world datasets further highlight the superior performance of CLCB.
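The combination of pessimistic estimates with a combinatorial solver can be sketched as follows. This is a minimal illustration, not the paper's CLCB: it assumes rewards in [0, 1], uses a Hoeffding-style confidence width, and stands in a top-k selection for the solver. The function names and the data are hypothetical.

```python
import math
from typing import List


def lower_confidence_bounds(
    means: List[float], counts: List[int], delta: float = 0.05
) -> List[float]:
    """Per-arm pessimistic estimate: empirical mean minus a Hoeffding-style width."""
    m = len(means)
    return [
        mu - math.sqrt(math.log(2 * m / delta) / (2 * n)) if n > 0 else float("-inf")
        for mu, n in zip(means, counts)
    ]


def top_k_oracle(values: List[float], k: int) -> List[int]:
    """Toy combinatorial solver: indices of the k largest values, in ascending index order."""
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(ranked[:k])


# Offline statistics: arm 0 looks best on average but was observed only twice.
means = [0.9, 0.6, 0.8, 0.7]
counts = [2, 500, 400, 300]

lcb = lower_confidence_bounds(means, counts)
chosen = top_k_oracle(lcb, k=2)
print(chosen)
```

Arm 0 is skipped despite its high empirical mean, because its two observations leave a wide confidence interval. Penalizing poorly covered arms in this way is the point of pessimism in the offline setting.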
nan
Article 311
Title@2025-05-29 (4): An Empirical Study of Federated Prompt Learning for Vision Language Model
Title: An Empirical Study of Federated Prompt Learning for Vision Language Model | Eine empirische Studie über Federated Prompt Learning for Vision Language Model | 联邦快速学习促进愿景语言模式经验研究 2505.23024v1 |
Authors: Zhihao Wang, Wenke Huang, Tian Chen, Zekun Shi, Guancheng Wan, Yu Qiao, Bin Yang, Jian Wang, Bing Li, Mang Ye
The Vision Language Model (VLM) excels in aligning vision and language representations, and prompt learning has emerged as a key technique for adapting such models to downstream tasks. However, the application of prompt learning with VLM in federated learning (FL) scenarios remains underexplored. This paper systematically investigates the behavioral differences between language prompt learning (LPT) and vision prompt learning (VPT) under data heterogeneity challenges, including label skew and domain shift. We conduct extensive experiments to evaluate the impact of various FL and prompt configurations, such as client scale, aggregation strategies, and prompt length, to assess the robustness of Federated Prompt Learning (FPL). Furthermore, we explore strategies for enhancing prompt learning in complex scenarios where label skew and domain shift coexist, including leveraging both prompt types when computational resources allow. Our findings offer practical insights into optimizing prompt learning in federated settings, contributing to the broader deployment of VLMs in privacy-preserving environments.
nan
Article 312
Title@2025-05-29 (4): GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
Title: GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning | GuardAgent: LLM-Agenten durch einen Guard Agent durch wissensgestützte Vernunft schützen | 警卫人员:由警卫人员通过 “ 知识化理由 “ 保护有限责任公司代理 2406.09187v3 |
Authors: Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Carl Yang, Dawn Song, Bo Li
The rapid advancement of large language model (LLM) agents has raised new concerns regarding their safety and security. In this paper, we propose GuardAgent, the first guardrail agent to protect target agents by dynamically checking whether their actions satisfy given safety guard requests. Specifically, GuardAgent first analyzes the safety guard requests to generate a task plan, and then maps this plan into guardrail code for execution. By performing the code execution, GuardAgent can deterministically follow the safety guard request and safeguard target agents. In both steps, an LLM is utilized as the reasoning component, supplemented by in-context demonstrations retrieved from a memory module storing experiences from previous tasks. In addition, we propose two novel benchmarks: EICU-AC benchmark to assess the access control for healthcare agents and Mind2Web-SC benchmark to evaluate the safety policies for web agents. We show that GuardAgent effectively moderates the violation actions for different types of agents on these two benchmarks with over 98% and 83% guardrail accuracies, respectively. Project page: https://guardagent.github.io/
nan
Article 313
Title@2025-05-29 (4): SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference
Title: SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference | SCORPIO: Den richtigen Anfragen zur richtigen Zeit für heterogene SLOs in LLM-Schlussfolgerung dienen | 在LLM推理中异基因性溶液的适当时间满足正确的要求 2505.23022v1 |
Authors: Yinghao Tang, Tingfeng Lan, Xiuqi Huang, Hui Lu, Wei Chen
Existing Large Language Model (LLM) serving systems prioritize maximum throughput. They often neglect Service Level Objectives (SLOs) such as Time to First Token (TTFT) and Time Per Output Token (TPOT), which leads to suboptimal SLO attainment. This paper introduces SCORPIO, an SLO-oriented LLM serving system designed to maximize system goodput and SLO attainment for workloads with heterogeneous SLOs. Our core insight is to exploit SLO heterogeneity for adaptive scheduling across admission control, queue management, and batch selection. SCORPIO features a TTFT Guard, which employs least-deadline-first reordering and rejects unattainable requests, and a TPOT Guard, which utilizes a VBS-based admission control and a novel credit-based batching mechanism. Both guards are supported by a predictive module. Evaluations demonstrate that SCORPIO improves system goodput by up to 14.4X and SLO adherence by up to 46.5% compared to state-of-the-art baselines.
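The least-deadline-first reordering with rejection of unattainable requests can be sketched in a few lines. This is a simplified illustration under strong assumptions: requests are prefilled one at a time, and a predicted prefill time is already available for each request. The `Request` fields and the `schedule` function are hypothetical and do not reflect SCORPIO's actual interfaces.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    rid: str
    arrival: float      # arrival time in seconds
    ttft_slo: float     # allowed seconds from arrival to first token
    est_prefill: float  # predicted prefill time in seconds (assumed given)

    @property
    def deadline(self) -> float:
        return self.arrival + self.ttft_slo


def schedule(queue: List[Request], now: float) -> Tuple[List[str], List[str]]:
    """Visit requests in deadline order; reject any that would miss its deadline."""
    admitted, rejected = [], []
    t = now
    for r in sorted(queue, key=lambda r: r.deadline):
        if t + r.est_prefill <= r.deadline:
            admitted.append(r.rid)
            t += r.est_prefill  # this request occupies the engine until t
        else:
            rejected.append(r.rid)
    return admitted, rejected


queue = [
    Request("a", arrival=0.0, ttft_slo=1.0, est_prefill=0.3),
    Request("b", arrival=0.0, ttft_slo=0.5, est_prefill=0.3),
    Request("c", arrival=0.0, ttft_slo=0.55, est_prefill=0.3),
]
admitted, rejected = schedule(queue, now=0.1)
print(admitted, rejected)
```

Rejecting `c` early frees capacity for `a`, which still meets its looser deadline.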
nan
Article 314
Title@2025-05-29 (4): SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models
Title: SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models | SciHorizon: Benchmarking von KI-für-Science Readiness von wissenschaftlichen Daten zu großen Sprachmodellen | SciHorizon:将AI-SciHorizon科学准备程度从科学数据基准确定为大语言模式 2503.13503v3 |
Authors: Chuan Qin, Xin Chen, Chengrui Wang, Pengmin Wu, Xi Chen, Yihang Cheng, Jingyi Zhao, Meng Xiao, Xiangchao Dong, Qingqing Long, Boya Pan, Han Wu, Chengzan Li, Yuanchun Zhou, Hui Xiong, Hengshu Zhu
In recent years, the rapid advancement of Artificial Intelligence (AI) technologies, particularly Large Language Models (LLMs), has revolutionized the paradigm of scientific discovery, establishing AI-for-Science (AI4Science) as a dynamic and evolving field. However, there is still a lack of an effective framework for the overall assessment of AI4Science, particularly from a holistic perspective on data quality and model capability. Therefore, in this study, we propose SciHorizon, a comprehensive assessment framework designed to benchmark the readiness of AI4Science from both scientific data and LLM perspectives. First, we introduce a generalizable framework for assessing AI-ready scientific data, encompassing four key dimensions: Quality, FAIRness, Explainability, and Compliance, which are subdivided into 15 sub-dimensions. Drawing on data resource papers published between 2018 and 2023 in peer-reviewed journals, we present recommendation lists of AI-ready datasets for Earth, Life, and Materials Sciences, making a novel and original contribution to the field. Concurrently, to assess the capabilities of LLMs across multiple scientific disciplines, we establish 16 assessment dimensions based on five core indicators (Knowledge, Understanding, Reasoning, Multimodality, and Values) spanning Mathematics, Physics, Chemistry, Life Sciences, and Earth and Space Sciences. Using the developed benchmark datasets, we have conducted a comprehensive evaluation of over 50 representative open-source and closed-source LLMs. All the results are publicly available and can be accessed online at www.scihorizon.cn/en.
nan
Article 315
Title@2025-05-29 (4): BECAME: BayEsian Continual Learning with Adaptive Model MErging
Title: BECAME: BayEsian Continual Learning with Adaptive Model MErging | BECAME: BayEsian Continual Learning mit adaptivem Modell-Merging | BECAME: 采用适应性示范招生模型的巴伊连续学习 2504.02666v2 |
Authors: Mei Li, Yuxiang Lu, Qinyan Dai, Suizhi Huang, Yue Ding, Hongtao Lu
Continual Learning (CL) strives to learn incrementally across tasks while mitigating catastrophic forgetting. A key challenge in CL is balancing stability (retaining prior knowledge) and plasticity (learning new tasks). While representative gradient projection methods ensure stability, they often limit plasticity. Model merging techniques offer promising solutions, but prior methods typically rely on empirical assumptions and carefully selected hyperparameters. In this paper, we explore the potential of model merging to enhance the stability-plasticity trade-off, providing theoretical insights that underscore its benefits. Specifically, we reformulate the merging mechanism using Bayesian continual learning principles and derive a closed-form solution for the optimal merging coefficient that adapts to the diverse characteristics of tasks. To validate our approach, we introduce a two-stage framework named BECAME, which synergizes the expertise of gradient projection and adaptive merging. Extensive experiments show that our approach outperforms state-of-the-art CL methods and existing merging strategies.
nan
Article 316
Title@2025-05-29 (4): $K^2$VAE: A Koopman-Kalman Enhanced Variational AutoEncoder for Probabilistic Time Series Forecasting
Title: $K^2$VAE: A Koopman-Kalman Enhanced Variational AutoEncoder for Probabilistic Time Series Forecasting | $K^2$VAE: Ein Koopman-Kalman-Verbesserter Variations-AutoEncoder für probabilistische Zeitreihenprognosen | 2美元VAE: 概率时间序列预测的Koopman-Kalman增强变异自动编码器 2505.23017v1 |
Authors: Xingjian Wu, Xiangfei Qiu, Hongfan Gao, Jilin Hu, Bin Yang, Chenjuan Guo
Probabilistic Time Series Forecasting (PTSF) plays a crucial role in decision-making across various fields, including economics, energy, and transportation. Most existing methods excel at short-term forecasting, while overlooking the hurdles of Long-term Probabilistic Time Series Forecasting (LPTSF). As the forecast horizon extends, the inherent nonlinear dynamics have a significant adverse effect on prediction accuracy, and make generative models inefficient by increasing the cost of each iteration. To overcome these limitations, we introduce $K^2$VAE, an efficient VAE-based generative model that leverages a KoopmanNet to transform nonlinear time series into a linear dynamical system, and devises a KalmanNet to refine predictions and model uncertainty in such a linear system, which reduces error accumulation in long-term forecasting. Extensive experiments demonstrate that $K^2$VAE outperforms state-of-the-art methods in both short- and long-term PTSF, providing a more efficient and accurate solution.
nan
Article 317
Title@2025-05-29 (4): Hyperbolic-PDE GNN: Spectral Graph Neural Networks in the Perspective of A System of Hyperbolic Partial Differential Equations
Title: Hyperbolic-PDE GNN: Spectral Graph Neural Networks in the Perspective of A System of Hyperbolic Partial Differential Equations | Hyperbolic-PDE GNN: Spektral Graph Neural Networks in the Perspective of A System of Hyperbolic Partial Differential Equations | GNN: 从超曲偏偏部分异差系统的角度看待光谱图形神经网络 2505.23014v1 |
Authors: Juwei Yue, Haikuo Li, Jiawei Sheng, Xiaodong Li, Taoyu Su, Tingwen Liu, Li Guo
Graph neural networks (GNNs) leverage message passing mechanisms to learn the topological features of graph data. Traditional GNNs learn node features in a spatial domain unrelated to the topology, which can hardly ensure topological features. In this paper, we formulate message passing as a system of hyperbolic partial differential equations (hyperbolic PDEs), constituting a dynamical system that explicitly maps node representations into a particular solution space. This solution space is spanned by a set of eigenvectors describing the topological structure of graphs. Within this system, at any moment in time, a node's features can be decomposed into a superposition of the basis of eigenvectors. This not only enhances the interpretability of message passing but also enables the explicit extraction of fundamental characteristics about the topological structure. Furthermore, by solving this system of hyperbolic partial differential equations, we establish a connection with spectral graph neural networks (spectral GNNs), serving as a message passing enhancement paradigm for spectral GNNs. We further introduce polynomials to approximate arbitrary filter functions. Extensive experiments demonstrate that the paradigm of hyperbolic PDEs not only exhibits strong flexibility but also significantly enhances the performance of various spectral GNNs across diverse graph tasks.
nan
Article 318
Title@2025-05-29 (4): SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting
Title: SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting | SplitLoRA: Balance Stabilität und Plastizität im kontinuierlichen Lernen durch gradienten Raum Splitting | Split LoRA:通过逐步空间分割在持续学习中平衡稳定和可塑性 2505.22370v2 |
Authors: Haomiao Qiu, Miao Zhang, Ziyue Qiao, Weili Guan, Min Zhang, Liqiang Nie
Continual Learning (CL) requires a model to learn multiple tasks in sequence while maintaining both stability (preserving knowledge from previously learned tasks) and plasticity (effectively learning new tasks). Gradient projection has emerged as an effective and popular paradigm in CL, where it partitions the gradient space of previously learned tasks into two orthogonal subspaces: a primary subspace and a minor subspace. New tasks are learned effectively within the minor subspace, thereby reducing interference with previously acquired knowledge. However, existing gradient projection methods struggle to achieve an optimal balance between plasticity and stability, as it is hard to appropriately partition the gradient space. In this work, we consider a continual learning paradigm based on Low-Rank Adaptation, which has gained considerable attention due to its efficiency and wide applicability, and propose a novel approach for continual learning, called SplitLoRA. We first provide a theoretical analysis of how subspace partitioning affects model stability and plasticity. Informed by this analysis, we then introduce an effective method that derives the optimal partition of the gradient space for previously learned tasks. This approach effectively balances stability and plasticity in continual learning. Experimental results on multiple datasets demonstrate that the proposed method achieves state-of-the-art performance.
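The generic gradient-projection step described above can be sketched with NumPy. This is not SplitLoRA itself: the split point `k` is fixed by hand here, whereas choosing the partition well is exactly what the abstract says the paper addresses. The function names and the random data are hypothetical.

```python
import numpy as np


def split_subspaces(old_grads: np.ndarray, k: int):
    """SVD of stacked old-task gradients (one per row).

    The top-k left singular vectors of the transposed matrix span the
    primary subspace; the remaining ones span the minor subspace.
    """
    u, _, _ = np.linalg.svd(old_grads.T, full_matrices=True)
    return u[:, :k], u[:, k:]


def project_to_minor(grad: np.ndarray, primary: np.ndarray) -> np.ndarray:
    """Remove the component of grad that lies in the primary subspace."""
    return grad - primary @ (primary.T @ grad)


rng = np.random.default_rng(0)
# 20 old-task gradients in R^6 that span only a 2-dimensional subspace.
old_grads = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 6))

primary, minor = split_subspaces(old_grads, k=2)
new_grad = rng.normal(size=6)
safe_grad = project_to_minor(new_grad, primary)

# The projected update is orthogonal to every old-task gradient.
print(np.abs(old_grads @ safe_grad).max())
```

A larger `k` protects more old directions (stability) but leaves a smaller minor subspace for new learning (plasticity), which is the trade-off the abstract refers to.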
nan
Article 319
Title@2025-05-29 (4): Scalable Complexity Control Facilitates Reasoning Ability of LLMs
Title: Scalable Complexity Control Facilitates Reasoning Ability of LLMs | Skalierbare Komplexitätskontrolle erleichtert die Fähigkeit von LLMs, sich zu verankern | 可扩展的复杂度控制提升大语言模型的推理能力 2505.23013v1 |
Authors: Liangkai Hang, Junjie Yao, Zhiwei Bai, Tianyi Chen, Yang Chen, Rongjie Diao, Hezhou Li, Pengxiao Lin, Zhiwei Wang, Cheng Xu, Zhongwang Zhang, Zhangchen Zhou, Zhiyu Li, Zehao Lin, Kai Chen, Feiyu Xiong, Yaoyu Zhang, Weinan E, Hongkang Yang, Zhi-Qin John Xu
The reasoning ability of large language models (LLMs) has been rapidly advancing in recent years, attracting interest in more fundamental approaches that can reliably enhance their generalizability. This work demonstrates that model complexity control, conveniently implementable by adjusting the initialization rate and weight decay coefficient, improves the scaling law of LLMs consistently over varying model sizes and data sizes. This gain is further illustrated by comparing the benchmark performance of 2.4B models pretrained on 1T tokens with different complexity hyperparameters. Instead of fixing the initialization std, we found that a constant initialization rate (the exponent of std) enables the scaling law to descend faster in both model and data sizes. These results indicate that complexity control is a promising direction for the continual advancement of LLMs.
nan
Article 320
Title@2025-05-29 (4): BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models
Title: BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models | BA-LoRA: Bias-Alleviating Low-Rank Anpassung an Mitigate Katastrophische Vererbung in großen Sprachmodellen | BA-LORA:在大语言模型中,对减轻灾害传承的低率适应 2408.04556v5 |
Authors: Yupeng Chang, Yi Chang, Yuan Wu
Large language models (LLMs) have demonstrated remarkable proficiency across various natural language processing (NLP) tasks. However, adapting LLMs to downstream applications requires computationally intensive and memory-demanding fine-tuning procedures. To alleviate these burdens, parameter-efficient fine-tuning (PEFT) techniques have emerged as a promising approach to tailor LLMs with minimal computational overhead. While PEFT methods offer substantial advantages, they do not fully address the pervasive issue of bias propagation from pre-training data. This work introduces Bias-Alleviating Low-Rank Adaptation (BA-LoRA), a novel PEFT method designed to counteract bias inheritance. BA-LoRA incorporates three distinct regularization terms: (1) a consistency regularizer, (2) a diversity regularizer, and (3) a singular value decomposition regularizer. These regularizers aim to enhance the models’ consistency, diversity, and generalization capabilities during fine-tuning. We conduct extensive experiments on natural language understanding (NLU) and natural language generation (NLG) tasks using prominent LLMs such as LLaMA, Mistral, and Gemma. The results demonstrate that BA-LoRA outperforms LoRA and its state-of-the-art variants. Moreover, the extended experiments demonstrate that our method effectively mitigates the adverse effects of pre-training bias, leading to more reliable and robust model outputs. The code is available at https://github.com/cyp-jlu-ai/BA-LoRA.
nan
Article 321
Title@2025-05-29 (4): EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge
Title: EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge | EmergentTTS-Eval: Bewertung von TTS-Modellen auf komplexe Prosodic, Expressivität und sprachliche Herausforderungen mit Model-as-a-Judge | 新兴TTS-Eval:利用 “ 模拟即审法官 “ 评估关于复杂立案、表达性和语言挑战的TTS模型 2505.23009v1 |
Authors: Ruskin Raj Manku, Yuzhi Tang, Xingjian Shi, Mu Li, Alex Smola
Text-to-Speech (TTS) benchmarks often fail to capture how well models handle nuanced and semantically complex text. Building on EmergentTTS, we introduce EmergentTTS-Eval, a comprehensive benchmark covering six challenging TTS scenarios: emotions, paralinguistics, foreign words, syntactic complexity, complex pronunciation (e.g. URLs, formulas), and questions. Crucially, our framework automates both test-case generation and evaluation, making the benchmark easily extensible. Starting from a small set of human-written seed prompts, we iteratively extend them using LLMs to target specific structural, phonetic and prosodic challenges, resulting in 1,645 diverse test cases. Moreover, we employ a model-as-a-judge approach, using a Large Audio Language Model (LALM) to assess the speech across multiple dimensions such as expressed emotion, prosodic, intonational, and pronunciation accuracy. We evaluate state-of-the-art open-source and proprietary TTS systems, such as 11Labs, Deepgram, and OpenAI’s 4o-mini-TTS, on EmergentTTS-Eval, demonstrating its ability to reveal fine-grained performance differences. Results show that the model-as-a-judge approach offers robust TTS assessment and a high correlation with human preferences. We open source the evaluation code (https://github.com/boson-ai/EmergentTTS-Eval-public) and the dataset (https://huggingface.co/datasets/bosonai/EmergentTTS-Eval).
nan
Article 322
Title@2025-05-29 (4): QLIP: A Dynamic Quadtree Vision Prior Enhances MLLM Performance Without Retraining
Title: QLIP: A Dynamic Quadtree Vision Prior Enhances MLLM Performance Without Retraining | QLIP: Eine dynamische Quadtree Vision verbessert die MLLM-Performance ohne Umschulung | QLIP: 动态的四方愿景,事先提高MLLM业绩,不再培训 2505.23004v1 |
Authors: Kyle R. Chickering, Bangzheng Li, Muhao Chen
Multimodal Large Language Models (MLLMs) encode images into visual tokens, aligning visual and textual signals within a shared latent space to facilitate crossmodal representation learning. The CLIP model is a widely adopted foundational vision language model whose vision encoder has played a critical role in the development of MLLMs such as LLaVA. However, the CLIP vision encoder suffers from notable limitations, including being constrained to handling only fixed input resolutions and failing to produce separated embeddings for dissimilar images. Replacing the vision encoder of an existing model typically incurs substantial computational costs because such a change often necessitates retraining the entire model pipeline. In this work, we identify two factors which underlie the limitations of the CLIP vision encoder: mesoscopic bias and interpolation bias. To address these issues, we propose QLIP, a drop-in replacement for CLIP that can be seamlessly integrated with existing MLLMs with only a few lines of code and can enhance both coarse-grained and fine-grained visual understanding, without re-training. QLIP is designed around an image quadtree which replaces the standard uniform grid patches with a novel content-aware patchification. Our experimental results demonstrate that QLIP improves the general visual question answering accuracy of the LLaVA v1.5 model series across various model sizes, without requiring retraining or fine-tuning of the full MLLM. Notably, QLIP boosts detailed understanding performance on the challenging $V^{\ast}$ benchmark by up to 13.6 percent.
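The general idea of quadtree-based, content-dependent patching can be illustrated with a short recursive sketch. The split criterion used here (pixel standard deviation above a tolerance) is an assumption chosen for simplicity; the abstract does not state QLIP's actual criterion, and the function name and parameters are hypothetical.

```python
import numpy as np


def quadtree_patches(img, x, y, size, min_size, tol):
    """Recursively split a square region until it is nearly uniform or hits min_size.

    Returns a list of (x, y, size) leaf patches that tile the region.
    """
    region = img[y:y + size, x:x + size]
    if size <= min_size or region.std() <= tol:
        return [(x, y, size)]
    half = size // 2
    patches = []
    for dy in (0, half):
        for dx in (0, half):
            patches += quadtree_patches(img, x + dx, y + dy, half, min_size, tol)
    return patches


# An 8x8 image that is flat everywhere except a checkerboard in the top-left corner.
img = np.zeros((8, 8))
img[0:2, 0:2] = np.array([[0.0, 1.0], [1.0, 0.0]])

patches = quadtree_patches(img, 0, 0, 8, min_size=2, tol=0.01)
print(len(patches), patches)
```

The flat quadrants stay as single large patches while the detailed corner is subdivided, so a uniform 2x2 grid of 16 patches is replaced by 7 patches of mixed size.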
Article 323
Title@2025-05-29 (4): Universal Sequence Preconditioning
Title: Universal Sequence Preconditioning | Universelle Sequenz Vorkonditionierung | 通用序列序序预设 2502.06545v2 |
Authors: Annie Marsden, Elad Hazan
We study the problem of preconditioning in the setting of sequential prediction. From the theoretical lens of linear dynamical systems, we show that applying a convolution to the input sequence translates to applying a polynomial to the unknown transition matrix in the hidden space. With this insight, we develop a novel preconditioning method that convolves the input sequence with the coefficients of the Chebyshev or Legendre polynomials. We formally prove that this improves the regret of two distinct prediction methods. Moreover, using this preconditioning technique on either method gives the first sublinear regret bounds that are also hidden dimension independent (up to logarithmic factors) even when the hidden transition matrix is asymmetric. From rigorous experiments on synthetic data we show that our simple preconditioning method generalizes to both 1) settings where the data is not from a linear dynamical system and 2) a broad range of learning algorithms, including recurrent neural networks.
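The claim that convolving the sequence amounts to applying a polynomial to the transition matrix can be checked numerically on a toy system. The sketch below assumes a noiseless linear dynamical system with a diagonal transition matrix and no inputs, and uses monic Chebyshev coefficients; the paper's exact normalization and setting may differ, and all function names are hypothetical.

```python
def chebyshev_coeffs(n):
    """Coefficients of the Chebyshev polynomial T_n, highest degree first."""
    prev, cur = [1.0], [0.0, 1.0]  # T_0 and T_1 in ascending powers
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0.0] + [2.0 * c for c in cur]  # 2x * T_k
        for i, c in enumerate(prev):
            nxt[i] -= c                        # minus T_{k-1}
        prev, cur = cur, nxt
    return cur[::-1]


def monic(coeffs):
    """Rescale so the leading coefficient is 1."""
    return [c / coeffs[0] for c in coeffs]


def poly_eval(coeffs, x):
    """Horner evaluation; coeffs are highest degree first."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc


def lds_outputs(eigs, c_vec, x0, horizon):
    """y_t = C A^t x_0 for a diagonal A with the given eigenvalues."""
    return [sum(c * x * lam ** t for c, x, lam in zip(c_vec, x0, eigs))
            for t in range(horizon)]


def convolve_at(y, coeffs, t):
    """Preconditioned output at time t: sum_j coeffs[j] * y[t - j]."""
    return sum(coeffs[j] * y[t - j] for j in range(len(coeffs)))


def via_polynomial(eigs, c_vec, x0, coeffs, t):
    """C p(A) x_{t-n}, where p has the given coefficients and degree n."""
    n = len(coeffs) - 1
    return sum(c * poly_eval(coeffs, lam) * x * lam ** (t - n)
               for c, x, lam in zip(c_vec, x0, eigs))


eigs, c_vec, x0 = [0.9, -0.5, 0.3], [1.0, 1.0, 1.0], [1.0, 2.0, -1.0]
coeffs = monic(chebyshev_coeffs(4))
y = lds_outputs(eigs, c_vec, x0, 12)
print(convolve_at(y, coeffs, 10), via_polynomial(eigs, c_vec, x0, coeffs, 10))
```

The two printed values agree up to floating-point error. Because a monic Chebyshev polynomial is uniformly small on $[-1, 1]$, the factor $p(A)$ shrinks the contribution of every eigenvalue in that interval.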
Article 324
Title@2025-05-29 (4): Hybrid Cross-domain Robust Reinforcement Learning
Title: Hybrid Cross-domain Robust Reinforcement Learning | Hybrides Cross-Domain Robustes Verstärkungslernen | 跨部门加强强化学习 2505.23003v1 |
Authors: Linh Le Pham Van, Minh Hoang Nguyen, Hung Le, Hung The Tran, Sunil Gupta
Robust reinforcement learning (RL) aims to learn policies that remain effective despite uncertainties in their environment, which frequently arise in real-world applications due to variations in environment dynamics. Robust RL methods learn a robust policy by maximizing value under the worst-case models within a predefined uncertainty set. Offline robust RL algorithms are particularly promising in scenarios where only a fixed dataset is available and new data cannot be collected. However, these approaches often require extensive offline data, and gathering such datasets for specific tasks in specific environments can be both costly and time-consuming. Using an imperfect simulator offers a faster, cheaper, and safer way to collect data for training, but it can suffer from dynamics mismatch. In this paper, we introduce HYDRO, the first Hybrid Cross-Domain Robust RL framework designed to address these challenges. HYDRO utilizes an online simulator to complement the limited amount of offline datasets in the non-trivial context of robust RL. By measuring and minimizing performance gaps between the simulator and the worst-case models in the uncertainty set, HYDRO employs novel uncertainty filtering and prioritized sampling to select the most relevant and reliable simulator samples. Our extensive experiments demonstrate HYDRO’s superior performance over existing methods across various tasks, underscoring its potential to improve sample efficiency in offline robust RL.
Article 325
Title@2025-05-29 (4): Improved and Oracle-Efficient Online $\ell_1$-Multicalibration
Title: Improved and Oracle-Efficient Online $\ell_1$-Multicalibration | Verbesserte und Oracle-Effizient Online $\ell_1$-Multikalibrierung | 改进和 Oracle-Efficient 在线 $\ell_1$-多边校准 2505.17365v2 |
Authors: Rohan Ghuge, Vidya Muthukumar, Sahil Singla
We study \emph{online multicalibration}, a framework for ensuring calibrated predictions across multiple groups in adversarial settings, across $T$ rounds. Although online calibration is typically studied in the $\ell_1$ norm, prior approaches to online multicalibration have taken the indirect approach of obtaining rates in other norms (such as $\ell_2$ and $\ell_{\infty}$) and then transferring these guarantees to $\ell_1$ at additional loss. In contrast, we propose a direct method that achieves improved and oracle-efficient rates of $\widetilde{\mathcal{O}}(T^{-1/3})$ and $\widetilde{\mathcal{O}}(T^{-1/4})$ respectively, for online $\ell_1$-multicalibration. Our key insight is a novel reduction of online $\ell_1$-multicalibration to an online learning problem with product-based rewards, which we refer to as \emph{online linear-product optimization} ($\mathtt{OLPO}$). To obtain the improved rate of $\widetilde{\mathcal{O}}(T^{-1/3})$, we introduce a linearization of $\mathtt{OLPO}$ and design a no-regret algorithm for this linearized problem. Although this method guarantees the desired sublinear rate (nearly matching the best rate for online calibration), it is computationally expensive when the group family $\mathcal{H}$ is large or infinite, since it enumerates all possible groups. To address scalability, we propose a second approach to $\mathtt{OLPO}$ that makes only a polynomial number of calls to an offline optimization (\emph{multicalibration evaluation}) oracle, resulting in \emph{oracle-efficient} online $\ell_1$-multicalibration with a rate of $\widetilde{\mathcal{O}}(T^{-1/4})$. Our framework also extends to certain infinite families of groups (e.g., all linear functions on the context space) by exploiting a $1$-Lipschitz property of the $\ell_1$-multicalibration error with respect to $\mathcal{H}$.
Article 326
Title@2025-05-29 (4): Dolphin: A Programmable Framework for Scalable Neurosymbolic Learning
Title: Dolphin: A Programmable Framework for Scalable Neurosymbolic Learning | Dolphin: Ein programmierbares Framework für skalierbares neurosymbolisches Lernen | Dolphin: 可缩放的神经元学习程序框架 2410.03348v4 |
Authors: Aaditya Naik, Jason Liu, Claire Wang, Amish Sethi, Saikat Dutta, Mayur Naik, Eric Wong
Neurosymbolic learning enables the integration of symbolic reasoning with deep learning but faces significant challenges in scaling to complex symbolic programs, large datasets, or both. We introduce DOLPHIN, a framework that tackles these challenges by supporting neurosymbolic programs in Python, executing complex symbolic reasoning on the CPU while vectorizing probabilistic computations and gradient propagation on the GPU. Across 13 benchmarks spanning tasks over text, image, and video data, with symbolic reasoning features like recursion and black-box functions, DOLPHIN converges to state-of-the-art accuracies on the more complex benchmarks while existing frameworks such as Scallop, ISED, and IndeCateR+ fail to converge within the time limit. On simpler benchmarks, DOLPHIN matches their performance, while achieving these results 1.71x to 62x faster than the baselines. Overall, DOLPHIN advances the scalability of neurosymbolic frameworks, achieving state-of-the-art efficiency and convergence on difficult benchmarks where existing frameworks struggle. The code is published at https://github.com/Dolphin-NeSy/Dolphin.
Article 327
Title@2025-05-29 (4): A Bayesian Model Selection Criterion for Selecting Pretraining Checkpoints
Title: A Bayesian Model Selection Criterion for Selecting Pretraining Checkpoints | Ein Bayesian Modellauswahl-Kriterium für die Auswahl von Vortrainings-Checkpoints | 选择培训前检查站的巴伊西亚示范甄选标准标准 2410.05612v2 |
Authors: Michael Munn, Susan Wei
Recent advances in artificial intelligence have been fueled by the development of foundation models such as BERT, GPT, T5, and Vision Transformers. These models are first pretrained on vast and diverse datasets and then adapted to specific downstream tasks, often with significantly less data. However, the mechanisms behind the success of this ubiquitous pretrain-then-adapt paradigm remain underexplored, particularly the characteristics of pretraining checkpoints that enhance downstream adaptation. We introduce a Bayesian model selection criterion, called the downstream free energy, which quantifies a checkpoint’s adaptability by measuring the concentration of nearby favorable parameters for the downstream task. We demonstrate that this Bayesian model selection criterion can be effectively implemented without access to the downstream data or prior knowledge of the downstream task. Furthermore, we provide empirical evidence that the criterion reliably correlates with improved finetuning performance, offering a principled approach to predicting model adaptability.
Article 328
Title@2025-05-29 (4): HydraNet: Momentum-Driven State Space Duality for Multi-Granularity Tennis Tournaments Analysis
Title: HydraNet: Momentum-Driven State Space Duality for Multi-Granularity Tennis Tournaments Analysis | HydraNet: Momentum-getriebene State Space-Dualität für Multi-Granularity-Tennisturniere Analyse | HydraNet: 面向多粒度网球赛事分析的动量驱动状态空间对偶 2505.21882v2 |
Authors: Ruijie Li, Xiang Zhao, Qiao Ning, Shikai Guo
In tennis tournaments, momentum, a critical yet elusive phenomenon, reflects the dynamic shifts in athletes' performance that can decisively influence match outcomes. Despite its significance, effective modeling and multi-granularity analysis of momentum across points, games, sets, and matches in tennis tournaments remain underexplored. In this study, we define a novel Momentum Score (MS) metric to quantify a player’s momentum level in multi-granularity tennis tournaments, and design HydraNet, a momentum-driven state-space duality-based framework, to model MS by integrating thirty-two heterogeneous dimensions of athletes' performance in serve, return, psychology and fatigue. HydraNet integrates a Hydra module, which builds upon a state-space duality (SSD) framework, capturing explicit momentum with a sliding-window mechanism and implicit momentum through cross-game state propagation. It also introduces a novel Versus Learning method to better capture the adversarial nature of momentum between the two athletes at a macro level, along with a Collaborative-Adversarial Attention Mechanism (CAAM) for capturing and integrating intra-player and inter-player dynamic momentum at a micro level. Additionally, we construct a million-level tennis cross-tournament dataset spanning Wimbledon 2012-2023 and the US Open 2013-2023, and validate the multi-granularity modeling capability of HydraNet for the MS metric on this dataset. Extensive experimental evaluations demonstrate that the MS metric constructed by the HydraNet framework provides actionable insights into how momentum impacts outcomes at different granularities, establishing a new foundation for momentum modeling and sports analysis. To the best of our knowledge, this is the first work to explore and effectively model momentum across multiple granularities in professional tennis tournaments.
Article 329
Title@2025-05-29 (4): Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Title: Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment | Jenseits der Belohnung Hacking: Kausale Belohnungen für großsprachige Modellausrichtung | 优胜后加分:大语言模型对齐的因果奖励 2501.09620v2 |
Authors: Chaoqi Wang, Zhuokai Zhao, Yibo Jiang, Zhaorun Chen, Chen Zhu, Yuxin Chen, Jiayi Liu, Lizhu Zhang, Xiangjun Fan, Hao Ma, Sinong Wang
Recent advances in large language models (LLMs) have demonstrated significant progress in performing complex tasks. While Reinforcement Learning from Human Feedback (RLHF) has been effective in aligning LLMs with human preferences, it is susceptible to spurious correlations in reward modeling. Consequently, it often introduces biases (such as length bias, sycophancy, conceptual bias, and discrimination) that hinder the model’s ability to capture true causal relationships. To address this, we propose a novel causal reward modeling approach that integrates causality to mitigate these spurious correlations. Our method enforces counterfactual invariance, ensuring reward predictions remain consistent when irrelevant variables are altered. Through experiments on both synthetic and real-world datasets, we show that our approach mitigates various types of spurious correlations effectively, resulting in more reliable and fair alignment of LLMs with human preferences. As a drop-in enhancement to the existing RLHF workflow, our causal reward modeling provides a practical way to improve the trustworthiness and fairness of LLM finetuning.
Article 330
Title@2025-05-29 (4): ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
Title: ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning | ReinFlow: Feinsteuerungs-Flow Matching-Politik mit Online-Verstärkungs-Lernen | ReinFlow: 与在线强化学习匹配流动政策的微调 2505.22094v2 |
Authors: Tonghe Zhang, Chao Yu, Sichang Su, Yu Wang
We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy’s deterministic path, converting the flow into a discrete-time Markov Process for exact and straightforward likelihood computation. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly at very few or even one denoising step. We benchmark ReinFlow in representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. The episode reward of Rectified Flow policies obtained an average net growth of 135.36% after fine-tuning in challenging legged locomotion tasks while saving denoising steps and 82.63% of wall time compared to the state-of-the-art diffusion RL fine-tuning method DPPO [43]. The success rate of the Shortcut Model policies in state and visual manipulation tasks achieved an average net increase of 40.34% after fine-tuning with ReinFlow at four or even one denoising step, with performance comparable to fine-tuned DDIM policies while saving an average of 23.20% of computation time. Project webpage: https://reinflow.github.io/
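The noise-injection idea can be illustrated on a one-dimensional toy flow. In the sketch below, each Euler step of a deterministic flow becomes a Gaussian transition, so the log-likelihood of a sampled path is a sum of Gaussian log-densities. The velocity field, the fixed noise scale `sigma` (learnable in the abstract's description), and every function name are assumptions made for illustration, not ReinFlow's implementation.

```python
import math
import random


def velocity(x, t):
    """Hypothetical velocity field that pulls x toward 1.0."""
    return 1.0 - x


def gaussian_logpdf(x, mean, std):
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


def sample_noisy_path(x0, steps, sigma, rng):
    """Euler integration with Gaussian noise injected at every step.

    Returns the visited states and the exact log-likelihood of the path
    under the resulting discrete-time Markov process.
    """
    dt = 1.0 / steps
    path, logp, x = [x0], 0.0, x0
    for k in range(steps):
        mean = x + velocity(x, k * dt) * dt   # deterministic flow step
        x_next = rng.gauss(mean, sigma)       # injected noise
        logp += gaussian_logpdf(x_next, mean, sigma)
        path.append(x_next)
        x = x_next
    return path, logp


def path_log_likelihood(path, sigma):
    """Recompute the log-likelihood of a given path, e.g. for a policy-gradient update."""
    steps = len(path) - 1
    dt = 1.0 / steps
    total = 0.0
    for k in range(steps):
        mean = path[k] + velocity(path[k], k * dt) * dt
        total += gaussian_logpdf(path[k + 1], mean, sigma)
    return total


path, logp = sample_noisy_path(0.0, steps=4, sigma=0.1, rng=random.Random(0))
print(len(path), abs(logp - path_log_likelihood(path, 0.1)) < 1e-9)
```

Because every transition has a closed-form density, the path likelihood needed by likelihood-based RL updates is available without numerically solving the flow's change-of-variables integral.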
Article 331
Title@2025-05-29 (4): Is Attention Required for Transformer Inference? Explore Function-preserving Attention Replacement
Title: Is Attention Required for Transformer Inference? Explore Function-preserving Attention Replacement | Ist Achtung für Transformer-Inferenz erforderlich? Erkunden Sie Funktionserhaltende Aufmerksamkeitsersatz | 需要注意吗? 探索功能保持注意替换 2505.21535v2 |
Authors: Yuxin Ren, Maxwell D Collins, Miao Hu, Huanrui Yang
While transformers excel across vision and language pretraining tasks, their reliance on attention mechanisms poses challenges for inference efficiency, especially on edge and embedded accelerators with limited parallelism and memory bandwidth. Motivated by the observed redundancy of attention at inference time, we hypothesize that although the model learns complicated token dependencies through pretraining, the inference-time sequence-to-sequence mapping in each attention layer is actually “simple” enough to be represented with a much cheaper function. In this work, we explore FAR, a Function-preserving Attention Replacement framework that replaces all attention blocks in pretrained transformers with learnable sequence-to-sequence modules, exemplified by an LSTM. FAR optimizes a multi-head LSTM architecture with a block-wise distillation objective and a global structural pruning framework to obtain a family of efficient LSTM-based models from pretrained transformers. We validate FAR on the DeiT vision transformer family and demonstrate that it matches the accuracy of the original models on ImageNet and multiple downstream tasks with reduced parameters and latency. Further analysis shows that FAR preserves the semantic token relationships and the token-to-token correlation learned in the transformer’s attention module.
Article 332
Title@2025-05-29 (4): LLM Agents for Bargaining with Utility-based Feedback
Title: LLM Agents for Bargaining with Utility-based Feedback | LLM-Agenten für Schnäppchen mit Utility-basiertem Feedback | LLM 与基于利用的反馈进行交涉的代理代理 2505.22998v1 |
Authors: Jihwan Oh, Murad Aghazada, Se-Young Yun, Taehyeon Kim
Bargaining, a critical aspect of real-world interactions, presents challenges for large language models (LLMs) due to limitations in strategic depth and adaptation to complex human factors. Existing benchmarks often fail to capture this real-world complexity. To address this and enhance LLM capabilities in realistic bargaining, we introduce a comprehensive framework centered on utility-based feedback. Our contributions are threefold: (1) BargainArena, a novel benchmark dataset with six intricate scenarios (e.g., deceptive practices, monopolies) to facilitate diverse strategy modeling; (2) human-aligned, economically-grounded evaluation metrics inspired by utility theory, incorporating agent utility and negotiation power, which implicitly reflect and promote opponent-aware reasoning (OAR); and (3) a structured feedback mechanism enabling LLMs to iteratively refine their bargaining strategies. This mechanism can positively collaborate with in-context learning (ICL) prompts, including those explicitly designed to foster OAR. Experimental results show that LLMs often exhibit negotiation strategies misaligned with human preferences, and that our structured feedback mechanism significantly improves their performance, yielding deeper strategic and opponent-aware reasoning.
Article 333
Title@2025-05-29 (4): Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features
Title: Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features | Theoretische Grundlagen des Deep Copula Klassifikators: Ein generativer Ansatz zur Modellierung abhängiger Merkmale | 深 Cocula 分类法理论基础:建模附属地貌的开创性方法 2505.22997v1 |
Authors: Agnideep Aich, Ashit Baran Aich, Bruce Wade
Traditional classifiers often assume feature independence or rely on overly simplistic relationships, leading to poor performance in settings where real-world dependencies matter. We introduce the Deep Copula Classifier (DCC), a generative model that separates the learning of each feature’s marginal distribution from the modeling of their joint dependence structure via neural network-parameterized copulas. For each class, lightweight neural networks are used to flexibly and adaptively capture feature interactions, making DCC particularly effective when classification is driven by complex dependencies. We establish that DCC converges to the Bayes-optimal classifier under standard conditions and provide explicit convergence rates of $O(n^{-r/(2r + d)})$ for $r$-smooth copula densities. Beyond theoretical guarantees, we outline several practical extensions, including high-dimensional scalability through vine and factor copula architectures, semi-supervised learning via entropy regularization, and online adaptation using streaming gradient methods. By unifying statistical rigor with the representational power of neural networks, DCC offers a mathematically grounded and interpretable framework for dependency-aware classification.
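The separation of marginals from dependence structure can be written out using Sklar's theorem. The notation below is assumed for illustration and is not taken from the paper: $F_{j\mid y}$ and $f_{j\mid y}$ denote the class-conditional marginal CDF and density of feature $j$, $c_y$ the class-specific copula density, and $\pi_y$ the class prior.

```latex
f(x \mid y) = c_y\bigl(F_{1\mid y}(x_1), \dots, F_{d\mid y}(x_d)\bigr) \prod_{j=1}^{d} f_{j\mid y}(x_j),
\qquad
\hat{y}(x) = \arg\max_{y} \; \pi_y \, f(x \mid y).
```

Setting $c_y \equiv 1$ recovers a naive-Bayes-style classifier with independent features, so the copula term is exactly the part that carries the feature dependencies.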
Article 334
Title@2025-05-29 (4): Walking the Weight Manifold: a Topological Approach to Conditioning Inspired by Neuromodulation
Title: Walking the Weight Manifold: a Topological Approach to Conditioning Inspired by Neuromodulation | Wiege manifold gehen: ein topologischer Ansatz zur Konditionierung Inspiriert durch Neuromodulation | 身穿轻重背重力:在神经调节的启发下,从地形学角度处理条件问题 2505.22994v1 |
Authors: Ari S. Benjamin, Kyle Daruwalla, Christian Pehle, Anthony M. Zador
One frequently wishes to learn a range of similar tasks as efficiently as possible, re-using knowledge across tasks. In artificial neural networks, this is typically accomplished by conditioning a network upon task context by injecting context as input. Brains have a different strategy: the parameters themselves are modulated as a function of various neuromodulators such as serotonin. Here, we take inspiration from neuromodulation and propose to learn weights which are smoothly parameterized functions of task context variables. Rather than optimize a weight vector, i.e. a single point in weight space, we optimize a smooth manifold in weight space with a predefined topology. To accomplish this, we derive a formal treatment of optimization of manifolds as the minimization of a loss functional subject to a constraint on volumetric movement, analogous to gradient descent. During inference, conditioning selects a single point on this manifold which serves as the effective weight matrix for a particular sub-task. This strategy for conditioning has two main advantages. First, the topology of the manifold (whether a line, circle, or torus) is a convenient lever for inductive biases about the relationship between tasks. Second, learning in one state smoothly affects the entire manifold, encouraging generalization across states. To verify this, we train manifolds with several topologies, including straight lines in weight space (for conditioning on e.g. noise level in input data) and ellipses (for rotated images). Despite their simplicity, these parameterizations outperform conditioning identical networks by input concatenation and better generalize to out-of-distribution samples. These results suggest that modulating weights over low-dimensional manifolds offers a principled and effective alternative to traditional conditioning.
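As a rough illustration of conditioning by selecting a point on a weight manifold, the sketch below learns a straight line in weight space: the effective weights for context $c \in [0, 1]$ are an interpolation between two learned endpoints. It uses plain stochastic gradient descent on the endpoints rather than the paper's constrained manifold optimization, and the toy regression task, learning rate, and names are all assumptions.

```python
import random


def weights_on_line(w_a, w_b, c):
    """Effective weights at context c: a point on the segment from w_a to w_b."""
    return [a + c * (b - a) for a, b in zip(w_a, w_b)]


def train_line_manifold(steps=5000, lr=0.5, seed=0):
    """Fit both endpoints of the line on a toy task whose true weights vary with c."""
    rng = random.Random(seed)
    w_a, w_b = [0.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        c = rng.random()                                 # task context
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]     # input
        target_w = [2.0 + 3.0 * c, -1.0 + c]             # hypothetical ground truth
        y = sum(t * xi for t, xi in zip(target_w, x))
        w = weights_on_line(w_a, w_b, c)
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad_w = err * xi                            # dL/dw_i for squared loss
            w_a[i] -= lr * (1.0 - c) * grad_w            # chain rule through the line
            w_b[i] -= lr * c * grad_w
    return w_a, w_b


w_a, w_b = train_line_manifold()
print([round(v, 3) for v in w_a], [round(v, 3) for v in w_b])
```

Each sample updates both endpoints, weighted by where its context falls on the line, so learning at one context value shifts the weights used at every other value. This is the kind of coupling across states that the abstract credits for improved generalization.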
Article 335
Title@2025-05-29 (4): Number of Clusters in a Dataset: A Regularized K-means Approach
Title: Number of Clusters in a Dataset: A Regularized K-means Approach | Anzahl der Cluster in einem Datensatz: Ein regularisierter K-Mittelansatz | 数据集中的组群数量:正规化的K手段方法 2505.22991v1 |
Authors: Behzad Kamgar-Parsi, Behrooz Kamgar-Parsi
Finding the number of meaningful clusters in an unlabeled dataset is important in many applications. The regularized k-means algorithm is an approach frequently used to find the correct number of distinct clusters in datasets. The most common formulation of the regularization function is the additive linear term $\lambda k$, where $k$ is the number of clusters and $\lambda$ a positive coefficient. Currently, there are no principled guidelines for setting a value for the critical hyperparameter $\lambda$. In this paper, we derive rigorous bounds for $\lambda$ assuming clusters are \emph{ideal}. Ideal clusters (defined as $d$-dimensional spheres with identical radii) are close proxies for k-means clusters ($d$-dimensional spherically symmetric distributions with identical standard deviations). Experiments show that the k-means algorithm with an additive regularizer often yields multiple solutions. Thus, we also analyze the k-means algorithm with a multiplicative regularizer. The consensus among k-means solutions with additive and multiplicative regularizations reduces the ambiguity of multiple solutions in certain cases. We also present selected experiments that demonstrate the performance of the regularized k-means algorithms as clusters deviate from the ideal assumption.
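The additive formulation can be sketched as follows: run k-means for each candidate $k$, add $\lambda k$ to the within-cluster sum of squared errors, and keep the $k$ with the lowest total. The one-dimensional Lloyd iteration, the quantile initialization, and the toy data are assumptions for illustration. The sketch does not reproduce the paper's bounds on $\lambda$.

```python
def kmeans_1d(points, k, iters=50):
    """Lloyd's algorithm in 1-D with a deterministic quantile initialization.

    Returns the within-cluster sum of squared errors (SSE).
    """
    pts = sorted(points)
    n = len(pts)
    centers = [pts[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sum(min((p - c) ** 2 for c in centers) for p in pts)


def choose_k(points, lam, k_max):
    """Pick k minimizing SSE(k) + lam * k, the additive regularized objective."""
    costs = {k: kmeans_1d(points, k) + lam * k for k in range(1, k_max + 1)}
    return min(costs, key=costs.get)


# Three tight, well-separated groups around 0, 10 and 20.
data = [center + d for center in (0.0, 10.0, 20.0) for d in (-0.2, -0.1, 0.0, 0.1, 0.2)]
print(choose_k(data, lam=1.0, k_max=6), choose_k(data, lam=2000.0, k_max=6))
```

On this data a moderate $\lambda$ selects three clusters, while a very large $\lambda$ collapses the solution to a single cluster. That sensitivity is why principled bounds on $\lambda$ matter.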
Article 336
Title@2025-05-29 (4): MenTeR: A fully-automated Multi-agenT workflow for end-to-end RF/Analog Circuits Netlist Design
Title: MenTeR: A fully-automated Multi-agenT workflow for end-to-end RF/Analog Circuits Netlist Design | MenTeR: Ein vollautomatisierter Multi-AgenT-Workflow für End-to-End-RF/Analog-Schaltungen Netlist Design | MenTeR: 终端至终端RF/Analog 电路网络列表设计全自动多元T工作流程 2505.22990v1 |
Authors: Pin-Han Chen, Yu-Sheng Lin, Wei-Cheng Lee, Tin-Yu Leu, Po-Hsiang Hsu, Anjana Dissanayake, Sungjin Oh, Chinq-Shiun Chiu
RF/Analog design is essential for bridging digital technologies with real-world signals, ensuring the functionality and reliability of a wide range of electronic systems. However, analog design procedures are often intricate, time-consuming and reliant on expert intuition, which hinders the time and cost efficiency of circuit development. To overcome the limitations of manual circuit design, we introduce MenTeR, a multi-agent workflow integrated into an end-to-end analog design framework. By employing multiple specialized AI agents that collaboratively address different aspects of the design process, such as specification understanding, circuit optimization, and test bench validation, MenTeR reduces the dependency on frequent trial-and-error-style intervention. MenTeR not only shortens the design cycle but also facilitates a broader exploration of the design space, demonstrating robust capabilities in handling real-world analog systems. We believe that MenTeR lays the groundwork for future “RF/Analog Copilots” that can collaborate seamlessly with human designers.
Article 337
Title@2025-05-29 (4): Effects of Dropout on Performance in Long-range Graph Learning Tasks
Title: Effects of Dropout on Performance in Long-range Graph Learning Tasks | Auswirkungen des Dropouts auf die Leistungsfähigkeit in großflächigen Graphen-Lernaufgaben | 辍学对远程图表学习任务绩效的影响 2502.07364v2 |
Authors: Jasraj Singh, Keyue Jiang, Brooks Paige, Laura Toni
Message Passing Neural Networks (MPNNs) are a class of Graph Neural Networks (GNNs) that propagate information across the graph via local neighborhoods. The scheme gives rise to two key challenges: over-smoothing and over-squashing. While several Dropout-style algorithms, such as DropEdge and DropMessage, have successfully addressed over-smoothing, their impact on over-squashing remains largely unexplored. This represents a critical gap in the literature, as failure to mitigate over-squashing would make these methods unsuitable for long-range tasks – the intended use case of deep MPNNs. In this work, we study the aforementioned algorithms, and closely related edge-dropping algorithms – DropNode, DropAgg and DropGNN – in the context of over-squashing. We present theoretical results showing that DropEdge-variants reduce sensitivity between distant nodes, limiting their suitability for long-range tasks. To address this, we introduce DropSens, a sensitivity-aware variant of DropEdge that explicitly controls the proportion of information lost due to edge-dropping, thereby increasing sensitivity to distant nodes despite dropping the same number of edges. Our experiments on long-range synthetic and real-world datasets confirm the predicted limitations of existing edge-dropping and feature-dropping methods. Moreover, DropSens consistently outperforms graph rewiring techniques designed to mitigate over-squashing, suggesting that simple, targeted modifications can substantially improve a model’s ability to capture long-range interactions. Our conclusions highlight the need to re-evaluate and re-design existing methods for training deep GNNs, with a renewed focus on modelling long-range interactions.
Article 338
Title@2025-05-29 (4): Model-Preserving Adaptive Rounding
Title: Model-Preserving Adaptive Rounding | Modellschonende adaptive Rundung | 模型保护适应性四舍五入 2505.22988v1 |
Authors: Albert Tseng, Zhaofeng Sun, Christopher De Sa
The main goal of post-training quantization (PTQ) is to produce a compressed model whose output distribution is as close to the original model’s as possible. To do this tractably, almost all LLM PTQ algorithms quantize linear layers by independently minimizing the immediate activation error. However, this localized objective ignores the effect of subsequent layers, so reducing it does not necessarily give a closer model. In this work, we introduce Yet Another Quantization Algorithm (YAQA), an adaptive rounding algorithm that uses Kronecker-factored approximations of each linear layer’s Hessian with respect to the \textit{full model} KL divergence. YAQA consists of two components: Kronecker-factored sketches of the full layerwise Hessian that can be tractably computed for hundred-billion parameter LLMs, and a quantizer-independent rounding algorithm that uses these sketches and comes with theoretical guarantees. Across a wide range of models and quantizers, YAQA empirically reduces the KL divergence to the original model by $\approx 30\%$ while achieving state-of-the-art performance on downstream tasks.
Article 339
Title@2025-05-29 (4): Knowledge Distillation for Reservoir-based Classifier: Human Activity Recognition
Title: Knowledge Distillation for Reservoir-based Classifier: Human Activity Recognition | Wissensdestillation für Reservoir-basierte Klassifikator: Menschliche Aktivitätserkennung | 以储量为基础的分类法知识蒸馏:人类活动认识 2505.22985v1 |
Authors: Masaharu Kagiyama, Tsuyoshi Okita
This paper aims to develop an energy-efficient classifier for time-series data by introducing PatchEchoClassifier, a novel model that leverages a reservoir-based mechanism known as the Echo State Network (ESN). The model is designed for human activity recognition (HAR) using one-dimensional sensor signals and incorporates a tokenizer to extract patch-level representations. To train the model efficiently, we propose a knowledge distillation framework that transfers knowledge from a high-capacity MLP-Mixer teacher to the lightweight reservoir-based student model. Experimental evaluations on multiple HAR datasets demonstrate that our model achieves over 80 percent accuracy while significantly reducing computational cost. Notably, PatchEchoClassifier requires only about one-sixth of the floating point operations (FLOPS) compared to DeepConvLSTM, a widely used convolutional baseline. These results suggest that PatchEchoClassifier is a promising solution for real-time and energy-efficient human activity recognition in edge computing environments.
Article 340
Title@2025-05-29 (4): A Computational Approach to Improving Fairness in K-means Clustering
Title: A Computational Approach to Improving Fairness in K-means Clustering | Ein Computational Approach zur Verbesserung der Fairness im K-Mittel-Clustering | 改进K类手段分类组合的公平性计算方法 2505.22984v1 |
Authors: Guancheng Zhou, Haiping Xu, Hongkang Xu, Chenyu Li, Donghui Yan
The popular K-means clustering algorithm potentially suffers from a major weakness for further analysis or interpretation. Some clusters may have disproportionately more (or fewer) points from one of the subpopulations in terms of some sensitive variable, e.g., gender or race. Such a fairness issue may cause bias and unexpected social consequences. This work attempts to improve the fairness of K-means clustering with a two-stage optimization formulation: clustering first and then adjusting the cluster membership of a small subset of selected data points. Two computationally efficient algorithms are proposed for identifying those data points that are expensive for fairness, with one focusing on nearest data points outside of a cluster and the other on highly ‘mixed’ data points. Experiments on benchmark datasets show substantial improvement in fairness with minimal impact on clustering quality. The proposed algorithms can be easily extended to a broad class of clustering algorithms or fairness metrics.
Article 341
Title@2025-05-29 (4): MedRAX: Medical Reasoning Agent for Chest X-ray
Title: MedRAX: Medical Reasoning Agent for Chest X-ray | MedRAX: Medizinischer Reasoning Agent für Bruströntgen | MedraX: 胸前X光医疗理疗代理 2502.02673v2 |
Authors: Adibvafa Fallahpour, Jun Ma, Alif Munim, Hongwei Lyu, Bo Wang
Chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care. While recent innovations have led to specialized models for various CXR interpretation tasks, these solutions often operate in isolation, limiting their practical utility in clinical practice. We present MedRAX, the first versatile AI agent that seamlessly integrates state-of-the-art CXR analysis tools and multimodal large language models into a unified framework. MedRAX dynamically leverages these models to address complex medical queries without requiring additional training. To rigorously evaluate its capabilities, we introduce ChestAgentBench, a comprehensive benchmark containing 2,500 complex medical queries across 7 diverse categories. Our experiments demonstrate that MedRAX achieves state-of-the-art performance compared to both open-source and proprietary models, representing a significant step toward the practical deployment of automated CXR interpretation systems. Data and code are publicly available at https://github.com/bowang-lab/MedRAX
Article 342
Title@2025-05-29 (4): Theoretical guarantees on the best-of-n alignment policy
Title: Theoretical guarantees on the best-of-n alignment policy | Theoretische Garantien für die optimale Ausrichtungspolitik | 关于最佳协调政策理论保障 2401.01879v3 |
Authors: Ahmad Beirami, Alekh Agarwal, Jonathan Berant, Alexander D’Amour, Jacob Eisenstein, Chirag Nagpal, Ananda Theertha Suresh
A simple and effective method for inference-time alignment and test-time compute scaling of generative models is best-of-$n$ sampling, where $n$ samples are drawn from a reference policy, ranked based on a reward function, and the highest ranking one is selected. A commonly used analytical expression in the literature claims that the KL divergence between the best-of-$n$ policy and the reference policy is equal to $\log (n) - (n-1)/n$. We disprove the validity of this claim, and show that it is an upper bound on the actual KL divergence. We also explore the tightness of this upper bound in different regimes, propose a new estimator for the KL divergence, and empirically show that it provides a tight approximation. We also show that the win rate of the best-of-$n$ policy against the reference policy is upper bounded by $n/(n+1)$ and derive bounds on the tightness of this characterization. We conclude by analyzing the tradeoffs between win rate and KL divergence of the best-of-$n$ alignment policy, which demonstrate that very good tradeoffs are achievable with $n < 1000$.
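The two closed-form expressions quoted in this abstract can be tabulated directly. The sketch below is only an illustration, not the authors' code: the function names are hypothetical, and the simulation assumes a toy setting in which rewards are i.i.d. continuous uniform draws with no ties, which is an assumption made here and not something stated in the abstract. In that toy setting the simulated win rate should sit close to $n/(n+1)$.

```python
import math
import random


def kl_upper_bound(n: int) -> float:
    """log(n) - (n - 1)/n, which the abstract describes as an upper bound on the KL divergence."""
    return math.log(n) - (n - 1) / n


def win_rate_upper_bound(n: int) -> float:
    """n / (n + 1), the stated upper bound on the best-of-n win rate."""
    return n / (n + 1)


def simulated_win_rate(n: int, trials: int = 20000, seed: int = 0) -> float:
    """Toy Monte Carlo estimate: rewards are i.i.d. uniform draws (an assumption)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        best_of_n = max(rng.random() for _ in range(n))  # reward of the selected sample
        reference = rng.random()                         # reward of one reference sample
        wins += best_of_n > reference
    return wins / trials


if __name__ == "__main__":
    for n in (1, 2, 4, 16, 64):
        print(n, round(kl_upper_bound(n), 4), round(win_rate_upper_bound(n), 4),
              round(simulated_win_rate(n), 4))
```

Both expressions grow slowly in $n$: the KL expression grows logarithmically while the win-rate bound approaches 1.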
Article 343
Title@2025-05-29 (4): Learning coordinated badminton skills for legged manipulators
Title: Learning coordinated badminton skills for legged manipulators | Koordinierte Badminton-Fähigkeiten für Legged Manipulatoren lernen | 为腿脚操纵者学习协调的羽毛球技能 2505.22974v1 |
Authors: Yuntao Ma, Andrei Cramariuc, Farbod Farshidian, Marco Hutter
Coordinating the motion between lower and upper limbs and aligning limb control with perception are substantial challenges in robotics, particularly in dynamic environments. To this end, we introduce an approach for enabling legged mobile manipulators to play badminton, a task that requires precise coordination of perception, locomotion, and arm swinging. We propose a unified reinforcement learning-based control policy for whole-body visuomotor skills involving all degrees of freedom to achieve effective shuttlecock tracking and striking. This policy is informed by a perception noise model that utilizes real-world camera data, allowing for consistent perception error levels between simulation and deployment and encouraging learned active perception behaviors. Our method includes a shuttlecock prediction model, constrained reinforcement learning for robust motion control, and integrated system identification techniques to enhance deployment readiness. Extensive experimental results in a variety of environments validate the robot’s capability to predict shuttlecock trajectories, navigate the service area effectively, and execute precise strikes against human players, demonstrating the feasibility of using legged mobile manipulators in complex and dynamic sports scenarios.
Article 344
Title@2025-05-29 (4): EquiReg: Equivariance Regularized Diffusion for Inverse Problems
Title: EquiReg: Equivariance Regularized Diffusion for Inverse Problems | EquiReg: Äquivarianz Regularisierte Diffusion für Inverse Probleme | EquiReg: 用于反向问题的公平、正规化传播 2505.22973v1 |
Authors: Bahareh Tolooshams, Aditi Chandrashekar, Rayhan Zirvi, Abbas Mammadov, Jiachen Yao, Chuwei Wang, Anima Anandkumar
Diffusion models represent the state-of-the-art for solving inverse problems such as image restoration tasks. In the Bayesian framework, diffusion-based inverse solvers incorporate a likelihood term to guide the prior sampling process, generating data consistent with the posterior distribution. However, due to the intractability of the likelihood term, many current methods rely on isotropic Gaussian approximations, which lead to deviations from the data manifold and result in inconsistent, unstable reconstructions. We propose Equivariance Regularized (EquiReg) diffusion, a general framework for regularizing posterior sampling in diffusion-based inverse problem solvers. EquiReg enhances reconstructions by reweighting diffusion trajectories and penalizing those that deviate from the data manifold. We define a new distribution-dependent equivariance error, empirically identify functions that exhibit low error for on-manifold samples and higher error for off-manifold samples, and leverage these functions to regularize the diffusion sampling process. When applied to a variety of solvers, EquiReg outperforms state-of-the-art diffusion models in both linear and nonlinear image restoration tasks, as well as in reconstructing partial differential equations.
Article 345
Title@2025-05-29 (4): Minimal Sufficient Views: A DNN model making predictions with more evidence has higher accuracy
Title: Minimal Sufficient Views: A DNN model making predictions with more evidence has higher accuracy | Minimal Ausreichende Ansichten: Ein DNN-Modell, das Vorhersagen mit mehr Beweisen macht, hat höhere Genauigkeit | 最低限度的充分意见:一个DNN模型,用更多证据作出预测,其准确性更高 2402.01095v2 |
Authors: Keisuke Kawano, Takuro Kutsuna, Keisuke Sano
Deep neural networks (DNNs) exhibit high performance in image recognition; however, the reasons for their strong generalization abilities remain unclear. A plausible hypothesis is that DNNs achieve robust and accurate predictions by identifying multiple pieces of evidence from images. Thus, to test this hypothesis, this study proposed minimal sufficient views (MSVs). MSVs are defined as a set of minimal regions within an input image that are sufficient to preserve the prediction of DNNs, thus representing the evidence discovered by the DNN. We empirically demonstrated a strong correlation between the number of MSVs (i.e., the number of pieces of evidence) and the generalization performance of the DNN models. Remarkably, this correlation was found to hold within a single DNN as well as between different DNNs, including convolutional and transformer models. This suggested that a DNN model that makes its prediction based on more evidence has a higher generalization performance. We proposed a metric based on MSVs for DNN model selection that did not require label information. Consequently, we empirically showed that the proposed metric was less dependent on the degree of overfitting, rendering it a more reliable indicator of model performance than existing metrics, such as average confidence.
Article 346
Title@2025-05-29 (4): MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming
Title: MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming | MermaidFlow: Neudefinition der agentischen Workflow-Generierung durch sicherheitsbeschränkte evolutionäre Programmierung | 美人鱼:通过受安全限制的进化方案拟订,重新确定干燥性工作流的产生 2505.22967v1 |
Authors: Chengqi Zheng, Jianda Chen, Yueming Lyu, Wen Zheng Terence Ng, Haopeng Zhang, Yew-Soon Ong, Ivor Tsang, Haiyan Yin
Despite the promise of autonomous agentic reasoning, existing workflow generation methods frequently produce fragile, unexecutable plans due to unconstrained LLM-driven construction. We introduce MermaidFlow, a framework that redefines the agentic search space through safety-constrained graph evolution. At its core, MermaidFlow represents workflows as a verifiable intermediate representation using Mermaid, a structured and human-interpretable graph language. We formulate domain-aware evolutionary operators, i.e., crossover, mutation, insertion, and deletion, to preserve semantic correctness while promoting structural diversity, enabling efficient exploration of a high-quality, statically verifiable workflow space. Without modifying task settings or evaluation protocols, MermaidFlow achieves consistent improvements in success rates and faster convergence to executable plans on the agent reasoning benchmark. The experimental results demonstrate that safety-constrained graph evolution offers a scalable, modular foundation for robust and interpretable agentic reasoning systems.
Article 347
Title@2025-05-29 (4): Exploring Scaling Laws for EHR Foundation Models
Title: Exploring Scaling Laws for EHR Foundation Models | Erforschung von Skalierungsgesetzen für EHR-Stiftungsmodelle | 探索EHR基金会模式的扩展法律 2505.22964v1 |
Authors: Sheng Zhang, Qin Liu, Naoto Usuyama, Cliff Wong, Tristan Naumann, Hoifung Poon
The emergence of scaling laws has profoundly shaped the development of large language models (LLMs), enabling predictable performance gains through systematic increases in model size, dataset volume, and compute. Yet, these principles remain largely unexplored in the context of electronic health records (EHRs) – a rich, sequential, and globally abundant data source that differs structurally from natural language. In this work, we present the first empirical investigation of scaling laws for EHR foundation models. By training transformer architectures on patient timeline data from the MIMIC-IV database across varying model sizes and compute budgets, we identify consistent scaling patterns, including parabolic IsoFLOPs curves and power-law relationships between compute, model parameters, data size, and clinical utility. These findings demonstrate that EHR models exhibit scaling behavior analogous to LLMs, offering predictive insights into resource-efficient training strategies. Our results lay the groundwork for developing powerful EHR foundation models capable of transforming clinical prediction tasks and advancing personalized healthcare.
Article 348
Title@2025-05-29 (4): INRFlow: Flow Matching for INRs in Ambient Space
Title: INRFlow: Flow Matching for INRs in Ambient Space | INRFlow: Flow Passend für INRs im Umgebungsraum | INRFlow: 环境空间INR的流量匹配 2412.03791v2 |
Authors: Yuyang Wang, Anurag Ranjan, Josh Susskind, Miguel Angel Bautista
Flow matching models have emerged as a powerful method for generative modeling on domains like images or videos, and even on irregular or unstructured data like 3D point clouds or protein structures. These models are commonly trained in two stages: first, a data compressor is trained, and in a subsequent training stage a flow matching generative model is trained in the latent space of the data compressor. This two-stage paradigm sets obstacles for unifying models across data domains, as hand-crafted compressor architectures are used for different data modalities. To this end, we introduce INRFlow, a domain-agnostic approach to learn flow matching transformers directly in ambient space. Drawing inspiration from INRs, we introduce a conditionally independent point-wise training objective that enables INRFlow to make predictions continuously in coordinate space. Our empirical results demonstrate that INRFlow effectively handles different data modalities such as images, 3D point clouds and protein structure data, achieving strong performance in different domains and outperforming comparable approaches. INRFlow is a promising step towards domain-agnostic flow matching generative models that can be trivially adopted in different data domains.
Article 349
Title@2025-05-29 (4): ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind
Title: ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind | ToMAP: Training Gegner-Bewusst LLM überzeugt mit Theorie des Geistes | ToMAP:培训有思想理论的对抗者软件软件LLM 2505.22961v1 |
Authors: Peixuan Han, Zijia Liu, Jiaxuan You
Large language models (LLMs) have shown promising potential in persuasion, but existing works on training LLM persuaders are still preliminary. Notably, while humans are skilled in modeling their opponent’s thoughts and opinions proactively and dynamically, current LLMs struggle with such Theory of Mind (ToM) reasoning, resulting in limited diversity and opponent awareness. To address this limitation, we introduce Theory of Mind Augmented Persuader (ToMAP), a novel approach for building more flexible persuader agents by incorporating two theory of mind modules that enhance the persuader’s awareness and analysis of the opponent’s mental state. Specifically, we begin by prompting the persuader to consider possible objections to the target central claim, and then use a text encoder paired with a trained MLP classifier to predict the opponent’s current stance on these counterclaims. Our carefully designed reinforcement learning schema enables the persuader to learn how to analyze opponent-related information and utilize it to generate more effective arguments. Experiments show that the ToMAP persuader, while containing only 3B parameters, outperforms much larger baselines, like GPT-4o, with a relative gain of 39.4% across multiple persuadee models and diverse corpora. Notably, ToMAP exhibits complex reasoning chains and reduced repetition during training, which leads to more diverse and effective arguments. The opponent-aware feature of ToMAP also makes it suitable for long conversations and enables it to employ more logical and opponent-aware strategies. These results underscore our method’s effectiveness and highlight its potential for developing more persuasive language agents. Code is available at: https://github.com/ulab-uiuc/ToMAP.
Article 350
Title@2025-05-29 (4): Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness
Title: Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness | Multi-Agenten-Debatte als Test-Time Scaling: Eine systematische Studie der bedingten Wirksamkeit | 重新审议作为试验时间尺度的多机构辩论:对有条件有效性的系统研究 2505.22960v1 |
Authors: Yongjin Yang, Euiin Yi, Jongwoo Ko, Kimin Lee, Zhijing Jin, Se-Young Yun
The remarkable growth in large language model (LLM) capabilities has spurred exploration into multi-agent systems, with debate frameworks emerging as a promising avenue for enhanced problem-solving. These multi-agent debate (MAD) approaches, where agents collaboratively present, critique, and refine arguments, potentially offer improved reasoning, robustness, and diverse perspectives over monolithic models. Despite prior studies leveraging MAD, a systematic understanding of its effectiveness compared to self-agent methods, particularly under varying conditions, remains elusive. This paper seeks to fill this gap by conceptualizing MAD as a test-time computational scaling technique, distinguished by collaborative refinement and diverse exploration capabilities. We conduct a comprehensive empirical investigation comparing MAD with strong self-agent test-time scaling baselines on mathematical reasoning and safety-related tasks. Our study systematically examines the influence of task difficulty, model scale, and agent diversity on MAD’s performance. Key findings reveal that, for mathematical reasoning, MAD offers limited advantages over self-agent scaling but becomes more effective with increased problem difficulty and decreased model capability, while agent diversity shows little benefit. Conversely, for safety tasks, MAD’s collaborative refinement can increase vulnerability, but incorporating diverse agent configurations facilitates a gradual reduction in attack success through the collaborative refinement process. We believe our findings provide critical guidance for the future development of more effective and strategically deployed MAD systems.
Article 351
Title@2025-05-29 (4): Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View
Title: Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View | Enthüllen von Umweltauswirkungen von großsprachigen Modellen: Eine funktionale Einheitsansicht | 大型语文服务模式的不懈环境影响:职能单位观点 2502.11256v2 |
Authors: Yanran Wu, Inez Hua, Yi Ding
Large language models (LLMs) offer powerful capabilities but come with significant environmental impact, particularly in carbon emissions. Existing studies benchmark carbon emissions but lack a standardized basis for comparison across different model configurations. To address this, we introduce the concept of functional unit (FU) as a standardized basis and develop FUEL, the first FU-based framework for evaluating LLM serving’s environmental impact. Through three case studies, we uncover key insights and trade-offs in reducing carbon emissions by optimizing model size, quantization strategy, and hardware choice, paving the way for more sustainable LLM serving. The code is available at https://github.com/jojacola/FUEL.
Article 352
Title@2025-05-29 (4): CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
Title: CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance | CodeSteer: Symbolisch-Augmentierte Sprachmodelle über Code/Text Anleitung | 代码器:通过编码/文本指导的代码/文本指导的代码器:代号辅助语言模式 2502.04350v2 |
Authors: Yongchao Chen, Yilun Hao, Yueying Liu, Yang Zhang, Chuchu Fan
Existing methods fail to effectively steer Large Language Models (LLMs) between textual reasoning and code generation, leaving symbolic computing capabilities underutilized. We introduce CodeSteer, an effective method for guiding LLM code/text generation. We construct a comprehensive benchmark SymBench comprising 37 symbolic tasks with adjustable complexity and also synthesize datasets of 12k multi-turn guidance/generation trajectories and 5.5k guidance comparison pairs. We fine-tune the Llama-3-8B model with a newly designed multi-turn supervised fine-tuning (SFT) and direct preference optimization (DPO). The resulting model, CodeSteerLLM, augmented with the proposed symbolic and self-answer checkers, effectively guides the code/text generation of larger models. Augmenting GPT-4o with CodeSteer raises its average performance score from 53.3 to 86.4, even outperforming the existing best LLM OpenAI o1 (82.7), o1-preview (74.8), and DeepSeek R1 (76.8) across all 37 tasks (28 seen, 9 unseen). Trained for GPT-4o, CodeSteer demonstrates superior generalizability, providing an average 41.8 performance boost on Claude, Mistral, and GPT-3.5. CodeSteer-guided LLMs fully harness symbolic computing to maintain strong performance on highly complex tasks. Models, Datasets, and Codes are available at https://github.com/yongchao98/CodeSteer-v1.0 and https://huggingface.co/yongchao98.
Article 353
Title@2025-05-29 (4): Understanding Bias Reinforcement in LLM Agents Debate
Title: Understanding Bias Reinforcement in LLM Agents Debate | Verständnis der Bias-Verstärkung in LLM-Agenten-Debatte | 了解LLLM代理商的强化申请 2503.16814v2 |
Authors: Jihwan Oh, Minchan Jeong, Jongwoo Ko, Se-Young Yun
Large Language Models (LLMs) solve complex problems using training-free methods like prompt engineering and in-context learning, yet ensuring reasoning correctness remains challenging. While self-correction methods such as self-consistency and self-refinement aim to improve reliability, they often reinforce biases due to the lack of effective feedback mechanisms. Multi-Agent Debate (MAD) has emerged as an alternative, but we identify two key limitations: bias reinforcement, where debate amplifies model biases instead of correcting them, and lack of perspective diversity, as all agents share the same model and reasoning patterns, limiting true debate effectiveness. To systematically evaluate these issues, we introduce MetaNIM Arena, a benchmark designed to assess LLMs in adversarial strategic decision-making, where dynamic interactions influence optimal decisions. To overcome MAD’s limitations, we propose DReaMAD (Diverse Reasoning via Multi-Agent Debate with Refined Prompt), a novel framework that (1) refines LLM’s strategic prior knowledge to improve reasoning quality and (2) promotes diverse viewpoints within a single model by systematically modifying prompts, reducing bias. Empirical results show that DReaMAD significantly improves decision accuracy, reasoning diversity, and bias mitigation across multiple strategic tasks, establishing it as a more effective approach for LLM-based decision-making.
Article 354
Title@2025-05-29 (4): Performance Guaranteed Poisoning Attacks in Federated Learning: A Sliding Mode Approach
Title: Performance Guaranteed Poisoning Attacks in Federated Learning: A Sliding Mode Approach | Leistungsgarantie Vergiftung Angriffe im Föderierten Lernen: Ein Schiebemodus Ansatz | 联邦学习中保证中毒袭击的绩效:一种脱落模式方法 2505.16403v2 |
Authors: Huazi Pan, Yanjun Zhang, Leo Yu Zhang, Scott Adams, Abbas Kouzani, Suiyang Khoo
Manipulation of local training data and local updates, i.e., the poisoning attack, is the main threat arising from the collaborative nature of the federated learning (FL) paradigm. Most existing poisoning attacks aim to manipulate local data/models in a way that causes denial-of-service (DoS) issues. In this paper, we introduce a novel attack method, named the Federated Learning Sliding Attack (FedSA) scheme, which aims to precisely control the extent of poisoning in a subtle manner. It operates with a predefined objective, such as reducing the global model’s prediction accuracy by 10%. FedSA integrates robust nonlinear control theory, specifically Sliding Mode Control (SMC), with model poisoning attacks. It can manipulate the updates from malicious clients to drive the global model towards a compromised state, achieving this at a controlled and inconspicuous rate. Additionally, leveraging the robust control properties of FedSA allows precise control over the convergence bounds, enabling the attacker to set the global accuracy of the poisoned model to any desired level. Experimental results demonstrate that FedSA can accurately achieve a predefined global accuracy with fewer malicious clients while maintaining a high level of stealth and adjustable learning rates.
Article 355
Title@2025-05-29 (4): CellFlux: Simulating Cellular Morphology Changes via Flow Matching
Title: CellFlux: Simulating Cellular Morphology Changes via Flow Matching | CellFlux: simulierende zelluläre Morphologie-Änderungen durch Flow Matching | 细胞通量:通过流动匹配模拟细胞生理变化 2502.09775v2 |
Authors: Yuhui Zhang, Yuchang Su, Chenyu Wang, Tianhong Li, Zoe Wefers, Jeffrey Nirschl, James Burgess, Daisy Ding, Alejandro Lozano, Emma Lundberg, Serena Yeung-Levy
Building a virtual cell capable of accurately simulating cellular behaviors in silico has long been a dream in computational biology. We introduce CellFlux, an image-generative model that simulates cellular morphology changes induced by chemical and genetic perturbations using flow matching. Unlike prior methods, CellFlux models distribution-wise transformations from unperturbed to perturbed cell states, effectively distinguishing actual perturbation effects from experimental artifacts such as batch effects – a major challenge in biological data. Evaluated on chemical (BBBC021), genetic (RxRx1), and combined perturbation (JUMP) datasets, CellFlux generates biologically meaningful cell images that faithfully capture perturbation-specific morphological changes, achieving a 35% improvement in FID scores and a 12% increase in mode-of-action prediction accuracy over existing methods. Additionally, CellFlux enables continuous interpolation between cellular states, providing a potential tool for studying perturbation dynamics. These capabilities mark a significant step toward realizing virtual cell modeling for biomedical research. Project page: https://yuhui-zh15.github.io/CellFlux/.
Article 356
Title@2025-05-29 (4): Directed Graph Grammars for Sequence-based Learning
Title: Directed Graph Grammars for Sequence-based Learning | Gezielte Graphen-Grammatik für sequenzbasiertes Lernen | 以序列为基础的学习方向图表语法 2505.22949v1 |
Authors: Michael Sun, Orion Foo, Gang Liu, Wojciech Matusik, Jie Chen
Directed acyclic graphs (DAGs) are a class of graphs commonly used in practice, with examples that include electronic circuits, Bayesian networks, and neural architectures. While many effective encoders exist for DAGs, it remains challenging to decode them in a principled manner, because the nodes of a DAG can have many different topological orders. In this work, we propose a grammar-based approach to constructing a principled, compact and equivalent sequential representation of a DAG. Specifically, we view a graph as derivations over an unambiguous grammar, where the DAG corresponds to a unique sequence of production rules. Equivalently, the procedure to construct such a description can be viewed as a lossless compression of the data. Such a representation has many uses, including building a generative model for graph generation, learning a latent space for property prediction, and leveraging the sequence representational continuity for Bayesian Optimization over structured data. Code is available at https://github.com/shiningsunnyday/induction.
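The decoding difficulty mentioned in this abstract, that one DAG admits many valid topological orders, can be seen on a tiny example. The sketch below is a brute-force illustration written for this digest and is not the grammar-based method of the paper; the function name `topological_orders` and the diamond-shaped example graph are hypothetical.

```python
from itertools import permutations


def topological_orders(nodes, edges):
    """Return every ordering of `nodes` in which each edge (u, v) places u before v.

    Brute force over all permutations, so only suitable for very small graphs.
    """
    orders = []
    for perm in permutations(nodes):
        position = {node: i for i, node in enumerate(perm)}
        if all(position[u] < position[v] for u, v in edges):
            orders.append(perm)
    return orders


# Hypothetical diamond DAG: a -> b, a -> c, b -> d, c -> d
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

if __name__ == "__main__":
    for order in topological_orders(nodes, edges):
        print(" -> ".join(order))
```

Even this four-node graph has more than one valid node sequence, so a naive node-by-node sequential encoding is not unique; the abstract's proposal is to obtain a unique sequence of production rules instead.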
Article 357
Title@2025-05-28 (3): NegVQA: Can Vision Language Models Understand Negation?
Title: NegVQA: Can Vision Language Models Understand Negation? | NegVQA: Können Visions-Sprachmodelle Negation verstehen? | NegVQA:视觉语言模式能理解差吗? 2505.22946v1 |
Authors: Yuhui Zhang, Yuchang Su, Yiming Liu, Serena Yeung-Levy
Negation is a fundamental linguistic phenomenon that can entirely reverse the meaning of a sentence. As vision language models (VLMs) continue to advance and are deployed in high-stakes applications, assessing their ability to comprehend negation becomes essential. To address this, we introduce NegVQA, a visual question answering (VQA) benchmark consisting of 7,379 two-choice questions covering diverse negation scenarios and image-question distributions. We construct NegVQA by leveraging large language models to generate negated versions of questions from existing VQA datasets. Evaluating 20 state-of-the-art VLMs across seven model families, we find that these models struggle significantly with negation, exhibiting a substantial performance drop compared to their responses to the original questions. Furthermore, we uncover a U-shaped scaling trend, where increasing model size initially degrades performance on NegVQA before leading to improvements. Our benchmark reveals critical gaps in VLMs’ negation understanding and offers insights into future VLM development. Project page available at https://yuhui-zh15.github.io/NegVQA/.
Article 358
Title@2025-05-28 (3): Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates
Title: Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates | Kann LLMs CLIP deciive? Benchmarking Adversarial Compositionalität der vortrainierten multimodalen Darstellung über Textaktualisierungen | LLMs CLIP能否通过文本更新确定培训前多模式代表的反向构成基准? 2505.22943v1 |
Authors: Jaewoo Ahn, Heeseung Yun, Dayoon Ko, Gunhee Kim
While pre-trained multimodal representations (e.g., CLIP) have shown impressive capabilities, they exhibit significant compositional vulnerabilities leading to counterintuitive judgments. We introduce Multimodal Adversarial Compositionality (MAC), a benchmark that leverages large language models (LLMs) to generate deceptive text samples to exploit these vulnerabilities across different modalities and evaluates them through both sample-wise attack success rate and group-wise entropy-based diversity. To improve zero-shot methods, we propose a self-training approach that leverages rejection-sampling fine-tuning with diversity-promoting filtering, which enhances both attack success rate and sample diversity. Using smaller language models like Llama-3.1-8B, our approach demonstrates superior performance in revealing compositional vulnerabilities across various multimodal representations, including images, videos, and audio.
Article 359
Title@2025-05-28 (3): Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified?
Title: Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified? | Sind Domain Generalization Benchmarks mit Genauigkeit auf der Zeile falsch angegeben? | 域通用基准与误标线的准确性是否一致? 2504.00186v2 |
Authors: Olawale Salaudeen, Nicole Chiou, Shiny Weng, Sanmi Koyejo
Spurious correlations are unstable statistical associations that hinder robust decision-making. Conventional wisdom suggests that models relying on such correlations will fail to generalize out-of-distribution (OOD), especially under strong distribution shifts. However, empirical evidence challenges this view as naive in-distribution empirical risk minimizers often achieve the best OOD accuracy across popular OOD generalization benchmarks. In light of these results, we propose a different perspective: many widely used benchmarks for evaluating robustness to spurious correlations are misspecified. Specifically, they fail to include shifts in spurious correlations that meaningfully impact OOD generalization, making them unsuitable for evaluating the benefit of removing such correlations. We establish conditions under which a distribution shift can reliably assess a model’s reliance on spurious correlations. Crucially, under these conditions, we should not observe a strong positive correlation between in-distribution and OOD accuracy, often called “accuracy on the line.” Yet, most state-of-the-art benchmarks exhibit this pattern, suggesting they do not effectively assess robustness. Our findings expose a key limitation in current benchmarks used to evaluate domain generalization algorithms, that is, models designed to avoid spurious correlations. We highlight the need to rethink how robustness to spurious correlations is assessed, identify well-specified benchmarks the field should prioritize, and enumerate strategies for designing future benchmarks that meaningfully reflect robustness under distribution shift.
Article 360
Title@2025-05-28 (3): Generative Social Choice: The Next Generation
Title: Generative Social Choice: The Next Generation | Generative soziale Wahl: Die nächste Generation | 产生社会选择:下一代 2505.22939v1 |
Authors: Niclas Boehmer, Sara Fish, Ariel D. Procaccia
A key task in certain democratic processes is to produce a concise slate of statements that proportionally represents the full spectrum of user opinions. This task is similar to committee elections, but unlike traditional settings, the candidate set comprises all possible statements of varying lengths, and so it can only be accessed through specific queries. Combining social choice and large language models, prior work has approached this challenge through a framework of generative social choice. We extend the framework in two fundamental ways, providing theoretical guarantees even in the face of approximately optimal queries and a budget limit on the overall length of the slate. Using GPT-4o to implement queries, we showcase our approach on datasets related to city improvement measures and drug reviews, demonstrating its effectiveness in generating representative slates from unstructured user opinions.
Article 361
Title@2025-05-28 (3): Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models
Title: Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models | Ist die Lärmkonditionierung notwendig? Eine einheitliche Theorie der Bedingungslosen Graphen-Diffusionsmodelle | 是否有必要设定噪音条件? 无条件图形扩散模型的统一理论 2505.22935v1 |
Authors: Jipeng Li, Yanning Shen
Explicit noise-level conditioning is widely regarded as essential for the effective operation of Graph Diffusion Models (GDMs). In this work, we challenge this assumption by investigating whether denoisers can implicitly infer noise levels directly from corrupted graph structures, potentially eliminating the need for explicit noise conditioning. To this end, we develop a theoretical framework centered on Bernoulli edge-flip corruptions and extend it to encompass more complex scenarios involving coupled structure-attribute noise. Extensive empirical evaluations on both synthetic and real-world graph datasets, using models such as GDSS and DiGress, provide strong support for our theoretical findings. Notably, unconditional GDMs achieve performance comparable or superior to their conditioned counterparts, while also offering reductions in parameters (4-6%) and computation time (8-10%). Our results suggest that the high-dimensional nature of graph data itself often encodes sufficient information for the denoising process, opening avenues for simpler, more efficient GDM architectures.
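A Bernoulli edge-flip corruption, the noise model this abstract centers on, can be sketched in a few lines. The code below is an illustration under stated assumptions, not the paper's framework: the function names, the ring-shaped clean graph, and the density-based estimator are all hypothetical. The estimator additionally assumes the clean edge density $d$ is known, in which case the expected noisy density is $d + p(1 - 2d)$ and the flip probability $p$ can be read off the corrupted graph alone.

```python
import random


def bernoulli_edge_flip(adj, p, rng):
    """Flip each undirected edge slot (i < j) independently with probability p."""
    n = len(adj)
    noisy = [row[:] for row in adj]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                noisy[i][j] = noisy[j][i] = 1 - adj[i][j]
    return noisy


def edge_density(adj):
    """Fraction of the n*(n-1)/2 undirected edge slots that are occupied."""
    n = len(adj)
    pairs = n * (n - 1) // 2
    return sum(adj[i][j] for i in range(n) for j in range(i + 1, n)) / pairs


def estimate_flip_probability(noisy, clean_density):
    """Invert E[noisy density] = d + p * (1 - 2d); assumes d is known and d != 0.5."""
    return (edge_density(noisy) - clean_density) / (1 - 2 * clean_density)


# Hypothetical clean graph: a ring on 60 nodes.
n = 60
clean = [[0] * n for _ in range(n)]
for i in range(n):
    j = (i + 1) % n
    clean[i][j] = clean[j][i] = 1

noisy = bernoulli_edge_flip(clean, 0.1, random.Random(0))

if __name__ == "__main__":
    print(estimate_flip_probability(noisy, edge_density(clean)))
```

With 1,770 edge slots the estimate concentrates near the true flip probability, which gives a rough intuition for why larger graphs may carry enough information about the noise level on their own.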
nan
Article 362
Title@2025-05-28 (3): Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging
Title: Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging | Unraveling LoRA Interferenz: Orthogonale Subräume für robuste Modellzusammenführung | 开放 LoRA 干涉度: 用于强力模型合并的正弦形子空间 2505.22934v1 |
Authors: Haobo Zhang, Jiayu Zhou
Fine-tuning large language models (LMs) for individual tasks yields strong performance but is expensive for deployment and storage. Recent works explore model merging to combine multiple task-specific models into a single multi-task model without additional training. However, existing merging methods often fail for models fine-tuned with low-rank adaptation (LoRA), due to significant performance degradation. In this paper, we show that this issue arises from a previously overlooked interplay between model parameters and data distributions. We propose Orthogonal Subspaces for Robust model Merging (OSRM) to constrain the LoRA subspace prior to fine-tuning, ensuring that updates relevant to one task do not adversely shift outputs for others. Our approach can seamlessly integrate with most existing merging algorithms, reducing the unintended interference among tasks. Extensive experiments on eight datasets, tested with three widely used LMs and two large LMs, demonstrate that our method not only boosts merging performance but also preserves single-task accuracy. Furthermore, our approach exhibits greater robustness to the hyperparameters of merging. These results highlight the importance of data-parameter interaction in model merging and offer a plug-and-play solution for merging LoRA models.
nan
Article 363
Title@2025-05-28 (3): K-Paths: Reasoning over Graph Paths for Drug Repurposing and Drug Interaction Prediction
Title: K-Paths: Reasoning over Graph Paths for Drug Repurposing and Drug Interaction Prediction | K-Paths: Begründung über Graphenpfade für Drogenrepurposing und Drogeninteraktionsvorhersage | K-Paths: 以图解路径为依据进行药物再定位和药物相互作用预测 2502.13344v3 |
Authors: Tassallah Abdullahi, Ioanna Gemou, Nihal V. Nayak, Ghulam Murtaza, Stephen H. Bach, Carsten Eickhoff, Ritambhara Singh
Biomedical knowledge graphs (KGs) encode rich, structured information critical for drug discovery tasks, but extracting meaningful insights from large-scale KGs remains challenging due to their complex structure. Existing biomedical subgraph retrieval methods are tailored for graph neural networks (GNNs), limiting compatibility with other paradigms, including large language models (LLMs). We introduce K-Paths, a model-agnostic retrieval framework that extracts structured, diverse, and biologically meaningful multi-hop paths from dense biomedical KGs. These paths enable the prediction of unobserved drug-drug and drug-disease interactions, including those involving entities not seen during training, thus supporting inductive reasoning. K-Paths is training-free and employs a diversity-aware adaptation of Yen’s algorithm to extract the K shortest loopless paths between entities in a query, prioritizing biologically relevant and relationally diverse connections. These paths serve as concise, interpretable reasoning chains that can be directly integrated with LLMs or GNNs to improve generalization and accuracy and to enable explainable inference. Experiments on benchmark datasets show that K-Paths improves zero-shot reasoning across state-of-the-art LLMs. For instance, Tx-Gemma 27B improves by 19.8 and 4.0 F1 points on interaction severity prediction and drug repurposing tasks, respectively. Llama 70B achieves gains of 8.5 and 6.2 points on the same tasks. K-Paths also boosts the training efficiency of EmerGNN, a state-of-the-art GNN, by reducing the KG size by 90% while maintaining predictive performance. Beyond efficiency, K-Paths bridges the gap between KGs and LLMs, enabling scalable and explainable LLM-augmented scientific discovery. We release our code and the retrieved paths as a benchmark for inductive reasoning.
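To make "K shortest loopless paths" concrete, here is a minimal sketch that enumerates simple paths in order of increasing cost with a best-first search. It is a simpler substitute for Yen's algorithm (worst-case exponential, and without the diversity-aware prioritization the abstract describes), and the toy graph and node names are hypothetical.

```python
import heapq

def k_shortest_simple_paths(graph, source, target, k):
    """Return up to k loopless paths as (cost, path), cheapest first.

    graph maps node -> {neighbor: nonnegative weight}. Partial paths are
    expanded in order of cost, so complete paths are emitted in
    nondecreasing cost order.
    """
    heap = [(0, [source])]
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in path:  # keep the path loopless
                heapq.heappush(heap, (cost + weight, path + [neighbor]))
    return found

# Hypothetical toy knowledge graph: a drug, two genes, and a disease.
toy_kg = {
    "drugA": {"geneX": 1, "geneY": 1},
    "geneX": {"diseaseD": 1, "geneY": 1},
    "geneY": {"diseaseD": 2},
    "diseaseD": {},
}

paths = k_shortest_simple_paths(toy_kg, "drugA", "diseaseD", k=2)
for cost, path in paths:
    print(cost, " -> ".join(path))
```

Each returned path can be read as a short reasoning chain (drug, intermediate entities, disease) of the kind that the abstract proposes feeding to an LLM or GNN.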
nan
Article 364
Title@2025-05-28 (3): How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Title: How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias | Wie Transformer lernen Regelmäßige Spracherkennung: Eine theoretische Studie über Trainingsdynamik und Implizite Bias | 变换人如何学习常规语言识别:关于培训动态和隐含偏见的理论研究 2505.00926v3 |
Authors: Ruiquan Huang, Yingbin Liang, Jing Yang
Language recognition tasks are fundamental in natural language processing (NLP) and have been widely used to benchmark the performance of large language models (LLMs). These tasks also play a crucial role in explaining the working mechanisms of transformers. In this work, we focus on two representative tasks in the category of regular language recognition, known as “even pairs” and “parity check”, the aim of which is to determine whether the occurrences of certain subsequences in a given sequence are even. Our goal is to explore how a one-layer transformer, consisting of an attention layer followed by a linear layer, learns to solve these tasks by theoretically analyzing its training dynamics under gradient descent. While even pairs can be solved directly by a one-layer transformer, parity check needs to be solved by integrating Chain-of-Thought (CoT), either into the inference stage of a transformer well-trained for the even pairs task, or into the training of a one-layer transformer. For both problems, our analysis shows that the joint training of attention and linear layers exhibits two distinct phases. In the first phase, the attention layer grows rapidly, mapping data sequences into separable vectors. In the second phase, the attention layer becomes stable, while the linear layer grows logarithmically and approaches in direction to a max-margin hyperplane that correctly separates the attention layer outputs into positive and negative samples, and the loss decreases at a rate of $O(1/t)$. Our experiments validate those theoretical results.
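The two tasks can be stated in a few lines of code. The sketch below assumes the definitions commonly used in formal-language benchmarks (even pairs: the number of adjacent "ab" and "ba" pairs is even; parity check: the number of "b" symbols is even); the abstract does not spell them out, so treat these as assumptions. The step-by-step variant only loosely mirrors how a chain of intermediate states reduces parity to a sequence of easy updates.

```python
from itertools import product

def even_pairs(s):
    """True if the number of adjacent unequal pairs ("ab" or "ba") is even."""
    return sum(1 for x, y in zip(s, s[1:]) if x != y) % 2 == 0

def parity_check(s):
    """True if the string contains an even number of 'b' symbols."""
    return s.count("b") % 2 == 0

def parity_via_chain(s):
    """Compute parity one symbol at a time, recording each intermediate state."""
    state = 0
    trace = []
    for ch in s:
        state ^= int(ch == "b")
        trace.append(state)
    return state == 0, trace

# Under these definitions, even pairs reduces to comparing the first and last
# symbols, which is one way to see why it is the easier of the two tasks.
all_strings = ["".join(t) for n in range(1, 9) for t in product("ab", repeat=n)]
shortcut_holds = all(even_pairs(s) == (s[0] == s[-1]) for s in all_strings)
print("even pairs == (first symbol equals last symbol):", shortcut_holds)
```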
nan
Article 365
Title@2025-05-28 (3): Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking
Title: Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking | Skalierbare Parameter und Speicher Effizientes Vortraining für LLM: Algorithmische Fortschritte und Benchmarking | LLM的可缩放参数和记忆高效预修培训:最近的演算进展和基准 2505.22922v1 |
Authors: Athanasios Glentis, Jiaxiang Li, Qiulin Shang, Andi Han, Ioannis Tsaknakis, Quan Wei, Mingyi Hong
Fueled by their remarkable ability to tackle diverse tasks across multiple domains, large language models (LLMs) have grown at an unprecedented rate, with some recent models containing trillions of parameters. This growth is accompanied by substantial computational challenges, particularly regarding the memory and compute resources required for training and fine-tuning. Numerous approaches have been explored to address these issues, such as LoRA. While these methods are effective for fine-tuning, their application to pre-training is significantly more challenging due to the need to learn vast datasets. Motivated by this issue, we aim to address the following questions: Can parameter- or memory-efficient methods enhance pre-training efficiency while achieving performance comparable to full-model training? How can the performance gap be narrowed? To this end, the contributions of this work are the following. (1) We begin by conducting a comprehensive survey that summarizes state-of-the-art methods for efficient pre-training. (2) We perform a benchmark evaluation of several representative memory-efficient pre-training approaches to comprehensively evaluate their performance across model sizes. We observe that with a proper choice of optimizer and hyperparameters, full-rank training delivers the best performance, as expected. We also notice that incorporating high-rank updates in low-rank approaches is the key to improving their performance. (3) Finally, we propose two practical techniques, namely weight refactorization and momentum reset, to enhance the performance of efficient pre-training methods. We observe that applying these techniques to the low-rank method (on a 1B model) can achieve a lower perplexity than popular memory-efficient algorithms such as GaLore and Fira, while simultaneously using about 25% less memory.
nan
Article 366
Title@2025-05-28 (3): Unlocking Mental Health: Exploring College Students’ Well-being through Smartphone Behaviors
Title: Unlocking Mental Health: Exploring College Students’ Well-being through Smartphone Behaviors | Entsperren der psychischen Gesundheit: Erforschen des Wohlbefindens der Studenten durch Smartphone-Verhalten | 解锁心理健康:通过智能手机行为探索大学生福祉 2502.08766v2 |
Authors: Wei Xuan, Meghna Roy Chowdhury, Yi Ding, Yixue Zhao
The global mental health crisis is a pressing concern, with college students particularly vulnerable to rising mental health disorders. The widespread use of smartphones among young adults, while offering numerous benefits, has also been linked to negative outcomes such as addiction and regret, significantly impacting well-being. Leveraging the longest longitudinal dataset collected over four college years through passive mobile sensing, this study is the first to examine the relationship between students’ smartphone unlocking behaviors and their mental health at scale in real-world settings. We provide the first evidence demonstrating the predictability of phone unlocking behaviors for mental health outcomes based on a large dataset, highlighting the potential of these novel features for future predictive models. Our findings reveal important variations in smartphone usage across genders and locations, offering a deeper understanding of the interplay between digital behaviors and mental health. We highlight future research directions aimed at mitigating adverse effects and promoting digital well-being in this population.
nan
Article 367
Title@2025-05-28 (3): Enhancing Semi-supervised Learning with Zero-shot Pseudolabels
Title: Enhancing Semi-supervised Learning with Zero-shot Pseudolabels | Halbbeaufsichtigtes Lernen mit Null-Shot-Pseudo-Labels verbessern | 用零弹Pseudo标签加强半监督的学习 2502.12584v2 |
Authors: Jichan Chung, Irene Y. Chen
The high cost of data labeling presents a major barrier to deploying machine learning systems at scale. Semi-supervised learning (SSL) mitigates this challenge by utilizing unlabeled data alongside limited labeled examples, while the emergence of foundation models (FMs) offers powerful zero-shot capabilities that can further reduce labeling cost. However, directly fine-tuning large FMs is often impractical in resource-constrained settings, and naively using their pseudo-labels for unlabeled data can degrade performance due to their unreliability or a domain mismatch with the target task. In this work, we introduce ZeroMatch, a novel SSL framework that integrates knowledge distillation with consistency-based learning to jointly leverage labeled data, unlabeled data, and pseudo-labels from FMs. ZeroMatch enables training compact student models using only FM inference, making it suitable for low-resource environments such as personal devices with limited compute. Experiments on six vision and language classification benchmarks show that ZeroMatch consistently outperforms standard SSL and zero-shot augmented methods, demonstrating its effectiveness and robustness across a range of foundation model qualities.
nan
Article 368
Title@2025-05-28 (3): cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning
Title: cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning | cadrille: Multimodale CAD-Rekonstruktion mit Online-Verstärkung | 与在线强化学习相结合的多模式 CAD重建 2505.22914v1 |
Authors: Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhemchuzhnikov, Alexander Nikulin, Ilya Zisman, Anna Vorontsova, Anton Konushin, Vladislav Kurenkov, Danila Rukhovich
Computer-Aided Design (CAD) plays a central role in engineering and manufacturing, making it possible to create precise and editable 3D models. Using a variety of sensor or user-provided data as inputs for CAD reconstruction can democratize access to design applications. However, existing methods typically focus on a single input modality, such as point clouds, images, or text, which limits their generalizability and robustness. Leveraging recent advances in vision-language models (VLM), we propose a multi-modal CAD reconstruction model that simultaneously processes all three input modalities. Inspired by large language model (LLM) training paradigms, we adopt a two-stage pipeline: supervised fine-tuning (SFT) on large-scale procedurally generated data, followed by reinforcement learning (RL) fine-tuning using online feedback, obtained programmatically. Furthermore, we are the first to explore RL fine-tuning of LLMs for CAD tasks, demonstrating that online RL algorithms such as Group Relative Policy Optimization (GRPO) outperform offline alternatives. In the DeepCAD benchmark, our SFT model outperforms existing single-modal approaches in all three input modalities simultaneously. More importantly, after RL fine-tuning, cadrille sets a new state of the art on three challenging datasets, including a real-world one.
nan
Article 369
Title@2025-05-28 (3): Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
Title: Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference | Mustafar: Förderung unstrukturierter Sparsamkeit für KV Cache Pruning in LLM Inferenz | Mustafar:在LLM推理中促进KV Cache Pruning的无结构平衡 2505.22913v1 |
Authors: Donghyeon Joo, Helya Hosseini, Ramyad Hadidi, Bahar Asgari
We demonstrate that unstructured sparsity significantly improves KV cache compression for LLMs, enabling sparsity levels up to 70% without compromising accuracy or requiring fine-tuning. We conduct a systematic exploration of pruning strategies and find per-token magnitude-based pruning to be highly effective for both Key and Value caches under unstructured sparsity, surpassing prior structured pruning schemes. The Key cache benefits from prominent outlier elements, while the Value cache surprisingly benefits from simple magnitude-based pruning despite its uniform distribution. KV cache size is the major bottleneck in decode performance due to high memory overhead for large context lengths. To address this, we use a bitmap-based sparse format and a custom attention kernel capable of compressing and directly computing over compressed caches pruned to arbitrary sparsity patterns, significantly accelerating memory-bound operations in decode computations and thereby compensating for the overhead of runtime pruning and compression. Our custom attention kernel coupled with the bitmap-based format delivers substantial compression of the KV cache up to 45% of dense inference and thereby enables longer context length and increased tokens/sec throughput of up to 2.23x compared to dense inference. Our pruning mechanism and sparse attention kernel are available at https://github.com/dhjoo98/mustafar.
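The two ideas named here, per-token magnitude pruning and a bitmap-based sparse format, can be sketched in plain Python. This is an illustrative toy under assumed names and an assumed layout (one integer bitmap plus a packed list of surviving values per token); the released kernel's actual format and GPU implementation are not reproduced.

```python
def prune_token(vec, sparsity):
    """Zero out the smallest-magnitude entries of a single token's vector."""
    keep = len(vec) - int(len(vec) * sparsity)
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    kept = set(order[:keep])
    return [v if i in kept else 0.0 for i, v in enumerate(vec)]

def to_bitmap(vec):
    """Encode a pruned vector as (bitmap, packed nonzero values)."""
    bitmap = 0
    values = []
    for i, v in enumerate(vec):
        if v != 0.0:
            bitmap |= 1 << i  # bit i marks a surviving element at position i
            values.append(v)
    return bitmap, values

def sparse_dot(bitmap, values, query):
    """Dot product computed directly on the compressed representation."""
    total = 0.0
    k = 0  # index into the packed values
    for i, q in enumerate(query):
        if (bitmap >> i) & 1:
            total += values[k] * q
            k += 1
    return total

# Hypothetical 8-dimensional key vector for one token, pruned to 50% sparsity.
key = [0.1, -2.0, 0.05, 1.5, -0.2, 0.0, 0.3, -0.01]
pruned = prune_token(key, sparsity=0.5)
bitmap, values = to_bitmap(pruned)
query = [1.0] * 8
score = sparse_dot(bitmap, values, query)
print(bin(bitmap), values, score)
```

Because the positions of the surviving elements are arbitrary, the bitmap is what lets the attention score be computed without first decompressing the vector.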
nan
Article 370
Title@2025-05-28 (3): GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation
Title: GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation | GraphEval: Ein leichter Graph-basierter LLM-Rahmen für die Idee-Evaluierung | 图图Eval:基于轻量图图的理论评估LLM框架 2503.12600v2 |
Authors: Tao Feng, Yihang Sun, Jiaxuan You
The powerful capabilities of Large Language Models (LLMs) have led to their growing use in evaluating human-generated content, particularly in evaluating research ideas within academic settings. Existing solutions primarily rely on prompt-based LLM methods or fine-tuned lightweight language models for idea evaluation. However, these methods are often unstable and struggle to comprehend the complex semantic information embedded in the ideas, impeding their ability to perform high-quality evaluations. To address the above challenges, we propose GraphEval, a lightweight graph-based LLM framework for idea evaluation. Our insight is that a complex idea can be broken down into comprehensible viewpoint nodes using prompts from small LLMs. These viewpoint nodes can then be linked together through edges created from LLM-based relation extraction and/or BERT similarity scores. The created viewpoint-graph can be used to conveniently propagate scores across view-nodes to improve the robustness of the idea evaluations. In particular, we propose two lightweight graph-based methods for idea evaluation: (1) GraphEval-LP: a training-free label propagation algorithm that propagates evaluation scores from known view-nodes to unknown nodes; (2) GraphEval-GNN: a Graph Neural Network (GNN) that is trained to predict the evaluation scores given the observed graph with minimal computation resources. Moreover, to overcome LLMs’ limitations in objectively assessing the novelty of ideas, we further add a novelty detection model to GraphEval-GNN to enhance its capability in judging idea novelty. Experiments on two datasets show GraphEval improves F1 scores by at least 14% with low computation and API costs. Additionally, GraphEval can effectively detect plagiarized ideas.
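A generic label propagation loop gives a feel for how scores could spread from known viewpoint nodes to unknown ones. The sketch below is a standard weighted-average scheme with clamped labels, not the GraphEval-LP algorithm itself; node names, edge weights, and the neutral starting score of 0.5 are assumptions.

```python
def propagate_labels(nodes, edges, known, iterations=50):
    """Spread scores over a weighted undirected graph.

    nodes: iterable of node ids
    edges: list of (u, v, weight) tuples
    known: dict of node id -> fixed score; these nodes stay clamped
    """
    neighbors = {n: [] for n in nodes}
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    scores = {n: known.get(n, 0.5) for n in nodes}  # 0.5 = neutral start
    for _ in range(iterations):
        updated = dict(scores)
        for n in nodes:
            if n in known or not neighbors[n]:
                continue
            total_weight = sum(w for _, w in neighbors[n])
            updated[n] = sum(scores[m] * w for m, w in neighbors[n]) / total_weight
        scores = updated
    return scores

# Hypothetical viewpoint graph: "b" is unlabeled and more similar to "a" than to "c".
nodes = ["a", "b", "c"]
edges = [("a", "b", 3.0), ("b", "c", 1.0)]
known = {"a": 1.0, "c": 0.0}
scores = propagate_labels(nodes, edges, known)
print(scores)
```

The unlabeled node ends up closer to the score of the neighbor it is more strongly connected to, which is the behavior that makes propagation over similarity edges useful when only some viewpoints have reliable scores.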
nan
Article 371
Title@2025-05-28 (3): Ensuring User-side Fairness in Dynamic Recommender Systems
Title: Ensuring User-side Fairness in Dynamic Recommender Systems | Gewährleistung der benutzerseitigen Fairness in dynamischen Recommender-Systemen | 确保动态建议系统在用户方面的公平公正 2308.15651v3 |
Authors: Hyunsik Yoo, Zhichen Zeng, Jian Kang, Ruizhong Qiu, David Zhou, Zhining Liu, Fei Wang, Charlie Xu, Eunice Chan, Hanghang Tong
User-side group fairness is crucial for modern recommender systems, aiming to alleviate performance disparities among user groups defined by sensitive attributes like gender, race, or age. In the ever-evolving landscape of user-item interactions, continual adaptation to newly collected data is crucial for recommender systems to stay aligned with the latest user preferences. However, we observe that such continual adaptation often exacerbates performance disparities. This necessitates a thorough investigation into user-side fairness in dynamic recommender systems, an area that has been unexplored in the literature. This problem is challenging due to distribution shifts, frequent model updates, and non-differentiability of ranking metrics. To our knowledge, this paper presents the first principled study on ensuring user-side fairness in dynamic recommender systems. We start with theoretical analyses on fine-tuning versus retraining, showing that the best practice is incremental fine-tuning with restart. Guided by our theoretical analyses, we propose FAir Dynamic rEcommender (FADE), an end-to-end fine-tuning framework to dynamically ensure user-side fairness over time. To overcome the non-differentiability of recommendation metrics in the fairness loss, we further introduce Differentiable Hit (DH) as an improvement over the recent NeuralNDCG method, not only alleviating its gradient vanishing issue but also achieving higher efficiency. Besides that, we also address the instability issue of the fairness loss by leveraging the competing nature between the recommendation loss and the fairness loss. Through extensive experiments on real-world datasets, we demonstrate that FADE effectively and efficiently reduces performance disparities with little sacrifice in the overall recommendation performance.
nan
Article 372
Title@2025-05-28 (3): SP2RINT: Spatially-Decoupled Physics-Inspired Progressive Inverse Optimization for Scalable, PDE-Constrained Meta-Optical Neural Network Training
Title: SP2RINT: Spatially-Decoupled Physics-Inspired Progressive Inverse Optimization for Scalable, PDE-Constrained Meta-Optical Neural Network Training | SP2RINT: Spatially-Decoupled Physics-Inspired Progressive Inverse Optimization für skalierbare, PDE-Constrained Meta-Optical Neural Network Training | SP2RINT: 空间-减速物理激励-渐进式反向优化,用于可缩放、PDE-受培训的元神经网络培训 2505.18377v2 |
Authors: Pingchuan Ma, Ziang Yin, Qi Jing, Zhengqi Gao, Nicholas Gangi, Boyang Zhang, Tsung-Wei Huang, Zhaoran Huang, Duane S. Boning, Yu Yao, Jiaqi Gu
Diffractive optical neural networks (DONNs) leverage light propagation for efficient analog AI and signal processing. Advances in nanophotonic fabrication and metasurface-based wavefront engineering have opened new pathways to realize high-capacity DONNs across various spectral regimes. Training such DONN systems to determine the metasurface structures remains challenging. Heuristic methods are fast but oversimplify metasurface modulation, often resulting in physically unrealizable designs and significant performance degradation. Simulation-in-the-loop optimizes implementable metasurfaces via adjoint methods, but is computationally prohibitive and unscalable. To address these limitations, we propose SP2RINT, a spatially decoupled, progressive training framework that formulates DONN training as a PDE-constrained learning problem. Metasurface responses are first relaxed into freely trainable transfer matrices with a banded structure. We then progressively enforce physical constraints by alternating between transfer matrix training and adjoint-based inverse design, avoiding per-iteration PDE solves while ensuring final physical realizability. To further reduce runtime, we introduce a physics-inspired, spatially decoupled inverse design strategy based on the natural locality of field interactions. This approach partitions the metasurface into independently solvable patches, enabling scalable and parallel inverse design with system-level calibration. Evaluated across diverse DONN training tasks, SP2RINT achieves digital-comparable accuracy while being 1825 times faster than simulation-in-the-loop approaches. By bridging the gap between abstract DONN models and implementable photonic hardware, SP2RINT enables scalable, high-performance training of physically realizable meta-optical neural systems. Our code is available at https://github.com/ScopeX-ASU/SP2RINT
nan
Article 373
Title@2025-05-28 (3): Defining Foundation Models for Computational Science: A Call for Clarity and Rigor
Title: Defining Foundation Models for Computational Science: A Call for Clarity and Rigor | Fundamentalmodelle für die Computerwissenschaft definieren: Ein Ruf nach Klarheit und Starrheit | 界定计算科学基础模型:要求明确和严格 2505.22904v1 |
Authors: Youngsoo Choi, Siu Wun Cheung, Youngkyu Kim, Ping-Hsuan Tsai, Alejandro N. Diaz, Ivan Zanardi, Seung Whan Chung, Dylan Matthew Copeland, Coleman Kendrick, William Anderson, Traian Iliescu, Matthias Heinkenschloss
The widespread success of foundation models in natural language processing and computer vision has inspired researchers to extend the concept to scientific machine learning and computational science. However, this position paper argues that as the term “foundation model” is an evolving concept, its application in computational science is increasingly used without a universally accepted definition, potentially creating confusion and diluting its precise scientific meaning. In this paper, we address this gap by proposing a formal definition of foundation models in computational science, grounded in the core values of generality, reusability, and scalability. We articulate a set of essential and desirable characteristics that such models must exhibit, drawing parallels with traditional foundational methods, like the finite element and finite volume methods. Furthermore, we introduce the Data-Driven Finite Element Method (DD-FEM), a framework that fuses the modular structure of classical FEM with the representational power of data-driven learning. We demonstrate how DD-FEM addresses many of the key challenges in realizing foundation models for computational science, including scalability, adaptability, and physics consistency. By bridging traditional numerical methods with modern AI paradigms, this work provides a rigorous foundation for evaluating and developing novel approaches toward future foundation models in computational science.
nan
Article 374
Title@2025-05-28 (3): Norm-Bounded Low-Rank Adaptation
Title: Norm-Bounded Low-Rank Adaptation | Normgebundene Low-Rank-Anpassung | 适应性 2501.19050v3 |
Authors: Ruigang Wang, Krishnamurthy Dvijotham, Ian R. Manchester
In this work, we propose norm-bounded low-rank adaptation (NB-LoRA) for parameter-efficient fine-tuning. NB-LoRA is a novel parameterization of low-rank weight adaptations that admits explicit bounds on each singular value of the adaptation matrix, which can thereby satisfy any prescribed unitarily invariant norm bound, including the Schatten norms (e.g., nuclear, Frobenius, spectral norm). The proposed parameterization is unconstrained, smooth, and complete, i.e., it covers all matrices satisfying the prescribed rank and singular-value bounds. Comparative experiments on large language models show that NB-LoRA achieves superior adaptation performance and faster training over a range of models, tasks and ranks. Vision fine-tuning experiments show that NB-LoRA can achieve strong adaptation performance while avoiding catastrophic forgetting, and compared to existing approaches it is substantially more robust to hyperparameters such as adaptation rank, learning rate and number of training epochs.
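The step from per-singular-value bounds to norm bounds can be written out explicitly. The notation below ($\Delta W$ for the adaptation matrix, $r$ for its rank, $s_i$ for the prescribed bounds) is assumed for illustration and is not taken from the paper; the Schatten norm definition itself is standard.

```latex
% Assumed notation: \Delta W has rank at most r, with singular values
% \sigma_1 \ge \dots \ge \sigma_r \ge 0 satisfying \sigma_i \le s_i.
\[
  \|\Delta W\|_{S_p}
  = \Big( \sum_{i=1}^{r} \sigma_i^{\,p} \Big)^{1/p}
  \le \Big( \sum_{i=1}^{r} s_i^{\,p} \Big)^{1/p},
  \qquad 1 \le p < \infty,
\]
\[
  \|\Delta W\|_{S_\infty} = \max_{1 \le i \le r} \sigma_i \le \max_{1 \le i \le r} s_i .
\]
% Special cases: p = 1 is the nuclear norm, p = 2 the Frobenius norm,
% and p = \infty the spectral norm.
```

Choosing the $s_i$ so that the right-hand side equals a target budget therefore guarantees the corresponding norm bound on the adaptation.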
nan
Article 375
Title@2025-05-28 (3): On the Dynamic Regret of Following the Regularized Leader: Optimism with History Pruning
Title: On the Dynamic Regret of Following the Regularized Leader: Optimism with History Pruning | Zum dynamischen Bedauern, dem regularisierten Führer zu folgen: Optimismus mit Geschichtsveredelung | 在追赶正规领导人之后的强烈遗憾:对历史的乐观态度 2505.22899v1 |
Authors: Naram Mhaisen, George Iosifidis
We revisit the Follow the Regularized Leader (FTRL) framework for Online Convex Optimization (OCO) over compact sets, focusing on achieving dynamic regret guarantees. Prior work has highlighted the framework’s limitations in dynamic environments due to its tendency to produce “lazy” iterates. However, building on insights showing FTRL’s ability to produce “agile” iterates, we show that it can indeed recover known dynamic regret bounds through optimistic composition of future costs and careful linearization of past costs, which can lead to pruning some of them. This new analysis of FTRL against dynamic comparators yields a principled way to interpolate between greedy and agile updates and offers several benefits, including refined control over regret terms, optimism without cyclic dependence, and the application of minimal recursive regularization akin to AdaFTRL. More broadly, we show that it is not the lazy projection style of FTRL that hinders (optimistic) dynamic regret, but the decoupling of the algorithm’s state (linearized history) from its iterates, allowing the state to grow arbitrarily. Instead, pruning synchronizes these two when necessary.
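For readers less familiar with the setting, the textbook forms of the FTRL update and of dynamic regret are given below. The symbols ($g_s$ for linearized costs, $r_{1:t}$ for the cumulative regularizer, $u_t$ for the comparator sequence, $P_T$ for its path length) are standard choices assumed here, not the paper's exact notation, and the optimistic and pruned variants studied in the paper modify the sum inside the update.

```latex
% Textbook FTRL over a compact convex set \mathcal{X}:
\[
  x_{t+1} = \arg\min_{x \in \mathcal{X}}
  \Big\{ \sum_{s=1}^{t} \langle g_s, x \rangle + r_{1:t}(x) \Big\},
  \qquad g_s \in \partial f_s(x_s).
\]
% Dynamic regret against a time-varying comparator sequence u_1, \dots, u_T:
\[
  R_T(u_{1:T}) = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(u_t),
  \qquad
  P_T = \sum_{t=1}^{T-1} \| u_{t+1} - u_t \| .
\]
```

Because the update depends on the whole accumulated history of $g_s$, that state can grow large even when the iterate $x_t$ stays on the boundary of $\mathcal{X}$, which is the decoupling the abstract identifies as the obstacle to dynamic regret.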
nan
Article 376
Title@2025-05-28 (3): The Geometry of ReLU Networks through the ReLU Transition Graph
Title: The Geometry of ReLU Networks through the ReLU Transition Graph | Die Geometrie von ReLU-Netzwerken durch den ReLU-Übergangsgraphen | 通过 ReLU 过渡图绘制 ReLU 网络的几何图 2505.11692v2 |
Authors: Sahil Rajesh Dhayalkar
We develop a novel theoretical framework for analyzing ReLU neural networks through the lens of a combinatorial object we term the ReLU Transition Graph (RTG). In this graph, each node corresponds to a linear region induced by the network’s activation patterns, and edges connect regions that differ by a single neuron flip. Building on this structure, we derive a suite of new theoretical results connecting RTG geometry to expressivity, generalization, and robustness. Our contributions include tight combinatorial bounds on RTG size and diameter, a proof of RTG connectivity, and graph-theoretic interpretations of VC-dimension. We also relate entropy and average degree of the RTG to generalization error. Each theoretical result is rigorously validated via carefully controlled experiments across varied network depths, widths, and data regimes. This work provides the first unified treatment of ReLU network structure via graph theory and opens new avenues for compression, regularization, and complexity control rooted in RTG analysis.
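The graph described here can be approximated empirically for a tiny network. The sketch below samples inputs, records each input's activation pattern, and connects patterns that differ in exactly one neuron; it assumes a single hidden layer and only finds regions that the sample points happen to hit, so it is an illustration of the definition rather than the paper's construction.

```python
from itertools import combinations

def activation_pattern(x, weights, biases):
    """Binary on/off pattern of the hidden ReLU units at input x."""
    return tuple(
        int(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b > 0)
        for w, b in zip(weights, biases)
    )

def build_rtg(points, weights, biases):
    """Nodes: observed activation patterns. Edges: patterns one flip apart."""
    nodes = {activation_pattern(p, weights, biases) for p in points}
    edges = {
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if sum(x != y for x, y in zip(a, b)) == 1
    }
    return nodes, edges

# Hypothetical network: three hidden units whose boundaries are the lines
# x = 0, y = 0, and x + y = 0, which cut the plane into six regions.
weights = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
biases = [0.0, 0.0, 0.0]
grid = [(x, y) for x in (-2, -1, 1, 2) for y in (-2, -1, 1, 2)]

nodes, edges = build_rtg(grid, weights, biases)
print(len(nodes), "regions,", len(edges), "single-flip edges")
```

For this arrangement the six regions form a cycle, since walking around the origin crosses one boundary line at a time.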
nan
Article 377
Title@2025-05-28 (3): Neural Networks as Universal Finite-State Machines: A Constructive Deterministic Finite Automaton Theory
Title: Neural Networks as Universal Finite-State Machines: A Constructive Deterministic Finite Automaton Theory | Neurale Netzwerke als universelle Finite-State-Maschinen: Eine konstruktive Deterministische Finite-Automaten-Theorie | 神经网络作为普遍有限国家机器:具有建设性决定作用的有限自定义理论 2505.11694v2 |
Authors: Sahil Rajesh Dhayalkar
We present a complete theoretical and empirical framework establishing feedforward neural networks as universal finite-state machines (N-FSMs). Our results prove that finite-depth ReLU and threshold networks can exactly simulate deterministic finite automata (DFAs) by unrolling state transitions into depth-wise neural layers, with formal characterizations of required depth, width, and state compression. We demonstrate that DFA transitions are linearly separable, binary threshold activations allow exponential compression, and Myhill-Nerode equivalence classes can be embedded into continuous latent spaces while preserving separability. We also formalize the expressivity boundary: fixed-depth feedforward networks cannot recognize non-regular languages requiring unbounded memory. Unlike prior heuristic or probing-based studies, we provide constructive proofs and design explicit DFA-unrolled neural architectures that empirically validate every claim. Our results bridge deep learning, automata theory, and neural-symbolic computation, offering a rigorous blueprint for how discrete symbolic processes can be realized in continuous neural systems.
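The idea of unrolling DFA transitions into threshold layers can be illustrated with a small generic construction. In the sketch below, each step uses one hidden threshold unit per (state, symbol) pair, which computes an AND of two one-hot inputs, followed by one output unit per next state, which computes an OR. This is a textbook-style encoding written for illustration, with hypothetical function names, and is not claimed to match the paper's architecture or its compression results.

```python
from itertools import product

def step(z):
    """Binary threshold activation."""
    return 1 if z > 0 else 0

def dfa_layer(state_vec, sym_vec, delta):
    """One unrolled transition: AND units per (state, symbol), OR units per next state."""
    n_states, n_syms = len(state_vec), len(sym_vec)
    hidden = {
        (q, a): step(state_vec[q] + sym_vec[a] - 1.5)  # fires only if both inputs are 1
        for q in range(n_states)
        for a in range(n_syms)
    }
    return [
        step(sum(h for (q, a), h in hidden.items() if delta[(q, a)] == nxt) - 0.5)
        for nxt in range(n_states)
    ]

def run_network(word, delta, n_states, alphabet, start, accepting):
    """Apply one threshold layer per input symbol, starting from a one-hot state."""
    state = [1 if q == start else 0 for q in range(n_states)]
    for ch in word:
        sym = [1 if c == ch else 0 for c in alphabet]
        state = dfa_layer(state, sym, delta)
    return any(state[q] for q in accepting)

def run_dfa(word, delta, alphabet, start, accepting):
    """Reference DFA execution for comparison."""
    q = start
    for ch in word:
        q = delta[(q, alphabet.index(ch))]
    return q in accepting

# Example DFA: accepts strings over {a, b} with an even number of 'b'.
alphabet = "ab"
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
words = ["".join(t) for n in range(0, 7) for t in product(alphabet, repeat=n)]
agree = all(
    run_network(w, delta, 2, alphabet, 0, {0}) == run_dfa(w, delta, alphabet, 0, {0})
    for w in words
)
print("network matches DFA on all words up to length 6:", agree)
```

The depth of the unrolled network grows with the input length, which is consistent with the abstract's point that fixed-depth feedforward networks cannot handle languages requiring unbounded memory.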
nan
Article 378
Title@2025-05-28 (3): A Combinatorial Theory of Dropout: Subnetworks, Graph Geometry, and Generalization
Title: A Combinatorial Theory of Dropout: Subnetworks, Graph Geometry, and Generalization | A Combinatorial Theory of Dropout: Subnetzwerke, Graphische Geometrie und Generalisierung | 辍学综合理论:子网络、图形几何和一般化 2504.14762v2 |
Authors: Sahil Rajesh Dhayalkar
We propose a combinatorial and graph-theoretic theory of dropout by modeling training as a random walk over a high-dimensional graph of binary subnetworks. Each node represents a masked version of the network, and dropout induces stochastic traversal across this space. We define a subnetwork contribution score that quantifies generalization and show that it varies smoothly over the graph. Using tools from spectral graph theory, PAC-Bayes analysis, and combinatorics, we prove that generalizing subnetworks form large, connected, low-resistance clusters, and that their number grows exponentially with network width. This reveals dropout as a mechanism for sampling from a robust, structured ensemble of well-generalizing subnetworks with built-in redundancy. Extensive experiments validate every theoretical claim across diverse architectures. Together, our results offer a unified foundation for understanding dropout and suggest new directions for mask-guided regularization and subnetwork optimization.
nan
Article 379
Title@2025-05-28 (3): Smart Surrogate Losses for Contextual Stochastic Linear Optimization with Robust Constraints
Title: Smart Surrogate Losses for Contextual Stochastic Linear Optimization with Robust Constraints | Intelligente Surrogatverluste für kontextuelle stochastische Linearoptimierung mit robusten Einschränkungen | 具有强力限制的内幕斯托卡式线性优化的智能代谢损失 2505.22881v1 |
Authors: Hyungki Im, Wyame Benslimane, Paul Grigas
We study an extension of contextual stochastic linear optimization (CSLO) that, in contrast to most of the existing literature, involves inequality constraints that depend on uncertain parameters predicted by a machine learning model. To handle the constraint uncertainty, we use contextual uncertainty sets constructed via methods like conformal prediction. Given a contextual uncertainty set method, we introduce the “Smart Predict-then-Optimize with Robust Constraints” (SPO-RC) loss, a feasibility-sensitive adaptation of the SPO loss that measures decision error of predicted objective parameters. We also introduce a convex surrogate, SPO-RC+, and prove Fisher consistency with SPO-RC. To enhance performance, we train on truncated datasets where true constraint parameters lie within the uncertainty sets, and we correct the induced sample selection bias using importance reweighting techniques. Through experiments on fractional knapsack and alloy production problem instances, we demonstrate that SPO-RC+ effectively handles uncertainty in constraints and that combining truncation with importance reweighting can further improve performance.
nan
Article 380
Title@2025-05-28 (3): Signal attenuation enables scalable decentralized multi-agent reinforcement learning over networks
Title: Signal attenuation enables scalable decentralized multi-agent reinforcement learning over networks | Signaldämpfung ermöglicht skalierbares dezentrales Multi-Agenten-Verstärkungslernen über Netzwerke | 信号减速使可伸缩的分散式多试剂强化学习超越网络 2505.11461v2 |
Authors: Wesley A Suttle, Vipul K Sharma, Brian M Sadler
Multi-agent reinforcement learning (MARL) methods typically require that agents enjoy global state observability, preventing development of decentralized algorithms and limiting scalability. Recent work has shown that, under assumptions on decaying inter-agent influence, global observability can be replaced by local neighborhood observability at each agent, enabling decentralization and scalability. Real-world applications enjoying such decay properties remain underexplored, however, despite the fact that signal power decay, or signal attenuation, due to path loss is an intrinsic feature of many problems in wireless communications and radar networks. In this paper, we show that signal attenuation enables decentralization in MARL by considering the illustrative special case of performing power allocation for target detection in a radar network. To achieve this, we propose two new constrained multi-agent Markov decision process formulations of this power allocation problem, derive local neighborhood approximations for global value function and policy gradient estimates and establish corresponding error bounds, and develop decentralized saddle point policy gradient algorithms for solving the proposed problems. Our approach, though oriented towards the specific radar network problem we consider, provides a useful model for extensions to additional problems in wireless communications and radar networks.
nan
Article 381
Title@2025-05-28 (3): CFP-Gen: Combinatorial Functional Protein Generation via Diffusion Language Models
Title: CFP-Gen: Combinatorial Functional Protein Generation via Diffusion Language Models | CFP-Gen: Kombinatorische funktionelle Proteinerzeugung über Diffusions-Sprachenmodelle | CFP-Gen:通过传播语言模式生成混合功能性蛋白质 2505.22869v1 |
Authors: Junbo Yin, Chao Zha, Wenjia He, Chencheng Xu, Xin Gao
Existing PLMs generate protein sequences based on a single-condition constraint from a specific modality, struggling to simultaneously satisfy multiple constraints across different modalities. In this work, we introduce CFP-Gen, a novel diffusion language model for Combinatorial Functional Protein GENeration. CFP-Gen facilitates the de novo protein design by integrating multimodal conditions with functional, sequence, and structural constraints. Specifically, an Annotation-Guided Feature Modulation (AGFM) module is introduced to dynamically adjust the protein feature distribution based on composable functional annotations, e.g., GO terms, IPR domains and EC numbers. Meanwhile, the Residue-Controlled Functional Encoding (RCFE) module captures residue-wise interaction to ensure more precise control. Additionally, off-the-shelf 3D structure encoders can be seamlessly integrated to impose geometric constraints. We demonstrate that CFP-Gen enables high-throughput generation of novel proteins with functionality comparable to natural proteins, while achieving a high success rate in designing multifunctional proteins. Code and data available at https://github.com/yinjunbo/cfpgen.
nan
Article 382
Title@2025-05-28 (3): Multimodal Survival Modeling in the Age of Foundation Models
Title: Multimodal Survival Modeling in the Age of Foundation Models | Multimodale Überlebensmodellierung im Zeitalter der Gründungsmodelle | 基金会时代多模式生存模型 2505.07683v2 |
Authors: Steven Song, Morgan Borjigin-Wang, Irene Madejski, Robert L. Grossman
The Cancer Genome Atlas (TCGA) has enabled novel discoveries and served as a large-scale reference through its harmonized genomics, clinical, and image data. Prior studies have trained bespoke cancer survival prediction models from unimodal or multimodal TCGA data. A modern paradigm in biomedical deep learning is the development of foundation models (FMs) to derive meaningful feature embeddings, agnostic to a specific modeling task. Biomedical text especially has seen growing development of FMs. While TCGA contains free-text data as pathology reports, these have been historically underutilized. Here, we investigate the feasibility of training classical, multimodal survival models over zero-shot embeddings extracted by FMs. We show the ease and additive effect of multimodal fusion, outperforming unimodal models. We demonstrate the benefit of including pathology report text and rigorously evaluate the effect of model-based text summarization and hallucination. Overall, we modernize survival modeling by leveraging FMs and information extraction from pathology reports.
nan
Article 383
Title@2025-05-28 (3): CrossNAS: A Cross-Layer Neural Architecture Search Framework for PIM Systems
Title: CrossNAS: A Cross-Layer Neural Architecture Search Framework for PIM Systems | CrossNAS: Ein Cross-Layer Neural Architecture Search Framework für PIM-Systeme | CrossNAS:PIM系统跨行业神经结构搜索框架 2505.22868v1 |
Authors: Md Hasibul Amin, Mohammadreza Mohammadi, Jason D. Bakos, Ramtin Zand
In this paper, we propose the CrossNAS framework, an automated approach for exploring a vast, multidimensional search space that spans various design abstraction layers (circuits, architecture, and systems) to optimize the deployment of machine learning workloads on analog processing-in-memory (PIM) systems. CrossNAS leverages the single-path one-shot weight-sharing strategy combined with evolutionary search for the first time in the context of PIM system mapping and optimization. CrossNAS sets a new benchmark for PIM neural architecture search (NAS), outperforming previous methods in both accuracy and energy efficiency while maintaining comparable or shorter search times.
nan
Article 384
Title@2025-05-28 (3): Scaling Offline RL via Efficient and Expressive Shortcut Models
Title: Scaling Offline RL via Efficient and Expressive Shortcut Models | Skalierung von Offline-RL über effiziente und Expressive Shortcut-Modelle | 通过高效和直表达快捷键模式缩放离线 RL 2505.22866v1 |
Authors: Nicolas Espinosa-Dice, Yiyi Zhang, Yiding Chen, Bradley Guo, Owen Oertell, Gokul Swamy, Kiante Brantley, Wen Sun
Diffusion and flow models have emerged as powerful generative approaches capable of modeling diverse and multimodal behavior. However, applying these models to offline reinforcement learning (RL) remains challenging due to the iterative nature of their noise sampling processes, making policy optimization difficult. In this paper, we introduce Scalable Offline Reinforcement Learning (SORL), a new offline RL algorithm that leverages shortcut models - a novel class of generative models - to scale both training and inference. SORL’s policy can capture complex data distributions and can be trained simply and efficiently in a one-stage training procedure. At test time, SORL introduces both sequential and parallel inference scaling by using the learned Q-function as a verifier. We demonstrate that SORL achieves strong performance across a range of offline RL tasks and exhibits positive scaling behavior with increased test-time compute. We release the code at nico-espinosadice.github.io/projects/sorl.
nan
Article 385
Title@2025-05-28 (3): Your Data, My Model: Learning Who Really Helps in Federated Learning
Title: Your Data, My Model: Learning Who Really Helps in Federated Learning | Ihre Daten, mein Modell: Lernen, die wirklich hilft beim Federated Learning | 您的数据, 我的模型: 学习谁真正帮助联邦学习 2409.02064v3 |
Authors: Shamsiiat Abdurakhmanova, Amirhossein Mohammadi, Yasmin SarcheshmehPour, Alexander Jung
Many important machine learning applications involve networks of devices (such as wearables or smartphones) that generate local data and train personalized models. A key challenge is determining which peers are most beneficial for collaboration. We propose a simple and privacy-preserving method to select relevant collaborators by evaluating how much a model improves after a single gradient step using another device's data, without sharing raw data. This method naturally extends to non-parametric models by replacing the gradient step with a non-parametric generalization. Our approach enables model-agnostic, data-driven peer selection for personalized federated learning (PersFL).
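The selection rule described above can be sketched for a one-parameter linear model. This is a minimal illustration under assumed details, not the authors' implementation: the model `y = w * x`, the squared loss, the learning rate, and the names `peer_gain`, `similar_peer`, and `different_peer` are all hypothetical. In a real deployment the gradient on the peer's data would be computed on the peer's device, so only the update, not the raw data, would be exchanged.

```python
def mse(w, data):
    """Mean squared error of the scalar linear model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """Gradient of the mean squared error with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)


def peer_gain(w, own_data, peer_data, lr=0.05):
    """Loss reduction on own_data after one gradient step taken on peer_data."""
    w_after_step = w - lr * grad(w, peer_data)
    return mse(w, own_data) - mse(w_after_step, own_data)


# Illustrative local datasets (made up for this sketch).
own = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]                  # y = 2x
similar_peer = [(1.0, 2.1), (2.0, 3.9), (4.0, 8.0)]         # roughly y = 2x
different_peer = [(1.0, -3.0), (2.0, -6.0), (3.0, -9.0)]    # y = -3x

w0 = 0.0  # current local model
gains = {
    "similar": peer_gain(w0, own, similar_peer),
    "different": peer_gain(w0, own, different_peer),
}
best_peer = max(gains, key=gains.get)
```

A positive gain means the peer's step helps on the local data and a negative gain means it hurts, so ranking peers by gain gives a simple collaborator score.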
nan
Article 386
Title@2025-05-28 (3): Causal-PIK: Causality-based Physical Reasoning with a Physics-Informed Kernel
Title: Causal-PIK: Causality-based Physical Reasoning with a Physics-Informed Kernel | Causal-PIK: Kausalitätsbasierte Physical Reasoning mit einem physikinformierten Kernel | Causal-PIK:基于因果关系的物理推理与物理信息核 2505.22861v1 |
Authors: Carlota Parés-Morlans, Michelle Yi, Claire Chen, Sarah A. Wu, Rika Antonova, Tobias Gerstenberg, Jeannette Bohg
Tasks that involve complex interactions between objects with unknown dynamics make planning before execution difficult. These tasks require agents to iteratively improve their actions after actively exploring causes and effects in the environment. For these types of tasks, we propose Causal-PIK, a method that leverages Bayesian optimization to reason about causal interactions via a Physics-Informed Kernel to help guide efficient search for the best next action. Experimental results on Virtual Tools and PHYRE physical reasoning benchmarks show that Causal-PIK outperforms state-of-the-art results, requiring fewer actions to reach the goal. We also compare Causal-PIK to human studies, including results from a new user study we conducted on the PHYRE benchmark. We find that Causal-PIK remains competitive on tasks that are very challenging, even for human problem-solvers.
nan
Article 387
Title@2025-05-28 (3): Permissioned LLMs: Enforcing Access Control in Large Language Models
Title: Permissioned LLMs: Enforcing Access Control in Large Language Models | Zugelassene LLMs: Erzwingen der Zugriffskontrolle in großen Sprachmodellen | 获得许可的LLMM:在大语言模型中实施访问控制 2505.22860v1 |
Authors: Bargav Jayaraman, Virendra J. Marathe, Hamid Mozaffari, William F. Shen, Krishnaram Kenthapadi
In enterprise settings, organizational data is segregated, siloed and carefully protected by elaborate access control frameworks. These access control structures can completely break down if an LLM fine-tuned on the siloed data serves requests, for downstream tasks, from individuals with disparate access privileges. We propose Permissioned LLMs (PermLLM), a new class of LLMs that superimpose the organizational data access control structures on query responses they generate. We formalize abstractions underpinning the means to determine whether access control enforcement happens correctly over LLM query responses. Our formalism introduces the notion of a relevant response that can be used to prove whether a PermLLM mechanism has been implemented correctly. We also introduce a novel metric, called access advantage, to empirically evaluate the efficacy of a PermLLM mechanism. We introduce three novel PermLLM mechanisms that build on Parameter Efficient Fine-Tuning to achieve the desired access control. We furthermore present two instantiations of access advantage: (i) Domain Distinguishability Index (DDI) based on Membership Inference Attacks, and (ii) Utility Gap Index (UGI) based on LLM utility evaluation. We demonstrate the efficacy of our PermLLM mechanisms through extensive experiments on four public datasets (GPQA, RCV1, SimpleQA, and WMDP), in addition to evaluating the validity of DDI and UGI metrics themselves for quantifying access control in LLMs.
nan
Article 388
Title@2025-05-28 (3): NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding
Title: NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding | NGPU-LM: GPU-beschleunigtes N-Gram-Sprachenmodell für Kontext-Biasing in Greedy ASR-Dekodierung | NGPU-LM: 加速GPU-加速型N-Gram语语模式,用于在贪婪ASR标记中进行背景切换 2505.22857v1 |
Authors: Vladimir Bataev, Andrei Andrusenko, Lilit Grigoryan, Aleksandr Laptev, Vitaly Lavrukhin, Boris Ginsburg
Statistical n-gram language models are widely used for context-biasing tasks in Automatic Speech Recognition (ASR). However, existing implementations lack computational efficiency due to poor parallelization, making context-biasing less appealing for industrial use. This work rethinks data structures for statistical n-gram language models to enable fast and parallel operations for GPU-optimized inference. Our approach, named NGPU-LM, introduces customizable greedy decoding for all major ASR model types - including transducers, attention encoder-decoder models, and CTC - with less than 7% computational overhead. The proposed approach can eliminate more than 50% of the accuracy gap between greedy and beam search for out-of-domain scenarios while avoiding significant slowdown caused by beam search. The implementation of the proposed NGPU-LM is open-sourced.
nan
Article 389
Title@2025-05-28 (3): Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning
Title: Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning | Nutzung von nicht gekennzeichneten Daten durch Kernel-Funktion Annäherung im Offline-Verstärkungs-Lernen | 在离线强化学习中,通过 Kernel 函数相近接近的内核功能利用未贴标签的数据分享来利用无标签数据分享 2408.12307v3 |
Authors: Yen-Ru Lai, Fu-Chieh Chang, Pei-Yuan Wu
Offline reinforcement learning (RL) learns policies from a fixed dataset, but often requires large amounts of data. The challenge arises when labeled datasets are expensive, especially when rewards have to be provided by human labelers for large datasets. In contrast, unlabeled data tends to be less expensive. This situation highlights the importance of finding effective ways to use unlabeled data in offline RL, especially when labeled data is limited or expensive to obtain. In this paper, we present an algorithm that uses unlabeled data in offline RL with kernel function approximation, and we give theoretical guarantees for it. We present various eigenvalue decay conditions of $\mathcal{H}_k$ which determine the complexity of the algorithm. In summary, our work provides a promising approach for exploiting the advantages offered by unlabeled data in offline RL, whilst maintaining theoretical assurances.
nan
Article 390
Title@2025-05-28 (3): Point Cloud Synthesis Using Inner Product Transforms
Title: Point Cloud Synthesis Using Inner Product Transforms | Punkt-Cloud-Synthese mit inneren Produkt-Transformationen | 使用内产产品变换的点云合成 2410.18987v3 |
Authors: Ernst Röell, Bastian Rieck
Point-cloud synthesis, i.e. the generation of novel point clouds from an input distribution, remains a challenging task, for which numerous complex machine-learning models have been devised. We develop a novel method that encodes geometrical-topological characteristics of point clouds using inner products, leading to a highly-efficient point cloud representation with provable expressivity properties. Integrated into deep learning models, our encoding exhibits high quality in typical tasks like reconstruction, generation, and interpolation, with inference times orders of magnitude faster than existing methods.
nan
Article 391
Title@2025-05-28 (3): RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation
Title: RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation | RocqStar: Leveraging-ähnliche Retrieval- und Agentiksysteme für die Rocq-Generation | RocqStar:利用利用相似度驱动回收系统和干系统来生成Rocq 2505.22846v1 |
Authors: Nikita Khramov, Andrei Kozyrev, Gleb Solovev, Anton Podkopaev
Interactive Theorem Proving has repeatedly been shown to be fruitful when combined with Generative Artificial Intelligence. This paper assesses multiple approaches to Rocq generation and illuminates potential avenues for improvement. We highlight the importance of thorough premise selection for generating Rocq proofs and propose a novel approach, leveraging retrieval via a self-attentive embedder model. The evaluation of the designed approach shows up to a 28% relative increase in the generator’s performance. We tackle the problem of writing Rocq proofs using a multi-stage agentic system, tailored for formal verification, and demonstrate its high effectiveness. We conduct an ablation study and show the use of multi-agent debate at the planning stage of proof synthesis.
nan
Article 392
Title@2025-05-28 (3): Entropy-regularized Gradient Estimators for Approximate Bayesian Inference
Title: Entropy-regularized Gradient Estimators for Approximate Bayesian Inference | Entropie-regularisierte Gradienten-Estimatoren für ungefähre Bayesische Schlussfolgerung | 用于近近贝耶斯推断的全天正规化梯度测算器 2503.11964v3 |
Authors: Jasmeet Kaur
Effective uncertainty quantification is important for training modern predictive models with limited data, enhancing both accuracy and robustness. While Bayesian methods are effective for this purpose, they can be challenging to scale. When employing approximate Bayesian inference, ensuring the quality of samples from the posterior distribution in a computationally efficient manner is essential. This paper addresses the estimation of the Bayesian posterior to generate diverse samples by approximating the gradient flow of the Kullback-Leibler (KL) divergence and the cross entropy of the target approximation under the metric induced by the Stein Operator. It presents empirical evaluations on classification tasks to assess the method’s performance and discusses its effectiveness for Model-Based Reinforcement Learning that uses uncertainty-aware network dynamics models.
nan
Article 393
Title@2025-05-28 (3): Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Title: Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion | Jenseits der Permutationssymmetrie der Transformer: Die Rolle der Rotation für die Modellfusion | 变异器超越变异对称:变动对模型融合的作用 2502.00264v2 |
Authors: Binchi Zhang, Zaiyi Zheng, Zhengzhang Chen, Jundong Li
Symmetry in the parameter space of deep neural networks (DNNs) has proven beneficial for various deep learning applications. A well-known example is the permutation symmetry in Multi-Layer Perceptrons (MLPs), where permuting the rows of weight matrices in one layer and applying the inverse permutation to adjacent layers yields a functionally equivalent model. While permutation symmetry fully characterizes the equivalence set for MLPs, its discrete nature limits its utility for transformers. In this paper, we introduce rotation symmetry, a novel form of parameter space symmetry for transformers that generalizes permutation symmetry by rotating parameter matrices in self-attention layers. Unlike permutation symmetry, rotation symmetry operates in a continuous domain, thereby significantly expanding the equivalence set for transformers. Based on this property, we propose a theoretically optimal parameter matching algorithm as a plug-and-play module to enhance model fusion. We evaluate our approach using pre-trained transformers across diverse natural language and vision tasks. Experimental results demonstrate that our rotation symmetry-based matching algorithm substantially improves model fusion, highlighting the potential of parameter space symmetry to facilitate model fusion. Our code is available on https://github.com/zhengzaiyi/RotationSymmetry.
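Both symmetries mentioned above can be checked numerically on toy examples. The sketch below is an illustration under simplifying assumptions, not the paper's matching algorithm: the MLP has two layers and no biases, the "attention" part is a single query-key dot product in two dimensions, and all names (`permute_hidden`, `rotate`) are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def mlp(W1, W2, x):
    """Two-layer MLP without biases: W2 @ relu(W1 @ x)."""
    return matvec(W2, relu(matvec(W1, x)))


def permute_hidden(W1, W2, perm):
    """Reorder hidden units: rows of W1 and the matching columns of W2."""
    W1_perm = [W1[p] for p in perm]
    W2_perm = [[row[p] for p in perm] for row in W2]
    return W1_perm, W2_perm


def rotate(v, theta):
    """Rotate a 2-D vector by angle theta (an orthogonal map)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Permutation symmetry: reordering hidden units leaves the MLP output unchanged.
random.seed(0)
W1 = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]
W2 = [[random.gauss(0, 1) for _ in range(4)] for _ in range(2)]
x = [0.5, -1.0, 2.0]
W1_perm, W2_perm = permute_hidden(W1, W2, perm=[2, 0, 3, 1])
out_original = mlp(W1, W2, x)
out_permuted = mlp(W1_perm, W2_perm, x)

# Rotation symmetry: rotating query and key by the same orthogonal map
# leaves their dot product (the attention logit) unchanged.
q, k = [1.0, 2.0], [-0.5, 0.3]
score_original = dot(q, k)
score_rotated = dot(rotate(q, 0.7), rotate(k, 0.7))
```

The permutation set is finite, while the rotation angle can take any real value, which is the discrete versus continuous contrast the abstract draws.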
nan
Article 394
Title@2025-05-28 (3): Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation
Title: Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation | Bayesian Attention Mechanism: Ein probabilistisches Framework für die Positionskodierung und Kontextlängen-Extrapolation | Bayesian注意机制:定位编码和背景长度外推概率框架 2505.22842v1 |
Authors: Arthur S. Bianchessi, Rodrigo C. Barros, Lucas S. Kupssinskü
Transformer-based language models rely on positional encoding (PE) to handle token order and support context length extrapolation. However, existing PE methods lack theoretical clarity and rely on limited evaluation metrics to substantiate their extrapolation claims. We propose the Bayesian Attention Mechanism (BAM), a theoretical framework that formulates positional encoding as a prior within a probabilistic model. BAM unifies existing methods (e.g., NoPE and ALiBi) and motivates a new Generalized Gaussian positional prior that substantially improves long-context generalization. Empirically, BAM enables accurate information retrieval at $500\times$ the training context length, outperforming previous state-of-the-art context length generalization in long context retrieval accuracy while maintaining comparable perplexity and introducing minimal additional parameters.
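One way to read "positional encoding as a prior" is the following identity: adding a bias to attention logits before the softmax is equivalent to multiplying the content-only attention weights by the exponential of that bias and renormalizing. The sketch below checks this for an ALiBi-style linear distance penalty. It is an assumption-laden illustration rather than BAM itself: the slope value, the single-query setup, and the function names are hypothetical, and the paper's actual formulation may differ.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def biased_attention(logits, query_pos, slope):
    """ALiBi-style attention: subtract slope * distance from each logit."""
    biased = [l - slope * abs(query_pos - j) for j, l in enumerate(logits)]
    return softmax(biased)


def prior_weighted_attention(logits, query_pos, slope):
    """Same weights, written as content attention times a positional prior."""
    content = softmax(logits)
    prior = [math.exp(-slope * abs(query_pos - j)) for j in range(len(logits))]
    unnormalized = [c * p for c, p in zip(content, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Illustrative content logits for one query at position 4 (made up for this sketch).
logits = [0.2, 1.5, -0.3, 0.8, 1.1]
weights_bias = biased_attention(logits, query_pos=4, slope=0.5)
weights_prior = prior_weighted_attention(logits, query_pos=4, slope=0.5)
```

With `slope=0` the prior is uniform and both functions reduce to plain content attention, which corresponds to using no positional bias at all.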
nan
Article 395
Title@2025-05-28 (3): Kernel-Smoothed Scores for Denoising Diffusion: A Bias-Variance Study
Title: Kernel-Smoothed Scores for Denoising Diffusion: A Bias-Variance Study | Kernelgeglättete Punktzahlen für die Denoisierung der Diffusion: Eine Bias-Varianz-Studie | Disoising 扩散的内核悬浮分数:生物量变化研究 2505.22841v1 |
Authors: Franck Gabriel, François Ged, Maria Han Veiga, Emmanuel Schertzer
Diffusion models now set the benchmark in high-fidelity generative sampling, yet they can, in principle, be prone to memorization. In this case, their learned score overfits the finite dataset so that the reverse-time SDE samples are mostly training points. In this paper, we interpret the empirical score as a noisy version of the true score and show that its covariance matrix is asymptotically a re-weighted data PCA. In large dimension, the small time limit makes the noise variance blow up while simultaneously reducing spatial correlation. To reduce this variance, we introduce a kernel-smoothed empirical score and analyze its bias-variance trade-off. We derive asymptotic bounds on the Kullback-Leibler divergence between the true distribution and the one generated by the modified reverse SDE. Regularization on the score has the same effect as increasing the size of the training dataset, and thus helps prevent memorization. A spectral decomposition of the forward diffusion suggests better variance control under some regularity conditions of the true data distribution. Reverse diffusion with kernel-smoothed empirical score can be reformulated as a gradient descent drifted toward a Log-Exponential Double-Kernel Density Estimator (LED-KDE). This perspective highlights two regularization mechanisms taking place in denoising diffusions: an initial Gaussian kernel first diffuses mass isotropically in the ambient space, while a second kernel applied in score space concentrates and spreads that mass along the data manifold. Hence, even a straightforward regularization, without any learning, already mitigates memorization and enhances generalization. Numerically, we illustrate our results with several experiments on synthetic and MNIST datasets.
nan
Article 396
Title@2025-05-28 (3): Development and Validation of SXI++ LNM Algorithm for Sepsis Prediction
Title: Development and Validation of SXI++ LNM Algorithm for Sepsis Prediction | Entwicklung und Validierung von SXI++ LNM-Algorithmus für Sepsis-Vorhersage | SXI+++ LNM 测距算法的制定和校验 2505.22840v1 |
Authors: Dharambir Mahto, Prashant Yadav, Mahesh Banavar, Jim Keany, Alan T Joseph, Srinivas Kilambi
Sepsis is a life-threatening condition affecting over 48.9 million people globally and causing 11 million deaths annually. Despite medical advancements, predicting sepsis remains a challenge due to non-specific symptoms and complex pathophysiology. The SXI++ LNM is a machine learning scoring system that refines sepsis prediction by leveraging multiple algorithms and deep neural networks. This study aims to improve robustness in clinical applications and evaluates the predictive performance of the SXI++ LNM for sepsis prediction. The model, utilizing a deep neural network, was trained and tested using multiple scenarios with different dataset distributions. The model’s performance was assessed against unseen test data, and accuracy, precision, and area under the curve (AUC) were calculated. The SXI++ LNM outperformed the state of the art in three use cases, achieving an AUC of 0.99 (95% CI: 0.98-1.00). The model demonstrated a precision of 99.9% (95% CI: 99.8-100.0) and an accuracy of 99.99% (95% CI: 99.98-100.0), maintaining high reliability.
nan
Article 397
Title@2025-05-28 (3): How Do Diffusion Models Improve Adversarial Robustness?
Title: How Do Diffusion Models Improve Adversarial Robustness? | Wie verbessern Diffusionsmodelle die widrige Robustheit? | 传播模型如何改善反逆能力? 2505.22839v1 |
Authors: Liu Yuezhang, Xue-Xin Wei
Recent findings suggest that diffusion models significantly enhance empirical adversarial robustness. While some intuitive explanations have been proposed, the precise mechanisms underlying these improvements remain unclear. In this work, we systematically investigate how and how well diffusion models improve adversarial robustness. First, we observe that diffusion models intriguingly increase, rather than decrease, the $\ell_p$ distance to clean samples, challenging the intuition that purification denoises inputs closer to the original data. Second, we find that the purified images are heavily influenced by the internal randomness of diffusion models, where a compression effect arises within each randomness configuration. Motivated by this observation, we evaluate robustness under fixed randomness and find that the improvement drops to approximately 24% on CIFAR-10, substantially lower than prior reports approaching 70%. Importantly, we show that this remaining robustness gain strongly correlates with the model’s ability to compress the input space, revealing the compression rate as a reliable robustness indicator without requiring gradient-based analysis. Our findings provide novel insights into the mechanisms underlying diffusion-based purification, and offer guidance for developing more effective and principled adversarial purification systems.
nan
Article 398
Title@2025-05-28 (3): Bridging Distribution Shift and AI Safety: Conceptual and Methodological Synergies
Title: Bridging Distribution Shift and AI Safety: Conceptual and Methodological Synergies | Bridging Distribution Shift und KI-Sicherheit: Konzeptionelle und methodische Synergien | 搭桥分配转变与AI安全:概念与方法的协同作用 2505.22829v1 |
Authors: Chenruo Liu, Kenan Tang, Yao Qin, Qi Lei
This paper bridges distribution shift and AI safety through a comprehensive analysis of their conceptual and methodological synergies. While prior discussions often focus on narrow cases or informal analogies, we establish two types of connections between specific causes of distribution shift and fine-grained AI safety issues: (1) methods addressing a specific shift type can help achieve corresponding safety goals, or (2) certain shifts and safety issues can be formally reduced to each other, enabling mutual adaptation of their methods. Our findings provide a unified perspective that encourages fundamental integration between distribution shift and AI safety research.
nan
Article 399
Title@2025-05-28 (3): PGLearn – An Open-Source Learning Toolkit for Optimal Power Flow
Title: PGLearn – An Open-Source Learning Toolkit for Optimal Power Flow | PGLearn – Ein Open-Source-Learning-Toolkit für optimalen Stromfluss | PGLearn – – 最佳电力流动开放源学习工具包 2505.22825v1 |
Authors: Michael Klamkin, Mathieu Tanneau, Pascal Van Hentenryck
Machine Learning (ML) techniques for Optimal Power Flow (OPF) problems have recently garnered significant attention, reflecting a broader trend of leveraging ML to approximate and/or accelerate the resolution of complex optimization problems. These developments are necessitated by the increased volatility and scale in energy production for modern and future grids. However, progress in ML for OPF is hindered by the lack of standardized datasets and evaluation metrics, from generating and solving OPF instances, to training and benchmarking machine learning models. To address this challenge, this paper introduces PGLearn, a comprehensive suite of standardized datasets and evaluation tools for ML and OPF. PGLearn provides datasets that are representative of real-life operating conditions, by explicitly capturing both global and local variability in the data generation, and by, for the first time, including time series data for several large-scale systems. In addition, it supports multiple OPF formulations, including AC, DC, and second-order cone formulations. Standardized datasets are made publicly available to democratize access to this field, reduce the burden of data generation, and enable the fair comparison of various methodologies. PGLearn also includes a robust toolkit for training, evaluating, and benchmarking machine learning models for OPF, with the goal of standardizing performance evaluation across the field. By promoting open, standardized datasets and evaluation metrics, PGLearn aims at democratizing and accelerating research and innovation in machine learning applications for optimal power flow problems. Datasets are available for download at https://www.huggingface.co/PGLearn.
nan
Article 400
Title@2025-05-28 (3): Comparing Human and AI Rater Effects Using the Many-Facet Rasch Model
Title: Comparing Human and AI Rater Effects Using the Many-Facet Rasch Model | Vergleich menschlicher und KI-Rater-Effekte mit dem Multi-Facet-Rasch-Modell | 使用多面 Rasch 模型比较人类和AI Rater效应 2505.18486v2 |
Authors: Hong Jiao, Dan Song, Won-Chan Lee
Large language models (LLMs) have been widely explored for automated scoring in low-stakes assessment to facilitate learning and instruction. Empirical evidence on which LLM produces the most reliable scores and induces the fewest rater effects needs to be collected before LLMs are used for automated scoring in practice. This study compared ten LLMs (ChatGPT 3.5, ChatGPT 4, ChatGPT 4o, OpenAI o1, Claude 3.5 Sonnet, Gemini 1.5, Gemini 1.5 Pro, Gemini 2.0, DeepSeek V3, and DeepSeek R1) with human expert raters in scoring two types of writing tasks. The accuracy of the holistic and analytic scores from LLMs compared with human raters was evaluated in terms of Quadratic Weighted Kappa. Intra-rater consistency across prompts was compared in terms of Cronbach's alpha. Rater effects of LLMs were evaluated and compared with human raters using the Many-Facet Rasch model. The results in general supported the use of ChatGPT 4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet, which showed high scoring accuracy, better rater reliability, and fewer rater effects.
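Quadratic Weighted Kappa, the agreement statistic named above, can be computed directly from two lists of ratings. The sketch below uses the standard definition (one minus the ratio of weighted observed disagreement to weighted chance disagreement, with weights growing as the squared score difference). It assumes integer scores from 0 to `num_categories - 1`, and the rating lists are made-up illustrations, not data from the study.

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, num_categories):
    """Quadratic Weighted Kappa between two raters.

    Assumes integer scores in 0..num_categories-1 and that the raters do not
    both give one identical constant score (the chance term would be zero).
    """
    n = len(ratings_a)
    k = num_categories

    # Observed counts: how often rater A gave i while rater B gave j.
    observed = [[0] * k for _ in range(k)]
    for i, j in zip(ratings_a, ratings_b):
        observed[i][j] += 1

    # Marginal score histograms for each rater.
    hist_a = [ratings_a.count(c) for c in range(k)]
    hist_b = [ratings_b.count(c) for c in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two raters.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    return 1.0 - weighted_observed / weighted_expected


# Illustrative scores on a 0-3 scale (made up for this sketch).
human = [0, 1, 2, 3, 2, 1]
model = [0, 1, 2, 3, 1, 1]

kappa_perfect = quadratic_weighted_kappa(human, human, num_categories=4)
kappa_model = quadratic_weighted_kappa(human, model, num_categories=4)
```

Perfect agreement gives a kappa of 1, and a single one-point disagreement lowers it only slightly, since the quadratic weights penalize large score gaps much more than small ones.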
nan
Article 401
Title@2025-05-28 (3): Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection
Title: Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection | Hybride Disagreement-Diversity Aktives Lernen für die bioakustische Sound-Erkennung | 生物声波声音事件探测发现活动积极学习 2505.20956v2 |
Authors: Shiqi Zhang, Tuomas Virtanen
Bioacoustic sound event detection (BioSED) is crucial for biodiversity conservation but faces practical challenges during model development and training: limited amounts of annotated data, sparse events, species diversity, and class imbalance. To address these challenges efficiently with a limited labeling budget, we apply the mismatch-first farthest-traversal (MFFT), an active learning method integrating committee voting disagreement and diversity analysis. We also refine an existing BioSED dataset specifically for evaluating active learning algorithms. Experimental results demonstrate that MFFT achieves a mAP of 68% when cold-starting and 71% when warm-starting (which is close to the fully-supervised mAP of 75%) while using only 2.3% of the annotations. Notably, MFFT excels in cold-start scenarios and with rare species, which are critical for monitoring endangered species, demonstrating its practical value.
Article 402
Title@2025-05-28 (3): Scalable Differentially Private Bayesian Optimization
Title: Scalable Differentially Private Bayesian Optimization | Skalierbare differenzierte private Bayesian-Optimierung | Bayesian优化化 2502.06044v2 |
Authors: Getoar Sopa, Juraj Marusic, Marco Avella-Medina, John P. Cunningham
In recent years, there has been much work on scaling Bayesian Optimization to high-dimensional problems, for example hyperparameter tuning in large machine learning models. These scalable methods have been successful, finding high objective values much more quickly than traditional global Bayesian Optimization or random search-based methods. At the same time, these large models often use sensitive data, but preservation of Differential Privacy has not scaled alongside these modern Bayesian Optimization procedures. Here we develop a method to privately optimize potentially high-dimensional parameter spaces using privatized Gradient Informative Bayesian Optimization. Our theoretical results show that under suitable conditions, our method converges exponentially fast to a locally optimal parameter configuration, up to a natural privacy error. Moreover, regardless of whether the assumptions are satisfied, we prove that our algorithm maintains privacy, and we empirically demonstrate superior performance over existing methods in the high-dimensional hyperparameter setting.
Article 403
Title@2025-05-28 (3): When Collaborative Filtering is not Collaborative: Unfairness of PCA for Recommendations
Title: When Collaborative Filtering is not Collaborative: Unfairness of PCA for Recommendations | Wenn Kollaborative Filterung nicht kollaborativ ist: Unfairness von PCA für Empfehlungen | 当协作过滤不是协作过滤时:常设仲裁院不公平以征求建议 2310.09687v2 |
Authors: David Liu, Jackie Baek, Tina Eliassi-Rad
We study the fairness of dimensionality reduction methods for recommendations. We focus on the fundamental method of principal component analysis (PCA), which identifies latent components and produces a low-rank approximation via the leading components while discarding the trailing components. Prior works have defined notions of “fair PCA”; however, these definitions do not answer the following question: why is PCA unfair? We identify two underlying popularity mechanisms that induce item unfairness in PCA. The first negatively impacts less popular items because less popular items rely on trailing latent components to recover their values. The second negatively impacts highly popular items, since the leading PCA components specialize in individual popular items instead of capturing similarities between items. To address these issues, we develop a polynomial-time algorithm, Item-Weighted PCA, that flexibly up-weights less popular items when optimizing for leading principal components. We theoretically show that PCA, in all cases, and Normalized PCA, in cases of block-diagonal matrices, are instances of Item-Weighted PCA. We empirically show that there exist datasets for which Item-Weighted PCA yields the optimal solution while the baselines do not. In contrast to past dimensionality reduction re-weighting techniques, Item-Weighted PCA solves a convex optimization problem and enforces a hard rank constraint. Our evaluations on real-world datasets show that Item-Weighted PCA not only mitigates both unfairness mechanisms, but also produces recommendations that outperform those of PCA baselines.
Article 404
Title@2025-05-28 (3): Preference Learning with Response Time
Title: Preference Learning with Response Time | Präferenz-Lernen mit Reaktionszeit | 具有响应时间的优先学习 2505.22820v1 |
Authors: Ayush Sawarni, Sahasrajit Sarmasarkar, Vasilis Syrgkanis
This paper investigates the integration of response time data into human preference learning frameworks for more effective reward model elicitation. While binary preference data has become fundamental in fine-tuning foundation models, generative AI systems, and other large-scale models, the valuable temporal information inherent in user decision-making remains largely unexploited. We propose novel methodologies to incorporate response time information alongside binary choice data, leveraging the Evidence Accumulation Drift Diffusion (EZ) model, under which response time is informative of the preference strength. We develop Neyman-orthogonal loss functions that achieve oracle convergence rates for reward model learning, matching the theoretical optimal rates that would be attained if the expected response times for each query were known a priori. Our theoretical analysis demonstrates that for linear reward functions, conventional preference learning suffers from error rates that scale exponentially with reward magnitude. In contrast, our response time-augmented approach reduces this to polynomial scaling, representing a significant improvement in sample efficiency. We extend these guarantees to non-parametric reward function spaces, establishing convergence properties for more complex, realistic reward models. Our extensive experiments validate our theoretical findings in the context of preference learning over images.
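To make concrete why response time carries information about preference strength, here is a small sketch of the textbook drift-diffusion formulas. It assumes unit noise, symmetric barriers, a start at zero, and no non-decision time. These modeling choices and all function names are assumptions for illustration; this is not the paper's Neyman-orthogonal estimator.

```python
import math


def choice_probability(drift, barrier):
    """P(option A is chosen) for a unit-noise diffusion started at 0 with
    absorbing barriers at +barrier (option A) and -barrier (option B)."""
    return 1.0 / (1.0 + math.exp(-2.0 * barrier * drift))


def expected_response_time(drift, barrier):
    """Expected time until either barrier is hit (same assumptions as above)."""
    if drift == 0.0:
        return barrier ** 2  # limit of (barrier / drift) * tanh(barrier * drift)
    return (barrier / drift) * math.tanh(barrier * drift)


def drift_from_choice_and_time(mean_choice, mean_time, barrier):
    """Recover the drift from the average +1/-1 choice and the average
    response time, using E[choice] / E[time] = drift / barrier."""
    return barrier * mean_choice / mean_time


for drift in (0.5, 3.0, 6.0):
    p = choice_probability(drift, barrier=1.0)
    t = expected_response_time(drift, barrier=1.0)
    recovered = drift_from_choice_and_time(2.0 * p - 1.0, t, barrier=1.0)
    print(f"drift={drift:.1f}  P(A)={p:.5f}  E[time]={t:.4f}  recovered={recovered:.4f}")
```

Under these assumptions the choice probability saturates near 1 for strong preferences, so binary choices alone barely separate a drift of 3 from a drift of 6, while the expected response time still roughly halves. This is consistent with the abstract's claim that response times help most when reward magnitudes are large.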
Article 405
Title@2025-05-28 (3): IMTS is Worth Time $\times$ Channel Patches: Visual Masked Autoencoders for Irregular Multivariate Time Series Prediction
Title: IMTS is Worth Time $\times$ Channel Patches: Visual Masked Autoencoders for Irregular Multivariate Time Series Prediction | IMTS ist Zeit wert $\times$ Channel Patches: Visual Masked Autoencoder für irreguläre Multivariate Time Series Prediction | IMTS 是有价值的时间 $\times$ 频道补丁: 用于非常规多变时间序列预测的视觉蒙面自动编码器 2505.22815v1 |
Authors: Zhangyi Hu, Jiemin Wu, Hua Xu, Mingqian Liao, Ninghui Feng, Bo Gao, Songning Lai, Yutao Yue
Irregular Multivariate Time Series (IMTS) forecasting is challenging due to the unaligned nature of multi-channel signals and the prevalence of extensive missing data. Existing methods struggle to capture reliable temporal patterns from such data due to significant missing values. While pre-trained foundation models show potential for addressing these challenges, they are typically designed for Regularly Sampled Time Series (RTS). Motivated by the visual Mask AutoEncoder’s (MAE) powerful capability for modeling sparse multi-channel information and its success in RTS forecasting, we propose VIMTS, a framework adapting Visual MAE for IMTS forecasting. To mitigate the effect of missing values, VIMTS first processes IMTS along the timeline into feature patches at equal intervals. These patches are then complemented using learned cross-channel dependencies. Then it leverages visual MAE’s capability in handling sparse multichannel data for patch reconstruction, followed by a coarse-to-fine technique to generate precise predictions from focused contexts. In addition, we integrate self-supervised learning for improved IMTS modeling by adapting the visual MAE to IMTS data. Extensive experiments demonstrate VIMTS’s superior performance and few-shot capability, advancing the application of visual foundation models in more general time series tasks. Our code is available at https://github.com/WHU-HZY/VIMTS.
Article 406
Title@2025-05-28 (3): Regression and Forecasting of U.S. Stock Returns Based on LSTM
Title: Regression and Forecasting of U.S. Stock Returns Based on LSTM | Regression und Prognose von US-Aktienrenditen basierend auf LSTM | 根据LSTM对美国库存收益的回归和预测 2502.05210v3 |
Authors: Shicheng Zhou, Zizhou Zhang, Rong Zhang, Yuchen Yin, Chia Hong Chang, Qinyan Shen
This paper analyses the investment returns of three stock sectors, Manuf, Hitec, and Other, in the U.S. stock market, based on the Fama-French three-factor model, the Carhart four-factor model, and the Fama-French five-factor model, in order to test the validity of these models for the three sectors of the market. Also, the LSTM model is used to explore the additional factors affecting stock returns. The empirical results show that the Fama-French five-factor model has better validity for the three segments of the market under study, and the LSTM model has the ability to capture the factors affecting the returns of certain industries, and can better regress and predict the stock returns of the relevant industries. Keywords: Fama-French model; Carhart model; Factor model; LSTM model.
Article 407
Title@2025-05-28 (3): X-Factor: Quality Is a Dataset-Intrinsic Property
Title: X-Factor: Quality Is a Dataset-Intrinsic Property | X-Factor: Qualität ist eine datensatzintrinsische Eigenschaft | X 要素: 质量是一个数据集 - Intrins 属性 2505.22813v1 |
Authors: Josiah Couch, Miao Li, Rima Arnaout, Ramy Arnaout
In the universal quest to optimize machine-learning classifiers, three factors – model architecture, dataset size, and class balance – have been shown to influence test-time performance but do not fully account for it. Previously, evidence was presented for an additional factor that can be referred to as dataset quality, but it was unclear whether this was actually a joint property of the dataset and the model architecture, or an intrinsic property of the dataset itself. If quality is truly dataset-intrinsic and independent of model architecture, dataset size, and class balance, then the same datasets should perform better (or worse) regardless of these other factors. To test this hypothesis, here we create thousands of datasets, each controlled for size and class balance, and use them to train classifiers with a wide range of architectures, from random forests and support-vector machines to deep networks. We find that classifier performance correlates strongly by subset across architectures ($R^2=0.79$), supporting quality as an intrinsic property of datasets independent of dataset size and class balance and of model architecture. Digging deeper, we find that dataset quality appears to be an emergent property of something more fundamental: the quality of datasets’ constituent classes. Thus, quality joins size, class balance, and model architecture as an independent correlate of performance and a separate target for optimizing machine-learning-based classification.
Article 408
Title@2025-05-28 (3): Credit Risk Identification in Supply Chains Using Generative Adversarial Networks
Title: Credit Risk Identification in Supply Chains Using Generative Adversarial Networks | Kreditrisikoidentifizierung in Lieferketten mit generativen Adversarial-Netzwerken | 利用产生反逆网络的供应链中的信用风险识别 2501.10348v4 |
Authors: Zizhou Zhang, Xinshi Li, Yu Cheng, Zhenrui Chen, Qianying Liu
Credit risk management within supply chains has emerged as a critical research area due to its significant implications for operational stability and financial sustainability. The intricate interdependencies among supply chain participants mean that credit risks can propagate across networks, with impacts varying by industry. This study explores the application of Generative Adversarial Networks (GANs) to enhance credit risk identification in supply chains. GANs enable the generation of synthetic credit risk scenarios, addressing challenges related to data scarcity and imbalanced datasets. By leveraging GAN-generated data, the model improves predictive accuracy while effectively capturing dynamic and temporal dependencies in supply chain data. The research focuses on three representative industries: manufacturing (steel), distribution (pharmaceuticals), and services (e-commerce), to assess industry-specific credit risk contagion. Experimental results demonstrate that the GAN-based model outperforms traditional methods, including logistic regression, decision trees, and neural networks, achieving superior accuracy, recall, and F1 scores. The findings underscore the potential of GANs in proactive risk management, offering robust tools for mitigating financial disruptions in supply chains. Future research could expand the model by incorporating external market factors and supplier relationships to further enhance predictive capabilities. Keywords: Generative Adversarial Networks (GANs); Supply Chain Risk; Credit Risk Identification; Machine Learning; Data Augmentation
Article 409
Title@2025-05-28 (3): Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Title: Highly Efficient and Effective LLMs with Multi-Boolean Architectures | Hocheffiziente und effektive LLMs mit Multi-Boolean-Architekturen | 多Boolean建筑群高效益的LLMs 2505.22811v1 |
Authors: Ba-Hien Tran, Van Minh Nguyen
Weight binarization has emerged as a promising strategy to drastically reduce the complexity of large language models (LLMs). It is mainly classified into two approaches: post-training binarization and finetuning with training-aware binarization methods. The first approach, while having low complexity, leads to significant loss of information from the original LLMs, resulting in poor performance. The second approach, on the other hand, relies heavily on full-precision latent weights for gradient approximation of binary weights, which not only remains suboptimal but also introduces substantial complexity. In this paper, we introduce a novel framework that effectively transforms LLMs into multi-kernel Boolean parameters and, for the first time, finetunes them directly in the Boolean domain, eliminating the need for expensive latent weights. This significantly reduces complexity during both finetuning and inference. Through extensive and insightful experiments across a wide range of LLMs, we demonstrate that our method outperforms recent ultra low-bit quantization and binarization methods.
Article 410
Title@2025-05-28 (3): Distribution free M-estimation
Title: Distribution free M-estimation | Verteilungsfreie M-Schätzung | 免费分发 M - 估计 2505.22807v1 |
Authors: John C. Duchi
The basic question of delineating those statistical problems that are solvable without making any assumptions on the underlying data distribution has long animated statistics and learning theory. This paper characterizes when a (univariate) convex M-estimation or stochastic optimization problem is solvable in such an assumption-free setting, providing a precise dividing line between solvable and unsolvable problems. The conditions we identify show, perhaps surprisingly, that Lipschitz continuity of the loss being minimized is not necessary for distribution free minimization, and they are also distinct from classical characterizations of learnability in machine learning.
Article 411
Title@2025-05-28 (3): Anomalies by Synthesis: Anomaly Detection using Generative Diffusion Models for Off-Road Navigation
Title: Anomalies by Synthesis: Anomaly Detection using Generative Diffusion Models for Off-Road Navigation | Anomalien durch Synthese: Anomalieerkennung mit generativen Diffusionsmodellen für Off-Road-Navigation | 合成反常现象:使用非轨道导航生成扩散模型进行异常检测 2505.22805v1 |
Authors: Siddharth Ancha, Sunshine Jiang, Travis Manderson, Laura Brandt, Yilun Du, Philip R. Osteen, Nicholas Roy
In order to navigate safely and reliably in off-road and unstructured environments, robots must detect anomalies that are out-of-distribution (OOD) with respect to the training data. We present an analysis-by-synthesis approach for pixel-wise anomaly detection without making any assumptions about the nature of OOD data. Given an input image, we use a generative diffusion model to synthesize an edited image that removes anomalies while keeping the remaining image unchanged. Then, we formulate anomaly detection as analyzing which image segments were modified by the diffusion model. We propose a novel inference approach for guided diffusion by analyzing the ideal guidance gradient and deriving a principled approximation that bootstraps the diffusion model to predict guidance gradients. Our editing technique operates purely at test time and can be integrated into existing workflows without the need for retraining or fine-tuning. Finally, we use a combination of vision-language foundation models to compare pixels in a learned feature space and detect semantically meaningful edits, enabling accurate anomaly detection for off-road navigation. Project website: https://siddancha.github.io/anomalies-by-diffusion-synthesis/
Article 412
Title@2025-05-28 (3): CLUE: Neural Networks Calibration via Learning Uncertainty-Error alignment
Title: CLUE: Neural Networks Calibration via Learning Uncertainty-Error alignment | CLUE: Neurale Netzwerke Kalibrierung über Learning Uncertainty-Error Alignment | CLUE:通过学习不确定性-差错对齐校准神经网络 2505.22803v1 |
Authors: Pedro Mendes, Paolo Romano, David Garlan
Reliable uncertainty estimation is critical for deploying neural networks (NNs) in real-world applications. While existing calibration techniques often rely on post-hoc adjustments or coarse-grained binning methods, they remain limited in scalability, differentiability, and generalization across domains. In this work, we introduce CLUE (Calibration via Learning Uncertainty-Error Alignment), a novel approach that explicitly aligns predicted uncertainty with observed error during training, grounded in the principle that well-calibrated models should produce uncertainty estimates that match their empirical loss. CLUE adopts a novel loss function that jointly optimizes predictive performance and calibration, using summary statistics of uncertainty and loss as proxies. The proposed method is fully differentiable, domain-agnostic, and compatible with standard training pipelines. Through extensive experiments on vision, regression, and language modeling tasks, including out-of-distribution and domain-shift scenarios, we demonstrate that CLUE achieves superior calibration quality and competitive predictive performance with respect to state-of-the-art approaches without imposing significant computational overhead.
Article 413
Title@2025-05-28 (3): Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
Title: Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning | Instruct-SkillMix: Eine leistungsstarke Pipeline für LLM Instruction Tuning | 指令- SkillMix: 用于LLM 指令导导图的强大管道 2408.14774v4 |
Authors: Simran Kaur, Simon Park, Anirudh Goyal, Sanjeev Arora
We introduce Instruct-SkillMix, an automated approach for creating diverse, high quality SFT data for instruction-following. The pipeline involves two stages, each leveraging an existing powerful LLM: (1) Skill extraction: uses the LLM to extract core “skills” for instruction-following by directly prompting the model. This is inspired by “LLM metacognition” of Didolkar et al. (2024); (2) Data generation: uses the powerful LLM to generate (instruction, response) data that exhibit a randomly chosen pair of these skills. Here, the use of random skill combinations promotes diversity and difficulty. The estimated cost of creating the dataset is under $600. Vanilla SFT (i.e., no PPO, DPO, or RL methods) on data generated from Instruct-SkillMix leads to strong gains on instruction following benchmarks such as AlpacaEval 2.0, MT-Bench, and WildBench. With just 4K examples, LLaMA-3-8B-Base achieves 42.76% length-controlled win rate on AlpacaEval 2.0, a level similar to frontier models like Claude 3 Opus and LLaMA-3.1-405B-Instruct. Ablation studies also suggest plausible reasons for why creating open instruction-tuning datasets via naive crowd-sourcing has proved difficult. In our dataset, adding 20% low quality answers (“shirkers”) causes a noticeable degradation in performance. The Instruct-SkillMix pipeline seems flexible and adaptable to other settings.
Article 414
Title@2025-05-28 (3): SequentialBreak: Large Language Models Can be Fooled by Embedding Jailbreak Prompts into Sequential Prompt Chains
Title: SequentialBreak: Large Language Models Can be Fooled by Embedding Jailbreak Prompts into Sequential Prompt Chains | SequentialBreak: Große Sprachmodelle können durch Einbetten von Jailbreak Prompts in Sequential Prompt Chains ausgeblendet werden | 顺序式布雷克:大语言模型可以通过将破狱线索嵌入顺序式提示链来蒙骗大语言模型 2411.06426v3 |
Authors: Bijoy Ahmed Saiem, MD Sadik Hossain Shanto, Rakib Ahsan, Md Rafi ur Rashid
As the integration of Large Language Models (LLMs) into various applications increases, so does their susceptibility to misuse, raising significant security concerns. Numerous jailbreak attacks have been proposed to assess the security defense of LLMs. Current jailbreak attacks mainly rely on scenario camouflage, prompt obfuscation, prompt optimization, and prompt iterative optimization to conceal malicious prompts. In particular, sequential prompt chains in a single query can lead LLMs to focus on certain prompts while ignoring others, facilitating context manipulation. This paper introduces SequentialBreak, a novel jailbreak attack that exploits this vulnerability. We discuss several scenarios, not limited to examples like Question Bank, Dialog Completion, and Game Environment, where the harmful prompt is embedded within benign ones that can fool LLMs into generating harmful responses. The distinct narrative structures of these scenarios show that SequentialBreak is flexible enough to adapt to various prompt formats beyond those discussed. Extensive experiments demonstrate that SequentialBreak uses only a single query to achieve a substantial gain in attack success rate over existing baselines against both open-source and closed-source models. Through our research, we highlight the urgent need for more robust and resilient safeguards to enhance LLM security and prevent potential misuse. All the result files and website associated with this research are available in this GitHub repository: https://anonymous.4open.science/r/JailBreakAttack-4F3B/.
Article 415
Title@2025-05-28 (3): Efficient Preimage Approximation for Neural Network Certification
Title: Efficient Preimage Approximation for Neural Network Certification | Effiziente Preimage-Annäherung für die Neural Network Zertifizierung | 神经网络认证的高效预感近似率 2505.22798v1 |
Authors: Anton Björklund, Mykola Zaitsev, Marta Kwiatkowska
The growing reliance on artificial intelligence in safety- and security-critical applications demands effective neural network certification. A challenging real-world use case is certification against “patch attacks”, where adversarial patches or lighting conditions obscure parts of images, for example traffic signs. One approach to certification, which also gives quantitative coverage estimates, utilizes preimages of neural networks, i.e., the set of inputs that lead to a specified output. However, these preimage approximation methods, including the state-of-the-art PREMAP algorithm, struggle with scalability. This paper presents novel algorithmic improvements to PREMAP involving tighter bounds, adaptive Monte Carlo sampling, and improved branching heuristics. We demonstrate efficiency improvements of at least an order of magnitude on reinforcement learning control benchmarks, and show that our method scales to convolutional neural networks that were previously infeasible. Our results demonstrate the potential of preimage approximation methodology for reliability and robustness certification.
Article 416
Title@2025-05-28 (3): DeSocial: Blockchain-based Decentralized Social Networks
Title: DeSocial: Blockchain-based Decentralized Social Networks | DeSocial: Dezentrale soziale Netzwerke auf Blockchain-Basis | 社会:基于区块链的权力下放社会网络 2505.21388v2 |
Authors: Jingyuan Huang, Xi Zhu, Minghao Guo, Yongfeng Zhang
Web 2.0 social platforms are inherently centralized, with user data and algorithmic decisions controlled by the platform. As a result, users can only passively receive social predictions without being able to choose the underlying algorithm, which limits personalization. Fortunately, with the emergence of blockchain, users are allowed to choose algorithms that are tailored to their local situation, improving prediction results in a personalized way. In a blockchain environment, each user possesses its own model to perform the social prediction, capturing different perspectives on social interactions. In our work, we propose DeSocial, a decentralized social network learning framework deployed on an Ethereum (ETH) local development chain that integrates distributed data storage, node-level consensus, and user-driven model selection through Ganache. In the first stage, each user leverages DeSocial to evaluate multiple backbone models on their local subgraph. DeSocial coordinates the execution and returns model-wise prediction results, enabling the user to select the most suitable backbone for personalized social prediction. Then, DeSocial uniformly selects several validation nodes that possess the algorithm specified by each user, and aggregates the prediction results by majority voting, to prevent errors caused by any single model’s misjudgment. Extensive experiments show that DeSocial achieves a clear improvement over five classical centralized social network learning models, promoting user empowerment in blockchain-based decentralized social networks and showing the importance of multi-node validation and personalized algorithm selection based on blockchain. Our implementation is available at: https://github.com/agiresearch/DeSocial.
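The aggregation step mentioned above, majority voting over the predictions returned by several validation nodes, can be sketched in a few lines. The function name and the toy votes are hypothetical; the actual DeSocial protocol (node selection, on-chain consensus) is not reproduced here.

```python
from collections import Counter


def majority_vote(node_predictions):
    """Return (winning_label, vote_share) for a list of per-node predictions.

    Ties are broken in favor of the label that appears first in the list,
    which is an arbitrary choice made for this sketch.
    """
    if not node_predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(node_predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(node_predictions)


# Five hypothetical validators predicting whether a link exists (1) or not (0).
print(majority_vote([1, 0, 1, 1, 0]))
```

Using an odd number of validators avoids ties for binary predictions, so a single mistaken node cannot flip the aggregated result on its own.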
Article 417
Title@2025-05-28 (3): The Empirical Mean is Minimax Optimal for Local Glivenko-Cantelli
Title: The Empirical Mean is Minimax Optimal for Local Glivenko-Cantelli | Das Empirische Mittel ist Minimax Optimal für lokale Glivenko-Cantelli | 当地格利文科-坎泰利的经验中值为 Minimax 最佳当地格利文科-坎泰利 2410.02835v2 |
Authors: Doron Cohen, Aryeh Kontorovich, Roi Weiss
We revisit the recently introduced Local Glivenko-Cantelli setting, which studies distribution-dependent uniform convergence rates of the Empirical Mean Estimator (EME). In this work, we investigate generalizations of this setting where arbitrary estimators are allowed rather than just the EME. Can a strictly larger class of measures be learned? Can better risk decay rates be obtained? We provide exhaustive answers to these questions, which are both negative, provided the learner is barred from exploiting some infinite-dimensional pathologies. On the other hand, allowing such exploits does lead to a strictly larger class of learnable measures.
Article 418
Title@2025-05-28 (3): KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Title: KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | KVQuant: In Richtung 10 Millionen Kontextlänge LLM-Inferenz mit KV Cache-Quantisierung | KVQuant: 努力达到1000万个内长长LLM 与 KV 缓存量推论 2401.18079v6 |
Authors: Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami
LLMs are seeing growing use for applications which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during inference. Quantization is a promising approach for compressing KV cache activations; however, existing solutions fail to represent activations accurately in sub-4-bit precision. Our work, KVQuant, facilitates low precision KV cache quantization by incorporating several novel methods: (i) Per-Channel Key Quantization, where we adjust the dimension along which we quantize the Key activations to better match the distribution; (ii) Pre-RoPE Key Quantization, where we quantize Key activations before the rotary positional embedding to mitigate its impact on quantization; (iii) Non-Uniform KV Cache Quantization, where we derive per-layer sensitivity-weighted non-uniform datatypes that better represent the distributions; and (iv) Per-Vector Dense-and-Sparse Quantization, where we isolate outliers separately for each vector to minimize skews in quantization ranges. By applying our method to the LLaMA, Llama-2, Llama-3, and Mistral models, we achieve < 0.1 perplexity degradation with 3-bit quantization on both Wikitext-2 and C4, outperforming existing approaches. Our method enables serving LLaMA-7B with a context length of up to 1 million on a single A100-80GB GPU and up to 10 million on an 8-GPU system. We develop custom CUDA kernels for KVQuant, showing that we can achieve up to ~1.7x speedups, compared to baseline fp16 matrix-vector multiplications, for the LLaMA-7B model.
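As a rough illustration of why the axis along which activations are grouped matters for low-bit quantization, the sketch below compares per-token (row-wise) and per-channel (column-wise) uniform 3-bit quantization on a toy matrix. The toy values and the assumption that one channel has a much larger magnitude than the others are mine, and plain uniform quantization stands in for the non-uniform datatypes the abstract describes.

```python
def quantize_uniform(values, bits):
    """Round each value to one of 2**bits evenly spaced levels spanning
    the range [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    scale = (hi - lo) / ((1 << bits) - 1)
    return [lo + round((v - lo) / scale) * scale for v in values]


def quantize_per_token(matrix, bits):
    """One quantization range per row (token)."""
    return [quantize_uniform(row, bits) for row in matrix]


def quantize_per_channel(matrix, bits):
    """One quantization range per column (channel)."""
    columns = [quantize_uniform(list(col), bits) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


def mean_squared_error(a, b):
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Toy "Key" activations: 4 tokens x 3 channels, channel 0 is an outlier channel.
keys = [
    [50.0, 0.1, -0.2],
    [52.0, -0.3, 0.4],
    [49.0, 0.2, 0.1],
    [51.0, -0.1, -0.3],
]
print("per-token MSE:  ", mean_squared_error(keys, quantize_per_token(keys, bits=3)))
print("per-channel MSE:", mean_squared_error(keys, quantize_per_channel(keys, bits=3)))
```

When a single channel dominates the range, per-token grouping wastes most of its levels on that channel, while per-channel grouping gives each channel its own range.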
Article 419
Title@2025-05-28 (3): Navigating the Latent Space Dynamics of Neural Models
Title: Navigating the Latent Space Dynamics of Neural Models | Navigation der latenten Raumdynamik von Neuralmodellen | 导航内壳模型的冷层空间动态 2505.22785v1 |
Authors: Marco Fumero, Luca Moschella, Emanuele Rodolà, Francesco Locatello
Neural networks transform high-dimensional data into compact, structured representations, often modeled as elements of a lower dimensional latent space. In this paper, we present an alternative interpretation of neural models as dynamical systems acting on the latent manifold. Specifically, we show that autoencoder models implicitly define a latent vector field on the manifold, derived by iteratively applying the encoding-decoding map, without any additional training. We observe that standard training procedures introduce inductive biases that lead to the emergence of attractor points within this vector field. Drawing on this insight, we propose to leverage the vector field as a representation for the network, providing a novel tool to analyze the properties of the model and the data. This representation makes it possible to: (i) analyze the generalization and memorization regimes of neural models, even throughout training; (ii) extract prior knowledge encoded in the network’s parameters from the attractors, without requiring any input data; (iii) identify out-of-distribution samples from their trajectories in the vector field. We further validate our approach on vision foundation models, showcasing the applicability and effectiveness of our method in real-world scenarios.
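The notion of a vector field induced by repeatedly applying a map, and of attractors as its fixed points, can be illustrated with a toy example. Here a hand-written contractive map stands in for the encode-then-decode composition of a trained autoencoder. The map, the function names, and the tolerances are assumptions for illustration only.

```python
import math


def toy_latent_map(z):
    """Stand-in for applying decode-then-encode to a latent point.
    Componentwise tanh(2z) has stable fixed points near +0.9575 and -0.9575."""
    return [math.tanh(2.0 * zi) for zi in z]


def vector_field(step, z):
    """Displacement induced by one application of the map: step(z) - z."""
    return [a - b for a, b in zip(step(z), z)]


def iterate_to_attractor(step, z, tol=1e-10, max_iters=10_000):
    """Follow the trajectory z, step(z), step(step(z)), ... until it stops moving."""
    for _ in range(max_iters):
        z_next = step(z)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    return z


for start in ([0.1, -0.1], [-0.5, 0.3]):
    attractor = iterate_to_attractor(toy_latent_map, start)
    print(start, "->", [round(a, 4) for a in attractor])
```

In this toy setting, which attractor a starting point converges to depends only on the sign of each coordinate. The abstract's proposal is to study the analogous trajectories and attractors of real networks.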
Article 420
Title@2025-05-28 (3): On the definition and importance of interpretability in scientific machine learning
Title: On the definition and importance of interpretability in scientific machine learning | Zur Definition und Bedeutung der Deutbarkeit im wissenschaftlichen maschinellen Lernen | 关于科学机器学习中可解释性的定义和重要性 2505.13510v2 |
Authors: Conor Rowan, Alireza Doostan
Though neural networks trained on large datasets have been successfully used to describe and predict many physical phenomena, there is a sense among scientists that, unlike traditional scientific models comprising simple mathematical expressions, their findings cannot be integrated into the body of scientific knowledge. Critics of machine learning’s inability to produce human-understandable relationships have converged on the concept of “interpretability” as its point of departure from more traditional forms of science. As the growing interest in interpretability has shown, researchers in the physical sciences seek not just predictive models, but also to uncover the fundamental principles that govern a system of interest. However, clarity around a definition of interpretability and the precise role that it plays in science is lacking in the literature. In this work, we argue that researchers in equation discovery and symbolic regression tend to conflate the concept of sparsity with interpretability. We review key papers on interpretable machine learning from outside the scientific community and argue that, though the definitions and methods they propose can inform questions of interpretability for scientific machine learning (SciML), they are inadequate for this new purpose. Noting these deficiencies, we propose an operational definition of interpretability for the physical sciences. Our notion of interpretability emphasizes understanding of the mechanism over mathematical sparsity. Innocuous though it may seem, this emphasis on mechanism shows that sparsity is often unnecessary. It also questions the possibility of interpretable scientific discovery when prior knowledge is lacking. We believe a precise and philosophically informed definition of interpretability in SciML will help focus research efforts toward the most significant obstacles to realizing a data-driven scientific future.
Article 421
Title@2025-05-28 (3): Adaptive Exploration for Multi-Reward Multi-Policy Evaluation
Title: Adaptive Exploration for Multi-Reward Multi-Policy Evaluation | Adaptive Exploration für Multi-Reward Multi-Policy-Bewertung | 多方奖励多政策评价的适应性探索 2502.02516v2 |
Authors: Alessio Russo, Aldo Pacchiano
We study the policy evaluation problem in an online multi-reward multi-policy discounted setting, where multiple reward functions must be evaluated simultaneously for different policies. We adopt an $(\epsilon,\delta)$-PAC perspective to achieve $\epsilon$-accurate estimates with high confidence across finite or convex sets of rewards, a setting that has not been investigated in the literature. Building on prior work on Multi-Reward Best Policy Identification, we adapt the MR-NaS exploration scheme to jointly minimize sample complexity for evaluating different policies across different reward sets. Our approach leverages an instance-specific lower bound revealing how the sample complexity scales with a measure of value deviation, guiding the design of an efficient exploration policy. Although computing this bound entails a hard non-convex optimization, we propose an efficient convex approximation that holds for both finite and convex reward sets. Experiments in tabular domains demonstrate the effectiveness of this adaptive exploration scheme.
Article 422
Title@2025-05-28 (3): Temporal Convolutional Autoencoder for Interference Mitigation in FMCW Radar Altimeters
Title: Temporal Convolutional Autoencoder for Interference Mitigation in FMCW Radar Altimeters | Temporal Convolutional Autoencoder für Interferenzmilderung in FMCW Radar Höhenmessern | FMCC 雷达测高仪中用于减少干扰干扰的时时变自动算器 2505.22783v1 |
Authors: Charles E. Thornton, Jamie Sloop, Samuel Brown, Aaron Orndorff, William C. Headley, Stephen Young
We investigate the end-to-end altitude estimation performance of a convolutional autoencoder-based interference mitigation approach for frequency-modulated continuous-wave (FMCW) radar altimeters. Specifically, we show that a Temporal Convolutional Network (TCN) autoencoder effectively exploits temporal correlations in the received signal, providing superior interference suppression compared to a Least Mean Squares (LMS) adaptive filter. Unlike existing approaches, the present method operates directly on the received FMCW signal. Additionally, we identify key challenges in applying deep learning to wideband FMCW interference mitigation and outline directions for future research to enhance real-time feasibility and generalization to arbitrary interference conditions.
Article 423
Title@2025-05-28 (3): Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games
Title: Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games | Finite-Sample-Konvergenzgrenzen für die Optimierung der Treuhandregion-Politik in Mittelfeld-Spielen | 平地运动会中信任区政策优化 2505.22781v1 |
Authors: Antonio Ocello, Daniil Tiapkin, Lorenzo Mancini, Mathieu Laurière, Eric Moulines
We introduce Mean-Field Trust Region Policy Optimization (MF-TRPO), a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean-Field Games (MFG) in finite state-action spaces. Building on the well-established performance of TRPO in the reinforcement learning (RL) setting, we extend its methodology to the MFG framework, leveraging its stability and robustness in policy optimization. Under standard assumptions in the MFG literature, we provide a rigorous analysis of MF-TRPO, establishing theoretical guarantees on its convergence. Our results cover both the exact formulation of the algorithm and its sample-based counterpart, where we derive high-probability guarantees and finite sample complexity. This work advances MFG optimization by bridging RL techniques with mean-field decision-making, offering a theoretically grounded approach to solving complex multi-agent problems.
Article 424
Title@2025-05-28 (3): Machine Learning Models Have a Supply Chain Problem
Title: Machine Learning Models Have a Supply Chain Problem | Modelle des maschinellen Lernens haben ein Problem mit der Lieferkette | 机器学习模式有供应链问题 2505.22778v1 |
Authors: Sarah Meiklejohn, Hayden Blauzvern, Mihai Maruseac, Spencer Schrock, Laurent Simon, Ilia Shumailov
Powerful machine learning (ML) models are now readily available online, which creates exciting possibilities for users who lack the deep technical expertise or substantial computing resources needed to develop them. On the other hand, this type of open ecosystem comes with many risks. In this paper, we argue that the current ecosystem for open ML models contains significant supply-chain risks, some of which have been exploited already in real attacks. These include an attacker replacing a model with something malicious (e.g., malware), or a model being trained using a vulnerable version of a framework or on restricted or poisoned data. We then explore how Sigstore, a solution designed to bring transparency to open-source software supply chains, can be used to bring transparency to open ML models, in terms of enabling model publishers to sign their models and prove properties about the datasets they use.
Article 425
Title@2025-05-28 (3): GraphNarrator: Generating Textual Explanations for Graph Neural Networks
Title: GraphNarrator: Generating Textual Explanations for Graph Neural Networks | GraphNarrator: Erzeugen von Texterklärungen für Graph Neuronale Netzwerke | 图示记录器:生成图形神经网络的文字解释 2410.15268v2 |
Authors: Bo Pan, Zhen Xiong, Guanchen Wu, Zheng Zhang, Yifei Zhang, Liang Zhao
Graph representation learning has garnered significant attention due to its broad applications in various domains, such as recommendation systems and social network analysis. Despite advancements in graph learning methods, challenges still remain in explainability when graphs are associated with semantic features. In this paper, we present GraphNarrator, the first method designed to generate natural language explanations for Graph Neural Networks. GraphNarrator employs a generative language model that maps input-output pairs to explanations reflecting the model’s decision-making process. To address the lack of ground truth explanations to train the model, we propose first generating pseudo-labels that capture the model’s decisions from saliency-based explanations, then using Expert Iteration to iteratively train the pseudo-label generator based on training objectives on explanation quality. The high-quality pseudo-labels are finally utilized to train an end-to-end explanation generator model. Extensive experiments are conducted to demonstrate the effectiveness of GraphNarrator in producing faithful, concise, and human-preferred natural language explanations.
Article 426
Title@2025-05-28 (3): The Value of Information in Human-AI Decision-making
Title: The Value of Information in Human-AI Decision-making | Der Wert von Informationen in der Mensch-AI-Entscheidungsfindung | 信息在人类-大赦国际决策中的价值 2502.06152v4 |
Authors: Ziyang Guo, Yifan Wu, Jason Hartline, Jessica Hullman
Multiple agents – including humans and AI models – are increasingly combined to make decisions with the expectation of achieving complementary performance, where the decisions they make together outperform those made individually. However, knowing how to improve the performance of collaborating agents is often difficult without knowing more about what particular information and strategies each agent employs. With a focus on human-AI pairings, we contribute a decision-theoretic framework for characterizing the value of information – and consequently, opportunities for agents to better exploit available information – in AI-assisted decision workflows. We present a novel explanation technique (ILIV-SHAP) that adapts SHAP explanations to highlight human-complementing information. We validate the effectiveness of the framework and ILIV-SHAP through a study of human-AI decision-making. We show that our measure of complementary information can be used to identify which AI model will best complement human decisions. We also find that presenting ILIV-SHAP with AI predictions leads to reliably greater reductions in error over non-AI-assisted decisions than vanilla SHAP.
Article 427
Title@2025-05-28 (3): Calibrated Value-Aware Model Learning with Stochastic Environment Models
Title: Calibrated Value-Aware Model Learning with Stochastic Environment Models | Kalibriertes wertbewusstes Modelllernen mit stochastischen Umweltmodellen | 使用存储环境模型校准价值软件模型学习 2505.22772v1 |
Authors: Claas Voelcker, Anastasiia Pedan, Arash Ahmadian, Romina Abachi, Igor Gilitschenski, Amir-massoud Farahmand
The idea of value-aware model learning, that models should produce accurate value estimates, has gained prominence in model-based reinforcement learning. The MuZero loss, which penalizes a model’s value function prediction compared to the ground-truth value function, has been utilized in several prominent empirical works in the literature. However, theoretical investigation into its strengths and weaknesses is limited. In this paper, we analyze the family of value-aware model learning losses, which includes the popular MuZero loss. We show that these losses, as normally used, are uncalibrated surrogate losses, which means that they do not always recover the correct model and value function. Building on this insight, we propose corrections to solve this issue. Furthermore, we investigate the interplay between the loss calibration, latent model architectures, and auxiliary losses that are commonly employed when training MuZero-style agents. We show that while deterministic models can be sufficient to predict accurate values, learning calibrated stochastic models is still advantageous.
Article 428
Title@2025-05-28 (3): Multivariate de Bruijn Graphs: A Symbolic Graph Framework for Time Series Forecasting
Title: Multivariate de Bruijn Graphs: A Symbolic Graph Framework for Time Series Forecasting | Multivariate de Bruijn Graphen: Ein symbolisches Graphen-Framework für die Vorhersage von Zeitreihen | 布鲁伊图多变量图:时间序列预测符号图框架 2505.22768v1 |
Authors: Mert Onur Cakiroglu, Idil Bilge Altun, Hasan Kurban, Elham Buxton, Mehmet Dalkilic
Time series forecasting remains a challenging task for foundation models due to temporal heterogeneity, high dimensionality, and the lack of inherent symbolic structure. In this work, we propose DRAGON (Discrete Representation and Augmented Graph encoding Over deBruijN Graphs), a novel encoder that introduces Multivariate de Bruijn Graphs (MdBGs) to bridge the gap between symbolic representations and neural modeling. DRAGON discretizes continuous input sequences and maps them onto a fixed graph structure, enabling dynamic context recovery via graph-based attention. Integrated as an auxiliary module within a dual-branch architecture, DRAGON augments conventional CNN-based encoders with symbolic, structure-aware representations. All code developed for this study is available at: https://github.com/KurbanIntelligenceLab/MultdBG-Time-Series-Library
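The discretize-then-graph idea in the abstract above can be illustrated with a minimal sketch. It is not the DRAGON encoder: it handles a single univariate series, uses equal-width binning, and counts transitions between overlapping length-k windows. The function names `discretize` and `de_bruijn_edges`, the binning rule, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def discretize(series, n_bins):
    """Map each value to an integer symbol in [0, n_bins) with equal-width bins."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant series
    # Clamp so that the maximum value lands in the last bin.
    return [min(int((x - lo) / width), n_bins - 1) for x in series]

def de_bruijn_edges(symbols, k):
    """Count edges between consecutive length-k windows (which overlap in k-1 symbols)."""
    windows = [tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1)]
    return Counter(zip(windows, windows[1:]))

# Hypothetical toy series, two symbols, windows of length 2.
symbols = discretize([0.0, 0.1, 0.9, 1.0, 0.1, 0.9], n_bins=2)
edges = de_bruijn_edges(symbols, k=2)
```

Because the node set depends only on the alphabet size and k, the graph structure is fixed in advance and only the edge counts vary with the input, which is one plausible reading of "fixed graph structure" in the abstract.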
Article 429
Title@2025-05-28 (3): Measuring and Controlling Solution Degeneracy across Task-Trained Recurrent Neural Networks
Title: Measuring and Controlling Solution Degeneracy across Task-Trained Recurrent Neural Networks | Degenerierung von Mess- und Regellösungen über aufgabenorientierte recurrente Neuralnetzwerke hinweg | 跨任务技术经常性神经网络的退化 2410.03972v2 |
Authors: Ann Huang, Satpreet H. Singh, Flavio Martinelli, Kanaka Rajan
Task-trained recurrent neural networks (RNNs) are widely used in neuroscience and machine learning to model dynamical computations. To gain mechanistic insight into how neural systems solve tasks, prior work often reverse-engineers individual trained networks. However, different RNNs trained on the same task and achieving similar performance can exhibit strikingly different internal solutions, a phenomenon known as solution degeneracy. Here, we develop a unified framework to systematically quantify and control solution degeneracy across three levels: behavior, neural dynamics, and weight space. We apply this framework to 3,400 RNNs trained on four neuroscience-relevant tasks (flip-flop memory, sine wave generation, delayed discrimination, and path integration) while systematically varying task complexity, learning regime, network size, and regularization. We find that higher task complexity and stronger feature learning reduce degeneracy in neural dynamics but increase it in weight space, with mixed effects on behavior. In contrast, larger networks and structural regularization reduce degeneracy at all three levels. These findings empirically validate the Contravariance Principle and provide practical guidance for researchers aiming to tailor RNN solutions, whether to uncover shared neural mechanisms or to model individual variability observed in biological systems. This work provides a principled framework for quantifying and controlling solution degeneracy in task-trained RNNs, offering new tools for building more interpretable and biologically grounded models of neural computation.
Article 430
Title@2025-05-28 (3): Test-time augmentation improves efficiency in conformal prediction
Title: Test-time augmentation improves efficiency in conformal prediction | Testzeitvergrößerung verbessert die Effizienz in der konformen Vorhersage | 提高试验时间的提高提高符合预测的效率 2505.22764v1 |
Authors: Divya Shanmugam, Helen Lu, Swami Sankaranarayanan, John Guttag
A conformal classifier produces a set of predicted classes and provides a probabilistic guarantee that the set includes the true class. Unfortunately, it is often the case that conformal classifiers produce uninformatively large sets. In this work, we show that test-time augmentation (TTA)–a technique that introduces inductive biases during inference–reduces the size of the sets produced by conformal classifiers. Our approach is flexible, computationally efficient, and effective. It can be combined with any conformal score, requires no model retraining, and reduces prediction set sizes by 10%-14% on average. We conduct an evaluation of the approach spanning three datasets, three models, two established conformal scoring methods, different guarantee strengths, and several distribution shifts to show when and why test-time augmentation is a useful addition to the conformal pipeline.
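As a rough illustration of how test-time augmentation can slot into a conformal pipeline, the sketch below averages class probabilities over augmented views and then applies standard split conformal prediction with the score 1 - p(true class). The plain averaging step, the choice of score, and all function names are assumptions for illustration; the paper's own aggregation may differ.

```python
import math

def tta_average(prob_views):
    """Average the class-probability vectors predicted for several augmented views."""
    n_views = len(prob_views)
    return [sum(col) / n_views for col in zip(*prob_views)]

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal quantile of the calibration scores 1 - p(true class)."""
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    # With too few calibration points the guarantee requires the full label set.
    return float("inf") if rank > n else scores[rank - 1]

def prediction_set(probs, threshold):
    """Return every class whose score 1 - p does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]

# Hypothetical calibration data: (already TTA-averaged) probabilities and true labels.
cal_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
cal_labels = [0, 1, 0, 1]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

# Two augmented views of one test input, averaged before scoring.
test_probs = tta_average([[0.8, 0.2], [0.6, 0.4]])
pred = prediction_set(test_probs, threshold)
```

Calibration and test inputs should go through the same augmentation and averaging step so that their scores remain exchangeable. No retraining is involved, since only the inference-time probabilities change.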
Article 431
Title@2025-05-28 (3): Generalizable Representation Learning for fMRI-based Neurological Disorder Identification
Title: Generalizable Representation Learning for fMRI-based Neurological Disorder Identification | Generalisierbares Repräsentationslernen für die fMRI-basierte neurologische Störungserkennung | FMRI基于神经疾病识别的神经疾病学学习 2412.16197v2 |
Authors: Wenhui Cui, Haleh Akrami, Anand A. Joshi, Richard M. Leahy
Despite the impressive advances achieved using deep learning for functional brain activity analysis, the heterogeneity of functional patterns and the scarcity of imaging data still pose challenges in tasks such as identifying neurological disorders. For functional Magnetic Resonance Imaging (fMRI), while data may be abundantly available from healthy controls, clinical data is often scarce, especially for rare diseases, limiting the ability of models to identify clinically-relevant features. We overcome this limitation by introducing a novel representation learning strategy integrating meta-learning with self-supervised learning to improve the generalization from normal to clinical features. This approach enables generalization to challenging clinical tasks featuring scarce training data. We achieve this by leveraging self-supervised learning on the control dataset to focus on inherent features that are not limited to a particular supervised task and incorporating meta-learning to improve the generalization across domains. To explore the generalizability of the learned representations to unseen clinical applications, we apply the model to four distinct clinical datasets featuring scarce and heterogeneous data for neurological disorder classification. Results demonstrate the superiority of our representation learning strategy on diverse clinically-relevant tasks. Code is publicly available at https://github.com/wenhui0206/MeTSK/tree/main
Article 432
Title@2025-05-28 (3): MIAS-SAM: Medical Image Anomaly Segmentation without thresholding
Title: MIAS-SAM: Medical Image Anomaly Segmentation without thresholding | MIAS-SAM: Medizinische Bildanomalie Segmentierung ohne Schwellenbildung | MIAS-SAM: 医学形象非典型分割,无阈值 2505.22762v1 |
Authors: Marco Colussi, Dragan Ahmetovic, Sergio Mascetti
This paper presents MIAS-SAM, a novel approach for the segmentation of anomalous regions in medical images. MIAS-SAM uses a patch-based memory bank to store relevant image features, which are extracted from normal data using the SAM encoder. At inference time, the embedding patches extracted from the SAM encoder are compared with those in the memory bank to obtain the anomaly map. Finally, MIAS-SAM computes the center of gravity of the anomaly map to prompt the SAM decoder, obtaining an accurate segmentation from the previously extracted features. Unlike prior work, MIAS-SAM does not require defining a threshold value to obtain the segmentation from the anomaly map. Experiments conducted on three publicly available datasets, each with a different imaging modality (Brain MRI, Liver CT, and Retina OCT), show accurate anomaly segmentation capabilities measured using the DICE score. The code is available at: https://github.com/warpcut/MIAS-SAM
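The threshold-free prompting step can be pictured as an intensity-weighted centroid of the anomaly map. The sketch below assumes the map is a 2D grid of non-negative scores and returns the point in (x, y) order. The function name, the coordinate order, and the toy map are illustrative assumptions and are not taken from the MIAS-SAM code.

```python
def center_of_gravity(anomaly_map):
    """Return the intensity-weighted centroid (x, y) of a 2D map of non-negative scores."""
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(anomaly_map):
        for x, value in enumerate(row):
            total += value
            sum_x += x * value
            sum_y += y * value
    if total == 0:
        raise ValueError("anomaly map has no mass, so the centroid is undefined")
    return sum_x / total, sum_y / total

# Hypothetical 3x3 anomaly map with most of its mass at column 2, row 1.
toy_map = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0],
]
point = center_of_gravity(toy_map)
```

The resulting point could then serve as a single positive point prompt. Every score contributes in proportion to its magnitude, so no binarization threshold is needed.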
Article 433
Title@2025-05-28 (3): Non-convex entropic mean-field optimization via Best Response flow
Title: Non-convex entropic mean-field optimization via Best Response flow | Nicht konvexe entropische Mittelfeld-Optimierung über Best Response Flow | 通过最佳反应流程优化非convex 电子中位平均场 2505.22760v1 |
Authors: Razvan-Andrei Lascu, Mateusz B. Majka
We study the problem of minimizing non-convex functionals on the space of probability measures, regularized by the relative entropy (KL divergence) with respect to a fixed reference measure, as well as the corresponding problem of solving entropy-regularized non-convex-non-concave min-max problems. We utilize the Best Response flow (also known in the literature as the fictitious play flow) and study how its convergence is influenced by the relation between the degree of non-convexity of the functional under consideration, the regularization parameter and the tail behaviour of the reference measure. In particular, we demonstrate how to choose the regularizer, given the non-convex functional, so that the Best Response operator becomes a contraction with respect to the $L^1$-Wasserstein distance, which then ensures the existence of its unique fixed point, which is then shown to be the unique global minimizer for our optimization problem. This extends recent results where the Best Response flow was applied to solve convex optimization problems regularized by the relative entropy with respect to arbitrary reference measures, and with arbitrary values of the regularization parameter. Our results explain precisely how the assumption of convexity can be relaxed, at the expense of making a specific choice of the regularizer. Additionally, we demonstrate how these results can be applied in reinforcement learning in the context of policy optimization for Markov Decision Processes and Markov games with softmax parametrized policies in the mean-field regime.
Article 434
Title@2025-05-28 (3): FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference
Title: FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference | FlashFormer: Ganzmodell-Kernel für effiziente Low-Batch-Inferenz | FlashFormer: 用于高效低批量推断的全模块内核 2505.22758v1 |
Authors: Aniruddha Nrusimha, William Brandon, Mayank Mishra, Yikang Shen, Rameswar Panda, Jonathan Ragan-Kelley, Yoon Kim
The size and compute characteristics of modern large language models have led to an increased interest in developing specialized kernels tailored for training and inference. Existing kernels primarily optimize for compute utilization, targeting the large-batch training and inference settings. However, low-batch inference, where memory bandwidth and kernel launch overheads are significant factors, remains important for many applications of interest such as edge deployment and latency-sensitive applications. This paper describes FlashFormer, a proof-of-concept kernel for accelerating single-batch inference for transformer-based large language models. Across various model sizes and quantization settings, we observe nontrivial speedups compared to existing state-of-the-art inference kernels.
Article 435
Title@2025-05-28 (3): Decomposing Elements of Problem Solving: What “Math” Does RL Teach?
Title: Decomposing Elements of Problem Solving: What “Math” Does RL Teach? | Zersetzende Elemente der Problemlösung: Was “Math” lehrt RL? | 问题解决的分解要素:RL教什么“马思”? 2505.22756v1 |
Authors: Tian Qin, Core Francisco Park, Mujin Kwun, Aaron Walsman, Eran Malach, Nikhil Anand, Hidenori Tanaka, David Alvarez-Melis
Mathematical reasoning tasks have become prominent benchmarks for assessing the reasoning capabilities of LLMs, especially with reinforcement learning (RL) methods such as GRPO showing significant performance gains. However, accuracy metrics alone do not support fine-grained assessment of capabilities and fail to reveal which problem-solving skills have been internalized. To better understand these capabilities, we propose to decompose problem solving into fundamental capabilities: Plan (mapping questions to sequences of steps), Execute (correctly performing solution steps), and Verify (identifying the correctness of a solution). Empirically, we find that GRPO mainly enhances the execution skill, improving execution robustness on problems the model already knows how to solve, a phenomenon we call temperature distillation. More importantly, we show that RL-trained models struggle with fundamentally new problems, hitting a ‘coverage wall’ due to insufficient planning skills. To explore RL’s impact more deeply, we construct a minimal, synthetic solution-tree navigation task as an analogy for mathematical problem-solving. This controlled setup replicates our empirical findings, confirming RL primarily boosts execution robustness. Importantly, in this setting, we identify conditions under which RL can potentially overcome the coverage wall through improved exploration and generalization to new solution paths. Our findings provide insights into the role of RL in enhancing LLM reasoning, expose key limitations, and suggest a path toward overcoming these barriers. Code is available at https://github.com/cfpark00/RL-Wall.
Article 436
Title@2025-05-28 (3): Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling
Title: Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling | Darstellungsdynamiken von Diffusionsmodellen durch Low-Dimensional Modeling verstehen | 通过低多样性建模理解通过低多样性建模传播模型的动态 2502.05743v2 |
Authors: Xiao Li, Zekai Zhang, Xiang Li, Siyi Chen, Zhihui Zhu, Peng Wang, Qing Qu
Diffusion models, though originally designed for generative tasks, have demonstrated impressive self-supervised representation learning capabilities. A particularly intriguing phenomenon in these models is the emergence of unimodal representation dynamics, where the quality of learned features peaks at an intermediate noise level. In this work, we conduct a comprehensive theoretical and empirical investigation of this phenomenon. Leveraging the inherent low-dimensionality structure of image data, we theoretically demonstrate that the unimodal dynamic emerges when the diffusion model successfully captures the underlying data distribution. The unimodality arises from an interplay between denoising strength and class confidence across noise scales. Empirically, we further show that, in classification tasks, the presence of unimodal dynamics reliably indicates generalization: it emerges when the model generalizes and gradually transitions to a monotonically decreasing curve as the model begins to memorize the training data.
Article 437
Title@2025-05-28 (3): VideoRAG: Retrieval-Augmented Generation over Video Corpus
Title: VideoRAG: Retrieval-Augmented Generation over Video Corpus | VideoRAG: Retrieval-Augmented Generation über Video Corpus | VideoRAG: 利用视频公司回收的原始一代 2501.05874v3 |
Authors: Soyeong Jeong, Kangsan Kim, Jinheon Baek, Sung Ju Hwang
Retrieval-Augmented Generation (RAG) is a powerful strategy for improving the factual accuracy of models by retrieving external knowledge relevant to queries and incorporating it into the generation process. However, existing approaches primarily focus on text, with some recent advancements considering images, and they largely overlook videos, a rich source of multimodal knowledge capable of representing contextual details more effectively than any other modality. While very recent studies explore the use of videos in response generation, they either predefine query-associated videos without retrieval or convert videos into textual descriptions, losing multimodal richness. To tackle these limitations, we introduce VideoRAG, a framework that not only dynamically retrieves videos based on their relevance to queries but also utilizes both visual and textual information. The operation of VideoRAG is powered by recent Large Video Language Models (LVLMs), which enable the direct processing of video content to represent it for retrieval and the seamless integration of retrieved videos jointly with queries for response generation. Also, motivated by the observations that the context size of LVLMs may not be sufficient to process all frames in extremely long videos and that not all frames are equally important, we introduce a video frame selection mechanism to extract the most informative subset of frames, along with a strategy to extract textual information from videos (as it can aid the understanding of video content) when their subtitles are not available. We experimentally validate the effectiveness of VideoRAG, showcasing that it is superior to relevant baselines. Code is available at https://github.com/starsuzi/VideoRAG.
Article 438
Title@2025-05-28 (3): Self-orthogonalizing attractor neural networks emerging from the free energy principle
Title: Self-orthogonalizing attractor neural networks emerging from the free energy principle | Selbst-orthogonalisierendes Attraktor-Neuralnetzwerk, das aus dem Prinzip der freien Energie entspringt | 根据自由能源原则建立的自我调整的吸引人神经网络 2505.22749v1 |
Authors: Tamas Spisak, Karl Friston
Attractor dynamics are a hallmark of many complex systems, including the brain. Understanding how such self-organizing dynamics emerge from first principles is crucial for advancing our understanding of neuronal computations and the design of artificial intelligence systems. Here we formalize how attractor networks emerge from the free energy principle applied to a universal partitioning of random dynamical systems. Our approach obviates the need for explicitly imposed learning and inference rules and identifies emergent, but efficient and biologically plausible inference and learning dynamics for such self-organizing systems. These result in a collective, multi-level Bayesian active inference process. Attractors on the free energy landscape encode prior beliefs; inference integrates sensory data into posterior beliefs; and learning fine-tunes couplings to minimize long-term surprise. Analytically and via simulations, we establish that the proposed networks favor approximately orthogonalized attractor representations, a consequence of simultaneously optimizing predictive accuracy and model complexity. These attractors efficiently span the input subspace, enhancing generalization and the mutual information between hidden causes and observable effects. Furthermore, while random data presentation leads to symmetric and sparse couplings, sequential data fosters asymmetric couplings and non-equilibrium steady-state dynamics, offering a natural extension to conventional Boltzmann Machines. Our findings offer a unifying theory of self-organizing attractor networks, providing novel insights for AI and neuroscience.
Article 439
Title@2025-05-28 (3): An unsupervised method for MRI recovery: Deep image prior with structured sparsity
Title: An unsupervised method for MRI recovery: Deep image prior with structured sparsity | Eine unüberwachte Methode für die MRT-Wiederherstellung: Tiefenbild vor mit strukturierter Sparsamkeit | MRI 恢复的一种不受监督的方法: 结构宽度之前的深图像 2501.01482v3 |
Authors: Muhammad Ahmad Sultan, Chong Chen, Yingmin Liu, Katarzyna Gil, Karolina Zareba, Rizwan Ahmad
Objective: To propose and validate an unsupervised MRI reconstruction method that does not require fully sampled k-space data. Materials and Methods: The proposed method, deep image prior with structured sparsity (DISCUS), extends the deep image prior (DIP) by introducing group sparsity to frame-specific code vectors, enabling the discovery of a low-dimensional manifold for capturing temporal variations. DISCUS was validated using four studies: (I) simulation of a dynamic Shepp-Logan phantom to demonstrate its manifold discovery capabilities, (II) comparison with compressed sensing and DIP-based methods using simulated single-shot late gadolinium enhancement (LGE) image series from six distinct digital cardiac phantoms in terms of normalized mean square error (NMSE) and structural similarity index measure (SSIM), (III) evaluation on retrospectively undersampled single-shot LGE data from eight patients, and (IV) evaluation on prospectively undersampled single-shot LGE data from eight patients, assessed via blind scoring from two expert readers. Results: DISCUS outperformed competing methods, demonstrating superior reconstruction quality in terms of NMSE and SSIM (Studies I–III) and expert reader scoring (Study IV). Discussion: An unsupervised image reconstruction method is presented and validated on simulated and measured data. These developments can benefit applications where acquiring fully sampled data is challenging.
Article 440
Title@2025-05-28 (3): StarBASE-GP: Biologically-Guided Automated Machine Learning for Genotype-to-Phenotype Association Analysis
Title: StarBASE-GP: Biologically-Guided Automated Machine Learning for Genotype-to-Phenotype Association Analysis | StarBASE-GP: Biologisch geführtes automatisiertes maschinelles Lernen für die Analyse von Genotyp-zu-Phenotyp-Verbindungen | StarBASE-GP: 基因型至极型协会分析的生物辅助自动计算机学习 2505.22746v1 |
Authors: Jose Guadalupe Hernandez, Attri Ghosh, Philip J. Freda, Yufei Meng, Nicholas Matsumoto, Jason H. Moore
We present the Star-Based Automated Single-locus and Epistasis analysis tool - Genetic Programming (StarBASE-GP), an automated framework for discovering meaningful genetic variants associated with phenotypic variation in large-scale genomic datasets. StarBASE-GP uses a genetic programming-based multi-objective optimization strategy to evolve machine learning pipelines that simultaneously maximize explanatory power (r2) and minimize pipeline complexity. Biological domain knowledge is integrated at multiple stages, including the use of nine inheritance encoding strategies to model deviations from additivity, a custom linkage disequilibrium pruning node that minimizes redundancy among features, and a dynamic variant recommendation system that prioritizes informative candidates for pipeline inclusion. We evaluate StarBASE-GP on a cohort of Rattus norvegicus (brown rat) to identify variants associated with body mass index, benchmarking its performance against a random baseline and a biologically naive version of the tool. StarBASE-GP consistently evolves Pareto fronts with superior performance, yielding higher accuracy in identifying both ground truth and novel quantitative trait loci, highlighting relevant targets for future validation. By incorporating evolutionary search and relevant biological theory into a flexible automated machine learning framework, StarBASE-GP demonstrates robust potential for advancing variant discovery in complex traits.
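The two objectives named in the abstract, maximizing explanatory power and minimizing pipeline complexity, define a Pareto front over candidate pipelines. The sketch below shows only that selection step for a small list of candidates. The dictionary keys, candidate names, and scores are hypothetical, and it does not reproduce StarBASE-GP's genetic programming search.

```python
def dominates(a, b):
    """True if candidate a is at least as good as b on both objectives and better on one."""
    no_worse = a["r2"] >= b["r2"] and a["complexity"] <= b["complexity"]
    strictly_better = a["r2"] > b["r2"] or a["complexity"] < b["complexity"]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Hypothetical pipelines scored by explanatory power (r2) and node count (complexity).
pipelines = [
    {"name": "A", "r2": 0.30, "complexity": 5},
    {"name": "B", "r2": 0.25, "complexity": 2},
    {"name": "C", "r2": 0.20, "complexity": 4},
    {"name": "D", "r2": 0.30, "complexity": 7},
]
front = [p["name"] for p in pareto_front(pipelines)]
```

In this toy example B dominates C and A dominates D, so only A and B remain. Neither is better than the other on both objectives, which is the trade-off a Pareto front is meant to expose.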
Article 441
Title@2025-05-28 (3): Information-Computation Gaps in Quantum Learning via Low-Degree Likelihood
Title: Information-Computation Gaps in Quantum Learning via Low-Degree Likelihood | Informations-Computation Lücken im Quanten-Lernen über Low-Degree Likelihood | 通过低贫困风险学习的量子学习中的信息估计差距 2505.22743v1 |
Authors: Sitan Chen, Weiyuan Gong, Jonas Haferkamp, Yihui Quek
In a variety of physically relevant settings for learning from quantum data, designing protocols that can computationally efficiently extract information remains largely an art, and there are important cases where we believe this to be impossible, that is, where there is an information-computation gap. While there is a large array of tools in the classical literature for giving evidence for average-case hardness of statistical inference problems, the corresponding tools in the quantum literature are far more limited. One such framework in the classical literature, the low-degree method, makes predictions about hardness of inference problems based on the failure of estimators given by low-degree polynomials. In this work, we extend this framework to the quantum setting. We establish a general connection between state designs and low-degree hardness. We use this to obtain the first information-computation gaps for learning Gibbs states of random, sparse, non-local Hamiltonians. We also use it to prove hardness for learning random shallow quantum circuit states in a challenging model where states can be measured in adaptively chosen bases. To our knowledge, the ability to model adaptivity within the low-degree framework was open even in classical settings. In addition, we also obtain a low-degree hardness result for quantum error mitigation against strategies with single-qubit measurements. We define a new quantum generalization of the planted biclique problem and identify the threshold at which this problem becomes computationally hard for protocols that perform local measurements. Interestingly, the complexity landscape for this problem shifts when going from local measurements to more entangled single-copy measurements. We show average-case hardness for the “standard” variant of Learning Stabilizers with Noise and for agnostically learning product states.
Article 442
Title@2025-05-28 (3): Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
Title: Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing | Darstellung Shattering in Transformers: Synthetische Studie mit Wissensbearbeitung | 在变形器中代表变形器:带有知识编辑的合成研究 2410.17194v4 |
Authors: Kento Nishi, Maya Okawa, Rahul Ramesh, Mikail Khona, Hidenori Tanaka, Ekdeep Singh Lubana
Knowledge Editing (KE) algorithms alter models’ weights to perform targeted updates to incorrect, outdated, or otherwise unwanted factual associations. However, recent work has shown that applying KE can adversely affect models’ broader factual recall accuracy and diminish their reasoning abilities. Although these studies give insights into the potential harms of KE algorithms, e.g., performance evaluations on benchmarks, little is understood about why such destructive failures occur. Motivated by this, we define a novel synthetic task in which a Transformer is trained from scratch to internalize a “structured” knowledge graph. The structure enforces relationships between entities of the graph, such that editing a factual association has “trickling effects” on other entities (e.g., altering X’s parent is Y to Z affects who X’s siblings’ parent is). Through evaluations of edited models on this task, we show that KE inadvertently affects representations of entities beyond the targeted one, distorting relevant structures that allow a model to infer unseen knowledge about an entity. We call this phenomenon representation shattering and demonstrate that it degrades models’ factual recall and reasoning performance. We further corroborate our findings in naturalistic settings with pre-trained Llama and Mamba models as well. Overall, our work yields a precise mechanistic hypothesis to explain why KE has adverse effects on model abilities.
Article 443
Title@2025-05-28 (3): AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models
Title: AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models | AutoL2S: Auto-Lang-Short-Reasoning für effiziente große Sprachmodelle | 自动L2S:高效大语言模式的自动长期短期理由 2505.22662v1 |
Authors: Feng Luo, Yu-Neng Chuang, Guanchu Wang, Hoang Anh Duy Le, Shaochen Zhong, Hongyi Liu, Jiayi Yuan, Yang Sui, Vladimir Braverman, Vipin Chaudhary, Xia Hu
Reasoning-capable large language models (LLMs) demonstrate strong performance on complex reasoning tasks but often suffer from overthinking, generating unnecessarily long chain-of-thought (CoT) reasoning paths for easy reasoning questions, thereby increasing inference cost and latency. Recent approaches attempt to address this challenge by manually deciding when to apply long or short reasoning. However, they lack the flexibility to adapt CoT length dynamically based on question complexity. In this paper, we propose Auto Long-Short Reasoning (AutoL2S), a model-agnostic framework that enables LLMs to dynamically compress their generated reasoning path based on the complexity of the reasoning question. AutoL2S enables a learned paradigm, in which LLMs themselves can decide when longer reasoning is necessary and when shorter reasoning suffices, by training on data annotated with our proposed method, which includes both long and short CoT paths and a special
Article 444
Title@2025-05-28 (3): 3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model
Title: 3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model | 3DLLM-Mem: Langzeit-Raum-Temporal-Speicher für körpereigenes 3D-Großsprachmodell | 3DLLM-Mem:3D大语言模型内嵌成的3D大语言长期空间-时间记忆 2505.22657v1 |
Authors: Wenbo Hu, Yining Hong, Yanjun Wang, Leison Gao, Zibu Wei, Xingcheng Yao, Nanyun Peng, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang
Humans excel at performing complex tasks by leveraging long-term memory across temporal and spatial experiences. In contrast, current Large Language Models (LLMs) struggle to effectively plan and act in dynamic, multi-room 3D environments. We posit that part of this limitation is due to the lack of proper 3D spatial-temporal memory modeling in LLMs. To address this, we first introduce 3DMem-Bench, a comprehensive benchmark comprising over 26,000 trajectories and 2,892 embodied tasks, question-answering and captioning, designed to evaluate an agent’s ability to reason over long-term memory in 3D environments. Second, we propose 3DLLM-Mem, a novel dynamic memory management and fusion model for embodied spatial-temporal reasoning and actions in LLMs. Our model uses working memory tokens, which represent current observations, as queries to selectively attend to and fuse the most useful spatial and temporal features from episodic memory, which stores past observations and interactions. Our approach allows the agent to focus on task-relevant information while maintaining memory efficiency in complex, long-horizon environments. Experimental results demonstrate that 3DLLM-Mem achieves state-of-the-art performance across various tasks, outperforming the strongest baselines by 16.5% in success rate on 3DMem-Bench’s most challenging in-the-wild embodied tasks.
Article 445
Title@2025-05-28 (3): Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents
Title: Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents | Position: Ungewissheitsquantifizierung braucht eine Neubewertung für großsprachige Modellagenten | 位置:大语言示范物剂的不确定性量化需求评估 2505.22655v1 |
Authors: Michael Kirchhof, Gjergji Kasneci, Enkelejda Kasneci
Large-language models (LLMs) and chatbot agents are known to provide wrong outputs at times, and it was recently found that this can never be fully prevented. Hence, uncertainty quantification plays a crucial role, aiming to quantify the level of ambiguity in either one overall number or two numbers for aleatoric and epistemic uncertainty. This position paper argues that this traditional dichotomy of uncertainties is too limited for the open and interactive setup that LLM agents operate in when communicating with a user, and that we need to research avenues that enrich uncertainties in this novel scenario. We review the literature and find that popular definitions of aleatoric and epistemic uncertainties directly contradict each other and lose their meaning in interactive LLM agent settings. Hence, we propose three novel research directions that focus on uncertainties in such human-computer interactions: underspecification uncertainties, for when users do not provide all information or define the exact task at the first go; interactive learning, to ask follow-up questions and reduce the uncertainty about the current context; and output uncertainties, to utilize the rich language and speech space to express uncertainties as more than mere numbers. We expect that these new ways of dealing with and communicating uncertainties will lead to LLM agent interactions that are more transparent, trustworthy, and intuitive.
Article 446
Title@2025-05-28 (3): Sherlock: Self-Correcting Reasoning in Vision-Language Models
Title: Sherlock: Self-Correcting Reasoning in Vision-Language Models | Sherlock: Selbstkorrekte Vernunft in Vision-Sprachen-Modellen | 夏洛克:视觉语言模型中的自我校正理由 2505.22651v1 |
Authors: Yi Ding, Ruqi Zhang
Reasoning Vision-Language Models (VLMs) have shown promising performance on complex multimodal tasks. However, they still face significant challenges: they are highly sensitive to reasoning errors, require large volumes of annotated data or accurate verifiers, and struggle to generalize beyond specific domains. To address these limitations, we explore self-correction as a strategy to enhance reasoning VLMs. We first conduct an in-depth analysis of reasoning VLMs’ self-correction abilities and identify key gaps. Based on our findings, we introduce Sherlock, a self-correction and self-improvement training framework. Sherlock introduces a trajectory-level self-correction objective, a preference data construction method based on visual perturbation, and a dynamic $\beta$ for preference tuning. Once the model acquires self-correction capabilities using only 20k randomly sampled annotated data, it continues to self-improve without external supervision. Built on the Llama3.2-Vision-11B model, Sherlock achieves remarkable results across eight benchmarks, reaching an average accuracy of 64.1 with direct generation and 65.4 after self-correction. It outperforms LLaVA-CoT (63.2), Mulberry (63.9), and LlamaV-o1 (63.4) while using less than 20% of the annotated data.
Article 447
Title@2025-05-28 (3): On Learning Verifiers for Chain-of-Thought Reasoning
Title: On Learning Verifiers for Chain-of-Thought Reasoning | Über das Lernen von Prüfern für die Ketten-of-Thought-Reasoning | 关于研究链理由的学习验证符 2505.22650v1 |
Authors: Maria-Florina Balcan, Avrim Blum, Zhiyuan Li, Dravyansh Sharma
Chain-of-Thought reasoning has emerged as a powerful approach for solving complex mathematical and logical problems. However, it can often veer off track through incorrect or unsubstantiated inferences. Formal mathematical reasoning, which can be checked with a formal verifier, is one approach to addressing this issue. However, currently LLMs are simply not good enough to solve complex problems in a formal way, and even just formalizing an informal problem statement can be challenging. Motivated by this fact, in this work we consider the problem of learning reliable verifiers for natural language Chain-of-Thought reasoning. That is, given a problem statement and step-by-step solution in natural language, the aim of the verifier is to output [Yes] if the reasoning steps in the solution are all valid, and [No] otherwise. In this work we give a formal PAC-learning framework for studying this problem. We propose and analyze several natural verification goals, at different levels of strength, in this framework. We provide sample complexity upper-bounds for learning verifiers satisfying these goals, as well as lower-bound and impossibility results for learning other natural verification objectives without additional assumptions.
Article 448
Title@2025-05-28 (3): Private Rate-Constrained Optimization with Applications to Fair Learning
Title: Private Rate-Constrained Optimization with Applications to Fair Learning | Private Rate-Constrained Optimization mit Anwendungen für faires Lernen | 利用公平学习申请实现优化 2505.22703v1 |
Authors: Mohammad Yaghini, Tudor Cebere, Michael Menart, Aurélien Bellet, Nicolas Papernot
Many problems in trustworthy ML can be formulated as minimization of the model error under constraints on the prediction rates of the model for suitably-chosen marginals, including most group fairness constraints (demographic parity, equality of odds, etc.). In this work, we study such constrained minimization problems under differential privacy (DP). Standard DP optimization techniques like DP-SGD rely on the loss function’s decomposability into per-sample contributions. However, rate constraints introduce inter-sample dependencies, violating the decomposability requirement. To address this, we develop RaCO-DP, a DP variant of the Stochastic Gradient Descent-Ascent (SGDA) algorithm which solves the Lagrangian formulation of rate constraint problems. We demonstrate that the additional privacy cost of incorporating these constraints reduces to privately estimating a histogram over the mini-batch at each optimization step. We prove the convergence of our algorithm through a novel analysis of SGDA that leverages the linear structure of the dual parameter. Finally, empirical results on learning under group fairness constraints demonstrate that our method Pareto-dominates existing private learning approaches in fairness-utility trade-offs.
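The claim that the extra privacy cost reduces to privately estimating a histogram over the mini-batch can be illustrated with a small sketch. This is not RaCO-DP itself: it assumes hard binary predictions, add/remove-one neighbouring batches (so each sample changes exactly one count by 1), and the Laplace mechanism; the function names and the `epsilon` parameter are hypothetical.

```python
import numpy as np

def private_histogram(groups, preds, n_groups, epsilon, rng):
    """Noisy (group, prediction) counts for one mini-batch.

    Assumption: each sample lands in exactly one cell, so adding or removing
    a sample changes the histogram by 1 in L1 norm, and Laplace noise with
    scale 1/epsilon gives epsilon-DP for this single release.
    """
    counts = np.zeros((n_groups, 2))
    np.add.at(counts, (groups, preds), 1.0)
    return counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)

def positive_rates(hist):
    """Per-group positive prediction rate computed from a noisy histogram."""
    hist = np.clip(hist, 0.0, None)          # noisy counts can go negative
    totals = hist.sum(axis=1)
    return hist[:, 1] / np.maximum(totals, 1.0)

rng = np.random.default_rng(0)
groups = np.array([0, 0, 1, 1])              # hypothetical group labels
preds = np.array([1, 0, 1, 1])               # hypothetical binary predictions
rates = positive_rates(private_histogram(groups, preds, 2, epsilon=1e6, rng=rng))
print(rates)
```

A rate constraint such as demographic parity can then be evaluated from `rates` alone, which is why only the histogram needs to be privatized in this toy setting. The very large `epsilon` is used only so the example output is close to the true rates.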
Article 449
Title@2025-05-28 (3): Spectral Survival Analysis
Title: Spectral Survival Analysis | Spektrale Überlebensanalyse | 光谱生存分析 2505.22641v1 |
Authors: Chengzhi Shi, Stratis Ioannidis
Survival analysis is widely deployed in a diverse set of fields, including healthcare, business, ecology, etc. The Cox Proportional Hazard (CoxPH) model is a semi-parametric model often encountered in the literature. Despite its popularity, wide deployment, and numerous variants, scaling CoxPH to large datasets and deep architectures poses a challenge, especially in the high-dimensional regime. We identify a fundamental connection between rank regression and the CoxPH model: this allows us to adapt and extend the so-called spectral method for rank regression to survival analysis. Our approach is versatile, naturally generalizing to several CoxPH variants, including deep models. We empirically verify our method’s scalability on multiple real-world high-dimensional datasets; our method outperforms legacy methods w.r.t. predictive performance and efficiency.
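As background for the CoxPH objective mentioned above, the standard negative log partial likelihood can be written in a few lines. This is the textbook loss, not the paper's spectral method; the sketch assumes no tied event times, and the function and argument names are illustrative.

```python
import numpy as np

def cox_neg_log_partial_likelihood(scores, times, events):
    """Negative log partial likelihood of the Cox model (no tied times).

    scores: model outputs (log hazard ratios), one per sample
    times:  observed times
    events: 1 if the event was observed, 0 if the sample was censored
    """
    order = np.argsort(-times)               # sort by descending time
    s, e = scores[order], events[order]
    # After sorting, the risk set of sample i (everyone with time >= t_i)
    # is the prefix 0..i, so a running log-sum-exp gives its log total hazard.
    log_risk = np.logaddexp.accumulate(s)
    return -np.sum((s - log_risk) * e)

scores = np.array([0.0, 0.0])
times = np.array([2.0, 1.0])
events = np.array([1, 1])
loss = cox_neg_log_partial_likelihood(scores, times, events)
print(loss)
```

Each observed event contributes a comparison between its own score and the scores of everyone still at risk, which is the ranking-like structure the abstract connects to rank regression.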
Article 450
Title@2025-05-28 (3): SimProcess: High Fidelity Simulation of Noisy ICS Physical Processes
Title: SimProcess: High Fidelity Simulation of Noisy ICS Physical Processes | SimProcess: Hohe Fidelity-Simulation von lärmigen ICS-Physischen Prozessen | 中间过程:高菲力模拟有噪音的ICS物理过程 2505.22638v1 |
Authors: Denis Donadel, Gabriele Crestanello, Giulio Morandini, Daniele Antonioli, Mauro Conti, Massimo Merro
Industrial Control Systems (ICS) manage critical infrastructures like power grids and water treatment plants. Cyberattacks on ICSs can disrupt operations, causing severe economic, environmental, and safety issues. For example, undetected pollution in a water plant can put the lives of thousands at stake. ICS researchers have increasingly turned to honeypots – decoy systems designed to attract attackers, study their behaviors, and eventually improve defensive mechanisms. However, existing ICS honeypots struggle to replicate the ICS physical process, making them susceptible to detection. Accurately simulating the noise in ICS physical processes is challenging because different factors produce it, including sensor imperfections and external interferences. In this paper, we propose SimProcess, a novel framework to rank the fidelity of ICS simulations by evaluating how closely they resemble real-world and noisy physical processes. It measures the simulation distance from a target system by estimating the noise distribution with machine learning models like Random Forest. Unlike existing solutions that require detailed mathematical models or are limited to simple systems, SimProcess operates with only a timeseries of measurements from the real system, making it applicable to a broader range of complex dynamic systems. We demonstrate the framework’s effectiveness through a case study using real-world power grid data from the EPIC testbed. We compare the performance of various simulation methods, including static and generative noise techniques. Our model correctly classifies real samples with a recall of up to 1.0. It also identifies Gaussian and Gaussian Mixture as the best distribution to simulate our power systems, together with a generative solution provided by an autoencoder, thereby helping developers to improve honeypot fidelity. Additionally, we make our code publicly available.
Article 451
Title@2025-05-28 (3): Understanding (Un)Reliability of Steering Vectors in Language Models
Title: Understanding (Un)Reliability of Steering Vectors in Language Models | Verständnis (Un)Zuverlässigkeit von Steuerungsvektoren in Sprachmodellen | (un) 语言模式指导矢量的可靠性 2505.22637v1 |
Authors: Joschka Braun, Carsten Eickhoff, David Krueger, Seyed Ali Bahrainian, Dmitrii Krasheninnikov
Steering vectors are a lightweight method to control language model behavior by adding a learned bias to the activations at inference time. Although steering demonstrates promising performance, recent work shows that it can be unreliable or even counterproductive in some cases. This paper studies the influence of prompt types and the geometry of activation differences on steering reliability. First, we find that all seven prompt types used in our experiments produce a net positive steering effect, but exhibit high variance across samples, and often give an effect opposite of the desired one. No prompt type clearly outperforms the others, and yet the steering vectors resulting from the different prompt types often differ directionally (as measured by cosine similarity). Second, we show that higher cosine similarity between training set activation differences predicts more effective steering. Finally, we observe that datasets where positive and negative activations are better separated are more steerable. Our results suggest that vector steering is unreliable when the target behavior is not represented by a coherent direction.
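A rough sketch of the quantities discussed above, under the assumption that the steering vector is the mean difference between activations on positive and negative prompts (the abstract only says "learned bias", so this choice, the function names and the `alpha` scale are assumptions):

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Mean activation difference between paired positive and negative prompts."""
    return (pos_acts - neg_acts).mean(axis=0)

def mean_pairwise_cosine(diffs):
    """Average cosine similarity between distinct per-sample activation differences."""
    unit = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(diffs)
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))

# Toy activations: three paired samples in a 2-dimensional hidden space.
neg = np.zeros((3, 2))
pos = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])

v = steering_vector(pos, neg)
coherence = mean_pairwise_cosine(pos - neg)

alpha = 0.5                                  # hypothetical steering strength
steered = neg[0] + alpha * v                 # bias added at inference time
print(v, coherence, steered)
```

In this toy case all differences point the same way, so the mean pairwise cosine is 1; the abstract's finding is that lower values of this kind of coherence go with less reliable steering.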
Article 452
Title@2025-05-28 (3): Spatial Knowledge Graph-Guided Multimodal Synthesis
Title: Spatial Knowledge Graph-Guided Multimodal Synthesis | Raumwissen Graph-geführte multimodale Synthese | 空间知识图表辅助多模式合成 2505.22633v1 |
Authors: Yida Xue, Zhen Bi, Jinnan Yang, Jungang Lou, Huajun Chen, Ningyu Zhang
Recent advances in multimodal large language models (MLLMs) have significantly enhanced their capabilities; however, their spatial perception abilities remain a notable limitation. To address this challenge, multimodal data synthesis offers a promising solution. Yet, ensuring that synthesized data adhere to spatial common sense is a non-trivial task. In this work, we introduce SKG2Data, a novel multimodal synthesis approach guided by spatial knowledge graphs, grounded in the concept of knowledge-to-data generation. SKG2Data automatically constructs a Spatial Knowledge Graph (SKG) to emulate human-like perception of spatial directions and distances, which is subsequently utilized to guide multimodal data synthesis. Extensive experiments demonstrate that data synthesized from diverse types of spatial knowledge, including direction and distance, not only enhance the spatial perception and reasoning abilities of MLLMs but also exhibit strong generalization capabilities. We hope that the idea of knowledge-based data synthesis can advance the development of spatial intelligence.
Article 453
Title@2025-05-28 (3): GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks
Title: GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks | GraphOmni: Ein umfassender und erweiterbarer Benchmark-Rahmen für große Sprachmodelle zu graphtheoretischen Aufgaben | 图图Omni:图理学任务大语言模型综合和可扩展基准框架 2504.12764v3 |
Authors: Hao Xu, Xiangru Jian, Xinjian Zhao, Wei Pang, Chao Zhang, Suyuchen Wang, Qixin Zhang, Zhengyuan Dong, Joao Monteiro, Bang Liu, Qiuzhuang Sun, Tianshu Yu
This paper introduces GraphOmni, a comprehensive benchmark designed to evaluate the reasoning capabilities of LLMs on graph-theoretic tasks articulated in natural language. GraphOmni encompasses diverse graph types, serialization formats, and prompting schemes, significantly exceeding prior efforts in both scope and depth. Through extensive systematic evaluation, we identify critical interactions among these dimensions, demonstrating their substantial impact on model performance. Our experiments reveal that state-of-the-art models like Claude-3.5 and o4-mini consistently outperform other models, yet even these leading models exhibit substantial room for improvement. Performance variability is evident depending on the specific combinations of factors we considered, underscoring the necessity of comprehensive evaluations across these interconnected dimensions. Additionally, we observe distinct impacts of serialization and prompting strategies between open-source and closed-source models, encouraging the development of tailored approaches. Motivated by the findings, we also propose a reinforcement learning-inspired framework that adaptively selects the optimal factors influencing LLM reasoning capabilities. This flexible and extendable benchmark not only deepens our understanding of LLM performance on structured tasks but also provides a robust foundation for advancing research in LLM-based graph reasoning. The code and datasets are available at https://github.com/GAI-Community/GraphOmni.
Article 454
Title@2025-05-28 (3): SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning
Title: SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning | SCIZOR: Ein selbstüberwachter Ansatz zur Datenkuration für großflächiges Imitationslernen | SCIZOR: 大规模模拟学习数据计算法的自我监督办法 2505.22626v1 |
Authors: Yu Zhang, Yuqi Xie, Huihan Liu, Rutav Shah, Michael Wan, Linxi Fan, Yuke Zhu
Imitation learning advances robot capabilities by enabling the acquisition of diverse behaviors from human demonstrations. However, large-scale datasets used for policy training often introduce substantial variability in quality, which can negatively impact performance. As a result, automatically curating datasets by filtering low-quality samples to improve quality becomes essential. Existing robotic curation approaches rely on costly manual annotations and perform curation at a coarse granularity, such as the dataset or trajectory level, failing to account for the quality of individual state-action pairs. To address this, we introduce SCIZOR, a self-supervised data curation framework that filters out low-quality state-action pairs to improve the performance of imitation learning policies. SCIZOR targets two complementary sources of low-quality data: suboptimal data, which hinders learning with undesirable actions, and redundant data, which dilutes training with repetitive patterns. SCIZOR leverages a self-supervised task progress predictor for suboptimal data to remove samples lacking task progression, and a deduplication module operating on joint state-action representation for samples with redundant patterns. Empirically, we show that SCIZOR enables imitation learning policies to achieve higher performance with less data, yielding an average improvement of 15.4% across multiple benchmarks. More information is available at: https://ut-austin-rpl.github.io/SCIZOR/
Article 455
Title@2025-05-28 (3): Principled Out-of-Distribution Generalization via Simplicity
Title: Principled Out-of-Distribution Generalization via Simplicity | Prinzipielle Nicht-Verteilung Verallgemeinerung über Einfachheit | 通过简单化普遍化 2505.22622v1 |
Authors: Jiawei Ge, Amanda Wang, Shange Tang, Chi Jin
Modern foundation models exhibit remarkable out-of-distribution (OOD) generalization, solving tasks far beyond the support of their training data. However, the theoretical principles underpinning this phenomenon remain elusive. This paper investigates this problem by examining the compositional generalization abilities of diffusion models in image generation. Our analysis reveals that while neural network architectures are expressive enough to represent a wide range of models – including many with undesirable behavior on OOD inputs – the true, generalizable model that aligns with human expectations typically corresponds to the simplest among those consistent with the training data. Motivated by this observation, we develop a theoretical framework for OOD generalization via simplicity, quantified using a predefined simplicity metric. We analyze two key regimes: (1) the constant-gap setting, where the true model is strictly simpler than all spurious alternatives by a fixed gap, and (2) the vanishing-gap setting, where the fixed gap is replaced by a smoothness condition ensuring that models close in simplicity to the true model yield similar predictions. For both regimes, we study the regularized maximum likelihood estimator and establish the first sharp sample complexity guarantees for learning the true, generalizable, simple model.
Article 456
Title@2025-05-28 (3): The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Title: The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models | Der Entropie-Mechanismus des Verstärkten Lernens für sinnvolle Sprachmodelle | 理由语言模式强化学习的全英机制 2505.22617v1 |
Authors: Ganqu Cui, Yuchen Zhang, Jiacheng Chen, Lifan Yuan, Zhi Wang, Yuxin Zuo, Haozhan Li, Yuchen Fan, Huayu Chen, Weize Chen, Zhiyuan Liu, Hao Peng, Lei Bai, Wanli Ouyang, Yu Cheng, Bowen Zhou, Ning Ding
This paper aims to overcome a major obstacle in scaling RL for reasoning with LLMs, namely the collapse of policy entropy. This phenomenon is consistently observed across extensive RL runs without entropy intervention: the policy entropy drops sharply at the early training stage, and this diminished exploratory ability is always accompanied by the saturation of policy performance. In practice, we establish a transformation equation $R = -a e^{H} + b$ between entropy $H$ and downstream performance $R$. This empirical law strongly indicates that policy performance is gained by trading away policy entropy and is thus bottlenecked by its exhaustion, and that the ceiling is fully predictable: at $H = 0$, $R = -a + b$. Our finding necessitates entropy management for continuous exploration toward scaling compute for RL. To this end, we investigate entropy dynamics both theoretically and empirically. Our derivation highlights that the change in policy entropy is driven by the covariance between action probability and the change in logits, which is proportional to its advantage when using Policy Gradient-like algorithms. Empirical study shows that the values of the covariance term and the entropy differences match exactly, supporting the theoretical conclusion. Moreover, the covariance term stays mostly positive throughout training, further explaining why policy entropy decreases monotonically. By understanding the mechanism behind entropy dynamics, we are motivated to control entropy by restricting the update of high-covariance tokens. Specifically, we propose two simple yet effective techniques, namely Clip-Cov and KL-Cov, which respectively clip and apply a KL penalty to tokens with high covariances. Experiments show that these methods encourage exploration, thus helping the policy escape entropy collapse and achieve better downstream performance.
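The entropy-performance law R = -a*e^H + b stated above is linear in e^H, so its coefficients and the predicted ceiling at H = 0 can be recovered with an ordinary least-squares fit. The sketch below uses synthetic numbers generated from the law itself (a = 0.2, b = 0.8 are made up, not values from the paper), and the function name is illustrative.

```python
import numpy as np

def fit_entropy_law(H, R):
    """Fit R = -a * exp(H) + b by linear regression of R on exp(H)."""
    slope, intercept = np.polyfit(np.exp(H), R, deg=1)
    return -slope, intercept                 # (a, b)

# Synthetic (entropy, performance) pairs; purely illustrative values.
H = np.array([1.0, 0.6, 0.3, 0.1])
R = -0.2 * np.exp(H) + 0.8

a, b = fit_entropy_law(H, R)
ceiling = -a + b                             # predicted performance at H = 0
print(a, b, ceiling)
```

With real training logs, `H` and `R` would be the measured policy entropy and validation performance at matching checkpoints, and `ceiling` is the value the abstract describes as fully predictable once entropy is exhausted.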
Article 457
Title@2025-05-28 (3): Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Title: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning | Bridging Supervised Learning und Verstärkung Lernen in Mathe-Reasoning | 在数学原因方面的受监督学习和强化学习架桥 2505.18116v2 |
Authors: Huayu Chen, Kaiwen Zheng, Qinsheng Zhang, Ganqu Cui, Yin Cui, Haotian Ye, Tsung-Yi Lin, Ming-Yu Liu, Jun Zhu, Haoxiang Wang
Reinforcement Learning (RL) has played a central role in the recent surge of LLMs’ math abilities by enabling self-improvement through binary verifier signals. In contrast, Supervised Learning (SL) is rarely considered for such verification-driven training, largely due to its heavy reliance on reference answers and inability to reflect on mistakes. In this work, we challenge the prevailing notion that self-improvement is exclusive to RL and propose Negative-aware Fine-Tuning (NFT) – a supervised approach that enables LLMs to reflect on their failures and improve autonomously with no external teachers. In online training, instead of throwing away self-generated negative answers, NFT constructs an implicit negative policy to model them. This implicit policy is parameterized with the same positive LLM that we aim to optimize on positive data, enabling direct policy optimization on all of the LLM’s generations. We conduct experiments on 7B and 32B models in math reasoning tasks. Results consistently show that through the additional leverage of negative feedback, NFT significantly improves over SL baselines like Rejection sampling Fine-Tuning, matching or even surpassing leading RL algorithms like GRPO and DAPO. Furthermore, we demonstrate that NFT and GRPO are actually equivalent in strict-on-policy training, even though they originate from entirely different theoretical foundations. Our experiments and theoretical findings bridge the gap between SL and RL methods in binary-feedback learning systems.
Article 458
Title@2025-05-28 (3): Fully Heteroscedastic Count Regression with Deep Double Poisson Networks
Title: Fully Heteroscedastic Count Regression with Deep Double Poisson Networks | Voll heterogene Grafenregression mit tiefen Doppelpoisson-Netzwerken | 带有深双 Poisson 网络的全导流计数回归 2406.09262v4 |
Authors: Spencer Young, Porter Jenkins, Longchao Da, Jeff Dotson, Hua Wei
Neural networks capable of accurate, input-conditional uncertainty representation are essential for real-world AI systems. Deep ensembles of Gaussian networks have proven highly effective for continuous regression due to their ability to flexibly represent aleatoric uncertainty via unrestricted heteroscedastic variance, which in turn enables accurate epistemic uncertainty estimation. However, no analogous approach exists for count regression, despite many important applications. To address this gap, we propose the Deep Double Poisson Network (DDPN), a novel neural discrete count regression model that outputs the parameters of the Double Poisson distribution, enabling arbitrarily high or low predictive aleatoric uncertainty for count data and improving epistemic uncertainty estimation when ensembled. We formalize and prove that DDPN exhibits robust regression properties similar to heteroscedastic Gaussian models via learnable loss attenuation, and introduce a simple loss modification to control this behavior. Experiments on diverse datasets demonstrate that DDPN outperforms current baselines in accuracy, calibration, and out-of-distribution detection, establishing a new state-of-the-art in deep count regression.
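The abstract does not write out the Double Poisson density. As a hedged illustration, the sketch below uses Efron's approximate (unnormalized) form with mean parameter `mu` and inverse-dispersion parameter `phi`, for which the variance is roughly mu/phi. Whether DDPN uses exactly this parameterization is an assumption here, and the function name is illustrative.

```python
import math

def double_poisson_log_density(y: int, mu: float, phi: float) -> float:
    """Log of Efron's approximate Double Poisson density.

    f(y) ~ phi^(1/2) * exp(-phi*mu) * (exp(-y) * y^y / y!) * (e*mu/y)^(phi*y)
    phi > 1 gives under-dispersion, phi < 1 over-dispersion, and phi = 1
    recovers the ordinary Poisson distribution exactly.
    """
    ylogy = y * math.log(y) if y > 0 else 0.0    # convention: 0 * log(0) = 0
    return (
        0.5 * math.log(phi)
        - phi * mu
        - y
        + ylogy
        - math.lgamma(y + 1)
        + phi * (y * (1.0 + math.log(mu)) - ylogy)
    )

def poisson_log_pmf(y: int, mu: float) -> float:
    """Reference Poisson log-probability, used to check the phi = 1 case."""
    return -mu + y * math.log(mu) - math.lgamma(y + 1)

print(double_poisson_log_density(3, 2.5, 1.0), poisson_log_pmf(3, 2.5))
```

A network in the spirit described above would output `mu` and `phi` per input and minimize the negative of this log density, letting the predicted variance move independently of the predicted mean.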
Article 459
Title@2025-05-28 (3): Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency
Title: Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency | Abgeschirmte Diffusion: Erzeugen von neuen und vielfältigen Bildern mit Sparse Repellency | 盾牌扩散:利用微缩生成新奇和多样化图像 2410.06025v3 |
Authors: Michael Kirchhof, James Thornton, Louis Béthune, Pierre Ablin, Eugene Ndiaye, Marco Cuturi
The adoption of text-to-image diffusion models raises concerns over reliability, drawing scrutiny under the lens of various metrics like calibration, fairness, or compute efficiency. We focus in this work on two issues that arise when deploying these models: a lack of diversity when prompting images, and a tendency to recreate images from the training set. To solve both problems, we propose a method that coaxes the sampled trajectories of pretrained diffusion models to land on images that fall outside of a reference set. We achieve this by adding repellency terms to the diffusion SDE throughout the generation trajectory, which are triggered whenever the path is expected to land too closely to an image in the shielded reference set. Our method is sparse in the sense that these repellency terms are zero and inactive most of the time, and even more so towards the end of the generation trajectory. Our method, named SPELL for sparse repellency, can be used either with a static reference set that contains protected images, or dynamically, by updating the set at each timestep with the expected images concurrently generated within a batch, and with the images of previously generated batches. We show that adding SPELL to popular diffusion models improves their diversity while impacting their FID only marginally, and performs comparatively better than other recent training-free diversity methods. We also demonstrate how SPELL can ensure a shielded generation away from a very large set of protected images by considering all 1.2M images from ImageNet as the protected set.
Article 460
Title@2025-05-28 (3): Solving Inverse Problems with Deep Linear Neural Networks: Global Convergence Guarantees for Gradient Descent with Weight Decay
Title: Solving Inverse Problems with Deep Linear Neural Networks: Global Convergence Guarantees for Gradient Descent with Weight Decay | Inverse Probleme mit tiefen linearen neuralen Netzwerken lösen: Globale Konvergenzgarantien für gradienten Abstieg mit Gewichtsverfall | 解决深线神经神经网络的反面问题:全球一致保障渐变后裔与体重衰减 2502.15522v2 |
Authors: Hannah Laus, Suzanna Parkinson, Vasileios Charisopoulos, Felix Krahmer, Rebecca Willett
Machine learning methods are commonly used to solve inverse problems, wherein an unknown signal must be estimated from few measurements generated via a known acquisition procedure. In particular, neural networks perform well empirically but have limited theoretical guarantees. In this work, we study an underdetermined linear inverse problem that admits several possible solution mappings. A standard remedy (e.g., in compressed sensing) establishing uniqueness of the solution mapping is to assume knowledge of latent low-dimensional structure in the source signal. We ask the following question: do deep neural networks adapt to this low-dimensional structure when trained by gradient descent with weight decay regularization? We prove that mildly overparameterized deep linear networks trained in this manner converge to an approximate solution that accurately solves the inverse problem while implicitly encoding latent subspace structure. To our knowledge, this is the first result to rigorously show that deep linear networks trained with weight decay automatically adapt to latent subspace structure in the data under practical stepsize and weight initialization schemes. Our work highlights that regularization and overparameterization improve generalization, while overparameterization also accelerates convergence during training.
Article 461
Title@2025-05-28 (3): Chest Disease Detection In X-Ray Images Using Deep Learning Classification Method
Title: Chest Disease Detection In X-Ray Images Using Deep Learning Classification Method | Brusterkrankungen Detektion in Röntgenbildern mit Deep Learning-Klassifikationsmethode | 利用深学习分类方法在X射线图像中检测胸前疾病 2505.22609v1 |
Authors: Alanna Hazlett, Naomi Ohashi, Timothy Rodriguez, Sodiq Adewole
In this work, we compare the performance of multiple classification models on the task of classifying chest X-ray images into four categories: COVID-19, pneumonia, tuberculosis (TB), and normal. We leverage transfer learning with state-of-the-art pre-trained Convolutional Neural Network (CNN) models, fine-tuning these pre-trained architectures on labeled medical X-ray images. The initial results are promising, with high accuracy and strong performance on key classification metrics such as precision, recall, and F1 score. We apply Gradient-weighted Class Activation Mapping (Grad-CAM) for model interpretability to provide visual explanations for classification decisions, improving trust and transparency in clinical applications.
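The metrics named above can be computed per class in a few lines. The following is a minimal sketch, assuming string labels for the four categories; the helper name and the toy predictions in the usage note are made up for illustration and are not results from the paper.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} for a multi-class problem."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class
    metrics = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics
```

For example, `per_class_metrics(["covid", "covid", "tb", "normal"], ["covid", "tb", "tb", "normal"], ["covid", "pneumonia", "tb", "normal"])` scores each of the four categories separately; a class that never appears gets zeros rather than a division error.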
Article 462
Title@2025-05-28 (3): AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling
Title: AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling | AutoElicit: Mit großen Sprachmodellen für vorausschauende Modellierung von Expertenvoraussagen | 自动:在预测模拟中使用大语言模型,供专家使用 2411.17284v5 |
Authors: Alexander Capstick, Rahul G. Krishnan, Payam Barnaghi
Large language models (LLMs) acquire a breadth of information across various domains. However, their computational complexity, cost, and lack of transparency often hinder their direct application for predictive tasks where privacy and interpretability are paramount. In fields such as healthcare, biology, and finance, specialised and interpretable linear models still hold considerable value. In such domains, labelled data may be scarce or expensive to obtain. Well-specified prior distributions over model parameters can reduce the sample complexity of learning through Bayesian inference; however, eliciting expert priors can be time-consuming. We therefore introduce AutoElicit to extract knowledge from LLMs and construct priors for predictive models. We show these priors are informative and can be refined using natural language. We perform a careful study contrasting AutoElicit with in-context learning and demonstrate how to perform model selection between the two methods. We find that AutoElicit yields priors that can substantially reduce error over uninformative priors, using fewer labels, and consistently outperform in-context learning. We show that AutoElicit saves over 6 months of labelling effort when building a new predictive model for urinary tract infections from sensor recordings of people living with dementia.
Article 463
Title@2025-05-28 (3): One Rank at a Time: Cascading Error Dynamics in Sequential Learning
Title: One Rank at a Time: Cascading Error Dynamics in Sequential Learning | Ein Rang zu einer Zeit: Cascading Error Dynamics in Sequential Learning | 一次一排: 序列学习中连带错误动态 2505.22602v1 |
Authors: Mahtab Alizadeh Vandchali, Fangshuo Liao, Anastasios Kyrillidis
Sequential learning – where complex tasks are broken down into simpler, hierarchical components – has emerged as a paradigm in AI. This paper views sequential learning through the lens of low-rank linear regression, focusing specifically on how errors propagate when learning rank-1 subspaces sequentially. We present an analysis framework that decomposes the learning process into a series of rank-1 estimation problems, where each subsequent estimation depends on the accuracy of previous steps. Our contribution is a characterization of the error propagation in this sequential process, establishing bounds on how errors – e.g., due to limited computational budgets and finite precision – affect the overall model accuracy. We prove that these errors compound in predictable ways, with implications for both algorithmic design and stability guarantees.
Article 464
Title@2025-05-28 (3): Adjoint Sampling: Highly Scalable Diffusion Samplers via Adjoint Matching
Title: Adjoint Sampling: Highly Scalable Diffusion Samplers via Adjoint Matching | Adjoint Sampling: Hoch skalierbare Diffusions-Probenehmer über Adjoint Matching | 联合采样:通过联合配配制的高可缩放扩散采样器 2504.11713v3 |
Authors: Aaron Havens, Benjamin Kurt Miller, Bing Yan, Carles Domingo-Enrich, Anuroop Sriram, Brandon Wood, Daniel Levine, Bin Hu, Brandon Amos, Brian Karrer, Xiang Fu, Guan-Horng Liu, Ricky T. Q. Chen
We introduce Adjoint Sampling, a highly scalable and efficient algorithm for learning diffusion processes that sample from unnormalized densities, or energy functions. It is the first on-policy approach that allows significantly more gradient updates than the number of energy evaluations and model samples, allowing us to scale to much larger problem settings than previously explored by similar methods. Our framework is theoretically grounded in stochastic optimal control and shares the same theoretical guarantees as Adjoint Matching, being able to train without the need for corrective measures that push samples towards the target distribution. We show how to incorporate key symmetries, as well as periodic boundary conditions, for modeling molecules in both cartesian and torsional coordinates. We demonstrate the effectiveness of our approach through extensive experiments on classical energy functions, and further scale up to neural network-based energy models where we perform amortized conformer generation across many molecular systems. To encourage further research in developing highly scalable sampling methods, we plan to open source these challenging benchmarks, where successful methods can directly impact progress in computational chemistry.
Article 465
Title@2025-05-28 (3): Machine Unlearning under Overparameterization
Title: Machine Unlearning under Overparameterization | Maschine Unlearning unter Überparameterisierung | 超参数化下脱学机 2505.22601v1 |
Authors: Jacob L. Block, Aryan Mokhtari, Sanjay Shakkottai
Machine unlearning algorithms aim to remove the influence of specific training samples, ideally recovering the model that would have resulted from training on the remaining data alone. We study unlearning in the overparameterized setting, where many models interpolate the data, and defining the unlearning solution as any loss minimizer over the retained set – as in prior work in the underparameterized setting – is inadequate, since the original model may already interpolate the retained data and satisfy this condition. In this regime, loss gradients vanish, rendering prior methods based on gradient perturbations ineffective, motivating both new unlearning definitions and algorithms. For this setting, we define the unlearning solution as the minimum-complexity interpolator over the retained data and propose a new algorithmic framework that only requires access to model gradients on the retained set at the original solution. We minimize a regularized objective over perturbations constrained to be orthogonal to these model gradients, a first-order relaxation of the interpolation condition. For different model classes, we provide exact and approximate unlearning guarantees, and we demonstrate that an implementation of our framework outperforms existing baselines across various unlearning experiments.
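The orthogonality constraint is easiest to see in the linear case. The sketch below is an illustrative special case, not the authors' implementation: it assumes a linear model f(x) = w·x, for which the model gradient at a retained input x_i is x_i itself, and it takes the squared L2 norm as the complexity measure (an assumption). Any perturbation orthogonal to all retained inputs leaves their predictions unchanged, and minimizing the norm over such perturbations amounts to projecting w onto the span of the retained inputs.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(rows, tol=1e-12):
    """Modified Gram-Schmidt over the given row vectors."""
    basis = []
    for row in rows:
        v = list(row)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > tol:  # skip rows that are linearly dependent
            basis.append([vi / norm for vi in v])
    return basis


def unlearn_linear(w, retained_x):
    """Project w onto span(retained_x).

    The removed component is orthogonal to every retained input, so
    predictions on retained data are unchanged while the weight norm
    is as small as possible.
    """
    w_new = [0.0] * len(w)
    for b in orthonormal_basis(retained_x):
        c = dot(w, b)
        w_new = [wi + c * bi for wi, bi in zip(w_new, b)]
    return w_new
```

In this linear setting the result is the minimum-norm interpolator of the retained data; for nonlinear models the same constraint only preserves interpolation to first order, as the abstract notes.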
Article 466
Title@2025-05-28 (3): HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym
Title: HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym | HDDLGym: Ein Tool zum Studieren multi-agenter Hierarchischer Probleme, definiert in HDDL mit OpenAI Gym | HDDLGym: 与 OpenAI Gym 一起研究在HDDL 中界定的多代理等级问题的工具 2505.22597v1 |
Authors: Ngoc La, Ruaridh Mon-Williams, Julie A. Shah
In recent years, reinforcement learning (RL) methods have been widely tested using tools like OpenAI Gym, though many tasks in these environments could also benefit from hierarchical planning. However, there is a lack of a tool that enables seamless integration of hierarchical planning with RL. Hierarchical Domain Definition Language (HDDL), used in classical planning, introduces a structured approach well-suited for model-based RL to address this gap. To bridge this integration, we introduce HDDLGym, a Python-based tool that automatically generates OpenAI Gym environments from HDDL domains and problems. HDDLGym serves as a link between RL and hierarchical planning, supporting multi-agent scenarios and enabling collaborative planning among agents. This paper provides an overview of HDDLGym’s design and implementation, highlighting the challenges and design choices involved in integrating HDDL with the Gym interface, and applying RL policies to support hierarchical planning. We also provide detailed instructions and demonstrations for using the HDDLGym framework, including how to work with existing HDDL domains and problems from International Planning Competitions, exemplified by the Transport domain. Additionally, we offer guidance on creating new HDDL domains for multi-agent scenarios and demonstrate the practical use of HDDLGym in the Overcooked domain. By leveraging the advantages of HDDL and Gym, HDDLGym aims to be a valuable tool for studying RL in hierarchical planning, particularly in multi-agent contexts.
Article 467
Title@2025-05-28 (3): SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement
Title: SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement | SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement | Synworld: 用于改进制剂行动知识的虚拟情景合成 2504.03561v2 |
Authors: Runnan Fang, Xiaobin Wang, Yuan Liang, Shuofei Qiao, Jialong Wu, Zekun Xi, Ningyu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
In the interaction between agents and their environments, agents expand their capabilities by planning and executing actions. However, LLM-based agents face substantial challenges when deployed in novel environments or required to navigate unconventional action spaces. To empower agents to autonomously explore environments, optimize workflows, and enhance their understanding of actions, we propose SynWorld, a framework that allows agents to synthesize possible scenarios with multi-step action invocation within the action space and perform Monte Carlo Tree Search (MCTS) exploration to effectively refine their action knowledge in the current environment. Our experiments demonstrate that SynWorld is an effective and general approach to learning action knowledge in new environments. Code is available at https://github.com/zjunlp/SynWorld.
Article 468
Title@2025-05-28 (3): Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning
Title: Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning | Self-Error-Instruct: Verallgemeinern von Fehlern für LLMs Mathematische Begründung | 自错误教学法: 数学理由LLMs 的错误一般化 2505.22591v1 |
Authors: Erxin Yu, Jing Li, Ming Liao, Qi Zhu, Boyang Xue, Minghui Xu, Baojun Wang, Lanqing Hong, Fei Mi, Lifeng Shang
Although large language models demonstrate strong performance across various domains, they still struggle with numerous bad cases in mathematical reasoning. Previous approaches to learning from errors synthesize training data by solely extrapolating from isolated bad cases, thereby failing to generalize the extensive patterns inherent within these cases. This paper presents Self-Error-Instruct (SEI), a framework that addresses these model weaknesses and synthesizes more generalized targeted training data. Specifically, we explore a target model on two mathematical datasets, GSM8K and MATH, to pinpoint bad cases. Then, we generate error keyphrases for these cases based on the instructor model’s (GPT-4o) analysis and identify error types by clustering these keyphrases. Next, we sample a few bad cases during each generation for each identified error type and input them into the instructor model, which synthesizes additional training data using a self-instruct approach. This new data is refined through a one-shot learning process to ensure that only the most effective examples are kept. Finally, we use these curated data to fine-tune the target model, iteratively repeating the process to enhance performance. We apply our framework to various models and observe improvements in their reasoning abilities across both in-domain and out-of-domain mathematics datasets. These results demonstrate the effectiveness of self-error instruction in improving LLMs’ mathematical reasoning through error generalization.
Article 469
Title@2025-05-28 (3): VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Title: VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use | VTool-R1: VLMs lernen mit Bildern zu denken, indem sie mehr über multimodale Werkzeugnutzung lernen | VTool-R1:VLMs通过多模式工具使用强化学习学习如何用图像思考 2505.19255v2 |
Authors: Mingyuan Wu, Jingcheng Yang, Jize Jiang, Meitang Li, Kaizhuo Yan, Hanchao Yu, Minjia Zhang, Chengxiang Zhai, Klara Nahrstedt
Reinforcement Learning Finetuning (RFT) has significantly advanced the reasoning capabilities of large language models (LLMs) by enabling long chains of thought, self-correction, and effective tool use. While recent works attempt to extend RFT to vision-language models (VLMs), these efforts largely produce text-only reasoning conditioned on static image inputs, falling short of true multimodal reasoning in the response. In contrast, test-time methods like Visual Sketchpad incorporate visual steps but lack training mechanisms. We introduce VTool-R1, the first framework that trains VLMs to generate multimodal chains of thought by interleaving text and intermediate visual reasoning steps. VTool-R1 integrates Python-based visual editing tools into the RFT process, enabling VLMs to learn when and how to generate visual reasoning steps that benefit final reasoning. Trained with outcome-based rewards tied to task accuracy, our approach elicits strategic visual tool use for reasoning without relying on process-based supervision. Experiments on structured visual question answering over charts and tables show that VTool-R1 enhances reasoning performance by teaching VLMs to “think with images” and generate multimodal chains of thought with tools.
Article 470
Title@2025-05-28 (3): ReLearn: Unlearning via Learning for Large Language Models
Title: ReLearn: Unlearning via Learning for Large Language Models | ReLearn: Entlernen über Learning for Large Language Models | Reearn:通过学习大语言模式来重新学习 2502.11190v3 |
Authors: Haoming Xu, Ningyuan Zhao, Liming Yang, Sendong Zhao, Shumin Deng, Mengru Wang, Bryan Hooi, Nay Oo, Huajun Chen, Ningyu Zhang
Current unlearning methods for large language models usually rely on reverse optimization to reduce target token probabilities. However, this paradigm disrupts the prediction of subsequent tokens, degrading model performance and linguistic coherence. Moreover, existing evaluation metrics overemphasize contextual forgetting while inadequately assessing response fluency and relevance. To address these challenges, we propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning, along with a comprehensive evaluation framework. This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation, and Linguistic Score (LS) to evaluate generation quality. Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output. Through mechanistic analysis, we further demonstrate how reverse optimization disrupts coherent text generation, while ReLearn preserves this essential capability. Code is available at https://github.com/zjunlp/unlearn.
Article 471
Title@2025-05-28 (3): Benignity of loss landscape with weight decay requires both large overparametrization and initialization
Title: Benignity of loss landscape with weight decay requires both large overparametrization and initialization | Die Benignität der Verlustlandschaft mit dem Verfall des Gewichts erfordert sowohl große Überparametrierung als auch Initialisierung | 损失景观与体重衰减的尊严要求大规模过度平衡和初始化 2505.22578v1 |
Authors: Etienne Boursier, Matthew Bowditch, Matthias Englert, Ranko Lazic
The optimization of neural networks under weight decay remains poorly understood from a theoretical standpoint. While weight decay is standard practice in modern training procedures, most theoretical analyses focus on unregularized settings. In this work, we investigate the loss landscape of the $\ell_2$-regularized training loss for two-layer ReLU networks. We show that the landscape becomes benign – i.e., free of spurious local minima – under large overparametrization, specifically when the network width $m$ satisfies $m \gtrsim \min(n^d, 2^n)$, where $n$ is the number of data points and $d$ the input dimension. More precisely in this regime, almost all constant activation regions contain a global minimum and no spurious local minima. We further show that this level of overparametrization is not only sufficient but also necessary via the example of orthogonal data. Finally, we demonstrate that such loss landscape results primarily hold relevance in the large initialization regime. In contrast, for small initializations – corresponding to the feature learning regime – optimization can still converge to spurious local minima, despite the global benignity of the landscape.
Article 472
Title@2025-05-28 (3): FNOPE: Simulation-based inference on function spaces with Fourier Neural Operators
Title: FNOPE: Simulation-based inference on function spaces with Fourier Neural Operators | FNOPE: Simulationsbasierte Inferenz auf Funktionsräumen mit Fourier-Neural-Betreibern | FNOPE: Fourier神经操作员对功能空间的模拟推推 2505.22573v1 |
Authors: Guy Moss, Leah Sophie Muhle, Reinhard Drews, Jakob H. Macke, Cornelius Schröder
Simulation-based inference (SBI) is an established approach for performing Bayesian inference on scientific simulators. SBI so far works best on low-dimensional parametric models. However, it is difficult to infer function-valued parameters, which frequently occur in disciplines that model spatiotemporal processes such as the climate and earth sciences. Here, we introduce an approach for efficient posterior estimation, using a Fourier Neural Operator (FNO) architecture with a flow matching objective. We show that our approach, FNOPE, can perform inference of function-valued parameters at a fraction of the simulation budget of state-of-the-art methods. In addition, FNOPE supports posterior evaluation at arbitrary discretizations of the domain, as well as simultaneous estimation of vector-valued parameters. We demonstrate the effectiveness of our approach on several benchmark tasks and a challenging spatial inference task from glaciology. FNOPE extends the applicability of SBI methods to new scientific domains by enabling the inference of function-valued parameters.
Article 473
Title@2025-05-28 (3): PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion
Title: PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion | PRISM: Videodatensatz-Kondensation mit progressiver Veredelung und Einfügung für Sparse Motion | PRISM: 视频数据集浓缩,并逐步精化和插入,用于微缩移动 2505.22564v1 |
Authors: Jaehyun Choi, Jiwan Hur, Gyojin Han, Jaemyung Yu, Junmo Kim
Video dataset condensation has emerged as a critical technique for addressing the computational challenges associated with large-scale video data processing in deep learning applications. While significant progress has been made in image dataset condensation, the video domain presents unique challenges due to the complex interplay between spatial content and temporal dynamics. This paper introduces PRISM, Progressive Refinement and Insertion for Sparse Motion, for video dataset condensation, a novel approach that fundamentally reconsiders how video data should be condensed. Unlike the previous method that separates static content from dynamic motion, our method preserves the essential interdependence between these elements. Our approach progressively refines and inserts frames to fully accommodate the motion in an action while achieving better performance but less storage, considering the relation of gradients for each frame. Extensive experiments across standard video action recognition benchmarks demonstrate that PRISM outperforms existing disentangled approaches while maintaining compact representations suitable for resource-constrained environments.
Article 474
Title@2025-05-28 (3): Geometric Hyena Networks for Large-scale Equivariant Learning
Title: Geometric Hyena Networks for Large-scale Equivariant Learning | Geometrische Hyänennetze für großmaßstäbliches Äquivalent-Lernen | 大规模平等学习的几何Hyena网络 2505.22560v1 |
Authors: Artem Moskalev, Mangal Prakash, Junjie Xu, Tianyu Cui, Rui Liao, Tommaso Mansi
Processing global geometric context while preserving equivariance is crucial when modeling biological, chemical, and physical systems. Yet, this is challenging due to the computational demands of equivariance and global context at scale. Standard methods such as equivariant self-attention suffer from quadratic complexity, while local methods such as distance-based message passing sacrifice global information. Inspired by the recent success of state-space and long-convolutional models, we introduce Geometric Hyena, the first equivariant long-convolutional model for geometric systems. Geometric Hyena captures global geometric context at sub-quadratic complexity while maintaining equivariance to rotations and translations. Evaluated on all-atom property prediction of large RNA molecules and full protein molecular dynamics, Geometric Hyena outperforms existing equivariant models while requiring significantly less memory and compute than equivariant self-attention. Notably, our model processes the geometric context of 30k tokens 20x faster than the equivariant transformer and allows 72x longer context within the same budget.
Article 475
Title@2025-05-28 (3): Preference Adaptive and Sequential Text-to-Image Generation
Title: Preference Adaptive and Sequential Text-to-Image Generation | Präferenz Adaptive und sequentielle Text-zu-Bild-Generierung | 适应性和顺序性文字到图像生成 2412.10419v2 |
Authors: Ofir Nabati, Guy Tennenholtz, ChihWei Hsu, Moonkyung Ryu, Deepak Ramachandran, Yinlam Chow, Xiang Li, Craig Boutilier
We address the problem of interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent which iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences, which we leverage, together with large-scale open-source (non-sequential) datasets. We construct user-preference and user-choice models using an EM strategy and identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest an adaptive and diverse slate of prompt expansions to the user. Our Preference Adaptive and Sequential Text-to-image Agent (PASTA) extends T2I models with adaptive multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user’s intent. We evaluate PASTA using human raters, showing significant improvement compared to baseline methods. We also open-source our sequential rater dataset and simulated user-rater interactions to support future research in user-centric multi-turn T2I systems.
Article 476
Title@2025-05-28 (3): Can Copulas Be Used for Feature Selection? A Machine Learning Study on Diabetes Risk Prediction
Title: Can Copulas Be Used for Feature Selection? A Machine Learning Study on Diabetes Risk Prediction | Kann Copulas für die Feature-Auswahl verwendet werden? Eine maschinelle Studie über Diabetes Risikovorhersage | Copulas 能够用来选择特质吗? 糖尿病风险预测的机器学习研究。 2505.22554v1 |
Authors: Agnideep Aich, Md Monzur Murshed, Amanda Mayeaux, Sameera Hewage
Accurate diabetes risk prediction relies on identifying key features from complex health datasets, but conventional methods like mutual information (MI) filters and genetic algorithms (GAs) often overlook extreme dependencies critical for high-risk subpopulations. In this study we introduce a feature-selection framework using the upper-tail dependence coefficient ($\lambda_U$) of the novel A2 copula, which quantifies how often extreme higher values of a predictor co-occur with diabetes diagnoses (target variable). Applied to the CDC Diabetes Health Indicators dataset (n=253,680), our method prioritizes five predictors (self-reported general health, high blood pressure, body mass index, mobility limitations, and high cholesterol levels) based on upper tail dependencies. These features match or outperform MI and GA selected subsets across four classifiers (Random Forest, XGBoost, Logistic Regression, Gradient Boosting), achieving accuracy up to 86.5% (XGBoost) and AUC up to 0.806 (Gradient Boosting), rivaling the full 21-feature model. Permutation importance confirms clinical relevance, with BMI and general health driving accuracy. To our knowledge, this is the first work to apply a copula’s upper-tail dependence for supervised feature selection, bridging extreme-value theory and machine learning to deliver a practical toolkit for diabetes prevention.
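To make the upper-tail dependence idea concrete, the sketch below computes a simple nonparametric estimate of it at a finite threshold `q`, i.e. the fraction of samples where both rank-transformed variables exceed `q`, divided by `1 - q`. This is a swapped-in empirical estimator for illustration only; it is not the closed-form coefficient of the A2 copula used in the paper, and it assumes continuous inputs without ties (a binary target would need different handling). The function names are hypothetical.

```python
def pseudo_observations(values):
    """Rank-transform values to (0, 1) via rank / (n + 1); assumes no ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return [r / (n + 1) for r in ranks]


def empirical_upper_tail_dependence(x, y, q=0.9):
    """Finite-threshold estimate of P(U > q, V > q) / (1 - q)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and equally long")
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    u = pseudo_observations(x)
    v = pseudo_observations(y)
    joint = sum(1 for a, b in zip(u, v) if a > q and b > q) / len(x)
    return joint / (1.0 - q)
```

A feature-selection loop in this spirit would score each predictor against the target and keep the highest-scoring ones; values near 1 indicate that extremes tend to occur together, values near 0 that they do not.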
Article 477
Title@2025-05-28 (3): Data-Distill-Net: A Data Distillation Approach Tailored for Reply-based Continual Learning
Title: Data-Distill-Net: A Data Distillation Approach Tailored for Reply-based Continual Learning | Data-Distill-Net: Ein Datendestillationsansatz, der auf Reply-based Continual Learning zugeschnitten ist | Data-still-Net:为基于答复的不断学习量身定制的数据蒸馏方法 2505.20135v2 |
Authors: Wenyang Liao, Quanziang Wang, Yichen Wu, Renzhen Wang, Deyu Meng
Replay-based continual learning (CL) methods assume that models trained on a small subset can also effectively minimize the empirical risk of the complete dataset. These methods maintain a memory buffer that stores a sampled subset of data from previous tasks to consolidate past knowledge. However, this assumption is not guaranteed in practice due to the limited capacity of the memory buffer and the heuristic criteria used for buffer data selection. To address this issue, we propose a new dataset distillation framework tailored for CL, which maintains a learnable memory buffer to distill the global information from the current task data and accumulated knowledge preserved in the previous memory buffer. Moreover, to avoid the computational overhead and overfitting risks associated with parameterizing the entire buffer during distillation, we introduce a lightweight distillation module that can achieve global information distillation solely by generating learnable soft labels for the memory buffer data. Extensive experiments show that our method can achieve competitive results and effectively mitigate forgetting across various datasets. The source code will be publicly available.
Article 478
Title@2025-05-28 (3): DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models
Title: DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models | DES-LOC: Entsynced Low Communication Adaptive Optimizers for Training Foundation Models | DES-LOC:为培训基金会模型提供发光的低通信适应性适应性优化剂 2505.22549v1 |
Authors: Alex Iacob, Lorenzo Sani, Mher Safaryan, Paris Giampouras, Samuel Horváth, Andrej Jovanovic, Meghdad Kurmanji, Preslav Aleksandrov, William F. Shen, Xinchi Qiu, Nicholas D. Lane
Scaling foundation model training with Distributed Data Parallel (DDP) methods is bandwidth-limited. Existing infrequent communication methods like Local SGD were designed to synchronize only model parameters and cannot be trivially applied to adaptive optimizers due to additional optimizer states. Current approaches extending Local SGD either lack convergence guarantees or require synchronizing all optimizer states, tripling communication costs. We propose Desynced Low Communication Adaptive Optimizers (DES-LOC), a family of optimizers assigning independent synchronization periods to parameters and momenta, enabling lower communication costs while preserving convergence. Through extensive experiments on language models of up to 1.7B, we show that DES-LOC can communicate 170x less than DDP and 2x less than the previous state-of-the-art Local ADAM. Furthermore, unlike previous heuristic approaches, DES-LOC is suited for practical training scenarios prone to system failures. DES-LOC offers a scalable, bandwidth-efficient, and fault-tolerant solution for foundation model training.
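The idea of giving each optimizer state its own synchronization period can be sketched as a simple schedule. The code below is a toy accounting model, not the DES-LOC implementation: the state names, the period values, and the baseline of one model-sized message per step (a simplified stand-in for per-step gradient synchronization) are all assumptions made for illustration.

```python
def states_to_sync(step, periods):
    """Names of the states whose sync period divides the (1-indexed) step."""
    return [name for name, period in periods.items() if step % period == 0]


def relative_communication(periods, total_steps):
    """Model-sized messages sent, relative to one message every step."""
    sent = sum(
        len(states_to_sync(step, periods))
        for step in range(1, total_steps + 1)
    )
    return sent / total_steps


# Hypothetical periods: parameters sync more often than the momenta.
example_periods = {"params": 16, "first_moment": 32, "second_moment": 64}
```

Under these made-up periods, 64 steps send 4 + 2 + 1 = 7 model-sized messages, whereas syncing all three states every 16 steps would send 12; the per-state periods are what let the less sensitive states communicate less often.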
nan
Article 479
Title@2025-05-28 (3): A Human-Centric Approach to Explainable AI for Personalized Education
Title: A Human-Centric Approach to Explainable AI for Personalized Education | Ein menschlich-zentraler Ansatz zur erklärbaren KI für die personalisierte Bildung | 以人文文化方式解释个人个性化教育的可解释的AI 2505.22541v1 |
Authors: Vinitra Swamy
Deep neural networks form the backbone of artificial intelligence research, with potential to transform the human experience in areas ranging from autonomous driving to personal assistants, healthcare to education. However, their integration into the daily routines of real-world classrooms remains limited. It is not yet common for a teacher to assign students individualized homework targeting their specific weaknesses, provide students with instant feedback, or simulate student responses to a new exam question. While these models excel in predictive performance, this lack of adoption can be attributed to a significant weakness: the lack of explainability of model decisions, leading to a lack of trust from students, parents, and teachers. This thesis aims to bring human needs to the forefront of eXplainable AI (XAI) research, grounded in the concrete use case of personalized learning and teaching. We frame the contributions along two verticals: technical advances in XAI and their aligned human studies. We investigate explainability in AI for education, revealing systematic disagreements between post-hoc explainers and identifying a need for inherently interpretable model architectures. We propose four novel technical contributions in interpretability with a multimodal modular architecture (MultiModN), an interpretable mixture-of-experts model (InterpretCC), adversarial training for explainer stability, and a theory-driven LLM-XAI framework to present explanations to students (iLLuMinaTE), which we evaluate in diverse settings with professors, teachers, learning scientists, and university students. By combining empirical evaluations of existing explainers with novel architectural designs and human studies, our work lays a foundation for human-centric AI systems that balance state-of-the-art performance with built-in transparency and trust.
nan
Article 480
Title@2025-05-28 (3): Uncertainty Quantification with Proper Scoring Rules: Adjusting Measures to Prediction Tasks
Title: Uncertainty Quantification with Proper Scoring Rules: Adjusting Measures to Prediction Tasks | Ungewissheitsquantifizierung mit korrekten Bewertungsregeln: Anpassung von Maßnahmen an Vorhersageaufgaben | 以适当排序规则对不确定性进行量化:预测任务调整措施 2505.22538v1 |
Authors: Paul Hofman, Yusuf Sale, Eyke Hüllermeier
We address the problem of uncertainty quantification and propose measures of total, aleatoric, and epistemic uncertainty based on a known decomposition of (strictly) proper scoring rules, a specific type of loss function, into a divergence and an entropy component. This leads to a flexible framework for uncertainty quantification that can be instantiated with different losses (scoring rules), which makes it possible to tailor uncertainty quantification to the use case at hand. We show that this flexibility is indeed advantageous. In particular, we analyze the task of selective prediction and show that the scoring rule should ideally match the task loss. In addition, we perform experiments on two other common tasks. For out-of-distribution detection, our results confirm that a widely used measure of epistemic uncertainty, mutual information, performs best. Moreover, in the setting of active learning, our measure of epistemic uncertainty based on the zero-one loss consistently outperforms other uncertainty measures.
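For the log loss, the entropy/divergence decomposition reduces to familiar quantities: total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the expected entropy, and their difference is the mutual information. The following is a minimal sketch of that one instantiation only, assuming an equally weighted ensemble of predictive distributions; the function names are illustrative and not taken from the paper.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def decompose_uncertainty(member_probs):
    """Split total uncertainty into aleatoric and epistemic parts (log loss).

    member_probs: list of probability vectors, one per ensemble member.
    Equal member weights are an assumption of this sketch.
    """
    n = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(p[i] for p in member_probs) / n for i in range(k)]
    total = entropy(mean)                                   # entropy of the averaged prediction
    aleatoric = sum(entropy(p) for p in member_probs) / n   # expected entropy of the members
    epistemic = total - aleatoric                           # mutual information
    return total, aleatoric, epistemic

# Two members that disagree: part of the total uncertainty is epistemic.
total, aleatoric, epistemic = decompose_uncertainty([[0.9, 0.1], [0.1, 0.9]])
# Two members that agree: the epistemic part vanishes.
_, _, epistemic_agree = decompose_uncertainty([[0.7, 0.3], [0.7, 0.3]])
```

Swapping `entropy` for the generalized entropy of another scoring rule would give a different instantiation, which is the flexibility the abstract describes.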
nan
Article 481
Title@2025-05-28 (3): TabularQGAN: A Quantum Generative Model for Tabular Data
Title: TabularQGAN: A Quantum Generative Model for Tabular Data | TabularQGAN: Ein Quantum Generatives Modell für Tabulardaten | 表格QGAN:表格数据量子生成模型 2505.22533v1 |
Authors: Pallavi Bhardwaj, Caitlin Jones, Lasse Dierich, Aleksandar Vučković
In this paper, we introduce a novel quantum generative model for synthesizing tabular data. Synthetic data is valuable in scenarios where real-world data is scarce or private, as it can be used to augment or replace existing datasets. Real-world enterprise data is predominantly tabular and heterogeneous, often comprising a mixture of categorical and numerical features, making it highly relevant across various industries such as healthcare, finance, and software. We propose a quantum generative adversarial network architecture with flexible data encoding and a novel quantum circuit ansatz to effectively model tabular data. The proposed approach is tested on the MIMIC III healthcare and Adult Census datasets, with extensive benchmarking against the leading classical models CTGAN and CopulaGAN. Experimental results demonstrate that our quantum model outperforms classical models by an average of 8.5% with respect to an overall similarity score from SDMetrics, while using only 0.072% of the parameters of the classical models. Additionally, we evaluate the generalization capabilities of the models using two custom-designed metrics that demonstrate the ability of the proposed quantum model to generate useful and novel samples. To our knowledge, this is one of the first demonstrations of a successful quantum generative model for handling tabular data, indicating that this task could be well-suited to quantum computers.
nan
Article 482
Title@2025-05-28 (3): Prediction of the Most Fire-Sensitive Point in Building Structures with Differentiable Agents for Thermal Simulators
Title: Prediction of the Most Fire-Sensitive Point in Building Structures with Differentiable Agents for Thermal Simulators | Vorhersage des feuerempfindlichsten Punkts in Gebäudestrukturen mit differenzierbaren Agenten für thermische Simulatoren | 预测热模拟器使用不同物剂建造结构时最能防火的火敏度点 2502.03424v4 |
Authors: Yuan Xinjie, Khalid M. Mosalam
Fire safety is crucial for ensuring the stability of building structures, yet evaluating whether a structure meets fire safety requirements is challenging. Fires can originate at any point within a structure, and simulating every potential fire scenario is both expensive and time-consuming. To address this challenge, we propose the concept of the Most Fire-Sensitive Point (MFSP) and an efficient machine learning framework for its identification. The MFSP is defined as the location at which a fire, if initiated, would cause the most severe detrimental impact on the building’s stability, effectively representing the worst-case fire scenario. In our framework, a Graph Neural Network (GNN) serves as an efficient and differentiable agent for conventional Finite Element Analysis (FEA) simulators by predicting the Maximum Interstory Drift Ratio (MIDR) under fire, which then guides the training and evaluation of the MFSP predictor. Additionally, we enhance our framework with a novel edge update mechanism and a transfer learning-based training scheme. Evaluations on a large-scale simulation dataset demonstrate the good performance of the proposed framework in identifying the MFSP, offering a transformative tool for optimizing fire safety assessments in structural design. All developed datasets and code are open-sourced online.
nan
Article 483
Title@2025-05-28 (3): Training RL Agents for Multi-Objective Network Defense Tasks
Title: Training RL Agents for Multi-Objective Network Defense Tasks | Schulung von RL-Agenten für multi-objektive Netzwerkverteidigungsaufgaben | 多目标网络防御任务培训RL代理 2505.22531v1 |
Authors: Andres Molina-Markham, Luis Robaina, Sean Steinle, Akash Trivedi, Derek Tsui, Nicholas Potteiger, Lauren Brandt, Ransom Winder, Ahmed Ridley
Open-ended learning (OEL) – which emphasizes training agents that achieve broad capability over narrow competency – is emerging as a paradigm to develop artificial intelligence (AI) agents to achieve robustness and generalization. However, despite promising results that demonstrate the benefits of OEL, applying OEL to develop autonomous agents for real-world cybersecurity applications remains a challenge. We propose a training approach, inspired by OEL, to develop autonomous network defenders. Our results demonstrate that, as in other domains, OEL principles can translate into more robust and generalizable agents for cyber defense. To apply OEL to network defense, it is necessary to address several technical challenges. Most importantly, it is critical to provide a task representation approach over a broad universe of tasks that maintains a consistent interface over goals, rewards and action spaces. This way, the learning agent can train with varying network conditions, attacker behaviors, and defender goals while being able to build on previously gained knowledge. With our tools and results, we aim to fundamentally impact research that applies AI to solve cybersecurity problems. Specifically, as researchers develop gyms and benchmarks for cyber defense, it is paramount that they consider diverse tasks with consistent representations, such as those we propose in our work.
nan
Article 484
Title@2025-05-28 (3): Symplectic Generative Networks (SGNs): A Hamiltonian Framework for Invertible Deep Generative Modeling
Title: Symplectic Generative Networks (SGNs): A Hamiltonian Framework for Invertible Deep Generative Modeling | Symplektische Generative Netzwerke (SGNs): Ein Hamiltonsches Framework für invertible Deep Generative Modeling | 症状产生网络:一个汉密尔顿框架,用于可垂直产生深层产生模型的建立 2505.22527v1 |
Authors: Agnideep Aich, Ashit Aich, Bruce Wade
We introduce the Symplectic Generative Network (SGN), a deep generative model that leverages Hamiltonian mechanics to construct an invertible, volume-preserving mapping between a latent space and the data space. By endowing the latent space with a symplectic structure and modeling data generation as the time evolution of a Hamiltonian system, SGN achieves exact likelihood evaluation without incurring the computational overhead of Jacobian determinant calculations. In this work, we provide a rigorous mathematical foundation for SGNs through a comprehensive theoretical framework that includes: (i) complete proofs of invertibility and volume preservation, (ii) a formal complexity analysis with theoretical comparisons to Variational Autoencoders and Normalizing Flows, (iii) strengthened universal approximation results with quantitative error bounds, (iv) an information-theoretic analysis based on the geometry of statistical manifolds, and (v) an extensive stability analysis with adaptive integration guarantees. These contributions highlight the fundamental advantages of SGNs and establish a solid foundation for future empirical investigations and applications to complex, high-dimensional data.
nan
Article 485
Title@2025-05-28 (3): Test-Time Alignment of Discrete Diffusion Models with Sequential Monte Carlo
Title: Test-Time Alignment of Discrete Diffusion Models with Sequential Monte Carlo | Test-Time Alignment von diskreten Diffusionsmodellen mit Sequential Monte Carlo | 使用顺序式蒙特卡洛的分解传播模型的测试时间对齐 2505.22524v1 |
Authors: Chinmay Pani, Zijing Ou, Yingzhen Li
Discrete diffusion models have become highly effective across various domains. However, real-world applications often require the generative process to adhere to certain constraints without task-specific fine-tuning. To this end, we propose a training-free method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution at test time. Our approach leverages twisted SMC with an approximate locally optimal proposal, obtained via a first-order Taylor expansion of the reward function. To address the challenge of ill-defined gradients in discrete spaces, we incorporate a Gumbel-Softmax relaxation, enabling efficient gradient-based approximation within the discrete generative framework. Empirical results on both synthetic datasets and image modelling validate the effectiveness of our approach.
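The Gumbel-Softmax relaxation mentioned above replaces a hard categorical sample with a point on the probability simplex, so that a reward can be differentiated with respect to it. Below is a minimal standard-library sketch of the sampling step alone; it does not reproduce the paper's twisted SMC proposal, and the function name, logits, and temperature are illustrative assumptions.

```python
import math
import random

def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed one-hot sample from the categorical defined by logits."""
    # Perturb each logit with Gumbel(0, 1) noise obtained by inverse transform;
    # the small floor guards against log(0) when the uniform draw is exactly 0.
    noisy = [
        (l - math.log(-math.log(max(rng.random(), 1e-12)))) / temperature
        for l in logits
    ]
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    z = sum(exps)
    return [e / z for e in exps]

sample = gumbel_softmax([1.0, 0.5, -2.0], temperature=0.5, rng=random.Random(0))
```

Lower temperatures push the sample toward a one-hot vector, while higher temperatures give smoother but more biased relaxations.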
nan
Article 486
Title@2025-05-28 (3): Evaluating Supervised Learning Models for Fraud Detection: A Comparative Study of Classical and Deep Architectures on Imbalanced Transaction Data
Title: Evaluating Supervised Learning Models for Fraud Detection: A Comparative Study of Classical and Deep Architectures on Imbalanced Transaction Data | Bewertung von überwachten Lernmodellen für Betrugserkennung: Eine vergleichende Studie klassischer und tiefer Architekturen zu unausgewogenen Transaktionsdaten | 评价受监督的欺诈侦查学习模式:关于不平衡交易数据的经典和深层结构比较研究 2505.22521v1 |
Authors: Chao Wang, Chuanhao Nie, Yunbo Liu
Fraud detection remains a critical task in high-stakes domains such as finance and e-commerce, where undetected fraudulent transactions can lead to significant economic losses. In this study, we systematically compare the performance of four supervised learning models - Logistic Regression, Random Forest, Light Gradient Boosting Machine (LightGBM), and a Gated Recurrent Unit (GRU) network - on a large-scale, highly imbalanced online transaction dataset. While ensemble methods such as Random Forest and LightGBM demonstrated superior performance in both overall and class-specific metrics, Logistic Regression offered a reliable and interpretable baseline. The GRU model showed strong recall for the minority fraud class, though at the cost of precision, highlighting a trade-off relevant for real-world deployment. Our evaluation emphasizes not only weighted averages but also per-class precision, recall, and F1-scores, providing a nuanced view of each model’s effectiveness in detecting rare but consequential fraudulent activity. The findings underscore the importance of choosing models based on the specific risk tolerance and operational needs of fraud detection systems.
nan
Article 487
Title@2025-05-28 (3): IGNIS: A Neural Network Framework for Robust Parameter Estimation in Archimedean Copulas
Title: IGNIS: A Neural Network Framework for Robust Parameter Estimation in Archimedean Copulas | IGNIS: Ein neurales Netzwerk-Framework für robuste Parameterschätzungen in Archimedischen Copulas | INGNIS: Archimedean Copuulas 强参数估计神经网络框架 2505.22518v1 |
Authors: Agnideep Aich, Ashit Baran Aich, Bruce Wade
Parameter estimation for Archimedean copulas remains a challenging problem, particularly for the recently developed A1 and A2 families that exhibit complex dependency structures. Traditional methods, such as the Method of Moments (MoM), Maximum Likelihood Estimation (MLE), and Maximum Pseudo-Likelihood (MPL), often struggle because of numerical instability and non-monotonic relationships between the parameter and dependency measures such as Kendall’s tau (as in the case of A1). In this paper, we present the IGNIS Network, a novel, unified neural framework that learns a direct mapping from observable dependency measures to copula parameters, thereby overcoming the limitations of classical approaches. Our approach is trained on simulated data spanning five Archimedean copula families including Clayton, Gumbel, Frank, A1, and A2, ensuring its general applicability across the entire family. Extensive simulation studies demonstrate that the IGNIS Network reduces estimation errors compared to MoM, while inherently enforcing parameter constraints through theory-guided post-processing. We further validate the practical utility of our method on diverse real-world datasets, including financial returns (AAPL-MSFT), healthcare metrics (CDC Diabetes indicators), and environmental measurements (PM2.5 air quality). Our results underscore the transformative potential of neural methods for robust and accurate dependence modeling in modern applications.
nan
Article 488
Title@2025-05-28 (3): Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?
Title: Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers? | Kolmogorov-Arnold Achtung: Ist erlernbare Aufmerksamkeit besser für Vision Transformer? | 科尔莫戈罗夫-阿诺尔德关注:对愿景转变者来说,学习关注是否更好? 2503.10632v2 |
Authors: Subhajit Maity, Killian Hitsman, Xin Li, Aritra Dutta
Kolmogorov-Arnold networks (KANs) are a remarkable innovation consisting of learnable activation functions with the potential to capture more complex relationships from data. Presently, KANs are deployed by replacing multilayer perceptrons (MLPs) in deep networks, including advanced architectures such as vision Transformers (ViTs). This work asks whether a similar replacement in the attention can bring benefits. In this paper, we design the first learnable attention called Kolmogorov-Arnold Attention (KArAt) for ViTs that can operate on any basis, ranging from Fourier, Wavelets, Splines, to Rational Functions. However, learnable activations in attention cause a memory explosion. To remedy this, we propose a modular version of KArAt that uses a low-rank approximation. By adopting the Fourier basis, Fourier-KArAt and its variants, in some cases, outperform their traditional softmax counterparts, or show comparable performance on CIFAR-10, CIFAR-100, and ImageNet-1K datasets. We also deploy Fourier KArAt to ConViT and Swin-Transformer, and use it in detection and segmentation with ViT-Det. We dissect these architectures’ performance by analyzing their loss landscapes, weight distributions, optimizer path, attention visualization, and transferability to other datasets. KArAt’s learnable activation shows a better attention score across all ViTs, indicating better token-to-token interactions, contributing to better inference. Still, its generalizability does not scale with larger ViTs. However, many factors, including the present computing interface, affect the performance of parameter- and memory-heavy KArAts. We note that the goal of this paper is not to produce efficient attention or challenge the traditional activations; by designing KArAt, we are the first to show that attention can be learned and encourage researchers to explore KArAt in conjunction with more advanced architectures.
nan
Article 489
Title@2025-05-28 (3): Accelerating Optimization via Differentiable Stopping Time
Title: Accelerating Optimization via Differentiable Stopping Time | Beschleunigung der Optimierung durch differenzierbare Stoppzeit | 通过有区别的停止时间加速优化 2505.22509v1 |
Authors: Zhonglin Xie, Yiman Fong, Haoran Yuan, Zaiwen Wen
Optimization is an important module of modern machine learning applications. Tremendous efforts have been made to accelerate optimization algorithms. A common formulation is achieving a lower loss at a given time. This enables a differentiable framework with respect to the algorithm hyperparameters. In contrast, its dual, minimizing the time to reach a target loss, is believed to be non-differentiable, as the time is not differentiable. As a result, it usually serves as a conceptual framework or is optimized using zeroth-order methods. To address this limitation, we propose a differentiable stopping time and theoretically justify it based on differential equations. An efficient algorithm is designed to backpropagate through it. As a result, the proposed differentiable stopping time enables a new differentiable formulation for accelerating algorithms. We further discuss its applications, such as online hyperparameter tuning and learning to optimize. Our proposed methods show superior performance in comprehensive experiments across various problems, which confirms their effectiveness.
nan
Article 490
Title@2025-05-28 (3): Closed-Form Training Dynamics Reveal Learned Features and Linear Structure in Word2Vec-like Models
Title: Closed-Form Training Dynamics Reveal Learned Features and Linear Structure in Word2Vec-like Models | Closed-Form Training Dynamics Reveal Erlernte Funktionen und lineare Struktur in Word2Vec-ähnlichen Modellen | 类似Word2Vec 模型中的封闭形式培训动态观测发现特性和线形结构 2502.09863v2 |
Authors: Dhruva Karkada, James B. Simon, Yasaman Bahri, Michael R. DeWeese
Self-supervised word embedding algorithms such as word2vec provide a minimal setting for studying representation learning in language modeling. We examine the quartic Taylor approximation of the word2vec loss around the origin, and we show that both the resulting training dynamics and the final performance on downstream tasks are empirically very similar to those of word2vec. Our main contribution is to analytically solve for both the gradient flow training dynamics and the final word embeddings in terms of only the corpus statistics and training hyperparameters. The solutions reveal that these models learn orthogonal linear subspaces one at a time, each one incrementing the effective rank of the embeddings until model capacity is saturated. Training on Wikipedia, we find that each of the top linear subspaces represents an interpretable topic-level concept. Finally, we apply our theory to describe how linear representations of more abstract semantic concepts emerge during training; these can be used to complete analogies via vector addition.
nan
Article 491
Title@2025-05-28 (3): Sparsification and Reconstruction from the Perspective of Representation Geometry
Title: Sparsification and Reconstruction from the Perspective of Representation Geometry | Sparsifikation und Rekonstruktion aus Sicht der Repräsentationsgeometrie | 从代表制角度看分解与重建 2505.22506v1 |
Authors: Wenjie Sun, Bingzhe Wu, Zhile Yang, Chengke Wu
Sparse Autoencoders (SAEs) have emerged as a predominant tool in mechanistic interpretability, aiming to identify interpretable monosemantic features. However, how does sparse encoding organize the representations of activation vectors from language models? What is the relationship between this organizational paradigm and feature disentanglement as well as reconstruction performance? To address these questions, we propose the SAEMA, which validates the stratified structure of the representation by observing the variability of the rank of the symmetric semipositive definite (SSPD) matrix corresponding to the modal tensor unfolded along the latent tensor with the level of noise added to the residual stream. To systematically investigate how sparse encoding alters representational structures, we define local and global representations, demonstrating that they amplify inter-feature distinctions by merging similar semantic features and introducing additional dimensionality. Furthermore, we intervene on the global representation from an optimization perspective, proving a significant causal relationship between their separability and the reconstruction performance. This study explains the principles of sparsity from the perspective of representational geometry and demonstrates the impact of changes in representational structure on reconstruction performance. In particular, it emphasizes the necessity of understanding representations and incorporating representational constraints, providing empirical references for developing new interpretable tools and improving SAEs. The code is available at https://github.com/wenjie1835/SAERepGeo.
nan
Article 492
Title@2025-05-28 (3): Geometric GNNs for Charged Particle Tracking at GlueX
Title: Geometric GNNs for Charged Particle Tracking at GlueX | Geometrische GNNs für geladene Partikelverfolgung bei GlueX | GNNs 用于凝胶X充电粒子跟踪的几何 GNNs 2505.22504v1 |
Authors: Ahmed Hossam Mohammed, Kishansingh Rajput, Simon Taylor, Denis Furletov, Sergey Furletov, Malachi Schram
Nuclear physics experiments are aimed at uncovering the fundamental building blocks of matter. The experiments involve high-energy collisions that produce complex events with many particle trajectories. Tracking charged particles resulting from collisions in the presence of a strong magnetic field is critical to enable the reconstruction of particle trajectories and precise determination of interactions. It is traditionally achieved through combinatorial approaches that scale worse than linearly as the number of hits grows. Since particle hit data naturally form a 3-dimensional point cloud and can be structured as graphs, Graph Neural Networks (GNNs) emerge as an intuitive and effective choice for this task. In this study, we evaluate the GNN model for track finding on the data from the GlueX experiment at Jefferson Lab. We use simulation data to train the model and test on both simulation and real GlueX measurements. We demonstrate that GNN-based track finding outperforms the currently used traditional method at GlueX in terms of segment-based efficiency at a fixed purity while providing faster inference. We show that the GNN model can achieve significant speedup by processing multiple events in batches, which exploits the parallel computation capability of Graphics Processing Units (GPUs). Finally, we compare the GNN implementation on GPU and FPGA and describe the trade-off.
nan
Article 493
Title@2025-05-28 (3): Assessing Quantum Advantage for Gaussian Process Regression
Title: Assessing Quantum Advantage for Gaussian Process Regression | Bewertung des Quantenvorteils für Gaussian Process Regression | 评估高山进程倒退的量度优势 2505.22502v1 |
Authors: Dominic Lowe, M. S. Kim, Roberto Bondesan
Gaussian Process Regression is a well-known machine learning technique for which several quantum algorithms have been proposed. We show here that in a wide range of scenarios these algorithms show no exponential speedup. We achieve this by rigorously proving that the condition number of a kernel matrix scales at least linearly with the matrix size under general assumptions on the data and kernel. We additionally prove that the sparsity and Frobenius norm of a kernel matrix scale linearly under similar assumptions. The implications for the quantum algorithms' runtime are independent of the complexity of loading classical data on a quantum computer and also apply to dequantised algorithms. We supplement our theoretical analysis with numerical verification for popular kernels in machine learning.
nan
Article 494
Title@2025-05-28 (3): Novelty Detection in Reinforcement Learning with World Models
Title: Novelty Detection in Reinforcement Learning with World Models | Neuheitserkennung im Verstärkungslernen mit Weltmodellen | 利用世界模式加强学习新颖发现 2310.08731v4 |
Authors: Geigh Zollicoffer, Kenneth Eaton, Jonathan Balloch, Julia Kim, Wei Zhou, Robert Wright, Mark O. Riedl
Reinforcement learning (RL) using world models has found significant recent successes. However, when a sudden change to world mechanics or properties occurs, agent performance and reliability can decline dramatically. We refer to the sudden change in visual properties or state transitions as novelties. Implementing novelty detection within generated world model frameworks is a crucial task for protecting the agent when deployed. In this paper, we propose straightforward bounding approaches to incorporate novelty detection into world model RL agents, by utilizing the misalignment of the world model’s hallucinated states and the true observed states as an anomaly score. We provide effective approaches to detecting novelties in a distribution of transitions learned by an agent in a world model. Finally, we show the advantage of our work in a novel environment compared to traditional machine learning novelty detection methods as well as currently accepted RL-focused novelty detection algorithms.
nan
Article 495
Title@2025-05-28 (3): ProSpero: Active Learning for Robust Protein Design Beyond Wild-Type Neighborhoods
Title: ProSpero: Active Learning for Robust Protein Design Beyond Wild-Type Neighborhoods | ProSpero: Aktives Lernen für robustes Proteindesign jenseits von Wild-Typ-Nachbarschaften | ProSpero:在野生部落邻里以外积极学习巨型蛋白设计 2505.22494v1 |
Authors: Michal Kmicikiewicz, Vincent Fortuin, Ewa Szczurek
Designing protein sequences of both high fitness and novelty is a challenging task in data-efficient protein engineering. Exploration beyond wild-type neighborhoods often leads to biologically implausible sequences or relies on surrogate models that lose fidelity in novel regions. Here, we propose ProSpero, an active learning framework in which a frozen pre-trained generative model is guided by a surrogate updated from oracle feedback. By integrating fitness-relevant residue selection with biologically-constrained Sequential Monte Carlo sampling, our approach enables exploration beyond wild-type neighborhoods while preserving biological plausibility. We show that our framework remains effective even when the surrogate is misspecified. ProSpero consistently outperforms or matches existing methods across diverse protein engineering tasks, retrieving sequences of both high fitness and novelty.
nan
Article 496
Title@2025-05-28 (3): Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
Title: Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation | Entmystifizierung des Paradoxon der wichtigen Probenahme mit einer geschätzten historisch-nachfolgenden Verhaltenspolitik in der Off-Policy-Bewertung | 以非政策评价中的估计历史依赖者行为政策来解开重要性抽样反常现象的神秘化 2505.22492v1 |
Authors: Hongyi Zhou, Josiah P. Hanna, Jin Zhu, Ying Yang, Chengchun Shi
This paper studies off-policy evaluation (OPE) in reinforcement learning with a focus on behavior policy estimation for importance sampling. Prior work has shown empirically that estimating a history-dependent behavior policy can lead to lower mean squared error (MSE) even when the true behavior policy is Markovian. However, the question of why the use of history should lower MSE remains open. In this paper, we theoretically demystify this paradox by deriving a bias-variance decomposition of the MSE of ordinary importance sampling (IS) estimators, demonstrating that history-dependent behavior policy estimation decreases their asymptotic variances while increasing their finite-sample biases. Additionally, as the estimated behavior policy conditions on a longer history, we show a consistent decrease in variance. We extend these findings to a range of other OPE estimators, including the sequential IS estimator, the doubly robust estimator and the marginalized IS estimator, with the behavior policy estimated either parametrically or non-parametrically.
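As a concrete reference point for the estimators discussed above, here is a minimal sketch of ordinary importance sampling with a count-based (non-parametric) behavior policy estimate. It conditions only on the current state; a history-dependent variant would key the counts on the tuple of preceding states and actions instead. All names and the toy data are illustrative assumptions, not the paper's code.

```python
def ordinary_is_estimate(trajectories, target_policy, behavior_policy):
    """Ordinary importance sampling estimate of the target policy's value.

    trajectories: list of trajectories, each a list of (state, action, reward).
    target_policy, behavior_policy: functions (state, action) -> probability.
    """
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0  # product of per-step probability ratios
        ret = 0.0     # undiscounted return of this trajectory
        for state, action, reward in trajectory:
            weight *= target_policy(state, action) / behavior_policy(state, action)
            ret += reward
        total += weight * ret
    return total / len(trajectories)

def estimate_markov_behavior(trajectories):
    """Count-based estimate of a behavior policy that depends on the state only."""
    counts, state_counts = {}, {}
    for trajectory in trajectories:
        for state, action, _ in trajectory:
            counts[(state, action)] = counts.get((state, action), 0) + 1
            state_counts[state] = state_counts.get(state, 0) + 1
    return lambda s, a: counts.get((s, a), 0) / state_counts[s]

# Toy one-step data: action "a" yields reward 1, action "b" yields reward 0.
data = [[("s0", "a", 1.0)], [("s0", "b", 0.0)], [("s0", "a", 1.0)], [("s0", "a", 1.0)]]
behavior_hat = estimate_markov_behavior(data)
always_a = lambda s, a: 1.0 if a == "a" else 0.0
value = ordinary_is_estimate(data, always_a, behavior_hat)
```

Because the weights use the estimated rather than the true behavior policy, they adapt to the action frequencies actually observed in the sample, which is the effect whose bias and variance the abstract analyzes.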
nan
Article 497
Title@2025-05-28 (3): On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling
Title: On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling | Über die überraschende Wirksamkeit großer Lernraten unter Standardbreitenskalierung | 根据标准宽宽度比例扩大的大型学习率的惊人效果 2505.22491v1 |
Authors: Moritz Haas, Sebastian Bordt, Ulrike von Luxburg, Leena Chennuru Vankadara
The dominant paradigm for training large-scale vision and language models is He initialization and a single global learning rate (standard parameterization, SP). Despite its practical success, standard parameterization remains poorly understood from a theoretical perspective: Existing infinite-width theory would predict instability under large learning rates and vanishing feature learning under stable learning rates. However, empirically optimal learning rates consistently decay much more slowly than theoretically predicted. By carefully studying neural network training dynamics, we demonstrate that this discrepancy is not fully explained by finite-width phenomena such as catapult effects or a lack of alignment between weights and incoming activations. We instead show that the apparent contradiction can be fundamentally resolved by taking the loss function into account: In contrast to Mean Squared Error (MSE) loss, we prove that under cross-entropy (CE) loss, an intermediate controlled divergence regime emerges, where logits diverge but loss, gradients, and activations remain stable. Stable training under large learning rates enables persistent feature evolution at scale in all hidden layers, which is crucial for the practical success of SP. In experiments across optimizers (SGD, Adam), architectures (MLPs, GPT) and data modalities (vision, language), we validate that neural networks operate in this controlled divergence regime under CE loss but not under MSE loss. Our empirical evidence suggests that width-scaling considerations are surprisingly useful for predicting empirically optimal learning rate exponents. Finally, our analysis clarifies the effectiveness and limitations of recently proposed layerwise learning rate scalings for standard initialization.
Article 498
Title@2025-05-28 (3): Understanding Adversarial Training with Energy-based Models
Title: Understanding Adversarial Training with Energy-based Models | Verständnis von Adversarial Training mit energiebasierten Modellen | 与基于能源模式的对等培训的谅解 2505.22486v1 |
Authors: Mujtaba Hussain Mirza, Maria Rosaria Briglia, Filippo Bartolucci, Senad Beadini, Giuseppe Lisanti, Iacopo Masi
We aim to use the Energy-based Model (EBM) framework to better understand adversarial training (AT) in classifiers, and additionally to analyze the intrinsic generative capabilities of robust classifiers. By viewing standard classifiers through an energy lens, we begin by analyzing how the energies of adversarial examples, generated by various attacks, differ from those of the natural samples. The central focus of our work is to understand the critical phenomena of Catastrophic Overfitting (CO) and Robust Overfitting (RO) in AT from an energy perspective. We analyze the impact of existing AT approaches on the energy of samples during training and observe that the behavior of the "delta energy", the change in energy between an original sample and its adversarial counterpart, diverges significantly when CO or RO occurs. After a thorough analysis of these energy dynamics and their relationship with overfitting, we propose a novel regularizer, the Delta Energy Regularizer (DER), designed to smooth the energy landscape during training. We demonstrate that DER is effective in mitigating both CO and RO across multiple benchmarks. We further show that robust classifiers, when used as generative models, have limits in handling the trade-off between image quality and variability. We propose an improved technique based on a local class-wise principal component analysis (PCA) and energy-based guidance for better class-specific initialization and adaptive stopping, enhancing sample diversity and generation quality. Considering that we do not explicitly train for generative modeling, we achieve a competitive Inception Score (IS) and Fréchet Inception Distance (FID) compared to hybrid discriminative-generative models.
Article 499
Title@2025-05-28 (3): Intrinsic User-Centric Interpretability through Global Mixture of Experts
Title: Intrinsic User-Centric Interpretability through Global Mixture of Experts | Intrinsische Benutzer-Centric-Interpretability durch globale Mischung von Experten | 通过全球专家混合解释 2402.02933v4 |
Authors: Vinitra Swamy, Syrielle Montariol, Julian Blackwell, Jibril Frej, Martin Jaggi, Tanja Käser
In human-centric settings like education or healthcare, model accuracy and model explainability are key factors for user adoption. Towards these two goals, intrinsically interpretable deep learning models have gained popularity, focusing on accurate predictions alongside faithful explanations. However, there exists a gap in the human-centeredness of these approaches, which often produce nuanced and complex explanations that are not easily actionable for downstream users. We present InterpretCC (interpretable conditional computation), a family of intrinsically interpretable neural networks at a unique point in the design space that optimizes for ease of human understanding and explanation faithfulness, while maintaining comparable performance to state-of-the-art models. InterpretCC achieves this through adaptive sparse activation of features before prediction, allowing the model to use a different, minimal set of features for each instance. We extend this idea into an interpretable, global mixture-of-experts (MoE) model that allows users to specify topics of interest, discretely separates the feature space for each data point into topical subnetworks, and adaptively and sparsely activates these topical subnetworks for prediction. We apply InterpretCC to text, time series and tabular data across several real-world datasets, demonstrating comparable performance with non-interpretable baselines and outperforming intrinsically interpretable baselines. In a user study involving 56 teachers, InterpretCC explanations are found to have higher actionability and usefulness than those of other intrinsically interpretable approaches.
Article 500
Title@2025-05-28 (3): A Closer Look at Multimodal Representation Collapse
Title: A Closer Look at Multimodal Representation Collapse | Ein genauerer Blick auf multimodale Darstellungskollaps | 更仔细地审视多模式代表制的崩溃 2505.22483v1 |
Authors: Abhra Chaudhuri, Anjan Dutta, Tu Bui, Serban Georgescu
We aim to develop a fundamental understanding of modality collapse, a recently observed empirical phenomenon wherein models trained for multimodal fusion tend to rely only on a subset of the modalities, ignoring the rest. We show that modality collapse happens when noisy features from one modality are entangled, via a shared set of neurons in the fusion head, with predictive features from another, effectively masking out positive contributions from the predictive features of the former modality and leading to its collapse. We further prove that cross-modal knowledge distillation implicitly disentangles such representations by freeing up rank bottlenecks in the student encoder, denoising the fusion-head outputs without negatively impacting the predictive features from either modality. Based on the above findings, we propose an algorithm that prevents modality collapse through explicit basis reallocation, with applications in dealing with missing modalities. Extensive experiments on multiple multimodal benchmarks validate our theoretical claims. Project page: https://abhrac.github.io/mmcollapse/.
Article 501
Title@2025-05-28 (3): Hypothesis Testing in Imaging Inverse Problems
Title: Hypothesis Testing in Imaging Inverse Problems | Hypothesenprüfung in bildgebenden Inversen Problemen | 想象反反问题假设测试 2505.22481v1 |
Authors: Yiming Xi, Konstantinos Zygalakis, Marcelo Pereyra
This paper proposes a framework for semantic hypothesis testing tailored to imaging inverse problems. Modern imaging methods struggle to support hypothesis testing, a core component of the scientific method that is essential for the rigorous interpretation of experiments and robust interfacing with decision-making processes. There are three main reasons why image-based hypothesis testing is challenging. First, the difficulty of using a single observation to simultaneously reconstruct an image, formulate hypotheses, and quantify their statistical significance. Second, the hypotheses encountered in imaging are mostly of semantic nature, rather than quantitative statements about pixel values. Third, it is challenging to control test error probabilities because the null and alternative distributions are often unknown. Our proposed approach addresses these difficulties by leveraging concepts from self-supervised computational imaging, vision-language models, and non-parametric hypothesis testing with e-values. We demonstrate our proposed framework through numerical experiments related to image-based phenotyping, where we achieve excellent power while robustly controlling Type I errors.
Article 502
Title@2025-05-28 (3): Position: Don’t Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints
Title: Position: Don’t Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints | Position: Verwenden Sie den CLT nicht in LLM-Evalen mit weniger als ein paar hundert Datenpunkten | 位置: 不要在LLM Evals中使用 CLT, 其数据点小于几百个数据点 2503.01747v3 |
Authors: Sam Bowyer, Laurence Aitchison, Desi R. Ivanova
Rigorous statistical evaluations of large language models (LLMs), including valid error bars and significance testing, are essential for meaningful and reliable performance assessment. Currently, when such statistical measures are reported, they typically rely on the Central Limit Theorem (CLT). In this position paper, we argue that while CLT-based methods for uncertainty quantification are appropriate when benchmarks consist of thousands of examples, they fail to provide adequate uncertainty estimates for LLM evaluations that rely on smaller, highly specialized benchmarks. In these small-data settings, we demonstrate that CLT-based methods perform very poorly, usually dramatically underestimating uncertainty (i.e. producing error bars that are too small). We give recommendations for alternative frequentist and Bayesian methods that are both easy to implement and more appropriate in these increasingly common scenarios. We provide a simple Python library for these Bayesian methods at https://github.com/sambowyer/bayes_evals .
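The small-benchmark failure mode can be seen with a few lines of arithmetic. The sketch below compares a CLT-based interval for an accuracy estimate with two standard alternatives: the Wilson score interval and a Bayesian credible interval under a uniform Beta(1, 1) prior. These two alternatives, the Monte Carlo quantile estimate, and the benchmark size are assumptions chosen for illustration; they are not taken from the paper or from its `bayes_evals` library.

```python
import math
import random

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval

def clt_interval(successes, n, z=Z95):
    """Normal-approximation (CLT) interval for a proportion."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson_interval(successes, n, z=Z95):
    """Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

def beta_credible_interval(successes, n, level=0.95, draws=20000, seed=0):
    """Equal-tailed credible interval under a Beta(1, 1) prior, via sampling."""
    rng = random.Random(seed)
    a, b = 1 + successes, 1 + (n - successes)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws) - 1]
    return lo, hi

# Hypothetical benchmark: 20 questions, all answered correctly.
print("CLT   ", clt_interval(20, 20))
print("Wilson", wilson_interval(20, 20))
print("Beta  ", beta_credible_interval(20, 20))
```

With a perfect score on 20 items the CLT interval collapses to zero width, because the estimated variance is zero, while the other two intervals still admit accuracies well below 100%.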
Article 503
Title@2025-05-28 (3): Non-Asymptotic Analysis of (Sticky) Track-and-Stop
Title: Non-Asymptotic Analysis of (Sticky) Track-and-Stop | Nicht-asymptotische Analyse von (Sticky) Track-and-Stop | 对(Stiskky)轨道和停止的非症状分析 2505.22475v1 |
Authors: Riccardo Poiani, Martino Bernasconi, Andrea Celli
In pure exploration problems, a statistician sequentially collects information to answer a question about some stochastic and unknown environment. The probability of returning a wrong answer should not exceed a maximum risk parameter $\delta$, and good algorithms make as few queries to the environment as possible. The Track-and-Stop algorithm is a pioneering method to solve these problems. Specifically, it is well known that it enjoys asymptotically optimal sample complexity guarantees for $\delta\to 0$ whenever the map from the environment to its correct answers is single-valued (e.g., best-arm identification with a unique optimal arm). The Sticky Track-and-Stop algorithm extends these results to settings where, for each environment, there might exist multiple correct answers (e.g., $\epsilon$-optimal arm identification). Although both methods are optimal in the asymptotic regime, their non-asymptotic guarantees remain unknown. In this work, we fill this gap and provide non-asymptotic guarantees for both algorithms.
Article 504
Title@2025-05-28 (3): Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks
Title: Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks | Überbrückung von Sprache, Vision und Aktion: Multimodale VAE in Robotermanipulationsaufgaben | 架桥语言、愿景和行动:机器人操纵任务中的多式机动性 2404.01932v2 |
Authors: Gabriela Sejnova, Michal Vavrecka, Karla Stepanova
In this work, we focus on unsupervised vision-language-action mapping in the area of robotic manipulation. Recently, multiple approaches employing pre-trained large language and vision models have been proposed for this task. However, they are computationally demanding and require careful fine-tuning of the produced outputs. A more lightweight alternative would be to use multimodal Variational Autoencoders (VAEs), which can extract the latent features of the data and integrate them into a joint representation, as state-of-the-art models have demonstrated mostly on image-image or image-text data. Here we explore whether and how multimodal VAEs can be employed in unsupervised robotic manipulation tasks in a simulated environment. Based on the obtained results, we propose a model-invariant training alternative that improves the models’ performance in a simulator by up to 55%. Moreover, we systematically evaluate the challenges raised by the individual tasks, such as object or robot position variability, the number of distractors, or the task length. Our work thus also sheds light on the potential benefits and limitations of using the current multimodal VAEs for unsupervised learning of robotic motion trajectories based on vision and language.
Article 505
Title@2025-05-28 (3): Forecasting Multivariate Urban Data via Decomposition and Spatio-Temporal Graph Analysis
Title: Forecasting Multivariate Urban Data via Decomposition and Spatio-Temporal Graph Analysis | Voraussichtliche Multivariate Stadtdaten durch Zersetzung und räumlich-Temporale Graphenanalyse | 通过分解和时空空间图分析预测多变量城市数据 2505.22474v1 |
Authors: Amirhossein Sohrabbeig, Omid Ardakanian, Petr Musilek
The forecasting of multivariate urban data presents a complex challenge due to the intricate dependencies between various urban metrics such as weather, air pollution, carbon intensity, and energy demand. This paper introduces a novel multivariate time-series forecasting model that utilizes advanced Graph Neural Networks (GNNs) to capture spatial dependencies among different time-series variables. The proposed model incorporates a decomposition-based preprocessing step, isolating trend, seasonal, and residual components to enhance the accuracy and interpretability of forecasts. By leveraging the dynamic capabilities of GNNs, the model effectively captures interdependencies and improves the forecasting performance. Extensive experiments on real-world datasets, including electricity usage, weather metrics, carbon intensity, and air pollution data, demonstrate the effectiveness of the proposed approach across various forecasting scenarios. The results highlight the potential of the model to optimize smart infrastructure systems, contributing to energy-efficient urban development and enhanced public well-being.
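The abstract does not say which decomposition the preprocessing step uses. As a point of reference, a classical additive decomposition into trend, seasonal, and residual components can be sketched as follows. The moving-average trend, the per-phase seasonal averaging, and all function names are assumptions for illustration, not the paper's method.

```python
def moving_average_trend(series, period):
    """Centered moving average; windows shrink at the edges of the series."""
    half = period // 2
    trend = []
    for i in range(len(series)):
        window = series[max(0, i - half): i + half + 1]
        trend.append(sum(window) / len(window))
    return trend

def decompose(series, period):
    """Additive decomposition: series = trend + seasonal + residual."""
    trend = moving_average_trend(series, period)
    detrended = [x - t for x, t in zip(series, trend)]
    # Average the detrended values observed at each phase of the cycle.
    seasonal_means = []
    for phase in range(period):
        vals = detrended[phase::period]
        seasonal_means.append(sum(vals) / len(vals))
    # Center the seasonal pattern so it sums to zero over one period.
    offset = sum(seasonal_means) / period
    seasonal = [seasonal_means[i % period] - offset for i in range(len(series))]
    residual = [x - t - s for x, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual

# Hypothetical series: linear growth plus a repeating 3-step pattern.
pattern = [1.0, -1.0, 0.0]
series = [0.5 * i + pattern[i % 3] for i in range(12)]
trend, seasonal, residual = decompose(series, period=3)
print([round(s, 3) for s in seasonal[:3]])
```

For an odd period the window covers exactly one cycle; for an even period this simplified window spans one extra point, which a production implementation would handle with a weighted average. Each component could then be forecast separately and the forecasts summed.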
Article 506
Title@2025-05-28 (3): Pure Exploration with Infinite Answers
Title: Pure Exploration with Infinite Answers | Reine Exploration mit unendlichen Antworten | 纯探索无无限答案 2505.22473v1 |
Authors: Riccardo Poiani, Martino Bernasconi, Andrea Celli
We study pure exploration problems where the set of correct answers is possibly infinite, e.g., the regression of any continuous function of the means of the bandit. We derive an instance-dependent lower bound for these problems. By analyzing it, we discuss why existing methods (i.e., Sticky Track-and-Stop) for finite answer problems fail at being asymptotically optimal in this more general setting. Finally, we present a framework, Sticky-Sequence Track-and-Stop, which generalizes both Track-and-Stop and Sticky Track-and-Stop, and that enjoys asymptotic optimality. Due to its generality, our analysis also highlights special cases where existing methods enjoy optimality.
Article 507
Title@2025-05-28 (3): CPINN-ABPI: Physics-Informed Neural Networks for Accurate Power Estimation in MPSoCs
Title: CPINN-ABPI: Physics-Informed Neural Networks for Accurate Power Estimation in MPSoCs | CPINN-ABPI: Physik-informierte Neuralnetze für genaue Leistungsschätzung in MPCs | CPINN-ABPI: MPSoCs中精确功率估计物理内建神经网络 2505.22469v1 |
Authors: Mohamed R. Elshamy, Mehdi Elahi, Ahmad Patooghy, Abdel-Hameed A. Badawy
Efficient thermal and power management in modern multiprocessor systems-on-chip (MPSoCs) demands accurate power consumption estimation. One of the state-of-the-art approaches, Alternative Blind Power Identification (ABPI), theoretically eliminates the dependence on steady-state temperatures, addressing a major shortcoming of previous approaches. However, ABPI performance has remained unverified in actual hardware implementations. In this study, we conduct the first empirical validation of ABPI on commercial hardware using the NVIDIA Jetson Xavier AGX platform. Our findings reveal that, while ABPI provides computational efficiency and independence from steady-state temperature, it exhibits considerable accuracy deficiencies in real-world scenarios. To overcome these limitations, we introduce a novel approach that integrates Custom Physics-Informed Neural Networks (CPINNs) with the underlying thermal model of ABPI. Our approach employs a specialized loss function that harmonizes physical principles with data-driven learning, complemented by multi-objective genetic algorithm optimization to balance estimation accuracy and computational cost. In experimental validation, CPINN-ABPI reduces the mean absolute error (MAE) relative to ABPI by 84.7% (CPU) and 73.9% (GPU), with the weighted mean absolute percentage error (WMAPE) improving from 47%–81% to about 12%. The method maintains real-time performance with an inference time of 195.3 μs, with similar 85%–99% accuracy gains across heterogeneous SoCs.
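For readers unfamiliar with the two error metrics, the sketch below computes MAE and one common form of WMAPE, the sum of absolute errors divided by the sum of absolute true values. The abstract does not give the paper's exact WMAPE definition, so this form is an assumption, and the power readings are invented for illustration.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def wmape(y_true, y_pred):
    """Weighted MAPE: total absolute error divided by total absolute true value."""
    total_error = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    return total_error / sum(abs(t) for t in y_true)

def relative_reduction(baseline_error, new_error):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline_error - new_error) / baseline_error

# Hypothetical power readings in watts (not from the paper).
measured = [10.0, 20.0, 30.0]
estimated = [12.0, 18.0, 33.0]
print(mae(measured, estimated))    # (2 + 2 + 3) / 3
print(wmape(measured, estimated))  # 7 / 60
```

Unlike a plain mean of per-sample percentage errors, this form of WMAPE does not blow up when individual true values are close to zero, which matters for idle-power samples.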
Article 508
Title@2025-05-28 (3): FitCF: A Framework for Automatic Feature Importance-guided Counterfactual Example Generation
Title: FitCF: A Framework for Automatic Feature Importance-guided Counterfactual Example Generation | FitCF: Ein Framework für die automatische Feature-Importanz-geführte kontrafaktische Beispielgenerierung | FitCF: 自动地物、重要引导反事实实例生成框架 2501.00777v3 |
Authors: Qianli Wang, Nils Feldhus, Simon Ostermann, Luis Felipe Villa-Arenas, Sebastian Möller, Vera Schmitt
Counterfactual examples are widely used in natural language processing (NLP) as valuable data to improve models, and in explainable artificial intelligence (XAI) to understand model behavior. The automated generation of counterfactual examples remains a challenging task even for large language models (LLMs), despite their impressive performance on many tasks. In this paper, we first introduce ZeroCF, a faithful approach for leveraging important words derived from feature attribution methods to generate counterfactual examples in a zero-shot setting. Second, we present a new framework, FitCF, which further verifies aforementioned counterfactuals by label flip verification and then inserts them as demonstrations for few-shot prompting, outperforming two state-of-the-art baselines. Through ablation studies, we identify the importance of each of FitCF’s core components in improving the quality of counterfactuals, as assessed through flip rate, perplexity, and similarity measures. Furthermore, we show the effectiveness of LIME and Integrated Gradients as backbone attribution methods for FitCF and find that the number of demonstrations has the largest effect on performance. Finally, we reveal a strong correlation between the faithfulness of feature attribution scores and the quality of generated counterfactuals, which we hope will serve as an important finding for future research in this direction.
Article 509
Title@2025-05-28 (3): Embedding Safety into RL: A New Take on Trust Region Methods
Title: Embedding Safety into RL: A New Take on Trust Region Methods | Einbettung der Sicherheit in RL: Ein neuer Ansatz für Methoden der Vertrauensregion | 将安全嵌入RL:信任区域方法的新做法 2411.02957v3 |
Authors: Nikola Milosevic, Johannes Müller, Nico Scherf
Reinforcement Learning (RL) agents can solve diverse tasks but often exhibit unsafe behavior. Constrained Markov Decision Processes (CMDPs) address this by enforcing safety constraints, yet existing methods either sacrifice reward maximization or allow unsafe training. We introduce Constrained Trust Region Policy Optimization (C-TRPO), which reshapes the policy space geometry to ensure trust regions contain only safe policies, guaranteeing constraint satisfaction throughout training. We analyze its theoretical properties and connections to TRPO, Natural Policy Gradient (NPG), and Constrained Policy Optimization (CPO). Experiments show that C-TRPO reduces constraint violations while maintaining competitive returns.
Article 510
Title@2025-05-28 (3): OptiMindTune: A Multi-Agent Framework for Intelligent Hyperparameter Optimization
Title: OptiMindTune: A Multi-Agent Framework for Intelligent Hyperparameter Optimization | OptiMindTune: Multi-Agenten-Framework für intelligente Hyperparameter-Optimierung | OptiMindTunne: 智能超参数优化的多机构框架 2505.19205v2 |
Authors: Meher Bhaskar Madiraju, Meher Sai Preetam Madiraju
Hyperparameter optimization (HPO) is a critical yet challenging aspect of machine learning model development, significantly impacting model performance and generalization. Traditional HPO methods often struggle with high dimensionality, complex interdependencies, and computational expense. This paper introduces OptiMindTune, a novel multi-agent framework designed to intelligently and efficiently optimize hyperparameters. OptiMindTune leverages the collaborative intelligence of three specialized AI agents – a Recommender Agent, an Evaluator Agent, and a Decision Agent – each powered by Google’s Gemini models. These agents address distinct facets of the HPO problem, from model selection and hyperparameter suggestion to robust evaluation and strategic decision-making. By fostering dynamic interactions and knowledge sharing, OptiMindTune aims to converge to optimal hyperparameter configurations more rapidly and robustly than existing single-agent or monolithic approaches. Our framework integrates principles from advanced large language models and adaptive search to achieve scalable and intelligent AutoML. We posit that this multi-agent paradigm offers a promising avenue for tackling the increasing complexity of modern machine learning model tuning.
Article 511
Title@2025-05-28 (3): Depth-Based Matrix Classification for the HHL Quantum Algorithm
Title: Depth-Based Matrix Classification for the HHL Quantum Algorithm | Tiefenbasierte Matrix-Klassifikation für den HHL-Quantenalgorithmus | HHL 量图算法的深度矩阵分类 2505.22454v1 |
Authors: Mark Danza, Sonia Lopez Alarcon, Cory Merkel
As the error-corrected era of quantum computing nears, it is necessary to understand the suitability of certain post-NISQ algorithms for practical problems. One of the most promising and applicable, yet difficult to implement in practice, is the Harrow, Hassidim and Lloyd (HHL) algorithm for linear systems of equations. An enormous number of problems can be expressed as linear systems of equations, from Machine Learning to fluid dynamics. However, in most cases, HHL will not be able to provide a practical, reasonable solution to these problems. This paper asks whether Machine Learning classifiers can label problems as suitable or unsuitable for HHL implementation when some numerical information about the problem is known beforehand. This work demonstrates that training on significantly representative data distributions is critical to achieving good classification of problems based on the numerical properties of the matrix representing the system of equations. Accurate classification is possible with Multi-Layer Perceptrons, provided that the training data distribution and classifier parameters are carefully designed.
Article 512
Title@2025-05-28 (3): Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
Title: Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO | Unüberwachte Nachschulung für Multi-Modal LLM Reasoning via GRPO | 无人监督的多模式LLM通过GROPO进行多模式LLM进修培训后培训 2505.22453v1 |
Authors: Lai Wei, Yuting Li, Chen Wang, Yue Wang, Linghe Kong, Weiran Huang, Lichao Sun
Improving Multi-modal Large Language Models (MLLMs) in the post-training stage typically relies on supervised fine-tuning (SFT) or reinforcement learning (RL). However, these supervised methods require expensive and manually annotated multi-modal data, an ultimately unsustainable resource. While recent efforts have explored unsupervised post-training, their methods are complex and difficult to iterate. In this work, we are the first to investigate the use of GRPO, a stable and scalable online RL algorithm, for enabling continual self-improvement without any external supervision. We propose MM-UPT, a simple yet effective framework for unsupervised post-training of MLLMs. MM-UPT builds upon GRPO, replacing traditional reward signals with a self-rewarding mechanism based on majority voting over multiple sampled responses. Our experiments demonstrate that MM-UPT significantly improves the reasoning ability of Qwen2.5-VL-7B (e.g., 66.3% $\rightarrow$ 72.9% on MathVista, 62.9% $\rightarrow$ 68.7% on We-Math), using standard datasets without ground truth labels. MM-UPT also outperforms prior unsupervised baselines and even approaches the results of supervised GRPO. Furthermore, we show that incorporating synthetic questions, generated solely by the MLLM itself, can boost performance as well, highlighting a promising approach for scalable self-improvement. Overall, MM-UPT offers a new paradigm for continual, autonomous enhancement of MLLMs in the absence of external supervision. Our code is available at https://github.com/waltonfuture/MM-UPT.
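The self-rewarding idea can be sketched in a few lines: sample several answers to the same question, treat the most frequent answer as a pseudo-label, reward agreement with it, and normalize rewards within the group as GRPO does. This is a minimal illustration under stated assumptions, not the MM-UPT code: the binary reward, the tie handling, the `eps` constant, and the function names are all hypothetical.

```python
from collections import Counter
import statistics

def majority_vote_rewards(answers):
    """Reward 1.0 for answers matching the most frequent answer, else 0.0."""
    counts = Counter(answers)
    # Ties are resolved by whatever Counter.most_common returns first.
    majority, _ = counts.most_common(1)[0]
    return majority, [1.0 if a == majority else 0.0 for a in answers]

def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of sampled responses."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical final answers extracted from five sampled responses.
answers = ["42", "42", "17", "42", "8"]
majority, rewards = majority_vote_rewards(answers)
advantages = group_relative_advantages(rewards)
print(majority, rewards)
print([round(a, 3) for a in advantages])
```

When every sample agrees, all rewards are equal and the advantages are zero, so such a group contributes no learning signal under this scheme.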
Article 513
Title@2025-05-28 (3): Position: All Current Generative Fidelity and Diversity Metrics are Flawed
Title: Position: All Current Generative Fidelity and Diversity Metrics are Flawed | Position: Alle aktuellen Generativen Fidelity und Diversity Metrics sind abgeflacht | 位置:所有当前产生分裂性和多样性 2505.22450v1 |
Authors: Ossi Räisä, Boris van Breugel, Mihaela van der Schaar
Any method’s development and practical application is limited by our ability to measure its reliability. The popularity of generative modeling emphasizes the importance of good synthetic data metrics. Unfortunately, previous works have found many failure cases in current metrics, for example lack of outlier robustness and unclear lower and upper bounds. We propose a list of desiderata for synthetic data metrics, and a suite of sanity checks: carefully chosen simple experiments that aim to detect specific and known generative modeling failure modes. Based on these desiderata and the results of our checks, we arrive at our position: all current generative fidelity and diversity metrics are flawed. This significantly hinders practical use of synthetic data. Our aim is to convince the research community to spend more effort in developing metrics, instead of models. Additionally, through analyzing how current metrics fail, we provide practitioners with guidelines on how these metrics should (not) be used.
Article 514
Title@2025-05-28 (3): SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning
Title: SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning | SOReL und TOReL: Zwei Methoden für vollständiges Offline-Verstärkungslernen | SOLEL和TOREL: 完全脱线强化学习的两种方法 2505.22442v1 |
Authors: Mattie Fellows, Clarisse Wibault, Uljad Berdica, Johannes Forkel, Jakob N. Foerster, Michael A. Osborne
Sample efficiency remains a major obstacle for real world adoption of reinforcement learning (RL): success has been limited to settings where simulators provide access to essentially unlimited environment interactions, which in reality are typically costly or dangerous to obtain. Offline RL in principle offers a solution by exploiting offline data to learn a near-optimal policy before deployment. In practice, however, current offline RL methods rely on extensive online interactions for hyperparameter tuning, and have no reliable bound on their initial online performance. To address these two issues, we introduce two algorithms. Firstly, SOReL: an algorithm for safe offline reinforcement learning. Using only offline data, our Bayesian approach infers a posterior over environment dynamics to obtain a reliable estimate of the online performance via the posterior predictive uncertainty. Crucially, all hyperparameters are also tuned fully offline. Secondly, we introduce TOReL: a tuning for offline reinforcement learning algorithm that extends our information rate based offline hyperparameter tuning methods to general offline RL approaches. Our empirical evaluation confirms SOReL’s ability to accurately estimate regret in the Bayesian setting whilst TOReL’s offline hyperparameter tuning achieves competitive performance with the best online hyperparameter tuning methods using only offline data. Thus, SOReL and TOReL make a significant step towards safe and reliable offline RL, unlocking the potential for RL in the real world. Our implementations are publicly available: https://github.com/CWibault/sorel_torel.
Article 515
Title@2025-05-28 (3): Variational Positive-incentive Noise: How Noise Benefits Models
Title: Variational Positive-incentive Noise: How Noise Benefits Models | Variational Positiv-incentive Noise: Wie Lärm Vorteile Modelle | 变化式积极积极激励噪音:如何创造噪音效益模式 2306.07651v2 |
Authors: Hongyuan Zhang, Sida Huang, Yubin Guo, Xuelong Li
A large number of works aim to alleviate the impact of noise due to an underlying conventional assumption of the negative role of noise. However, some existing works show that the assumption does not always hold. In this paper, we investigate how to benefit the classical models by random noise under the framework of Positive-incentive Noise (Pi-Noise). Since the ideal objective of Pi-Noise is intractable, we propose to optimize its variational bound instead, namely variational Pi-Noise (VPN). With the variational inference, a VPN generator implemented by neural networks is designed for enhancing base models and simplifying the inference of base models, without changing the architecture of base models. Benefiting from the independent design of base models and VPN generators, the VPN generator can work with most existing models. From the experiments, it is shown that the proposed VPN generator can improve the base models. It is appealing that the trained variational VPN generator prefers to blur the irrelevant ingredients in complicated images, which meets our expectations.
Article 516
Title@2025-05-28 (3): LAMBDA: A Large Model Based Data Agent
Title: LAMBDA: A Large Model Based Data Agent | LAMBDA: Ein großer modellbasierter Datenagent | LAMBDA:一个大型模型数据代理 2407.17535v3 |
Authors: Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, Jian Huang
We introduce LArge Model Based Data Agent (LAMBDA), a novel open-source, code-free multi-agent data analysis system that leverages the power of large language models. LAMBDA is designed to address data analysis challenges in data-driven applications through innovatively designed data agents using natural language. At the core of LAMBDA are two key agent roles: the programmer and the inspector, which are engineered to work together seamlessly. Specifically, the programmer generates code based on the user’s instructions and domain-specific knowledge, while the inspector debugs the code when necessary. To ensure robustness and handle adverse scenarios, LAMBDA features a user interface that allows direct user intervention. Moreover, LAMBDA can flexibly integrate external models and algorithms through our proposed Knowledge Integration Mechanism, catering to the needs of customized data analysis. LAMBDA has demonstrated strong performance on various data analysis tasks. It has the potential to enhance data analysis paradigms by seamlessly integrating human and artificial intelligence, making it more accessible, effective, and efficient for users from diverse backgrounds. The strong performance of LAMBDA in solving data analysis problems is demonstrated using real-world data examples. The code for LAMBDA is available at https://github.com/AMA-CMFAI/LAMBDA and videos of three case studies can be viewed at https://www.polyu.edu.hk/ama/cmfai/lambda.html.
Article 517
Title@2025-05-28 (3): Data-Driven Antenna Miniaturization: A Knowledge-Based System Integrating Quantum PSO and Predictive Machine Learning Models
Title: Data-Driven Antenna Miniaturization: A Knowledge-Based System Integrating Quantum PSO and Predictive Machine Learning Models | Datengetriebene Antenne Miniaturisierung: Ein wissensbasiertes System zur Integration von Quanten-PSO und vorausschauenden Machine Learning-Modellen | 数据驱动天线微型化:以知识为基础的系统综合量子PSO和可预测性机器学习模型 2505.22440v1 |
Authors: Khan Masood Parvez, Sk Md Abidar Rahaman, Ali Shiri Sichani
The rapid evolution of wireless technologies necessitates automated design frameworks to address antenna miniaturization and performance optimization within constrained development cycles. This study demonstrates a machine-learning-enhanced workflow integrating Quantum-Behaved Dynamic Particle Swarm Optimization (QDPSO) with ANSYS HFSS simulations to accelerate antenna design. The QDPSO algorithm autonomously optimized loop dimensions in 11.53 seconds, achieving a resonance frequency of 1.4208 GHz, a 12.7 percent reduction compared to conventional 1.60 GHz designs. Machine learning models (SVM, Random Forest, XGBoost, and stacked ensembles) predicted resonance frequencies in 0.75 seconds using 936 simulation datasets, with stacked models showing superior training accuracy ($R^2$ = 0.9825) and SVM demonstrating optimal validation performance ($R^2$ = 0.7197). The complete design cycle, encompassing optimization, prediction, and ANSYS validation, required 12.42 minutes on standard desktop hardware (Intel i5-8500, 16 GB RAM), contrasting sharply with the 50-hour benchmark of PSADEA-based approaches. This roughly 240-fold acceleration eliminates traditional trial-and-error methods that often extend beyond seven expert-led days. The system enables precise specification of performance targets with automated generation of fabrication-ready parameters, particularly benefiting compact consumer devices requiring rapid frequency tuning. By bridging AI-driven optimization with CAD validation, this framework reduces engineering workloads while ensuring production-ready designs, establishing a scalable paradigm for next-generation RF systems in 6G and IoT applications.
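The reported acceleration follows directly from the two design-cycle durations quoted in the abstract. A short check, using only those two figures (the function name is illustrative):

```python
def speedup(baseline_minutes, new_minutes):
    """Ratio of baseline duration to new duration."""
    return baseline_minutes / new_minutes

baseline_minutes = 50 * 60   # 50-hour PSADEA-based benchmark, in minutes
cycle_minutes = 12.42        # complete design cycle reported in the abstract
ratio = speedup(baseline_minutes, cycle_minutes)
print(round(ratio, 1))       # 241.5
```

The ratio of about 241.5 is consistent with the roughly 240-fold figure stated in the abstract.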
Article 518
Title@2025-05-28 (3): Synonymous Variational Inference for Perceptual Image Compression
Title: Synonymous Variational Inference for Perceptual Image Compression | Synonyme Variationsableitung für Wahrnehmungsbildkompression | 感知图像压缩的同义同义变异推理 2505.22438v1 |
Authors: Zijian Liang, Kai Niu, Changshuo Wang, Jin Xu, Ping Zhang
Recent contributions of semantic information theory reveal the set-element relationship between semantic and syntactic information, represented as synonymous relationships. In this paper, we propose a synonymous variational inference (SVI) method based on this synonymity viewpoint to re-analyze the perceptual image compression problem. It takes perceptual similarity as a typical synonymous criterion to build an ideal synonymous set (Synset), and approximates the posterior of its latent synonymous representation with a parametric density by minimizing a partial semantic KL divergence. This analysis theoretically proves that the optimization direction of perceptual image compression follows a triple tradeoff that can cover the existing rate-distortion-perception schemes. Additionally, we introduce synonymous image compression (SIC), a new image compression scheme that corresponds to the analytical process of SVI, and implement a progressive SIC codec to fully leverage the model’s capabilities. Experimental results demonstrate comparable rate-distortion-perception performance using a single progressive SIC codec, thus verifying the effectiveness of our proposed analysis method.
Article 519
Title@2025-05-28 (3): Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models
Title: Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models | Ausgelagerte Diffusionsprobenahme: Effiziente hintere Inferenz in latenten Räumen generativer Modelle | 外部外包扩散采样:在基因变异模型潜在空间中有效的后继推论 2502.06999v2 |
Authors: Siddarth Venkatraman, Mohsin Hasan, Minsu Kim, Luca Scimeca, Marcin Sendera, Yoshua Bengio, Glen Berseth, Nikolay Malkin
Any well-behaved generative model over a variable $\mathbf{x}$ can be expressed as a deterministic transformation of an exogenous (‘outsourced’) Gaussian noise variable $\mathbf{z}$: $\mathbf{x}=f_\theta(\mathbf{z})$. In such a model (e.g., a VAE, GAN, or continuous-time flow-based model), sampling of the target variable $\mathbf{x} \sim p_\theta(\mathbf{x})$ is straightforward, but sampling from a posterior distribution of the form $p(\mathbf{x}\mid\mathbf{y}) \propto p_\theta(\mathbf{x})r(\mathbf{x},\mathbf{y})$, where $r$ is a constraint function depending on an auxiliary variable $\mathbf{y}$, is generally intractable. We propose to amortize the cost of sampling from such posterior distributions with diffusion models that sample a distribution in the noise space ($\mathbf{z}$). These diffusion samplers are trained by reinforcement learning algorithms to enforce that the transformed samples $f_\theta(\mathbf{z})$ are distributed according to the posterior in the data space ($\mathbf{x}$). For many models and constraints, the posterior in noise space is smoother than in data space, making it more suitable for amortized inference. Our method enables conditional sampling under unconditional GAN, (H)VAE, and flow-based priors, comparing favorably with other inference methods. We demonstrate the proposed outsourced diffusion sampling in several experiments with large pretrained prior models: conditional image generation, reinforcement learning with human feedback, and protein structure generation.
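As a brief sketch of why sampling in noise space can work (a standard pushforward observation written in the abstract's notation, not quoted from the paper): if $\mathbf{z}\sim\mathcal{N}(\mathbf{0},\mathbf{I})$ and $\mathbf{x}=f_\theta(\mathbf{z})$, then drawing $\mathbf{z}$ from the density below and applying $f_\theta$ gives samples of $\mathbf{x}$ distributed according to $p(\mathbf{x}\mid\mathbf{y}) \propto p_\theta(\mathbf{x})r(\mathbf{x},\mathbf{y})$.

$$
p(\mathbf{z}\mid\mathbf{y}) \;\propto\; \mathcal{N}(\mathbf{z};\mathbf{0},\mathbf{I})\, r\big(f_\theta(\mathbf{z}),\mathbf{y}\big)
$$

The normalizing constant is the same in both spaces, since $\int \mathcal{N}(\mathbf{z};\mathbf{0},\mathbf{I})\, r(f_\theta(\mathbf{z}),\mathbf{y})\,d\mathbf{z} = \mathbb{E}_{\mathbf{x}\sim p_\theta}[r(\mathbf{x},\mathbf{y})]$.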
Article 520
Title@2025-05-28 (3): C-LoRA: Contextual Low-Rank Adaptation for Uncertainty Estimation in Large Language Models
Title: C-LoRA: Contextual Low-Rank Adaptation for Uncertainty Estimation in Large Language Models | C-LoRA: Kontextuelle Low-Rank-Anpassung für Unsicherheitsabschätzungen in großen Sprachmodellen | C-LORA:用于大语言模型中不确定性估算的不确定性估算的上下文性低风险适应 2505.17773v2 |
Authors: Amir Hossein Rahmati, Sanket Jantre, Weifeng Zhang, Yucheng Wang, Byung-Jun Yoon, Nathan M. Urban, Xiaoning Qian
Low-Rank Adaptation (LoRA) offers a cost-effective solution for fine-tuning large language models (LLMs), but it often produces overconfident predictions in data-scarce few-shot settings. To address this issue, several classical statistical learning approaches have been repurposed for scalable uncertainty-aware LoRA fine-tuning. However, these approaches neglect how input characteristics affect the predictive uncertainty estimates. To address this limitation, we propose Contextual Low-Rank Adaptation (C-LoRA) as a novel uncertainty-aware and parameter-efficient fine-tuning approach, by developing new lightweight LoRA modules contextualized to each input data sample to dynamically adapt uncertainty estimates. Incorporating data-driven contexts into the parameter posteriors, C-LoRA mitigates overfitting, achieves well-calibrated uncertainties, and yields robust predictions. Extensive experiments demonstrate that C-LoRA consistently outperforms the state-of-the-art uncertainty-aware LoRA methods in both uncertainty quantification and model generalization. Ablation studies further confirm the critical role of our contextual modules in capturing sample-specific uncertainties. C-LoRA sets a new standard for robust, uncertainty-aware LLM fine-tuning in few-shot regimes.
Article 521
Title@2025-05-28 (3): AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy
Title: AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy | AstroVisBench: Ein Code-Bench für wissenschaftliche Computing und Visualisierung in der Astronomie | AstroVisbench:天文科学计算和可视化标准 2505.20538v2 |
Authors: Sebastian Antony Joseph, Syed Murtaza Husain, Stella S. R. Offner, Stéphanie Juneau, Paul Torrey, Adam S. Bolton, Juan P. Farias, Niall Gaffney, Greg Durrett, Junyi Jessy Li
Large Language Models (LLMs) are being explored for applications in scientific research, including their capabilities to synthesize literature, answer research questions, generate research ideas, and even conduct computational experiments. Ultimately, our goal is for these models to help scientists derive novel scientific insights. In many areas of science, such insights often arise from processing and visualizing data to understand its patterns. However, whether an LLM-mediated scientific workflow produces outputs conveying the correct scientific insights is challenging to evaluate and has not been addressed in past work. We introduce AstroVisBench, the first benchmark for both scientific computing and visualization in the astronomy domain. AstroVisBench judges a language model’s ability to both (1) create astronomy-specific workflows to process and analyze data and (2) visualize the results of these workflows through complex plots. Our evaluation of visualizations uses a novel LLM-as-a-judge workflow, which is validated against annotation by five professional astronomers. Using AstroVisBench, we present an evaluation of state-of-the-art language models, showing a significant gap in their ability to engage in astronomy research as useful assistants. This benchmark provides a strong end-to-end evaluation for AI scientists that offers a path forward for the development of visualization-based workflows, which are central to a broad range of domains from physics to biology.
Article 522
Title@2025-05-28 (3): Scaling Reasoning without Attention
Title: Scaling Reasoning without Attention | Skalierung ohne Aufmerksamkeit | 无人注意的调整理由 2505.22425v1 |
Authors: Xueliang Zhao, Wei Wu, Lingpeng Kong
Large language models (LLMs) have made significant advances in complex reasoning tasks, yet they remain bottlenecked by two core challenges: architectural inefficiency due to reliance on Transformers, and a lack of structured fine-tuning for high-difficulty domains. We introduce an attention-free language model that addresses both issues through architectural and data-centric innovations. Built on the state space dual (SSD) layers of Mamba-2, our model eliminates the need for self-attention and key-value caching, enabling fixed-memory, constant-time inference. To train it for complex reasoning, we propose a two-phase curriculum fine-tuning strategy based on the PromptCoT synthesis paradigm, which generates pedagogically structured problems via abstract concept selection and rationale-guided generation. On benchmark evaluations, our 7B model outperforms strong Transformer and hybrid models of comparable scale, and even surpasses the much larger Gemma3-27B by 2.6% on AIME 24, 0.6% on AIME 25, and 3.0% on Livecodebench. These results highlight the potential of state space models as efficient and scalable alternatives to attention-based architectures for high-capacity reasoning.
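To make the "fixed-memory, constant-time inference" claim concrete, here is a minimal sketch of a generic diagonal linear state space recurrence. It is not Mamba-2's SSD layer; the function names and the parameters A, B, C are illustrative assumptions. The point is only that each step touches a state of fixed size, with no cache that grows with sequence length.

```python
import numpy as np

def ssm_step(h, x, A, B, C):
    """One step of a generic diagonal linear SSM (illustrative, not Mamba-2).

    h: state vector, A: per-channel decay, B: input weights, C: readout weights.
    """
    h = A * h + B * x          # elementwise state update; state size never changes
    y = float(C @ h)           # scalar readout from the current state
    return h, y

def run_ssm(xs, A, B, C):
    """Process a scalar sequence step by step, keeping only the current state."""
    h = np.zeros_like(A)
    ys = []
    for x in xs:
        h, y = ssm_step(h, x, A, B, C)
        ys.append(y)
    return np.array(ys), h
```

Per-token cost here depends on the state size only, whereas attention with a key-value cache does work proportional to the number of tokens seen so far.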
Article 523
Title@2025-05-28 (3): STaR-Bets: Sequential Target-Recalculating Bets for Tighter Confidence Intervals
Title: STaR-Bets: Sequential Target-Recalculating Bets for Tighter Confidence Intervals | StaR-Bets: Sequentielle Target-Rekalkulationswetten für engere Vertrauensintervalle | STaR-Bets: 更密切信任间隔的序列目标-计算重新计算保证 2505.22422v1 |
Authors: Václav Voráček, Francesco Orabona
The construction of confidence intervals for the mean of a bounded random variable is a classical problem in statistics with numerous applications in machine learning and virtually all scientific fields. In particular, obtaining the tightest possible confidence intervals is vital whenever sampling the random variables is expensive. The current state-of-the-art method to construct confidence intervals is to use betting algorithms. This is a very successful approach for deriving optimal confidence sequences, even matching the rate of the law of the iterated logarithm. However, in the fixed horizon setting, these approaches are either sub-optimal or based on heuristic solutions with strong empirical performance but without a finite-time guarantee. Hence, no betting-based algorithm guaranteeing the optimal $\mathcal{O}(\sqrt{\frac{\sigma^2\log\frac1\delta}{n}})$ width of the confidence intervals is known. This work bridges this gap. We propose a betting-based algorithm to compute confidence intervals that empirically outperforms the competitors. Our betting strategy uses the optimal strategy in every step (in a certain sense), whereas the standard betting methods choose a constant strategy in advance. Leveraging this fact results in strict improvements even for classical concentration inequalities, such as the ones of Hoeffding or Bernstein. Moreover, we also prove that the width of our confidence intervals is optimal up to a $1+o(1)$ factor diminishing with $n$. The code is available at https://github.com/vvoracek/STaR-bets-confidence-interval.
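For orientation, the classical baseline that betting-based methods aim to tighten can be sketched as follows. This is the textbook two-sided Hoeffding interval for samples in [0, 1], not the STaR-Bets algorithm; the function name is an illustrative choice.

```python
import math

def hoeffding_interval(samples, delta):
    """Two-sided Hoeffding confidence interval for the mean of [0, 1]-valued samples.

    Holds with probability at least 1 - delta for i.i.d. samples.
    """
    n = len(samples)
    mean = sum(samples) / n
    # Half-width sqrt(log(2/delta) / (2n)); it ignores the variance entirely.
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return max(0.0, mean - half_width), min(1.0, mean + half_width)
```

The half-width shrinks like $1/\sqrt{n}$ but does not adapt to $\sigma^2$, which is one reason variance-adaptive constructions with the $\mathcal{O}(\sqrt{\sigma^2\log(1/\delta)/n})$ width are preferred when samples are expensive.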
Article 524
Title@2025-05-28 (3): Beyond Verifiable Rewards: Scaling Reinforcement Learning for Language Models to Unverifiable Data
Title: Beyond Verifiable Rewards: Scaling Reinforcement Learning for Language Models to Unverifiable Data | Jenseits von überprüfbaren Belohnungen: Skalierung von Verstärkung Lernen für Sprachmodelle zu unüberprüfbaren Daten | 超越可核实的奖励:加强语文模式的强化学习,以获得不可核实的数据 2503.19618v2 |
Authors: Yunhao Tang, Sid Wang, Lovish Madaan, Rémi Munos
We propose to scale RL to unverifiable data with a novel algorithm, JEPO (Jensen’s Evidence lower bound Policy Optimization). While most prior efforts on scaling RL for LLMs focus on verifiable data where ground truth answers are typically short-form and can be matched easily, we investigate the case where such assumptions are less valid (e.g., when answers are long-form such as mathematical proofs). To scale RL training to unverifiable data with contemporary training constraints, we propose JEPO. JEPO applies Jensen’s evidence lower bound, a pragmatic simplification of the evidence lower bound which views chain-of-thought as a latent variable in the generative process. We show that on verifiable data (math), JEPO is as effective as RL with verifiable rewards; on semi-verifiable data (numina), JEPO improves on soft-match based evaluations compared to RL with verifiable rewards, which can only leverage a subset of the data source; finally, on unverifiable data (numina-proof), JEPO outperforms SFT and a few ablation baselines on likelihood evaluations.
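The bound behind the name can be sketched as follows. The notation is an assumption made for illustration ($x$ for the prompt, $c$ for a sampled chain-of-thought, $a$ for the reference answer, $\pi_\theta$ for the policy), and the paper's exact objective may differ; the inequality itself is just Jensen's inequality applied to the concave logarithm.

$$
\log p_\theta(a \mid x) \;=\; \log \mathbb{E}_{c \sim \pi_\theta(\cdot \mid x)}\big[\pi_\theta(a \mid x, c)\big] \;\ge\; \mathbb{E}_{c \sim \pi_\theta(\cdot \mid x)}\big[\log \pi_\theta(a \mid x, c)\big]
$$

Maximizing the right-hand side needs only the likelihood of a reference answer under the model, not a verifier that checks correctness, which is what makes it applicable to long-form answers.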
Article 525
Title@2025-05-28 (3): Mitigating Overthinking in Large Reasoning Models via Manifold Steering
Title: Mitigating Overthinking in Large Reasoning Models via Manifold Steering | Überdenken in großen Vernunftmodellen durch Manifold Steering verhindern | 通过 MManicform 指导减轻大型理性模型中的过度思考 2505.22411v1 |
Authors: Yao Huang, Huanran Chen, Shouwei Ruan, Yichi Zhang, Xingxing Wei, Yinpeng Dong
Recent advances in Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in solving complex tasks such as mathematics and coding. However, these models frequently exhibit a phenomenon known as overthinking during inference, characterized by excessive validation loops and redundant deliberation, leading to substantial computational overhead. In this paper, we aim to mitigate overthinking by investigating the underlying mechanisms from the perspective of mechanistic interpretability. We first show that the tendency to overthink can be effectively captured by a single direction in the model’s activation space and that the issue can be eased by intervening on the activations along this direction. However, this efficacy soon reaches a plateau and even deteriorates as the intervention strength increases. We therefore systematically explore the activation space and find that the overthinking phenomenon is actually tied to a low-dimensional manifold, which indicates that the limited effect stems from the noise introduced by the high-dimensional steering direction. Based on this insight, we propose Manifold Steering, a novel approach that projects the steering direction onto the low-dimensional activation manifold given the theoretical approximation of the interference noise. Extensive experiments on DeepSeek-R1 distilled models validate that our method reduces output tokens by up to 71% while maintaining and even improving the accuracy on several mathematical benchmarks. Our method also exhibits robust cross-domain transferability, delivering consistent token reduction performance in code generation and knowledge-based QA tasks. Code is available at: https://github.com/Aries-iai/Manifold_Steering.
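The general shape of direction-based steering followed by projection onto a low-dimensional subspace can be sketched as below. This is an illustrative approximation, not the authors' implementation: the mean-difference direction, the use of a top-k PCA subspace as a stand-in for the "low-dimensional manifold", and all function names are assumptions.

```python
import numpy as np

def mean_difference_direction(acts_over, acts_normal):
    """Unit vector from the mean 'normal' activation to the mean 'overthinking' one."""
    d = acts_over.mean(axis=0) - acts_normal.mean(axis=0)
    return d / np.linalg.norm(d)

def project_onto_subspace(direction, acts, k):
    """Project a direction onto the top-k principal subspace of the activations."""
    centered = acts - acts.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                             # shape (k, d)
    proj = basis.T @ (basis @ direction)       # component inside the subspace
    return proj / np.linalg.norm(proj)

def steer(h, direction, alpha):
    """Remove alpha times the component of h along a unit direction."""
    coeff = h @ direction
    return h - alpha * np.multiply.outer(coeff, direction)
```

With `alpha = 1.0` the steered activation has no remaining component along the chosen direction; smaller values give a partial intervention.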
Article 526
Title@2025-05-28 (3): Decoupled Subgraph Federated Learning
Title: Decoupled Subgraph Federated Learning | Entkoppelter Subgraph Federated Learning | 分校分科分科分科分科 2402.19163v3 |
Authors: Javad Aliakbari, Johan Östman, Alexandre Graell i Amat
We address the challenge of federated learning on graph-structured data distributed across multiple clients. Specifically, we focus on the prevalent scenario of interconnected subgraphs, where interconnections between different clients play a critical role. We present a novel framework for this scenario, named FedStruct, that harnesses deep structural dependencies. To uphold privacy, unlike existing methods, FedStruct eliminates the necessity of sharing or generating sensitive node features or embeddings among clients. Instead, it leverages explicit global graph structure information to capture inter-node dependencies. We validate the effectiveness of FedStruct through experimental results conducted on six datasets for semi-supervised node classification, showcasing performance close to the centralized approach across various scenarios, including different data partitioning methods, varying levels of label availability, and number of clients.
Article 527
Title@2025-05-28 (3): Beyond External Monitors: Enhancing Transparency of Large Language Models for Easier Monitoring
Title: Beyond External Monitors: Enhancing Transparency of Large Language Models for Easier Monitoring | Jenseits von externen Monitoren: Verbesserung der Transparenz von großen Sprachmodellen für eine einfachere Überwachung | 外部监测之外的外部监测:提高大语言模型的透明度,促进更易监测 2502.05242v2 |
Authors: Guanxu Chen, Dongrui Liu, Tao Luo, Lijie Hu, Jing Shao
Large language models (LLMs) are becoming increasingly capable, but the mechanisms of their thinking and decision-making process remain unclear. Chain-of-thoughts (CoTs) have been commonly utilized to monitor LLMs, but this strategy fails to accurately reflect LLMs’ thinking process. Techniques based on LLMs’ hidden representations provide an inner perspective to monitor their latent thinking. However, previous methods only try to develop external monitors instead of making LLMs themselves easier to monitor. In this paper, we propose a novel method, TELLME, that improves the transparency of LLMs and helps monitors identify unsuitable and sensitive behaviors. Furthermore, we showcase the applications of TELLME on trustworthiness tasks (e.g., safety risk monitoring and detoxification tasks), where LLMs achieve consistent improvement in transparency and task performance. More crucially, we theoretically analyze the improvement of TELLME on LLMs’ generalization ability through optimal transport theory.
Article 528
Title@2025-05-28 (3): BILBO: BILevel Bayesian Optimization
Title: BILBO: BILevel Bayesian Optimization | BILBO: BILevel Bayesian Optimierung | BILBO: BI级巴耶斯最佳优化 2502.02121v2 |
Authors: Ruth Wan Theng Chew, Quoc Phong Nguyen, Bryan Kian Hsiang Low
Bilevel optimization is characterized by a two-level optimization structure, where the upper-level problem is constrained by optimal lower-level solutions, and such structures are prevalent in real-world problems. The constraint by optimal lower-level solutions poses significant challenges, especially in noisy, constrained, and derivative-free settings, as repeating lower-level optimizations is sample inefficient and predicted lower-level solutions may be suboptimal. We present BILevel Bayesian Optimization (BILBO), a novel Bayesian optimization algorithm for general bilevel problems with blackbox functions, which optimizes both upper- and lower-level problems simultaneously, without the repeated lower-level optimization required by existing methods. BILBO samples from confidence-bound-based trusted sets, which bound the suboptimality on the lower level. Moreover, BILBO selects only one function query per iteration, where the function query selection strategy incorporates the uncertainty of estimated lower-level solutions and includes a conditional reassignment of the query to encourage exploration of the lower-level objective. The performance of BILBO is theoretically guaranteed with a sublinear regret bound for commonly used kernels and is empirically evaluated on several synthetic and real-world problems.
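For readers new to the setting, a generic bilevel problem can be written as below. The symbols ($F$ for the upper-level objective, $f$ for the lower-level objective, $x$ and $z$ for the upper- and lower-level variables) are assumed here for illustration and may not match the paper's notation; additional constraints on either level are omitted.

$$
\min_{x \in \mathcal{X}} \; F\big(x, z^*(x)\big) \quad \text{subject to} \quad z^*(x) \in \arg\min_{z \in \mathcal{Z}} f(x, z)
$$

Evaluating the upper level at a single $x$ naively requires solving the inner minimization to optimality, which is the repeated lower-level optimization the abstract describes as sample inefficient.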
Article 529
Title@2025-05-28 (3): Simultaneously Solving FBSDEs and their Associated Semilinear Elliptic PDEs with Small Neural Operators
Title: Simultaneously Solving FBSDEs and their Associated Semilinear Elliptic PDEs with Small Neural Operators | Gleichzeitige Lösung von FBSDs und ihren zugehörigen semilinearen elliptischen PDEs mit kleinen neuralen Operatoren | 与小型神经操作器同时解决FBSDEs及其相关半线性椭圆形粒体 2410.14788v2 |
Authors: Takashi Furuya, Anastasis Kratsios
Forward-backward stochastic differential equations (FBSDEs) play an important role in optimal control, game theory, economics, mathematical finance, and reinforcement learning. Unfortunately, the available FBSDE solvers operate on individual FBSDEs, meaning that they cannot provide a computationally feasible strategy for solving large families of FBSDEs, as these solvers must be re-run several times. Neural operators (NOs) offer an alternative approach for simultaneously solving large families of decoupled FBSDEs by directly approximating the solution operator mapping inputs (terminal conditions and dynamics of the backward process) to outputs (solutions to the associated FBSDE). Though universal approximation theorems (UATs) guarantee the existence of such NOs, these NOs are unrealistically large. Upon making only a few simple theoretically-guided tweaks to the standard convolutional NO build, we confirm that "small" NOs can uniformly approximate the solution operator to structured families of FBSDEs with random terminal time, uniformly on suitable compact sets determined by Sobolev norms, using a logarithmic depth, a constant width, and a polynomial rank in the reciprocal approximation error. This result is rooted in our second result, and main contribution to the NOs-for-PDEs literature, which shows that our convolutional NOs have similar depth and width but grow only quadratically (at a dimension-free rate) when uniformly approximating the solution operator of the class of semilinear elliptic PDEs associated with these families of FBSDEs. A key insight we uncover into how NOs work is that the convolutional layers of our NO can approximately implement the fixed point iteration used to prove the existence of a unique solution to these semilinear elliptic PDEs.
Article 530
Title@2025-05-28 (3): Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
Title: Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing | Inferenz-Time Scaling für Flow-Modelle über stochastische Generation und Rollover Budget Forcing | 通过存储器生成和滚转预算推力对流动模型的推推时间调整 2503.19385v4 |
Authors: Jaihoon Kim, Taehoon Yoon, Jisung Hwang, Minhyuk Sung
We propose an inference-time scaling approach for pretrained flow models. Recently, inference-time scaling has gained significant attention in LLMs and diffusion models, improving sample quality or better aligning outputs with user preferences by leveraging additional computation. For diffusion models, particle sampling has allowed more efficient scaling due to the stochasticity at intermediate denoising steps. In contrast, while flow models have gained popularity as an alternative to diffusion models, offering faster generation and high-quality outputs in state-of-the-art image and video generative models, the efficient inference-time scaling methods used for diffusion models cannot be directly applied due to their deterministic generative process. To enable efficient inference-time scaling for flow models, we propose three key ideas: 1) SDE-based generation, enabling particle sampling in flow models, 2) Interpolant conversion, broadening the search space and enhancing sample diversity, and 3) Rollover Budget Forcing (RBF), an adaptive allocation of computational resources across timesteps to maximize budget utilization. Our experiments show that SDE-based generation, particularly variance-preserving (VP) interpolant-based generation, improves the performance of particle sampling methods for inference-time scaling in flow models. Additionally, we demonstrate that RBF with VP-SDE achieves the best performance, outperforming all previous inference-time scaling approaches.
Article 531
Title@2025-05-28 (3): Physics-Informed Distillation of Diffusion Models for PDE-Constrained Generation
Title: Physics-Informed Distillation of Diffusion Models for PDE-Constrained Generation | Physik-informierte Destillation von Diffusionsmodellen für PDE-kontrainierte Generation | PDE - 受培训的一代的传播模型的物理改造 2505.22391v1 |
Authors: Yi Zhang, Difan Zou
Modeling physical systems in a generative manner offers several advantages, including the ability to handle partial observations, generate diverse solutions, and address both forward and inverse problems. Recently, diffusion models have gained increasing attention in the modeling of physical systems, particularly those governed by partial differential equations (PDEs). However, diffusion models only access noisy data $\boldsymbol{x}_t$ at intermediate steps, making it infeasible to directly enforce constraints on the clean sample $\boldsymbol{x}_0$ at each noise level. As a workaround, constraints are typically applied to the expectation of clean samples $\mathbb{E}[\boldsymbol{x}_0 | \boldsymbol{x}_t]$, which is estimated using the learned score network. However, imposing PDE constraints on the expectation is not equivalent to imposing them on the true clean data, a discrepancy known as Jensen’s Gap. This gap creates a trade-off: enforcing PDE constraints may come at the cost of reduced accuracy in generative modeling. To address this, we propose a simple yet effective post-hoc distillation approach, where PDE constraints are not injected directly into the diffusion process, but instead enforced during a post-hoc distillation stage. We term our method Physics-Informed Distillation of Diffusion Models (PIDDM). This distillation not only facilitates single-step generation with improved PDE satisfaction, but also supports both forward and inverse problem solving and reconstruction from random partial observations. Extensive experiments across various PDE benchmarks demonstrate that PIDDM significantly improves PDE satisfaction over several recent and competitive baselines, such as PIDM, DiffusionPDE, and ECI-sampling, with less computation overhead. Our approach can shed light on more efficient and effective strategies for incorporating physical constraints into diffusion models.
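The Jensen's Gap described above can be seen in a toy example. The constraint below is a hypothetical stand-in for a PDE residual, chosen only because it is nonlinear: every sample satisfies it exactly, yet the mean of the samples does not.

```python
import numpy as np

def constraint_residual(x):
    """Toy nonlinear constraint g(x) = x^2 - 1, which is zero only at x = +1 or x = -1."""
    return x ** 2 - 1.0

# Four "clean samples", each of which satisfies the constraint exactly.
samples = np.array([-1.0, 1.0, -1.0, 1.0])

residual_of_samples = constraint_residual(samples)       # residual at each sample
residual_of_mean = constraint_residual(samples.mean())   # residual at the average
```

Here `residual_of_samples` is zero everywhere while `residual_of_mean` equals -1, so driving the residual of $\mathbb{E}[\boldsymbol{x}_0 | \boldsymbol{x}_t]$ to zero is a different requirement from having each clean sample satisfy the constraint.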
Article 532
Title@2025-05-28 (3): Revisiting Feature Interactions from the Perspective of Quadratic Neural Networks for Click-through Rate Prediction
Title: Revisiting Feature Interactions from the Perspective of Quadratic Neural Networks for Click-through Rate Prediction | Überprüfung von Feature-Interaktionen aus der Perspektive quadratischer neuraler Netzwerke für Click-through-Rate-Vorhersage | 从 “ 点击通速率预测 “ 四方神经网络的角度重新审视地貌相互作用 2505.17999v2 |
Authors: Honghao Li, Yiwen Zhang, Yi Zhang, Lei Sang, Jieming Zhu
The Hadamard Product (HP) has long been a cornerstone in click-through rate (CTR) prediction tasks due to its simplicity, effectiveness, and ability to capture feature interactions without additional parameters. However, the underlying reasons for its effectiveness remain unclear. In this paper, we revisit HP from the perspective of Quadratic Neural Networks (QNN), which leverage quadratic interaction terms to model complex feature relationships. We further reveal QNN’s ability to expand the feature space and provide smooth nonlinear approximations without relying on activation functions. Meanwhile, we find that traditional post-activation does not further improve the performance of the QNN. Instead, mid-activation is a more suitable alternative. Through theoretical analysis and empirical evaluation of 25 QNN neuron formats, we identify a well-performing variant and make further enhancements to it. Specifically, we propose the Multi-Head Khatri-Rao Product as a superior alternative to HP and a Self-Ensemble Loss with dynamic ensemble capability within the same network to enhance computational efficiency and performance. Ultimately, we propose a novel neuron format, QNN-alpha, which is tailored for CTR prediction tasks. Experimental results show that QNN-alpha achieves new state-of-the-art performance on six public datasets while maintaining low inference latency, good scalability, and excellent compatibility. The code, running logs, and detailed hyperparameter configurations are available at: https://github.com/salmon1802/QNN.
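A minimal sketch of the two ingredients named above, under stated assumptions: the Hadamard product of two feature embeddings is parameter-free, and one common quadratic neuron form multiplies two linear projections elementwise. The specific form $(W_1 x) \odot (W_2 x)$ and the function names are illustrative assumptions; the paper compares 25 neuron formats and this is not claimed to be QNN-alpha.

```python
import numpy as np

def hadamard_interaction(e_i, e_j):
    """Elementwise product of two embeddings; introduces no learnable parameters."""
    return e_i * e_j

def quadratic_neuron(x, W1, W2):
    """One assumed quadratic neuron form: (W1 x) * (W2 x), elementwise.

    Each output is a quadratic function of x, so no activation function
    is needed to obtain nonlinearity.
    """
    return (W1 @ x) * (W2 @ x)
```

Summing the Hadamard interaction over the embedding dimension recovers the inner product used in factorization machines, which is one way to see it as a second-order feature interaction.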
Article 533
Title@2025-05-28 (3): DAM: Domain-Aware Module for Multi-Domain Dataset Condensation
Title: DAM: Domain-Aware Module for Multi-Domain Dataset Condensation | DAM: Domain-Aware-Modul für Multi-Domain-Datensatz-Kondensation | DAM: 多域数据集集中的域- 软件模块 2505.22387v1 |
Authors: Jaehyun Choi, Gyojin Han, Dong-Jae Lee, Sunghyun Baek, Junmo Kim
Dataset Condensation (DC) has emerged as a promising solution to mitigate the computational and storage burdens associated with training deep learning models. However, existing DC methods largely overlook the multi-domain nature of modern datasets, which are increasingly composed of heterogeneous images spanning multiple domains. In this paper, we extend DC and introduce Multi-Domain Dataset Condensation (MDDC), which aims to condense data that generalizes across both single-domain and multi-domain settings. To this end, we propose the Domain-Aware Module (DAM), a training-time module that embeds domain-related features into each synthetic image via learnable spatial masks. As explicit domain labels are mostly unavailable in real-world datasets, we employ frequency-based pseudo-domain labeling, which leverages low-frequency amplitude statistics. DAM is only active during the condensation process, thus keeping the same images per class (IPC) as prior methods. Experiments show that DAM consistently improves in-domain, out-of-domain, and cross-architecture performance over baseline dataset condensation methods.
Article 534
Title@2025-05-28 (3): When do neural networks learn world models?
Title: When do neural networks learn world models? | Wann lernen neuronale Netzwerke Weltmodelle? | 神经网络何时学习世界模型? 2502.09297v3 |
Authors: Tianren Zhang, Guanyu Chen, Feng Chen
Humans develop world models that capture the underlying generation process of data. Whether neural networks can learn similar world models remains an open problem. In this work, we present the first theoretical results for this problem, showing that in a multi-task setting, models with a low-degree bias provably recover latent data-generating variables under mild assumptions – even if proxy tasks involve complex, non-linear functions of the latents. However, such recovery is sensitive to model architecture. Our analysis leverages Boolean models of task solutions via the Fourier-Walsh transform and introduces new techniques for analyzing invertible Boolean transforms, which may be of independent interest. We illustrate the algorithmic implications of our results and connect them to related research areas, including self-supervised learning, out-of-distribution generalization, and the linear representation hypothesis in large language models.
Article 535
Title@2025-05-28 (3): Infinite-dimensional Mahalanobis Distance with Applications to Kernelized Novelty Detection
Title: Infinite-dimensional Mahalanobis Distance with Applications to Kernelized Novelty Detection | Infinite-dimensionale Mahalanobis-Distanz mit Anwendungen zur kernisierten Neuheitserkennung | 无限的马哈拉诺比斯距离,应用内核新闻探测技术 2407.11873v2 |
Authors: Nikita Zozoulenko, Thomas Cass, Lukas Gonon
The Mahalanobis distance is a classical tool used to measure the covariance-adjusted distance between points in $\mathbb{R}^d$. In this work, we extend the concept of Mahalanobis distance to separable Banach spaces by reinterpreting it as a Cameron-Martin norm associated with a probability measure. This approach leads to a basis-free, data-driven notion of anomaly distance through the so-called variance norm, which can naturally be estimated using empirical measures of a sample. Our framework generalizes the classical $\mathbb{R}^d$, functional $(L^2[0,1])^d$, and kernelized settings; importantly, it incorporates non-injective covariance operators. We prove that the variance norm is invariant under invertible bounded linear transformations of the data, extending previous results which are limited to unitary operators. In the Hilbert space setting, we connect the variance norm to the RKHS of the covariance operator and establish consistency and convergence results for estimation using empirical measures. Using the variance norm, we introduce the notion of a kernelized nearest-neighbour Mahalanobis distance. In an empirical study on 12 real-world data sets, we demonstrate that the kernelized nearest-neighbour Mahalanobis distance outperforms the traditional kernelized Mahalanobis distance for multivariate time series novelty detection, using state-of-the-art time series kernels such as the signature, global alignment, and Volterra reservoir kernels.
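The finite-dimensional case that the paper generalizes can be sketched as follows. This is the classical Mahalanobis distance in $\mathbb{R}^d$ estimated from a sample, under the assumption that the sample covariance is invertible; it is not the paper's variance norm for Banach spaces, and the function name is illustrative.

```python
import numpy as np

def mahalanobis(x, data):
    """Classical Mahalanobis distance from x to the sample in `data` (rows = points).

    Assumes the sample covariance matrix is invertible.
    """
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    diff = x - mu
    # Solve cov @ v = diff instead of forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```

Applying the same invertible linear map to both the data and the query point leaves this value unchanged, which is the finite-dimensional analogue of the invariance property stated in the abstract.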
Article 536
Title@2025-05-28 (3): Overcoming Dimensional Factorization Limits in Discrete Diffusion Models through Quantum Joint Distribution Learning
Title: Overcoming Dimensional Factorization Limits in Discrete Diffusion Models through Quantum Joint Distribution Learning | Überwindung von Dimensional Factorization Limits in diskreten Diffusionsmodellen durch Quantum Joint Distribution Learning | 通过量子联合分发学习克服分辨传播模式中的分量限制 2505.05151v2 |
Authors: Chuangtao Chen, Qinglin Zhao, MengChu Zhou, Zhimin He, Haozhen Situ
This study explores quantum-enhanced discrete diffusion models to overcome classical limitations in learning high-dimensional distributions. We rigorously prove that classical discrete diffusion models, which calculate per-dimension transition probabilities to avoid exponential computational cost, exhibit worst-case linear scaling of Kullback-Leibler (KL) divergence with data dimension. To address this, we propose a Quantum Discrete Denoising Diffusion Probabilistic Model (QD3PM), which enables joint probability learning through diffusion and denoising in exponentially large Hilbert spaces. By deriving posterior states through quantum Bayes’ theorem, similar to the crucial role of posterior probabilities in classical diffusion models, and by learning the joint probability, we establish a solid theoretical foundation for quantum-enhanced diffusion models. For denoising, we design a quantum circuit using temporal information for parameter sharing and learnable classical-data-controlled rotations for encoding. Exploiting joint distribution learning, our approach enables single-step sampling from pure noise, eliminating iterative requirements of existing models. Simulations demonstrate the proposed model’s superior accuracy in modeling complex distributions compared to factorization methods. Hence, this paper establishes a new theoretical paradigm in generative models by leveraging the quantum advantage in joint distribution learning.
Article 537
Title@2025-05-28 (3): A Divide-and-Conquer Approach for Modeling Arrival Times in Business Process Simulation
Title: A Divide-and-Conquer Approach for Modeling Arrival Times in Business Process Simulation | Ein Divide-and-Conquer-Ansatz für die Modellierung von Ankunftszeiten in der Business Process Simulation | 在模拟商业进程中模拟抵达时 2505.22381v1 |
Authors: Lukas Kirchdorfer, Konrad Özdemir, Stjepan Kusenic, Han van der Aa, Heiner Stuckenschmidt
Business Process Simulation (BPS) is a critical tool for analyzing and improving organizational processes by estimating the impact of process changes. A key component of BPS is the case-arrival model, which determines the pattern of new case entries into a process. Although accurate case-arrival modeling is essential for reliable simulations, as it influences waiting and overall cycle times, existing approaches often rely on oversimplified static distributions of inter-arrival times. These approaches fail to capture the dynamic and temporal complexities inherent in organizational environments, leading to less accurate and reliable outcomes. To address this limitation, we propose Auto Time Kernel Density Estimation (AT-KDE), a divide-and-conquer approach that models arrival times of processes by incorporating global dynamics, day-of-week variations, and intraday distributional changes, ensuring both precision and scalability. Experiments conducted across 20 diverse processes demonstrate that AT-KDE is far more accurate and robust than existing approaches while maintaining sensible execution time efficiency.
Article 538
Title@2025-05-28 (3): Smooth Sailing: Lipschitz-Driven Uncertainty Quantification for Spatial Association
Title: Smooth Sailing: Lipschitz-Driven Uncertainty Quantification for Spatial Association | Smooth Sailing: Lipschitz-Driven Uncertainty Quantification for Spatial Association | Lipschitz-Driven 不确定性为空间协会量化 2502.06067v2 |
Authors: David R. Burt, Renato Berlinghieri, Stephen Bates, Tamara Broderick
Estimating associations between spatial covariates and responses - rather than merely predicting responses - is central to environmental science, epidemiology, and economics. For instance, public health officials might be interested in whether air pollution has a strictly positive association with a health outcome, and the magnitude of any effect. Standard machine learning methods often provide accurate predictions but offer limited insight into covariate-response relationships. We also show that existing methods for constructing confidence (or credible) intervals for associations fail to provide nominal coverage in the face of model misspecification and distribution shift - despite both being essentially always present in spatial problems. We introduce a method that constructs valid frequentist confidence intervals for associations in spatial settings. Our method requires minimal assumptions beyond a form of spatial smoothness. In particular, we do not require model correctness or covariate overlap between training and target locations. Our approach is the first to guarantee nominal coverage in this setting and outperforms existing techniques in both real and simulated experiments.
Article 539
Title@2025-05-28 (3): Memento No More: Coaching AI Agents to Master Multiple Tasks via Hints Internalization
Title: Memento No More: Coaching AI Agents to Master Multiple Tasks via Hints Internalization | Memento No More: Coaching von KI-Agenten zu Master mehrere Aufgaben durch Hinweise Internalisierung | 不再纪念:通过Hints内部化,指导AI代理人员掌握多项任务 2502.01562v2 |
Authors: Minttu Alakuijala, Ya Gao, Georgy Ananov, Samuel Kaski, Pekka Marttinen, Alexander Ilin, Harri Valpola
As the general capabilities of artificial intelligence (AI) agents continue to evolve, their ability to learn to master multiple complex tasks through experience remains a key challenge. Current LLM agents, particularly those based on proprietary language models, typically rely on prompts to incorporate knowledge about the target tasks. This approach does not allow the agent to internalize this information and instead relies on ever-expanding prompts to sustain its functionality in diverse scenarios. This resembles a system of notes used by a person affected by anterograde amnesia, the inability to form new memories. In this paper, we propose a novel method to train AI agents to incorporate knowledge and skills for multiple tasks without the need for either cumbersome note systems or prior high-quality demonstration data. Our approach employs an iterative process where the agent collects new experiences, receives corrective feedback from humans in the form of hints, and integrates this feedback into its weights via a context distillation training procedure. We demonstrate the efficacy of our approach by implementing it in a Llama-3-based agent that, after only a few rounds of feedback, outperforms advanced models GPT-4o and DeepSeek-V3 in tasksets requiring correct sequencing of information retrieval, tool use, and question answering.
Article 540
Title@2025-05-28 (3): Update Your Transformer to the Latest Release: Re-Basin of Task Vectors
Title: Update Your Transformer to the Latest Release: Re-Basin of Task Vectors | Aktualisieren Sie Ihren Transformer auf die neueste Version: Re-Basin der Task-Vektoren | 将您的变换器更新为最新版本: 任务矢量的重新 Basin 2505.22697v1 |
Authors: Filippo Rinaldi, Giacomo Capitani, Lorenzo Bonicelli, Donato Crisostomi, Federico Bolelli, Elisa Ficarra, Emanuele Rodolà, Simone Calderara, Angelo Porrello
Foundation models serve as the backbone for numerous specialized models developed through fine-tuning. However, when the underlying pretrained model is updated or retrained (e.g., on larger and more curated datasets), the fine-tuned model becomes obsolete, losing its utility and requiring retraining. This raises the question: is it possible to transfer fine-tuning to a new release of the model? In this work, we investigate how to transfer fine-tuning to a new checkpoint without having to re-train, in a data-free manner. To do so, we draw principles from model re-basin and provide a recipe based on weight permutations to re-base the modifications made to the original base model, often called the task vector. In particular, our approach tailors model re-basin for Transformer models, taking into account the challenges of residual connections and multi-head attention layers. Specifically, we propose a two-level method rooted in spectral theory, initially permuting the attention heads and subsequently adjusting parameters within select pairs of heads. Through extensive experiments on visual and textual tasks, we achieve the seamless transfer of fine-tuned knowledge to new pre-trained backbones without relying on a single training step or datapoint. Code is available at https://github.com/aimagelab/TransFusion.
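To make the term "task vector" concrete, here is a minimal sketch that treats a model as a dictionary of NumPy arrays. The helper names `task_vector` and `apply_task_vector` are hypothetical, and the direct addition shown is only the naive baseline: the permutation-based re-basin step described in the abstract is not implemented here.

```python
import numpy as np

def task_vector(base, finetuned):
    """Per-parameter difference between fine-tuned and base weights."""
    return {name: finetuned[name] - base[name] for name in base}

def apply_task_vector(new_base, tau, scale=1.0):
    """Naive transfer: add the task vector to a new checkpoint.

    Assumes the new checkpoint has identical parameter names and shapes;
    no weight permutation or head alignment is performed.
    """
    return {name: new_base[name] + scale * tau[name] for name in new_base}

# Usage with toy one-parameter "models".
base = {"w": np.zeros(2)}
finetuned = {"w": np.array([1.0, 2.0])}
new_base = {"w": np.array([10.0, 10.0])}
tau = task_vector(base, finetuned)
transferred = apply_task_vector(new_base, tau)
```

Adding `tau` directly presumes that the old and new checkpoints organise their hidden units in the same way, which is the assumption a permutation-based method is meant to relax.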
Article 541
Title@2025-05-28 (3): An Empirical Evaluation of Rewiring Approaches in Graph Neural Networks
Title: An Empirical Evaluation of Rewiring Approaches in Graph Neural Networks | Eine empirische Bewertung der Verdrahtungsansätze in Graphen-Neuralen Netzwerken | 对图形神经网络重新布线方法的经验评价 2305.19717v2 |
Authors: Alessio Micheli, Domenico Tortorella
Graph neural networks compute node representations by performing multiple message-passing steps that consist in local aggregations of node features. Having deep models that can leverage longer-range interactions between nodes is hindered by the issues of over-smoothing and over-squashing. In particular, the latter is attributed to the graph topology which guides the message-passing, causing a node representation to become insensitive to information contained at distant nodes. Many graph rewiring methods have been proposed to remedy or mitigate this problem. However, properly evaluating the benefits of these methods is made difficult by the coupling of over-squashing with other issues strictly related to model training, such as vanishing gradients. Therefore, we propose an evaluation setting based on message-passing models that do not require training to compute node and graph representations. We perform a systematic experimental comparison on real-world node and graph classification tasks, showing that rewiring the underlying graph rarely confers a practical benefit for message-passing.
Article 542
Title@2025-05-28 (3): Topological Eigenvalue Theorems for Tensor Analysis in Multi-Modal Data Fusion
Title: Topological Eigenvalue Theorems for Tensor Analysis in Multi-Modal Data Fusion | Topologische Eigenwert-Theoreme für die Tensoranalyse in multi-Modal Data Fusion | 多模式数据融合中用于天线分析的多模式数据融合中的表光分析的表性地球价值地形学理论论 2409.09392v3 |
Authors: Ronald Katende
This paper presents a novel framework for tensor eigenvalue analysis in the context of multi-modal data fusion, leveraging topological invariants such as Betti numbers. Traditional approaches to tensor eigenvalue analysis often extend matrix theory, whereas this work introduces a topological perspective to enhance the understanding of tensor structures. By establishing new theorems that link eigenvalues to topological features, the proposed framework provides deeper insights into the latent structure of data, improving both interpretability and robustness. Applications in data fusion demonstrate the theoretical and practical significance of this approach, with potential for broad impact in machine learning and data science.
Article 543
Title@2025-05-28 (3): Computing Optimal Transport Maps and Wasserstein Barycenters Using Conditional Normalizing Flows
Title: Computing Optimal Transport Maps and Wasserstein Barycenters Using Conditional Normalizing Flows | Computing Optimal Transport Maps und Wasserstein Barycenter mit bedingten Normalisierungsflüssen | 使用条件性正常流动的最佳运输地图和瓦塞尔斯坦百分点 2505.22364v1 |
Authors: Gabriele Visentin, Patrick Cheridito
We present a novel method for efficiently computing optimal transport maps and Wasserstein barycenters in high-dimensional spaces. Our approach uses conditional normalizing flows to approximate the input distributions as invertible pushforward transformations from a common latent space. This makes it possible to directly solve the primal problem using gradient-based minimization of the transport cost, unlike previous methods that rely on dual formulations and complex adversarial optimization. We show how this approach can be extended to compute Wasserstein barycenters by solving a conditional variance minimization problem. A key advantage of our conditional architecture is that it enables the computation of barycenters for hundreds of input distributions, which was computationally infeasible with previous methods. Our numerical experiments illustrate that our approach yields accurate results across various high-dimensional tasks and compares favorably with previous state-of-the-art methods.
Article 544
Title@2025-05-28 (3): Directed Homophily-Aware Graph Neural Network
Title: Directed Homophily-Aware Graph Neural Network | Regie führte homophily-aware Graph Neural Network | 直导光电图神经网络 2505.22362v1 |
Authors: Aihu Zhang, Jiaxing Xu, Mengcheng Lan, Shili Xiang, Yiping Ke
Graph Neural Networks (GNNs) have achieved significant success in various learning tasks on graph-structured data. Nevertheless, most GNNs struggle to generalize to heterophilic neighborhoods. Additionally, many GNNs ignore the directional nature of real-world graphs, resulting in suboptimal performance on directed graphs with asymmetric structures. In this work, we propose Directed Homophily-aware Graph Neural Network (DHGNN), a novel framework that addresses these limitations by incorporating homophily-aware and direction-sensitive components. DHGNN employs a resettable gating mechanism to adaptively modulate message contributions based on homophily levels and informativeness, and a structure-aware noise-tolerant fusion module to effectively integrate node representations from the original and reverse directions. Extensive experiments on both homophilic and heterophilic directed graph datasets demonstrate that DHGNN outperforms state-of-the-art methods in node classification and link prediction. In particular, DHGNN improves over the best baseline by up to 15.07% in link prediction. Our analysis further shows that the gating mechanism captures directional homophily gaps and fluctuating homophily across layers, providing deeper insights into message-passing behavior on complex graph structures.
Article 545
Title@2025-05-28 (3): Continuum-armed Bandit Optimization with Batch Pairwise Comparison Oracles
Title: Continuum-armed Bandit Optimization with Batch Pairwise Comparison Oracles | Kontinuierliche Bandit-Optimierung mit Batch Pairwise Vergleich Oracles | 以批次对称比较甲骨文优化利用批次对称比较 2505.22361v1 |
Authors: Xiangyu Chang, Xi Chen, Yining Wang, Zhiyi Zeng
This paper studies a bandit optimization problem where the goal is to maximize a function $f(x)$ over $T$ periods for some unknown strongly concave function $f$. We consider a new pairwise comparison oracle, where the decision-maker chooses a pair of actions $(x, x')$ for a consecutive number of periods and then obtains an estimate of $f(x)-f(x')$. We show that such a pairwise comparison oracle finds important applications to joint pricing and inventory replenishment problems and network revenue management. The challenge in this bandit optimization is twofold. First, the decision-maker not only needs to determine a pair of actions $(x, x')$ but also a stopping time $n$ (i.e., the number of queries based on $(x, x')$). Second, motivated by our inventory application, the estimate of the difference $f(x)-f(x')$ is biased, which is different from existing oracles in the stochastic optimization literature. To address these challenges, we first introduce a discretization technique and local polynomial approximation to relate this problem to linear bandits. We then develop a tournament successive elimination technique to localize the discretized cell and run an interactive batched version of the LinUCB algorithm on cells. We establish regret bounds that are optimal up to poly-logarithmic factors. Furthermore, we apply our proposed algorithm and analytical framework to the two operations management problems and obtain results that improve state-of-the-art results in the existing literature.
Article 546
Title@2025-05-28 (3): Multiclass Loss Geometry Matters for Generalization of Gradient Descent in Separable Classification
Title: Multiclass Loss Geometry Matters for Generalization of Gradient Descent in Separable Classification | Multiclass Loss Geometry Matters for Generalization of Gradient Descent in Separable Classification | 多级损失 多级损失 多级 损 失 多级 分 分 分 分 化 中 梯 源 普遍化的多级 几何事项 2505.22359v1 |
Authors: Matan Schliserman, Tomer Koren
We study the generalization performance of unregularized gradient methods for separable linear classification. While previous work mostly deals with the binary case, we focus on the multiclass setting with $k$ classes and establish novel population risk bounds for Gradient Descent for loss functions that decay to zero. In this setting, we show risk bounds that reveal that convergence rates are crucially influenced by the geometry of the loss template, as formalized by Wang and Scott (2024), rather than of the loss function itself. Particularly, we establish risk upper bounds that hold for any decay rate of the loss whose template is smooth with respect to the $p$-norm. In the case of exponentially decaying losses, our results indicate a contrast between the $p=\infty$ case, where the risk exhibits a logarithmic dependence on $k$, and $p=2$ where the risk scales linearly with $k$. To establish this separation formally, we also prove a lower bound in the latter scenario, demonstrating that the polynomial dependence on $k$ is unavoidable. Central to our analysis is a novel bound on the Rademacher complexity of low-noise vector-valued linear predictors with a loss template smooth w.r.t. general $p$-norms.
Article 547
Title@2025-05-28 (3): Budget-Adaptive Adapter Tuning in Orthogonal Subspaces for Continual Learning in LLMs
Title: Budget-Adaptive Adapter Tuning in Orthogonal Subspaces for Continual Learning in LLMs | Budget-Adaptive Adapter Tuning in Orthogonal Subspaces für kontinuierliches Lernen in LLMs | 用于LLMM中持续学习的正方形子空间的预算-ADA 预算-ADA 调适器图案 2505.22358v1 |
Authors: Zhiyi Wan, Wanrou Du, Liang Li, Miao Pan, Xiaoqi Qin
Large language models (LLMs) often suffer from catastrophic forgetting in continual learning (CL) scenarios, where performance on previously learned tasks degrades severely while training on sequentially arriving tasks. Although pioneering CL approaches using orthogonal subspaces can mitigate task interference, they typically employ fixed budget allocation, neglecting the varying complexity across tasks and layers. Besides, recent budget-adaptive tuning methods for LLMs often adopt multi-stage paradigms that decouple optimization and budget allocation. Such decoupling results in potential misalignment, which hinders those approaches’ practical application in CL scenarios. To address these limitations, we propose OA-Adapter, a novel parameter-efficient approach for continual learning in LLMs that unifies dynamic budget adaptation with orthogonal subspace learning in a single end-to-end training stage. Specifically, OA-Adapter introduces a dynamic bottleneck dimension adaptation mechanism that simultaneously allocates an efficient parameter budget and optimizes task objectives without misalignment. To effectively preserve previously acquired knowledge while coordinating with the dynamic budget allocation, orthogonal constraints are applied specifically between the parameter subspace of the current task and the dynamically allocated parameter subspaces of historical tasks. Experimental results on continual learning benchmarks demonstrate that OA-Adapter outperforms state-of-the-art methods in both accuracy and parameter efficiency, achieving higher average accuracy while using 58.5% fewer parameters on the standard CL benchmark.
Article 548
Title@2025-05-28 (3): Suitability Filter: A Statistical Framework for Classifier Evaluation in Real-World Deployment Settings
Title: Suitability Filter: A Statistical Framework for Classifier Evaluation in Real-World Deployment Settings | Eignungsfilter: Ein statistisches Rahmenwerk für die Klassifikator-Evaluierung in Real-World-Einsatzeinstellungen | 适用性过滤器:在现实世界部署设置中进行分类评价的统计框架 2505.22356v1 |
Authors: Angéline Pouget, Mohammad Yaghini, Stephan Rabanser, Nicolas Papernot
Deploying machine learning models in safety-critical domains poses a key challenge: ensuring reliable model performance on downstream user data without access to ground truth labels for direct validation. We propose the suitability filter, a novel framework designed to detect performance deterioration by utilizing suitability signals – model output features that are sensitive to covariate shifts and indicative of potential prediction errors. The suitability filter evaluates whether classifier accuracy on unlabeled user data shows significant degradation compared to the accuracy measured on the labeled test dataset. Specifically, it ensures that this degradation does not exceed a pre-specified margin, which represents the maximum acceptable drop in accuracy. To achieve reliable performance evaluation, we aggregate suitability signals for both test and user data and compare these empirical distributions using statistical hypothesis testing, thus providing insights into decision uncertainty. Our modular method adapts to various models and domains. Empirical evaluations across different classification tasks demonstrate that the suitability filter reliably detects performance deviations due to covariate shift. This enables proactive mitigation of potential failures in high-stakes applications.
Article 549
Title@2025-05-28 (3): Look Within or Look Beyond? A Theoretical Comparison Between Parameter-Efficient and Full Fine-Tuning
Title: Look Within or Look Beyond? A Theoretical Comparison Between Parameter-Efficient and Full Fine-Tuning | Schauen Sie nach innen oder schauen Sie darüber hinaus? Ein theoretischer Vergleich zwischen Parameter-Effizient und Full Fine-Tuning | 内观还是外观? 参数有效与完全精准之间的理论比较。 2505.22355v1 |
Authors: Yongkang Liu, Xingle Xu, Ercong Nie, Zijing Wang, Shi Feng, Daling Wang, Qian Li, Hinrich Schütze
Parameter-Efficient Fine-Tuning (PEFT) methods achieve performance comparable to Full Fine-Tuning (FFT) while requiring significantly fewer computing resources, making them the go-to choice for researchers. We find that although PEFT can achieve competitive results on some benchmarks, its performance falls short of FFT in complex tasks, such as reasoning and instruction-based fine-tuning. In this paper, we compare the characteristics of PEFT and FFT in terms of representational capacity and robustness based on optimization theory. We theoretically demonstrate that PEFT is a strict subset of FFT. By providing theoretical upper bounds for PEFT, we show that the limited parameter space constrains the model’s representational ability, making it more susceptible to perturbations. Experiments on 15 datasets encompassing classification, generation, reasoning, instruction fine-tuning tasks and 11 adversarial test sets validate our theories. We hope that these results spark further research beyond the realms of well-established PEFT. The source code is in the anonymous GitHub repository at https://github.com/misonsky/PEFTEval.
Article 550
Title@2025-05-28 (3): Context-sensitive neocortical neurons transform the effectiveness and efficiency of neural information processing
Title: Context-sensitive neocortical neurons transform the effectiveness and efficiency of neural information processing | Kontext-sensible neocortical Neuronen verwandeln die Wirksamkeit und Effizienz der neuronalen Informationsverarbeitung | 环境敏感的新园艺神经元改变神经信息处理的效益和效率 2207.07338v7 |
Authors: Khubaib Ahmed, Ahsan Adeel, Mario Franco, Mohsin Raza
Deep learning (DL) has big-data processing capabilities that are as good as, or even better than, those of humans in many real-world domains, but at the cost of high energy requirements that may be unsustainable in some applications and of errors that, though infrequent, can be large. We hypothesise that a fundamental weakness of DL lies in its intrinsic dependence on integrate-and-fire point neurons that maximise information transmission irrespective of whether it is relevant in the current context or not. This leads to unnecessary neural firing and to the feedforward transmission of conflicting messages, which makes learning difficult and processing energy inefficient. Here we show how to circumvent these limitations by mimicking the capabilities of context-sensitive neocortical neurons that receive input from diverse sources as a context to amplify and attenuate the transmission of relevant and irrelevant information, respectively. We demonstrate that a deep network composed of such local processors seeks to maximise agreement between the active neurons, thus restricting the transmission of conflicting information to higher levels and reducing the neural activity required to process large amounts of heterogeneous real-world data. This two-point neuron approach, shown to be far more effective and efficient than current forms of DL, offers a possible step-change in transforming the cellular foundations of deep network architectures.
Article 551
Title@2025-05-28 (3): AKRMap: Adaptive Kernel Regression for Trustworthy Visualization of Cross-Modal Embeddings
Title: AKRMap: Adaptive Kernel Regression for Trustworthy Visualization of Cross-Modal Embeddings | AKRMap: Adaptive Kernel-Regression für vertrauenswürdige Visualisierung von Cross-Modal-Embeddings | AKRMap:跨模式嵌入的可信赖可视化的适应性内核倒退 2505.14664v2 |
Authors: Yilin Ye, Junchao Huang, Xingchen Zeng, Jiazhi Xia, Wei Zeng
Cross-modal embeddings form the foundation for multi-modal models. However, visualization methods for interpreting cross-modal embeddings have been primarily confined to traditional dimensionality reduction (DR) techniques like PCA and t-SNE. These DR methods primarily focus on feature distributions within a single modality, whilst failing to incorporate metrics (e.g., CLIPScore) across multiple modalities. This paper introduces AKRMap, a new DR technique designed to visualize cross-modal embedding metrics with enhanced accuracy by learning kernel regression of the metric landscape in the projection space. Specifically, AKRMap constructs a supervised projection network guided by a post-projection kernel regression loss, and employs adaptive generalized kernels that can be jointly optimized with the projection. This approach enables AKRMap to efficiently generate visualizations that capture complex metric distributions, while also supporting interactive features such as zoom and overlay for deeper exploration. Quantitative experiments demonstrate that AKRMap outperforms existing DR methods in generating more accurate and trustworthy visualizations. We further showcase the effectiveness of AKRMap in visualizing and comparing cross-modal embeddings for text-to-image models. Code and demo are available at https://github.com/yilinye/AKRMap.
Article 552
Title@2025-05-28 (3): Progressive Data Dropout: An Embarrassingly Simple Approach to Faster Training
Title: Progressive Data Dropout: An Embarrassingly Simple Approach to Faster Training | Progressive Data Dropout: Ein verblüffend einfacher Ansatz zum schnelleren Training | 渐进数据辍学:快速培训的一个令人尴尬的简单方法 2505.22342v1 |
Authors: Shriram M S, Xinyue Hao, Shihao Hou, Yang Lu, Laura Sevilla-Lara, Anurag Arnab, Shreyank N Gowda
The success of the machine learning field has reliably depended on training on large datasets. While effective, this trend comes at an extraordinary cost. This is due to two deeply intertwined factors: the size of models and the size of datasets. While promising research efforts focus on reducing the size of models, the other half of the equation remains fairly mysterious. Indeed, it is surprising that the standard approach to training remains to iterate over and over, uniformly sampling the training dataset. In this paper we explore a series of alternative training paradigms that leverage insights from hard-data-mining and dropout, and that are simple enough to implement and use that they can become the new training standard. The proposed Progressive Data Dropout reduces the number of effective epochs to as little as 12.4% of the baseline. These savings do not come at any cost to accuracy. Surprisingly, the proposed method improves accuracy by up to 4.82%. Our approach requires no changes to model architecture or optimizer, and can be applied across standard training pipelines, thus posing an excellent opportunity for wide adoption. Code can be found here: https://github.com/bazyagami/LearningWithRevision
Article 553
Title@2025-05-28 (3): Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
Title: Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start | Multimodale Reasoning durch verstärktes Lernen mit kaltem Start fördern | 通过 “ 冷起 “ 的强化学习推进多模式理由 2505.22334v1 |
Authors: Lai Wei, Yuting Li, Kaipeng Zheng, Chen Wang, Yue Wang, Linghe Kong, Lichao Sun, Weiran Huang
Recent advancements in large language models (LLMs) have demonstrated impressive chain-of-thought reasoning capabilities, with reinforcement learning (RL) playing a crucial role in this progress. While “aha moment” patterns (where models exhibit self-correction through reflection) are often attributed to emergent properties from RL, we first demonstrate that these patterns exist in multimodal LLMs (MLLMs) prior to RL training but may not necessarily correlate with improved reasoning performance. Building on these insights, we present a comprehensive study on enhancing multimodal reasoning through a two-stage approach: (1) supervised fine-tuning (SFT) as a cold start with structured chain-of-thought reasoning patterns, followed by (2) reinforcement learning via GRPO to further refine these capabilities. Our extensive experiments show that this combined approach consistently outperforms both SFT-only and RL-only methods across challenging multimodal reasoning benchmarks. The resulting models achieve state-of-the-art performance among open-source MLLMs at both 3B and 7B scales, with our 7B model showing substantial improvements over base models (e.g., 66.3% $\rightarrow$ 73.4% on MathVista, 62.9% $\rightarrow$ 70.4% on We-Math) and our 3B model achieving performance competitive with several 7B models. Overall, this work provides practical guidance for building advanced multimodal reasoning models. Our code is available at https://github.com/waltonfuture/RL-with-Cold-Start.
Article 554
Title@2025-05-28 (3): Credal Prediction based on Relative Likelihood
Title: Credal Prediction based on Relative Likelihood | Credal Prediction basierend auf relativer Likelihood | 基于相对可能性的裂变预测 2505.22332v1 |
Authors: Timo Löhr, Paul Hofman, Felix Mohr, Eyke Hüllermeier
Predictions in the form of sets of probability distributions, so-called credal sets, provide a suitable means to represent a learner’s epistemic uncertainty. In this paper, we propose a theoretically grounded approach to credal prediction based on the statistical notion of relative likelihood: The target of prediction is the set of all (conditional) probability distributions produced by the collection of plausible models, namely those models whose relative likelihood exceeds a specified threshold. This threshold has an intuitive interpretation and allows for controlling the trade-off between correctness and precision of credal predictions. We tackle the problem of approximating credal sets defined in this way by means of suitably modified ensemble learning techniques. To validate our approach, we illustrate its effectiveness by experiments on benchmark datasets demonstrating superior uncertainty representation without compromising predictive performance. We also compare our method against several state-of-the-art baselines in credal prediction.
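The relative-likelihood threshold can be illustrated with a minimal sketch over a finite pool of candidate models. Everything here is an assumption made for illustration: the function name `credal_set`, the threshold symbol `alpha`, and the idea that each model's training log-likelihood and predicted class distribution are already available. It does not reproduce the modified ensemble learning procedure the abstract refers to.

```python
import numpy as np

def credal_set(log_liks, probs, alpha):
    """Keep predictions of models whose relative likelihood is >= alpha.

    log_liks: shape (n_models,), log-likelihood of each model on the data.
    probs:    shape (n_models, n_classes), each model's predicted
              distribution for one query input.
    alpha:    threshold in (0, 1]; relative likelihood is L(h) / max L.
    """
    log_liks = np.asarray(log_liks, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Subtract the maximum in log space for numerical stability.
    relative = np.exp(log_liks - log_liks.max())
    return probs[relative >= alpha]

# Usage: three hypothetical models, two classes.
log_liks = [-10.0, -10.5, -14.0]
probs = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
strict = credal_set(log_liks, probs, alpha=0.5)
loose = credal_set(log_liks, probs, alpha=0.01)
```

Lowering `alpha` admits more models and therefore a larger, more cautious set of distributions, which mirrors the trade-off between correctness and precision described in the abstract.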
Article 555
Title@2025-05-28 (3): Learning in Stackelberg Games with Non-myopic Agents
Title: Learning in Stackelberg Games with Non-myopic Agents | Lernen in Stackelberg Spiele mit nicht-myopischen Agenten | 学习与非中色剂在斯塔克尔贝格运动会中的学习 2208.09407v3 |
Authors: Nika Haghtalab, Thodoris Lykouris, Sloan Nietert, Alexander Wei
We study Stackelberg games where a principal repeatedly interacts with a non-myopic long-lived agent, without knowing the agent’s payoff function. Although learning in Stackelberg games is well-understood when the agent is myopic, dealing with non-myopic agents poses additional complications. In particular, non-myopic agents may strategize and select actions that are inferior in the present in order to mislead the principal’s learning algorithm and obtain better outcomes in the future. We provide a general framework that reduces learning in the presence of non-myopic agents to robust bandit optimization in the presence of myopic agents. Through the design and analysis of minimally reactive bandit algorithms, our reduction trades off the statistical efficiency of the principal’s learning algorithm against its effectiveness in inducing near-best-responses. We apply this framework to Stackelberg security games (SSGs), pricing with an unknown demand curve, general finite Stackelberg games, and strategic classification. In each setting, we characterize the type and impact of misspecifications present in near-best responses and develop a learning algorithm robust to such misspecifications. On the way, we improve the state-of-the-art query complexity of learning in SSGs with $n$ targets from $O(n^3)$ to a near-optimal $\widetilde{O}(n)$ by uncovering a fundamental structural property of these games. The latter result is of independent interest beyond learning with non-myopic agents.
Article 556
Title@2025-05-28 (3): When Does Neuroevolution Outcompete Reinforcement Learning in Transfer Learning Tasks?
Title: When Does Neuroevolution Outcompete Reinforcement Learning in Transfer Learning Tasks? | Wann führt Neuroevolution das Verstärkte Lernen in Transfer-Lernaufgaben durch? | 在转让学习任务方面,神经革命何时会超越竞争加强学习? 2505.22696v1 |
Authors: Eleni Nisioti, Joachim Winther Pedersen, Erwan Plantec, Milton L. Montero, Sebastian Risi
The ability to continuously and efficiently transfer skills across tasks is a hallmark of biological intelligence and a long-standing goal in artificial systems. Reinforcement learning (RL), a dominant paradigm for learning in high-dimensional control tasks, is known to suffer from brittleness to task variations and catastrophic forgetting. Neuroevolution (NE) has recently gained attention for its robustness, scalability, and capacity to escape local optima. In this paper, we investigate an understudied dimension of NE: its transfer learning capabilities. To this end, we introduce two benchmarks: (a) in stepping gates, neural networks are tasked with emulating logic circuits, with designs that emphasize modular repetition and variation; and (b) ecorobot extends the Brax physics engine with objects such as walls and obstacles and the ability to easily switch between different robotic morphologies. Crucial in both benchmarks is the presence of a curriculum that enables evaluating skill transfer across tasks of increasing complexity. Our empirical analysis shows that NE methods vary in their transfer abilities and frequently outperform RL baselines. Our findings support the potential of NE as a foundation for building more adaptable agents and highlight future challenges for scaling NE to complex, real-world problems.
Article 557
Title@2025-05-28 (3): LLM-ODDR: A Large Language Model Framework for Joint Order Dispatching and Driver Repositioning
Title: LLM-ODDR: A Large Language Model Framework for Joint Order Dispatching and Driver Repositioning | LLM-ODDR: Ein großes Sprachmodell für Joint Order Dispatching und Driver Repositioning | LLM-ODDD:联合调度和司机重新定位大语言示范框架 2505.22695v1 |
Authors: Tengfei Lyu, Siyuan Feng, Hao Liu, Hai Yang
Ride-hailing platforms face significant challenges in optimizing order dispatching and driver repositioning operations in dynamic urban environments. Traditional approaches based on combinatorial optimization, rule-based heuristics, and reinforcement learning often overlook driver income fairness, interpretability, and adaptability to real-world dynamics. To address these gaps, we propose LLM-ODDR, a novel framework leveraging Large Language Models (LLMs) for joint Order Dispatching and Driver Repositioning (ODDR) in ride-hailing services. The LLM-ODDR framework comprises three key components: (1) Multi-objective-guided Order Value Refinement, which evaluates orders by considering multiple objectives to determine their overall value; (2) Fairness-aware Order Dispatching, which balances platform revenue with driver income fairness; and (3) Spatiotemporal Demand-Aware Driver Repositioning, which optimizes idle vehicle placement based on historical patterns and projected supply. We also develop JointDR-GPT, a fine-tuned model optimized for ODDR tasks with domain knowledge. Extensive experiments on real-world datasets from Manhattan taxi operations demonstrate that our framework significantly outperforms traditional methods in terms of effectiveness, adaptability to anomalous conditions, and decision interpretability. To our knowledge, this is the first exploration of LLMs as decision-making agents in ride-hailing ODDR tasks, establishing foundational insights for integrating advanced language models within intelligent transportation systems.
Article 558
Title@2025-05-28 (3): Individualised Counterfactual Examples Using Conformal Prediction Intervals
Title: Individualised Counterfactual Examples Using Conformal Prediction Intervals | Individualisierte gegenfaktische Beispiele mit konformen Vorhersageintervallen | 使用非正式预测间隔的个别反事实实例 2505.22326v1 |
Authors: James M. Adams, Gesine Reinert, Lukasz Szpruch, Carsten Maple, Andrew Elliott
Counterfactual explanations for black-box models aim to provide insight into an algorithmic decision to its recipient. For a binary classification problem, an individual counterfactual details which features might be changed for the model to infer the opposite class. High-dimensional feature spaces that are typical of machine learning classification models admit many possible counterfactual examples to a decision, and so it is important to identify additional criteria to select the most useful counterfactuals. In this paper, we explore the idea that the counterfactuals should be maximally informative when considering the knowledge of a specific individual about the underlying classifier. To quantify this information gain we explicitly model the knowledge of the individual, and assess the uncertainty of predictions which the individual makes by the width of a conformal prediction interval. Regions of feature space where the prediction interval is wide correspond to areas where the confidence in decision making is low, and an additional counterfactual example might be more informative to an individual. To explore and evaluate our individualised conformal prediction interval counterfactuals (CPICFs), first we present a synthetic data set on a hypercube which allows us to fully visualise the decision boundary, conformal intervals via three different methods, and resultant CPICFs. Second, in this synthetic data set we explore the impact of a single CPICF on the knowledge of an individual locally around the original query. Finally, in both our synthetic data set and a complex real-world dataset with a combination of continuous and discrete variables, we measure the utility of these counterfactuals via data augmentation, testing the performance on a held-out set.
Article 559
Title@2025-05-28 (3): A Closer Look on Memorization in Tabular Diffusion Model: A Data-Centric Perspective
Title: A Closer Look on Memorization in Tabular Diffusion Model: A Data-Centric Perspective | Ein genauerer Blick auf die Erinnerung an Tabular Diffusion Modell: Eine datenzentrische Perspektive | 更仔细地看一看表格传播模型中的记忆化:数据核心视角 2505.22322v1 |
Authors: Zhengyu Fang, Zhimeng Jiang, Huiyuan Chen, Xiaoge Zhang, Kaiyu Tang, Xiao Li, Jing Li
Diffusion models have shown strong performance in generating high-quality tabular data, but they carry privacy risks by reproducing exact training samples. While prior work focuses on dataset-level augmentation to reduce memorization, little is known about which individual samples contribute most. We present the first data-centric study of memorization dynamics in tabular diffusion models. We quantify memorization for each real sample based on how many generated samples are flagged as replicas, using a relative distance ratio. Our empirical analysis reveals a heavy-tailed distribution of memorization counts: a small subset of samples contributes disproportionately to leakage, confirmed via sample-removal experiments. To understand this, we divide real samples into top- and non-top-memorized groups and analyze their training-time behaviors. We track when each sample is first memorized and monitor per-epoch memorization intensity (AUC). Memorized samples are memorized slightly earlier and show stronger signals in early training. Based on these insights, we propose DynamicCut, a two-stage, model-agnostic mitigation method: (a) rank samples by epoch-wise intensity, (b) prune a tunable top fraction, and (c) retrain on the filtered dataset. Across multiple tabular datasets and models, DynamicCut reduces memorization with minimal impact on data diversity and downstream performance. It also complements augmentation-based defenses. Furthermore, DynamicCut enables cross-model transferability: high-ranked samples identified from one model (e.g., a diffusion model) are also effective for reducing memorization when removed from others, such as GANs and VAEs.
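The rank-and-prune steps (a) and (b) above can be sketched as follows. This is only an illustration, not the DynamicCut implementation: the per-sample intensity scores are assumed to have been computed already, and the function name `prune_top_fraction` and the example values are hypothetical. Step (c), retraining on the filtered data, is left out.

```python
def prune_top_fraction(intensity, fraction):
    """Return indices of samples kept after dropping the highest-intensity ones."""
    n_remove = int(len(intensity) * fraction)
    # Rank sample indices from highest to lowest memorization intensity.
    ranked = sorted(range(len(intensity)), key=lambda i: intensity[i], reverse=True)
    removed = set(ranked[:n_remove])
    # Preserve the original sample order among the survivors.
    return [i for i in range(len(intensity)) if i not in removed]

# Hypothetical per-sample memorization intensities.
scores = [0.1, 0.9, 0.3, 0.8, 0.2]
keep = prune_top_fraction(scores, fraction=0.4)
```

The `fraction` argument plays the role of the tunable top fraction; the retained indices would then select the rows of the training table used for retraining.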
Article 560
Title@2025-05-28 (3): Core Context Aware Transformers for Long Context Language Modeling
Title: Core Context Aware Transformers for Long Context Language Modeling | Core Context Aware Transformers für lange Kontext-Sprachenmodellierung | 长语语言建模核心认知变型器 2412.12465v2 |
Authors: Yaofo Chen, Zeng You, Shuhai Zhang, Haokun Li, Yirui Li, Yaowei Wang, Mingkui Tan
Transformer-based Large Language Models (LLMs) have exhibited remarkable success in extensive tasks, primarily attributed to the self-attention mechanism, which requires a token to consider all preceding tokens as its context to compute attention. However, when the context length $L$ becomes very large (e.g., 128K), the amount of potentially redundant information in the context tends to increase. The redundant context not only hampers the modeling representation performance but also incurs unnecessary computational and storage overhead. In this paper, we propose a plug-and-play Core Context Aware (CCA) Attention for efficient long-context modeling, comprising two complementary modules: 1) The globality-aware pooling module groups input tokens and dynamically compresses each group into one core token based on their significance. In this way, our method automatically focuses and strengthens core context while diminishing redundancy during the learning process, leading to effective long-term dependency modeling. 2) The locality-preserving module incorporates neighboring tokens to preserve local context for detailed representation. Notably, our CCA-Attention is able to replace the self-attention module in existing LLMs with minimal fine-tuning cost. Extensive experimental results show the superiority of our method in both long-context modeling and computational efficiency over state-of-the-art methods.
Article 561
Title@2025-05-28 (3): Copresheaf Topological Neural Networks: A Generalized Deep Learning Framework
Title: Copresheaf Topological Neural Networks: A Generalized Deep Learning Framework | Copresheaf Topologische neurale Netzwerke: Ein generalisiertes Deep Learning Framework | Copresheaf 地形神经网络:普遍深层学习框架 2505.21251v2 |
Authors: Mustafa Hajij, Lennart Bastian, Sarah Osentoski, Hardik Kabaria, John L. Davenport, Sheik Dawood, Balaji Cherukuri, Joseph G. Kocheemoolayil, Nastaran Shahmansouri, Adrian Lew, Theodore Papamarkou, Tolga Birdal
We introduce copresheaf topological neural networks (CTNNs), a powerful and unifying framework that encapsulates a wide spectrum of deep learning architectures, designed to operate on structured data: including images, point clouds, graphs, meshes, and topological manifolds. While deep learning has profoundly impacted domains ranging from digital assistants to autonomous systems, the principled design of neural architectures tailored to specific tasks and data types remains one of the field’s most persistent open challenges. CTNNs address this gap by grounding model design in the language of copresheaves, a concept from algebraic topology that generalizes and subsumes most practical deep learning models in use today. This abstract yet constructive formulation yields a rich design space from which theoretically sound and practically effective solutions can be derived to tackle core challenges in representation learning: long-range dependencies, oversmoothing, heterophily, and non-Euclidean domains. Our empirical results on structured data benchmarks demonstrate that CTNNs consistently outperform conventional baselines, particularly in tasks requiring hierarchical or localized sensitivity. These results underscore CTNNs as a principled, multi-scale foundation for the next generation of deep learning architectures.
Article 562
Title@2025-05-28 (3): If Pigs Could Fly… Can LLMs Logically Reason Through Counterfactuals?
Title: If Pigs Could Fly… Can LLMs Logically Reason Through Counterfactuals? | Wenn Schweine fliegen könnten… können LLMs logischerweise durch Gegenfakten denken? | 如果猪能飞… 2505.22318v1 |
Authors: Ishwar B Balappanawar, Vamshi Krishna Bonagiri, Anish R Joishy, Manas Gaur, Krishnaprasad Thirunarayan, Ponnurangam Kumaraguru
Large Language Models (LLMs) demonstrate impressive reasoning capabilities in familiar contexts, but struggle when the context conflicts with their parametric knowledge. To investigate this phenomenon, we introduce CounterLogic, a dataset containing 1,800 examples across 9 logical schemas, explicitly designed to evaluate logical reasoning through counterfactual (hypothetical knowledge-conflicting) scenarios. Our systematic evaluation of 11 LLMs across 6 different datasets reveals a consistent performance degradation, with accuracies dropping by 27% on average when reasoning through counterfactual information. We propose Self-Segregate, a prompting method enabling metacognitive awareness (explicitly identifying knowledge conflicts) before reasoning. Our method dramatically narrows the average performance gap from 27% to just 11%, while significantly increasing the overall accuracy (+7.5%). We discuss the implications of these findings and draw parallels to human cognitive processes, particularly on how humans disambiguate conflicting information during reasoning tasks. Our findings offer practical insights for understanding and enhancing LLMs' reasoning capabilities in real-world applications, especially where models must logically reason independently of their factual knowledge.
Article 563
Title@2025-05-28 (3): Rethinking BPS: A Utility-Based Evaluation Framework
Title: Rethinking BPS: A Utility-Based Evaluation Framework | Rethinking BPS: Ein Nutzen-basierter Bewertungsrahmen | 重新思考BPS:基于公用事业的评价框架 2505.22316v1 |
Authors: Konrad Özdemir, Lukas Kirchdorfer, Keyvan Amiri Elyasi, Han van der Aa, Heiner Stuckenschmidt
Business process simulation (BPS) is a key tool for analyzing and optimizing organizational workflows, supporting decision-making by estimating the impact of process changes. The reliability of such estimates depends on the ability of a BPS model to accurately mimic the process under analysis, making rigorous accuracy evaluation essential. However, the state-of-the-art approach to evaluating BPS models has two key limitations. First, it treats simulation as a forecasting problem, testing whether models can predict unseen future events. This fails to assess how well a model captures the as-is process, particularly when process behavior changes from train to test period. Thus, it becomes difficult to determine whether poor results stem from an inaccurate model or the inherent complexity of the data, such as unpredictable drift. Second, the evaluation approach strongly relies on Earth Mover’s Distance-based metrics, which can obscure temporal patterns and thus yield misleading conclusions about simulation quality. To address these issues, we propose a novel framework that evaluates simulation quality based on its ability to generate representative process behavior. Instead of comparing simulated logs to future real-world executions, we evaluate whether predictive process monitoring models trained on simulated data perform comparably to those trained on real data for downstream analysis tasks. Empirical results show that our framework not only helps identify sources of discrepancies but also distinguishes between model accuracy and data complexity, offering a more meaningful way to assess BPS quality.
Article 564
Title@2025-05-28 (3): MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Title: MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections | MUDDFormer: Breaking Residual Engpässe in Transformatoren über Multiway Dynamic Dense Connections | MUDDFormer:通过多路动态感应连接在变形器中打破残余瓶颈 2502.12170v2 |
Authors: Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan
We propose MUltiway Dynamic Dense (MUDD) connections, a simple yet effective method to address the limitations of residual connections and enhance cross-layer information flow in Transformers. Unlike existing dense connection approaches with static and shared connection weights, MUDD generates connection weights dynamically depending on hidden states at each sequence position and for each decoupled input stream (the query, key, value or residual) of a Transformer block. MUDD connections can be seamlessly integrated into any Transformer architecture to create MUDDFormer. Extensive experiments show that MUDDFormer significantly outperforms Transformers across various model architectures and scales in language modeling, achieving the performance of Transformers trained with 1.8X-2.4X compute. Notably, MUDDPythia-2.8B matches Pythia-6.9B in pretraining ppl and downstream tasks and even rivals Pythia-12B in five-shot settings, while adding only 0.23% parameters and 0.4% computation. Code in JAX and PyTorch and pre-trained models are available at https://github.com/Caiyun-AI/MUDDFormer .
Article 565
Title@2025-05-28 (3): From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization
Title: From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization | Von Dormant zu Gelöscht: Tamper-Resistent Unlearning durch Gewicht-Raum-Regularisierung | 从杜尔曼特移到删除:通过宽空正规化,让塔帕-较远摆脱学习 2505.22310v1 |
Authors: Shoaib Ahmed Siddiqui, Adrian Weller, David Krueger, Gintare Karolina Dziugaite, Michael Curtis Mozer, Eleni Triantafillou
Recent unlearning methods for LLMs are vulnerable to relearning attacks: knowledge believed-to-be-unlearned re-emerges by fine-tuning on a small set of (even seemingly-unrelated) examples. We study this phenomenon in a controlled setting for example-level unlearning in vision classifiers. We make the surprising discovery that forget-set accuracy can recover from around 50% post-unlearning to nearly 100% with fine-tuning on just the retain set – i.e., zero examples of the forget set. We observe this effect across a wide variety of unlearning methods, whereas for a model retrained from scratch excluding the forget set (gold standard), the accuracy remains at 50%. We observe that resistance to relearning attacks can be predicted by weight-space properties, specifically, $L_2$-distance and linear mode connectivity between the original and the unlearned model. Leveraging this insight, we propose a new class of methods that achieve state-of-the-art resistance to relearning attacks.
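The two weight-space quantities named above can be made concrete with a toy sketch. It assumes models are flat weight vectors and uses a hypothetical one-parameter double-well loss; the barrier definition (largest excess of the loss along the straight path over the interpolated endpoint losses) is one common way to measure linear mode connectivity and is not taken from the paper.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two flat weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def loss_barrier(loss_fn, a, b, steps=11):
    """Largest excess loss on the straight line from a to b (0 means connected)."""
    loss_a, loss_b = loss_fn(a), loss_fn(b)
    barrier = 0.0
    for i in range(steps):
        t = i / (steps - 1)
        # Linearly interpolate every weight between the two models.
        w = [(1 - t) * x + t * y for x, y in zip(a, b)]
        excess = loss_fn(w) - ((1 - t) * loss_a + t * loss_b)
        barrier = max(barrier, excess)
    return barrier

# Hypothetical double-well loss with minima at w = -1 and w = +1.
def toy_loss(w):
    return (w[0] ** 2 - 1) ** 2

original, unlearned = [-1.0], [1.0]
distance = l2_distance(original, unlearned)
barrier = loss_barrier(toy_loss, original, unlearned)
```

In this toy case the two minima are separated by a loss barrier of height 1, so they are not linearly mode connected; two models in the same basin would give a barrier close to 0.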
Article 566
Title@2025-05-28 (3): FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration
Title: FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration | FireQ: Schnelle INT4-FP8-Kernel- und RoPE-gestützte Quantisierung für LLM-Inferenzbeschleunigung | 消防:快速INT4-FFP8 内核和ROPE-感知的LLM 推推加速量 2505.20839v2 |
Authors: Daehyeon Baek, Jieun Choi, Jimyoung Son, Kyungmin Bin, Seungbeom Choi, Kihyo Moon, Minsung Jang, Hyojung Lee
As large language models become increasingly prevalent, memory bandwidth constraints significantly limit inference throughput, motivating post-training quantization (PTQ). In this paper, we propose FireQ, a co-designed PTQ framework and an INT4-FP8 matrix multiplication kernel that accelerates LLM inference across all linear layers. Specifically, FireQ quantizes linear layer weights and key-values to INT4, and activations and queries to FP8, significantly enhancing throughput. Additionally, we introduce three-stage pipelining for the prefill phase, which modifies the FlashAttention-3 kernel and effectively reduces time-to-first-token. To minimize accuracy loss from quantization, we develop novel outlier smoothing techniques tailored separately for linear and attention layers. In linear layers, we explicitly use per-tensor scaling to prevent underflow caused by the FP8 quantization scaling factor of INT4 quantization, and channel-wise scaling to compensate for the coarse granularity of INT4. In attention layers, we address quantization challenges posed by rotary positional embeddings (RoPE) by combining pre-RoPE and post-RoPE scaling strategies. FireQ significantly outperforms state-of-the-art methods, achieving 1.68x faster inference in feed-forward network layers on Llama2-7B and 1.26x faster prefill phase performance on Llama3-8B compared to QServe, with negligible accuracy loss.
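To illustrate why channel-wise scaling helps with the coarse INT4 grid, here is a generic sketch of symmetric INT4 weight quantization with one scale per output channel. It is not FireQ's kernel or its exact scheme (the FP8 scaling factor and per-tensor scaling are not reproduced); the function names and the toy weight matrix are assumptions.

```python
def quantize_int4_per_channel(weights):
    """Symmetric INT4 quantization with one scale per output channel (row)."""
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Map the largest magnitude in the row to 7; guard all-zero rows.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        # Round to the nearest integer and clamp to the signed 4-bit range.
        q_rows.append([max(-8, min(7, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    """Recover approximate weights from INT4 codes and per-row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

# Hypothetical weights: the second channel is far smaller than the first.
weights = [[0.70, -0.35, 0.10], [0.02, -0.01, 0.005]]
q_rows, scales = quantize_int4_per_channel(weights)
restored = dequantize(q_rows, scales)
```

Because each row gets its own scale, the small-magnitude second channel still uses the full 4-bit range; with a single shared scale it would collapse to zeros.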
Article 567
Title@2025-05-28 (3): Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning
Title: Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning | Transformer vorgebildet auf verfahrenstechnische Daten enthalten modulare Strukturen für algorithmische Vernunft | 在包含用于算法理由的模块结构的程序性数据方面受过预先培训的变异器 2505.22308v1 |
Authors: Zachary Shinnick, Liangze Jiang, Hemanth Saratchandran, Anton van den Hengel, Damien Teney
Pretraining on large, semantically rich datasets is key for developing language models. Surprisingly, recent studies have shown that even synthetic data, generated procedurally through simple semantic-free algorithms, can yield some of the same benefits as natural language pretraining. It is unclear what specific capabilities such simple synthetic data instils in a model, where these capabilities reside in the architecture, and how they manifest within its weights. In this short paper, we identify several beneficial forms of procedural data, together with specific algorithmic reasoning skills that improve in small transformers. Our core finding is that different procedural rules instil distinct but complementary inductive structures in the model. With extensive ablations and partial-transfer experiments, we discover that these structures reside in different parts of the model. Attention layers often carry the most transferable information, but some pretraining rules impart useful structure to MLP blocks instead. Most interestingly, the structures induced by multiple rules can be composed to jointly reinforce multiple capabilities. These results suggest an exciting possibility of disentangling the acquisition of knowledge from reasoning in language models, with the goal of improving their robustness and data efficiency.
Article 568
Title@2025-05-28 (3): Risk-Informed Diffusion Transformer for Long-Tail Trajectory Prediction in the Crash Scenario
Title: Risk-Informed Diffusion Transformer for Long-Tail Trajectory Prediction in the Crash Scenario | Risiko-informierter Diffusionstransformator für langspurige Trajektorien-Vorhersage im Crash-Szenario | 崩溃设想情景中长帆轨迹预测风险化传导变异器 2501.16349v2 |
Authors: Junlan Chen, Pei Liu, Zihao Zhang, Hongyi Zhao, Yufei Ji, Ziyuan Pu
Trajectory prediction methods have been widely applied in autonomous driving technologies. Although the overall accuracy of trajectory prediction is relatively high, the lack of trajectory data for critical scenarios in the training data leads to the long-tail phenomenon. Normally, the trajectories in the tail are more critical and more difficult to predict, and may include rare scenarios such as crashes. To solve this problem, we extracted trajectory data from real-world crash scenarios, which contain more long-tail data. Based on the trajectory data in this scenario, we integrated graph-based risk information and diffusion with a transformer and proposed the Risk-Informed Diffusion Transformer (RI-DiT) trajectory prediction method. Extensive experiments were conducted on trajectory data from the real-world crash scenario, and the results show that the proposed algorithm performs well. When predicting the tail 10% of the data (Top 10%), the minADE and minFDE indicators are 0.016/2.667 m. We also showed the trajectory conditions of different long-tail distributions: the closer the trajectory data lie to the tail of the distribution, the less smooth the trajectory is. Through the trajectory data from real-world crash scenarios, our work expands the methods available to overcome the long-tail challenges in trajectory prediction. Our method, RI-DiT, integrates inverse time to collision (ITTC) and traffic-flow features, which allows it to predict long-tail trajectories more accurately and improve the safety of autonomous driving systems.
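The abstract does not define minADE and minFDE; the sketch below uses the definitions commonly adopted in trajectory prediction (an assumption here): for several candidate trajectories, ADE is the mean pointwise Euclidean error, FDE is the error at the final time step, and the minimum of each is taken over the candidates. The function names and toy trajectories are hypothetical.

```python
import math

def displacement_errors(pred, truth):
    """Euclidean error at each time step between two trajectories."""
    return [math.dist(p, t) for p, t in zip(pred, truth)]

def min_ade_fde(candidates, truth):
    """Minimum average and final displacement error over candidate predictions."""
    ades, fdes = [], []
    for pred in candidates:
        errors = displacement_errors(pred, truth)
        ades.append(sum(errors) / len(errors))  # average over time steps
        fdes.append(errors[-1])                 # error at the last time step
    return min(ades), min(fdes)

# Hypothetical ground truth and two predicted trajectories (x, y in metres).
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
candidates = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],
]
min_ade, min_fde = min_ade_fde(candidates, truth)
```

Note that in this formulation the two minima are taken independently, so they need not come from the same candidate trajectory.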
Article 569
Title@2025-05-28 (3): Robustness and Cybersecurity in the EU Artificial Intelligence Act
Title: Robustness and Cybersecurity in the EU Artificial Intelligence Act | Robustheit und Cybersicherheit im EU-Gesetz über künstliche Intelligenz | 《欧盟人工情报法》中的强力和网络安全 2502.16184v2 |
Authors: Henrik Nolte, Miriam Rateike, Michèle Finck
The EU Artificial Intelligence Act (AIA) establishes different legal principles for different types of AI systems. While prior work has sought to clarify some of these principles, little attention has been paid to robustness and cybersecurity. This paper aims to fill this gap. We identify legal challenges and shortcomings in provisions related to robustness and cybersecurity for high-risk AI systems (Art. 15 AIA) and general-purpose AI models (Art. 55 AIA). We show that robustness and cybersecurity demand resilience against performance disruptions. Furthermore, we assess potential challenges in implementing these provisions in light of recent advancements in the machine learning (ML) literature. Our analysis informs efforts to develop harmonized standards, guidelines by the European Commission, as well as benchmarks and measurement methodologies under Art. 15(2) AIA. With this, we seek to bridge the gap between legal terminology and ML research, fostering a better alignment between research and implementation efforts.
Article 570
Title@2025-05-28 (3): Versatile Cardiovascular Signal Generation with a Unified Diffusion Transformer
Title: Versatile Cardiovascular Signal Generation with a Unified Diffusion Transformer | Vielseitige kardiovaskuläre Signalgenerierung mit einem Unified Diffusion Transformer | 具有统一扩散变异器的心血管心血管信号生成 2505.22306v1 |
Authors: Zehua Chen, Yuyang Miao, Liyuan Wang, Luyun Fan, Danilo P. Mandic, Jun Zhu
Cardiovascular signals such as photoplethysmography (PPG), electrocardiography (ECG), and blood pressure (BP) are inherently correlated and complementary, together reflecting the health of the cardiovascular system. However, their joint utilization in real-time monitoring is severely limited by diverse acquisition challenges, from noisy wearable recordings to burdensome invasive procedures. Here we propose UniCardio, a multi-modal diffusion transformer that reconstructs low-quality signals and synthesizes unrecorded signals in a unified generative framework. Its key innovations include a specialized model architecture to manage the signal modalities involved in generation tasks and a continual learning paradigm to incorporate varying modality combinations. By exploiting the complementary nature of cardiovascular signals, UniCardio clearly outperforms recent task-specific baselines in signal denoising, imputation, and translation. The generated signals match the performance of ground-truth signals in detecting abnormal health conditions and estimating vital signs, even in unseen domains, while ensuring interpretability for human experts. These advantages position UniCardio as a promising avenue for advancing AI-assisted healthcare.
Article 571
Title@2025-05-28 (3): LLäMmlein: Compact and Competitive German-Only Language Models from Scratch
Title: LLäMmlein: Compact and Competitive German-Only Language Models from Scratch | LLäMmlein: Kompakte und wettbewerbsfähige deutschsprachige Sprachmodelle von Scratch | LläMmlein:来自斯克拉奇的契约和竞争性独德语言模式 2411.11171v4 |
Authors: Jan Pfister, Julia Wunderle, Andreas Hotho
We create two German-only decoder models, LLäMmlein 120M and 1B, transparently from scratch and publish them, along with the training data, for the German NLP research community to use. The model training involved several key steps, including extensive data preprocessing, the creation of a custom German tokenizer, the training itself, as well as the evaluation of the final models on various benchmarks. Throughout the training process, multiple checkpoints were saved and analyzed using the SuperGLEBer benchmark to monitor the models’ learning dynamics. Compared to state-of-the-art models on the SuperGLEBer benchmark, both LLäMmlein models performed competitively, consistently matching or surpassing models with similar parameter sizes. The results show that the models’ quality scales with size as expected, but performance improvements on some tasks plateaued early, offering valuable insights into resource allocation for future model development.
Article 572
Title@2025-05-28 (3): Diss-l-ECT: Dissecting Graph Data with Local Euler Characteristic Transforms
Title: Diss-l-ECT: Dissecting Graph Data with Local Euler Characteristic Transforms | Diss-l-ECT: Entschlüsselung von Graphendaten mit lokalen Euler-Charakteristik-Transformationen | Diss- l- ECT: 用本地电磁特征变换解析图表数据 2410.02622v2 |
Authors: Julius von Rohrscheidt, Bastian Rieck
The Euler Characteristic Transform (ECT) is an efficiently-computable geometrical-topological invariant that characterizes the global shape of data. In this paper, we introduce the Local Euler Characteristic Transform ($\ell$-ECT), a novel extension of the ECT particularly designed to enhance expressivity and interpretability in graph representation learning. Unlike traditional Graph Neural Networks (GNNs), which may lose critical local details through aggregation, the $\ell$-ECT provides a lossless representation of local neighborhoods. This approach addresses key limitations in GNNs by preserving nuanced local structures while maintaining global interpretability. Moreover, we construct a rotation-invariant metric based on $\ell$-ECTs for spatial alignment of data spaces. Our method exhibits superior performance compared to standard GNNs on a variety of node-classification tasks, while also offering theoretical guarantees that demonstrate its effectiveness.
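As a rough illustration of the idea, the sketch below computes Euler characteristic curves for the neighborhood of a single node in a graph with node coordinates. Several details are assumptions rather than the paper's definitions: the neighborhood is taken to be the 1-hop induced subgraph, the filtration is by height along a direction, and the Euler characteristic of a graph is computed as vertices minus edges. All names and the toy graph are hypothetical.

```python
def euler_curve(coords, edges, direction, thresholds):
    """Euler characteristic (V - E) of sublevel sets along one direction."""
    heights = {v: sum(c * d for c, d in zip(x, direction)) for v, x in coords.items()}
    curve = []
    for t in thresholds:
        n_vertices = sum(1 for h in heights.values() if h <= t)
        # An edge enters the filtration once both endpoints are present.
        n_edges = sum(1 for u, v in edges if max(heights[u], heights[v]) <= t)
        curve.append(n_vertices - n_edges)
    return curve

def local_ect(coords, edges, node, directions, thresholds):
    """One Euler curve per direction for the 1-hop neighborhood of node."""
    nbrs = {node}
    for u, v in edges:
        if u == node:
            nbrs.add(v)
        if v == node:
            nbrs.add(u)
    sub_coords = {v: coords[v] for v in nbrs}
    sub_edges = [(u, v) for u, v in edges if u in nbrs and v in nbrs]
    return [euler_curve(sub_coords, sub_edges, d, thresholds) for d in directions]

# Hypothetical graph: a triangle (0, 1, 2) with a pendant node 3.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (2.0, 1.0)}
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
directions = [(1.0, 0.0), (0.0, 1.0)]
thresholds = [0.0, 1.0, 2.0]
ect_node0 = local_ect(coords, edges, 0, directions, thresholds)
ect_node3 = local_ect(coords, edges, 3, directions, thresholds)
```

The node inside the triangle ends with Euler characteristic 0 (one cycle), while the pendant node ends with 1 (a tree), so the two neighborhoods are distinguished by their curves. Flattening the per-direction curves into one vector would give a node feature usable by a standard classifier.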
Article 573
Title@2025-05-28 (3): 360-LLaMA-Factory: Plug & Play Sequence Parallelism for Long Post-Training
Title: 360-LLaMA-Factory: Plug & Play Sequence Parallelism for Long Post-Training | 360-LlaMA-Fabrik: Plug & Play-Sequenz-Parallelität für langes Nachtraining | 360-LLamaMA-Factory: 长期培训之后的插件和播放序列平行主义 2505.22296v1 |
Authors: Haosheng Zou, Xiaowei Lv, Shousheng Jia, Xiangzheng Zhang
Adding sequence parallelism into LLaMA-Factory, we open-sourced 360-LLaMA-Factory at https://github.com/Qihoo360/360-LLaMA-Factory. 360-LLaMA-Factory has received wide recognition and has been used in models such as Light-R1 (arXiv:2503.10460) and TinyR1 (arXiv:2503.04872), in Kaggle AIMO math models, and in large companies’ training frameworks. This technical report delves deeper into the different sequence parallel modes behind 360-LLaMA-Factory and discusses our implementation insights.
Article 574
Title@2025-05-28 (3): Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
Title: Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | Light-R1: Curriculum SFT, DPO und RL für Long COT aus Scratch und darüber hinaus | Light-R1:SFT、DPO和RL课程,用于Scratch及以后的长期COT 2503.10460v4 |
Authors: Liang Wen, Yunke Cai, Fenrui Xiao, Xin He, Qi An, Zhenyu Duan, Yimin Du, Junchen Liu, Lifu Tang, Xiaowei Lv, Haosheng Zou, Yongchao Deng, Shousheng Jia, Xiangzheng Zhang
This paper introduces Light-R1, an open-source suite for training long reasoning models using reproducible and cost-effective methodology. Given the proprietary nature of data used in the DeepSeek-R1 series, we develop an alternative approach leveraging exclusively public data and models. Our curriculum training progressively increases data difficulty, combined with multi-staged post-training. Our Light-R1-32B model, trained from Qwen2.5-32B-Instruct, outperforms DeepSeek-R1-Distill-Qwen-32B in math reasoning. Experimental results show that this curriculum approach becomes more effective when distinct, diverse datasets are available for different training stages: fine-tuning DeepSeek-R1-Distilled models (pre-tuned by DeepSeek team on proprietary data) with 3,000 challenging examples from our curriculum dataset yielded state-of-the-art 7B and 14B models, while the 32B model, Light-R1-32B-DS, performed comparably to QwQ-32B and DeepSeek-R1. Furthermore, we extend our work by applying GRPO on long reasoning models. Our final Light-R1-14B-DS achieves SOTA performance among 14B models in math, with AIME24 & 25 scores of 74.0 and 60.2 respectively, surpassing many 32B models and DeepSeek-R1-Distill-Llama-70B. Despite math-focused training, Light-R1-14B-DS demonstrates strong cross-domain generalization. Light-R1 represents a significant advancement in making sophisticated reasoning models more accessible and implementable in real-world applications. Our models, training data and code have been made available at https://github.com/Qihoo360/Light-R1.
Article 575
Title@2025-05-28 (3): MoRE: A Mixture of Low-Rank Experts for Adaptive Multi-Task Learning
Title: MoRE: A Mixture of Low-Rank Experts for Adaptive Multi-Task Learning | MoRE: Eine Mischung aus Low-Rank Experten für adaptives Multi-Task Learning | MoRE: 适应性多任务学习低级专家混合组合 2505.22694v1 |
Authors: Dacao Zhang, Kun Zhang, Shimao Chu, Le Wu, Xin Li, Si Wei
With the rapid development of Large Language Models (LLMs), Parameter-Efficient Fine-Tuning (PEFT) methods, which aim to fine-tune LLMs efficiently with fewer parameters, have gained significant attention. As a representative PEFT method, Low-Rank Adaptation (LoRA) introduces low-rank matrices to approximate the incremental tuning parameters and achieves impressive performance over multiple scenarios. Many improvements to LoRA have since been proposed. However, these methods either focus on single-task scenarios or separately train multiple LoRA modules for multi-task scenarios, limiting the efficiency and effectiveness of LoRA in multi-task scenarios. To better adapt to multi-task fine-tuning, in this paper, we propose a novel Mixture of Low-Rank Experts (MoRE) for multi-task PEFT. Specifically, instead of using an individual LoRA for each task, we align different ranks of a LoRA module with different tasks, which we call low-rank experts. Moreover, we design a novel adaptive rank selector to select the appropriate expert for each task. By jointly training low-rank experts, MoRE can enhance the adaptability and efficiency of LoRA in multi-task scenarios. Finally, we conduct extensive experiments over multiple multi-task benchmarks along with different LLMs to verify model performance. Experimental results demonstrate that compared to traditional LoRA and its variants, MoRE significantly improves the performance of LLMs in multi-task scenarios and incurs no additional inference cost. We also release the model and code for the community.
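As a rough illustration of the LoRA mechanism the abstract builds on, the sketch below applies the standard low-rank update y = Wx + (alpha / r) · B(Ax) and lets the caller choose how many rank components to use. Treating a lower-rank expert as the first `rank` components of one shared pair (A, B) is an assumption made purely for illustration. The abstract does not say how MoRE ties ranks to tasks or how its rank selector works, and the function names here are hypothetical.

```python
def lora_forward(x, W, A, B, rank, alpha=1.0):
    """Frozen weight W plus a rank-`rank` update taken from shared factors A and B.

    W: d_out x d_in, A: r_max x d_in, B: d_out x r_max (plain nested lists).
    Using only the first `rank` components is an illustrative assumption.
    """
    d_in, d_out = len(x), len(W)
    # Project the input down to `rank` dimensions.
    z = [sum(A[i][j] * x[j] for j in range(d_in)) for i in range(rank)]
    # Project back up to the output dimension.
    delta = [sum(B[o][i] * z[i] for i in range(rank)) for o in range(d_out)]
    base = [sum(W[o][j] * x[j] for j in range(d_in)) for o in range(d_out)]
    scale = alpha / rank  # standard LoRA scaling factor
    return [b + scale * d for b, d in zip(base, delta)]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of a rank-`rank` adapter: rank * (d_in + d_out)."""
    return rank * (d_in + d_out)


identity = [[1, 0], [0, 1]]
y_rank1 = lora_forward([1, 2], W=identity, A=identity, B=identity, rank=1)
y_rank2 = lora_forward([1, 2], W=identity, A=identity, B=identity, rank=2)
```

For a 4096 x 4096 layer, `lora_param_count(4096, 4096, 8)` gives 65,536 trainable parameters against 16,777,216 for the full matrix, which is why rank is a natural knob to vary across tasks.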
Article 576
Title@2025-05-28 (3): Rethinking the Unsolvable: When In-Context Search Meets Test-Time Scaling
Title: Rethinking the Unsolvable: When In-Context Search Meets Test-Time Scaling | Das Unlösbare neu denken: Wenn In-Context Search Test-Time Scaling trifft | 重新思考无法解答的问题: 当 In-Ctext 搜索遇到测试时间缩放时 2505.22290v1 |
Authors: Fanzeng Xia, Yidong Luo, Tinko Sebastian Bartels, Yaqi Xu, Tongxin Li
Recent research has highlighted that Large Language Models (LLMs), even when trained to generate extended long reasoning steps, still face significant challenges on hard reasoning problems. However, much of the existing literature relies on direct prompting with simple in-context learning examples for evaluation, which largely overlooks advanced techniques to elicit LLMs’ deliberate reasoning before drawing conclusions that LLMs hit a performance ceiling. In this paper, we systematically explore the combined potential of in-context search and test-time scaling on super hard reasoning tasks. We find that by employing advanced in-context search prompting to LLMs augmented with internal scaling, one can achieve transformative performance breakthroughs on tasks previously deemed “unsolvable” (e.g., reported success rates below 5%). We provide both empirical results and theoretical analysis of how this combination can unleash LLM reasoning capabilities: i) Empirically, on controlled NP-hard tasks and complex real-world planning benchmarks, our approach achieves up to a 30x improvement in success rates compared to previously reported results without any external mechanisms; ii) Theoretically, we show that in-context search prompting, when combined with internal scaling, significantly extends the complexity class of solvable reasoning problems. These findings challenge prevailing assumptions about the limitations of LLMs on complex tasks, indicating that current evaluation paradigms systematically underestimate their true potential. Our work calls for a critical reassessment of how LLM reasoning is benchmarked and a more robust evaluation strategy that fully captures the true capabilities of contemporary LLMs, which can lead to a better understanding of their operational reasoning boundaries in real-world deployments.
Article 577
Title@2025-05-28 (3): A Variational Perspective on Generative Protein Fitness Optimization
Title: A Variational Perspective on Generative Protein Fitness Optimization | Eine abwechslungsreiche Perspektive auf generative Protein-Fitness-Optimierung | 关于最优化的生质蛋白质健身的变异视角 2501.19200v2 |
Authors: Lea Bogensperger, Dominik Narnhofer, Ahmed Allam, Konrad Schindler, Michael Krauthammer
The goal of protein fitness optimization is to discover new protein variants with enhanced fitness for a given use. The vast search space and the sparsely populated fitness landscape, along with the discrete nature of protein sequences, pose significant challenges when trying to determine the gradient towards configurations with higher fitness. We introduce Variational Latent Generative Protein Optimization (VLGPO), a variational perspective on fitness optimization. Our method embeds protein sequences in a continuous latent space to enable efficient sampling from the fitness distribution and combines a (learned) flow matching prior over sequence mutations with a fitness predictor to guide optimization towards sequences with high fitness. VLGPO achieves state-of-the-art results on two different protein benchmarks of varying complexity. Moreover, the variational design with explicit prior and likelihood functions offers a flexible plug-and-play framework that can be easily customized to suit various protein design tasks.
Article 578
Title@2025-05-28 (3): Random Feature Representation Boosting
Title: Random Feature Representation Boosting | Zufällige Merkmalsdarstellung steigert sich | 随机特性显示促进 2501.18283v3 |
Authors: Nikita Zozoulenko, Thomas Cass, Lukas Gonon
We introduce Random Feature Representation Boosting (RFRBoost), a novel method for constructing deep residual random feature neural networks (RFNNs) using boosting theory. RFRBoost uses random features at each layer to learn the functional gradient of the network representation, enhancing performance while preserving the convex optimization benefits of RFNNs. In the case of MSE loss, we obtain closed-form solutions to greedy layer-wise boosting with random features. For general loss functions, we show that fitting random feature residual blocks reduces to solving a quadratically constrained least squares problem. Through extensive numerical experiments on tabular datasets for both regression and classification, we show that RFRBoost significantly outperforms RFNNs and end-to-end trained MLP ResNets in the small- to medium-scale regime where RFNNs are typically applied. Moreover, RFRBoost offers substantial computational benefits, and theoretical guarantees stemming from boosting theory.
Article 579
Title@2025-05-28 (3): Sample Efficient Robot Learning in Supervised Effect Prediction Tasks
Title: Sample Efficient Robot Learning in Supervised Effect Prediction Tasks | Beispiel Effizientes Roboter-Lernen in überwachten Effekt-Vorhersage-Aufgaben | 在监督效应预测任务中提高机器人学习效率 2412.02331v2 |
Authors: Mehmet Arda Eren, Erhan Oztop
In self-supervised robotic learning, agents acquire data through active interaction with their environment, incurring costs such as energy use, human oversight, and experimental time. To mitigate these, sample-efficient exploration is essential. While intrinsic motivation (IM) methods like learning progress (LP) are widely used in robotics, and active learning (AL) is well established for classification in machine learning, few frameworks address continuous, high-dimensional regression tasks typical of world model learning. We propose MUSEL (Model Uncertainty for Sample-Efficient Learning), a novel AL framework tailored for regression tasks in robotics, such as action-effect prediction. MUSEL introduces a model uncertainty metric that combines total predictive uncertainty, learning progress, and input diversity to guide data acquisition. We validate our approach using a Stochastic Variational Deep Kernel Learning (SVDKL) model in two robotic tabletop tasks. Experimental results demonstrate that MUSEL improves both learning accuracy and sample efficiency, validating its effectiveness in learning action effects and selecting informative samples.
Article 580
Title@2025-05-28 (3): From Kernels to Features: A Multi-Scale Adaptive Theory of Feature Learning
Title: From Kernels to Features: A Multi-Scale Adaptive Theory of Feature Learning | Von den Kerneln zu den Features: Eine Multi-Scale Adaptive Theorie des Feature Learning | 从核心到地貌特征:多尺度适应性地貌学习理论 2502.03210v2 |
Authors: Noa Rubin, Kirsten Fischer, Javed Lindner, David Dahmen, Inbar Seroussi, Zohar Ringel, Michael Krämer, Moritz Helias
Feature learning in neural networks is crucial for their expressive power and inductive biases, motivating various theoretical approaches. Some approaches describe network behavior after training through a change in kernel scale from initialization, resulting in a generalization power comparable to a Gaussian process. Conversely, in other approaches training results in the adaptation of the kernel to the data, involving directional changes to the kernel. The relationship and respective strengths of these two views have so far remained unresolved. This work presents a theoretical framework of multi-scale adaptive feature learning bridging these two views. Using methods from statistical mechanics, we derive analytical expressions for network output statistics which are valid across scaling regimes and in the continuum between them. A systematic expansion of the network’s probability distribution reveals that mean-field scaling requires only a saddle-point approximation, while standard scaling necessitates additional correction terms. Remarkably, we find across regimes that kernel adaptation can be reduced to an effective kernel rescaling when predicting the mean network output in the special case of a linear network. However, for linear and non-linear networks, the multi-scale adaptive approach captures directional feature learning effects, providing richer insights than what could be recovered from a rescaling of the kernel alone.
Article 581
Title@2025-05-28 (3): Zero-Shot Mono-to-Binaural Speech Synthesis
Title: Zero-Shot Mono-to-Binaural Speech Synthesis | Null-Schuss-Mono-bis-Binaural-Sprachsynthese | 零热单声词合成 2412.08356v2 |
Authors: Alon Levkovitch, Julian Salazar, Soroosh Mariooryad, RJ Skerry-Ryan, Nadav Bar, Bastiaan Kleijn, Eliya Nachmani
We present ZeroBAS, a neural method to synthesize binaural audio from monaural audio recordings and positional information without training on any binaural data. To our knowledge, this is the first published zero-shot neural approach to mono-to-binaural audio synthesis. Specifically, we show that a parameter-free geometric time warping and amplitude scaling based on source location suffices to get an initial binaural synthesis that can be refined by iteratively applying a pretrained denoising vocoder. Furthermore, we find this leads to generalization across room conditions, which we measure by introducing a new dataset, TUT Mono-to-Binaural, to evaluate state-of-the-art monaural-to-binaural synthesis methods on unseen conditions. Our zero-shot method is perceptually on-par with the performance of supervised methods on the standard mono-to-binaural dataset, and even surpasses them on our out-of-distribution TUT Mono-to-Binaural dataset. Our results highlight the potential of pretrained generative audio models and zero-shot learning to unlock robust binaural audio synthesis.
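The parameter-free first stage can be pictured with a very small sketch. It assumes a static source, a per-ear delay equal to distance divided by the speed of sound (rounded to whole samples), and a 1/distance gain. The paper's actual warping and scaling rules are not given in the abstract, and the vocoder refinement step is omitted entirely. All names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed room-temperature value


def geometric_binaural(mono, sample_rate, source, left_ear, right_ear):
    """Delay and attenuate a mono signal separately for each ear position."""
    channels = []
    for ear in (left_ear, right_ear):
        distance = math.dist(source, ear)
        # Time of flight converted to an integer number of samples.
        delay = round(distance / SPEED_OF_SOUND * sample_rate)
        # Assumed inverse-distance attenuation; guard against zero distance.
        gain = 1.0 / max(distance, 1e-6)
        channels.append([0.0] * delay + [gain * s for s in mono])
    # Zero-pad so both channels have the same length.
    length = max(len(c) for c in channels)
    return [c + [0.0] * (length - len(c)) for c in channels]


left, right = geometric_binaural(
    mono=[1.0, 0.0], sample_rate=343,
    source=(1.0, 0.0, 0.0), left_ear=(0.0, 0.0, 0.0), right_ear=(-1.0, 0.0, 0.0),
)
```

In this toy setup the far ear hears the impulse one sample later and at half the amplitude. A moving source would need a time-varying, fractional delay rather than a single integer shift.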
Article 582
Title@2025-05-28 (3): Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration
Title: Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration | Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration | 利用语言代理框架中的双重进程理论促进实时同时人类-AI合作 2502.11882v5 |
Authors: Shao Zhang, Xihuai Wang, Wenhao Zhang, Chaoran Li, Junru Song, Tingyu Li, Lin Qiu, Xuezhi Cao, Xunliang Cai, Wen Yao, Weinan Zhang, Xinbing Wang, Ying Wen
Agents built on large language models (LLMs) have excelled in turn-by-turn human-AI collaboration but struggle with simultaneous tasks requiring real-time interaction. Latency issues and the challenge of inferring variable human strategies hinder their ability to make autonomous decisions without explicit instructions. Through experiments with current independent System 1 and System 2 methods, we validate the necessity of using Dual Process Theory (DPT) in real-time tasks. We propose DPT-Agent, a novel language agent framework that integrates System 1 and System 2 for efficient real-time simultaneous human-AI collaboration. DPT-Agent’s System 1 uses a Finite-state Machine (FSM) and code-as-policy for fast, intuitive, and controllable decision-making. DPT-Agent’s System 2 integrates Theory of Mind (ToM) and asynchronous reflection to infer human intentions and perform reasoning-based autonomous decisions. We demonstrate the effectiveness of DPT-Agent through further experiments with rule-based agents and human collaborators, showing significant improvements over mainstream LLM-based frameworks. DPT-Agent can effectively help LLMs convert correct slow thinking and reasoning into executable actions, thereby improving performance. To the best of our knowledge, DPT-Agent is the first language agent framework that achieves successful real-time simultaneous human-AI collaboration autonomously. Code of DPT-Agent can be found in https://github.com/sjtu-marl/DPT-Agent.
Article 583
Title@2025-05-28 (3): TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup
Title: TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup | TransMLA: Migration von GQA-Modellen zu MLA mit voller DeepSeek-Kompatibilität und Speedup | TransMLA:将GQA模型迁移到具有全深搜索兼容性和加速性的司法协助模式 2502.07864v4 |
Authors: Fanxu Meng, Pingzhi Tang, Zengwei Yao, Xing Sun, Muhan Zhang
In this paper, we present TransMLA, a framework that seamlessly converts any GQA-based pre-trained model into an MLA-based model. Our approach enables direct compatibility with DeepSeek’s codebase, allowing these models to fully leverage DeepSeek-specific optimizations such as vLLM and SGlang. By compressing 93% of the KV cache in LLaMA-2-7B, TransMLA achieves a 10.6x inference speedup at an 8K context length while preserving meaningful output quality. Additionally, the model requires only 6 billion tokens for fine-tuning to regain performance on par with the original across multiple benchmarks. TransMLA offers a practical solution for migrating GQA-based models to the MLA structure. When combined with DeepSeek’s advanced features, such as FP8 quantization and Multi-Token Prediction, even greater inference acceleration can be realized.
Article 584
Title@2025-05-28 (3): Full Domain Analysis in Fluid Dynamics
Title: Full Domain Analysis in Fluid Dynamics | Vollständige Domänenanalyse in Fluiddynamik | 流体动态全域分析 2505.22275v1 |
Authors: Alexander Hagg, Adam Gaier, Dominik Wilde, Alexander Asteroth, Holger Foysi, Dirk Reith
Novel techniques in evolutionary optimization, simulation and machine learning allow for a broad analysis of domains like fluid dynamics, in which computation is expensive and flow behavior is complex. By full domain analysis we mean the ability to efficiently determine the full space of solutions in a problem domain and to analyze the behavior of those solutions in an accessible and interactive manner. The goal of full domain analysis is to deepen our understanding of domains by generating many examples of flow, their diversification, optimization and analysis. We define a formal model for full domain analysis, its current state of the art, and requirements of subcomponents. Finally, an example is given to show what we can learn by using full domain analysis. Full domain analysis, rooted in optimization and machine learning, can be a helpful tool in understanding complex systems in computational physics and beyond.
Article 585
Title@2025-05-28 (3): EventFlow: Forecasting Temporal Point Processes with Flow Matching
Title: EventFlow: Forecasting Temporal Point Processes with Flow Matching | EventFlow: Vorhersage von zeitlichen Punktprozessen mit Flow Matching | 事件:预测与流动匹配的时点进程 2410.07430v2 |
Authors: Gavin Kerrigan, Kai Nelson, Padhraic Smyth
Continuous-time event sequences, in which events occur at irregular intervals, are ubiquitous across a wide range of industrial and scientific domains. The contemporary modeling paradigm is to treat such data as realizations of a temporal point process, and in machine learning it is common to model temporal point processes in an autoregressive fashion using a neural network. While autoregressive models are successful in predicting the time of a single subsequent event, their performance can degrade when forecasting longer horizons due to cascading errors and myopic predictions. We propose EventFlow, a non-autoregressive generative model for temporal point processes. The model builds on the flow matching framework in order to directly learn joint distributions over event times, side-stepping the autoregressive process. EventFlow is simple to implement and achieves a 20%-53% lower error than the nearest baseline on standard TPP benchmarks while simultaneously using fewer model calls at sampling time.
Article 586
Title@2025-05-28 (3): Reward Generalization in RLHF: A Topological Perspective
Title: Reward Generalization in RLHF: A Topological Perspective | Lohnverallgemeinerung in RLHF: Eine topologische Perspektive | RLHF的奖励普遍化:地形学观点 2402.10184v7 |
Authors: Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang
Existing alignment methods share a common topology of information flow, where reward information is collected from humans, modeled with preference learning, and used to tune language models. However, this shared topology has not been systematically characterized, nor have its alternatives been thoroughly explored, leaving the problems of low data efficiency and unreliable generalization unaddressed. As a solution, we introduce a theory of reward generalization in reinforcement learning from human feedback (RLHF), focusing on the topology of information flow at both macro and micro levels. At the macro level, we portray the RLHF information flow as an autoencoding process over behavior distributions, formalizing the RLHF objective of distributional consistency between human preference and model behavior. At the micro level, we present induced Bayesian networks to model the impact of dataset topologies on reward generalization. Combining analysis on both levels, we propose reward modeling from tree-structured preference information. It is shown to reduce reward uncertainty by up to $\Theta(\log n/\log\log n)$ times compared to baselines, where $n$ is the dataset size. Validation on three NLP tasks shows that it achieves an average win rate of 65% against baselines, thus improving reward generalization for free via topology design, while reducing the amount of data requiring annotation.
Article 587
Title@2025-05-28 (3): A Novel Characterization of the Population Area Under the Risk Coverage Curve (AURC) and Rates of Finite Sample Estimators
Title: A Novel Characterization of the Population Area Under the Risk Coverage Curve (AURC) and Rates of Finite Sample Estimators | Eine neuartige Charakterisierung des Populationsgebiets unter der Risikodeckungskurve (AURC) und Raten von Finite Sample-Schätzern | 风险覆盖曲线下人口区的新特点和有限抽样估计率 2410.15361v3 |
Authors: Han Zhou, Jordy Van Landeghem, Teodora Popordanoska, Matthew B. Blaschko
The selective classifier (SC) has been proposed for rank-based uncertainty thresholding, which could have applications in safety-critical areas such as medical diagnostics, autonomous driving, and the justice system. The Area Under the Risk-Coverage Curve (AURC) has emerged as the foremost evaluation metric for assessing the performance of SC systems. In this work, we present a formal statistical formulation of population AURC, presenting an equivalent expression that can be interpreted as a reweighted risk function. Through Monte Carlo methods, we derive empirical AURC plug-in estimators for finite sample scenarios. The weight estimators associated with these plug-in estimators are shown to be consistent, with low bias and tightly bounded mean squared error (MSE). The plug-in estimators are proven to converge at a rate of $\mathcal{O}(\sqrt{\ln(n)/n})$, demonstrating statistical consistency. We empirically validate the effectiveness of our estimators through experiments across multiple datasets, model architectures, and confidence score functions (CSFs), demonstrating consistency and effectiveness in fine-tuning AURC performance.
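The reweighted-risk reading can be checked numerically with a short sketch. It uses the commonly used empirical AURC, the average over coverage levels k/n of the mean loss on the k most confident samples, and rewrites it as a weighted sum of per-sample losses with weights that depend only on confidence rank. Whether this matches the paper's estimators exactly is an assumption, and the function names are hypothetical.

```python
def empirical_aurc(confidences, losses):
    """Average selective risk over coverage levels 1/n, 2/n, ..., 1."""
    # Visit samples from most to least confident.
    order = sorted(range(len(losses)), key=lambda i: -confidences[i])
    running_loss, total = 0.0, 0.0
    for k, i in enumerate(order, start=1):
        running_loss += losses[i]
        total += running_loss / k  # selective risk at coverage k/n
    return total / len(losses)


def aurc_rank_weights(n):
    """Weight of the sample at confidence rank i (1 = most confident):
    w_i = (1/n) * sum_{k=i}^{n} 1/k."""
    weights, tail = [0.0] * n, 0.0
    for rank in range(n, 0, -1):
        tail += 1.0 / rank
        weights[rank - 1] = tail / n
    return weights


confidences = [0.9, 0.8, 0.6, 0.3]
losses = [0.0, 0.0, 1.0, 1.0]  # 0/1 loss, already listed in descending confidence
direct = empirical_aurc(confidences, losses)
reweighted = sum(w * l for w, l in zip(aurc_rank_weights(4), losses))
```

Both routes give the same value because swapping the order of the two sums in the direct definition yields the rank weights. The weights are largest for the most confident samples, so confident mistakes are penalized most heavily.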
Article 588
Title@2025-05-28 (3): Improving Rule-based Reasoning in LLMs using Neurosymbolic Representations
Title: Improving Rule-based Reasoning in LLMs using Neurosymbolic Representations | Verbesserung der regelbasierten Reasoning in LLMs mit neurosymbolischen Darstellungen | 改进使用新阳性表示法的LLM中基于规则的理据 2502.01657v3 |
Authors: Varun Dhanraj, Chris Eliasmith
Large language models (LLMs) continue to face challenges in reliably solving reasoning tasks, particularly those that require precise rule following, as often found in mathematical reasoning. This paper introduces a novel neurosymbolic method that improves LLM reasoning by encoding hidden states into neurosymbolic vectors, enabling problem-solving within a neurosymbolic vector space. The results are decoded and merged with the original hidden state, significantly boosting the model’s performance on numerical reasoning tasks. By offloading computation through neurosymbolic representations, this method enhances efficiency, reliability, and interpretability. Experimental results demonstrate an average of 88.6% lower cross-entropy loss and 15.4 times more problems correctly solved on a suite of mathematical reasoning tasks compared to chain-of-thought prompting and supervised fine-tuning (LoRA), without degrading performance on other tasks. We make our code available at: https://github.com/vdhanraj/Neurosymbolic-LLM.
Article 589
Title@2025-05-28 (3): Training on Plausible Counterfactuals Removes Spurious Correlations
Title: Training on Plausible Counterfactuals Removes Spurious Correlations | Training auf Plausible Counterfactals entfernt spurlose Korrelationen | 关于可视反事实消除污损的培训 2505.16583v3 |
Authors: Shpresim Sadiku, Kartikeya Chitranshi, Hiroshi Kera, Sebastian Pokutta
Plausible counterfactual explanations (p-CFEs) are perturbations that minimally modify inputs to change classifier decisions while remaining plausible under the data distribution. In this study, we demonstrate that classifiers can be trained on p-CFEs labeled with induced incorrect target classes to classify unperturbed inputs with the original labels. While previous studies have shown that such learning is possible with adversarial perturbations, we extend this paradigm to p-CFEs. Interestingly, our experiments reveal that learning from p-CFEs is even more effective: the resulting classifiers not only achieve high in-distribution accuracy but also exhibit significantly reduced bias with respect to spurious correlations.
Article 590
Title@2025-05-28 (3): LiDAR Based Semantic Perception for Forklifts in Outdoor Environments
Title: LiDAR Based Semantic Perception for Forklifts in Outdoor Environments | LiDAR basierte semantische Wahrnehmung für Gabelstapler im Freien | 室外环境中叉车使用基于 LiDAR 的语义感 2505.22258v1 |
Authors: Benjamin Serfling, Hannes Reichert, Lorenzo Bayerlein, Konrad Doll, Kati Radkhah-Lens
In this study, we present a novel LiDAR-based semantic segmentation framework tailored for autonomous forklifts operating in complex outdoor environments. Central to our approach is the integration of a dual LiDAR system, which combines forward-facing and downward-angled LiDAR sensors to enable comprehensive scene understanding, specifically tailored for industrial material handling tasks. The dual configuration improves the detection and segmentation of dynamic and static obstacles with high spatial precision. Using high-resolution 3D point clouds captured from two sensors, our method employs a lightweight yet robust approach that segments the point clouds into safety-critical instance classes such as pedestrians, vehicles, and forklifts, as well as environmental classes such as driveable ground, lanes, and buildings. Experimental validation demonstrates that our approach achieves high segmentation accuracy while satisfying strict runtime requirements, establishing its viability for safety-aware, fully autonomous forklift navigation in dynamic warehouse and yard environments.
Article 591
Title@2025-05-28 (3): Something’s Fishy In The Data Lake: A Critical Re-evaluation of Table Union Search Benchmarks
Title: Something’s Fishy In The Data Lake: A Critical Re-evaluation of Table Union Search Benchmarks | Irgendetwas ist Fishy In The Data Lake: Eine kritische Neubewertung der Tabelle Union Suche Benchmarks | “数据湖中的鱼:对表格联合搜索基准的重要重新评估” 2505.21329v2 |
Authors: Allaa Boutaleb, Bernd Amann, Hubert Naacke, Rafael Angarita
Recent table representation learning and data discovery methods tackle table union search (TUS) within data lakes, which involves identifying tables that can be unioned with a given query table to enrich its content. These methods are commonly evaluated using benchmarks that aim to assess semantic understanding in real-world TUS tasks. However, our analysis of prominent TUS benchmarks reveals several limitations that allow simple baselines to perform surprisingly well, often outperforming more sophisticated approaches. This suggests that current benchmark scores are heavily influenced by dataset-specific characteristics and fail to effectively isolate the gains from semantic understanding. To address this, we propose essential criteria for future benchmarks to enable a more realistic and reliable evaluation of progress in semantic table union search.
Article 592
Title@2025-05-28 (3): Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
Title: Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training | Revisiting Group Relative Policy Optimization: Einblicke in die On-Policy- und Off-Policy-Schulung | 重新审视小组相对政策优化:对政策和非政策培训的深入了解 2505.22257v1 |
Authors: Youssef Mroueh, Nicolas Dupuis, Brian Belgodere, Apoorva Nitsure, Mattia Rigotti, Kristjan Greenewald, Jiri Navratil, Jerret Ross, Jesus Rios
We revisit Group Relative Policy Optimization (GRPO) in both on-policy and off-policy optimization regimes. Our motivation comes from recent work on off-policy Proximal Policy Optimization (PPO), which improves training stability, sampling efficiency, and memory usage. In addition, a recent analysis of GRPO suggests that estimating the advantage function with off-policy samples could be beneficial. Building on these observations, we adapt GRPO to the off-policy setting. We show that both on-policy and off-policy GRPO objectives yield an improvement in the reward. This result motivates the use of clipped surrogate objectives in the off-policy version of GRPO. We then compare the empirical performance of reinforcement learning with verifiable rewards in post-training using both GRPO variants. Our results show that off-policy GRPO either significantly outperforms or performs on par with its on-policy counterpart.
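For readers unfamiliar with the two ingredients named here, the sketch below shows a group-relative advantage and a PPO-style clipped surrogate term. It assumes the usual formulation: rewards are standardized within a group of responses to the same prompt, and the probability ratio is clipped to [1 - eps, 1 + eps]. Using the population standard deviation is an assumption, and the paper's off-policy estimator may differ in its details.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumed choice
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Per-sample clipped objective: min(ratio * A, clip(ratio) * A).

    `ratio` is pi_new(y|x) / pi_behaviour(y|x). In an off-policy setting the
    behaviour policy is an older snapshot rather than the current policy.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
gain_capped = clipped_surrogate(ratio=1.5, advantage=1.0)
penalty_kept = clipped_surrogate(ratio=0.5, advantage=-1.0)
```

The clip caps how much credit a large ratio can earn on a positive advantage, while the outer `min` keeps the more pessimistic value when the advantage is negative. This is what keeps updates conservative when samples come from a stale policy.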
Article 593
Title@2025-05-28 (3): Train Sparse Autoencoders Efficiently by Utilizing Features Correlation
Title: Train Sparse Autoencoders Efficiently by Utilizing Features Correlation | Bahnsparse Autoencoder effizient durch die Nutzung von Funktionen Korrelation | 通过使用地物关联, 高效地列列“ 分散的自动编译器” 。 2505.22255v1 |
Authors: Vadim Kurochkin, Yaroslav Aksenov, Daniil Laptev, Daniil Gavrilov, Nikita Balagansky
Sparse Autoencoders (SAEs) have demonstrated significant promise in interpreting the hidden states of language models by decomposing them into interpretable latent directions. However, training SAEs at scale remains challenging, especially when large dictionary sizes are used. While decoders can leverage sparse-aware kernels for efficiency, encoders still require computationally intensive linear operations with large output dimensions. To address this, we propose KronSAE, a novel architecture that factorizes the latent representation via Kronecker product decomposition, drastically reducing memory and computational overhead. Furthermore, we introduce mAND, a differentiable activation function approximating the binary AND operation, which improves interpretability and performance in our factorized framework.
Article 594
Title@2025-05-28 (3): A Unified Online-Offline Framework for Co-Branding Campaign Recommendations
Title: A Unified Online-Offline Framework for Co-Branding Campaign Recommendations | Ein einheitliches Online-Offline-Rahmenwerk für Co-Branding-Kampagnenempfehlungen | 联合捆绑运动建议统一在线离线框架 2505.22254v1 |
Authors: Xiangxiang Dai, Xiaowei Sun, Jinhang Zuo, Xutong Liu, John C. S. Lui
Co-branding has become a vital strategy for businesses aiming to expand market reach within recommendation systems. However, identifying effective cross-industry partnerships remains challenging due to resource imbalances, uncertain brand willingness, and ever-changing market conditions. In this paper, we provide the first systematic study of this problem and propose a unified online-offline framework to enable co-branding recommendations. Our approach begins by constructing a bipartite graph linking "initiating" and "target" brands to quantify co-branding probabilities and assess market benefits. During the online learning phase, we dynamically update the graph in response to market feedback, while striking a balance between exploring new collaborations for long-term gains and exploiting established partnerships for immediate benefits. To address the high initial co-branding costs, our framework mitigates redundant exploration, thereby enhancing short-term performance while ensuring sustainable strategic growth. In the offline optimization phase, our framework consolidates the interests of multiple sub-brands under the same parent brand to maximize overall returns, avoid excessive investment in single sub-brands, and reduce unnecessary costs associated with over-prioritizing a single sub-brand. We present a theoretical analysis of our approach, establishing a highly nontrivial sublinear regret bound for online learning in the complex co-branding problem, and enhancing the approximation guarantee for the NP-hard offline budget allocation optimization. Experiments on both synthetic and real-world co-branding datasets demonstrate the practical effectiveness of our framework, with at least 12% improvement.
Article 595
Title@2025-05-28 (3): B-XAIC Dataset: Benchmarking Explainable AI for Graph Neural Networks Using Chemical Data
Title: B-XAIC Dataset: Benchmarking Explainable AI for Graph Neural Networks Using Chemical Data | B-XAIC Datensatz: Benchmarking Erklärbare KI für Graph Neuronale Netzwerke unter Verwendung chemischer Daten | B-XAIC数据集:使用化学数据的图形神经网络基准可解释的AI 2505.22252v1 |
Authors: Magdalena Proszewska, Tomasz Danel, Dawid Rymarczyk
Understanding the reasoning behind deep learning model predictions is crucial in cheminformatics and drug discovery, where molecular design determines their properties. However, current evaluation frameworks for Explainable AI (XAI) in this domain often rely on artificial datasets or simplified tasks, employing data-derived metrics that fail to capture the complexity of real-world scenarios and lack a direct link to explanation faithfulness. To address this, we introduce B-XAIC, a novel benchmark constructed from real-world molecular data and diverse tasks with known ground-truth rationales for assigned labels. Through a comprehensive evaluation using B-XAIC, we reveal limitations of existing XAI methods for Graph Neural Networks (GNNs) in the molecular domain. This benchmark provides a valuable resource for gaining deeper insights into the faithfulness of XAI, facilitating the development of more reliable and interpretable models.
Article 596
Title@2025-05-28 (3): Evaluating Compact LLMs for Zero-Shot Iberian Language Tasks on End-User Devices
Title: Evaluating Compact LLMs for Zero-Shot Iberian Language Tasks on End-User Devices | Bewertung kompakter LLMs für blitzfreie iberische Sprachaufgaben auf Endbenutzer-Geräten | 评价关于最终用户装置的零 - 低 - 低 - 高 - 伊比利亚语语言任务 2504.03312v2 |
Authors: Luís Couto Seller, Íñigo Sanz Torres, Adrián Vogel-Fernández, Carlos González Carballo, Pedro Miguel Sánchez Sánchez, Adrián Carruana Martín, Enrique de Miguel Ambite
Large Language Models have significantly advanced natural language processing, achieving remarkable performance in tasks such as language generation, translation, and reasoning. However, their substantial computational requirements restrict deployment to high-end systems, limiting accessibility on consumer-grade devices. This challenge is especially pronounced for under-resourced languages like those spoken in the Iberian Peninsula, where relatively limited linguistic resources and benchmarks hinder effective evaluation. This work presents a comprehensive evaluation of compact state-of-the-art LLMs across several essential NLP tasks tailored for Iberian languages. The results reveal that while some models consistently excel in certain tasks, significant performance gaps remain, particularly for languages such as Basque. These findings highlight the need for further research on balancing model compactness with robust multilingual performance.
Article 597
Title@2025-05-28 (3): UDuo: Universal Dual Optimization Framework for Online Matching
Title: UDuo: Universal Dual Optimization Framework for Online Matching | UDuo: Universal Dual Optimization Framework für Online-Matching | UDuo: 通用双优化在线匹配框架 2505.22243v1 |
Authors: Bin Li, Diwei Liu, Zehong Hu, Jia Jia
Online resource allocation under budget constraints critically depends on proper modeling of user arrival dynamics. Classical approaches employ stochastic user arrival models to derive near-optimal solutions through fractional matching formulations of exposed users for downstream allocation tasks. However, this is no longer a reasonable assumption when the environment changes dynamically. In this work, we propose the Universal Dual optimization framework UDuo, a novel paradigm that fundamentally rethinks online allocation through three key innovations: (i) a temporal user arrival representation vector that explicitly captures distribution shifts in user arrival patterns and resource consumption dynamics, (ii) a resource pacing learner with adaptive allocation policies that generalize to heterogeneous constraint scenarios, and (iii) an online time-series forecasting approach for future user arrival distributions that achieves asymptotically optimal solutions with constraint feasibility guarantees in dynamic environments. Experimental results show that UDuo achieves higher efficiency and faster convergence than the traditional stochastic arrival model in real-world pricing while maintaining rigorous theoretical validity for general online allocation problems.
Article 598
Title@2025-05-28 (3): Reinforcement Learning with Verifiable Rewards: GRPO’s Effective Loss, Dynamics, and Success Amplification
Title: Reinforcement Learning with Verifiable Rewards: GRPO’s Effective Loss, Dynamics, and Success Amplification | Verstärktes Lernen mit überprüfbaren Belohnungen: Effektiver Verlust, Dynamik und Erfolgsverstärkung von GRPO | 利用可核实的奖励加强学习:GROP的有效损失、动态和成功扩展 2503.06639v3 |
Authors: Youssef Mroueh
Group Relative Policy Optimization (GRPO) was introduced recently and used successfully to train DeepSeek-R1 models for promoting reasoning capabilities of LLMs using verifiable or binary rewards. We show in this paper that GRPO with verifiable rewards can be written as a Kullback–Leibler (KL) regularized contrastive loss, where the contrastive samples are synthetic data sampled from the old policy. The optimal GRPO policy $\pi_{n}$ can be expressed explicitly in terms of the binary reward, as well as the first- and second-order statistics of the old policy $\pi_{n-1}$ and the reference policy $\pi_{\text{ref}}$. Iterating this scheme, we obtain a sequence of policies $\pi_{n}$ for which we can quantify the probability of success $p_n$. We show that the probability of success of the policy satisfies a recurrence that converges to a fixed point of a function that depends on the initial probability of success $p_{\text{ref}}$ and the regularization parameter $\beta$ of the KL regularizer. We show that the fixed point $p^*$ is guaranteed to be larger than $p_{\text{ref}}$, thereby demonstrating that GRPO effectively amplifies the probability of success of the policy.
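The first- and second-order statistics mentioned above enter GRPO through its group-relative reward normalization. The sketch below shows that normalization for one group of binary rewards; the function name `group_advantages` and the `eps` stabilizer are assumptions for illustration, and the sketch does not reproduce the paper's recurrence or fixed-point analysis.

```python
from statistics import fmean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of sampled responses.

    With binary rewards, the group mean estimates the success
    probability p of the sampling policy and the group standard
    deviation estimates sqrt(p * (1 - p)).
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, eight sampled answers, three of them verified as correct.
rewards = [1, 0, 0, 1, 1, 0, 0, 0]
advantages = group_advantages(rewards)
p_hat = fmean(rewards)  # empirical success rate of the old policy
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, and the magnitudes depend only on the estimated success rate of the group.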
Article 599
Title@2025-05-28 (3): Rethinking GNN Expressive Power from a Distributed Computational Model Perspective
Title: Rethinking GNN Expressive Power from a Distributed Computational Model Perspective | Überdenken von GNN Expressive Power aus einer distributed Computational Model Perspective | 从分配的计算模型模型角度重新思考GNNN 的表达力 2410.01308v3 |
Authors: Guanyu Cui, Yuhe Guo, Zhewei Wei, Hsin-Hao Su
The success of graph neural networks (GNNs) has motivated theoretical studies on their expressive power, often through alignments with the Weisfeiler-Lehman (WL) tests. However, such analyses typically focus on the ability of GNNs to distinguish between graph structures, rather than to compute or approximate specific function classes. The latter is more commonly studied in machine learning theory, including results such as the Turing completeness of recurrent networks and the universal approximation property of feedforward networks. We argue that using well-defined computational models, such as a modified CONGEST model with clearly specified preprocessing and postprocessing, offers a more sound framework for analyzing GNN expressiveness. Within this framework, we show that allowing unrestricted preprocessing or incorporating externally computed features, while claiming that these precomputations enhance the expressiveness, can sometimes lead to problems. We also show that the lower bound on a GNN’s capacity (depth multiplied by width) to simulate one iteration of the WL test actually grows nearly linearly with graph size, indicating that the WL test is not locally computable and is misaligned with message-passing GNNs. Despite these negative results, we also present positive results that characterize the effects of virtual nodes and edges from a computational model perspective. Finally, we highlight several open problems regarding GNN expressiveness for further exploration.
Article 600
Title@2025-05-28 (3): NRFormer: Nationwide Nuclear Radiation Forecasting with Spatio-Temporal Transformer
Title: NRFormer: Nationwide Nuclear Radiation Forecasting with Spatio-Temporal Transformer | NRFormer: landesweite Vorhersage der nuklearen Strahlung mit Spatio-Temporal Transformer | NR 前:利用时空变压器进行全国核辐射预报 2410.11924v3 |
Authors: Tengfei Lyu, Jindong Han, Hao Liu
Nuclear radiation, which refers to the energy emitted from atomic nuclei during decay, poses significant risks to human health and environmental safety. Recently, advancements in monitoring technology have facilitated the effective recording of nuclear radiation levels and related factors, such as weather conditions. The abundance of monitoring data enables the development of accurate and reliable nuclear radiation forecasting models, which play a crucial role in informing decision-making for individuals and governments. However, this task is challenging due to the imbalanced distribution of monitoring stations over a wide spatial range and the non-stationary radiation variation patterns. In this study, we introduce NRFormer, a novel framework tailored for the nationwide prediction of nuclear radiation variations. By integrating a non-stationary temporal attention module, an imbalance-aware spatial attention module, and a radiation propagation prompting module, NRFormer collectively captures complex spatio-temporal dynamics of nuclear radiation. Extensive experiments on two real-world datasets demonstrate the superiority of our proposed framework against 11 baselines.
Article 601
Title@2025-05-28 (3): On Provable Length and Compositional Generalization
Title: On Provable Length and Compositional Generalization | Auf evable Länge und kompositorische Verallgemeinerung | 关于可预见长度和组 成 式 通 泛 化 2402.04875v6 |
Authors: Kartik Ahuja, Amin Mansouri
Out-of-distribution generalization capabilities of sequence-to-sequence models can be studied through the lens of two crucial forms of generalization: length generalization, the ability to generalize to longer sequences than those seen during training, and compositional generalization, the ability to generalize to token combinations not seen during training. In this work, we provide the first provable guarantees on length and compositional generalization for common sequence-to-sequence models (deep sets, transformers, state space models, and recurrent neural nets) trained to minimize the prediction error. We show that limited-capacity versions of these different architectures achieve both length and compositional generalization provided the training distribution is sufficiently diverse. In the first part, we study structured limited-capacity variants of different architectures and arrive at generalization guarantees with limited diversity requirements on the training distribution. In the second part, we study limited-capacity variants with fewer structural assumptions and arrive at generalization guarantees but with more diversity requirements on the training distribution. Further, we also show that chain-of-thought supervision enables length generalization in higher-capacity counterparts of the different architectures we study.
Article 602
Title@2025-05-28 (3): Yambda-5B – A Large-Scale Multi-modal Dataset for Ranking And Retrieval
Title: Yambda-5B – A Large-Scale Multi-modal Dataset for Ranking And Retrieval | Yambda-5B – Ein multimodaler Datensatz für das Ranking und das Retrieval | Yambda-5B – – 用于排名和检索的大型多模式数据集 2505.22238v1 |
Authors: A. Ploshkin, V. Tytskiy, A. Pismenny, V. Baikalov, E. Taychinov, A. Permiakov, D. Burlakov, E. Krofto, N. Savushkin
We present Yambda-5B, a large-scale open dataset sourced from the Yandex.Music streaming platform. Yambda-5B contains 4.79 billion user-item interactions from 1 million users across 9.39 million tracks. The dataset includes two primary types of interactions: implicit feedback (listening events) and explicit feedback (likes, dislikes, unlikes and undislikes). In addition, we provide audio embeddings for most tracks, generated by a convolutional neural network trained on audio spectrograms. A key distinguishing feature of Yambda-5B is the inclusion of the is_organic flag, which separates organic user actions from recommendation-driven events. This distinction is critical for developing and evaluating machine learning algorithms, as Yandex.Music relies on recommender systems to personalize track selection for users. To support rigorous benchmarking, we introduce an evaluation protocol based on a Global Temporal Split, allowing recommendation algorithms to be assessed in conditions that closely mirror real-world use. We report benchmark results for standard baselines (ItemKNN, iALS) and advanced models (SANSA, SASRec) using a variety of evaluation metrics. By releasing Yambda-5B to the community, we aim to provide a readily accessible, industrial-scale resource to advance research, foster innovation, and promote reproducible results in recommender systems.
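A global temporal split can be sketched as follows, assuming it means a single timestamp cutoff shared by all users. The record layout (dictionaries with `user`, `item`, `timestamp`, and `is_organic` keys) and the function name `global_temporal_split` are hypothetical and do not describe the released dataset schema.

```python
def global_temporal_split(interactions, cutoff):
    """Split events at one timestamp shared by all users.

    Everything strictly before the cutoff is training data and
    everything at or after it is held out, so no future event can
    leak into training for any user.
    """
    train = [e for e in interactions if e["timestamp"] < cutoff]
    test = [e for e in interactions if e["timestamp"] >= cutoff]
    return train, test


# Hypothetical toy log; the field names are illustrative only.
events = [
    {"user": 1, "item": 10, "timestamp": 100, "is_organic": True},
    {"user": 2, "item": 11, "timestamp": 150, "is_organic": False},
    {"user": 1, "item": 12, "timestamp": 200, "is_organic": False},
    {"user": 2, "item": 10, "timestamp": 250, "is_organic": True},
]

train, test = global_temporal_split(events, cutoff=200)

# Organic events can then be separated from recommendation-driven ones.
organic_test = [e for e in test if e["is_organic"]]
```

Unlike a per-user leave-last-out split, a shared cutoff keeps the evaluation period strictly later than every training event.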
Article 603
Title@2025-05-28 (3): Decision-Focused Forecasting: A Differentiable Multistage Optimisation Architecture
Title: Decision-Focused Forecasting: A Differentiable Multistage Optimisation Architecture | Entscheidungsorientierte Prognose: Eine differenzierbare mehrstufige Optimierungsarchitektur | 决定重点预测:可区别的多阶段优化结构 2405.14719v2 |
Authors: Egon Peršak, Miguel F. Anjos
Most decision-focused learning work has focused on single stage problems whereas many real-world decision problems are more appropriately modelled using multistage optimisation. In multistage problems contextual information is revealed over time, decisions have to be taken sequentially, and decisions now have an intertemporal effect on future decisions. Decision-focused forecasting is a recurrent differentiable optimisation architecture that expresses a fully differentiable multistage optimisation approach. This architecture enables us to account for the intertemporal decision effects of forecasts. We show what gradient adjustments are made to account for the state-path caused by forecasting. We apply the model to multistage problems in energy storage arbitrage and portfolio optimisation and report that our model outperforms existing approaches.
Article 604
Title@2025-05-28 (3): Optimal kernel regression bounds under energy-bounded noise
Title: Optimal kernel regression bounds under energy-bounded noise | Optimale Kernel-Regressionsgrenzen unter energiegebundenem Rauschen | 在受能源限制的噪音下的最佳内核回归界限 2505.22235v1 |
Authors: Amon Lahr, Johannes Köhler, Anna Scampicchio, Melanie N. Zeilinger
Non-conservative uncertainty bounds are key for both assessing an estimation algorithm’s accuracy and in view of downstream tasks, such as its deployment in safety-critical contexts. In this paper, we derive a tight, non-asymptotic uncertainty bound for kernel-based estimation, which can also handle correlated noise sequences. Its computation relies on a mild norm-boundedness assumption on the unknown function and the noise, returning the worst-case function realization within the hypothesis class at an arbitrary query input location. The value of this function is shown to be given in terms of the posterior mean and covariance of a Gaussian process for an optimal choice of the measurement noise covariance. By rigorously analyzing the proposed approach and comparing it with other results in the literature, we show its effectiveness in returning tight and easy-to-compute bounds for kernel-based estimates.
Article 605
Title@2025-05-28 (3): Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
Title: Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models | Qualität Across-Sprachen beurteilen: Ein mehrsprachiger Ansatz zur Vorschulung von Datenfiltern mit Sprachmodellen | 判断各语文的质量:采用多种语文办法,利用语言模式进行培训前数据过滤 2505.22232v1 |
Authors: Mehdi Ali, Manuel Brack, Max Lübbering, Elias Wendt, Abbas Goher Khan, Richard Rutmann, Alex Jude, Maurice Kraus, Alexander Arno Weber, Felix Stollenwerk, David Kaczér, Florian Mai, Lucie Flek, Rafet Sifa, Nicolas Flores-Herr, Joachim Köhler, Patrick Schramowski, Michael Fromm, Kristian Kersting
High-quality multilingual training data is essential for effectively pretraining large language models (LLMs). Yet, the availability of suitable open-source multilingual datasets remains limited. Existing state-of-the-art datasets mostly rely on heuristic filtering methods, restricting both their cross-lingual transferability and scalability. Here, we introduce JQL, a systematic approach that efficiently curates diverse and high-quality multilingual data at scale while significantly reducing computational demands. JQL distills LLMs’ annotation capabilities into lightweight annotators based on pretrained multilingual embeddings. These models exhibit robust multilingual and cross-lingual performance, even for languages and scripts unseen during training. Evaluated empirically across 35 languages, the resulting annotation pipeline substantially outperforms current heuristic filtering methods like Fineweb2. JQL notably enhances downstream model training quality and increases data retention rates. Our research provides practical insights and valuable resources for multilingual data curation, raising the standards of multilingual dataset development.
Article 606
Title@2025-05-28 (3): You Do Not Fully Utilize Transformer’s Representation Capacity
Title: You Do Not Fully Utilize Transformer’s Representation Capacity | Sie nicht voll nutzen Transformer-Repräsentanz Kapazität | 您没有充分利用变换器的代表能力 2502.09245v2 |
Authors: Gleb Gerasimov, Yaroslav Aksenov, Nikita Balagansky, Viacheslav Sinii, Daniil Gavrilov
In contrast to RNNs, which compress their history into a single hidden state, Transformers can attend to all past tokens directly. However, standard Transformers rely solely on the hidden state from the previous layer to represent the entire context. We show that this design choice induces representation collapse and degrades performance. To address this issue, we introduce Layer-Integrated Memory (LIMe), a lightweight extension that leverages existing key-value buffers and learns per-head, per-layer routing weights to integrate representations from all previous layers with negligible overhead. Through extensive experiments (including language modeling, synthetic reasoning benchmarks, and very deep architectures), LIMe consistently achieves faster convergence, lower perplexity per FLOP, and substantial accuracy improvements on synthetic tasks while preserving higher value-vector entropy and improved token separability. Finally, our analysis of the learned routing weights reveals systematic reuse of both local and long-distance features, demonstrating how LIMe mitigates collapse, unlocks richer representations without increasing hidden-state size, and points to promising directions for future research.
Article 607
Title@2025-05-28 (3): Solver-Free Decision-Focused Learning for Linear Optimization Problems
Title: Solver-Free Decision-Focused Learning for Linear Optimization Problems | Solver-Free decision-focused Learning für lineare Optimierungsprobleme | 处理线性优化问题的无解决者决定-集中学习 2505.22224v1 |
Authors: Senne Berden, Ali İrfan Mahmutoğulları, Dimos Tsouros, Tias Guns
Mathematical optimization is a fundamental tool for decision-making in a wide range of applications. However, in many real-world scenarios, the parameters of the optimization problem are not known a priori and must be predicted from contextual features. This gives rise to predict-then-optimize problems, where a machine learning model predicts problem parameters that are then used to make decisions via optimization. A growing body of work on decision-focused learning (DFL) addresses this setting by training models specifically to produce predictions that maximize downstream decision quality, rather than predictive accuracy. While effective, DFL is computationally expensive, because it requires solving the optimization problem with the predicted parameters at each loss evaluation. In this work, we address this computational bottleneck for linear optimization problems, a common class of problems in both the DFL literature and real-world applications. We propose a solver-free training method that exploits the geometric structure of linear optimization to enable efficient training with minimal degradation in solution quality. Our method is based on the insight that a solution is optimal if and only if it achieves an objective value that is at least as good as that of its adjacent vertices on the feasible polytope. Building on this, our method compares the estimated quality of the ground-truth optimal solution with that of its precomputed adjacent vertices, and uses this comparison as the loss function. Experiments demonstrate that our method significantly reduces computational cost while maintaining high decision quality.
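The adjacent-vertex comparison can be illustrated with a small sketch for a minimization problem. The hinge-style penalty and the names `dot` and `adjacent_vertex_loss` are assumptions chosen for illustration; the loss actually used in the paper may take a different form.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def adjacent_vertex_loss(c_hat, x_star, adjacent):
    """Penalize predicted costs under which a neighbour beats x_star.

    For a minimization LP, x_star stays optimal under c_hat exactly
    when no adjacent vertex has a strictly lower predicted objective,
    so the loss is zero in that case and positive otherwise.
    No solver call is needed: the neighbours are precomputed.
    """
    obj_star = dot(c_hat, x_star)
    return sum(max(0.0, obj_star - dot(c_hat, x)) for x in adjacent)


# Toy problem: minimize c . x over the unit square, true optimum (0, 0).
x_star = (0.0, 0.0)
adjacent = [(1.0, 0.0), (0.0, 1.0)]

good = adjacent_vertex_loss((1.0, 2.0), x_star, adjacent)
bad = adjacent_vertex_loss((-1.0, 2.0), x_star, adjacent)
```

The first prediction keeps the true optimum optimal and incurs no loss, while the second makes the neighbour (1, 0) look better and is penalized by the objective gap.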
Article 608
Title@2025-05-28 (3): Taming Recommendation Bias with Causal Intervention on Evolving Personal Popularity
Title: Taming Recommendation Bias with Causal Intervention on Evolving Personal Popularity | Zähmungsempfehlung Bias mit ursächlicher Intervention zur Entwicklung persönlicher Beliebtheit | ” 与个人大众演变的因果关系干预 “ 的 “ 比亚斯 “ 和 “ 个人大众演变 “ 的 “ 比亚斯 “ 建议 2505.14310v2 |
Authors: Shiyin Tan, Dongyuan Li, Renhe Jiang, Zhen Wang, Xingtong Yu, Manabu Okumura
Popularity bias occurs when popular items are recommended far more frequently than they should be, negatively impacting both user experience and recommendation accuracy. Existing debiasing methods often mitigate popularity bias uniformly across all users and only partially consider the time evolution of users or items. However, users have different levels of preference for item popularity, and this preference evolves over time. To address these issues, we propose a novel method called CausalEPP (Causal Intervention on Evolving Personal Popularity) for taming recommendation bias, which accounts for the evolving personal popularity of users. Specifically, we first introduce a metric called Evolving Personal Popularity to quantify each user’s preference for popular items. Then, we design a causal graph that integrates evolving personal popularity into the conformity effect, and apply deconfounded training to mitigate the popularity bias of the causal graph. During inference, we consider the evolution consistency between users and items to achieve a better recommendation. Empirical studies demonstrate that CausalEPP outperforms baseline methods in reducing popularity bias while improving recommendation accuracy.
Article 609
Title@2025-05-28 (3): Quantum framework for Reinforcement Learning: Integrating Markov decision process, quantum arithmetic, and trajectory search
Title: Quantum framework for Reinforcement Learning: Integrating Markov decision process, quantum arithmetic, and trajectory search | Quanten-Framework for Reinforcement Learning: Markov-Entscheidungsprozess, Quantenarithmetik und Flugbahnsuche integrieren | 强化学习的量子框架:纳入Markov决策程序、量数算术和轨迹搜索 2412.18208v3 |
Authors: Thet Htar Su, Shaswot Shresthamali, Masaaki Kondo
This paper introduces a quantum framework for addressing reinforcement learning (RL) tasks, grounded in the quantum principles and leveraging a fully quantum model of the classical Markov decision process (MDP). By employing quantum concepts and a quantum search algorithm, this work presents the implementation and optimization of the agent-environment interactions entirely within the quantum domain, eliminating reliance on classical computations. Key contributions include the quantum-based state transitions, return calculation, and trajectory search mechanism that utilize quantum principles to demonstrate the realization of RL processes through quantum phenomena. The implementation emphasizes the fundamental role of quantum superposition in enhancing computational efficiency for RL tasks. Results demonstrate the capacity of a quantum model to achieve quantum enhancement in RL, highlighting the potential of fully quantum implementations in decision-making tasks. This work not only underscores the applicability of quantum computing in machine learning but also contributes to the field of quantum reinforcement learning (QRL) by offering a robust framework for understanding and exploiting quantum computing in RL systems.
Article 610
Title@2025-05-28 (3): Advancing Sequential Numerical Prediction in Autoregressive Models
Title: Advancing Sequential Numerical Prediction in Autoregressive Models | Advancing Sequential Numerical Prediction in Autoregressive Modelle | 自动递减模型中推进序列序号预测 2505.13077v2 |
Authors: Xiang Fei, Jinghui Lu, Qi Sun, Hao Feng, Yanjie Wang, Wei Shi, An-Lan Wang, Jingqun Tang, Can Huang
Autoregressive models have become the de facto choice for sequence generation tasks, but standard approaches treat digits as independent tokens and apply cross-entropy loss, overlooking the coherent structure of numerical sequences. This paper introduces Numerical Token Integrity Loss (NTIL) to address this gap. NTIL operates at two levels: (1) token-level, where it extends the Earth Mover’s Distance (EMD) to preserve ordinal relationships between numerical values, and (2) sequence-level, where it penalizes the overall discrepancy between the predicted and actual sequences. This dual approach improves numerical prediction and integrates effectively with LLMs/MLLMs. Extensive experiments show significant performance improvements with NTIL.
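To see why an ordinal distance helps at the token level, the sketch below computes the plain one-dimensional Earth Mover's Distance between distributions over the digits 0 to 9, using the fact that with unit spacing it equals the summed absolute difference of the cumulative distributions. This is the standard EMD rather than the extended version NTIL uses, and the helper names `emd_1d` and `one_hot` are assumptions for illustration.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth Mover's Distance between two distributions on 0..len(p)-1.

    For ordered bins with unit spacing, the distance is the sum of
    absolute differences between the two cumulative distributions.
    """
    cdf_p = accumulate(p)
    cdf_q = accumulate(q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def one_hot(digit, size=10):
    """Distribution that puts all probability mass on one digit."""
    return [1.0 if i == digit else 0.0 for i in range(size)]


target = one_hot(5)
near = emd_1d(one_hot(4), target)  # prediction off by one
far = emd_1d(one_hot(9), target)   # prediction off by four
```

Cross-entropy treats both wrong predictions identically because neither places mass on the target digit, whereas the EMD grows with the numerical error.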
Article 611
Title@2025-05-28 (3): On the Within-class Variation Issue in Alzheimer’s Disease Detection
Title: On the Within-class Variation Issue in Alzheimer’s Disease Detection | Zur klasseninternen Variationsfrage bei der Alzheimer-Erkennung | 阿尔茨海默氏氏病检测的 类内变化变化问题 2409.16322v2 |
Authors: Jiawen Kang, Dongrui Han, Lingwei Meng, Jingyan Zhou, Jinchao Li, Xixin Wu, Helen Meng
Alzheimer’s Disease (AD) detection employs machine learning classification models to distinguish between individuals with AD and those without. Unlike conventional classification tasks, we identify within-class variation as a critical challenge in AD detection: individuals with AD exhibit a spectrum of cognitive impairments. Therefore, simplistic binary AD classification may overlook two crucial aspects: within-class heterogeneity and instance-level imbalance. In this work, we find that a sample score estimator can generate sample-specific soft scores that align with cognitive scores. We subsequently propose two simple yet effective methods, Soft Target Distillation (SoTD) and Instance-level Re-balancing (InRe), which target these two problems, respectively. Based on the ADReSS and CU-MARVEL corpora, we demonstrate and analyze the advantages of the proposed approaches in detection performance. These findings provide insights for developing robust and reliable AD detection models.
Article 612
Title@2025-05-28 (3): Interpreting CLIP with Hierarchical Sparse Autoencoders
Title: Interpreting CLIP with Hierarchical Sparse Autoencoders | CLIP mit Hierarchical Sparse Autoencodern interpretieren | 使用等级式的粗度自动解析器解释 CLIP 2502.20578v2 |
Authors: Vladimir Zaigrajew, Hubert Baniecki, Przemyslaw Biecek
Sparse autoencoders (SAEs) are useful for detecting and steering interpretable features in neural networks, with particular potential for understanding complex multimodal representations. Given their ability to uncover interpretable features, SAEs are particularly valuable for analyzing large-scale vision-language models (e.g., CLIP and SigLIP), which are fundamental building blocks in modern systems yet remain challenging to interpret and control. However, current SAE methods are limited by optimizing both reconstruction quality and sparsity simultaneously, as they rely on either activation suppression or rigid sparsity constraints. To this end, we introduce Matryoshka SAE (MSAE), a new architecture that learns hierarchical representations at multiple granularities simultaneously, enabling a direct optimization of both metrics without compromise. MSAE establishes a new state-of-the-art Pareto frontier between reconstruction quality and sparsity for CLIP, achieving 0.99 cosine similarity and less than 0.1 fraction of variance unexplained while maintaining ~80% sparsity. Finally, we demonstrate the utility of MSAE as a tool for interpreting and controlling CLIP by extracting over 120 semantic concepts from its representation to perform concept-based similarity search and bias analysis in downstream tasks like CelebA. We make the codebase available at https://github.com/WolodjaZ/MSAE.
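The two reconstruction metrics quoted above can be computed as in the following sketch, assuming the usual definitions: cosine similarity between an input and its reconstruction, and fraction of variance unexplained (FVU) as squared residual divided by squared deviation from the batch mean. The function names are illustrative and the paper's exact evaluation code may differ.

```python
import math


def cosine_similarity(x, x_hat):
    """Cosine of the angle between a vector and its reconstruction."""
    dot = sum(a * b for a, b in zip(x, x_hat))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_hat = math.sqrt(sum(b * b for b in x_hat))
    return dot / (norm_x * norm_hat)


def fraction_variance_unexplained(batch, recon):
    """Squared reconstruction error divided by total variance."""
    dim = len(batch[0])
    mean = [sum(v[j] for v in batch) / len(batch) for j in range(dim)]
    residual = sum(
        (a - b) ** 2 for v, r in zip(batch, recon) for a, b in zip(v, r)
    )
    total = sum((a - m) ** 2 for v in batch for a, m in zip(v, mean))
    return residual / total


batch = [[1.0, 0.0], [0.0, 1.0]]
perfect = fraction_variance_unexplained(batch, batch)

# Predicting the batch mean everywhere explains none of the variance.
mean_only = fraction_variance_unexplained(batch, [[0.5, 0.5], [0.5, 0.5]])
```

An FVU below 0.1 therefore means the reconstruction error is under a tenth of what a constant mean prediction would incur.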
Article 613
Title@2025-05-28 (3): LaMM: Semi-Supervised Pre-Training of Large-Scale Materials Models
Title: LaMM: Semi-Supervised Pre-Training of Large-Scale Materials Models | LaMM: Halbüberwachte Vorausbildung von großformatigen Werkstoffmodellen | LAMM: 大型材料模型的半监督前培训 2505.22208v1 |
Authors: Yosuke Oyama, Yusuke Majima, Eiji Ohta, Yasufumi Sakai
Neural network potentials (NNPs) are crucial for accelerating computational materials science by surrogating density functional theory (DFT) calculations. Improving their accuracy is possible through pre-training and fine-tuning, where an NNP model is first pre-trained on a large-scale dataset and then fine-tuned on a smaller target dataset. However, this approach is computationally expensive, mainly due to the cost of DFT-based dataset labeling and load imbalances during large-scale pre-training. To address this, we propose LaMM, a semi-supervised pre-training method incorporating improved denoising self-supervised learning and a load-balancing algorithm for efficient multi-node training. We demonstrate that our approach effectively leverages a large-scale dataset of $\sim$300 million semi-labeled samples to train a single NNP model, resulting in improved fine-tuning performance in terms of both speed and accuracy.
Article 614
Title@2025-05-28 (3): Pitfalls of Rule- and Model-based Verifiers – A Case Study on Mathematical Reasoning
Title: Pitfalls of Rule- and Model-based Verifiers – A Case Study on Mathematical Reasoning | Pitfalls of Rule- and Model-based Verifiers – Eine Fallstudie zur mathematischen Begründung | 规则和基于示范的验证符咒 – – 关于数学理由的个案研究 2505.22203v1 |
Authors: Yuzhen Huang, Weihao Zeng, Xingshan Zeng, Qi Zhu, Junxian He
Trustworthy verifiers are essential for the success of reinforcement learning with verifiable reward (RLVR), which is the core methodology behind various large reasoning models such as DeepSeek-R1. In complex domains like mathematical reasoning, rule-based verifiers have been widely adopted in previous works to train strong reasoning models. However, the reliability of these verifiers and their impact on the RL training process remain poorly understood. In this work, we take mathematical reasoning as a case study and conduct a comprehensive analysis of various verifiers in both static evaluation and RL training scenarios. First, we find that current open-source rule-based verifiers often fail to recognize equivalent answers presented in different formats across multiple commonly used mathematical datasets, resulting in non-negligible false negative rates. This limitation adversely affects RL training performance and becomes more pronounced as the policy model gets stronger. Subsequently, we investigate model-based verifiers as a potential solution to address these limitations. While the static evaluation shows that model-based verifiers achieve significantly higher verification accuracy, further analysis and RL training results imply that they are highly susceptible to hacking, where they misclassify certain patterns in responses as correct (i.e., false positives). This vulnerability is exploited during policy model optimization, leading to artificially inflated rewards. Our findings underscore the unique risks inherent to both rule-based and model-based verifiers, aiming to offer valuable insights to develop more robust reward systems in reinforcement learning.
Article 615
Title@2025-05-28 (3): Enhancing Uncertainty Estimation and Interpretability via Bayesian Non-negative Decision Layer
Title: Enhancing Uncertainty Estimation and Interpretability via Bayesian Non-negative Decision Layer | Verbesserung der Unsicherheitsabschätzung und -interpretierbarkeit über Bayesian Non-negative Decision Layer | 通过Bayesian非负决定层加强不确定性的估算和解释 2505.22199v1 |
Authors: Xinyue Hu, Zhibin Duan, Bo Chen, Mingyuan Zhou
Although deep neural networks have demonstrated significant success due to their powerful expressiveness, most models struggle to meet practical requirements for uncertainty estimation. Concurrently, the entangled nature of deep neural networks leads to a multifaceted problem, where various localized explanation techniques reveal that multiple unrelated features influence the decisions, thereby undermining interpretability. To address these challenges, we develop a Bayesian Non-negative Decision Layer (BNDL), which reformulates deep neural networks as a conditional Bayesian non-negative factor analysis. By leveraging stochastic latent variables, the BNDL can model complex dependencies and provide robust uncertainty estimation. Moreover, the sparsity and non-negativity of the latent variables encourage the model to learn disentangled representations and decision layers, thereby improving interpretability. We also offer theoretical guarantees that BNDL can achieve effective disentangled learning. In addition, we developed a corresponding variational inference method utilizing a Weibull variational inference network to approximate the posterior distribution of the latent variables. Our experimental results demonstrate that with enhanced disentanglement capabilities, BNDL not only improves the model’s accuracy but also provides reliable uncertainty estimation and improved interpretability.
Article 616
Title@2025-05-28 (3): An Augmentation-Aware Theory for Self-Supervised Contrastive Learning
Title: An Augmentation-Aware Theory for Self-Supervised Contrastive Learning | Eine Augmentations-Bewusst-Theorie für selbstüberwachtes kontrastives Lernen | 自我监督违规学习的增强- 软件软件理论 2505.22196v1 |
Authors: Jingyi Cui, Hongwei Wen, Yisen Wang
Self-supervised contrastive learning has emerged as a powerful tool in machine learning and computer vision to learn meaningful representations from unlabeled data. Meanwhile, its empirical success has encouraged many theoretical studies to reveal the learning mechanisms. However, in the existing theoretical research, the role of data augmentation is still under-exploited, especially the effects of specific augmentation types. To fill this gap, we propose, for the first time, an augmentation-aware error bound for self-supervised contrastive learning, showing that the supervised risk is bounded not only by the unsupervised risk, but also explicitly by a trade-off induced by data augmentation. Then, under a novel semantic label assumption, we discuss how certain augmentation methods affect the error bound. Lastly, we conduct both pixel- and representation-level experiments to verify our proposed theoretical results.
Article 617
Title@2025-05-28 (3): Physics-inspired Generative AI models via real hardware-based noisy quantum diffusion
Title: Physics-inspired Generative AI models via real hardware-based noisy quantum diffusion | Physik-inspirierte Generative KI-Modelle über reale Hardware-basierte laute Quantendiffusion | 通过实实在在的硬件噪音量子扩散 产生人工智能模型 2505.22193v1 |
Authors: Marco Parigi, Stefano Martina, Francesco Aldo Venturelli, Filippo Caruso
Quantum Diffusion Models (QDMs) are an emerging paradigm in Generative AI that aims to use quantum properties to improve the performance of their classical counterparts. However, existing algorithms are not easily scalable due to the limitations of near-term quantum devices. Following our previous work on QDMs, here we propose and implement two physics-inspired protocols. In the first, we use the formalism of quantum stochastic walks, showing that a specific interplay of quantum and classical dynamics in the forward process produces statistically more robust models generating sets of MNIST images with lower Fréchet Inception Distance (FID) than using totally classical dynamics. In the second approach, we realize an algorithm to generate images by exploiting the intrinsic noise of real IBM quantum hardware with only four qubits. Our work could be a starting point to pave the way for new scenarios for large-scale algorithms in quantum Generative AI, where quantum noise is neither mitigated nor corrected, but instead exploited as a useful resource.
Article 618
Title@2025-05-28 (3): Beyond RMSE and MAE: Introducing EAUC to unmask hidden bias and unfairness in dyadic regression models
Title: Beyond RMSE and MAE: Introducing EAUC to unmask hidden bias and unfairness in dyadic regression models | Jenseits von RMSE und MAE: Einführung des EUC zur Enttarnung versteckter Bias und Ungerechtigkeit in dyadischen Regressionsmodellen | RUSE 和MAE 之后的RUSE 和MAE:将EAUC引入dyadic回归模型中隐蔽的偏见和不公平现象 2401.10690v5 |
Authors: Jorge Paz-Ruza, Amparo Alonso-Betanzos, Bertha Guijarro-Berdiñas, Brais Cancela, Carlos Eiras-Franco
Dyadic regression models, which output real-valued predictions for pairs of entities, are fundamental in many domains (e.g. obtaining user-product ratings in Recommender Systems) and promising and under exploration in others (e.g. tuning patient-drug dosages in precision pharmacology). In this work, we prove that non-uniform observed value distributions of individual entities lead to severe biases in state-of-the-art models, skewing predictions towards the average of observed past values for the entity and providing worse-than-random predictive power in eccentric yet crucial cases; we name this phenomenon eccentricity bias. We show that global error metrics like Root Mean Squared Error (RMSE) are insufficient to capture this bias, and we introduce Eccentricity-Area Under the Curve (EAUC) as a novel metric that can quantify it in all studied domains and models. We prove the intuitive interpretation of EAUC by experimenting with naive post-training bias corrections, and theorize other options to use EAUC to guide the construction of fair models. This work contributes a bias-aware evaluation of dyadic regression to prevent unfairness in critical real-world applications of such systems.
Article 619
Title@2025-05-28 (3): LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently
Title: LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently | LoRA-One: Ein-Schritt-Full Gradient könnte genug für feines Tuning von großen Sprachmodellen sein, wahrscheinlich und effizient | LORA-OI: 精巧、高效、可预见和高效的微调大语言模型的单步全步可满足需要 2502.01235v2 |
Authors: Yuanhe Zhang, Fanghui Liu, Yudong Chen
This paper explores how theory can guide and enhance practical algorithms, using Low-Rank Adaptation (LoRA, Hu et al. 2022) in large language models as a case study. We rigorously prove that, under gradient descent, LoRA adapters align with specific singular subspaces of the one-step full fine-tuning gradient. This result suggests that, by properly initializing the adapters using the one-step full gradient, subspace alignment can be achieved immediately, for both linear and nonlinear models. Building on our theory, we propose a theory-driven algorithm, LoRA-One, for which linear convergence (as well as generalization) is established and for which incorporating preconditioners theoretically helps mitigate the effects of ill-conditioning. In addition, our theory reveals connections between LoRA-One and other gradient-alignment-based methods, helping to clarify misconceptions in the design of such algorithms. LoRA-One achieves significant empirical improvements over LoRA and its variants across benchmarks in natural language understanding, mathematical reasoning, and code generation. Code is available at: https://github.com/YuanheZ/LoRA-One.
Article 620
Title@2025-05-28 (3): LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits
Title: LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits | LC-Tsallis-INF: Generalisierte Best-of-Both-Worlds Lineare Kontextbanditen | LC-Tsallis-INF: 普遍化的两世界最佳线性线性直线性范围内的强盗 2403.03219v3 |
Authors: Masahiro Kato, Shinji Ito
We investigate the linear contextual bandit problem with independent and identically distributed (i.i.d.) contexts. In this problem, we aim to develop a Best-of-Both-Worlds (BoBW) algorithm with regret upper bounds in both stochastic and adversarial regimes. We develop an algorithm based on Follow-The-Regularized-Leader (FTRL) with Tsallis entropy, referred to as the $\alpha$-Linear-Contextual (LC)-Tsallis-INF. We show that its regret is at most $O(\log(T))$ in the stochastic regime under the assumption that the suboptimality gap is uniformly bounded from below, and at most $O(\sqrt{T})$ in the adversarial regime. Furthermore, our regret analysis is extended to more general regimes characterized by the margin condition with a parameter $\beta \in (1, \infty]$, which imposes a milder assumption on the suboptimality gap. We show that the proposed algorithm achieves $O\left(\log(T)^{\frac{1+\beta}{2+\beta}}T^{\frac{1}{2+\beta}}\right)$ regret under the margin condition.
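As a quick consistency check on the margin-condition rate (a reading of the formula as stated in the abstract, not a derivation from the paper), letting $\beta \to \infty$ sends the exponent of $\log(T)$ to 1 and the exponent of $T$ to 0, so the bound reduces to the logarithmic stochastic-regime rate. The finite value $\beta = 2$ is an arbitrary illustration.

```latex
\lim_{\beta \to \infty} \frac{1+\beta}{2+\beta} = 1,
\qquad
\lim_{\beta \to \infty} \frac{1}{2+\beta} = 0
\quad\Longrightarrow\quad
\log(T)^{\frac{1+\beta}{2+\beta}}\, T^{\frac{1}{2+\beta}} \;\longrightarrow\; \log(T),
\qquad
\text{while for } \beta = 2:\;
\log(T)^{\frac{3}{4}}\, T^{\frac{1}{4}} .
```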
Article 621
Title@2025-05-28 (3): Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes
Title: Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes | Kontinuierliche und diskrete Diffusion mit nicht gleichzeitigen Diffusionsprozessen | 与非平行扩散进程一起进行连续和分解的不连续和分解文本传播 2505.22165v1 |
Authors: Bocheng Li, Zhujin Gao, Linli Xu
Diffusion models have emerged as a promising approach for text generation, with recent works falling into two main categories: discrete and continuous diffusion models. Discrete diffusion models apply token corruption independently using categorical distributions, allowing for different diffusion progress across tokens but lacking fine-grained control. Continuous diffusion models map tokens to continuous spaces and apply fine-grained noise, but the diffusion progress is uniform across tokens, limiting their ability to capture semantic nuances. To address these limitations, we propose Non-simultaneous Continuous Diffusion Models (NeoDiff), a novel diffusion model that integrates the strengths of both discrete and continuous approaches. NeoDiff introduces a Poisson diffusion process for the forward process, enabling a flexible and fine-grained noising paradigm, and employs a time predictor for the reverse process to adaptively modulate the denoising progress based on token semantics. Furthermore, NeoDiff utilizes an optimized schedule for inference to ensure more precise noise control and improved performance. Our approach unifies the theories of discrete and continuous diffusion models, offering a more principled and effective framework for text generation. Experimental results on several text generation tasks demonstrate NeoDiff’s superior performance compared to baselines of non-autoregressive continuous and discrete diffusion models, iterative-based methods and autoregressive diffusion-based methods. These results highlight NeoDiff’s potential as a powerful tool for generating high-quality text and advancing the field of diffusion-based text generation.
Article 622
Title@2025-05-28 (3): AgriFM: A Multi-source Temporal Remote Sensing Foundation Model for Crop Mapping
Title: AgriFM: A Multi-source Temporal Remote Sensing Foundation Model for Crop Mapping | AgriFM: Multi-Source-Modell für die zeitliche Fernerkundung | AgriFM:多种来源的时空遥感基金会作物绘图模型 2505.21357v2 |
Authors: Wenyuan Li, Shunlin Liang, Keyan Chen, Yongzhe Chen, Han Ma, Jianglei Xu, Yichuan Ma, Shikang Guan, Husheng Fang, Zhenwei Shi
Accurate crop mapping fundamentally relies on modeling multi-scale spatiotemporal patterns, where spatial scales range from individual field textures to landscape-level context, and temporal scales capture both short-term phenological transitions and full growing-season dynamics. Transformer-based remote sensing foundation models (RSFMs) offer promising potential for crop mapping due to their innate ability for unified spatiotemporal processing. However, current RSFMs remain suboptimal for crop mapping: they either employ fixed spatiotemporal windows that ignore the multi-scale nature of crop systems or completely disregard temporal information by focusing solely on spatial patterns. To bridge these gaps, we present AgriFM, a multi-source remote sensing foundation model specifically designed for agricultural crop mapping. Our approach begins by establishing the necessity of simultaneous hierarchical spatiotemporal feature extraction, leading to the development of a modified Video Swin Transformer architecture where temporal down-sampling is synchronized with spatial scaling operations. This modified backbone enables efficient unified processing of long time-series satellite inputs. AgriFM leverages temporally rich data streams from three satellite sources including MODIS, Landsat-8/9 and Sentinel-2, and is pre-trained on a global representative dataset comprising over 25 million image samples supervised by land cover products. The resulting framework incorporates a versatile decoder architecture that dynamically fuses these learned spatiotemporal representations, supporting diverse downstream tasks. Comprehensive evaluations demonstrate AgriFM’s superior performance over conventional deep learning approaches and state-of-the-art general-purpose RSFMs across all downstream tasks. Codes will be available at https://github.com/flyakon/AgriFM.
Article 623
Title@2025-05-28 (3): The informativeness of the gradient revisited
Title: The informativeness of the gradient revisited | Die Aufschlusskraft des Gradienten wurde überarbeitet | 重新讨论的梯度信息性 2505.22158v1 |
Authors: Rustem Takhanov
In the past decade gradient-based deep learning has revolutionized several applications. However, this rapid advancement has highlighted the need for a deeper theoretical understanding of its limitations. Research has shown that, in many practical learning tasks, the information contained in the gradient is so minimal that gradient-based methods require an exceedingly large number of iterations to achieve success. The informativeness of the gradient is typically measured by its variance with respect to the random selection of a target function from a hypothesis class. We use this framework and give a general bound on the variance in terms of a parameter related to the pairwise independence of the target function class and the collision entropy of the input distribution. Our bound scales as $ \tilde{\mathcal{O}}(\varepsilon+e^{-\frac{1}{2}\mathcal{E}_c}) $, where $ \tilde{\mathcal{O}} $ hides factors related to the regularity of the learning model and the loss function, $ \varepsilon $ measures the pairwise independence of the target function class and $\mathcal{E}_c$ is the collision entropy of the input distribution. To demonstrate the practical utility of our bound, we apply it to the class of Learning with Errors (LWE) mappings and high-frequency functions. In addition to the theoretical analysis, we present experiments to understand better the nature of recent deep learning-based attacks on LWE.
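To make the entropy term of the bound concrete, the sketch below computes the collision entropy of a discrete input distribution and the resulting factor $e^{-\frac{1}{2}\mathcal{E}_c}$. It assumes the standard definition of collision entropy as the order-2 Rényi entropy, $\mathcal{E}_c = -\log \sum_i p_i^2$, with natural logarithms. The function names and example distributions are hypothetical, and the $\varepsilon$ term and the hidden regularity factors are not modelled.

```python
import math


def collision_entropy(probs):
    """Collision (order-2 Renyi) entropy in nats: -log(sum_i p_i^2)."""
    return -math.log(sum(p * p for p in probs))


def entropy_term(probs):
    """The e^{-E_c / 2} factor; algebraically equal to sqrt(sum_i p_i^2)."""
    return math.exp(-0.5 * collision_entropy(probs))


# Hypothetical input distributions over 1024 outcomes.
uniform = [1.0 / 1024] * 1024              # high collision entropy
peaked = [0.97] + [0.03 / 1023] * 1023     # low collision entropy

print(entropy_term(uniform))  # small factor: the entropy term of the bound is small
print(entropy_term(peaked))   # factor close to 1: the entropy term is uninformative
```

Under these assumptions, a spread-out input distribution drives the entropy term toward zero, which is the regime where the bound says the gradient carries little information about the target.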
Article 624
Title@2025-05-28 (3): Towards Practical Defect-Focused Automated Code Review
Title: Towards Practical Defect-Focused Automated Code Review | Auf dem Weg zu einer praktischen fehlerorientierten automatisierten Code-Überprüfung | 走向实际失效-受污染的自动编码审查 2505.17928v2 |
Authors: Junyi Lu, Lili Jiang, Xiaojia Li, Jianbing Fang, Fengjun Zhang, Li Yang, Chun Zuo
The complexity of code reviews has driven efforts to automate review comments, but prior approaches oversimplify this task by treating it as snippet-level code-to-text generation and relying on text similarity metrics like BLEU for evaluation. These methods overlook repository context, real-world merge request evaluation, and defect detection, limiting their practicality. To address these issues, we explore the full automation pipeline within the online recommendation service of a company with nearly 400 million daily active users, analyzing industry-grade C++ codebases comprising hundreds of thousands of lines of code. We identify four key challenges: 1) capturing relevant context, 2) improving key bug inclusion (KBI), 3) reducing false alarm rates (FAR), and 4) integrating human workflows. To tackle these, we propose 1) code slicing algorithms for context extraction, 2) a multi-role LLM framework for KBI, 3) a filtering mechanism for FAR reduction, and 4) a novel prompt design for better human interaction. Our approach, validated on real-world merge requests from historical fault reports, achieves a 2x improvement over standard LLMs and a 10x gain over previous baselines. While the presented results focus on C++, the underlying framework design leverages language-agnostic principles (e.g., AST-based analysis), suggesting potential for broader applicability.
Article 625
Title@2025-05-28 (3): Uncertainty Estimation for Heterophilic Graphs Through the Lens of Information Theory
Title: Uncertainty Estimation for Heterophilic Graphs Through the Lens of Information Theory | Ungewissheitsschätzung für heterophile Graphen durch die Linse der Informationstheorie | 信息镜头信息理论流流中异血哲学图谱的不确定性估计 2505.22152v1 |
Authors: Dominik Fuchsgruber, Tom Wollschläger, Johannes Bordne, Stephan Günnemann
While uncertainty estimation for graphs recently gained traction, most methods rely on homophily and deteriorate in heterophilic settings. We address this by analyzing message passing neural networks from an information-theoretic perspective and developing a suitable analog to data processing inequality to quantify information throughout the model’s layers. In contrast to non-graph domains, information about the node-level prediction target can increase with model depth if a node’s features are semantically different from its neighbors. Therefore, on heterophilic graphs, the latent embeddings of an MPNN each provide different information about the data distribution, unlike in homophilic settings. This reveals that considering all node representations simultaneously is a key design principle for epistemic uncertainty estimation on graphs beyond homophily. We empirically confirm this with a simple post-hoc density estimator on the joint node embedding space that provides state-of-the-art uncertainty on heterophilic graphs. At the same time, it matches prior work on homophilic graphs without explicitly exploiting homophily through post-processing.
Article 626
Title@2025-05-28 (3): Oryx: a Performant and Scalable Algorithm for Many-Agent Coordination in Offline MARL
Title: Oryx: a Performant and Scalable Algorithm for Many-Agent Coordination in Offline MARL | Oryx: ein performanter und skalierbarer Algorithmus für viele-Agenten-Koordination in Offline MARL | Oryx: MARL 离线下许多机构协调的性能和可缩放的数值 2505.22151v1 |
Authors: Claude Formanek, Omayma Mahjoub, Louay Ben Nessir, Sasha Abramowitz, Ruan de Kock, Wiem Khlifi, Simon Du Toit, Felix Chalumeau, Daniel Rajaonarivonivelomanantsoa, Arnol Fokam, Siddarth Singh, Ulrich Mbou Sob, Arnu Pretorius
A key challenge in offline multi-agent reinforcement learning (MARL) is achieving effective many-agent multi-step coordination in complex environments. In this work, we propose Oryx, a novel algorithm for offline cooperative MARL to directly address this challenge. Oryx adapts the recently proposed retention-based architecture Sable and combines it with a sequential form of implicit constraint Q-learning (ICQ), to develop a novel offline auto-regressive policy update scheme. This allows Oryx to solve complex coordination challenges while maintaining temporal coherence over lengthy trajectories. We evaluate Oryx across a diverse set of benchmarks from prior works (SMAC, RWARE, and Multi-Agent MuJoCo) covering tasks of both discrete and continuous control, varying in scale and difficulty. Oryx achieves state-of-the-art performance on more than 80% of the 65 tested datasets, outperforming prior offline MARL methods and demonstrating robust generalisation across domains with many agents and long horizons. Finally, we introduce new datasets to push the limits of many-agent coordination in offline MARL, and demonstrate Oryx’s superior ability to scale effectively in such settings. We will make all of our datasets, experimental data, and code available upon publication.
Article 627
Title@2025-05-28 (3): Gradient Boosting Reinforcement Learning
Title: Gradient Boosting Reinforcement Learning | Gradientenfördernde Stärkung des Lernens | 逐步推进强化学习 2407.08250v2 |
Authors: Benjamin Fuhrer, Chen Tessler, Gal Dalal
We present Gradient Boosting Reinforcement Learning (GBRL), a framework that adapts the strengths of gradient boosting trees (GBT) to reinforcement learning (RL) tasks. While neural networks (NNs) have become the de facto choice for RL, they face significant challenges with structured and categorical features and tend to generalize poorly to out-of-distribution samples. These are challenges for which GBTs have traditionally excelled in supervised learning. However, GBT’s application in RL has been limited. The design of traditional GBT libraries is optimized for static datasets with fixed labels, making them incompatible with RL’s dynamic nature, where both state distributions and reward signals evolve during training. GBRL overcomes this limitation by continuously interleaving tree construction with environment interaction. Through extensive experiments, we demonstrate that GBRL outperforms NNs in domains with structured observations and categorical features while maintaining competitive performance on standard continuous control benchmarks. Like its supervised learning counterpart, GBRL demonstrates superior robustness to out-of-distribution samples and better handles irregular state-action relationships.
Article 628
Title@2025-05-28 (3): Bridging Arbitrary and Tree Metrics via Differentiable Gromov Hyperbolicity
Title: Bridging Arbitrary and Tree Metrics via Differentiable Gromov Hyperbolicity | Überbrückung von Willkür- und Baummetrics durch differenzierbare Gromov-Hyperbolizität | 通过差别化格罗莫夫双向主义 2505.21073v2 |
Authors: Pierre Houedry, Nicolas Courty, Florestan Martin-Baillon, Laetitia Chapel, Titouan Vayer
Trees and the associated shortest-path tree metrics provide a powerful framework for representing hierarchical and combinatorial structures in data. Given an arbitrary metric space, its deviation from a tree metric can be quantified by Gromov’s $\delta$-hyperbolicity. Nonetheless, designing algorithms that bridge an arbitrary metric to its closest tree metric is still an active subject of interest, as most common approaches are either heuristic and lack guarantees, or perform only moderately well. In this work, we introduce a novel differentiable optimization framework, coined DeltaZero, that solves this problem. Our method leverages a smooth surrogate for Gromov’s $\delta$-hyperbolicity which enables a gradient-based optimization, with a tractable complexity. The corresponding optimization procedure is derived from a problem with better worst-case guarantees than existing bounds, and is justified statistically. Experiments on synthetic and real-world datasets demonstrate that our method consistently achieves state-of-the-art distortion.
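For readers unfamiliar with the quantity being smoothed, the following is a minimal sketch of the exact (non-differentiable) Gromov $\delta$-hyperbolicity of a finite metric, using the standard Gromov-product form of the four-point condition. It is not the DeltaZero surrogate, whose details the abstract does not give. The function names are hypothetical and the brute-force loop costs $O(n^4)$.

```python
from itertools import product


def gromov_product(D, x, y, w):
    """Gromov product (x|y)_w = (d(x,w) + d(y,w) - d(x,y)) / 2."""
    return 0.5 * (D[x][w] + D[y][w] - D[x][y])


def gromov_delta(D):
    """Smallest delta such that (x|z)_w >= min((x|y)_w, (y|z)_w) - delta
    holds for every quadruple of points; D is a symmetric distance matrix."""
    n = len(D)
    delta = 0.0
    for w, x, y, z in product(range(n), repeat=4):
        gap = min(gromov_product(D, x, y, w), gromov_product(D, y, z, w))
        gap -= gromov_product(D, x, z, w)
        delta = max(delta, gap)
    return delta


# Four points on a path graph: a tree metric, so delta is 0.
path = [[abs(i - j) for j in range(4)] for i in range(4)]

# Shortest-path metric of a 4-cycle with unit edges: not a tree metric.
cycle = [[0, 1, 2, 1],
         [1, 0, 1, 2],
         [2, 1, 0, 1],
         [1, 2, 1, 0]]

print(gromov_delta(path))   # 0.0
print(gromov_delta(cycle))  # 1.0
```

The nested `max` and `min` here are what make the exact quantity non-smooth. One plausible route to a differentiable surrogate, stated as an assumption rather than as the paper's construction, is to replace them with log-sum-exp relaxations.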
Article 629
Title@2025-05-28 (3): Limited Generalizability in Argument Mining: State-Of-The-Art Models Learn Datasets, Not Arguments
Title: Limited Generalizability in Argument Mining: State-Of-The-Art Models Learn Datasets, Not Arguments | Begrenzte Verallgemeinerbarkeit im Argumentbergbau: State-of-The-Art-Modelle lernen Datensätze, keine Argumente | 《争议采矿业的限制性通用性:国家与艺术中的模式学习数据集,非论据》 2505.22137v1 |
Authors: Marc Feger, Katarina Boland, Stefan Dietze
Identifying arguments is a necessary prerequisite for various tasks in automated discourse analysis, particularly within contexts such as political debates, online discussions, and scientific reasoning. In addition to theoretical advances in understanding the constitution of arguments, a significant body of research has emerged around practical argument mining, supported by a growing number of publicly available datasets. On these benchmarks, BERT-like transformers have consistently performed best, reinforcing the belief that such models are broadly applicable across diverse contexts of debate. This study offers the first large-scale re-evaluation of such state-of-the-art models, with a specific focus on their ability to generalize in identifying arguments. We evaluate four transformers, three standard and one enhanced with contrastive pre-training for better generalization, on 17 English sentence-level datasets as most relevant to the task. Our findings show that, to varying degrees, these models tend to rely on lexical shortcuts tied to content words, suggesting that apparent progress may often be driven by dataset-specific cues rather than true task alignment. While the models achieve strong results on familiar benchmarks, their performance drops markedly when applied to unseen datasets. Nonetheless, incorporating both task-specific pre-training and joint benchmark training proves effective in enhancing both robustness and generalization.
Article 630
Title@2025-05-28 (3): RAD: Redundancy-Aware Distillation for Hybrid Models via Self-Speculative Decoding
Title: RAD: Redundancy-Aware Distillation for Hybrid Models via Self-Speculative Decoding | RAD: Redundanz-Bewusst-Destillation für Hybridmodelle über selbstspekulative Decodierung | RAD: 通过自投机代号为混合模型进行再利用-软件蒸馏 2505.22135v1 |
Authors: Yuichiro Hoshino, Hideyuki Tachibana, Muneyoshi Inahara, Hiroto Takegawa
Hybrid models combining Transformers and State Space Models (SSMs) are promising for balancing performance and efficiency. However, optimizing these hybrid models, particularly by addressing the potential redundancy inherent within the Transformer components, remains a significant challenge. In this paper, we propose RAD (Redundancy-Aware Distillation), a novel framework that uses self-speculative decoding as a diagnostic tool to identify redundant attention layers within the model. These identified layers are then selectively replaced with SSM components, followed by targeted (self-)distillation. Specifically, RAD focuses knowledge transfer on the components identified as redundant, considering architectural changes and specific weight initialization strategies. We experimentally demonstrate that self-distillation using RAD significantly surpasses the performance of the original base model on mathematical and coding tasks. Furthermore, RAD is also effective in standard knowledge distillation settings, achieving up to approximately 2x faster convergence compared to baseline methods. Notably, while a baseline model distilled from a Llama-3.1 70B teacher achieves scores of 46.17 on GSM8K and 22.75 on CRUX, RAD achieves significantly higher scores of 71.27 on GSM8K and 28.25 on CRUX, even when using a much smaller Llama-3.1 8B teacher. RAD offers a new pathway for efficient optimization and performance enhancement in the distillation of hybrid models.
Article 631
Title@2025-05-28 (3): JEDI: Latent End-to-end Diffusion Mitigates Agent-Human Performance Asymmetry in Model-Based Reinforcement Learning
Title: JEDI: Latent End-to-end Diffusion Mitigates Agent-Human Performance Asymmetry in Model-Based Reinforcement Learning | JEDI: Latent End-to-End-Diffusion mildert die Asymmetrie von Agent-Human Performance im modellbasierten Verstärkungslernen | JEDI: 以模型为基础的加强学习中前端至终端扩散 消化剂-人类性能对称性 2505.19698v2 |
Authors: Jing Yu Lim, Zarif Ikram, Samson Yu, Haozhe Ma, Tze-Yun Leong, Dianbo Liu
Recent advances in model-based reinforcement learning (MBRL) have achieved super-human level performance on the Atari100k benchmark, driven by reinforcement learning agents trained on powerful diffusion world models. However, we identify that the current aggregates mask a major performance asymmetry: MBRL agents dramatically outperform humans in some tasks despite drastically underperforming in others, with the former inflating the aggregate metrics. This is especially pronounced in pixel-based agents trained with diffusion world models. In this work, we address the pronounced asymmetry observed in pixel-based agents as an initial attempt to reverse the worrying upward trend observed in them. We address the problematic aggregates by delineating all tasks as Agent-Optimal or Human-Optimal and advocate for equal importance on metrics from both sets. Next, we hypothesize this pronounced asymmetry is due to the lack of temporally-structured latent space trained with the World Model objective in pixel-based methods. Lastly, to address this issue, we propose Joint Embedding DIffusion (JEDI), a novel latent diffusion world model trained end-to-end with the self-consistency objective. JEDI outperforms SOTA models in human-optimal tasks while staying competitive across the Atari100k benchmark, and runs 3 times faster with 43% lower memory than the latest pixel-based diffusion baseline. Overall, our work rethinks what it truly means to cross human-level performance in Atari100k.
Article 632
Title@2025-05-28 (3): Optimize Cardinality Estimation Model Pretraining by Simplifying the Training Datasets
Title: Optimize Cardinality Estimation Model Pretraining by Simplifying the Training Datasets | Kardinalitätsabschätzungsmodell optimieren Vorschulung durch Vereinfachung der Trainingsdatensätze | 通过简化培训数据集,优化红红心估计模型预培训模式 2502.14350v2 |
Authors: Boyang Fang
Cardinality estimation is a key aspect of query optimization research, and its performance has significantly improved with the integration of machine learning. To overcome the “cold start” problem or the lack of model transferability in learned cardinality estimators, several pre-trained cardinality estimation models have been proposed that learn across multiple datasets and corresponding workloads. These models typically train on a dataset created by uniformly sampling from many datasets, but this approach may not be optimal. By applying the Group Distributionally Robust Optimization (Group DRO) algorithm to training datasets, we find that some specific training datasets contribute more significantly to model performance than others. Based on this observation, we conduct extensive experiments to delve deeper into pre-training cardinality estimators. Our results show how the performance of these models can be influenced by the datasets and corresponding workloads. Finally, we introduce a simplified training dataset, which has been reduced to a fraction of the size of existing pretraining datasets. Extensive experimental results demonstrate that the pre-trained cardinality estimator based on this simplified dataset can still achieve comparable performance to existing models in zero-shot setups.
nan
Article 633
Title@2025-05-28 (3): Revisiting Weak-to-Strong Generalization in Theory and Practice: Reverse KL vs. Forward KL
Title: Revisiting Weak-to-Strong Generalization in Theory and Practice: Reverse KL vs. Forward KL | Neuvisualisierung von Schwach-zu-Strong-Verallgemeinerung in Theorie und Praxis: Reverse KL vs. Forward KL | 重新审视理论和实践中弱到强的简单化:反向 KL vs. fward KL 2502.11107v3 |
Authors: Wei Yao, Wenkai Yang, Ziqiao Wang, Yankai Lin, Yong Liu
As large language models advance toward superhuman performance, ensuring their alignment with human values and abilities grows increasingly complex. Weak-to-strong generalization offers a promising approach by leveraging predictions from weaker models to guide stronger systems, but its effectiveness could be constrained by the inherent noise and inaccuracies in these weak predictions. To address this, we propose a theoretically grounded approach that replaces forward KL divergence (whose mass-covering behavior risks overfitting to imperfect weak signals) with reverse KL divergence. Reverse KL divergence’s zero-forcing effect prioritizes high-confidence predictions, effectively mitigating the influence of unreliable weak supervision. Theoretically, we extend existing bounds and derive tighter lower bounds for both forward and reverse KL divergence, establishing that reverse KL achieves guarantees at least comparable to those of forward KL. Notably, when a sufficiently pre-trained strong model is fine-tuned on the last linear layer, reverse KL guarantees that it outperforms its weak supervisor by the magnitude of their disagreement. Empirically, we demonstrate that reverse KL and reverse cross-entropy enable strong models to outperform those trained with forward KL and standard cross-entropy across most settings, highlighting the practical advantages of these reverse losses.
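The contrast between the two divergences can be illustrated on a toy example. The following is a minimal sketch, not the authors' training code: the three-class distributions `weak`, `confident`, and `spread` are hypothetical values chosen for illustration, and `kl` is a helper introduced here.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical noisy weak-supervisor distribution over 3 classes (assumed values).
weak = [0.70, 0.25, 0.05]
# Two hypothetical strong-model predictions.
confident = [0.98, 0.01, 0.01]  # commits to the weak model's top class
spread = [0.60, 0.30, 0.10]     # covers every class the weak model gives mass to

for name, strong in [("confident", confident), ("spread", spread)]:
    forward = kl(weak, strong)  # KL(weak || strong): mass-covering
    reverse = kl(strong, weak)  # KL(strong || weak): zero-forcing
    print(f"{name:9s} forward={forward:.3f} reverse={reverse:.3f}")
```

In this toy setting the forward divergence charges the confident prediction heavily for ignoring the weak model's low-probability classes, while the reverse divergence charges it much less, which is the behavior the abstract attributes to the zero-forcing effect.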
nan
Article 634
Title@2025-05-28 (3): BiMi Sheets: Infosheets for bias mitigation methods
Title: BiMi Sheets: Infosheets for bias mitigation methods | BiMi Sheets: Infosheets für Methoden zur Biasminderung | BiMi 工作表:用于减少偏差方法的信息表 2505.22114v1 |
Authors: MaryBeth Defrance, Guillaume Bied, Maarten Buyl, Jefrey Lijffijt, Tijl De Bie
Over the past 15 years, hundreds of bias mitigation methods have been proposed in the pursuit of fairness in machine learning (ML). However, algorithmic biases are domain-, task-, and model-specific, leading to a ‘portability trap’: bias mitigation solutions in one context may not be appropriate in another. Thus, a myriad of design choices have to be made when creating a bias mitigation method, such as the formalization of fairness it pursues, and where and how it intervenes in the ML pipeline. This creates challenges in benchmarking and comparing the relative merits of different bias mitigation methods, and limits their uptake by practitioners. We propose BiMi Sheets as a portable, uniform guide to document the design choices of any bias mitigation method. This enables researchers and practitioners to quickly learn a method’s main characteristics and to compare them with their desiderata. Furthermore, the sheets’ structure allows for the creation of a structured database of bias mitigation methods. In order to foster the sheets’ adoption, we provide a platform for finding and creating BiMi Sheets at bimisheet.com.
nan
Article 635
Title@2025-05-28 (3): Understanding Model Ensemble in Transferable Adversarial Attack
Title: Understanding Model Ensemble in Transferable Adversarial Attack | Model-Ensemble in übertragbarem Widersacher-Angriff verstehen | 理解可转让反向攻击中可相互转让攻击的示范组合 2410.06851v3 |
Authors: Wei Yao, Zeliang Zhang, Huayi Tang, Yong Liu
Model ensemble adversarial attack has become a powerful method for generating transferable adversarial examples that can target even unknown models, but its theoretical foundation remains underexplored. To address this gap, we provide early theoretical insights that serve as a roadmap for advancing model ensemble adversarial attack. We first define transferability error to measure the error in adversarial transferability, alongside concepts of diversity and empirical model ensemble Rademacher complexity. We then decompose the transferability error into vulnerability, diversity, and a constant, which rigorously explains the origin of transferability error in model ensemble attack: the vulnerability of an adversarial example to ensemble components, and the diversity of ensemble components. Furthermore, we apply the latest mathematical tools in information theory to bound the transferability error using complexity and generalization terms, contributing to three practical guidelines for reducing transferability error: (1) incorporating more surrogate models, (2) increasing their diversity, and (3) reducing their complexity in cases of overfitting. Finally, extensive experiments with 54 models validate our theoretical framework, representing a significant step forward in understanding transferable model ensemble adversarial attacks.
nan
Article 636
Title@2025-05-28 (3): The quest for the GRAph Level autoEncoder (GRALE)
Title: The quest for the GRAph Level autoEncoder (GRALE) | Die Suche nach dem GRAph Level AutoEncoder (GRALE) | 寻求GRALE(GRALE)的GRAP 高级自动编码器(GRALE) 2505.22109v1 |
Authors: Paul Krzakala, Gabriel Melo, Charlotte Laclau, Florence d’Alché-Buc, Rémi Flamary
Although graph-based learning has attracted a lot of attention, graph representation learning is still a challenging task whose resolution may impact key application fields such as chemistry or biology. To this end, we introduce GRALE, a novel graph autoencoder that encodes and decodes graphs of varying sizes into a shared embedding space. GRALE is trained using an Optimal Transport-inspired loss that compares the original and reconstructed graphs and leverages a differentiable node matching module, which is trained jointly with the encoder and decoder. The proposed attention-based architecture relies on Evoformer, the core component of AlphaFold, which we extend to support both graph encoding and decoding. We show, in numerical experiments on simulated and molecular data, that GRALE enables a highly general form of pre-training, applicable to a wide range of downstream tasks, from classification and regression to more complex tasks such as graph interpolation, editing, matching, and prediction.
nan
Article 637
Title@2025-05-28 (3): Inclusive, Differentially Private Federated Learning for Clinical Data
Title: Inclusive, Differentially Private Federated Learning for Clinical Data | Inklusives, differenziert privates Federated Learning für klinische Daten | 包容性、差异化私联校临床数据学习 2505.22108v1 |
Authors: Santhosh Parampottupadam, Melih Coşğun, Sarthak Pati, Maximilian Zenk, Saikat Roy, Dimitrios Bounias, Benjamin Hamm, Sinem Sav, Ralf Floca, Klaus Maier-Hein
Federated Learning (FL) offers a promising approach for training clinical AI models without centralizing sensitive patient data. However, its real-world adoption is hindered by challenges related to privacy, resource constraints, and compliance. Existing Differential Privacy (DP) approaches often apply uniform noise, which disproportionately degrades model performance, even among well-compliant institutions. In this work, we propose a novel compliance-aware FL framework that enhances DP by adaptively adjusting noise based on quantifiable client compliance scores. Additionally, we introduce a compliance scoring tool based on key healthcare and security standards to promote secure, inclusive, and equitable participation across diverse clinical settings. Extensive experiments on public datasets demonstrate that integrating under-resourced, less compliant clinics with highly regulated institutions yields accuracy improvements of up to 15% over traditional FL. This work advances FL by balancing privacy, compliance, and performance, making it a viable solution for real-world clinical workflows in global healthcare.
nan
Article 638
Title@2025-05-28 (3): Curse of High Dimensionality Issue in Transformer for Long-context Modeling
Title: Curse of High Dimensionality Issue in Transformer for Long-context Modeling | Fluch der Hochdimensionalitätsfrage im Transformer für die Langkontextmodellierung | 变异器中高多维度问题的诅咒,用于长期建模 2505.22107v1 |
Authors: Shuhai Zhang, Zeng You, Yaofo Chen, Zhiquan Wen, Qianyue Wang, Zhijie Qiu, Yuanqing Li, Mingkui Tan
Transformer-based large language models (LLMs) excel in natural language processing tasks by capturing long-range dependencies through self-attention mechanisms. However, long-context modeling faces significant computational inefficiencies due to redundant attention computations: while attention weights are often sparse, all tokens consume equal computational resources. In this paper, we reformulate traditional probabilistic sequence modeling as a supervised learning task, enabling the separation of relevant and irrelevant tokens and providing a clearer understanding of redundancy. Based on this reformulation, we theoretically analyze attention sparsity, revealing that only a few tokens significantly contribute to predictions. Building on this, we formulate attention optimization as a linear coding problem and propose a group coding strategy, theoretically showing its ability to improve robustness against random noise and enhance learning efficiency. Motivated by this, we propose Dynamic Group Attention (DGA), which leverages the group coding to explicitly reduce redundancy by aggregating less important tokens during attention computation. Empirical results show that our DGA significantly reduces computational costs while maintaining competitive performance. Code is available at https://github.com/bolixinyu/DynamicGroupAttention.
nan
Article 639
Title@2025-05-28 (3): Devil is in the Details: Density Guidance for Detail-Aware Generation with Flow Models
Title: Devil is in the Details: Density Guidance for Detail-Aware Generation with Flow Models | Devil ist in den Details: Dichte-Anleitung für Detail-Aware-Generation mit Flow-Modellen | 魔鬼在细节中: 使用流动模型生成详细软件的密度指导 2502.05807v2 |
Authors: Rafał Karczewski, Markus Heinonen, Vikas Garg
Diffusion models have emerged as a powerful class of generative models, capable of producing high-quality images by mapping noise to a data distribution. However, recent findings suggest that image likelihood does not align with perceptual quality: high-likelihood samples tend to be smooth, while lower-likelihood ones are more detailed. Controlling sample density is thus crucial for balancing realism and detail. In this paper, we analyze an existing technique, Prior Guidance, which scales the latent code to influence image detail. We introduce score alignment, a condition that explains why this method works, and show that it can be tractably checked for any continuous normalizing flow model. We then propose Density Guidance, a principled modification of the generative ODE that enables exact log-density control during sampling. Finally, we extend Density Guidance to stochastic sampling, ensuring precise log-density control while allowing controlled variation in structure or fine details. Our experiments demonstrate that these techniques provide fine-grained control over image detail without compromising sample quality. Code is available at https://github.com/Aalto-QuML/density-guidance.
nan
Article 640
Title@2025-05-28 (3): Visuospatial Cognitive Assistant
Title: Visuospatial Cognitive Assistant | Visuospatial Cognitive Assistant | 活性呼吸空间感知助理 2505.12312v3 |
Authors: Qi Feng
Video-based spatial cognition is vital for robotics and embodied AI but challenges current Vision-Language Models (VLMs). This paper makes two key contributions. First, we introduce ViCA (Visuospatial Cognitive Assistant)-322K, a diverse dataset of 322,003 QA pairs from real-world indoor videos (ARKitScenes, ScanNet, ScanNet++), offering supervision for 3D metadata-grounded queries and video-based complex reasoning. Second, we develop ViCA-7B, fine-tuned on ViCA-322K, which achieves new state-of-the-art on all eight VSI-Bench tasks, outperforming existing models, including larger ones (e.g., +26.1 on Absolute Distance). For interpretability, we present ViCA-Thinking-2.68K, a dataset with explicit reasoning chains, and fine-tune ViCA-7B to create ViCA-7B-Thinking, a model that articulates its spatial reasoning. Our work highlights the importance of targeted data and suggests paths for improved temporal-spatial modeling. We release all resources to foster research in robust visuospatial intelligence.
nan
Article 641
Title@2025-05-28 (3): Efficient Dynamic Shielding for Parametric Safety Specifications
Title: Efficient Dynamic Shielding for Parametric Safety Specifications | Effiziente dynamische Abschirmung für parametrische Sicherheitsspezifikationen | 用于参数安全规格的有效动态防护 2505.22104v1 |
Authors: Davide Corsi, Kaushik Mallik, Andoni Rodriguez, Cesar Sanchez
Shielding has emerged as a promising approach for ensuring safety of AI-controlled autonomous systems. The algorithmic goal is to compute a shield, which is a runtime safety enforcement tool that monitors the AI controller’s actions and intervenes if safety could otherwise be compromised. Traditional shields are designed statically for a specific safety requirement. Therefore, if the safety requirement changes at runtime due to changing operating conditions, the shield needs to be recomputed from scratch, causing delays that could be fatal. We introduce dynamic shields for parametric safety specifications, which are succinctly represented sets of all possible safety specifications that may be encountered at runtime. Our dynamic shields are statically designed for a given safety parameter set, and are able to dynamically adapt as the true safety specification (permissible by the parameters) is revealed at runtime. The main algorithmic novelty lies in the dynamic adaptation procedure, which is a simple and fast algorithm that utilizes known features of standard safety shields, like maximal permissiveness. We report experimental results for a robot navigation problem in unknown territories, where the safety specification evolves as new obstacles are discovered at runtime. In our experiments, the dynamic shields took a few minutes for their offline design, and took between a fraction of a second and a few seconds for online adaptation at each step, whereas the brute-force online recomputation approach was up to 5 times slower.
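To make the notion of a runtime shield concrete, here is a minimal sketch of a standard safety shield on a toy deterministic system. It is not the paper's dynamic adaptation procedure: the corridor model, `safe_region`, and `shield` are hypothetical names and assumptions introduced for illustration. The only property it relies on is that adding unsafe states can only shrink the safe region, so a previously computed region can be reused as a starting point.

```python
def safe_region(states, actions, step, unsafe, start=None):
    """States from which some action sequence avoids `unsafe` forever.

    Computed as a greatest fixed point. `start` may be a region computed for a
    smaller unsafe set, which is then shrunk further instead of starting over.
    """
    region = set(states if start is None else start) - set(unsafe)
    changed = True
    while changed:
        changed = False
        for s in list(region):
            # Remove s if every action leads outside the current region.
            if not any(step(s, a) in region for a in actions):
                region.discard(s)
                changed = True
    return region

def shield(state, proposed, actions, step, region):
    """Pass the proposed action through if it is safe, else substitute a safe one."""
    if step(state, proposed) in region:
        return proposed
    for a in actions:
        if step(state, a) in region:
            return a
    return None  # no safe action exists from this state

# Toy corridor (assumed): positions 0..6, the robot must advance by 1 or 2 cells.
states = range(7)
actions = (1, 2)
step = lambda s, a: min(s + a, 6)

region1 = safe_region(states, actions, step, unsafe={3})
print(shield(2, 1, actions, step, region1))  # stepping onto cell 3 is overridden

# A new obstacle at cell 4 is discovered at runtime: shrink the old region.
region2 = safe_region(states, actions, step, unsafe={3, 4}, start=region1)
print(sorted(region2))
```

The warm start gives the same result as recomputing from all states, since the new safe region is always a subset of the old one.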
nan
Article 642
Title@2025-05-28 (3): Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts
Title: Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts | Auf dem Weg zur Visuospatialen Kognition durch hierarchische Fusion von visuellen Experten | 争取通过视觉专家的等级化融合实现纵向空间聚合 2505.12363v3 |
Authors: Qi Feng
While Multimodal Large Language Models (MLLMs) excel at general vision-language tasks, visuospatial cognition - reasoning about spatial layouts, relations, and dynamics - remains a significant challenge. Existing models often lack the necessary architectural components and specialized training data for fine-grained spatial understanding. We introduce ViCA2 (Visuospatial Cognitive Assistant 2), a novel MLLM designed to enhance spatial reasoning. ViCA2 features a dual vision encoder architecture integrating SigLIP for semantics and Hiera for spatial structure, coupled with a token ratio control mechanism for efficiency. We also developed ViCA-322K, a new large-scale dataset with over 322,000 spatially grounded question-answer pairs for targeted instruction tuning. On the challenging VSI-Bench benchmark, our ViCA2-7B model achieves a state-of-the-art average score of 56.8, significantly surpassing larger open-source models (e.g., LLaVA-NeXT-Video-72B, 40.9) and leading proprietary models (Gemini-1.5 Pro, 45.4). This demonstrates the effectiveness of our approach in achieving strong visuospatial intelligence with a compact model. We release ViCA2, its codebase, and the ViCA-322K dataset to facilitate further research.
nan
Article 643
Title@2025-05-28 (3): Conditional Denoising Meets Polynomial Modeling: A Flexible Decoupled Framework for Time Series Forecasting
Title: Conditional Denoising Meets Polynomial Modeling: A Flexible Decoupled Framework for Time Series Forecasting | Bedingtes Stören trifft auf Polynommodellierung: Ein flexibles entkoppeltes Framework für die Zeitreihenprognose | 满足多面性建模:时间序列预测灵活拆分框架 2410.13253v6 |
Authors: Jintao Zhang, Mingyue Cheng, Xiaoyu Tao, Zhiding Liu, Daoyu Wang
Time series forecasting models are becoming increasingly prevalent due to their critical role in decision-making across various domains. However, most existing approaches represent the coupled temporal patterns, often neglecting the distinction between their specific components. In particular, fluctuating patterns and smooth trends within time series exhibit distinct characteristics. In this work, to model complicated temporal patterns, we propose a Conditional Denoising Polynomial Modeling (CDPM) framework, where probabilistic diffusion models and deterministic linear models are trained end-to-end. Instead of modeling the coupled time series, CDPM decomposes it into trend and seasonal components and models them separately. To capture the fluctuating seasonal component, we employ a probabilistic diffusion model based on statistical properties from the historical window. For the smooth trend component, a module is proposed to enhance linear models by incorporating historical dependencies, thereby preserving underlying trends and mitigating noise distortion. Extensive experiments conducted on six benchmarks demonstrate the effectiveness of our framework, highlighting the potential of combining probabilistic and deterministic models. Our code is available at https://github.com/zjt-gpu/CDPM.
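The decomposition step can be sketched as follows. The abstract does not say how CDPM splits a series, so this example assumes a simple moving-average decomposition, a common choice in forecasting models; `moving_average`, `decompose`, and the toy series are hypothetical.

```python
def moving_average(series, window):
    """Moving average of odd `window` size, padding the edges with the end values."""
    half = window // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(series))]

def decompose(series, window=5):
    """Split a series into a smooth trend and a fluctuating remainder."""
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal

# Toy series (assumed): a linear trend plus an alternating fluctuation.
series = [0.5 * t + (1.0 if t % 2 == 0 else -1.0) for t in range(12)]
trend, seasonal = decompose(series, window=5)
print([round(v, 2) for v in trend])
print([round(v, 2) for v in seasonal])
```

In a framework of this kind, the smooth `trend` would go to the deterministic model and the `seasonal` remainder to the probabilistic one. The two parts sum back to the original series by construction.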
nan
Article 644
Title@2025-05-28 (3): On the Transferability and Discriminability of Representation Learning in Unsupervised Domain Adaptation
Title: On the Transferability and Discriminability of Representation Learning in Unsupervised Domain Adaptation | Über die Übertragbarkeit und Diskriminierbarkeit von Representation Learning in unüberwachter Domain-Anpassung | 关于无监督域适应中可转让性和可转让性 2505.22099v1 |
Authors: Wenwen Qiang, Ziyin Gu, Lingyu Si, Jiangmeng Li, Changwen Zheng, Fuchun Sun, Hui Xiong
In this paper, we addressed the limitation of relying solely on distribution alignment and source-domain empirical risk minimization in Unsupervised Domain Adaptation (UDA). Our information-theoretic analysis showed that this standard adversarial-based framework neglects the discriminability of target-domain features, leading to suboptimal performance. To bridge this theoretical-practical gap, we defined “good representation learning” as guaranteeing both transferability and discriminability, and proved that an additional loss term targeting target-domain discriminability is necessary. Building on these insights, we proposed a novel adversarial-based UDA framework that explicitly integrates a domain alignment objective with a discriminability-enhancing constraint. Instantiated as Domain-Invariant Representation Learning with Global and Local Consistency (RLGLC), our method leverages Asymmetrically-Relaxed Wasserstein of Wasserstein Distance (AR-WWD) to address class imbalance and semantic dimension weighting, and employs a local consistency mechanism to preserve fine-grained target-domain discriminative information. Extensive experiments across multiple benchmark datasets demonstrate that RLGLC consistently surpasses state-of-the-art methods, confirming the value of our theoretical perspective and underscoring the necessity of enforcing both transferability and discriminability in adversarial-based UDA.
nan
Article 645
Title@2025-05-28 (3): Knowledge Base Construction for Knowledge-Augmented Text-to-SQL
Title: Knowledge Base Construction for Knowledge-Augmented Text-to-SQL | Knowledge Base Construction für wissensbasierte Text-zu-SQL | 知识强化文字到SQL知识基础建设 2505.22096v1 |
Authors: Jinheon Baek, Horst Samulowitz, Oktie Hassanzadeh, Dharmashankar Subramanian, Sola Shirai, Alfio Gliozzo, Debarun Bhattacharjya
Text-to-SQL aims to translate natural language queries into SQL statements, which is practical as it enables anyone to easily retrieve the desired information from databases. Recently, many approaches tackle this problem with Large Language Models (LLMs), leveraging their strong capability in understanding user queries and generating corresponding SQL code. Yet, the parametric knowledge in LLMs may be insufficient to cover all the diverse and domain-specific queries that require grounding in various database schemas, which often makes the generated SQL less accurate. To tackle this, we propose constructing a knowledge base for text-to-SQL, a foundational source of knowledge from which we retrieve and generate the necessary knowledge for given queries. In particular, unlike existing approaches that either manually annotate knowledge or generate only a few pieces of knowledge for each query, our knowledge base is comprehensive: it is constructed from a combination of all the available questions and their associated database schemas along with their relevant knowledge, and can be reused for unseen databases from different datasets and domains. We validate our approach on multiple text-to-SQL datasets, considering both the overlapping and non-overlapping database scenarios, where it outperforms relevant baselines substantially.
nan
Article 646
Title@2025-05-28 (3): Diffusion Models as Cartoonists: The Curious Case of High Density Regions
Title: Diffusion Models as Cartoonists: The Curious Case of High Density Regions | Diffusionsmodelle als Karikaturisten: Der seltsame Fall von Regionen mit hoher Dichte | 作为漫画家的传播模型:高密度地区令人好奇的案例 2411.01293v4 |
Authors: Rafał Karczewski, Markus Heinonen, Vikas Garg
We investigate what kind of images lie in the high-density regions of diffusion models. We introduce a theoretical mode-tracking process capable of pinpointing the exact mode of the denoising distribution, and we propose a practical high-density sampler that consistently generates images of higher likelihood than usual samplers. Our empirical findings reveal the existence of significantly higher likelihood samples that typical samplers do not produce, often manifesting as cartoon-like drawings or blurry images depending on the noise level. Curiously, these patterns emerge in datasets devoid of such examples. We also present a novel approach to track sample likelihoods in diffusion SDEs, which remarkably incurs no additional computational cost. Code is available at https://github.com/Aalto-QuML/high-density-diffusion.
nan
Article 647
Title@2025-05-28 (3): High Volume Rate 3D Ultrasound Reconstruction with Diffusion Models
Title: High Volume Rate 3D Ultrasound Reconstruction with Diffusion Models | Hohe Lautstärke 3D-Ultraschall-Rekonstruktion mit Diffusions-Modellen | 3D超声波重建,采用传播模型 2505.22090v1 |
Authors: Tristan S. W. Stevens, Oisín Nolan, Oudom Somphone, Jean-Luc Robert, Ruud J. G. van Sloun
Three-dimensional ultrasound enables real-time volumetric visualization of anatomical structures. Unlike traditional 2D ultrasound, 3D imaging reduces the reliance on precise probe orientation, potentially making ultrasound more accessible to clinicians with varying levels of experience and improving automated measurements and post-exam analysis. However, achieving both high volume rates and high image quality remains a significant challenge. While 3D diverging waves can provide high volume rates, they suffer from limited tissue harmonic generation and increased multipath effects, which degrade image quality. One compromise is to retain the focusing in elevation while leveraging unfocused diverging waves in the lateral direction to reduce the number of transmissions per elevation plane. Reaching the volume rates achieved by full 3D diverging waves, however, requires dramatically undersampling the number of elevation planes. Subsequently, to render the full volume, simple interpolation techniques are applied. This paper introduces a novel approach to 3D ultrasound reconstruction from a reduced set of elevation planes by employing diffusion models (DMs) to achieve increased spatial and temporal resolution. We compare both traditional and supervised deep learning-based interpolation methods on a 3D cardiac ultrasound dataset. Our results show that DM-based reconstruction consistently outperforms the baselines in image quality and downstream task performance. Additionally, we accelerate inference by leveraging the temporal consistency inherent to ultrasound sequences. Finally, we explore the robustness of the proposed method by exploiting the probabilistic nature of diffusion posterior sampling to quantify reconstruction uncertainty and demonstrate improved recall on out-of-distribution data with synthetic anomalies under strong subsampling.
nan
Article 648
Title@2025-05-28 (3): Base and Exponent Prediction in Mathematical Expressions using Multi-Output CNN
Title: Base and Exponent Prediction in Mathematical Expressions using Multi-Output CNN | Basis- und Exponentvorhersage in mathematischen Ausdrücken mit Multi-Output CNN | 利用有线电视新闻网的多种产出对数学表达式进行基础和指数预测 2407.14967v2 |
Authors: Md Laraib Salam, Akash S Balsaraf, Gaurav Gupta, Ashish Rajeshwar Kulkarni
The use of neural networks and deep learning techniques in image processing has significantly advanced the field, enabling highly accurate recognition results. However, achieving high recognition rates often necessitates complex network models, which can be challenging to train and require substantial computational resources. This research presents a simplified yet effective approach to predicting both the base and exponent from images of mathematical expressions using a multi-output Convolutional Neural Network (CNN). The model is trained on 10,900 synthetically generated images containing exponent expressions, incorporating random noise, font size variations, and blur intensity to simulate real-world conditions. The proposed CNN model demonstrates robust performance with efficient training time. The experimental results indicate that the model achieves high accuracy in predicting the base and exponent values, proving the efficacy of this approach in handling noisy and varied input images.
nan
Article 649
Title@2025-05-28 (3): Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations
Title: Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations | Domain-spezifisches Pruning von großen Mixture-of-Experts-Modellen mit nur wenigen Demonstrationen | 大型混合型专家模型的域特定情景,少发示范 2504.06792v2 |
Authors: Zican Dong, Han Peng, Peiyu Liu, Wayne Xin Zhao, Dong Wu, Feng Xiao, Zhifeng Wang
Mixture-of-Experts (MoE) models achieve a favorable trade-off between performance and inference efficiency by activating only a subset of experts. However, the memory overhead of storing all experts remains a major limitation, especially in large-scale MoE models such as DeepSeek-R1 (671B). In this study, we investigate domain specialization and expert redundancy in large-scale MoE models and uncover a consistent behavior we term few-shot expert localization: with only a few in-domain demonstrations, the model consistently activates a sparse and stable subset of experts on tasks within the same domain. Building on this observation, we propose a simple yet effective pruning framework, EASY-EP, that leverages a few domain-specific demonstrations to identify and retain only the most relevant experts. EASY-EP comprises two key components: output-aware expert importance assessment and expert-level token contribution estimation. The former evaluates the importance of each expert for the current token by considering the gating scores and L2 norm of the outputs of activated experts, while the latter assesses the contribution of tokens based on representation similarities before and after routed experts. Experiments on DeepSeek-R1 and DeepSeek-V3-0324 show that our method, keeping only half the experts, can achieve comparable performance and $2.99\times$ throughput relative to the full model under the same memory budget.
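The output-aware importance idea can be sketched in a few lines. This is an illustrative assumption rather than the EASY-EP implementation: the abstract says gating scores and output L2 norms are considered, but not how they are combined. Here they are multiplied and summed over tokens, and the token contribution weighting is omitted. All names and numbers below are hypothetical.

```python
import math

def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))

def expert_importance(gates, outputs, num_experts):
    """Accumulate gate * ||output||_2 per expert over demonstration tokens.

    gates[t]   maps expert id -> gating score for token t (activated experts only).
    outputs[t] maps expert id -> that expert's output vector for token t.
    """
    scores = [0.0] * num_experts
    for gate_t, out_t in zip(gates, outputs):
        for expert, gate in gate_t.items():
            scores[expert] += gate * l2_norm(out_t[expert])
    return scores

def select_experts(scores, keep):
    """Return the ids of the `keep` highest-scoring experts, in ascending id order."""
    ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    return sorted(ranked[:keep])

# Two demonstration tokens routed over four experts (assumed toy values).
gates = [{0: 0.6, 2: 0.4}, {0: 0.7, 3: 0.3}]
outputs = [{0: [3.0, 4.0], 2: [0.0, 1.0]}, {0: [0.0, 2.0], 3: [1.0, 0.0]}]

scores = expert_importance(gates, outputs, num_experts=4)
print(scores)
print(select_experts(scores, keep=2))
```

Weighting the gate by the output norm means an expert that is routed to often but contributes small outputs ranks below one that changes the hidden state more.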
nan
Article 650
Title@2025-05-28 (3): PADAM: Parallel averaged Adam reduces the error for stochastic optimization in scientific machine learning
Title: PADAM: Parallel averaged Adam reduces the error for stochastic optimization in scientific machine learning | PADAM: Parallel gemittelter Adam reduziert Fehler bei stochastischer Optimierung im wissenschaftlichen maschinellen Lernen | PADAM: 平行平均 Adam 减少科学机器学习中随机优化的错误 2505.22085v1 |
Authors: Arnulf Jentzen, Julian Kranz, Adrian Riekert
Averaging techniques such as Ruppert–Polyak averaging and exponential moving averaging (EMA) are powerful approaches to accelerate optimization procedures of stochastic gradient descent (SGD) optimization methods such as the popular ADAM optimizer. However, depending on the specific optimization problem under consideration, the type and the parameters for the averaging need to be adjusted to achieve the smallest optimization error. In this work we propose an averaging approach, which we refer to as parallel averaged ADAM (PADAM), in which we compute different averaged variants of ADAM in parallel and, during the training process, dynamically select the variant with the smallest optimization error. A central feature of this approach is that this procedure requires no more gradient evaluations than the usual ADAM optimizer as each of the averaged trajectories relies on the same underlying ADAM trajectory and thus on the same underlying gradients. We test the proposed PADAM optimizer in 13 stochastic optimization and deep neural network (DNN) learning problems and compare its performance with known optimizers from the literature such as standard SGD, momentum SGD, ADAM with and without EMA, and ADAMW. In particular, we apply the compared optimizers to physics-informed neural network, deep Galerkin, deep backward stochastic differential equation and deep Kolmogorov approximations for boundary value partial differential equation problems from scientific machine learning, as well as to DNN approximations for optimal control and optimal stopping problems. In nearly all of the considered examples PADAM achieves, sometimes among others and sometimes exclusively, essentially the smallest optimization error. This work thus strongly suggests considering PADAM for scientific machine learning problems and also motivates further research on adaptive averaging procedures within the training of DNNs.
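The idea of several averages sharing one trajectory can be sketched on a scalar toy problem. This is a simplified illustration under stated assumptions, not the PADAM algorithm as published: the candidate set (raw iterate, running mean, two EMA decays), the noisy quadratic objective, and selection only at the end of training are all choices made here for brevity.

```python
import math
import random

def adam_trajectory(grad, theta0, steps, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Yield the iterates of scalar Adam for a stochastic gradient oracle `grad`."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        theta -= lr * (m / (1 - b1 ** t)) / (math.sqrt(v / (1 - b2 ** t)) + eps)
        yield theta

def parallel_averaged(grad, loss, theta0, steps, ema_decays=(0.9, 0.99)):
    """Maintain several averages of ONE Adam trajectory and pick the best by `loss`."""
    averages = {"raw": theta0, "mean": theta0}
    averages.update({f"ema{d}": theta0 for d in ema_decays})
    for t, theta in enumerate(adam_trajectory(grad, theta0, steps), start=1):
        averages["raw"] = theta                            # no averaging
        averages["mean"] += (theta - averages["mean"]) / t  # running mean of iterates
        for d in ema_decays:                               # exponential moving averages
            averages[f"ema{d}"] = d * averages[f"ema{d}"] + (1 - d) * theta
    best = min(averages, key=lambda k: loss(averages[k]))
    return best, averages

# Noisy quadratic with minimum at 1.0 (assumed toy problem).
rng = random.Random(0)
grad = lambda th: 2.0 * (th - 1.0) + rng.gauss(0.0, 1.0)
loss = lambda th: (th - 1.0) ** 2

best, averages = parallel_averaged(grad, loss, theta0=5.0, steps=500)
print(best, {k: round(v, 4) for k, v in averages.items()})
```

Every candidate is updated from the same iterate, so the loop calls `grad` exactly once per step regardless of how many averages are tracked, which mirrors the cost argument in the abstract.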
nan
Article 651
Title@2025-05-28 (3): Hyperbolic recurrent neural network as the first type of non-Euclidean neural quantum state ansatz
Title: Hyperbolic recurrent neural network as the first type of non-Euclidean neural quantum state ansatz | Hyperbolisches rezidivierendes neuronales Netzwerk als erste Art von nicht-euklidischen neuronalen Quantenzustandsansatz | 超双曲经常性神经网络,作为第一种非欧洲的神经量子状态 ansatz 2505.22083v1 |
Authors: H. L. Dao
In this work, we introduce the first type of non-Euclidean neural quantum state (NQS) ansatz, in the form of the hyperbolic GRU (a variant of recurrent neural networks (RNNs)), to be used in the Variational Monte Carlo method of approximating the ground state wavefunction for quantum many-body systems. In particular, we examine the performances of NQS ansatzes constructed from both conventional (Euclidean) RNN/GRU and hyperbolic GRU in the prototypical settings of the one- and two-dimensional transverse field Ising models (TFIM) of up to 100 spins and the one-dimensional Heisenberg $J_1J_2$ and $J_1J_2J_3$ systems of up to 50 spins. By virtue of the fact that, for all of the experiments performed in this work, hyperbolic GRU can yield performances comparable to or better than Euclidean RNNs, which have been extensively studied in these settings in the literature, our work is a proof-of-concept for the viability of hyperbolic GRU as the first type of non-Euclidean NQS ansatz for quantum many-body systems. Furthermore, in settings where the Hamiltonian displays a clear hierarchical interaction structure, such as the 1D Heisenberg $J_1J_2$ and $J_1J_2J_3$ systems with first, second and even third nearest-neighbor interactions, our results show that hyperbolic GRU definitively outperforms its Euclidean version in all instances. These results are reminiscent of established ones from natural language processing, where hyperbolic GRU almost always outperforms Euclidean RNNs when the training data exhibit a tree-like or hierarchical structure. This leads us to hypothesize that a hyperbolic GRU NQS ansatz would likely outperform a Euclidean RNN/GRU NQS ansatz in quantum spin systems that involve different degrees of nearest-neighbor interactions. Finally, with this work, we hope to initiate future studies of other types of non-Euclidean NQS beyond hyperbolic GRU.
Article 652
Title@2025-05-28 (3): Improved Bounds for Swap Multicalibration and Swap Omniprediction
Title: Improved Bounds for Swap Multicalibration and Swap Omniprediction | Verbesserte Bounds für Swap Multikalibrierung und Swap Omniprediction | 用于交换多校准和交换面宽度的改进宽度 2505.20885v2 |
Authors: Haipeng Luo, Spandan Senapati, Vatsal Sharan
In this paper, we consider the related problems of multicalibration (a multigroup fairness notion) and omniprediction (a simultaneous loss minimization paradigm), both in the distributional and online settings. The recent work of Garg et al. (2024) raised the open problem of whether it is possible to efficiently achieve $O(\sqrt{T})$ $\ell_{2}$-multicalibration error against bounded linear functions. In this paper, we answer this question in a strongly affirmative sense. We propose an efficient algorithm that achieves $O(T^{\frac{1}{3}})$ $\ell_{2}$-swap multicalibration error (both in high probability and expectation). Propagating this bound onward, we obtain significantly improved rates for $\ell_{1}$-swap multicalibration and swap omniprediction for a loss class of convex Lipschitz functions. In particular, we show that our algorithm achieves $O(T^{\frac{2}{3}})$ $\ell_{1}$-swap multicalibration and swap omniprediction errors, thereby improving upon the previous best-known bound of $O(T^{\frac{7}{8}})$. As a consequence of our improved online results, we further obtain several improved sample complexity rates in the distributional setting. In particular, we establish an $O(\varepsilon ^ {-3})$ sample complexity of efficiently learning an $\varepsilon$-swap omnipredictor for the class of convex and Lipschitz functions, an $O(\varepsilon ^{-2.5})$ sample complexity of efficiently learning an $\varepsilon$-swap agnostic learner for the squared loss, and $O(\varepsilon ^ {-5})$ and $O(\varepsilon ^ {-2.5})$ sample complexities of learning $\ell_{1}$- and $\ell_{2}$-swap multicalibrated predictors, respectively, against linear functions, all of which significantly improve on the previous best-known bounds.
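The step from an $\ell_{2}$ rate to an $\ell_{1}$ rate can be illustrated on plain (non-swap) calibration with a Cauchy-Schwarz argument. This is only a sketch under assumed definitions, not the paper's proof: $n_p$ (the number of rounds on which the value $p$ is predicted), $\rho_p$ (the empirical outcome frequency on those rounds), and the unnormalized errors $K_1$, $K_2$ are notation introduced here for illustration.

```latex
% Assumed (unnormalized) calibration errors over T rounds, with \sum_p n_p = T:
%   K_1 = \sum_p n_p |\rho_p - p|,   K_2 = \sum_p n_p (\rho_p - p)^2.
\begin{align*}
K_1 &= \sum_{p} \sqrt{n_p}\,\cdot\,\sqrt{n_p}\,\lvert \rho_p - p \rvert \\
    &\le \sqrt{\sum_{p} n_p}\;\sqrt{\sum_{p} n_p\,(\rho_p - p)^2}
     = \sqrt{T\,K_2}, \\
K_2 &= O\!\left(T^{1/3}\right)
     \;\Longrightarrow\;
     K_1 = O\!\left(\sqrt{T \cdot T^{1/3}}\right) = O\!\left(T^{2/3}\right).
\end{align*}
```

Under these assumed definitions, the exponents match the $O(T^{\frac{1}{3}})$ and $O(T^{\frac{2}{3}})$ rates quoted in the abstract.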
Article 653
Title@2025-05-28 (3): LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
Title: LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation | LongReD: Degradierung von Langtext-Großen Sprachmodellen durch Restaurationsdestillation | LongReD:通过恢复蒸馏减少长长长大语言模型的短期退化 2502.07365v3 |
Authors: Zican Dong, Junyi Li, Jinhao Jiang, Mingyu Xu, Wayne Xin Zhao, Bingning Wang, Weipeng Chen
Large language models (LLMs) have gained extended context windows through scaling positional encodings and lightweight continual pre-training. However, this often leads to degraded performance on short-text tasks, and the reasons for this degradation remain insufficiently explored. In this work, we identify two primary factors contributing to this issue: distribution drift in hidden states and attention scores, and catastrophic forgetting during continual pre-training. To address these challenges, we propose Long Context Pre-training with Restoration Distillation (LongReD), a novel approach designed to mitigate short-text performance degradation by minimizing the distribution discrepancy between the extended and original models. Besides training on long texts, LongReD distills the hidden states of selected layers from the original model on short texts. LongReD also introduces a short-to-long distillation, aligning the output distribution on short texts with that on long texts by leveraging skipped positional indices. Experiments on common text benchmarks demonstrate that LongReD effectively preserves the model’s short-text performance while maintaining comparable or even better capacity to handle long texts than baselines. Our code is available at https://github.com/RUCAIBox/LongReD.
Article 654
Title@2025-05-28 (3): A Hybrid Multi-Factor Network with Dynamic Sequence Modeling for Early Warning of Intraoperative Hypotension
Title: A Hybrid Multi-Factor Network with Dynamic Sequence Modeling for Early Warning of Intraoperative Hypotension | Hybrides Multi-Factor-Netzwerk mit dynamischer Sequenzmodellierung zur Frühwarnung von intraoperativer Hypotonie | 混合多要素网络,具有动态序列模型模型,以及早警告不合作水分的不合作状态; 2409.11064v3 |
Authors: Mingyue Cheng, Jintao Zhang, Zhiding Liu, Chunli Liu
Intraoperative hypotension (IOH) prediction using past physiological signals is crucial, as IOH may lead to inadequate organ perfusion and significantly elevate the risk of severe complications and mortality. However, current methods often rely on static modeling, overlooking the complex temporal dependencies and the inherently non-stationary nature of physiological signals. We propose a Hybrid Multi-Factor (HMF) network that formulates IOH prediction as a dynamic sequence forecasting task, explicitly capturing both temporal dependencies and physiological non-stationarity. We represent signal dynamics as multivariate time series and decompose them into trend and seasonal components, enabling separate modeling of long-term and periodic variations. Each component is encoded with a patch-based Transformer to balance computational efficiency and feature representation. To address distributional drift from evolving signals, we introduce a symmetric normalization mechanism. Experiments on both public and real-world clinical datasets show that HMF significantly outperforms competitive baselines. We hope HMF offers new insights into IOH prediction and ultimately promotes safer surgical care. Our code is available at https://github.com/Mingyue-Cheng/HMF.
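The trend/seasonal split mentioned above can be sketched as follows. The abstract does not say how HMF performs the decomposition, so the moving-average trend used here is an assumption (a common choice in time-series forecasting), and the names `moving_average` and `decompose` are hypothetical. The sketch handles one channel; a multivariate signal would be decomposed channel by channel.

```python
def moving_average(series, window):
    """Moving average with edge padding; `window` is assumed to be odd."""
    half = window // 2
    # Repeat the first and last values so the output has the input's length.
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(series))]


def decompose(series, window=5):
    """Split a 1-D signal into a smooth trend and the remaining (seasonal) part."""
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


if __name__ == "__main__":
    # Toy signal standing in for one physiological channel (made-up values).
    signal = [80.0, 82.0, 79.0, 85.0, 83.0, 90.0, 88.0, 92.0]
    trend, seasonal = decompose(signal, window=3)
    print(trend)
    print(seasonal)
```

By construction the two components sum back to the original signal, so each can be encoded separately without losing information.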
Article 655
Title@2025-05-28 (3): Can Test-time Computation Mitigate Memorization Bias in Neural Symbolic Regression?
Title: Can Test-time Computation Mitigate Memorization Bias in Neural Symbolic Regression? | Kann Testzeit-Computation Mitigate Memorization Bias in Neural Symbolische Regression? | 测试时计算在神经符号回落中是否可模拟记忆回弹? 2505.22081v1 |
Authors: Shun Sato, Issei Sato
Symbolic regression aims to discover mathematical equations that fit given numerical data. It has been applied in various fields of scientific research, such as producing human-readable expressions that explain physical phenomena. Recently, neural symbolic regression (NSR) methods that involve Transformers pre-trained on large-scale synthetic datasets have gained attention. While these methods offer advantages such as short inference time, they suffer from low performance, particularly when the number of input variables is large. In this study, we hypothesize that this limitation stems from the memorization bias of Transformers in symbolic regression. We conduct a quantitative evaluation of this bias in Transformers using a synthetic dataset and find that Transformers rarely generate expressions not present in the training data. Additional theoretical analysis reveals that this bias arises from the Transformer’s inability to construct expressions compositionally while verifying their numerical validity. Finally, we examine whether tailoring test-time strategies can reduce memorization bias and improve performance. We empirically demonstrate that providing additional information to the model at test time can significantly mitigate memorization bias. On the other hand, we also find that reducing memorization bias does not necessarily correlate with improved performance. These findings contribute to a deeper understanding of the limitations of NSR approaches and offer a foundation for designing more robust, generalizable symbolic regression methods. Code is available at https://github.com/Shun-0922/Mem-Bias-NSR.
Article 656
Title@2025-05-28 (3): The Resurrection of the ReLU
Title: The Resurrection of the ReLU | Die Auferstehung der ReLU | 鲁鲁的复活, 2505.22074v1 |
Authors: Coşku Can Horuz, Geoffrey Kasenbacher, Saya Higuchi, Sebastian Kairat, Jendrik Stoltz, Moritz Pesl, Bernhard A. Moser, Christoph Linse, Thomas Martinetz, Sebastian Otte
Modeling sophisticated activation functions within deep learning architectures has evolved into a distinct research direction. Functions such as GELU, SELU, and SiLU offer smooth gradients and improved convergence properties, making them popular choices in state-of-the-art models. Despite this trend, the classical ReLU remains appealing due to its simplicity, inherent sparsity, and other advantageous topological characteristics. However, ReLU units are prone to becoming irreversibly inactive - a phenomenon known as the dying ReLU problem - which limits their overall effectiveness. In this work, we introduce surrogate gradient learning for ReLU (SUGAR) as a novel, plug-and-play regularizer for deep architectures. SUGAR preserves the standard ReLU function during the forward pass but replaces its derivative in the backward pass with a smooth surrogate that avoids zeroing out gradients. We demonstrate that SUGAR, when paired with a well-chosen surrogate function, substantially enhances generalization performance over convolutional network architectures such as VGG-16 and ResNet-18, providing sparser activations while effectively resurrecting dead ReLUs. Moreover, we show that even in modern architectures like Conv2NeXt and Swin Transformer - which typically employ GELU - substituting these with SUGAR yields competitive and even slightly superior performance. These findings challenge the prevailing notion that advanced activation functions are necessary for optimal performance. Instead, they suggest that the conventional ReLU, particularly with appropriate gradient handling, can serve as a strong, versatile revived classic across a broad range of deep learning vision models.
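The forward/backward asymmetry described above can be sketched on a single scalar unit. This is a toy illustration, not the authors' implementation: the abstract does not specify the surrogate, so the logistic sigmoid is used here as a stand-in surrogate derivative, and all function names are hypothetical.

```python
import math


def relu(x):
    return x if x > 0.0 else 0.0


def relu_grad(x):
    # Exact ReLU derivative: zero for every non-positive pre-activation.
    return 1.0 if x > 0.0 else 0.0


def surrogate_grad(x):
    # Assumed surrogate: logistic sigmoid, strictly positive everywhere.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def forward(w, b, x):
    """Forward pass keeps the standard ReLU."""
    pre = w * x + b
    return pre, relu(pre)


def backward(pre, x, upstream, grad_fn):
    """Backward pass; `grad_fn` decides which derivative replaces ReLU's."""
    dpre = upstream * grad_fn(pre)
    return dpre * x, dpre  # gradients w.r.t. w and b


if __name__ == "__main__":
    # A "dead" unit: the pre-activation is negative for this input.
    pre, out = forward(w=1.0, b=-5.0, x=1.0)
    print(out)
    print(backward(pre, 1.0, 1.0, relu_grad))       # exact gradient
    print(backward(pre, 1.0, 1.0, surrogate_grad))  # surrogate gradient
```

With the exact derivative the dead unit receives no update, whereas the surrogate passes a small nonzero gradient that can move the pre-activation back into the active region, while the forward output stays sparse.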
Article 657
Title@2025-05-28 (3): PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
Title: PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models | PRMBench: Ein feinkörniger und anspruchsvoller Benchmark für Prozess-Level-Reward-Modelle | PRMBBench:进程一级奖励模式的精细和质疑基准 2501.03124v4 |
Authors: Mingyang Song, Zhaochen Su, Xiaoye Qu, Jiawei Zhou, Yu Cheng
Process-level Reward Models (PRMs) are crucial for complex reasoning and decision-making tasks, where each intermediate step plays an important role in the reasoning process. Since language models are prone to various types of errors during the reasoning process, PRMs are required to possess nuanced capabilities for detecting various implicit error types in real-world scenarios. However, current benchmarks primarily focus on step correctness, failing to evaluate PRMs’ performance systematically. To address this gap, we introduce PRMBench, a process-level benchmark specifically designed to assess the fine-grained error detection capabilities of PRMs. PRMBench comprises 6,216 carefully designed problems and 83,456 step-level labels, evaluating models across multiple dimensions, including simplicity, soundness, and sensitivity. In our experiments on 15 models, spanning both open-source PRMs and closed-source large language models prompted as critic models, we uncover significant weaknesses in current PRMs. These findings underscore the challenges inherent in process-level evaluation and highlight key directions for future research. We hope PRMBench can be a robust bench for advancing research on PRM evaluation and development.
Article 658
Title@2025-05-28 (3): Message-Passing GNNs Fail to Approximate Sparse Triangular Factorizations
Title: Message-Passing GNNs Fail to Approximate Sparse Triangular Factorizations | Message-Passing-GNNs fehlschlagen an ungefähren Sparse Dreiecks-Fabrizierungen | 投送信件 GNN 失败于近似偏差的三角三角因子化 2502.01397v2 |
Authors: Vladislav Trifonov, Ekaterina Muravleva, Ivan Oseledets
Graph Neural Networks (GNNs) have been proposed as a tool for learning sparse matrix preconditioners, which are key components in accelerating linear solvers. This position paper argues that message-passing GNNs are fundamentally incapable of approximating sparse triangular factorizations. We demonstrate that message-passing GNNs fundamentally fail to approximate sparse triangular factorizations for classes of matrices for which high-quality preconditioners exist but require non-local dependencies. To illustrate this, we construct a set of baselines using both synthetic matrices and real-world examples from the SuiteSparse collection. Across a range of GNN architectures, including Graph Attention Networks and Graph Transformers, we observe severe performance degradation compared to exact or K-optimal factorizations, with cosine similarity dropping below $0.6$ in key cases. Our theoretical and empirical results suggest that architectural innovations beyond message-passing are necessary for applying GNNs to scientific computing tasks such as matrix factorization. Experiments demonstrate that overcoming non-locality alone is insufficient. Tailored architectures are necessary to capture the required dependencies since even a completely non-local Graph Transformer fails to match the proposed baselines.
Article 659
Title@2025-05-28 (3): Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head
Title: Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head | Dual-Head-Wissensdestillation: Optimierung der Logits-Nutzung mit Hilfe eines Hilfskopfes | 双头知识蒸馏:用辅助头加强登录的使用 2411.08937v2 |
Authors: Penghui Yang, Chen-Chen Zong, Sheng-Jun Huang, Lei Feng, Bo An
Traditional knowledge distillation focuses on aligning the student’s predicted probabilities with both ground-truth labels and the teacher’s predicted probabilities. However, the transition to predicted probabilities from logits would obscure certain indispensable information. To address this issue, it is intuitive to additionally introduce a logit-level loss function as a supplement to the widely used probability-level loss function, for exploiting the latent information of logits. Unfortunately, we empirically find that the amalgamation of the newly introduced logit-level loss and the previous probability-level loss will lead to performance degeneration, even trailing behind the performance of employing either loss in isolation. We attribute this phenomenon to the collapse of the classification head, which is verified by our theoretical analysis based on the neural collapse theory. Specifically, the gradients of the two loss functions exhibit contradictions in the linear classifier yet display no such conflict within the backbone. Drawing from the theoretical analysis, we propose a novel method called dual-head knowledge distillation, which partitions the linear classifier into two classification heads responsible for different losses, thereby preserving the beneficial effects of both losses on the backbone while eliminating adverse influences on the classification head. Extensive experiments validate that our method can effectively exploit the information inside the logits and achieve superior performance against state-of-the-art counterparts. Our code is available at: https://github.com/penghui-yang/DHKD.
Article 660
Title@2025-05-28 (3): Learning Latent Graph Structures and their Uncertainty
Title: Learning Latent Graph Structures and their Uncertainty | Lernen Latent Graph Structures und ihre Unsicherheit | 学习后边图结构及其不确定性 2405.19933v2 |
Authors: Alessandro Manenti, Daniele Zambon, Cesare Alippi
Graph neural networks use relational information as an inductive bias to enhance prediction performance. Often, task-relevant relations are unknown, and graph structure learning approaches have been proposed to learn them from data. Because these relations are latent, no graph observations are available to provide a direct training signal to the learnable relations. Therefore, graph topologies are typically learned on the prediction task alongside the other graph neural network parameters. In this paper, we demonstrate that minimizing point-prediction losses does not guarantee proper learning of the latent relational information and its associated uncertainty. Conversely, we prove that suitable loss functions on the stochastic model outputs solve two tasks simultaneously: (i) learning the unknown distribution of the latent graph and (ii) achieving optimal predictions of the target variable. Finally, we propose a sampling-based method that solves this joint learning task. Empirical results validate our theoretical claims and demonstrate the effectiveness of the proposed approach.
Article 661
Title@2025-05-28 (3): Towards Resilient and Sustainable Global Industrial Systems: An Evolutionary-Based Approach
Title: Towards Resilient and Sustainable Global Industrial Systems: An Evolutionary-Based Approach | Auf dem Weg zu stabilen und nachhaltigen globalen Industriesystemen: ein evolutionärer Ansatz | 走向具有复原力和可持续的全球工业系统:基于演变的方法 2503.11688v2 |
Authors: Václav Jirkovský, Jiří Kubalík, Petr Kadera, Arnd Schirrmann, Andreas Mitschke, Andreas Zindel
This paper presents a new complex optimization problem in the field of automatic design of advanced industrial systems and proposes a hybrid optimization approach to solve the problem. The problem is multi-objective as it aims at finding solutions that minimize CO2 emissions, transportation time, and costs. The optimization approach combines an evolutionary algorithm and classical mathematical programming to design resilient and sustainable global manufacturing networks. Further, it makes use of the OWL ontology for data consistency and constraint management. The experimental validation demonstrates the effectiveness of the approach in both single and double sourcing scenarios. The proposed methodology, in general, can be applied to any industry case with complex manufacturing and supply chain challenges.
Article 662
Title@2025-05-28 (3): Quantum Kernel Learning for Small Dataset Modeling in Semiconductor Fabrication: Application to Ohmic Contact
Title: Quantum Kernel Learning for Small Dataset Modeling in Semiconductor Fabrication: Application to Ohmic Contact | Quanten-Kernel-Lernen für kleine Datensätze Modellierung in Halbleiterfertigung: Anwendung auf Ohm-Kontakt | 半导体制造中小型数据集建模的量子核心学习: Ohmic 接触的应用 2409.10803v3 |
Authors: Zeheng Wang, Fangzhou Wang, Liang Li, Zirui Wang, Timothy van der Laan, Ross C. C. Leon, Jing-Kai Huang, Muhammad Usman
Modeling complex semiconductor fabrication processes such as Ohmic contact formation remains challenging due to high-dimensional parameter spaces and limited experimental data. While classical machine learning (CML) approaches have been successful in many domains, their performance degrades in small-sample, nonlinear scenarios. In this work, we investigate quantum machine learning (QML) as an alternative, exploiting quantum kernels to capture intricate correlations from compact datasets. Using only 159 experimental GaN HEMT samples, we develop a quantum kernel-aligned regressor (QKAR) combining a shallow Pauli-Z feature map with a trainable quantum kernel alignment (QKA) layer. All models, including seven baseline CML regressors, are evaluated under a unified PCA-based preprocessing pipeline to ensure a fair comparison. QKAR consistently outperforms classical baselines across multiple metrics (MAE, MSE, RMSE), achieving a mean absolute error of 0.338 $\Omega\cdot$mm when validated on experimental data. We further assess noise robustness and generalization through cross-validation and new device fabrication. These findings suggest that carefully constructed QML models could provide predictive advantages in data-constrained semiconductor modeling, offering a foundation for practical deployment on near-term quantum hardware. While challenges remain for both QML and CML, this study demonstrates QML’s potential as a complementary approach in complex process modeling tasks.
Article 663
Title@2025-05-28 (3): A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment
Title: A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment | Eine umfassende Umfrage in LLM(-Agent) Full Stack Sicherheit: Daten, Schulung und Bereitstellung | 用LLLM(-代理)全堆安全:数据、培训和部署进行的全面调查 2504.15585v3 |
Authors: Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Junyuan Mao, Yu Wang, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Wenjie Qu, Yue Liu, Chengwei Liu, Yifan Zhang, Qiankun Li, Chongye Guo, Yalan Qin, Zhaoxin Fan, Yi Ding, Donghai Hong, Jiaming Ji, Yingxin Lai, Zitong Yu, Xinfeng Li, Yifan Jiang, Yanhui Li, Xinyu Deng, Junlin Wu, Dongxia Wang, Yihao Huang, Yufei Guo, Jen-tse Huang, Qiufeng Wang, Wenxuan Wang, Dongrui Liu, Yanwei Yue, Wenke Huang, Guancheng Wan, Heng Chang, Tianlin Li, Yi Yu, Chenghao Li, Jiawei Li, Lei Bai, Jie Zhang, Qing Guo, Jingyi Wang, Tianlong Chen, Joey Tianyi Zhou, Xiaojun Jia, Weisong Sun, Cong Wu, Jing Chen, Xuming Hu, Yiming Li, Xiao Wang, Ningyu Zhang, Luu Anh Tuan, Guowen Xu, Jiaheng Zhang, Tianwei Zhang, Xingjun Ma, Jindong Gu, Xiang Wang, Bo An, Jun Sun, Mohit Bansal, Shirui Pan, Lingjuan Lyu, Yuval Elovici, Bhavya Kailkhura, Yaodong Yang, Hongwei Li, Wenyuan Xu, Yizhou Sun, Wei Wang, Qing Li, Ke Tang, Yu-Gang Jiang, Felix Juefei-Xu, Hui Xiong, Xiaofeng Wang, Dacheng Tao, Philip S. Yu, Qingsong Wen, Yang Liu
The remarkable success of Large Language Models (LLMs) has illuminated a promising pathway toward achieving Artificial General Intelligence for both academic and industrial communities, owing to their unprecedented performance across various applications. As LLMs continue to gain prominence in both research and commercial domains, their security and safety implications have become a growing concern, not only for researchers and corporations but also for every nation. Existing surveys on LLM safety primarily focus on specific stages of the LLM lifecycle, e.g., the deployment phase or the fine-tuning phase, and lack a comprehensive understanding of the entire “lifechain” of LLMs. To address this gap, this paper introduces, for the first time, the concept of “full-stack” safety to systematically consider safety issues throughout the entire process of LLM training, deployment, and eventual commercialization. Compared to existing LLM safety surveys, our work demonstrates several distinctive advantages: (I) Comprehensive Perspective. We define the complete LLM lifecycle as encompassing data preparation, pre-training, post-training, deployment, and final commercialization. To our knowledge, this represents the first safety survey to encompass the entire lifecycle of LLMs. (II) Extensive Literature Support. Our research is grounded in an exhaustive review of over 800 papers, ensuring comprehensive coverage and systematic organization of security issues within a more holistic understanding. (III) Unique Insights. Through systematic literature analysis, we have developed reliable roadmaps and perspectives for each chapter. Our work identifies promising research directions, including safety in data generation, alignment techniques, model editing, and LLM-based agent systems. These insights provide valuable guidance for researchers pursuing future work in this field.
Article 664
Title@2025-05-28 (3): ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
Title: ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation | ORIGEN: Zero-Shot 3D-Orientierungsgrundierung in Text-zu-Bild-Generierung | 将零热3D定向定位作为产生文字到图像的基础 2503.22194v2 |
Authors: Yunhong Min, Daehyeon Choi, Kyeongmin Yeo, Jihyun Lee, Minhyuk Sung
We introduce ORIGEN, the first zero-shot method for 3D orientation grounding in text-to-image generation across multiple objects and diverse categories. Previous work on spatial grounding in image generation has mainly focused on 2D positioning and lacks control over 3D orientation. To address this, we propose a reward-guided sampling approach using a pretrained discriminative model for 3D orientation estimation and a one-step text-to-image generative flow model. While gradient-ascent-based optimization is a natural choice for reward-based guidance, it struggles to maintain image realism. Instead, we adopt a sampling-based approach using Langevin dynamics, which extends gradient ascent by simply injecting random noise, requiring just a single additional line of code. Additionally, we introduce adaptive time rescaling based on the reward function to accelerate convergence. Our experiments show that ORIGEN outperforms both training-based and test-time guidance methods across quantitative metrics and user studies.
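The relationship between gradient ascent and Langevin dynamics can be sketched on a one-dimensional toy problem. This is not ORIGEN's sampler: the quadratic reward, the fixed step size, and the names `reward_grad` and `run` are assumptions made for illustration, whereas the actual method operates on the latent of a generative flow model with a learned orientation reward and adaptive time rescaling.

```python
import math
import random


def reward_grad(x, target=3.0):
    # Gradient of the toy reward r(x) = -0.5 * (x - target)^2.
    return -(x - target)


def run(steps, step_size, noise=True, seed=0, x0=0.0):
    """Gradient ascent on the reward; with noise=True it becomes Langevin dynamics."""
    rng = random.Random(seed)
    x = x0
    trace = []
    for _ in range(steps):
        x = x + step_size * reward_grad(x)  # plain gradient-ascent update
        if noise:
            # The single extra line: inject Gaussian noise scaled by sqrt(2 * step).
            x = x + math.sqrt(2.0 * step_size) * rng.gauss(0.0, 1.0)
        trace.append(x)
    return trace


if __name__ == "__main__":
    ascent = run(2000, 0.01, noise=False)
    langevin = run(50000, 0.01, noise=True)
    print(ascent[-1])
    print(sum(langevin) / len(langevin))
```

Gradient ascent collapses onto the single maximizer of the reward, while the noisy variant keeps sampling around it; on this toy reward the samples spread out near the maximizer instead of converging to a point.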
Article 665
Title@2025-05-28 (3): Reinforced Reasoning for Embodied Planning
Title: Reinforced Reasoning for Embodied Planning | Verstärkte Begründung für die körperbetonte Planung | 强化规划强化理由 2505.22050v1 |
Authors: Di Wu, Jiaxin Fan, Junzhe Zang, Guanbo Wang, Wei Yin, Wenhao Li, Bo Jin
Embodied planning requires agents to make coherent multi-step decisions based on dynamic visual observations and natural language goals. While recent vision-language models (VLMs) excel at static perception tasks, they struggle with the temporal reasoning, spatial understanding, and commonsense grounding needed for planning in interactive environments. In this work, we introduce a reinforcement fine-tuning framework that brings R1-style reasoning enhancement into embodied planning. We first distill a high-quality dataset from a powerful closed-source model and perform supervised fine-tuning (SFT) to equip the model with structured decision-making priors. We then design a rule-based reward function tailored to multi-step action quality and optimize the policy via Generalized Reinforced Preference Optimization (GRPO). Our approach is evaluated on Embench, a recent benchmark for interactive embodied tasks, covering both in-domain and out-of-domain scenarios. Experimental results show that our method significantly outperforms models of similar or larger scale, including GPT-4o-mini and 70B+ open-source baselines, and exhibits strong generalization to unseen environments. This work highlights the potential of reinforcement-driven reasoning to advance long-horizon planning in embodied AI.
Article 666
Title@2025-05-28 (3): Differentiable Generalized Sliced Wasserstein Plans
Title: Differentiable Generalized Sliced Wasserstein Plans | Unterschiedliche generalisierte Wasserstein-Pläne | 刀切瓦西斯坦计划 2505.22049v1 |
Authors: Laetitia Chapel, Romain Tavenard, Samuel Vaiter
Optimal Transport (OT) has attracted significant interest in the machine learning community, not only for its ability to define meaningful distances between probability distributions – such as the Wasserstein distance – but also for its formulation of OT plans. Its computational complexity remains a bottleneck, though, and slicing techniques have been developed to scale OT to large datasets. Recently, a novel slicing scheme, dubbed min-SWGG, was proposed: it lifts a one-dimensional plan back to the original multidimensional space and selects the slice that yields the lowest Wasserstein distance as an approximation of the full OT plan. Despite its computational and theoretical advantages, min-SWGG inherits typical limitations of slicing methods: (i) the number of required slices grows exponentially with the data dimension, and (ii) it is constrained to linear projections. Here, we reformulate min-SWGG as a bilevel optimization problem and propose a differentiable approximation scheme to efficiently identify the optimal slice, even in high-dimensional settings. We furthermore define its generalized extension to accommodate data living on manifolds. Finally, we demonstrate the practical value of our approach in various applications, including gradient flows on manifolds and high-dimensional spaces, as well as a novel sliced OT-based conditional flow matching for image generation – where fast computation of transport plans is essential.
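The lifting idea behind min-SWGG can be sketched with a random search over directions. This illustrates the baseline scheme that the abstract describes as prior work, not the differentiable bilevel method the paper proposes; it assumes two point sets of equal size with uniform weights and a squared Euclidean cost, and all function names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def lifted_plan(X, Y, theta):
    """Sort both sets by their projection on theta and pair them rank by rank."""
    ix = sorted(range(len(X)), key=lambda i: dot(X[i], theta))
    iy = sorted(range(len(Y)), key=lambda j: dot(Y[j], theta))
    return list(zip(ix, iy))


def plan_cost(X, Y, plan):
    """Mean squared Euclidean cost of the pairing, measured in the original space."""
    total = sum(sum((a - b) ** 2 for a, b in zip(X[i], Y[j])) for i, j in plan)
    return total / len(plan)


def min_swgg_random_search(X, Y, n_directions=50, seed=0):
    """Keep the direction whose lifted plan has the lowest cost."""
    rng = random.Random(seed)
    dim = len(X[0])
    best = None
    for _ in range(n_directions):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(dot(g, g))
        theta = [v / norm for v in g]  # random direction on the unit sphere
        plan = lifted_plan(X, Y, theta)
        cost = plan_cost(X, Y, plan)
        if best is None or cost < best[0]:
            best = (cost, plan, theta)
    return best


if __name__ == "__main__":
    X = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0), (4.0, 4.0)]
    Y = [(x + 1.0, y) for x, y in X]  # X translated along the first axis
    cost, plan, theta = min_swgg_random_search(X, Y)
    print(cost, sorted(plan))
```

Every lifted plan is a valid pairing, so its cost upper-bounds the true OT cost; the random search only tightens that bound, which is why the number of sampled directions becomes the bottleneck in high dimension.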
Article 667
Title@2025-05-28 (3): Learning Curves of Stochastic Gradient Descent in Kernel Regression
Title: Learning Curves of Stochastic Gradient Descent in Kernel Regression | Lernkurven des stochastischen Gradienten Abstiegs in Kernel-Regression | 内核倒退中尾部渐变源的学习曲线 2505.22048v1 |
Authors: Haihan Zhang, Weicheng Lin, Yuanshi Liu, Cong Fang
This paper considers a canonical problem in kernel regression: how well does a model perform when it is trained by popular online first-order algorithms, compared to offline ones such as ridge and ridgeless regression? We analyze the foundational single-pass Stochastic Gradient Descent (SGD) in kernel regression under a source condition where the optimal predictor may not even belong to the RKHS, i.e., the model is misspecified. Specifically, we focus on the inner product kernel over the sphere and characterize the exact orders of the excess risk curves under different scalings of the sample size $n$ with respect to the input dimension $d$. Surprisingly, we show that SGD achieves min-max optimal rates up to constants across all the scales, without suffering from saturation, a prevalent phenomenon observed in (ridge) regression, except when the model is highly misspecified and the learning is in a final stage where $n\gg d^{\gamma}$ with any constant $\gamma >0$. The main reason SGD overcomes the curse of saturation is the exponentially decaying step size schedule, a common practice in deep neural network training. As a byproduct, we provide the first provable advantage of the scheme over the iterative averaging method in the common setting.
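Single-pass kernel SGD with an exponentially decaying step size can be sketched as follows. The abstract does not give the exact schedule, so the stage-wise halving used here (constant within a stage of roughly $n/\log_2 n$ steps, halved between stages) is an assumption, as are the particular inner-product kernel $(1 + \langle x, z \rangle)^2$, the initial step size, and all function names.

```python
import math
import random


def kernel(x, z):
    # An inner-product kernel on the unit sphere (assumed choice).
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2


def step_size(t, n, eta0):
    """Stage-wise geometric decay: halve the step after every stage."""
    stage_len = max(1, math.ceil(n / math.log2(max(n, 2))))
    return eta0 * 0.5 ** (t // stage_len)


def predict(coefs, centers, x):
    return sum(c * kernel(z, x) for c, z in zip(coefs, centers))


def single_pass_sgd(samples, eta0=0.2):
    """Visit each sample once; f <- f - eta_t * (f(x_t) - y_t) * K(x_t, .)."""
    n = len(samples)
    coefs, centers = [], []
    for t, (x, y) in enumerate(samples):
        residual = predict(coefs, centers, x) - y
        coefs.append(-step_size(t, n, eta0) * residual)
        centers.append(x)
    return coefs, centers


if __name__ == "__main__":
    rng = random.Random(0)
    train = []
    for _ in range(400):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x = (math.cos(angle), math.sin(angle))  # point on the unit circle
        train.append((x, x[0] + 0.1 * rng.gauss(0.0, 1.0)))  # noisy toy target
    coefs, centers = single_pass_sgd(train)
    print(predict(coefs, centers, (1.0, 0.0)))
```

The predictor is the last iterate rather than an average of iterates; the decaying schedule is what damps the label noise in the final stages, which is the role the abstract attributes to it in comparison with iterate averaging.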
Article 668
Title@2025-05-28 (3): Learning to Steer Learners in Games
Title: Learning to Steer Learners in Games | Lernen zu Steer Learners in Spielen | 在运动会中学习向运动会中的稳坐学生学习 2502.20770v2 |
Authors: Yizhou Zhang, Yi-An Ma, Eric Mazumdar
We consider the problem of learning to exploit learning algorithms through repeated interactions in games. Specifically, we focus on the case of repeated two-player, finite-action games, in which an optimizer aims to steer a no-regret learner to a Stackelberg equilibrium without knowledge of its payoffs. We first show that this is impossible if the optimizer only knows that the learner is using an algorithm from the general class of no-regret algorithms. This suggests that the optimizer requires more information about the learner’s objectives or algorithm to successfully exploit them. Building on this intuition, we reduce the problem for the optimizer to that of recovering the learner’s payoff structure. We demonstrate the effectiveness of this approach if the learner’s algorithm is drawn from a smaller class by analyzing two examples: one where the learner uses an ascent algorithm, and another where the learner uses stochastic mirror ascent with known regularizer and step sizes.
Article 669
Title@2025-05-28 (3): PUATE: Efficient Average Treatment Effect Estimation from Treated (Positive) and Unlabeled Units
Title: PUATE: Efficient Average Treatment Effect Estimation from Treated (Positive) and Unlabeled Units | PUATE: Effiziente Schätzung des durchschnittlichen Behandlungseffekts aus behandelten (Positiven) und nicht gekennzeichneten Einheiten | PUATE: 高效平均处理效果估算处理(积极)单位和无标签单位的高效平均处理效果 2501.19345v2 |
Authors: Masahiro Kato, Fumiaki Kozai, Ryo Inokuchi
The estimation of average treatment effects (ATEs), defined as the difference in expected outcomes between treatment and control groups, is a central topic in causal inference. This study develops semiparametric efficient estimators for ATE in a setting where only a treatment group and an unlabeled group, consisting of units whose treatment status is unknown, are observed. This scenario constitutes a variant of learning from positive and unlabeled data (PU learning) and can be viewed as a special case of ATE estimation with missing data. For this setting, we derive the semiparametric efficiency bounds, which characterize the lowest achievable asymptotic variance for regular estimators. We then construct semiparametric efficient ATE estimators that attain these bounds. Our results contribute to the literature on causal inference with missing data and weakly supervised learning.
nan
Article 670
Title@2025-05-28 (3): MultiScale Contextual Bandits for Long Term Objectives
Title: MultiScale Contextual Bandits for Long Term Objectives | MultiScale Contextual Bandits für langfristige Ziele | 长期目标多层次背景影响 2503.17674v2 |
Authors: Richa Rastogi, Yuta Saito, Thorsten Joachims
The feedback that AI systems (e.g., recommender systems, chatbots) collect from user interactions is a crucial source of training data. While short-term feedback (e.g., clicks, engagement) is widely used for training, there is ample evidence that optimizing short-term feedback does not necessarily achieve the desired long-term objectives. Unfortunately, directly optimizing for long-term objectives is challenging, and we identify the disconnect in the timescales of short-term interventions (e.g., rankings) and the long-term feedback (e.g., user retention) as one of the key obstacles. To overcome this disconnect, we introduce the framework of MultiScale Policy Learning to contextually reconcile that AI systems need to act and optimize feedback at multiple interdependent timescales. Following a PAC-Bayes motivation, we show how the lower timescales with more plentiful data can provide a data-dependent hierarchical prior for faster learning at higher scales, where data is more scarce. As a result, the policies at all levels effectively optimize for the long-term. We instantiate the framework with MultiScale Off-Policy Bandit Learning (MSBL) and demonstrate its effectiveness on three tasks relating to recommender and conversational systems.
nan
Article 671
Title@2025-05-28 (3): Latent Mamba Operator for Partial Differential Equations
Title: Latent Mamba Operator for Partial Differential Equations | Latent Mamba Operator für partielle Differentialgleichungen | 部分差异方程的 中端 Mamba 运算符 2505.19105v2 |
Authors: Karn Tiwari, Niladri Dutta, N M Anoop Krishnan, Prathosh A P
Neural operators have emerged as powerful data-driven frameworks for solving Partial Differential Equations (PDEs), offering significant speedups over numerical methods. However, existing neural operators struggle with scalability in high-dimensional spaces, incur high computational costs, and face challenges in capturing continuous and long-range dependencies in PDE dynamics. To address these limitations, we introduce the Latent Mamba Operator (LaMO), which integrates the efficiency of state-space models (SSMs) in latent space with the expressive power of kernel integral formulations in neural operators. We also establish a theoretical connection between SSMs and the kernel integral of neural operators. In extensive experiments across diverse PDE benchmarks on regular grids, structured meshes, and point clouds, covering solid and fluid physics datasets, LaMOs achieve consistent state-of-the-art (SOTA) performance, with a 32.3% improvement over existing baselines in solution operator approximation, highlighting their efficacy in modeling complex PDE solutions.
nan
Article 672
Title@2025-05-28 (3): Estimating the Effects of Sample Training Orders for Large Language Models without Retraining
Title: Estimating the Effects of Sample Training Orders for Large Language Models without Retraining | Bewertung der Auswirkungen von Mustertrainingsaufträgen für große Sprachmodelle ohne Umschulung | 估计无再培训的大语言模式抽样培训令的影响 2505.22042v1 |
Authors: Hao Yang, Haoxuan Li, Mengyue Yang, Xu Chen, Mingming Gong
The order of training samples plays a crucial role in large language models (LLMs), significantly impacting both their external performance and internal learning dynamics. Traditional methods for investigating this effect generally require retraining the model with various sample orders, which is computationally infeasible for LLMs. In this work, we improve on traditional methods by designing a retraining-free framework. By approximating Adam optimizer updates with first- and second-order Taylor expansions and utilizing random projection methods to store intermediate checkpoints, our framework can efficiently estimate model parameters for arbitrary training sample orders. Next, we apply our framework to two downstream research problems: (1) Training curriculum design for LLMs – we build on our retraining-free framework to propose a novel curriculum learning strategy that augments curriculum proposals with estimated model performances, enabling more informed sample scheduling. (2) LLMs’ memorization and generalization effect analysis – we use our retraining-free framework to estimate how the positions of training samples influence LLMs’ capacity for memorization and generalization. We conduct extensive experiments to validate the effectiveness of our retraining-free framework in reproducing the true model performances, and further demonstrate its potential in optimizing LLM training curricula and analyzing the memorization and generalization effects of LLMs.
nan
Article 673
Title@2025-05-28 (3): Detecting Undesired Process Behavior by Means of Retrieval Augmented Generation
Title: Detecting Undesired Process Behavior by Means of Retrieval Augmented Generation | Erkennung von unerwünschtem Prozessverhalten mittels retrievaler Augmented Generation | 通过回收增加一代的手段检测不想要的流程行为 2505.22041v1 |
Authors: Michael Grohs, Adrian Rebmann, Jana-Rebecca Rehse
Conformance checking techniques detect undesired process behavior by comparing process executions that are recorded in event logs to desired behavior that is captured in a dedicated process model. If such models are not available, conformance checking techniques are not applicable, but organizations might still be interested in detecting undesired behavior in their processes. To enable this, existing approaches use Large Language Models (LLMs), assuming that they can learn to distinguish desired from undesired behavior through fine-tuning. However, fine-tuning is highly resource-intensive and the fine-tuned LLMs often do not generalize well. To address these limitations, we propose an approach that requires neither a dedicated process model nor resource-intensive fine-tuning to detect undesired process behavior. Instead, we use Retrieval Augmented Generation (RAG) to provide an LLM with direct access to a knowledge base that contains both desired and undesired process behavior from other processes, assuming that the LLM can transfer this knowledge to the process at hand. Our evaluation shows that our approach outperforms fine-tuned LLMs in detecting undesired behavior, demonstrating that RAG is a viable alternative to resource-intensive fine-tuning, particularly when enriched with relevant context from the event log, such as frequent traces and activities.
nan
Article 674
Title@2025-05-28 (3): Revisiting In-Context Learning with Long Context Language Models
Title: Revisiting In-Context Learning with Long Context Language Models | Das In-Context-Lernen mit langen Kontext-Sprachmodellen | 以长方语言模式重新研究内文学习 2412.16926v3 |
Authors: Jinheon Baek, Sun Jae Lee, Prakhar Gupta, Geunseob Oh, Siddharth Dalmia, Prateek Kolhar
In-Context Learning (ICL) is a technique by which language models make predictions based on examples provided in their input context. Previously, the context window size imposed a limit on the number of examples that could be shown, making example selection techniques crucial for identifying the maximally effective set of examples. However, the recent advent of Long Context Language Models (LCLMs) has significantly increased the number of examples that can be included in context, raising the important question of whether ICL performance in a many-shot regime is still sensitive to the method of sample selection. To answer this, we revisit these approaches in the context of LCLMs through extensive experiments on 18 datasets spanning 4 tasks. Surprisingly, we observe that sophisticated example selection techniques do not yield significant improvements over a simple random sample selection method. Instead, we discover that the advent of LCLMs has fundamentally shifted the challenge of ICL from that of selecting the most effective examples to that of collecting sufficient examples to fill the context window. Specifically, in certain datasets, including all available examples does not fully utilize the context window; however, by augmenting the examples in context with a simple data augmentation approach, we substantially improve ICL performance by 5%.
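The random-selection baseline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `fill_context`, the token budget, and the whitespace token counter are all hypothetical stand-ins (a real setup would use the model's own tokenizer).

```python
import random


def fill_context(examples, budget_tokens, count_tokens, seed=0):
    """Randomly pick demonstrations until the token budget is used up."""
    rng = random.Random(seed)
    pool = list(examples)
    rng.shuffle(pool)  # simple random selection, no relevance scoring
    chosen, used = [], 0
    for example in pool:
        cost = count_tokens(example)
        if used + cost > budget_tokens:
            continue  # does not fit; a shorter example later in the pool might
        chosen.append(example)
        used += cost
    return chosen, used


def whitespace_tokens(text):
    # Crude stand-in for a real tokenizer (assumption for this sketch).
    return len(text.split())


# Toy pool of demonstrations; each one counts as 5 whitespace tokens.
demo_pool = [f"input {i} -> label {i % 3}" for i in range(50)]
selected, used = fill_context(demo_pool, budget_tokens=100, count_tokens=whitespace_tokens)
```

If the pool is exhausted before the budget is reached, `used` stays below `budget_tokens`, which corresponds to the under-filled context window case that the abstract addresses with data augmentation.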
nan
Article 675
Title@2025-05-28 (3): Weakly-Supervised Contrastive Learning for Imprecise Class Labels
Title: Weakly-Supervised Contrastive Learning for Imprecise Class Labels | Schwachüberwachtes Kontrastives Lernen für ungenaue Klassen-Etiketten | 简便类标签的微弱监督反竞争学习 2505.22028v1 |
Authors: Zi-Hao Zhou, Jun-Jie Wang, Tong Wei, Min-Ling Zhang
Contrastive learning has achieved remarkable success in learning effective representations, with supervised contrastive learning often outperforming self-supervised approaches. However, in real-world scenarios, data annotations are often ambiguous or inaccurate, meaning that class labels may not reliably indicate whether two examples belong to the same class. This limitation restricts the applicability of supervised contrastive learning. To address this challenge, we introduce the concept of “continuous semantic similarity” to define positive and negative pairs. Instead of directly relying on imprecise class labels, we measure the semantic similarity between example pairs, which quantifies how closely they belong to the same category by iteratively refining weak supervisory signals. Based on this concept, we propose a graph-theoretic framework for weakly-supervised contrastive learning, where semantic similarity serves as the graph weights. Our framework is highly versatile and can be applied to many weakly-supervised learning scenarios. We demonstrate its effectiveness through experiments in two common settings, i.e., noisy label and partial label learning, where existing methods can be easily integrated to significantly improve performance. Theoretically, we establish an error bound for our approach, showing that it can approximate supervised contrastive learning under mild conditions. The implementation code is available at https://github.com/Speechless-10308/WSC.
nan
Article 676
Title@2025-05-28 (3): Evaluation of the impact of expert knowledge: How decision support scores impact the effectiveness of automatic knowledge-driven feature engineering (aKDFE)
Title: Evaluation of the impact of expert knowledge: How decision support scores impact the effectiveness of automatic knowledge-driven feature engineering (aKDFE) | Bewertung der Auswirkungen von Expertenwissen: Wie die Entscheidungsunterstützung die Wirksamkeit des automatischen wissensbasierten Feature Engineerings beeinflusst (aKDFE) | 评价专家知识的影响:决策支持的评分如何影响知识驱动的自动知识特性工程(KDFE)的有效性 2504.05928v2 |
Authors: Olof Björneld, Tora Hammar, Daniel Nilsson, Alisa Lincke, Welf Löwe
Adverse Drug Events (ADEs), harmful medication effects, pose significant healthcare challenges, impacting patient safety and costs. This study evaluates automatic Knowledge-Driven Feature Engineering (aKDFE) for improved ADE prediction from Electronic Health Record (EHR) data, comparing it with automated event-based Knowledge Discovery in Databases (KDD). We investigated how incorporating domain-specific ADE risk scores for prolonged heart QT interval, extracted from the Janusmed Riskprofile (Janusmed) Clinical Decision Support System (CDSS), affects prediction performance using EHR data and medication handling events. Results indicate that, while aKDFE step 1 (event-based feature generation) alone did not significantly improve ADE prediction performance, aKDFE step 2 (patient-centric transformation) enhances the prediction performance. High Area Under the Receiver Operating Characteristic curve (AUROC) values suggest strong feature correlations to the outcome, aligning with the predictive power of patients’ prior healthcare history for ADEs. Statistical analysis did not confirm that incorporating the Janusmed information (i) risk scores and (ii) medication route of administration into the model’s feature set enhanced predictive performance. However, the patient-centric transformation applied by aKDFE proved to be a highly effective feature engineering approach. Limitations include a single-project focus, potential bias from machine learning pipeline methods, and reliance on AUROC. In conclusion, aKDFE, particularly with patient-centric transformation, improves ADE prediction from EHR data. Future work will explore attention-based models, event feature sequences, and automatic methods for incorporating domain knowledge into the aKDFE framework.
nan
Article 677
Title@2025-05-28 (3): Efficient Online Reinforcement Learning for Diffusion Policy
Title: Efficient Online Reinforcement Learning for Diffusion Policy | Effizientes Online-Verstärkungslernen für die Diffusionspolitik | 高效在线强化学习促进传播政策 2502.00361v3 |
Authors: Haitong Ma, Tianyi Chen, Kai Wang, Na Li, Bo Dai
Diffusion policies have achieved superior performance in imitation learning and offline reinforcement learning (RL) due to their rich expressiveness. However, the conventional diffusion training procedure requires samples from the target distribution, which is impossible in online RL since we cannot sample from the optimal policy. Backpropagating the policy gradient through the diffusion process incurs huge computational costs and instability, making it expensive and hard to scale. To enable efficient training of diffusion policies in online RL, we generalize the conventional denoising score matching by reweighting the loss function. The resulting Reweighted Score Matching (RSM) preserves the optimal solution and low computational cost of denoising score matching, while eliminating the need to sample from the target distribution and allowing learning to optimize value functions. We introduce two tractable reweighted loss functions to solve two commonly used policy optimization problems, policy mirror descent and max-entropy policy, resulting in two practical algorithms named Diffusion Policy Mirror Descent (DPMD) and Soft Diffusion Actor-Critic (SDAC). We conducted comprehensive comparisons on MuJoCo benchmarks. The empirical results show that the proposed algorithms outperform recent diffusion-policy online RL methods on most tasks, and that DPMD improves by more than 120% over soft actor-critic on Humanoid and Ant.
nan
Article 678
Title@2025-05-28 (3): Model Diffusion for Certifiable Few-shot Transfer Learning
Title: Model Diffusion for Certifiable Few-shot Transfer Learning | Modell-Diffusion für zertifizierbares Transfer-Lernen mit wenigen Fotos | 可核证的 “ 几光 “ 转让学习模型传播 2502.06970v2 |
Authors: Fady Rezk, Royson Lee, Henry Gouk, Timothy Hospedales, Minyoung Kim
In contemporary deep learning, a prevalent and effective workflow for solving low-data problems is adapting powerful pre-trained foundation models (FMs) to new tasks via parameter-efficient fine-tuning (PEFT). However, while empirically effective, the resulting solutions lack generalisation guarantees to certify their accuracy - which may be required for ethical or legal reasons prior to deployment in high-importance applications. In this paper we develop a novel transfer learning approach that is designed to facilitate non-vacuous learning theoretic generalisation guarantees for downstream tasks, even in the low-shot regime. Specifically, we first use upstream tasks to train a distribution over PEFT parameters. We then learn the downstream task by a sample-and-evaluate procedure – sampling plausible PEFTs from the trained diffusion model and selecting the one with the highest likelihood on the downstream data. Crucially, this confines our model hypothesis to a finite set of PEFT samples. In contrast to the typical continuous hypothesis spaces of neural network weights, this facilitates tighter risk certificates. We instantiate our bound and show non-trivial generalization guarantees compared to existing learning approaches which lead to vacuous bounds in the low-shot regime.
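To see why a finite set of sampled candidates helps certification, consider the textbook Hoeffding-plus-union bound for a loss in [0, 1]: with probability at least 1 - delta, every hypothesis in a set of size K satisfies risk <= empirical risk + sqrt((ln K + ln(1/delta)) / (2n)). The sketch below applies that standard bound; it is not necessarily the certificate used in the paper, it selects by lowest empirical risk rather than highest likelihood, and the names `finite_class_bound` and `sample_and_evaluate` are hypothetical. The bound is only valid if the candidate set is drawn without looking at the downstream data.

```python
import math


def finite_class_bound(empirical_risk, num_hypotheses, num_samples, delta=0.05):
    """Hoeffding + union bound over a finite hypothesis set, loss in [0, 1]."""
    slack = math.sqrt(
        (math.log(num_hypotheses) + math.log(1.0 / delta)) / (2.0 * num_samples)
    )
    return min(1.0, empirical_risk + slack)


def sample_and_evaluate(candidates, empirical_risk_fn, num_samples, delta=0.05):
    """Pick the candidate with the lowest empirical risk and certify it."""
    risks = [empirical_risk_fn(c) for c in candidates]
    best = min(range(len(candidates)), key=risks.__getitem__)
    bound = finite_class_bound(risks[best], len(candidates), num_samples, delta)
    return candidates[best], risks[best], bound


# Toy stand-in: 1000 "sampled adapters" indexed by integers, with a made-up
# empirical risk that is lowest at index 300 (illustrative numbers only).
toy_candidates = list(range(1000))
best, risk, bound = sample_and_evaluate(
    toy_candidates, lambda c: abs(c - 300) / 1000.0, num_samples=16
)
```

The slack term grows only logarithmically in the number of candidates, so even with a handful of labeled examples the bound can stay below 1, which is the sense in which a finite hypothesis set can give non-vacuous guarantees in the low-shot regime.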
nan
Article 679
Title@2025-05-28 (3): Learning in Compact Spaces with Approximately Normalized Transformers
Title: Learning in Compact Spaces with Approximately Normalized Transformers | Lernen in kompakten Räumen mit etwa normalisierten Transformatoren | 学习与大约正常化变异器的紧凑空间的学习 2505.22014v1 |
Authors: Jörg K. H. Franke, Urs Spiegelhalter, Marianna Nezhurina, Jenia Jitsev, Frank Hutter, Michael Hefenbrock
In deep learning, regularization and normalization are common solutions for challenges such as overfitting, numerical instabilities, and the increasing variance in the residual stream. An alternative approach is to force all parameters and representations to lie on a hypersphere. This removes the need for regularization and increases convergence speed, but comes with additional costs. In this work, we propose a more holistic but approximate normalization (anTransformer). Our approach constrains the norm of parameters and normalizes all representations via scalar multiplications motivated by the tight concentration of the norms of high-dimensional random vectors. When applied to GPT training, we observe a 40% faster convergence compared to models with QK normalization, with less than 3% additional runtime. Deriving scaling laws for anGPT, we found our method enables training with larger batch sizes and fewer hyperparameters, while matching the favorable scaling characteristics of classic GPT architectures.
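The norm-concentration fact that motivates the approach can be checked numerically. The snippet below is only an illustration of that general property for standard Gaussian vectors, not the anTransformer method; `norm_spread` is a hypothetical helper name.

```python
import math
import random
import statistics


def norm_spread(dim, trials=200, seed=0):
    """Return (mean norm / sqrt(dim), relative std of the norm) for N(0, I) vectors."""
    rng = random.Random(seed)
    norms = [
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim)))
        for _ in range(trials)
    ]
    mean = statistics.fmean(norms)
    return mean / math.sqrt(dim), statistics.pstdev(norms) / mean


ratio_small, spread_small = norm_spread(16)
ratio_large, spread_large = norm_spread(1024)
print(f"dim=16:   mean/sqrt(d)={ratio_small:.3f}, relative std={spread_small:.3f}")
print(f"dim=1024: mean/sqrt(d)={ratio_large:.3f}, relative std={spread_large:.3f}")
```

As the dimension grows, the norm stays close to sqrt(dim) with a shrinking relative spread, so multiplying by a fixed scalar such as 1/sqrt(dim) brings such vectors close to unit norm without computing each norm explicitly.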
nan
Article 680
Title@2025-05-28 (3): SageAttention2++: A More Efficient Implementation of SageAttention2
Title: SageAttention2++: A More Efficient Implementation of SageAttention2 | SageAttention2++: Effizientere Umsetzung von SageAttention2 | SageAttention2++:更有效地实施SageAttention2 2505.21136v2 |
Authors: Jintao Zhang, Xiaoming Xu, Jia Wei, Haofeng Huang, Pengle Zhang, Chendong Xiang, Jun Zhu, Jianfei Chen
The efficiency of attention is critical because its time complexity grows quadratically with sequence length. SageAttention2 addresses this by utilizing quantization to accelerate matrix multiplications (Matmul) in attention. To further accelerate SageAttention2, we propose to utilize the faster instruction of FP8 Matmul accumulated in FP16. The instruction is 2x faster than the FP8 Matmul used in SageAttention2. Our experiments show that SageAttention2++ achieves a 3.9x speedup over FlashAttention while maintaining the same attention accuracy as SageAttention2. This means SageAttention2++ effectively accelerates various models, including those for language, image, and video generation, with negligible end-to-end metrics loss. The code will be available at https://github.com/thu-ml/SageAttention.
nan
Article 681
Title@2025-05-28 (3): A Comprehensive Real-World Assessment of Audio Watermarking Algorithms: Will They Survive Neural Codecs?
Title: A Comprehensive Real-World Assessment of Audio Watermarking Algorithms: Will They Survive Neural Codecs? | Eine umfassende Real-World Bewertung von Audio Watermarking Algorithmen: Werden sie überleben Neural Codecs? | 对音频水标定法的全面现实世界评估:它们能否生存神经规范? 2505.19663v2 |
Authors: Yigitcan Özer, Woosung Choi, Joan Serrà, Mayank Kumar Singh, Wei-Hsiang Liao, Yuki Mitsufuji
We introduce the Robust Audio Watermarking Benchmark (RAW-Bench), a benchmark for evaluating deep learning-based audio watermarking methods with standardized and systematic comparisons. To simulate real-world usage, we introduce a comprehensive audio attack pipeline with various distortions such as compression, background noise, and reverberation, along with a diverse test dataset including speech, environmental sounds, and music recordings. Evaluating four existing watermarking methods on RAW-Bench reveals two main insights: (i) neural compression techniques pose the most significant challenge, even when algorithms are trained with such compressions; and (ii) training with audio attacks generally improves robustness, although it is insufficient in some cases. Furthermore, we find that specific distortions, such as polarity inversion, time stretching, or reverb, seriously affect certain methods. The evaluation framework is accessible at github.com/SonyResearch/raw_bench.
nan
Article 682
Title@2025-05-28 (3): Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains
Title: Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains | Domaino1s: Leitende LLM-Gründung für erklärbare Antworten in High-Stakes-Domains | 域1:在高占用域中解释可解答案的 指导性LLM 2501.14431v2 |
Authors: Xu Chu, Zhijie Tan, Hanlin Xue, Guanyu Wang, Tong Mo, Weiping Li
Large Language Models (LLMs) are widely applied to downstream domains. However, current LLMs for high-stakes domain tasks, such as financial investment and legal QA, typically generate brief answers without reasoning processes and explanations. This limits users’ confidence in making decisions based on their responses. While the original CoT shows promise, it lacks self-correction mechanisms during reasoning. This work introduces Domaino1s, which enhances LLMs’ reasoning capabilities on domain tasks through supervised fine-tuning and tree search. We construct CoT-stock-2k and CoT-legal-2k datasets for fine-tuning models that activate domain-specific reasoning steps based on their judgment. Additionally, we propose Selective Tree Exploration to spontaneously explore solution spaces and sample optimal reasoning paths to improve performance. We also introduce PROOF-Score, a new metric for evaluating domain models’ explainability, complementing traditional accuracy metrics with richer assessment dimensions. Extensive experiments on stock investment recommendation and legal reasoning QA tasks demonstrate Domaino1s’s leading performance and explainability. Our code is available at https://github.com/Hyalinesky/Domaino1s.
nan
Article 683
Title@2025-05-28 (3): Align-DA: Align Score-based Atmospheric Data Assimilation with Multiple Preferences
Title: Align-DA: Align Score-based Atmospheric Data Assimilation with Multiple Preferences | Align-DA: Align Score-basierte atmosphärische Daten Assimilation mit mehreren Präferenzen | Aleign-DA: 与多重优惠相仿的一致计分大气数据 2505.22008v1 |
Authors: Jing-An Sun, Hang Fan, Junchao Gong, Ben Fei, Kun Chen, Fenghua Ling, Wenlong Zhang, Wanghan Xu, Li Yan, Pierre Gentine, Lei Bai
Data assimilation (DA) aims to estimate the full state of a dynamical system by combining partial and noisy observations with a prior model forecast, commonly referred to as the background. In atmospheric applications, this problem is fundamentally ill-posed due to the sparsity of observations relative to the high-dimensional state space. Traditional methods address this challenge by simplifying background priors to regularize the solution, which are empirical and require continual tuning for application. Inspired by alignment techniques in text-to-image diffusion models, we propose Align-DA, which formulates DA as a generative process and uses reward signals to guide background priors, replacing manual tuning with data-driven alignment. Specifically, we train a score-based model in the latent space to approximate the background-conditioned prior, and align it using three complementary reward signals for DA: (1) assimilation accuracy, (2) forecast skill initialized from the assimilated state, and (3) physical adherence of the analysis fields. Experiments with multiple reward signals demonstrate consistent improvements in analysis quality across different evaluation metrics and observation-guidance strategies. These results show that preference alignment, implemented as a soft constraint, can automatically adapt complex background priors tailored to DA, offering a promising new direction for advancing the field.
nan
Article 684
Title@2025-05-28 (3): Generalization Analysis for Supervised Contrastive Representation Learning under Non-IID Settings
Title: Generalization Analysis for Supervised Contrastive Representation Learning under Non-IID Settings | Generalisierungsanalyse für überwachtes Kontrastives Repräsentationslernen unter Nicht-IID-Einstellungen | 在非IID设置下受监督的违反代表制学习的通用分析 2505.04937v3 |
Authors: Nong Minh Hieu, Antoine Ledent
Contrastive Representation Learning (CRL) has achieved impressive success in various domains in recent years. Nevertheless, the theoretical understanding of the generalization behavior of CRL has remained limited. Moreover, to the best of our knowledge, the current literature only analyzes generalization bounds under the assumption that the data tuples used for contrastive learning are independently and identically distributed. However, in practice, we are often limited to a fixed pool of reusable labeled data points, making it inevitable to recycle data across tuples to create sufficiently large datasets. Therefore, the tuple-wise independence condition imposed by previous works is invalidated. In this paper, we provide a generalization analysis for the CRL framework under non-$i.i.d.$ settings that adheres to practice more realistically. Drawing inspiration from the literature on U-statistics, we derive generalization bounds which indicate that the required number of samples in each class scales as the logarithm of the covering number of the class of learnable feature representations associated to that class. Next, we apply our main results to derive excess risk bounds for common function classes such as linear maps and neural networks.
nan
Article 685
Title@2025-05-28 (3): Locking-Free Training of Physics-Informed Neural Network for Solving Nearly Incompressible Elasticity Equations
Title: Locking-Free Training of Physics-Informed Neural Network for Solving Nearly Incompressible Elasticity Equations | Locking-Free Training of Physics-informed Neural Network for Solving Fast Incompressible Elasticity Equations | 用于解决近不压缩弹性等量的物理内成神经网络的无锁化培训 2505.21994v1 |
Authors: Josef Dick, Seungchan Ko, Kassem Mustapha, Sanghyeon Park
Due to divergence instability, the accuracy of low-order conforming finite element methods for nearly incompressible homogeneous elasticity equations deteriorates as the Lamé coefficient $\lambda\to\infty$, or equivalently as the Poisson ratio $\nu\to1/2$. This phenomenon, known as locking or non-robustness, remains not fully understood despite extensive investigation. In this paper, we propose a robust method based on a fundamentally different, machine-learning-driven approach. Leveraging recently developed Physics-Informed Neural Networks (PINNs), we address the numerical solution of linear elasticity equations governing nearly incompressible materials. The core idea of our method is to appropriately decompose the given equations to alleviate the extreme imbalance in the coefficients, while simultaneously solving both the forward and inverse problems to recover the solutions of the decomposed systems as well as the associated external conditions. Through various numerical experiments, including constant, variable and parametric Lamé coefficients, we illustrate the efficiency of the proposed methodology.
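The equivalence between the two limits follows from the standard isotropic linear elasticity relations $\lambda = E\nu / ((1+\nu)(1-2\nu))$ and $\mu = E / (2(1+\nu))$, where $E$ is Young's modulus. A small numerical check, with a hypothetical helper name and an arbitrary choice of $E = 1$, shows how the coefficient imbalance $\lambda/\mu$ grows as $\nu$ approaches 1/2:

```python
def lame_parameters(young_modulus, poisson_ratio):
    """Standard isotropic relations between (E, nu) and the Lame pair (lambda, mu)."""
    if not -1.0 < poisson_ratio < 0.5:
        raise ValueError("Poisson ratio must lie in (-1, 0.5)")
    lam = young_modulus * poisson_ratio / (
        (1.0 + poisson_ratio) * (1.0 - 2.0 * poisson_ratio)
    )
    mu = young_modulus / (2.0 * (1.0 + poisson_ratio))
    return lam, mu


# E = 1.0 is an arbitrary illustrative value; lambda/mu does not depend on E.
for nu in (0.3, 0.49, 0.499, 0.4999):
    lam, mu = lame_parameters(1.0, nu)
    print(f"nu={nu}: lambda={lam:.2f}, mu={mu:.4f}, lambda/mu={lam / mu:.1f}")
```

The ratio equals $2\nu/(1-2\nu)$, so it is unbounded as $\nu\to1/2$ while $\mu$ stays of order one. This is the extreme coefficient imbalance that the abstract refers to.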
nan
Article 686
Title@2025-05-28 (3): Identifying Causal Direction via Variational Bayesian Compression
Title: Identifying Causal Direction via Variational Bayesian Compression | Identifizierung der Kausalrichtung durch variationale Bayesische Kompression | 通过变异贝耶斯压缩确定因果方向 2505.07503v3 |
Authors: Quang-Duy Tran, Bao Duong, Phuoc Nguyen, Thin Nguyen
Telling apart the cause and effect between two random variables with purely observational data is a challenging problem that finds applications in various scientific disciplines. A key principle utilized in this task is the algorithmic Markov condition, which postulates that the joint distribution, when factorized according to the causal direction, yields a more succinct codelength compared to the anti-causal direction. Previous approaches approximate these codelengths by relying on simple functions or Gaussian processes (GPs) with easily evaluable complexity, compromising between model fitness and computational complexity. To overcome these limitations, we propose leveraging the variational Bayesian learning of neural networks as an interpretation of the codelengths. Consequently, we can enhance the model fitness and promote the succinctness of the codelengths, while avoiding the significant computational complexity of the GP-based approaches. Extensive experiments on both synthetic and real-world benchmarks in cause-effect identification demonstrate the effectiveness of our proposed method, surpassing the overall performance of related complexity-based and structural causal model regression-based approaches.
nan
Article 687
Title@2025-05-28 (3): ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning
Title: ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning | ACE: Exploring Activation Cosine Ähnlichkeit und Varianz für genaues und kalibrationseffizientes LLM Pruning | ACE: 探索在准确度和校准-有效LLM Pruning 方面活跃共生相近性和差异 2505.21987v1 |
Authors: Zhendong Mi, Zhenglun Kong, Geng Yuan, Shaoyi Huang
With the rapid expansion of large language models (LLMs), the demand for memory and computational resources has grown significantly. Recent advances in LLM pruning aim to reduce the size and computational cost of these models. However, existing methods often suffer from either suboptimal pruning performance or low time efficiency during the pruning process. In this work, we propose an efficient and effective pruning method that simultaneously achieves high pruning performance and fast pruning speed with improved calibration efficiency. Our approach introduces two key innovations: (1) An activation cosine similarity loss-guided pruning metric, which considers the angular deviation of the output activation between the dense and pruned models. (2) An activation variance-guided pruning metric, which helps preserve semantic distinctions in output activations after pruning, enabling effective pruning with shorter input sequences. These two components can be readily combined to enhance LLM pruning in both accuracy and efficiency. Experimental results show that our method achieves up to an 18% reduction in perplexity and up to 63% decrease in pruning time on prevalent LLMs such as LLaMA, LLaMA-2, and OPT.
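The angular-deviation idea in (1) can be illustrated on a toy linear layer. This is a sketch under stated assumptions, not the ACE pruning metric itself: simple magnitude pruning is swapped in as the pruning rule, and the helper names, weights, and input are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for a zero vector
    return dot / (norm_u * norm_v)


def cosine_loss(dense_out, pruned_out):
    """1 - cos(angle) between dense and pruned output activations."""
    return 1.0 - cosine_similarity(dense_out, pruned_out)


def linear(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def magnitude_prune(weights, threshold):
    # Stand-in pruning rule (assumption): zero out small-magnitude weights.
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


toy_weights = [[0.9, 0.05], [-0.02, 1.1]]  # illustrative values only
toy_input = [1.0, 2.0]
dense_out = linear(toy_weights, toy_input)
pruned_out = linear(magnitude_prune(toy_weights, 0.1), toy_input)
loss = cosine_loss(dense_out, pruned_out)
```

A loss near zero means pruning barely rotated the output activation, even if its magnitude changed. A pruning metric guided by this quantity would prefer removing weights that keep the loss small.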
nan
Article 688
Title@2025-05-28 (3): Reward-Independent Messaging for Decentralized Multi-Agent Reinforcement Learning
Title: Reward-Independent Messaging for Decentralized Multi-Agent Reinforcement Learning | Reward-independent Messaging für dezentralisiertes Mehr-Agenten-Verstärkungs-Lernen | 权力下放多机构加强学习分权式多机构加强学习的回报独立通信 2505.21985v1 |
Authors: Naoto Yoshida, Tadahiro Taniguchi
In multi-agent reinforcement learning (MARL), effective communication improves agent performance, particularly under partial observability. We propose MARL-CPC, a framework that enables communication among fully decentralized, independent agents without parameter sharing. MARL-CPC incorporates a message learning model based on collective predictive coding (CPC) from emergent communication research. Unlike conventional methods that treat messages as part of the action space and assume cooperation, MARL-CPC links messages to state inference, supporting communication in non-cooperative, reward-independent settings. We introduce two algorithms, Bandit-CPC and IPPO-CPC, and evaluate them in non-cooperative MARL tasks. Benchmarks show that both outperform standard message-as-action approaches, establishing effective communication even when messages offer no direct benefit to the sender. These results highlight MARL-CPC’s potential for enabling coordination in complex, decentralized environments.
nan
Article 689
Title@2025-05-28 (3): How to Synthesize Text Data without Model Collapse?
Title: How to Synthesize Text Data without Model Collapse? | Wie können Sie Textdaten ohne Modellkollaps synthesieren? | 如何在没有模式折叠的情况下合成文本数据 ? 2412.14689v3 |
Authors: Xuekai Zhu, Daixuan Cheng, Hengli Li, Kaiyan Zhang, Ermo Hua, Xingtai Lv, Ning Ding, Zhouhan Lin, Zilong Zheng, Bowen Zhou
Model collapse in synthetic data indicates that iterative training on self-generated data leads to a gradual decline in performance. With the proliferation of AI models, synthetic data will fundamentally reshape the web data ecosystem. Future GPT-$n$ models will inevitably be trained on a blend of synthetic and human-produced data. In this paper, we focus on two questions: what is the impact of synthetic data on language model training, and how can data be synthesized without model collapse? We first pre-train language models across different proportions of synthetic data, revealing a negative correlation between the proportion of synthetic data and model performance. We further conduct statistical analysis on synthetic data to uncover a distributional shift phenomenon and an over-concentration of n-gram features. Inspired by the above findings, we propose token editing on human-produced data to obtain semi-synthetic data. As a proof of concept, we theoretically demonstrate that token-level editing can prevent model collapse, as the test error is constrained by a finite upper bound. We conduct extensive experiments on pre-training from scratch, continual pre-training, and supervised fine-tuning. The results validate our theoretical proof that token-level editing improves model performance.
Article 690
Title@2025-05-28 (3): Latent Weight Diffusion: Generating reactive policies instead of trajectories
Title: Latent Weight Diffusion: Generating reactive policies instead of trajectories | Latent Weight Diffusion: Erzeugen von reaktiven Strategien anstelle von Trajektorien | 负负重扩散: 产生反应性政策, 而不是轨迹 2410.14040v2 |
Authors: Shashank Hegde, Satyajeet Das, Gautam Salhotra, Gaurav S. Sukhatme
With the increasing availability of open-source robotic data, imitation learning has emerged as a viable approach for both robot manipulation and locomotion. Currently, large generalized policies are trained to predict controls or trajectories using diffusion models, which have the desirable property of learning multimodal action distributions. However, generalizability comes at a cost, namely, larger model size and slower inference. This is especially an issue for robotic tasks that require high control frequency. Further, there is a known trade-off between performance and action horizon for Diffusion Policy (DP), a popular model for generating trajectories: fewer diffusion queries accumulate greater trajectory tracking errors. For these reasons, it is common practice to run these models at high inference frequency, subject to robot computational constraints. To address these limitations, we propose Latent Weight Diffusion (LWD), a method that uses diffusion to generate closed-loop policies (weights for neural policies) for robotic tasks, rather than generating trajectories. Learning the behavior distribution through parameter space over trajectory space offers two key advantages: longer action horizons (fewer diffusion queries) and robustness to perturbations while retaining high performance; and a lower inference compute cost. To this end, we show that LWD has higher success rates than DP when the action horizon is longer and when stochastic perturbations exist in the environment. Furthermore, LWD achieves multitask performance comparable to DP while requiring just ~1/45th of the inference-time FLOPs.
Article 691
Title@2025-05-28 (3): Two-Stage Feature Generation with Transformer and Reinforcement Learning
Title: Two-Stage Feature Generation with Transformer and Reinforcement Learning | Zweistufige Feature-Generierung mit Transformer und Verstärkungslernen | 具有变换器和强化学习的两阶段特色生成 2505.21978v1 |
Authors: Wanfu Gao, Zengyao Man, Zebin He, Yuhao Tang, Jun Gao, Kunpeng Liu
Feature generation is a critical step in machine learning, aiming to enhance model performance by capturing complex relationships within the data and generating meaningful new features. Traditional feature generation methods heavily rely on domain expertise and manual intervention, making the process labor-intensive and challenging to adapt to different scenarios. Although automated feature generation techniques address these issues to some extent, they often face challenges such as feature redundancy, inefficiency in feature space exploration, and limited adaptability to diverse datasets and tasks. To address these problems, we propose a Two-Stage Feature Generation (TSFG) framework, which integrates a Transformer-based encoder-decoder architecture with Proximal Policy Optimization (PPO). The encoder-decoder model in TSFG leverages the Transformer’s self-attention mechanism to efficiently represent and transform features, capturing complex dependencies within the data. PPO further enhances TSFG by dynamically adjusting the feature generation strategy based on task-specific feedback, optimizing the process for improved performance and adaptability. TSFG dynamically generates high-quality feature sets, significantly improving the predictive performance of machine learning models. Experimental results demonstrate that TSFG outperforms existing state-of-the-art methods in terms of feature quality and adaptability.
Article 692
Title@2025-05-28 (3): Judging LLMs on a Simplex
Title: Judging LLMs on a Simplex | LLMs auf einem Simplex zu urteilen | 以简单方式判断LLMs 2505.21972v1 |
Authors: Patrick Vossler, Fan Xia, Yifan Mai, Jean Feng
Automated evaluation of free-form outputs from large language models (LLMs) is challenging because many distinct answers can be equally valid. A common practice is to use LLMs themselves as judges, but the theoretical properties of this approach are not yet well understood. We show that a geometric framework that represents both judges and candidates as points on a probability simplex can provide helpful insight on what is or is not identifiable using LLM judges. Our theoretical analysis uncovers a “phase transition” in ranking identifiability: for binary scoring systems, true rankings are identifiable even with weak judges under mild assumptions, while rankings become non-identifiable for three or more scoring levels even with infinite data, absent additional prior knowledge. This non-identifiability highlights how uncertainty in rankings stems from not only aleatoric uncertainty (i.e., inherent stochasticity in the data) but also epistemic uncertainty regarding which assumptions hold, an aspect that has received limited attention until now. To integrate both types of uncertainty, we use Bayesian inference to encode assumptions as priors and conduct sensitivity analysis of ranking estimates and credible intervals. Empirical evaluations across multiple benchmarks demonstrate that Bayesian inference yields more accurate rankings and substantially improves coverage rates. These results underscore the importance of taking a more holistic approach to uncertainty quantification when using LLMs as judges.
Article 693
Title@2025-05-28 (3): Mitigating Heterogeneous Token Overfitting in LLM Knowledge Editing
Title: Mitigating Heterogeneous Token Overfitting in LLM Knowledge Editing | Heterogene Token-Übertragung in LLM-Wissensbearbeitung abmildern | 减轻LLLM知识编辑中变异式 Tok 超称 2502.00602v2 |
Authors: Tianci Liu, Ruirui Li, Zihan Dong, Hui Liu, Xianfeng Tang, Qingyu Yin, Linjun Zhang, Haoyu Wang, Jing Gao
Large language models (LLMs) have achieved remarkable performance on various natural language tasks. However, they are trained on static corpora and their knowledge can become outdated quickly in the fast-changing world. This motivates the development of knowledge editing (KE) to update specific knowledge in LLMs without changing unrelated knowledge or compromising their pre-trained capabilities. Previous efforts sought to update a small number of parameters of an LLM and proved effective for making selective updates. Nonetheless, the edited LLM often exhibits degraded ability to reason about the new knowledge. In this work, we identify a key issue: heterogeneous token overfitting (HTO), where the LLM overfits different tokens in the provided knowledge at varying rates. To tackle this, we propose OVERTONE, a token-level smoothing method that mitigates HTO by adaptively refining the target distribution. Theoretically, OVERTONE offers better parameter updates with negligible computation overhead. It also induces an implicit DPO but does not require preference data pairs. Extensive experiments across four editing methods, two LLMs, and diverse scenarios demonstrate the effectiveness and versatility of our method.
Article 694
Title@2025-05-28 (3): Robust Reward Alignment via Hypothesis Space Batch Cutting
Title: Robust Reward Alignment via Hypothesis Space Batch Cutting | Robuste Belohnung Ausrichtung durch Hypothesis Raum Batch Schneiden | 通过假设空间批量切割进行强力奖励调整 2502.02921v3 |
Authors: Zhixian Xie, Haode Zhang, Yizhe Feng, Wanxin Jin
Reward design in reinforcement learning and optimal control is challenging. Preference-based alignment addresses this by enabling agents to learn rewards from ranked trajectory pairs provided by humans. However, existing methods often suffer from poor robustness to unknown false human preferences. In this work, we propose a robust and efficient reward alignment method based on a novel and geometrically interpretable perspective: hypothesis space batched cutting. Our method iteratively refines the reward hypothesis space through “cuts” based on batches of human preferences. Within each batch, human preferences, queried based on disagreement, are grouped using a voting function to determine the appropriate cut, ensuring a bounded human query complexity. To handle unknown erroneous preferences, we introduce a conservative cutting method within each batch, preventing erroneous human preferences from making overly aggressive cuts to the hypothesis space. This guarantees provable robustness against false preferences, while eliminating the need to explicitly identify them. We evaluate our method in a model predictive control setting across diverse tasks. The results demonstrate that our framework achieves comparable or superior performance to state-of-the-art methods in error-free settings while significantly outperforming existing methods when handling a high percentage of erroneous human preferences.
Article 695
Title@2025-05-28 (3): Cooperation of Experts: Fusing Heterogeneous Information with Large Margin
Title: Cooperation of Experts: Fusing Heterogeneous Information with Large Margin | Kooperation von Experten: Verschmelzende Heterogene Informationen mit großer Spanne | 专家合作:利用具有较大边际效应的异种信息 2505.20853v2 |
Authors: Shuo Wang, Shunyang Huang, Jinghui Yuan, Zhixiang Shen, Zhao Kang
Fusing heterogeneous information remains a persistent challenge in modern data analysis. While significant progress has been made, existing approaches often fail to account for the inherent heterogeneity of object patterns across different semantic spaces. To address this limitation, we propose the Cooperation of Experts (CoE) framework, which encodes multi-typed information into unified heterogeneous multiplex networks. By overcoming modality and connection differences, CoE provides a powerful and flexible model for capturing the intricate structures of real-world complex data. In our framework, dedicated encoders act as domain-specific experts, each specializing in learning distinct relational patterns in specific semantic spaces. To enhance robustness and extract complementary knowledge, these experts collaborate through a novel large margin mechanism supported by a tailored optimization strategy. Rigorous theoretical analyses guarantee the framework’s feasibility and stability, while extensive experiments across diverse benchmarks demonstrate its superior performance and broad applicability. Our code is available at https://github.com/strangeAlan/CoE.
Article 696
Title@2025-05-28 (3): EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles
Title: EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles | EnsemW2S: Verbesserung der Schwach-zu-Strong-Verallgemeinerung mit großsprachigen Modellensembles | EnsemW2S:用大语言模型组合加强弱至强的通用化 2505.21959v1 |
Authors: Aakriti Agrawal, Mucong Ding, Zora Che, Chenghao Deng, Anirudh Satheesh, Bang An, Bayan Bruss, John Langford, Furong Huang
With Large Language Models (LLMs) rapidly approaching and potentially surpassing human-level performance, it has become imperative to develop approaches capable of effectively supervising and enhancing these powerful models using smaller, human-level models exposed to only human-level data. We address this critical weak-to-strong (W2S) generalization challenge by proposing a novel method aimed at improving weak experts, by training on the same limited human-level data, enabling them to generalize to complex, super-human-level tasks. Our approach, called EnsemW2S, employs a token-level ensemble strategy that iteratively combines multiple weak experts, systematically addressing the shortcomings identified in preceding iterations. By continuously refining these weak models, we significantly enhance their collective ability to supervise stronger student models. We extensively evaluate the generalization performance of both the ensemble of weak experts and the subsequent strong student model across in-distribution (ID) and out-of-distribution (OOD) datasets. For OOD, we specifically introduce question difficulty as an additional dimension for defining distributional shifts. Our empirical results demonstrate notable improvements, achieving 4% and 3.2% improvements on ID datasets and up to 6% and 2.28% on OOD datasets for experts and student models respectively, underscoring the effectiveness of our proposed method in advancing W2S generalization.
Article 697
Title@2025-05-28 (3): A Stochastic Approximation Approach for Efficient Decentralized Optimization on Random Networks
Title: A Stochastic Approximation Approach for Efficient Decentralized Optimization on Random Networks | Ein stochastischer Annäherungsansatz für eine effiziente dezentralisierte Optimierung von Random Networks | 随机网络高效分散优化优化的斯托卡接近方法 2410.18774v2 |
Authors: Chung-Yiu Yau, Haoming Liu, Hoi-To Wai
A challenging problem in decentralized optimization is to develop algorithms with fast convergence on random and time-varying topologies under unreliable and bandwidth-constrained communication networks. This paper studies a stochastic approximation approach with a Fully Stochastic Primal Dual Algorithm (FSPDA) framework. Our framework relies on a novel observation that randomness in a time-varying topology can be incorporated in a stochastic augmented Lagrangian formulation, whose expected value admits saddle points that coincide with stationary solutions of the decentralized optimization problem. With the FSPDA framework, we develop two new algorithms supporting efficient sparsified communication on random time-varying topologies: FSPDA-SA allows agents to execute multiple local gradient steps depending on the time-varying topology to accelerate convergence, and FSPDA-STORM further incorporates a variance reduction step to improve sample complexity. For problems with a smooth (possibly non-convex) objective function, within $T$ iterations, we show that FSPDA-SA (resp. FSPDA-STORM) finds an $\mathcal{O}(1/\sqrt{T})$-stationary (resp. $\mathcal{O}(1/T^{2/3})$-stationary) solution. Numerical experiments show the benefits of the FSPDA algorithms.
Article 698
Title@2025-05-28 (3): Kimi k1.5: Scaling Reinforcement Learning with LLMs
Title: Kimi k1.5: Scaling Reinforcement Learning with LLMs | Kimi k1.5: Skalierungs-Verstärkungs-Lernen mit LLMs | Kimi k1.5:利用LLMs加强加强学习 2501.12599v3 |
Authors: Kimi Team, Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, Chuning Tang, Congcong Wang, Dehao Zhang, Enming Yuan, Enzhe Lu, Fengxiang Tang, Flood Sung, Guangda Wei, Guokun Lai, Haiqing Guo, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, Haotian Yao, Haotian Zhao, Haoyu Lu, Haoze Li, Haozhen Yu, Hongcheng Gao, Huabin Zheng, Huan Yuan, Jia Chen, Jianhang Guo, Jianlin Su, Jianzhou Wang, Jie Zhao, Jin Zhang, Jingyuan Liu, Junjie Yan, Junyan Wu, Lidong Shi, Ling Ye, Longhui Yu, Mengnan Dong, Neo Zhang, Ningchen Ma, Qiwei Pan, Qucheng Gong, Shaowei Liu, Shengling Ma, Shupeng Wei, Sihan Cao, Siying Huang, Tao Jiang, Weihao Gao, Weimin Xiong, Weiran He, Weixiao Huang, Wenhao Wu, Wenyang He, Xianghui Wei, Xianqing Jia, Xingzhe Wu, Xinran Xu, Xinxing Zu, Xinyu Zhou, Xuehai Pan, Y. Charles, Yang Li, Yangyang Hu, Yangyang Liu, Yanru Chen, Yejie Wang, Yibo Liu, Yidao Qin, Yifeng Liu, Ying Yang, Yiping Bao, Yulun Du, Yuxin Wu, Yuzhi Wang, Zaida Zhou, Zhaoji Wang, Zhaowei Li, Zhen Zhu, Zheng Zhang, Zhexu Wang, Zhilin Yang, Zhiqi Huang, Zihao Huang, Ziyao Xu, Zonghan Yang, Zongyu Lin
Language model pretraining with next token prediction has proved effective for scaling compute but is limited by the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of artificial intelligence, with the promise that large language models (LLMs) can scale their training data by learning to explore with rewards. However, prior published work has not produced competitive results. In light of this, we report on the training practice of Kimi k1.5, our latest multi-modal LLM trained with RL, including its RL training techniques, multi-modal data recipes, and infrastructure optimization. Long context scaling and improved policy optimization methods are key ingredients of our approach, which establishes a simplistic, effective RL framework without relying on more complex techniques such as Monte Carlo tree search, value functions, and process reward models. Notably, our system achieves state-of-the-art reasoning performance across multiple benchmarks and modalities – e.g., 77.5 on AIME, 96.2 on MATH 500, 94th percentile on Codeforces, 74.9 on MathVista – matching OpenAI’s o1. Moreover, we present effective long2short methods that use long-CoT techniques to improve short-CoT models, yielding state-of-the-art short-CoT reasoning results – e.g., 60.8 on AIME, 94.6 on MATH 500, 47.3 on LiveCodeBench – outperforming existing short-CoT models such as GPT-4o and Claude Sonnet 3.5 by a large margin (up to +550%).
Article 699
Title@2025-05-28 (3): Stochastic Primal-Dual Double Block-Coordinate for Two-way Partial AUC Maximization
Title: Stochastic Primal-Dual Double Block-Coordinate for Two-way Partial AUC Maximization | Stochastische primäre Doppelblockkoordinate für Zwei-Wege-Partielle AUC-Maximierung | 双向部分AUC 最大化 2505.21944v1 |
Authors: Linli Zhou, Bokun Wang, My T. Thai, Tianbao Yang
Two-way partial AUC (TPAUC) is a critical performance metric for binary classification with imbalanced data, as it focuses on specific ranges of the true positive rate (TPR) and false positive rate (FPR). However, stochastic algorithms for TPAUC optimization remain under-explored, with existing methods either limited to approximated TPAUC loss functions or burdened by sub-optimal complexities. To overcome these limitations, we introduce two innovative stochastic primal-dual double block-coordinate algorithms for TPAUC maximization. These algorithms utilize stochastic block-coordinate updates for both the primal and dual variables, catering to both convex and non-convex settings. We provide theoretical convergence rate analyses, demonstrating significant improvements over prior approaches. Our experimental results, based on multiple benchmark datasets, validate the superior performance of our algorithms, showcasing faster convergence and better generalization. This work advances the state of the art in TPAUC optimization and offers practical tools for real-world machine learning applications.
Article 700
Title@2025-05-28 (3): Continual Learning Beyond Experience Rehearsal and Full Model Surrogates
Title: Continual Learning Beyond Experience Rehearsal and Full Model Surrogates | Kontinuierliches Lernen über die Erfahrung hinaus Proben und vollständige Modellüberlagerungen | 排练和全模模范代理公司 2505.21942v1 |
Authors: Prashant Bhat, Laurens Niesten, Elahe Arani, Bahram Zonooz
Continual learning (CL) has remained a significant challenge for deep neural networks because learning new tasks erases previously acquired knowledge, either partially or completely, a phenomenon known as catastrophic forgetting (CF). Existing solutions often rely on experience rehearsal or full model surrogates to mitigate CF. While effective, these approaches introduce substantial memory and computational overhead, limiting their scalability and applicability in real-world scenarios. To address this, we propose SPARC, a scalable CL approach that eliminates the need for experience rehearsal and full-model surrogates. By effectively combining task-specific working memories and task-agnostic semantic memory for cross-task knowledge consolidation, SPARC achieves remarkable parameter efficiency, using only 6% of the parameters required by full-model surrogates. Despite its lightweight design, SPARC achieves superior performance on Seq-TinyImageNet and matches rehearsal-based methods on various CL benchmarks. Additionally, weight re-normalization in the classification layer mitigates task-specific biases, establishing SPARC as a practical and scalable solution for CL under stringent efficiency constraints.
Article 701
Title@2025-05-28 (3): Go With the Flow: Fast Diffusion for Gaussian Mixture Models
Title: Go With the Flow: Fast Diffusion for Gaussian Mixture Models | Mit dem Fluss gehen: Schnelle Diffusion für Gaussian Mixture Models | 随流而去:高山混合模型的快速扩散 2412.09059v4 |
Authors: George Rapakoulias, Ali Reza Pedram, Fengjiao Liu, Lingjiong Zhu, Panagiotis Tsiotras
Schrödinger Bridges (SBs) are diffusion processes that steer, in finite time, a given initial distribution to another final one while minimizing a suitable cost functional. Although various methods for computing SBs have recently been proposed in the literature, most of these approaches require computationally expensive training schemes, even for solving low-dimensional problems. In this work, we propose an analytic parametrization of a set of feasible policies for steering the distribution of a dynamical system from one Gaussian Mixture Model (GMM) to another. Instead of relying on standard non-convex optimization techniques, the optimal policy within the set can be approximated as the solution of a low-dimensional linear program whose dimension scales linearly with the number of components in each mixture. The proposed method generalizes naturally to more general classes of dynamical systems, such as controllable linear time-varying systems, enabling efficient solutions to multi-marginal momentum SB between GMMs, a challenging distribution interpolation problem. We showcase the potential of this approach in low-to-moderate dimensional problems such as image-to-image translation in the latent space of an autoencoder, learning of cellular dynamics using multi-marginal momentum SB problems, and various other examples. We also test our approach on an Entropic Optimal Transport (EOT) benchmark problem and show that it outperforms state-of-the-art methods in cases where the boundary distributions are mixture models while requiring virtually no training.
Article 702
Title@2025-05-28 (3): Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection
Title: Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection | Praktische Adversarialangriffe auf stochastische Banditen durch gefälschte Dateninjektion | 通过假数据注射,实际对抗性攻击斯托卡强盗 2505.21938v1 |
Authors: Qirun Zeng, Eric He, Richard Hoffmann, Xuchuang Wang, Jinhang Zuo
Adversarial attacks on stochastic bandits have traditionally relied on some unrealistic assumptions, such as per-round reward manipulation and unbounded perturbations, limiting their relevance to real-world systems. We propose a more practical threat model, Fake Data Injection, which reflects realistic adversarial constraints: the attacker can inject only a limited number of bounded fake feedback samples into the learner’s history, simulating legitimate interactions. We design efficient attack strategies under this model, explicitly addressing both magnitude constraints (on reward values) and temporal constraints (on when and how often data can be injected). Our theoretical analysis shows that these attacks can mislead both Upper Confidence Bound (UCB) and Thompson Sampling algorithms into selecting a target arm in nearly all rounds while incurring only sublinear attack cost. Experiments on synthetic and real-world datasets validate the effectiveness of our strategies, revealing significant vulnerabilities in widely used stochastic bandit algorithms under practical adversarial scenarios.
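To make the threat model concrete, here is a minimal sketch of how bounded fake samples in an arm's history can change a UCB-style index. It uses the textbook UCB1 index and is not the paper's attack strategy. The reward values, the number of fake samples, and the choice to advance the round counter by the number of injected samples are all assumptions made for illustration.

```python
import math

def ucb1_index(rewards, t):
    """Textbook UCB1 index: empirical mean plus bonus sqrt(2 ln t / n)."""
    n = len(rewards)
    return sum(rewards) / n + math.sqrt(2.0 * math.log(t) / n)

history = [0.9, 0.8, 1.0, 0.7]   # hypothetical genuine rewards for one arm
fake = [0.0, 0.0, 0.0]           # hypothetical fake samples, bounded in [0, 1]
t = 20                           # hypothetical current round

clean_index = ucb1_index(history, t)
# Assumption: injected samples look like legitimate pulls, so they also
# increase the pull count and the round counter.
poisoned_index = ucb1_index(history + fake, t + len(fake))
```

In this toy setting the poisoned index falls below the clean one for two reasons: the fake zeros lower the empirical mean, and the larger sample count shrinks the exploration bonus.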
Article 703
Title@2025-05-28 (3): ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
Title: ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation | ReQFlow: Rektifizierter Quaternionsfluss für effiziente und hochwertige Protein-Backbone-Generation | ReQFlow:为高效和高品质蛋白后骨生成而调整的四量流动 2502.14637v3 |
Authors: Angxiao Yue, Zichong Wang, Hongteng Xu
Protein backbone generation plays a central role in de novo protein design and is significant for many biological and medical applications. Although diffusion and flow-based generative models provide potential solutions to this challenging task, they often generate proteins with undesired designability and suffer from computational inefficiency. In this study, we propose a novel rectified quaternion flow (ReQFlow) matching method for fast and high-quality protein backbone generation. In particular, our method generates a local translation and a 3D rotation from random noise for each residue in a protein chain, representing each 3D rotation as a unit quaternion and constructing its flow by spherical linear interpolation (SLERP) in an exponential format. We train the model by quaternion flow (QFlow) matching with guaranteed numerical stability and rectify the QFlow model to accelerate its inference and improve the designability of generated protein backbones, leading to the proposed ReQFlow model. Experiments show that ReQFlow achieves on-par performance in protein backbone generation while requiring far fewer sampling steps and significantly less inference time (e.g., being 37x faster than RFDiffusion and 63x faster than Genie2 when generating a backbone of length 300), demonstrating its effectiveness and efficiency. The code is available at https://github.com/AngxiaoYue/ReQFlow.
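For readers unfamiliar with SLERP, the sketch below shows the standard spherical linear interpolation between two unit quaternions. It is the textbook formula and does not reproduce the paper's exponential-format flow or its training procedure. The `(w, x, y, z)` component order and the example rotations are assumptions.

```python
import math

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip one to follow the shorter arc.
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    dot = min(dot, 1.0)          # guard acos against rounding above 1
    theta = math.acos(dot)
    if theta < 1e-8:
        return tuple(q0)         # endpoints (nearly) coincide
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))

identity = (1.0, 0.0, 0.0, 0.0)
# Hypothetical target: a 90-degree rotation about the z-axis.
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
halfway = slerp(identity, quarter_turn_z, 0.5)
```

Interpolating halfway between the identity and a 90-degree turn about z gives a 45-degree turn about the same axis, and the result stays on the unit sphere without renormalization.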
Article 704
Title@2025-05-28 (3): Higher-Order Group Synchronization
Title: Higher-Order Group Synchronization | Gruppensynchronisierung mit höherer Ordnung | 高级分级组同步化 2505.21932v1 |
Authors: Adriana L. Duncan, Joe Kileel
Group synchronization is the problem of determining reliable global estimates from noisy local measurements on networks. The typical task for group synchronization is to assign elements of a group to the nodes of a graph in a way that respects group elements given on the edges which encode information about local pairwise relationships between the nodes. In this paper, we introduce a novel higher-order group synchronization problem which operates on a hypergraph and seeks to synchronize higher-order local measurements on the hyperedges to obtain global estimates on the nodes. Higher-order group synchronization is motivated by applications to computer vision and image processing, among other computational problems. First, we define the problem of higher-order group synchronization and discuss its mathematical foundations. Specifically, we give necessary and sufficient synchronizability conditions which establish the importance of cycle consistency in higher-order group synchronization. Then, we propose the first computational framework for general higher-order group synchronization; it acts globally and directly on higher-order measurements using a message passing algorithm. We discuss theoretical guarantees for our framework, including convergence analyses under outliers and noise. Finally, we show potential advantages of our method through numerical experiments. In particular, we show that in certain cases our higher-order method applied to rotational and angular synchronization outperforms standard pairwise synchronization methods and is more robust to outliers. We also show that our method has comparable performance on simulated cryo-electron microscopy (cryo-EM) data compared to a standard cryo-EM reconstruction package.
Article 705
Title@2025-05-28 (3): Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning
Title: Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning | Ermittlung von Kriterien für die Neugewichtung von Verlusten zur Verbesserung des LLM-Entlernens | 探索损失重新加权标准,加强LLM 重新学习 2505.11953v2 |
Authors: Puning Yang, Qizhou Wang, Zhuo Huang, Tongliang Liu, Chengqi Zhang, Bo Han
Loss reweighting has shown significant benefits for machine unlearning with large language models (LLMs). However, its exact function remains unclear and the optimal strategy remains an open question, impeding the understanding and improvement of existing methodologies. In this paper, we identify two distinct goals of loss reweighting, namely, Saturation and Importance – the former indicates that insufficiently optimized data should be emphasized, while the latter stresses critical data that are most influential for loss minimization. To study their usefulness, we design specific reweighting strategies for each goal and evaluate their respective effects on unlearning. We conduct extensive empirical analyses on well-established benchmarks, and summarize some important observations as follows: (i) Saturation enhances efficacy more than importance-based reweighting, and their combination can yield additional improvements. (ii) Saturation typically allocates lower weights to data with lower likelihoods, whereas importance-based reweighting does the opposite. (iii) The efficacy of unlearning is also largely influenced by the smoothness and granularity of the weight distributions. Based on these findings, we propose SatImp, a simple reweighting method that combines the advantages of both saturation and importance. Empirical results on extensive datasets validate the efficacy of our method, potentially bridging existing research gaps and indicating directions for future research. Our code is available at https://github.com/tmlr-group/SatImp.
Article 706
Title@2025-05-28 (3): Efficient Ensemble for Fine-tuning Language Models on Multiple Datasets
Title: Efficient Ensemble for Fine-tuning Language Models on Multiple Datasets | Effizientes Ensemble für die Feinabstimmung von Sprachmodellen auf mehreren Datensätzen | 多个数据集微调语言模型高效组合组合 2505.21930v1 |
Authors: Dongyue Li, Ziniu Zhang, Lu Wang, Hongyang R. Zhang
This paper develops an ensemble method for fine-tuning a language model to multiple datasets. Existing methods, such as quantized LoRA (QLoRA), are efficient when adapting to a single dataset. When training on multiple datasets of different tasks, a common setup in practice, it remains unclear how to design an efficient adaptation for fine-tuning language models. We propose to use an ensemble of multiple smaller adapters instead of a single adapter per task. We design an efficient algorithm that partitions $n$ datasets into $m$ groups, where $m$ is typically much smaller than $n$ in practice, and train one adapter for each group before taking a weighted combination to form the ensemble. The algorithm leverages a first-order approximation property of low-rank adaptation to quickly obtain the fine-tuning performances of dataset combinations since methods like LoRA stay close to the base model. Hence, we use the gradients of the base model to estimate its behavior during fine-tuning. Empirically, this approximation holds with less than $1\%$ error on models with up to $34$ billion parameters, leading to an estimation of true fine-tuning performances under $5\%$ error while speeding up computation compared to base fine-tuning by $105$ times. When applied to fine-tune Llama and GPT models on ten text classification tasks, our approach provides up to $10\%$ higher average test accuracy over QLoRA, with only $9\%$ more FLOPs. On a Llama model with $34$ billion parameters, an ensemble of QLoRA increases test accuracy by $3\%$ compared to QLoRA, with only $8\%$ more FLOPs.
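The first-order approximation mentioned in the abstract can be read as a Taylor expansion around the base model. The notation below is an assumption introduced for illustration, not the paper's: $f(x;\theta)$ is the model output, $\theta_0$ the base parameters, and $\Delta$ the low-rank update learned during fine-tuning.

```latex
f(x;\,\theta_0 + \Delta) \;\approx\; f(x;\,\theta_0) + \nabla_\theta f(x;\,\theta_0)^{\top} \Delta ,
\qquad \text{accurate when } \lVert \Delta \rVert \text{ is small.}
```

Under this reading, gradients computed once at $\theta_0$ can be reused to estimate the fine-tuned behavior for many dataset combinations without running a full fine-tuning job for each.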
nan
Article 707
Title@2025-05-28 (3): Efficient Logit-based Knowledge Distillation of Deep Spiking Neural Networks for Full-Range Timestep Deployment
Title: Efficient Logit-based Knowledge Distillation of Deep Spiking Neural Networks for Full-Range Timestep Deployment | Effiziente Logit-basierte Wissensdestillation von Tiefen-Spiking-Neural-Netzwerken für die Bereitstellung von Vollstrecken-Zeitschritten | 用于全红时间步骤部署的深渗透神经网络的高效基于逻辑的知识蒸馏 2501.15925v2 |
Authors: Chengting Yu, Xiaochen Zhao, Lei Liu, Shu Yang, Gaoang Wang, Erping Li, Aili Wang
Spiking Neural Networks (SNNs) are emerging as a brain-inspired alternative to traditional Artificial Neural Networks (ANNs), prized for their potential energy efficiency on neuromorphic hardware. Despite this, SNNs often suffer from accuracy degradation compared to ANNs and face deployment challenges due to fixed inference timesteps, which require retraining for adjustments, limiting operational flexibility. To address these issues, our work considers the spatio-temporal property inherent in SNNs, and proposes a novel distillation framework for deep SNNs that optimizes performance across full-range timesteps without specific retraining, enhancing both efficacy and deployment adaptability. We provide both theoretical analysis and empirical validations to illustrate that training guarantees the convergence of all implicit models across full-range timesteps. Experimental results on CIFAR-10, CIFAR-100, CIFAR10-DVS, and ImageNet demonstrate state-of-the-art performance among distillation-based SNNs training methods. Our code is available at https://github.com/Intelli-Chip-Lab/snn_temporal_decoupling_distillation.
nan
Article 708
Title@2025-05-28 (3): Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology
Title: Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology | Subspezialitätsspezifisches Stiftungsmodell für intelligente Gastrointestinalpathologie | 智能气胃肠道病理学 2505.21928v1 |
Authors: Lianghui Zhu, Xitong Ling, Minxi Ouyang, Xiaoping Liu, Mingxi Fu, Tian Guan, Fanglei Fu, Xuanyu Wang, Maomao Zeng, Mingxi Zhu, Yibo Jin, Liming Liu, Song Duan, Qiming He, Yizhi Wang, Luxi Xie, Houqiang Li, Yonghong He, Sufang Tian
Gastrointestinal (GI) diseases represent a clinically significant burden, necessitating precise diagnostic approaches to optimize patient outcomes. Conventional histopathological diagnosis, heavily reliant on the subjective interpretation of pathologists, suffers from limited reproducibility and diagnostic variability. To overcome these limitations and address the lack of pathology-specific foundation models for GI diseases, we develop Digepath, a specialized foundation model for GI pathology. Our framework introduces a dual-phase iterative optimization strategy combining pretraining with fine-screening, specifically designed to address the detection of sparsely distributed lesion areas in whole-slide images. Digepath is pretrained on more than 353 million image patches from over 200,000 hematoxylin and eosin-stained slides of GI diseases. It attains state-of-the-art performance on 33 out of 34 tasks related to GI pathology, including pathological diagnosis, molecular prediction, gene mutation prediction, and prognosis evaluation, particularly in diagnostically ambiguous cases and resolution-agnostic tissue classification. We further translate the intelligent screening module for early GI cancer and achieve near-perfect 99.6% sensitivity across 9 independent medical institutions nationwide. The outstanding performance of Digepath highlights its potential to bridge critical gaps in histopathological practice. This work not only advances AI-driven precision pathology for GI diseases but also establishes a transferable paradigm for other pathology subspecialties.
nan
Article 709
Title@2025-05-28 (3): RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination
Title: RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination | RenderFormer: Transformer-basiertes Neural-Rendering von Dreiecksnetzen mit globaler Beleuchtung | 成形前:以变形器为基础的以全球光化为工具的三角三角光板的神经成形 2505.21925v1 |
Authors: Chong Zeng, Yue Dong, Pieter Peers, Hongzhi Wu, Xin Tong
We present RenderFormer, a neural rendering pipeline that directly renders an image from a triangle-based representation of a scene with full global illumination effects and that does not require per-scene training or fine-tuning. Instead of taking a physics-centric approach to rendering, we formulate rendering as a sequence-to-sequence transformation where a sequence of tokens representing triangles with reflectance properties is converted to a sequence of output tokens representing small patches of pixels. RenderFormer follows a two stage pipeline: a view-independent stage that models triangle-to-triangle light transport, and a view-dependent stage that transforms a token representing a bundle of rays to the corresponding pixel values guided by the triangle-sequence from the view-independent stage. Both stages are based on the transformer architecture and are learned with minimal prior constraints. We demonstrate and evaluate RenderFormer on scenes with varying complexity in shape and light transport.
nan
Article 710
Title@2025-05-28 (3): FALCON: An ML Framework for Fully Automated Layout-Constrained Analog Circuit Design
Title: FALCON: An ML Framework for Fully Automated Layout-Constrained Analog Circuit Design | FALCON: Ein ML-Framework für vollautomatisierte Layout-Kontrainierte analoge Schaltungen | FALCON: 完全自动布局约束模拟电路设计 ML 框架 2505.21923v1 |
Authors: Asal Mehradfar, Xuzhe Zhao, Yilun Huang, Emir Ceyani, Yankai Yang, Shihao Han, Hamidreza Aghasi, Salman Avestimehr
Designing analog circuits from performance specifications is a complex, multi-stage process encompassing topology selection, parameter inference, and layout feasibility. We introduce FALCON, a unified machine learning framework that enables fully automated, specification-driven analog circuit synthesis through topology selection and layout-constrained optimization. Given a target performance, FALCON first selects an appropriate circuit topology using a performance-driven classifier guided by human design heuristics. Next, it employs a custom, edge-centric graph neural network trained to map circuit topology and parameters to performance, enabling gradient-based parameter inference through the learned forward model. This inference is guided by a differentiable layout cost, derived from analytical equations capturing parasitic and frequency-dependent effects, and constrained by design rules. We train and evaluate FALCON on a large-scale custom dataset of 1M analog mm-wave circuits, generated and simulated using Cadence Spectre across 20 expert-designed topologies. Through this evaluation, FALCON demonstrates >99% accuracy in topology inference, <10% relative error in performance prediction, and efficient layout-aware design that completes in under 1 second per instance. Together, these results position FALCON as a practical and extensible foundation model for end-to-end analog circuit design automation.
nan
Article 711
Title@2025-05-28 (3): Self-supervised Learning Method Using Transformer for Multi-dimensional Sensor Data Processing
Title: Self-supervised Learning Method Using Transformer for Multi-dimensional Sensor Data Processing | Selbstüberwachte Lernmethode mit Transformer für mehrdimensionale Sensordatenverarbeitung | 利用变压器进行多维传感器数据处理的自监督学习方法 2505.21918v1 |
Authors: Haruki Kai, Tsuyoshi Okita
We developed a deep learning algorithm for human activity recognition using sensor signals as input. In this study, we built a pretrained language model based on the Transformer architecture, which is widely used in natural language processing. By leveraging this pretrained model, we aimed to improve performance on the downstream task of human activity recognition. While this task can be addressed using a vanilla Transformer, we propose an enhanced n-dimensional numerical processing Transformer that incorporates three key features: embedding n-dimensional numerical data through a linear layer, binning-based pre-processing, and a linear transformation in the output layer. We evaluated the effectiveness of our proposed model across five different datasets. Compared to the vanilla Transformer, our model demonstrated 10%-15% improvements in accuracy.
nan
Article 712
Title@2025-05-28 (3): SlimLLM: Accurate Structured Pruning for Large Language Models
Title: SlimLLM: Accurate Structured Pruning for Large Language Models | SlimLLM: Genau strukturiertes Pruning für große Sprachmodelle | SlimLLM:大型语言模型的准确结构审慎 2505.22689v1 |
Authors: Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang
Large language models (LLMs) have garnered significant attention and demonstrated impressive capabilities in a wide range of applications. However, due to their enormous computational costs, the deployment and application of LLMs are often severely limited. To address this issue, structured pruning is an effective solution to compress the parameters of LLMs. Determining the importance of each sub-module in LLMs and minimizing performance loss are critical issues that need to be carefully addressed in structured pruning. In this paper, we propose an effective and fast structured pruning method named SlimLLM for large language models. For channel and attention head pruning, we evaluate the importance based on the entire channel or head, rather than merely aggregating the importance of individual elements within a sub-module. This approach enables a more holistic consideration of the interdependence among elements within the sub-module. In addition, we design a simple linear regression strategy for the output matrix to quickly recover performance. We also propose a layer-based importance ratio to determine the pruning ratio for each layer. Based on the LLaMA benchmark results, our SlimLLM outperforms other methods and achieves state-of-the-art performance.
nan
Article 713
Title@2025-05-28 (3): Understanding the behavior of representation forgetting in continual learning
Title: Understanding the behavior of representation forgetting in continual learning | Das Verhalten der Repräsentation verstehen vergessen im kontinuierlichen Lernen | 理解在不断学习中遗忘的代言人行为 2505.20970v2 |
Authors: Joonkyu Kim, Yejin Kim, Jy-yong Sohn
In continual learning scenarios, catastrophic forgetting of previously learned tasks is a critical issue, making it essential to effectively measure such forgetting. Recently, there has been growing interest in focusing on representation forgetting, the forgetting measured at the hidden layer. In this paper, we provide the first theoretical analysis of representation forgetting and use this analysis to better understand the behavior of continual learning. First, we introduce a new metric called representation discrepancy, which measures the difference between representation spaces constructed by two snapshots of a model trained through continual learning. We demonstrate that our proposed metric serves as an effective surrogate for the representation forgetting while remaining analytically tractable. Second, through mathematical analysis of our metric, we derive several key findings about the dynamics of representation forgetting: the forgetting occurs more rapidly to a higher degree as the layer index increases, while increasing the width of the network slows down the forgetting process. Third, we support our theoretical findings through experiments on real image datasets, including Split-CIFAR100 and ImageNet1K.
nan
Article 714
Title@2025-05-28 (3): ExpProof : Operationalizing Explanations for Confidential Models with ZKPs
Title: ExpProof : Operationalizing Explanations for Confidential Models with ZKPs | ExpProof : Operationalisierung von Erklärungen für vertrauliche Modelle mit ZKPs | 利用:对ZKPs的机密模型的解释投入运作 2502.03773v3 |
Authors: Chhavi Yadav, Evan Monroe Laufer, Dan Boneh, Kamalika Chaudhuri
In principle, explanations are intended as a way to increase trust in machine learning models and are often obligated by regulations. However, many circumstances where these are demanded are adversarial in nature, meaning the involved parties have misaligned interests and are incentivized to manipulate explanations for their purpose. As a result, explainability methods fail to be operational in such settings despite the demand \cite{bordt2022post}. In this paper, we take a step towards operationalizing explanations in adversarial scenarios with Zero-Knowledge Proofs (ZKPs), a cryptographic primitive. Specifically we explore ZKP-amenable versions of the popular explainability algorithm LIME and evaluate their performance on Neural Networks and Random Forests. Our code is publicly available at https://github.com/emlaufer/ExpProof.
nan
Article 715
Title@2025-05-28 (3): Taming Transformer Without Using Learning Rate Warmup
Title: Taming Transformer Without Using Learning Rate Warmup | Zähmung Transformer ohne Verwendung von Lernrate Warmup | 塔姆变形器不使用学习速率暖化 2505.21910v1 |
Authors: Xianbiao Qi, Yelin He, Jiaquan Ye, Chun-Guang Li, Bojia Zi, Xili Dai, Qin Zou, Rong Xiao
Scaling Transformers to a large scale without technical tricks such as learning rate warmup or an obviously lower learning rate is extremely challenging and is gaining increasing attention. In this paper, we provide a theoretical analysis of the Transformer training process and reveal the rationale behind the model crash phenomenon in training, termed spectral energy concentration of $\mathbf{W}_q^{\top} \mathbf{W}_k$, which is the reason for a malignant entropy collapse, where $\mathbf{W}_q$ and $\mathbf{W}_k$ are the projection matrices for the query and the key in the Transformer, respectively. To remedy this problem, motivated by Weyl's Inequality, we present a novel optimization strategy, i.e., making the weight updates in successive steps smooth: if the ratio $\frac{\sigma_{1}(\nabla \mathbf{W}_t)}{\sigma_{1}(\mathbf{W}_{t-1})}$ is larger than a threshold, we automatically bound the learning rate to a weighted multiple of $\frac{\sigma_{1}(\mathbf{W}_{t-1})}{\sigma_{1}(\nabla \mathbf{W}_t)}$, where $\nabla \mathbf{W}_t$ is the updating quantity in step $t$. Such an optimization strategy can prevent spectral energy from concentrating in only a few directions, and thus can avoid the malignant entropy collapse that triggers the model crash. We conduct extensive experiments using ViT, Swin-Transformer and GPT, showing that our optimization strategy can effectively and stably train these Transformers without using learning rate warmup.
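The learning-rate bound described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `threshold` and `tau` parameters, and the choice to never exceed the base learning rate are all hypothetical.

```python
import numpy as np

def spectral_norm(m):
    # Largest singular value sigma_1 of a 2-D array.
    return float(np.linalg.norm(m, ord=2))

def bounded_lr(weight, update, base_lr, threshold=1.0, tau=0.5):
    """Return a learning rate for one weight matrix.

    If sigma_1(update) / sigma_1(weight) exceeds `threshold`, the rate is
    capped at tau * sigma_1(weight) / sigma_1(update). Both `threshold` and
    `tau` are hypothetical hyperparameters chosen for this sketch.
    """
    s_w = spectral_norm(weight)
    s_u = spectral_norm(update)
    if s_w == 0.0 or s_u == 0.0:
        return base_lr  # ratio undefined or zero; leave the rate unchanged
    if s_u / s_w > threshold:
        return min(base_lr, tau * s_w / s_u)
    return base_lr

weight = np.eye(3)                                        # sigma_1 = 1
lr_large = bounded_lr(weight, 10.0 * np.eye(3), base_lr=0.1)  # ratio 10, capped
lr_small = bounded_lr(weight, 0.1 * np.eye(3), base_lr=0.1)   # ratio 0.1, unchanged
```

Computing an exact spectral norm at every step is expensive for large matrices. A practical variant would likely estimate it, for example with a few power-iteration steps, but that is outside what the abstract states.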
nan
Article 716
Title@2025-05-28 (3): Criticality and Safety Margins for Reinforcement Learning
Title: Criticality and Safety Margins for Reinforcement Learning | Kritizität und Sicherheitsmargen für verstärktes Lernen | 强化学习的临界和安全边缘 2409.18289v2 |
Authors: Alexander Grushin, Walt Woods, Alvaro Velasquez, Simon Khan
State of the art reinforcement learning methods sometimes encounter unsafe situations. Identifying when these situations occur is of interest both for post-hoc analysis and during deployment, where it might be advantageous to call out to a human overseer for help. Efforts to gauge the criticality of different points in time have been developed, but their accuracy is not well established due to a lack of ground truth, and they are not designed to be easily interpretable by end users. Therefore, we seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users. We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions. We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality. Safety margins make these interpretable, when defined as the number of random actions for which performance loss will not exceed some tolerance with high confidence. We demonstrate this approach in several environment-agent combinations; for an A3C agent in an Atari Beamrider environment, the lowest 5% of safety margins contain 47% of agent losses; i.e., supervising only 5% of decisions could potentially prevent roughly half of an agent’s errors. This criticality framework measures the potential impacts of bad decisions, even before those decisions are made, allowing for more effective debugging and oversight of autonomous agents.
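The definition of true criticality given in this abstract (the expected drop in reward when the agent takes n consecutive random actions instead of following its policy) lends itself to a Monte Carlo estimate. The sketch below uses a toy one-dimensional environment and a fixed policy, both invented here for illustration. Nothing in it comes from the paper's code.

```python
import random
from statistics import mean

ACTIONS = (-1, +1)
CLIFF, GOAL = 0, 5  # hypothetical toy layout: cliff on the left, goal on the right

def chain_step(state, action):
    # Toy dynamics: falling off the cliff costs 10, reaching the goal pays 1.
    nxt = state + action
    if nxt <= CLIFF:
        return nxt, -10.0, True
    if nxt >= GOAL:
        return nxt, 1.0, True
    return nxt, 0.0, False

def always_right(state):
    # Stand-in for a trained policy: always move toward the goal.
    return +1

def rollout(state, n_random, rng, horizon=20):
    # Take n_random uniformly random actions first, then follow the policy.
    total = 0.0
    for t in range(horizon):
        action = rng.choice(ACTIONS) if t < n_random else always_right(state)
        state, reward, done = chain_step(state, action)
        total += reward
        if done:
            break
    return total

def true_criticality(state, n, episodes=500, seed=0):
    # Expected on-policy return minus expected return after n random actions.
    rng = random.Random(seed)
    on_policy = mean(rollout(state, 0, rng) for _ in range(episodes))
    deviated = mean(rollout(state, n, rng) for _ in range(episodes))
    return on_policy - deviated
```

In this toy setting, one random action taken right next to the cliff (state 1) costs 11 reward units half of the time, so the true value is 5.5 and the estimate lands near it. The same deviation in the middle of the chain (state 3) costs nothing, because the policy recovers.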
nan
Article 717
Title@2025-05-28 (3): Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding
Title: Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding | Verstärktes Lernen für Out-of-Distribution-Reasoning in LLMs: Eine empirische Studie zur diagnostischen Gruppencodierung | 在LLMM中加强分配外原因的强化学习:诊断相关群体编码经验研究 2505.21908v1 |
Authors: Hanyin Wang, Zhenbang Wu, Gururaj Kolar, Hariprasad Korsapati, Brian Bartlett, Bryan Hull, Jimeng Sun
Diagnosis-Related Group (DRG) codes are essential for hospital reimbursement and operations but require labor-intensive assignment. Large Language Models (LLMs) struggle with DRG coding due to the out-of-distribution (OOD) nature of the task: pretraining corpora rarely contain private clinical or billing data. We introduce DRG-Sapphire, which uses large-scale reinforcement learning (RL) for automated DRG coding from clinical notes. Built on Qwen2.5-7B and trained with Group Relative Policy Optimization (GRPO) using rule-based rewards, DRG-Sapphire introduces a series of RL enhancements to address domain-specific challenges not seen in previous mathematical tasks. Our model achieves state-of-the-art accuracy on the MIMIC-IV benchmark and generates physician-validated reasoning for DRG assignments, significantly enhancing explainability. Our study further sheds light on broader challenges of applying RL to knowledge-intensive, OOD tasks. We observe that RL performance scales approximately linearly with the logarithm of the number of supervised fine-tuning (SFT) examples, suggesting that RL effectiveness is fundamentally constrained by the domain knowledge encoded in the base model. For OOD tasks like DRG coding, strong RL performance requires sufficient knowledge infusion prior to RL. Consequently, scaling SFT may be more effective and computationally efficient than scaling RL alone for such tasks.
nan
Article 718
Title@2025-05-28 (3): OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models
Title: OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models | OVERT: Ein Benchmark für eine überwiderrechtliche Bewertung von Text-zu-Bild-Modellen | GUT: 对文本到图像模型的反否决评价基准 2505.21347v2 |
Authors: Ziheng Cheng, Yixiao Huang, Hui Xu, Somayeh Sojoudi, Xuandong Zhao, Dawn Song, Song Mei
Text-to-Image (T2I) models have achieved remarkable success in generating visual content from text inputs. Although multiple safety alignment strategies have been proposed to prevent harmful outputs, they often lead to overly cautious behavior, rejecting even benign prompts, a phenomenon known as over-refusal that reduces the practical utility of T2I models. Despite over-refusal having been observed in practice, there is no large-scale benchmark that systematically evaluates this phenomenon for T2I models. In this paper, we present an automatic workflow to construct synthetic evaluation data, resulting in OVERT (OVEr-Refusal evaluation on Text-to-image models), the first large-scale benchmark for assessing over-refusal behaviors in T2I models. OVERT includes 4,600 seemingly harmful but benign prompts across nine safety-related categories, along with 1,785 genuinely harmful prompts (OVERT-unsafe) to evaluate the safety-utility trade-off. Using OVERT, we evaluate several leading T2I models and find that over-refusal is a widespread issue across various categories (Figure 1), underscoring the need for further research to enhance the safety alignment of T2I models without compromising their functionality. As a preliminary attempt to reduce over-refusal, we explore prompt rewriting; however, we find it often compromises faithfulness to the meaning of the original prompts. Finally, we demonstrate the flexibility of our generation framework in accommodating diverse safety requirements by generating customized evaluation data adapting to user-defined policies.
nan
Article 719
Title@2025-05-28 (3): Geometry-Informed Neural Operator Transformer
Title: Geometry-Informed Neural Operator Transformer | Geometrie-informierter Neuraloperator Transformer | 智能神经操作器变换器 2504.19452v3 |
Authors: Qibang Liu, Vincient Zhong, Hadi Meidani, Diab Abueidda, Seid Koric, Philippe Geubelle
Machine-learning-based surrogate models offer significant computational efficiency and faster simulations compared to traditional numerical methods, especially for problems requiring repeated evaluations of partial differential equations. This work introduces the Geometry-Informed Neural Operator Transformer (GINOT), which integrates the transformer architecture with the neural operator framework to enable forward predictions for arbitrary geometries. GINOT encodes the surface point cloud of a geometry using a sampling and grouping mechanism combined with an attention mechanism, ensuring invariance to point order and padding while maintaining robustness to variations in point density. The geometry information is seamlessly integrated with query points in the solution decoder through the attention mechanism. The performance of GINOT is validated on multiple challenging datasets, showcasing its high accuracy and strong generalization capabilities for complex and arbitrary 2D and 3D geometries.
nan
Article 720
Title@2025-05-28 (3): Integrating Intermediate Layer Optimization and Projected Gradient Descent for Solving Inverse Problems with Diffusion Models
Title: Integrating Intermediate Layer Optimization and Projected Gradient Descent for Solving Inverse Problems with Diffusion Models | Integration von Intermediate Layer Optimization und projizierter Gradient Descent zur Lösung inverser Probleme mit Diffusionsmodellen | 整合中间层优化和预测梯度,以解决传播模型的反向问题 2505.20789v2 |
Authors: Yang Zheng, Wen Li, Zhaoqiang Liu
Inverse problems (IPs) involve reconstructing signals from noisy observations. Recently, diffusion models (DMs) have emerged as a powerful framework for solving IPs, achieving remarkable reconstruction performance. However, existing DM-based methods frequently encounter issues such as heavy computational demands and suboptimal convergence. In this work, building upon the idea of the recent work DMPlug, we propose two novel methods, DMILO and DMILO-PGD, to address these challenges. Our first method, DMILO, employs intermediate layer optimization (ILO) to alleviate the memory burden inherent in DMPlug. Additionally, by introducing sparse deviations, we expand the range of DMs, enabling the exploration of underlying signals that may lie outside the range of the diffusion model. We further propose DMILO-PGD, which integrates ILO with projected gradient descent (PGD), thereby reducing the risk of suboptimal convergence. We provide an intuitive theoretical analysis of our approaches under appropriate conditions and validate their superiority through extensive experiments on diverse image datasets, encompassing both linear and nonlinear IPs. Our results demonstrate significant performance gains over state-of-the-art methods, highlighting the effectiveness of DMILO and DMILO-PGD in addressing common challenges in DM-based IP solvers.
nan
Article 721
Title@2025-05-28 (3): Combinatorial Reinforcement Learning with Preference Feedback
Title: Combinatorial Reinforcement Learning with Preference Feedback | Kombinatorisches Stärkungslernen mit Präferenz-Feedback | 结合强化学习与优先反馈 2502.10158v2 |
Authors: Joongkyu Lee, Min-hwan Oh
In this paper, we consider combinatorial reinforcement learning with preference feedback, where a learning agent sequentially offers an action (an assortment of multiple items) to a user, whose preference feedback follows a multinomial logistic (MNL) model. This framework allows us to model real-world scenarios, particularly those involving long-term user engagement, such as in recommender systems and online advertising. However, this framework faces two main challenges: (1) the unknown value of each item, unlike traditional MNL bandits that only address single-step preference feedback, and (2) the difficulty of ensuring optimism while maintaining tractable assortment selection in the combinatorial action space with unknown values. In this paper, we assume a contextual MNL preference model, where the mean utilities are linear, and the value of each item is approximated by a general function. We propose an algorithm, MNL-VQL, that addresses these challenges, making it both computationally and statistically efficient. As a special case, for linear MDPs (with the MNL preference feedback), we establish the first regret lower bound in this framework and show that MNL-VQL achieves nearly minimax-optimal regret. To the best of our knowledge, this is the first work to provide statistical guarantees in combinatorial RL with preference feedback.
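For readers unfamiliar with the MNL preference model referred to in this abstract, the sketch below computes choice probabilities for an offered assortment with linear mean utilities. It assumes the common convention of an outside "no choice" option with utility zero, which the abstract does not state. The function names and the feature layout are hypothetical.

```python
import numpy as np

def linear_utilities(features, theta):
    # Mean utility of each offered item: one feature row per item, linear in theta.
    return np.asarray(features, dtype=float) @ np.asarray(theta, dtype=float)

def mnl_choice_probs(utilities):
    """Choice probabilities over [no choice, item_1, ..., item_k].

    Assumes an outside option with utility 0 (a modeling assumption of this
    sketch), so P(item i) = exp(u_i) / (1 + sum_j exp(u_j)).
    """
    u = np.asarray(utilities, dtype=float)
    logits = np.concatenate(([0.0], u))
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Two items with equal, zero utility: all three outcomes are equally likely.
probs_equal = mnl_choice_probs([0.0, 0.0])

# Linear utilities from hypothetical item features and parameters.
utils = linear_utilities(np.eye(2), [1.0, 2.0])
probs_linear = mnl_choice_probs(utils)
```

Because the denominator sums over every item in the assortment, adding an item changes the choice probability of all the others. That coupling is what makes assortment selection combinatorial.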
nan
Article 722
Title@2025-05-28 (3): ReGNet: Reciprocal Space-Aware Long-Range Modeling for Crystalline Property Prediction
Title: ReGNet: Reciprocal Space-Aware Long-Range Modeling for Crystalline Property Prediction | ReGNet: Reziproke Raum-Bewusst-Langstrecken-Modellierung für kristalline Eigenschaftsvorhersage | ReGNet:水晶财产预测的对等空间-软件长距离模型模型 2502.02748v2 |
Authors: Jianan Nie, Peiyao Xiao, Kaiyi Ji, Peng Gao
Predicting properties of crystals from their structures is a fundamental yet challenging task in materials science. Unlike molecules, crystal structures exhibit infinite periodic arrangements of atoms, requiring methods capable of capturing both local and global information effectively. However, most current works fall short of capturing long-range interactions within periodic structures. To address this limitation, we leverage reciprocal space to efficiently encode long-range interactions with learnable filters within Fourier transforms. We introduce Reciprocal Geometry Network (ReGNet), a novel architecture that integrates geometric GNNs and reciprocal blocks to model short-range and long-range interactions, respectively. Experimental results on JARVIS, Materials Project, and MatBench demonstrate that ReGNet achieves state-of-the-art predictive accuracy across a range of crystal property prediction tasks. Additionally, we explore a model extension that employs the mixture-of-experts for multi-property prediction with promising results and high computational efficiency. These findings highlight the potential of our model as a scalable and accurate solution for crystal property prediction. The code will be released upon paper acceptance.
nan
Article 723
Title@2025-05-28 (3): Language-Enhanced Representation Learning for Single-Cell Transcriptomics
Title: Language-Enhanced Representation Learning for Single-Cell Transcriptomics | Sprachverstärktes Repräsentationslernen für Single-Cell-Transkriptomik | 单一计算机转基因学的提高语言代表性学习 2503.09427v3 |
Authors: Yaorui Shi, Jiaqi Yang, Changhao Nai, Sihang Li, Junfeng Fang, Xiang Wang, Zhiyuan Liu, Yang Zhang
Single-cell RNA sequencing (scRNA-seq) offers detailed insights into cellular heterogeneity. Recent advancements leverage single-cell large language models (scLLMs) for effective representation learning. These models focus exclusively on transcriptomic data, neglecting complementary biological knowledge from textual descriptions. To overcome this limitation, we propose scMMGPT, a novel multimodal framework designed for language-enhanced representation learning in single-cell transcriptomics. Unlike existing methods, scMMGPT employs robust cell representation extraction, preserving quantitative gene expression data, and introduces an innovative two-stage pre-training strategy combining discriminative precision with generative flexibility. Extensive experiments demonstrate that scMMGPT significantly outperforms unimodal and multimodal baselines across key downstream tasks, including cell annotation and clustering, and exhibits superior generalization in out-of-distribution scenarios.
nan
Article 724
Title@2025-05-28 (3): Federated Continual Graph Learning
Title: Federated Continual Graph Learning | Föderiertes kontinuierliches Graphenlernen | 联邦连续图学习 2411.18919v3 |
Authors: Yinlin Zhu, Miao Hu, Di Wu
Managing evolving graph data presents substantial challenges in storage and privacy, and training graph neural networks (GNNs) on such data often leads to catastrophic forgetting, impairing performance on earlier tasks. Although existing continual graph learning (CGL) methods mitigate this to some extent, they rely on centralized architectures and ignore the potential of distributed graph databases to leverage collective intelligence. To this end, we propose Federated Continual Graph Learning (FCGL) to adapt GNNs across multiple evolving graphs under storage and privacy constraints. Our empirical study highlights two core challenges: local graph forgetting (LGF), where clients lose prior knowledge when adapting to new tasks, and global expertise conflict (GEC), where the global GNN exhibits sub-optimal performance in both adapting to new tasks and retaining old ones, arising from inconsistent client expertise during server-side parameter aggregation. To address these, we introduce POWER, a framework that preserves experience nodes with maximum local-global coverage locally to mitigate LGF, and leverages pseudo-prototype reconstruction with trajectory-aware knowledge transfer to resolve GEC. Experiments on various graph datasets demonstrate POWER’s superiority over federated adaptations of CGL baselines and vision-centric federated continual learning approaches.
nan
Article 725
Title@2025-05-28 (3): Towards Large Reasoning Models for Agriculture
Title: Towards Large Reasoning Models for Agriculture | Auf dem Weg zu groß angelegten Konzepten für die Landwirtschaft | 争取实现农业大理由解释模式 2505.19259v2 |
Authors: Hossein Zaremehrjerdi, Shreyan Ganguly, Ashlyn Rairdin, Elizabeth Tranel, Benjamin Feuer, Juan Ignacio Di Salvo, Srikanth Panthulugiri, Hernan Torres Pacin, Victoria Moser, Sarah Jones, Joscif G Raigne, Yanben Shen, Heidi M. Dornath, Aditya Balu, Adarsh Krishnamurthy, Asheesh K Singh, Arti Singh, Baskar Ganapathysubramanian, Chinmay Hegde, Soumik Sarkar
Agricultural decision-making involves complex, context-specific reasoning, where choices about crops, practices, and interventions depend heavily on geographic, climatic, and economic conditions. Traditional large language models (LLMs) often fall short in navigating this nuanced problem due to limited reasoning capacity. We hypothesize that recent advances in large reasoning models (LRMs) can better handle such structured, domain-specific inference. To investigate this, we introduce AgReason, the first expert-curated open-ended science benchmark with 100 questions for agricultural reasoning. Evaluations across thirteen open-source and proprietary models reveal that LRMs outperform conventional ones, though notable challenges persist, with the strongest Gemini-based baseline achieving 36% accuracy. We also present AgThoughts, a large-scale dataset of 44.6K question-answer pairs generated with human oversight and equipped with synthetically generated reasoning traces. Using AgThoughts, we develop AgThinker, a suite of small reasoning models that can be run on consumer-grade GPUs, and show that our dataset can be effective in unlocking agricultural reasoning abilities in LLMs. Our project page is here: https://baskargroup.github.io/Ag_reasoning/
nan
Article 726
Title@2025-05-28 (3): Compressing Sine-Activated Low-Rank Adapters through Post-Training Quantization
Title: Compressing Sine-Activated Low-Rank Adapters through Post-Training Quantization | Komprimierende Sine-Activated Low-Rank-Adapter durch Quantisierung nach dem Training | 通过培训后定量化压缩松状活动低Rank适应器 2505.21895v1 |
Authors: Cameron Gordon, Yiping Ji, Hemanth Saratchandran, Paul Albert, Simon Lucey
Low-Rank Adaptation (LoRA) has become a standard approach for parameter-efficient fine-tuning, offering substantial reductions in trainable parameters by modeling updates as the product of two low-rank matrices. While effective, the low-rank constraint inherently limits representational capacity, often resulting in reduced performance compared to full-rank fine-tuning. Recent work by Ji et al. (2025) has addressed this limitation by applying a fixed-frequency sinusoidal transformation to low-rank adapters, increasing their stable rank without introducing additional parameters. This raises a crucial question: can the same sine-activated technique be successfully applied within the context of Post-Training Quantization to retain benefits even after model compression? In this paper, we investigate this question by extending the sinusoidal transformation framework to quantized LoRA adapters. We develop a theoretical analysis showing that the stable rank of a quantized adapter is tightly linked to that of its full-precision counterpart, motivating the use of such rank-enhancing functions even under quantization. Our results demonstrate that the expressivity gains from a sinusoidal non-linearity persist after quantization, yielding highly compressed adapters with negligible loss in performance. We validate our approach across a range of fine-tuning tasks for language, vision and text-to-image generation achieving significant memory savings while maintaining competitive accuracy.
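The stable rank mentioned in this abstract is $\|W\|_F^2 / \|W\|_2^2$, which never exceeds the rank of $W$. The following is a minimal sketch, not the authors' implementation, of how an elementwise sine can raise the stable rank of a low-rank product. The matrix size `N`, rank `R`, frequency `OMEGA`, and the random Gaussian factors are all illustrative assumptions; quantization is not modelled.

```python
import math
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def stable_rank(m, iters=300):
    """Return ||M||_F^2 / ||M||_2^2, estimating the spectral norm by power iteration."""
    fro_sq = sum(x * x for row in m for x in row)
    mt = [list(col) for col in zip(*m)]
    v = [1.0] * len(m[0])
    for _ in range(iters):
        u = [sum(x * y for x, y in zip(row, v)) for row in m]    # u = M v
        w = [sum(x * y for x, y in zip(row, u)) for row in mt]   # w = M^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(x * y for x, y in zip(row, v)) for row in m]
    sigma_max_sq = sum(x * x for x in u)  # ||M v||^2 with ||v|| = 1
    return fro_sq / sigma_max_sq


# Hypothetical sizes and frequency, chosen only for illustration.
N, R, OMEGA = 24, 2, 30.0
rng = random.Random(0)
A = [[rng.gauss(0.0, 1.0) for _ in range(R)] for _ in range(N)]
B = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(R)]

W = matmul(A, B)                                   # plain low-rank update A B
S = [[math.sin(OMEGA * x) for x in row] for row in W]  # sine-activated update

sr_low = stable_rank(W)
sr_sine = stable_rank(S)
print(f"stable rank of AB: {sr_low:.3f}, of sin(omega * AB): {sr_sine:.3f}")
```

Both matrices are built from the same `2 * N * R` parameters; any constant rescaling of the sine output would leave the stable rank unchanged, since the ratio is scale-invariant.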
Article 727
Title@2025-05-28 (3): SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training
Title: SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training | SDPO: Importance-Sampled Direct Preference Optimierung für stabile Diffusionsschulungen | SDPO: 稳定传播培训的重要性抽样直接优惠优化 2505.21893v1 |
Authors: Xiaomeng Yang, Zhiyu Tan, Junyan Wang, Zhijian Zhou, Hao Li
Preference learning has become a central technique for aligning generative models with human expectations. Recently, it has been extended to diffusion models through methods like Direct Preference Optimization (DPO). However, existing approaches such as Diffusion-DPO suffer from two key challenges: timestep-dependent instability, caused by a mismatch between the reverse and forward diffusion processes and by high gradient variance in early noisy timesteps, and off-policy bias arising from the mismatch between optimization and data collection policies. We begin by analyzing the reverse diffusion trajectory and observe that instability primarily occurs at early timesteps with low importance weights. To address these issues, we first propose DPO-C&M, a practical strategy that improves stability by clipping and masking uninformative timesteps while partially mitigating off-policy bias. Building on this, we introduce SDPO (Importance-Sampled Direct Preference Optimization), a principled framework that incorporates importance sampling into the objective to fully correct for off-policy bias and emphasize informative updates during the diffusion process. Experiments on CogVideoX-2B, CogVideoX-5B, and Wan2.1-1.3B demonstrate that both methods outperform standard Diffusion-DPO, with SDPO achieving superior VBench scores, human preference alignment, and training robustness. These results highlight the importance of timestep-aware, distribution-corrected optimization in diffusion-based preference learning.
Article 728
Title@2025-05-28 (3): ControlTac: Force- and Position-Controlled Tactile Data Augmentation with a Single Reference Image
Title: ControlTac: Force- and Position-Controlled Tactile Data Augmentation with a Single Reference Image | ControlTac: Kraft- und positionsgesteuerte taktile Datenvergrößerung mit einem einzigen Referenzbild | 控制塔克: 带有单一参考图像的 力控和位置控轨迹数据增强 2505.20498v2 |
Authors: Dongyu Luo, Kelin Yu, Amir-Hossein Shahidzadeh, Cornelia Fermüller, Yiannis Aloimonos, Ruohan Gao
Vision-based tactile sensing has been widely used in perception, reconstruction, and robotic manipulation. However, collecting large-scale tactile data remains costly due to the localized nature of sensor-object interactions and inconsistencies across sensor instances. Existing approaches to scaling tactile data, such as simulation and free-form tactile generation, often suffer from unrealistic output and poor transferability to downstream tasks. To address this, we propose ControlTac, a two-stage controllable framework that generates realistic tactile images conditioned on a single reference tactile image, contact force, and contact position. With those physical priors as control input, ControlTac generates physically plausible and varied tactile images that can be used for effective data augmentation. Through experiments on three downstream tasks, we demonstrate that ControlTac can effectively augment tactile datasets and lead to consistent gains. Our three real-world experiments further validate the practical utility of our approach. Project page: https://dongyuluo.github.io/controltac.
Article 729
Title@2025-05-28 (3): Almost Linear Convergence under Minimal Score Assumptions: Quantized Transition Diffusion
Title: Almost Linear Convergence under Minimal Score Assumptions: Quantized Transition Diffusion | Fast lineare Konvergenz unter Minimal-Score Annahmen: Quantisierte Transition Diffusion | 在最低分数假设下几乎线性聚合:量化过渡扩散 2505.21892v1 |
Authors: Xunpeng Huang, Yingyu Lin, Nikki Lijing Kuang, Hanze Dong, Difan Zou, Yian Ma, Tong Zhang
Continuous diffusion models have demonstrated remarkable performance in data generation across various domains, yet their efficiency remains constrained by two critical limitations: (1) the local adjacency structure of the forward Markov process, which restricts long-range transitions in the data space, and (2) inherent biases introduced during the simulation of time-inhomogeneous reverse denoising processes. To address these challenges, we propose Quantized Transition Diffusion (QTD), a novel approach that integrates data quantization with discrete diffusion dynamics. Our method first transforms the continuous data distribution $p_*$ into a discrete one $q_*$ via histogram approximation and binary encoding, enabling efficient representation in a structured discrete latent space. We then design a continuous-time Markov chain (CTMC) with Hamming distance-based transitions as the forward process, which inherently supports long-range movements in the original data space. For reverse-time sampling, we introduce a truncated uniformization technique to simulate the reverse CTMC, which can provably provide unbiased generation from $q_*$ under minimal score assumptions. Through a novel KL dynamic analysis of the reverse CTMC, we prove that QTD can generate samples with $O(d\ln^2(d/\epsilon))$ score evaluations in expectation to approximate the $d$-dimensional target distribution $p_*$ within an $\epsilon$ error tolerance. Our method not only establishes state-of-the-art inference efficiency but also advances the theoretical foundations of diffusion-based generative modeling by unifying discrete and continuous diffusion paradigms.
Article 730
Title@2025-05-28 (3): Towards Robust Automated Perceptual Voice Quality Assessment with Speech Foundation Models
Title: Towards Robust Automated Perceptual Voice Quality Assessment with Speech Foundation Models | Auf dem Weg zu robuster automatisierter Wahrnehmungsqualitätsbewertung mit Sprachstiftungsmodellen | 以语音基金会模式进行强有力的自主声音质量评估 2505.21356v2 |
Authors: Whenty Ariyanti, Kuan-Yu Chen, Sabato Marco Siniscalchi, Hsin-Min Wang, Yu Tsao
Perceptual voice quality assessment is essential for diagnosing and monitoring voice disorders. Traditionally, expert raters use scales such as the CAPE-V and GRBAS. However, these are subjective and prone to inter-rater variability, motivating the need for automated, objective assessment methods. This study proposes VOQANet, a deep learning framework with an attention mechanism that leverages a Speech Foundation Model (SFM) to extract high-level acoustic and prosodic information from raw speech. To improve robustness and interpretability, we introduce VOQANet+, which integrates handcrafted acoustic features such as jitter, shimmer, and harmonics-to-noise ratio (HNR) with SFM embeddings into a hybrid representation. Unlike prior work focusing only on vowel-based phonation (PVQD-A subset) from the Perceptual Voice Quality Dataset (PVQD), we evaluate our models on both vowel-based and sentence-level speech (PVQD-S subset) for better generalizability. Results show that sentence-based input outperforms vowel-based input, particularly at the patient level, highlighting the benefit of longer utterances for capturing voice attributes. VOQANet consistently surpasses baseline methods in root mean squared error and Pearson correlation across CAPE-V and GRBAS dimensions, with VOQANet+ achieving further improvements. Additional tests under noisy conditions show that VOQANet+ maintains high prediction accuracy, supporting its use in real-world and telehealth settings. These findings demonstrate the value of combining SFM embeddings with domain-informed acoustic features for interpretable and robust voice quality assessment.
Article 731
Title@2025-05-28 (3): Symbolic Foundation Regressor on Complex Networks
Title: Symbolic Foundation Regressor on Complex Networks | Symbolischer Foundation-Regressor auf komplexen Netzwerken | 复杂网络上的反射器 2505.21879v1 |
Authors: Weiting Liu, Jiaxu Cui, Jiao Hu, En Wang, Bo Yang
In science, we are interested not only in forecasting but also in understanding how predictions are made, specifically what the interpretable underlying model looks like. Data-driven machine learning technology can significantly streamline the complex and time-consuming traditional manual process of discovering scientific laws, helping us gain insights into fundamental issues in modern science. In this work, we introduce a pre-trained symbolic foundation regressor that can effectively compress complex data with numerous interacting variables while producing interpretable physical representations. Our model has been rigorously tested on non-network symbolic regression, symbolic regression on complex networks, and the inference of network dynamics across various domains, including physics, biochemistry, ecology, and epidemiology. The results indicate a remarkable improvement in equation inference efficiency, being three times more effective than baseline approaches while maintaining accurate predictions. Furthermore, we apply our model to uncover more intuitive laws of interaction transmission from global epidemic outbreak data, achieving optimal data fitting. This model extends the application boundary of pre-trained symbolic regression models to complex networks, and we believe it provides a foundational solution for revealing the hidden mechanisms behind changes in complex phenomena, enhancing interpretability, and inspiring further scientific discoveries.
Article 732
Title@2025-05-28 (3): Hybrid Batch Normalisation: Resolving the Dilemma of Batch Normalisation in Federated Learning
Title: Hybrid Batch Normalisation: Resolving the Dilemma of Batch Normalisation in Federated Learning | Hybride Batch-Normalisierung: Lösung des Dilemmas der Batch-Normalisierung im Federated Learning | 混合批次正常化:解决联邦学习中批次正常化的难题 2505.21877v1 |
Authors: Hongyao Chen, Tianyang Xu, Xiaojun Wu, Josef Kittler
Batch Normalisation (BN) is widely used in conventional deep neural network training to harmonise the input-output distributions for each batch of data. However, federated learning, a distributed learning paradigm, faces the challenge of dealing with non-independent and identically distributed data among the client nodes. Due to the lack of a coherent methodology for updating BN statistical parameters, standard BN degrades the federated learning performance. To this end, it is urgent to explore an alternative normalisation solution for federated learning. In this work, we resolve the dilemma of the BN layer in federated learning by developing a customised normalisation approach, Hybrid Batch Normalisation (HBN). HBN separates the update of statistical parameters (i.e., means and variances used for evaluation) from that of learnable parameters (i.e., parameters that require gradient updates), obtaining unbiased estimates of global statistical parameters in distributed scenarios. In contrast with the existing solutions, we emphasise the supportive power of global statistics for federated learning. The HBN layer introduces a learnable hybrid distribution factor, allowing each computing node to adaptively mix the statistical parameters of the current batch with the global statistics. Our HBN can serve as a powerful plugin to advance federated learning performance. It reflects promising merits across a wide range of federated learning settings, especially for small batch sizes and heterogeneous data.
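To make the idea of mixing batch and global statistics concrete, here is a minimal sketch for a single scalar feature. The abstract does not give the mixing rule, so the convex combination below, the function name `hybrid_normalise`, and the fixed (rather than learned) factor `mix` are assumptions for illustration only and may differ from the actual HBN layer.

```python
import math


def hybrid_normalise(batch, global_mean, global_var, mix, eps=1e-5):
    """Normalise one feature with a blend of batch and global statistics.

    mix = 1.0 uses only the current batch statistics (standard BN in training);
    mix = 0.0 uses only the global statistics. The convex combination is an
    assumed mixing rule, not taken from the paper.
    """
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
    mean = mix * batch_mean + (1.0 - mix) * global_mean
    var = mix * batch_var + (1.0 - mix) * global_var
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


batch = [1.0, 2.0, 3.0, 4.0]
only_batch = hybrid_normalise(batch, global_mean=10.0, global_var=4.0, mix=1.0)
only_global = hybrid_normalise(batch, global_mean=10.0, global_var=4.0, mix=0.0, eps=0.0)
blended = hybrid_normalise(batch, global_mean=10.0, global_var=4.0, mix=0.5)
```

In a trainable layer, `mix` would presumably be a parameter updated by gradient descent on each client, while the global statistics would be supplied by the server.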
Article 733
Title@2025-05-28 (3): Targeted Unlearning Using Perturbed Sign Gradient Methods With Applications On Medical Images
Title: Targeted Unlearning Using Perturbed Sign Gradient Methods With Applications On Medical Images | Gezieltes Lernen mit gestörten Zeichen Gradient Methoden mit Anwendungen auf medizinischen Bildern | 采用固定信号渐进方法,在医学图像上应用医学图象,有针对性地取消学习 2505.21872v1 |
Authors: George R. Nahass, Zhu Wang, Homa Rashidisabet, Won Hwa Kim, Sasha Hubschman, Jeffrey C. Peterson, Ghasem Yazdanpanah, Chad A. Purnell, Pete Setabutr, Ann Q. Tran, Darvin Yi, Sathya N. Ravi
Machine unlearning aims to remove the influence of specific training samples from a trained model without full retraining. While prior work has largely focused on privacy-motivated settings, we recast unlearning as a general-purpose tool for post-deployment model revision. Specifically, we focus on utilizing unlearning in clinical contexts where data shifts, device deprecation, and policy changes are common. To this end, we propose a bilevel optimization formulation of boundary-based unlearning that can be solved using iterative algorithms. We provide convergence guarantees when first-order algorithms are used to unlearn. Our method introduces tunable loss design for controlling the forgetting-retention tradeoff and supports novel model composition strategies that merge the strengths of distinct unlearning runs. Across benchmark and real-world clinical imaging datasets, our approach outperforms baselines on both forgetting and retention metrics, including scenarios involving imaging devices and anatomical outliers. This work establishes machine unlearning as a modular, practical alternative to retraining for real-world model maintenance in clinical applications.
Article 734
Title@2025-05-28 (3): Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Robot Learning
Title: Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Robot Learning | Coarse-to-fine Q-Network mit Aktionssequenz für dateneffizientes Roboterlernen | Coarse 至 fine Q 网络与数据效率机器人学习行动序列 2411.12155v4 |
Authors: Younggyo Seo, Pieter Abbeel
Predicting a sequence of actions has been crucial in the success of recent behavior cloning algorithms in robotics. Can similar ideas improve reinforcement learning (RL)? We answer affirmatively by observing that incorporating action sequences when predicting ground-truth return-to-go leads to lower validation loss. Motivated by this, we introduce Coarse-to-fine Q-Network with Action Sequence (CQN-AS), a novel value-based RL algorithm that learns a critic network that outputs Q-values over a sequence of actions, i.e., explicitly training the value function to learn the consequence of executing action sequences. Our experiments show that CQN-AS outperforms several baselines on a variety of sparse-reward humanoid control and tabletop manipulation tasks from BiGym and RLBench.
Article 735
Title@2025-05-28 (3): Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures
Title: Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures | Mini-Batch Coresets für speichereffiziente Sprachmodellschulungen auf Datenmischungen | 记忆效率语言数据混合模型培训微型批量核心数据集 2407.19580v4 |
Authors: Dang Nguyen, Wenhan Yang, Rathul Anand, Yu Yang, Baharan Mirzasoleiman
Training with larger mini-batches improves the convergence rate and can yield superior performance. However, training with large mini-batches becomes prohibitive for Large Language Models (LLMs), due to the large GPU memory requirement. To address this problem, an effective approach is finding small mini-batch coresets that closely match the gradient of larger mini-batches. However, this approach becomes infeasible and ineffective for LLMs, due to the highly imbalanced mixture of sources in language data, use of the Adam optimizer, and the very large gradient dimensionality of LLMs. In this work, we address the above challenges by proposing Coresets for Training LLMs (CoLM). First, we show that mini-batch coresets found by gradient matching do not contain representative examples of the small sources w.h.p., and thus including all examples of the small sources in the mini-batch coresets is crucial for optimal performance. Second, we normalize the gradients by their historical exponential to find mini-batch coresets for training with Adam. Finally, we leverage zeroth-order methods to find a smooth gradient of the last V-projection matrix and sparsify it to keep the dimensions with the largest normalized gradient magnitude. We apply CoLM to fine-tuning Phi-2, Phi-3, Zephyr, and Llama-3 models with LoRA on the MathInstruct and SuperGLUE benchmarks. Remarkably, CoLM reduces the memory requirement of fine-tuning by 2x and even outperforms training with 4x larger mini-batches. Moreover, CoLM seamlessly integrates with existing memory-efficient training methods like LoRA, further reducing the memory requirements of training LLMs. Our code is available at https://github.com/BigML-CS-UCLA/CoLM.
Article 736
Title@2025-05-28 (3): Revisiting Bayesian Model Averaging in the Era of Foundation Models
Title: Revisiting Bayesian Model Averaging in the Era of Foundation Models | Bayesianisches Modell im Zeitalter der Gründungsmodelle neu besuchen | 重新审查基金会模式时代的贝耶斯模式 2505.21857v1 |
Authors: Mijung Park
We revisit the classical, full-fledged Bayesian model averaging (BMA) paradigm to ensemble pre-trained and/or lightly-finetuned foundation models to enhance the classification performance on image and text data. To make BMA tractable under foundation models, we introduce trainable linear classifiers that take frozen features from the pre-trained foundation models as inputs. The model posteriors over the linear classifiers tell us which linear heads and frozen features are better suited for a given dataset, resulting in a principled model ensembling method. Furthermore, we propose a computationally cheaper, optimizable model averaging scheme (OMA). In OMA, we directly optimize the model ensemble weights, just like those weights based on model posterior distributions in BMA, by reducing the amount of surprise (expected entropy of the predictions) we get from predictions of ensembled models. With the rapid development of foundation models, these approaches will enable the incorporation of future, possibly significantly better foundation models to enhance the performance of challenging classification tasks.
Article 737
Title@2025-05-28 (3): Meta Co-Training: Two Views are Better than One
Title: Meta Co-Training: Two Views are Better than One | Meta Co-Training: Zwei Ansichten sind besser als eine | Meta联合培训:两种观点比一种观点更好 2311.18083v5 |
Authors: Jay C. Rothenberger, Dimitrios I. Diochnos
In many critical computer vision scenarios unlabeled data is plentiful, but labels are scarce and difficult to obtain. As a result, semi-supervised learning, which leverages unlabeled data to boost the performance of supervised classifiers, has received significant attention in recent literature. One representative class of semi-supervised algorithms is co-training algorithms. Co-training algorithms leverage two different models which have access to different independent and sufficient representations or “views” of the data to jointly make better predictions. Each of these models creates pseudo-labels on unlabeled points which are used to improve the other model. We show that in the common case where independent views are not available, we can construct such views inexpensively using pre-trained models. Co-training on the constructed views yields a performance improvement over any of the individual views we construct and performance comparable with recent approaches in semi-supervised learning. We present Meta Co-Training, a novel semi-supervised learning algorithm, which has two advantages over co-training: (i) learning is more robust when there is a large discrepancy between the information content of the different views, and (ii) it does not require retraining from scratch on each iteration. Our method achieves new state-of-the-art performance on ImageNet-10%, achieving a ~4.7% reduction in error rate over prior work. Our method also outperforms prior semi-supervised work on several other fine-grained image classification datasets.
Article 738
Title@2025-05-28 (3): Investigating the effectiveness of multimodal data in forecasting SARS-COV-2 case surges
Title: Investigating the effectiveness of multimodal data in forecasting SARS-COV-2 case surges | Untersuchung der Wirksamkeit multimodaler Daten bei der Prognose von SARS-COV-2-Fallfluten | 调查多式联运数据在预测SARS-COV-2案件激增方面的有效性 2505.22688v1 |
Authors: Palur Venkata Raghuvamsi, Siyuan Brandon Loh, Prasanta Bhattacharya, Joses Ho, Raphael Lee Tze Chuen, Alvin X. Han, Sebastian Maurer-Stroh
The COVID-19 pandemic response relied heavily on statistical and machine learning models to predict key outcomes such as case prevalence and fatality rates. These predictions were instrumental in enabling timely public health interventions that helped break transmission cycles. While most existing models are grounded in traditional epidemiological data, the potential of alternative datasets, such as those derived from genomic information and human behavior, remains underexplored. In the current study, we investigated the usefulness of diverse modalities of feature sets in predicting case surges. Our results highlight the relative effectiveness of biological (e.g., mutations), public health (e.g., case counts, policy interventions) and human behavioral features (e.g., mobility and social media conversations) in predicting country-level case surges. Importantly, we uncover considerable heterogeneity in predictive performance across countries and feature modalities, suggesting that surge prediction models may need to be tailored to specific national contexts and pandemic phases. Overall, our work highlights the value of integrating alternative data sources into existing disease surveillance frameworks to enhance the prediction of pandemic dynamics.
Article 739
Title@2025-05-28 (3): Multi-Label Bayesian Active Learning with Inter-Label Relationships
Title: Multi-Label Bayesian Active Learning with Inter-Label Relationships | Multi-Label Bayesian Aktives Lernen mit inter-Label Beziehungen | 多标签贝耶斯人积极学习与跨标签关系 2411.17941v2 |
Authors: Yuanyuan Qi, Jueqing Lu, Xiaohao Yang, Joanne Enticott, Lan Du
The primary challenge of multi-label active learning, which distinguishes it from multi-class active learning, lies in assessing the informativeness of an indefinite number of labels while also accounting for the inherited label correlation. Existing studies either require substantial computational resources to leverage correlations or fail to fully explore label dependencies. Additionally, real-world scenarios often require addressing intrinsic biases stemming from imbalanced data distributions. In this paper, we propose a new multi-label active learning strategy to address both challenges. Our method incorporates progressively updated positive and negative correlation matrices to capture co-occurrence and disjoint relationships within the label space of annotated samples, enabling a holistic assessment of uncertainty rather than treating labels as isolated elements. Furthermore, alongside diversity, our model employs ensemble pseudo labeling and beta scoring rules to address data imbalances. Extensive experiments on four realistic datasets demonstrate that our strategy consistently achieves more reliable and superior performance compared to several established methods.
Article 740
Title@2025-05-28 (3): Improving the Variance of Differentially Private Randomized Experiments through Clustering
Title: Improving the Variance of Differentially Private Randomized Experiments through Clustering | Verbesserung der Varianz von differenziert privaten Randomisierten Experimenten durch Clustering | 通过集群化改进差异私人随机化实验的差异 2308.00957v3 |
Authors: Adel Javanmard, Vahab Mirrokni, Jean Pouget-Abadie
Estimating causal effects from randomized experiments is only possible if participants are willing to disclose their potentially sensitive responses. Differential privacy, a widely used framework for ensuring an algorithm's privacy guarantees, can encourage participants to share their responses without the risk of de-anonymization. However, many mechanisms achieve differential privacy by adding noise to the original dataset, which reduces the precision of causal effect estimation. This introduces a fundamental trade-off between privacy and variance when performing causal analyses on differentially private data. In this work, we propose a new differentially private mechanism, “Cluster-DP”, which leverages a given cluster structure in the data to improve the privacy-variance trade-off. While our results apply to any clustering, we demonstrate that selecting higher-quality clusters, according to a quality metric we introduce, can decrease the variance penalty without compromising privacy guarantees. Finally, we evaluate the theoretical and empirical performance of our Cluster-DP algorithm on both real and simulated data, comparing it to common baselines, including two special cases of our algorithm: its unclustered version and a uniform-prior version.
Article 741
Title@2025-05-28 (3): ItDPDM: Information-Theoretic Discrete Poisson Diffusion Model
Title: ItDPDM: Information-Theoretic Discrete Poisson Diffusion Model | ItDPDM: Informationstheoretisches Diskretes Poisson-Diffusionsmodell | ITDDDM:信息-理论分辨偏异Poisson传播模型 2505.05082v3 |
Authors: Sagnik Bhattacharya, Abhiram Gorle, Ahsan Bilal, Connor Ding, Amit Kumar Singh Yadav, Tsachy Weissman
Generative modeling of non-negative, discrete data, such as symbolic music, remains challenging due to two persistent limitations in existing methods. Firstly, many approaches rely on modeling continuous embeddings, which is suboptimal for inherently discrete data distributions. Secondly, most models optimize variational bounds rather than exact data likelihood, resulting in inaccurate likelihood estimates and degraded sampling quality. While recent diffusion-based models have addressed these issues separately, we tackle them jointly. In this work, we introduce the Information-Theoretic Discrete Poisson Diffusion Model (ItDPDM), inspired by the photon arrival process, which combines exact likelihood estimation with fully discrete-state modeling. Central to our approach is an information-theoretic Poisson Reconstruction Loss (PRL) that has a provable exact relationship with the true data likelihood. ItDPDM achieves improved likelihood and sampling performance over prior discrete and continuous diffusion models on a variety of synthetic discrete datasets. Furthermore, on real-world datasets such as symbolic music and images, ItDPDM attains superior likelihood estimates and competitive generation quality, demonstrating a proof of concept for distribution-robust discrete generative modeling.
Article 742
Title@2025-05-28 (3): Solving Empirical Bayes via Transformers
Title: Solving Empirical Bayes via Transformers | Lösen von Empirischen Buchten über Transformer | 通过变换器解决实证贝贝 2502.09844v2 |
Authors: Anzo Teh, Mark Jabbour, Yury Polyanskiy
This work applies modern AI tools (transformers) to solving one of the oldest statistical problems: estimating Poisson means in the empirical Bayes (Poisson-EB) setting. In Poisson-EB a high-dimensional mean vector $\theta$ (with iid coordinates sampled from an unknown prior $\pi$) is estimated on the basis of $X=\mathrm{Poisson}(\theta)$. A transformer model is pre-trained on a set of synthetically generated pairs $(X,\theta)$ and learns to do in-context learning (ICL) by adapting to unknown $\pi$. Theoretically, we show that a sufficiently wide transformer can achieve vanishing regret with respect to an oracle estimator that knows $\pi$ as the dimension grows to infinity. Practically, we discover that even very small models (100k parameters) are able to outperform the best classical algorithm (non-parametric maximum likelihood, or NPMLE) both in runtime and validation loss, which we compute on out-of-distribution synthetic data as well as real-world datasets (NHL hockey, MLB baseball, BookCorpusOpen). Finally, by using linear probes, we confirm that the transformer’s EB estimator appears to internally work differently from either NPMLE or Robbins’ estimators.
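For context, the classical Robbins estimator named at the end of this abstract can be written in a few lines. It estimates the posterior mean for an observation $x$ as $(x+1)\,N(x+1)/N(x)$, where $N(k)$ is the number of observations equal to $k$. The sketch below illustrates that baseline only, not the transformer approach; the function name `robbins_estimates` and the toy sample are illustrative.

```python
from collections import Counter


def robbins_estimates(observations):
    """Robbins' empirical Bayes estimate of the Poisson mean for each observed value.

    For an observed count x the estimate is (x + 1) * N(x + 1) / N(x),
    where N(k) is how many observations equal k.
    """
    counts = Counter(observations)
    return {
        x: (x + 1) * counts.get(x + 1, 0) / counts[x]
        for x in sorted(counts)
    }


sample = [0, 0, 1, 1, 1, 2]  # toy data, chosen only for illustration
estimates = robbins_estimates(sample)
for value, estimate in estimates.items():
    print(f"x = {value}: estimated mean = {estimate:.3f}")
```

The largest observed value always receives an estimate of 0 because no larger count has been seen, which is one of the known irregularities of this estimator on finite samples.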
Article 743
Title@2025-05-28 (3): Continuous Thought Machines
Title: Continuous Thought Machines | Kontinuierliche Gedankenmaschinen | 连续思考机 2505.05522v3 |
Authors: Luke Darlow, Ciaran Regan, Sebastian Risi, Jeffrey Seely, Llion Jones
Biological brains demonstrate complex neural activity, where the timing and interplay between neurons is critical to how brains process information. Most deep learning architectures simplify neural activity by abstracting away temporal dynamics. In this paper we challenge that paradigm. By incorporating neuron-level processing and synchronization, we can effectively reintroduce neural timing as a foundational element. We present the Continuous Thought Machine (CTM), a model designed to leverage neural dynamics as its core representation. The CTM has two core innovations: (1) neuron-level temporal processing, where each neuron uses unique weight parameters to process a history of incoming signals; and (2) neural synchronization employed as a latent representation. The CTM aims to strike a balance between oversimplified neuron abstractions that improve computational efficiency, and biological realism. It operates at a level of abstraction that effectively captures essential temporal dynamics while remaining computationally tractable for deep learning. We demonstrate the CTM’s strong performance and versatility across a range of challenging tasks, including ImageNet-1K classification, solving 2D mazes, sorting, parity computation, question-answering, and RL tasks. Beyond displaying rich internal representations and offering a natural avenue for interpretation owing to its internal process, the CTM is able to perform tasks that require complex sequential reasoning. The CTM can also leverage adaptive compute, where it can stop earlier for simpler tasks, or keep computing when faced with more challenging instances. The goal of this work is to share the CTM and its associated innovations, rather than pushing for new state-of-the-art results. To that end, we believe the CTM represents a significant step toward developing more biologically plausible and powerful artificial intelligence systems.
Article 744
Title@2025-05-28 (3): Statistical Inference for Temporal Difference Learning with Linear Function Approximation
Title: Statistical Inference for Temporal Difference Learning with Linear Function Approximation | Statistische Schlussfolgerung für zeitliches Differenzlernen mit linearer Funktionsannäherung | 与线性函数接近一致的时空差异学习统计推推 2410.16106v3 |
Authors: Weichen Wu, Gen Li, Yuting Wei, Alessandro Rinaldo
We investigate the statistical properties of Temporal Difference (TD) learning with Polyak-Ruppert averaging, arguably one of the most widely used algorithms in reinforcement learning, for the task of estimating the parameters of the optimal linear approximation to the value function. We make three significant contributions that improve the current state-of-the-art results: (i) we derive sharper high-probability convergence guarantees that depend explicitly on the asymptotic variance and hold under weaker conditions than those normally assumed; (ii) we establish refined high-dimensional Berry-Esseen bounds over the class of convex sets, achieving faster rates than those previously established in the literature; and (iii) we propose and analyze a novel, computationally efficient online plug-in estimator of the asymptotic covariance matrix. These results enable the construction of confidence regions and simultaneous confidence intervals for the linear parameters of the value function approximation, with guaranteed finite-sample coverage. We demonstrate the applicability of our theoretical findings through numerical experiments.
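For readers unfamiliar with the algorithm under study, the following is a minimal sketch of TD(0) with linear function approximation and Polyak-Ruppert (running) averaging. It is not the authors' code: the function name `td_polyak_ruppert`, the constant step size, and the two-state toy chain in the demo are assumptions chosen for illustration.

```python
import random

def td_polyak_ruppert(phi, sample_next, reward, gamma, alpha, num_steps, s0, seed=0):
    """TD(0) with linear features and a running (Polyak-Ruppert) average.

    phi(s) returns a feature vector (list of floats), sample_next(s, rng)
    returns the next state, reward(s) returns a float.
    Returns (last iterate, averaged iterate).
    """
    rng = random.Random(seed)
    d = len(phi(s0))
    theta = [0.0] * d        # current TD iterate
    theta_bar = [0.0] * d    # running average of the iterates
    s = s0
    for t in range(1, num_steps + 1):
        s_next = sample_next(s, rng)
        f, f_next = phi(s), phi(s_next)
        v = sum(a * b for a, b in zip(theta, f))
        v_next = sum(a * b for a, b in zip(theta, f_next))
        delta = reward(s) + gamma * v_next - v          # TD error
        theta = [th + alpha * delta * fi for th, fi in zip(theta, f)]
        theta_bar = [tb + (th - tb) / t for tb, th in zip(theta_bar, theta)]
        s = s_next
    return theta, theta_bar

if __name__ == "__main__":
    # Hypothetical toy chain: two states, uniform transitions, one-hot features.
    one_hot = lambda s: [1.0, 0.0] if s == 0 else [0.0, 1.0]
    last, avg = td_polyak_ruppert(
        one_hot, lambda s, rng: rng.randrange(2),
        lambda s: 1.0 if s == 0 else 0.0,
        gamma=0.5, alpha=0.05, num_steps=20000, s0=0)
    print("last iterate:", last)
    print("averaged iterate:", avg)
```

In this toy chain the exact values are V(0) = 1.5 and V(1) = 0.5, so the averaged iterate can be compared against a known target. The confidence regions discussed in the abstract would additionally need an estimate of the asymptotic covariance, which this sketch does not attempt.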
nan
Article 745
Title@2025-05-28 (3): A Provable Approach for End-to-End Safe Reinforcement Learning
Title: A Provable Approach for End-to-End Safe Reinforcement Learning | Ein realistischer Ansatz für das Ende-zu-Ende sichere Stärkungslernen | 最终至最终安全强化学习的可行办法 2505.21852v1 |
Authors: Akifumi Wachi, Kohei Miyaguchi, Takumi Tanabe, Rei Sato, Youhei Akimoto
A longstanding goal in safe reinforcement learning (RL) is a method to ensure the safety of a policy throughout the entire process, from learning to operation. However, existing safe RL paradigms inherently struggle to achieve this objective. We propose a method, called Provably Lifetime Safe RL (PLS), that integrates offline safe RL with safe policy deployment to address this challenge. Our proposed method learns a policy offline using return-conditioned supervised learning and then deploys the resulting policy while cautiously optimizing a limited set of parameters, known as target returns, using Gaussian processes (GPs). Theoretically, we justify the use of GPs by analyzing the mathematical relationship between target and actual returns. We then prove that PLS finds near-optimal target returns while guaranteeing safety with high probability. Empirically, we demonstrate that PLS outperforms baselines both in safety and reward performance, thereby achieving the longstanding goal to obtain high rewards while ensuring the safety of a policy throughout the lifetime from learning to operation.
nan
Article 746
Title@2025-05-28 (3): Streaming Flow Policy: Simplifying diffusion/flow-matching policies by treating action trajectories as flow trajectories
Title: Streaming Flow Policy: Simplifying diffusion/flow-matching policies by treating action trajectories as flow trajectories | Streaming Flow Policy: Vereinfachung von Diffusion/Flow-Matching-Richtlinien durch Behandlung von Aktionsbahnen als Flow-Trajektorien | 流式流策略:通过将行动轨迹作为流动轨迹处理,简化扩散/流匹配策略 2505.21851v1 |
Authors: Sunshine Jiang, Xiaolin Fang, Nicholas Roy, Tomás Lozano-Pérez, Leslie Pack Kaelbling, Siddharth Ancha
Recent advances in diffusion/flow-matching policies have enabled imitation learning of complex, multi-modal action trajectories. However, they are computationally expensive because they sample a trajectory of trajectories: a diffusion/flow trajectory of action trajectories. They discard intermediate action trajectories, and must wait for the sampling process to complete before any actions can be executed on the robot. We simplify diffusion/flow policies by treating action trajectories as flow trajectories. Instead of starting from pure noise, our algorithm samples from a narrow Gaussian around the last action. Then, it incrementally integrates a velocity field learned via flow matching to produce a sequence of actions that constitute a single trajectory. This enables actions to be streamed to the robot on the fly during the flow sampling process, and is well-suited for receding horizon policy execution. Despite streaming, our method retains the ability to model multi-modal behavior. We train flows that stabilize around demonstration trajectories to reduce distribution shift and improve imitation learning performance. Streaming flow policy outperforms prior methods while enabling faster policy execution and tighter sensorimotor loops for learning-based robot control. Project website: https://streaming-flow-policy.github.io/
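The sampling loop described above (start near the last action, then integrate a velocity field step by step and emit each action as it is produced) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `toy_velocity` is a hypothetical hand-written stand-in for the learned field, the action is a single scalar, and forward Euler is an assumed choice of integrator.

```python
import random

def stream_actions(last_action, velocity, num_steps, dt=0.05, sigma0=0.01, seed=0):
    """Yield actions one at a time by Euler-integrating velocity(a, t)."""
    rng = random.Random(seed)
    a = rng.gauss(last_action, sigma0)   # narrow Gaussian around the last action
    t = 0.0
    for _ in range(num_steps):
        a = a + dt * velocity(a, t)      # one integration step
        t += dt
        yield a                          # action is available immediately

def toy_velocity(a, t, k=5.0):
    """Hypothetical field: follow the demonstration xi(t) = t and pull back toward it."""
    demo, demo_rate = t, 1.0
    return demo_rate - k * (a - demo)

if __name__ == "__main__":
    for step, action in enumerate(stream_actions(0.0, toy_velocity, 20)):
        print(step, round(action, 4))    # each action could be executed right away
```

Because the generator yields after every integration step, a controller can consume actions while later ones are still being computed, which is the streaming property the abstract emphasizes. The stabilizing term `-k * (a - demo)` is one simple way to keep the flow close to a demonstration; the learned field in the paper may differ.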
nan
Article 747
Title@2025-05-28 (3): Spectral clustering for dependent community Hawkes process models of temporal networks
Title: Spectral clustering for dependent community Hawkes process models of temporal networks | Spektrales Clustering für abhängige Community Hawkes Prozessmodelle von zeitlichen Netzwerken | 依赖依赖性社区霍克斯时间网络过程模型光谱群群群 2505.21845v1 |
Authors: Lingfei Zhao, Hadeel Soliman, Kevin S. Xu, Subhadeep Paul
Temporal networks observed continuously over time through timestamped relational event data are commonly encountered in application settings including online social media communications, financial transactions, and international relations. Temporal networks often exhibit community structure and strong dependence patterns among node pairs. This dependence can be modeled through mutual excitations, where an interaction event from a sender to a receiver node increases the possibility of future events among other node pairs. We provide statistical results for a class of models that we call dependent community Hawkes (DCH) models, which combine the stochastic block model with mutually exciting Hawkes processes to model community structure and dependence among node pairs, respectively. We derive a non-asymptotic upper bound on the misclustering error of spectral clustering on the event count matrix as a function of the number of nodes and communities, time duration, and the amount of dependence in the model. Our result leverages recent results on bounding an appropriate distance between a multivariate Hawkes process count vector and a Gaussian vector, along with results from random matrix theory. We also propose a DCH model that incorporates only self and reciprocal excitation, along with highly scalable parameter estimation using a Generalized Method of Moments (GMM) estimator that we demonstrate to be consistent for growing network size and time duration.
nan
Article 748
Title@2025-05-28 (3): A Physics-Informed Learning Framework to Solve the Infinite-Horizon Optimal Control Problem
Title: A Physics-Informed Learning Framework to Solve the Infinite-Horizon Optimal Control Problem | Ein physikinformiertes Lernrahmenwerk zur Lösung des Unendlichen-Horizon-Optimalen Steuerungsproblems | 解决无限 – – 霍里佐最佳控制问题的物理综合学习框架 2505.21842v1 |
Authors: Filippos Fotiadis, Kyriakos G. Vamvoudakis
We propose a physics-informed neural network (PINN) framework to solve the infinite-horizon optimal control problem of nonlinear systems. In particular, since PINNs are generally able to solve a class of partial differential equations (PDEs), they can be employed to learn the value function of the infinite-horizon optimal control problem by solving the associated steady-state Hamilton-Jacobi-Bellman (HJB) equation. However, the steady-state HJB equation generally admits multiple solutions; hence, if PINNs are applied to it directly, they may end up approximating a solution that differs from the optimal value function of the problem. We tackle this by instead applying PINNs to a finite-horizon variant of the steady-state HJB that has a unique solution, and which uniformly approximates the optimal value function as the horizon increases. We also give an algorithm to verify whether the chosen horizon is large enough and, in case it is not, a method to extend it with reduced computations and robustness to approximation errors. Unlike many existing methods, the proposed technique works well with non-polynomial basis functions, does not require prior knowledge of a stabilizing controller, and does not perform iterative policy evaluations. Simulations verify and clarify the theoretical findings.
nan
Article 749
Title@2025-05-28 (3): An Optimistic Algorithm for online CMDPS with Anytime Adversarial Constraints
Title: An Optimistic Algorithm for online CMDPS with Anytime Adversarial Constraints | Optimistischer Algorithmus für Online-CMDPS mit jederzeit feindlichen Einschränkungen | 带有任何时间的反逆限制的在线 CMDPS 优化算法 2505.21841v1 |
Authors: Jiahui Zhu, Kihyun Yu, Dabeen Lee, Xin Liu, Honghao Wei
Online safe reinforcement learning (RL) plays a key role in dynamic environments, with applications in autonomous driving, robotics, and cybersecurity. The objective is to learn optimal policies that maximize rewards while satisfying safety constraints modeled by constrained Markov decision processes (CMDPs). Existing methods achieve sublinear regret under stochastic constraints but often fail in adversarial settings, where constraints are unknown, time-varying, and potentially adversarially designed. In this paper, we propose the Optimistic Mirror Descent Primal-Dual (OMDPD) algorithm, the first to address online CMDPs with anytime adversarial constraints. OMDPD achieves optimal regret O(sqrt(K)) and strong constraint violation O(sqrt(K)) without relying on Slater’s condition or the existence of a strictly known safe policy. We also show that access to accurate estimates of rewards and transitions can further improve these bounds. Our results offer practical guarantees for safe decision-making in adversarial environments.
nan
Article 750
Title@2025-05-28 (3): Natural Language Reinforcement Learning
Title: Natural Language Reinforcement Learning | Natürliche Sprache Stärkung Lernen | 自然语言强化学习 2411.14251v3 |
Authors: Xidong Feng, Bo Liu, Yan Song, Haotian Fu, Ziyu Wan, Girish A. Koushik, Zhiyuan Hu, Mengyue Yang, Ying Wen, Jun Wang
Artificial intelligence is progressing towards the “Era of Experience,” where agents are expected to learn from continuous, grounded interaction. We argue that traditional Reinforcement Learning (RL), which typically represents value as a scalar, can restrict an agent’s deep understanding of environments and hinder the active, deliberative learning crucial for navigating this new paradigm. To address the issue, we introduce Natural Language Reinforcement Learning (NLRL), a framework that extends RL principles into natural language counterparts. Central to NLRL is the Language Value Function (LVF), which redefines value as an interpretable linguistic narrative articulating the rationale behind an evaluation. NLRL further extends this concept to core RL components, including policy, the Bellman equation, and policy iteration. Leveraging recent advancements in Large Language Models (LLMs), NLRL can be practically implemented to achieve RL-like policy and value training through unsupervised environment interactions. Experiments over 4 multi-step agentic tasks demonstrate NLRL’s effectiveness, efficiency, and its potential to foster deeper understanding and more active learning strategies.
nan
Article 751
Title@2025-05-28 (3): UniMoGen: Universal Motion Generation
Title: UniMoGen: Universal Motion Generation | UniMoGen: Universal Motion Generation | UniMoGen: 宇宙运动一代 2505.21837v1 |
Authors: Aliasghar Khani, Arianna Rampini, Evan Atherton, Bruno Roy
Motion generation is a cornerstone of computer graphics, animation, gaming, and robotics, enabling the creation of realistic and varied character movements. A significant limitation of existing methods is their reliance on specific skeletal structures, which restricts their versatility across different characters. To overcome this, we introduce UniMoGen, a novel UNet-based diffusion model designed for skeleton-agnostic motion generation. UniMoGen can be trained on motion data from diverse characters, such as humans and animals, without the need for a predefined maximum number of joints. By dynamically processing only the necessary joints for each character, our model achieves both skeleton agnosticism and computational efficiency. Key features of UniMoGen include controllability via style and trajectory inputs, and the ability to continue motions from past frames. We demonstrate UniMoGen’s effectiveness on the 100style dataset, where it outperforms state-of-the-art methods in diverse character motion generation. Furthermore, when trained on both the 100style and LAFAN1 datasets, which use different skeletons, UniMoGen achieves high performance and improved efficiency across both skeletons. These results highlight UniMoGen’s potential to advance motion generation by providing a flexible, efficient, and controllable solution for a wide range of character animations.
nan
Article 752
Title@2025-05-27 (2): Inferring Traffic Models in Terminal Airspace from Flight Tracks and Procedures
Title: Inferring Traffic Models in Terminal Airspace from Flight Tracks and Procedures | Ableiten von Verkehrsmodellen im Terminal-Luftraum von Flugspuren und -verfahren | 从飞行轨道和程序中推断终端航空空间的交通模式 2303.09981v3 |
Authors: Soyeon Jung, Amelia Hardy, Mykel J. Kochenderfer
Realistic aircraft trajectory models are useful in the design and validation of air traffic management (ATM) systems. Models of aircraft operated under instrument flight rules (IFR) require capturing the variability inherent in how aircraft follow standard flight procedures. The variability in aircraft behavior differs among flight stages. In this paper, we propose a simple probabilistic model that can learn this variability from procedural data and flight tracks collected from radar surveillance data. For each segment, we use a Gaussian mixture model to learn the deviations of aircraft trajectories from their procedures. Given new procedures, we generate synthetic trajectories by sampling a series of deviations from the Gaussian mixture model and reconstructing the aircraft trajectory using the deviations and the procedures. We extend this method to capture pairwise correlations between aircraft and show how a pairwise model can be used to generate traffic involving an arbitrary number of aircraft. We demonstrate the proposed models on the arrival tracks and procedures of the John F. Kennedy International Airport. We evaluate distributional similarity between the original and synthetic trajectory datasets using the Jensen-Shannon divergence between the empirical distributions of different variables, and we provide qualitative analyses of the generated synthetic trajectories.
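The generation step (sample a deviation per segment from a Gaussian mixture, then add it back to the procedure) can be sketched as below. This is a simplified illustration, not the paper's model: it assumes a single scalar deviation per segment, already-fitted mixtures given as `(weights, means, stds)` tuples, and hypothetical function names.

```python
import random

def sample_deviation(gmm, rng):
    """Draw one scalar deviation from a 1-D Gaussian mixture (weights, means, stds)."""
    weights, means, stds = gmm
    k = rng.choices(range(len(weights)), weights=weights)[0]  # pick a component
    return rng.gauss(means[k], stds[k])

def synthesize_track(procedure, segment_gmms, seed=0):
    """Rebuild a trajectory as procedure value plus a sampled deviation, per segment."""
    rng = random.Random(seed)
    return [p + sample_deviation(g, rng) for p, g in zip(procedure, segment_gmms)]

if __name__ == "__main__":
    # Hypothetical altitudes (ft) prescribed by a procedure, one per segment.
    procedure = [3000.0, 2500.0, 2000.0]
    gmms = [([0.7, 0.3], [0.0, 150.0], [40.0, 60.0])] * 3
    print(synthesize_track(procedure, gmms))
```

Fitting the mixtures from radar tracks, and the pairwise extension for multiple aircraft, are outside the scope of this sketch.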
nan
Article 753
Title@2025-05-27 (2): TuneComp: Joint Fine-tuning and Compression for Large Foundation Models
Title: TuneComp: Joint Fine-tuning and Compression for Large Foundation Models | TuneComp: Gemeinsame Feinabstimmung und Kompression für große Fundamentmodelle | TuneComp:大型基金会模型的联合微调和压缩 2505.21835v1 |
Authors: Xiangyu Chen, Jing Liu, Ye Wang, Matthew Brand, Pu Wang, Toshiaki Koike-Akino
To reduce model size during post-training, compression methods, including knowledge distillation, low-rank approximation, and pruning, are often applied after fine-tuning the model. However, sequential fine-tuning and compression sacrifices performance while creating a larger-than-necessary model as an intermediate step. In this work, we aim to reduce this gap by directly constructing a smaller model guided by the downstream task. We propose to jointly fine-tune and compress the model by gradually distilling it to a pruned low-rank structure. Experiments demonstrate that joint fine-tuning and compression significantly outperforms sequential compression methods.
nan
Article 754
Title@2025-05-27 (2): Constrained Discrete Diffusion
Title: Constrained Discrete Diffusion | Beschränkte diskrete Diffusion | 限制的分解扩散 2503.09790v2 |
Authors: Michael Cardei, Jacob K Christopher, Thomas Hartvigsen, Brian R. Bartoldson, Bhavya Kailkhura, Ferdinando Fioretto
Discrete diffusion models are a class of generative models that construct sequences by progressively denoising samples from a categorical noise distribution. Beyond their rapidly growing ability to generate coherent natural language, these models present a new and important opportunity to enforce sequence-level constraints, a capability that current autoregressive models cannot natively provide. This paper capitalizes on this opportunity by introducing Constrained Discrete Diffusion (CDD), a novel integration of differentiable constraint optimization within the diffusion process to ensure adherence to constraints, logic rules, or safety requirements for generated sequences. Unlike conventional text generators that often rely on post-hoc filtering or model retraining for controllable generation, CDD directly imposes constraints on the discrete diffusion sampling process, resulting in a training-free and effective approach. Experiments in toxicity-controlled text generation, property-constrained molecule design, and instruction-constrained text completion demonstrate that CDD achieves zero constraint violations in a diverse array of tasks while preserving fluency, novelty, and coherence, and that it outperforms autoregressive and existing discrete diffusion approaches.
nan
Article 755
Title@2025-05-27 (2): In Search of Adam’s Secret Sauce
Title: In Search of Adam’s Secret Sauce | Auf der Suche nach Adams geheimer Sauce | 寻找亚当的秘密香肠 2505.21829v1 |
Authors: Antonio Orvieto, Robert Gower
Understanding the remarkable efficacy of Adam when training transformer-based language models has become a central research topic within the optimization community. To gain deeper insights, several simplifications of Adam have been proposed, such as the signed gradient and signed momentum methods. In this work, we conduct an extensive empirical study (training over 1,300 language models across different data configurations and scales) comparing Adam to several known simplified variants. We find that signed momentum methods are faster than SGD, but consistently underperform relative to Adam, even after careful tuning of momentum, clipping settings, and learning rates. However, our analysis reveals a compelling option that preserves near-optimal performance while allowing for new insightful reformulations: constraining the Adam momentum parameters to be equal. Beyond robust performance, this choice affords new theoretical insights, highlights the “secret sauce” on top of signed momentum, and grants a precise statistical interpretation: we show that Adam in this setting implements a natural online algorithm for estimating the mean and variance of gradients, one that arises from a mean-field Gaussian variational inference perspective.
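To make the equal-momentum setting concrete, here is a minimal scalar sketch of one Adam step with a single shared decay `beta` used for both moments. The function name, default hyperparameters, and the returned variance diagnostic are assumptions for illustration and are not taken from the paper. With equal decays, the two moving averages use identical weights, so `v_hat - m_hat**2` is a weighted variance of past gradients and cannot be negative (up to rounding).

```python
import math

def adam_step_equal_betas(x, g, state, lr=1e-3, beta=0.95, eps=1e-8):
    """One scalar Adam step with beta1 == beta2 == beta.

    state is (m, v, t). Returns (new x, new state, (mean, variance) estimates).
    """
    m, v, t = state
    t += 1
    m = beta * m + (1 - beta) * g        # moving average of gradients
    v = beta * v + (1 - beta) * g * g    # moving average of squared gradients
    correction = 1 - beta ** t           # same bias correction for both moments
    m_hat, v_hat = m / correction, v / correction
    x_new = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x_new, (m, v, t), (m_hat, v_hat - m_hat ** 2)

if __name__ == "__main__":
    x, state = 0.0, (0.0, 0.0, 0)
    for g in [1.0, -2.0, 0.5, 3.0]:      # hypothetical gradient stream
        x, state, (mean, var) = adam_step_equal_betas(x, g, state)
        print(f"x={x:.5f} mean={mean:.4f} var={var:.4f}")
```

On the first step the variance estimate is zero and the update reduces to a signed step of size `lr`, which matches the intuition that the method behaves like signed momentum when gradient noise is negligible.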
nan
Article 756
Title@2025-05-27 (2): Music Source Restoration
Title: Music Source Restoration | Restaurierung der Musikquelle | 音乐来源恢复 2505.21827v1 |
Authors: Yongyi Zang, Zheqi Dai, Mark D. Plumbley, Qiuqiang Kong
We introduce Music Source Restoration (MSR), a novel task addressing the gap between idealized source separation and real-world music production. Current Music Source Separation (MSS) approaches assume mixtures are simple sums of sources, ignoring signal degradations employed during music production like equalization, compression, and reverb. MSR models mixtures as degraded sums of individually degraded sources, with the goal of recovering original, undegraded signals. Due to the lack of data for MSR, we present RawStems, a dataset annotation of 578 songs with unprocessed source signals organized into 8 primary and 17 secondary instrument groups, totaling 354.13 hours. To the best of our knowledge, RawStems is the first dataset that contains unprocessed music stems with hierarchical categories. We consider spectral filtering, dynamic range compression, harmonic distortion, reverb, and lossy codecs as possible degradations, and establish U-Former as a baseline method, demonstrating the feasibility of MSR on our dataset. We publicly release the RawStems dataset annotations, degradation simulation pipeline, training code, and pre-trained models.
nan
Article 757
Title@2025-05-27 (2): From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization
Title: From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization | Von EduVisBench zu EduVisAgent: Ein Benchmark- und Multi-Agent-Framework für eine sinnvolle pädagogische Visualisierung | 从EduVisBench到EduVisAgent:推理驱动的教学可视化基准和多智能体框架 2505.16832v2 |
Authors: Haonian Ji, Shi Qiu, Siyang Xin, Siwei Han, Zhaorun Chen, Dake Zhang, Hongyi Wang, Huaxiu Yao
While foundation models (FMs), such as diffusion models and large vision-language models (LVLMs), have been widely applied in educational contexts, their ability to generate pedagogically effective visual explanations remains limited. Most existing approaches focus primarily on textual reasoning, overlooking the critical role of structured and interpretable visualizations in supporting conceptual understanding. To better assess the visual reasoning capabilities of FMs in educational settings, we introduce EduVisBench, a multi-domain, multi-level benchmark. EduVisBench features diverse STEM problem sets requiring visually grounded solutions, along with a fine-grained evaluation rubric informed by pedagogical theory. Our empirical analysis reveals that existing models frequently struggle with the inherent challenge of decomposing complex reasoning and translating it into visual representations aligned with human cognitive processes. To address these limitations, we propose EduVisAgent, a multi-agent collaborative framework that coordinates specialized agents for instructional planning, reasoning decomposition, metacognitive prompting, and visualization design. Experimental results show that EduVisAgent substantially outperforms all baselines, achieving a 40.2% improvement and delivering more educationally aligned visualizations. EduVisBench and EduVisAgent are available at https://github.com/aiming-lab/EduVisBench and https://github.com/aiming-lab/EduVisAgent.
nan
Article 758
Title@2025-05-27 (2): Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones
Title: Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones | Lassen Sie mich nachdenken! Eine lange Kette des Denkens kann es wert sein, auf jeden Fall viele kurze Menschen | 让我想想吧!一个长期的思考链 可能值得一试 有很多短一个 2505.21825v1 |
Authors: Parsa Mirtaheri, Ezra Edelman, Samy Jelassi, Eran Malach, Enric Boix-Adsera
Inference-time computation has emerged as a promising scaling axis for improving large language model reasoning. However, although it yields impressive performance, how best to allocate inference-time computation remains poorly understood. A central question is whether to prioritize sequential scaling (e.g., longer chains of thought) or parallel scaling (e.g., majority voting across multiple short chains of thought). In this work, we seek to illuminate the landscape of test-time scaling by demonstrating the existence of reasoning settings where sequential scaling offers an exponential advantage over parallel scaling. These settings are based on graph connectivity problems in challenging distributions of graphs. We validate our theoretical findings with comprehensive experiments across a range of language models, including models trained from scratch for graph connectivity with different chain-of-thought strategies as well as large reasoning models.
nan
Article 759
Title@2025-05-27 (2): Unsupervised Latent Pattern Analysis for Estimating Type 2 Diabetes Risk in Undiagnosed Populations
Title: Unsupervised Latent Pattern Analysis for Estimating Type 2 Diabetes Risk in Undiagnosed Populations | Unüberwachte Latent Pattern Analyse zur Schätzung des Typ-2-Diabetes-Risikos in nicht diagnostizierten Populationen | 未经监督的对未诊断的人群2型糖尿病风险估算的 2505.21824v1 |
Authors: Praveen Kumar, Vincent T. Metzger, Scott A. Malec
The global prevalence of diabetes, particularly type 2 diabetes mellitus (T2DM), is rapidly increasing, posing significant health and economic challenges. T2DM not only disrupts blood glucose regulation but also damages vital organs such as the heart, kidneys, eyes, nerves, and blood vessels, leading to substantial morbidity and mortality. In the US alone, the economic burden of diagnosed diabetes exceeded $400 billion in 2022. Early detection of individuals at risk is critical to mitigating these impacts. While machine learning approaches for T2DM prediction are increasingly adopted, many rely on supervised learning, which is often limited by the lack of confirmed negative cases. To address this limitation, we propose a novel unsupervised framework that integrates Non-negative Matrix Factorization (NMF) with statistical techniques to identify individuals at risk of developing T2DM. Our method identifies latent patterns of multimorbidity and polypharmacy among diagnosed T2DM patients and applies these patterns to estimate the T2DM risk in undiagnosed individuals. By leveraging data-driven insights from comorbidity and medication usage, our approach provides an interpretable and scalable solution that can assist healthcare providers in implementing timely interventions, ultimately improving patient outcomes and potentially reducing the future health and economic burden of T2DM.
nan
Article 760
Title@2025-05-27 (2): An Innovative Data-Driven and Adaptive Reinforcement Learning Approach for Context-Aware Prescriptive Process Monitoring
Title: An Innovative Data-Driven and Adaptive Reinforcement Learning Approach for Context-Aware Prescriptive Process Monitoring | Ein innovativer datengetriebener und adaptiver Weiterbildungsansatz für die kontext-aware Prescriptive Prozessüberwachung | 采用创新型数据驱动和适应性强化学习方法,用于内容软件指令程序监测 2501.10543v2 |
Authors: Mostafa Abbasi, Maziyar Khadivi, Maryam Ahang, Patricia Lasserre, Yves Lucet, Homayoun Najjaran
The application of artificial intelligence and machine learning in business process management has advanced significantly; however, the full potential of these technologies remains largely unexplored, primarily due to challenges related to data quality and availability. We present a novel framework called Fine-Tuned Offline Reinforcement Learning Augmented Process Sequence Optimization (FORLAPS), which aims to identify optimal execution paths in business processes by leveraging reinforcement learning enhanced with a state-dependent reward shaping mechanism, thereby enabling context-sensitive prescriptions. To evaluate its effectiveness in terms of resource savings and process time reduction, we conducted experiments comparing FORLAPS with existing models (Permutation Feature Importance and a multi-task Long Short-Term Memory model). The experimental results on real-life event logs show that FORLAPS achieves 31% savings in resource time spent and a 23% reduction in process time span. To further enhance learning, we introduce an innovative process-aware data augmentation technique that selectively increases the average estimated Q-values in sampled batches, enabling automatic fine-tuning of the reinforcement learning model. Robustness was assessed through both prefix-level and trace-level evaluations, using the Damerau-Levenshtein distance as the primary metric. Finally, the model’s adaptability across industries was validated through diverse case studies, including healthcare treatment pathways, financial services workflows, permit applications from regulatory bodies, and operations management. In each domain, the proposed model outperformed existing state-of-the-art approaches in prescriptive decision-making, demonstrating its capability to prescribe optimal next steps and predict the best next activities within a process trace.
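The abstract names the Damerau-Levenshtein distance as the metric for comparing activity sequences. A minimal sketch is given below; it implements the restricted variant (often called optimal string alignment), which counts insertions, deletions, substitutions, and swaps of adjacent items. Which variant the authors used is not stated, so treating it as the restricted one is an assumption, and the activity labels in the demo are hypothetical.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Works on any two sequences of hashable items, e.g. lists of activity labels.
    """
    n, m = len(a), len(b)
    # d[i][j] = distance between the first i items of a and the first j items of b
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[n][m]

if __name__ == "__main__":
    prescribed = ["register", "review", "approve", "notify"]
    observed = ["register", "approve", "review", "notify"]
    print(osa_distance(prescribed, observed))
```

Counting an adjacent swap as a single edit is useful for process traces, where two activities executed in the opposite order should not be penalized as heavily as two unrelated substitutions.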
nan
Article 761
Title@2025-05-27 (2): DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra
Title: DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra | DiffMS: Diffusionserzeugung von Molekülen auf Massenspektren | DiffMS: 受质量光谱约束的分子的扩散生成 2502.09571v2 |
Authors: Montgomery Bohde, Mrunali Manjrekar, Runzhong Wang, Shuiwang Ji, Connor W. Coley
Mass spectrometry plays a fundamental role in elucidating the structures of unknown molecules and subsequent scientific discoveries. One formulation of the structure elucidation task is the conditional de novo generation of molecular structure given a mass spectrum. Toward a more accurate and efficient scientific discovery pipeline for small molecules, we present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on this task. The encoder utilizes a transformer architecture and models mass spectra domain knowledge such as peak formulae and neutral losses, and the decoder is a discrete graph diffusion model restricted by the heavy-atom composition of a known chemical formula. To develop a robust decoder that bridges latent embeddings and molecular structures, we pretrain the diffusion decoder with fingerprint-structure pairs, which are available in virtually infinite quantities, compared to structure-spectrum pairs that number in the tens of thousands. Extensive experiments on established benchmarks show that DiffMS outperforms existing models on de novo molecule generation. We provide several ablations to demonstrate the effectiveness of our diffusion and pretraining approaches and show consistent performance scaling with increasing pretraining dataset size. DiffMS code is publicly available at https://github.com/coleygroup/DiffMS.
nan
Article 762
Title@2025-05-27 (2): Representative Language Generation
Title: Representative Language Generation | Repräsentative Sprachgenerierung | 代表性语言生成 2505.21819v1 |
Authors: Charlotte Peale, Vinod Raman, Omer Reingold
We introduce “representative generation,” extending the theoretical framework for generation proposed by Kleinberg et al. (2024) and formalized by Li et al. (2024), to additionally address diversity and bias concerns in generative models. Our notion requires outputs of a generative model to proportionally represent groups of interest from the training data. We characterize representative uniform and non-uniform generation, introducing the “group closure dimension” as a key combinatorial quantity. For representative generation in the limit, we analyze both information-theoretic and computational aspects, demonstrating feasibility for countably infinite hypothesis classes and collections of groups under certain conditions, but proving a negative result for computability using only membership queries. This contrasts with Kleinberg et al.’s (2024) positive results for standard generation in the limit. Our findings provide a rigorous foundation for developing more diverse and representative generative models.
nan
Article 763
Title@2025-05-27 (2): Optimizing Data Augmentation through Bayesian Model Selection
Title: Optimizing Data Augmentation through Bayesian Model Selection | Optimierung der Datenvergrößerung durch Bayesian Model Selection | 通过Bayesian模式选择优化数据增加 2505.21813v1 |
Authors: Madi Matymov, Ba-Hien Tran, Michael Kampffmeyer, Markus Heinonen, Maurizio Filippone
Data Augmentation (DA) has become an essential tool to improve robustness and generalization of modern machine learning. However, DA strategies require careful parameter choices, a daunting task traditionally left to trial and error or to expensive optimization based on validation performance. In this paper, we counter these limitations by proposing a novel framework for optimizing DA. In particular, we take a probabilistic view of DA, which leads to the interpretation of augmentation parameters as model (hyper)-parameters, and the optimization of the marginal likelihood with respect to these parameters as a Bayesian model selection problem. Because the marginal likelihood is intractable, we derive a tractable Evidence Lower BOund (ELBO), which allows us to optimize augmentation parameters jointly with model parameters. We provide extensive theoretical results on variational approximation quality, generalization guarantees, invariance properties, and connections to empirical Bayes. Through experiments on computer vision tasks, we show that our approach improves calibration and yields robust performance over fixed or no augmentation. Our work provides a rigorous foundation for optimizing DA through Bayesian principles with significant potential for robust machine learning.
Article 764
Title@2025-05-27 (2): Learning Enhanced Ensemble Filters
Title: Learning Enhanced Ensemble Filters | Enhanced Ensemble Filter lernen | 学习增强的组合过滤器 2504.17836v2 |
Authors: Eviatar Bach, Ricardo Baptista, Edoardo Calvello, Bohan Chen, Andrew Stuart
The filtering distribution in hidden Markov models evolves according to the law of a mean-field model in state-observation space. The ensemble Kalman filter (EnKF) approximates this mean-field model with an ensemble of interacting particles, employing a Gaussian ansatz for the joint distribution of the state and observation at each observation time. These methods are robust, but the Gaussian ansatz limits accuracy. This shortcoming is addressed by approximating the mean-field evolution using a novel form of neural operator taking probability distributions as input: a measure neural mapping (MNM). An MNM is used to design a novel approach to filtering, the MNM-enhanced ensemble filter (MNMEF), which is defined both in the mean-field limit and for interacting ensemble particle approximations. The ensemble approach uses empirical measures as input to the MNM and is implemented using the set transformer, which is invariant to ensemble permutation and allows for different ensemble sizes. The derivation of methods from a mean-field formulation allows a single parameterization of the algorithm to be deployed at different ensemble sizes. In practice, fine-tuning a small number of parameters for specific ensemble sizes further enhances the accuracy of the scheme. The promise of the approach is demonstrated by its superior root-mean-square-error performance relative to leading methods in filtering the Lorenz 96 and Kuramoto-Sivashinsky models.
Article 765
Title@2025-05-27 (2): ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails
Title: ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails | ThinkGuard: Besonnenes langsames Denken führt zu voreiligen Wärtern | 思考指南:慎重考虑的慢思考引领谨慎警卫车 2502.13458v2 |
Authors: Xiaofei Wen, Wenxuan Zhou, Wenjie Jacky Mo, Muhao Chen
Ensuring the safety of large language models (LLMs) is critical as they are deployed in real-world applications. Existing guardrails rely on rule-based filtering or single-pass classification, limiting their ability to handle nuanced safety violations. To address this, we propose ThinkGuard, a critique-augmented guardrail model that distills knowledge from high-capacity LLMs by generating structured critiques alongside safety labels. Fine-tuned on critique-augmented data, the captured deliberative thinking ability drastically enhances the guardrail’s cautiousness and interpretability. Evaluated on multiple safety benchmarks, ThinkGuard achieves the highest average F1 and AUPRC, outperforming all baselines. Compared to LLaMA Guard 3, ThinkGuard improves accuracy by 16.1% and macro F1 by 27.0%. Moreover, it surpasses label-only fine-tuned models, confirming that structured critiques enhance both classification precision and nuanced safety reasoning while maintaining computational efficiency.
Article 766
Title@2025-05-27 (2): Voice Quality Dimensions as Interpretable Primitives for Speaking Style for Atypical Speech and Affect
Title: Voice Quality Dimensions as Interpretable Primitives for Speaking Style for Atypical Speech and Affect | Sprachqualitätsdimensionen als Interpretierbare Primitive für sprechenden Stil für atypische Sprache und Affekt | 语音质量方面作为非非典型演讲和影响说话风格的可解释的原始语言 2505.21809v1 |
Authors: Jaya Narain, Vasudha Kowtha, Colin Lea, Lauren Tooley, Dianna Yee, Vikramjit Mitra, Zifang Huang, Miquel Espi Marques, Jon Huang, Carlos Avendano, Shirley Ren
Perceptual voice quality dimensions describe key characteristics of atypical speech and other speech modulations. Here we develop and evaluate voice quality models for seven voice and speech dimensions (intelligibility, imprecise consonants, harsh voice, naturalness, monoloudness, monopitch, and breathiness). Probes were trained on the public Speech Accessibility Project (SAP) dataset with 11,184 samples from 434 speakers, using embeddings from frozen pre-trained models as features. We found that our probes had both strong performance and strong generalization across speech elicitation categories in the SAP dataset. We further validated zero-shot performance on additional datasets, encompassing unseen languages and tasks: Italian atypical speech, English atypical speech, and affective speech. The strong zero-shot performance and the interpretability of results across an array of evaluations suggest the utility of using voice quality dimensions in speaking style-related tasks.
Article 767
Title@2025-05-27 (2): Towards Operational Automated Greenhouse Gas Plume Detection
Title: Towards Operational Automated Greenhouse Gas Plume Detection | Auf dem Weg zu einer operationell automatisierten Treibhausgas-Plume-Erkennung | 实现操作性自动温室气体管道探测 2505.21806v1 |
Authors: Brian D. Bue, Jake H. Lee, Andrew K. Thorpe, Philip G. Brodrick, Daniel Cusworth, Alana Ayasse, Vassiliki Mancoridis, Anagha Satish, Shujun Xiong, Riley Duren
Operational deployment of a fully automated greenhouse gas (GHG) plume detection system remains an elusive goal for imaging spectroscopy missions, despite recent advances in deep learning approaches. With the dramatic increase in data availability, however, automation continues to increase in importance for natural and anthropogenic emissions monitoring. This work reviews and addresses several key obstacles in the field: data and label quality control, prevention of spatiotemporal biases, and correctly aligned modeling objectives. We demonstrate through rigorous experiments using multicampaign data from airborne and spaceborne instruments that convolutional neural networks (CNNs) are able to achieve operational detection performance when these obstacles are alleviated. We demonstrate that a multitask model that learns both instance detection and pixelwise segmentation simultaneously can successfully lead towards an operational pathway. We evaluate the model’s plume detectability across emission source types and regions, identifying thresholds for operational deployment. Finally, we provide analysis-ready data, models, and source code for reproducibility, and work to define a set of best practices and validation standards to facilitate future contributions to the field.
Article 768
Title@2025-05-27 (2): From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
Title: From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs | Von der Anfahrt zu den Cones: Erforschung multidimensionaler Darstellungen von Propositional Facts in LLMs | ” 从方向到锥体:探索液晶中各种潜在事实的多层面代表 “ 2505.21800v1 |
Authors: Stanley Yu, Vaidehi Bulusu, Oscar Yasunaga, Clayton Lau, Cole Blondin, Sean O’Brien, Kevin Zhu, Vasu Sharma
Large Language Models (LLMs) exhibit strong conversational abilities but often generate falsehoods. Prior work suggests that the truthfulness of simple propositions can be represented as a single linear direction in a model’s internal activations, but this may not fully capture its underlying geometry. In this work, we extend the concept cone framework, recently introduced for modeling refusal, to the domain of truth. We identify multi-dimensional cones that causally mediate truth-related behavior across multiple LLM families. Our results are supported by three lines of evidence: (i) causal interventions reliably flip model responses to factual statements, (ii) learned cones generalize across model architectures, and (iii) cone-based interventions preserve unrelated model behavior. These findings reveal the richer, multidirectional structure governing simple true/false propositions in LLMs and highlight concept cones as a promising tool for probing abstract behaviors.
Article 769
Title@2025-05-27 (2): PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective
Title: PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective | PolarGrad: Eine Klasse von Matrix-Gradienten-Optimierern aus einer einheitlichen Sicht der Vorkonditionierung | 极地格:从统一前置角度出发的矩阵-高压优化器类别 2505.21799v1 |
Authors: Tim Tsz-Kit Lau, Qi Long, Weijie Su
The ever-growing scale of deep learning models and datasets underscores the critical importance of efficient optimization methods. While preconditioned gradient methods such as Adam and AdamW are the de facto optimizers for training neural networks and large language models, structure-aware preconditioned optimizers like Shampoo and Muon, which utilize the matrix structure of gradients, have demonstrated promising evidence of faster convergence. In this paper, we introduce a unifying framework for analyzing “matrix-aware” preconditioned methods, which not only sheds light on the effectiveness of Muon and related optimizers but also leads to a class of new structure-aware preconditioned methods. A key contribution of this framework is its precise distinction between preconditioning strategies that treat neural network weights as vectors (addressing curvature anisotropy) versus those that consider their matrix structure (addressing gradient anisotropy). This perspective provides new insights into several empirical phenomena in language model pre-training, including Adam’s training instabilities, Muon’s accelerated convergence, and the necessity of learning rate warmup for Adam. Building upon this framework, we introduce PolarGrad, a new class of preconditioned optimization methods based on the polar decomposition of matrix-valued gradients. As a special instance, PolarGrad includes Muon with updates scaled by the nuclear norm of the gradients. We provide numerical implementations of these methods, leveraging efficient numerical polar decomposition algorithms for enhanced convergence. Our extensive evaluations across diverse matrix optimization problems and language model pre-training tasks demonstrate that PolarGrad outperforms both Adam and Muon.
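To make the polar decomposition concrete, here is a minimal pure-Python sketch, not the authors' implementation. It approximates the orthogonal polar factor of a small gradient matrix with the cubic Newton-Schulz iteration and scales it by the nuclear norm, as the abstract describes for the Muon special case. The helper names, the iteration count, and the absence of momentum or a learning rate are all assumptions made for illustration.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def polar_factor(g, iters=40):
    """Approximate the orthogonal polar factor of g (hypothetical helper).

    Uses the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X,
    which drives every nonzero singular value of X towards 1.
    """
    fro = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / fro for v in row] for row in g]  # singular values are now at most 1
    for _ in range(iters):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x

def nuclear_norm(g, u):
    """trace(u^T g) equals the nuclear norm of g when u is its polar factor."""
    h = matmul(transpose(u), g)
    return sum(h[i][i] for i in range(len(h)))

# Toy gradient: a 90-degree rotation times diag(4, 3), so singular values are 4 and 3.
grad = [[0.0, -3.0], [4.0, 0.0]]
u = polar_factor(grad)
nuc = nuclear_norm(grad, u)
# Assumed form of the scaled update direction: nuclear norm times the polar factor.
direction = [[nuc * v for v in row] for row in u]
```

For this toy matrix the polar factor is the rotation itself and the nuclear norm is 4 + 3. Production code would use a numerical library and a more robust polar decomposition routine.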
Article 770
Title@2025-05-27 (2): A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging
Title: A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging | Ein General-Purpose-Theorem für hochwahrscheinliche Grenzen stochastischer Annäherung mit Polyak Average | 具有聚氨基挥动作用的斯托克相吸合高概率波断的普通用途理论 2505.21796v1 |
Authors: Sajad Khodadadian, Martin Zubeldia
Polyak-Ruppert averaging is a widely used technique to achieve the optimal asymptotic variance of stochastic approximation (SA) algorithms, yet its high-probability performance guarantees remain underexplored in general settings. In this paper, we present a general framework for establishing non-asymptotic concentration bounds for the error of averaged SA iterates. Our approach assumes access to individual concentration bounds for the unaveraged iterates and yields a sharp bound on the averaged iterates. We also construct an example, showing the tightness of our result up to constant multiplicative factors. As direct applications, we derive tight concentration bounds for contractive SA algorithms and for algorithms such as temporal difference learning and Q-learning with averaging, obtaining new bounds in settings where traditional analysis is challenging.
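As a reminder of what Polyak-Ruppert averaging computes, the sketch below runs a scalar stochastic approximation recursion and keeps a running mean of the iterates. The toy root-finding problem, the step-size schedule, and the function names are illustrative assumptions. Nothing here reproduces the paper's bounds.

```python
import random

def averaged_sa(noisy_operator, x0, num_steps, step_size):
    """Run x_k = x_{k-1} + a_k * noisy_operator(x_{k-1}) for k = 1..num_steps.

    Returns the last iterate and the Polyak-Ruppert average of x_1..x_k.
    """
    x = x0
    avg = 0.0
    for k in range(1, num_steps + 1):
        x = x + step_size(k) * noisy_operator(x)
        avg += (x - avg) / k  # incremental mean of the iterates so far
    return x, avg

# Hypothetical toy problem: find the root of F(x) = target - x from noisy evaluations.
rng = random.Random(0)
target = 2.0

def noisy(x):
    return (target - x) + rng.gauss(0.0, 1.0)

last, avg = averaged_sa(noisy, x0=0.0, num_steps=20000,
                        step_size=lambda k: 1.0 / k ** 0.75)
```

The averaged iterate typically fluctuates far less around `target` than the last iterate does, which is the behaviour that concentration bounds of this kind quantify.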
Article 771
Title@2025-05-27 (2): End-to-End Breast Cancer Radiotherapy Planning via LMMs with Consistency Embedding
Title: End-to-End Breast Cancer Radiotherapy Planning via LMMs with Consistency Embedding | End-to-End-Brustkrebs-Radiotherapie Planung über LMMs mit Konsistenz-Embedding | 通过具有一致嵌入的LMMs进行端至端乳腺癌放射治疗规划 2311.15876v4 |
Authors: Kwanyoung Kim, Yujin Oh, Sangjoon Park, Hwa Kyung Byun, Joongyo Lee, Jin Sung Kim, Yong Bae Kim, Jong Chul Ye
Recent advances in AI foundation models have significant potential for lightening the clinical workload by mimicking the comprehensive and multi-faceted approaches used by medical professionals. In the field of radiation oncology, the integration of multiple modalities holds great importance, so opportunities for foundation models are abundant. Inspired by this, here we present RO-LMM, a multi-purpose, comprehensive large multimodal model (LMM) tailored for the field of radiation oncology. This model effectively manages a series of tasks within the clinical workflow, including clinical context summarization, radiation treatment plan suggestion, and plan-guided target volume segmentation, by leveraging the capabilities of LMMs. In particular, to perform consecutive clinical tasks without error accumulation, we present a novel Consistency Embedding Fine-Tuning (CEFTune) technique, which boosts the LMM's robustness to noisy inputs while preserving the consistency of handling clean inputs. We further extend this concept to an LMM-driven segmentation framework, leading to a novel Consistency Embedding Segmentation (CESEG) technique. Experimental results, including multi-centre validation, confirm that our RO-LMM with CEFTune and CESEG achieves promising performance on multiple clinical tasks with generalization capabilities.
Article 772
Title@2025-05-27 (2): Multimodal Federated Learning: A Survey through the Lens of Different FL Paradigms
Title: Multimodal Federated Learning: A Survey through the Lens of Different FL Paradigms | Multimodales Federated Learning: Eine Umfrage durch die Linse verschiedener FL-Paradigmen | 多模式联邦学习:通过不同FL范式的镜头进行调查 2505.21792v1 |
Authors: Yuanzhe Peng, Jieming Bian, Lei Wang, Yin Huang, Jie Xu
Multimodal Federated Learning (MFL) lies at the intersection of two pivotal research areas: leveraging complementary information from multiple modalities to improve downstream inference performance and enabling distributed training to enhance efficiency and preserve privacy. Despite the growing interest in MFL, there is currently no comprehensive taxonomy that organizes MFL through the lens of different Federated Learning (FL) paradigms. This perspective is important because multimodal data introduces distinct challenges across various FL settings. These challenges, including modality heterogeneity, privacy heterogeneity, and communication inefficiency, are fundamentally different from those encountered in traditional unimodal or non-FL scenarios. In this paper, we systematically examine MFL within the context of three major FL paradigms: horizontal FL (HFL), vertical FL (VFL), and hybrid FL. For each paradigm, we present the problem formulation, review representative training algorithms, and highlight the most prominent challenge introduced by multimodal data in distributed settings. We also discuss open challenges and provide insights for future research. By establishing this taxonomy, we aim to uncover the novel challenges posed by multimodal data from the perspective of different FL paradigms and to offer a new lens through which to understand and advance the development of MFL.
Article 773
Title@2025-05-27 (2): LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Title: LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models | LV-XAttn: Verteilte Cross-Attention für lange visuelle Eingänge in multimodalen großen Sprachmodellen | LV-XAttn:多式大语言模型中长视输入分布式交叉注意 2502.02406v3 |
Authors: Tzu-Tao Chang, Shivaram Venkataraman
Cross-attention is commonly adopted in multimodal large language models (MLLMs) for integrating visual information into the language backbone. However, in applications with large visual inputs, such as video understanding, processing a large number of visual tokens in cross-attention layers leads to high memory demands and often necessitates distributed computation across multiple GPUs. Existing distributed attention mechanisms face significant communication overheads, making cross-attention layers a critical bottleneck for efficient training and inference of MLLMs. To address this, we propose LV-XAttn, a distributed, exact cross-attention mechanism with minimal communication overhead. We observe that in applications involving large visual inputs, the size of the query block is typically much smaller than that of the key-value blocks. Thus, in LV-XAttn we keep the large key-value blocks locally on each GPU and exchange smaller query blocks across GPUs. We also introduce an efficient activation recomputation technique to support longer visual context. We theoretically analyze the communication benefits of LV-XAttn and show that it can achieve speedups for a wide range of models. Our evaluations with Llama 3-V, mPLUG-Owl3 and OpenFlamingo models find that LV-XAttn achieves up to 10.62$\times$ end-to-end speedup compared to existing approaches.
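The claim that attention stays exact when key-value blocks remain local can be illustrated with the standard online-softmax merge. In the sketch below, each hypothetical worker computes partial statistics for a visiting query against its own key-value block, and the partial results are combined afterwards. This is a generic illustration under assumed names and a single query vector, without the usual scaling by the square root of the head dimension. It is not the communication schedule of LV-XAttn itself.

```python
import math

def partial_attention(q, keys, values):
    """Softmax statistics of one query against one local key-value block."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)                               # local max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    numer = [sum(w * v[d] for w, v in zip(weights, values))
             for d in range(len(values[0]))]
    return m, denom, numer

def merge(a, b):
    """Combine two partial results by rescaling both to a shared max."""
    m_a, d_a, n_a = a
    m_b, d_b, n_b = b
    m = max(m_a, m_b)
    s_a, s_b = math.exp(m_a - m), math.exp(m_b - m)
    return m, d_a * s_a + d_b * s_b, [x * s_a + y * s_b for x, y in zip(n_a, n_b)]

def finalize(state):
    _, denom, numer = state
    return [x / denom for x in numer]

# One small query; four key-value pairs split across two hypothetical workers.
q = [1.0, 0.5]
keys = [[0.2, 0.1], [1.0, -1.0], [0.5, 0.5], [-0.3, 2.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]

sharded = finalize(merge(partial_attention(q, keys[:2], values[:2]),
                         partial_attention(q, keys[2:], values[2:])))
full = finalize(partial_attention(q, keys, values))
```

Only the query and three small statistics per query cross the worker boundary here, which is the intuition behind moving queries instead of key-value blocks when the latter are much larger.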
Article 774
Title@2025-05-27 (2): Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks
Title: Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks | Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks | 以美元为单位、以美元为单位、以美元为单位、以美元为单位、以目标为单位的全球最小化器 2505.21791v1 |
Authors: Julia Nakhleh, Robert D. Nowak
Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield these solutions? This paper addresses the challenge of finding the sparsest interpolating ReLU network – i.e., the network with the fewest nonzero parameters or neurons – a goal with wide-ranging implications for efficiency, generalization, interpretability, theory, and model compression. Unlike post hoc pruning approaches, we propose a continuous, almost-everywhere differentiable training objective whose global minima are guaranteed to correspond to the sparsest single-hidden-layer ReLU networks that fit the data. This result marks a conceptual advance: it recasts the combinatorial problem of sparse interpolation as a smooth optimization task, potentially enabling the use of gradient-based training methods. Our objective is based on minimizing $\ell^p$ quasinorms of the weights for $0 < p < 1$, a classical sparsity-promoting strategy in finite-dimensional settings. However, applying these ideas to neural networks presents new challenges: the function class is infinite-dimensional, and the weights are learned using a highly nonconvex objective. We prove that, under our formulation, global minimizers correspond exactly to sparsest solutions. Our work lays a foundation for understanding when and how continuous sparsity-inducing objectives can be leveraged to recover sparse networks through training.
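A small numeric sketch shows why $\ell^p$ penalties with $0 < p < 1$ favour sparsity: two weight vectors with the same $\ell^1$ norm receive different $\ell^p$ penalties, and the sparser one is cheaper. The function name and the example vectors are assumptions. The paper's actual training objective for ReLU networks is not spelled out in the abstract and is not reproduced here.

```python
def lp_penalty(weights, p):
    """Return sum_i |w_i|^p, the p-th power of the l^p quasinorm, for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return sum(abs(w) ** p for w in weights)

dense = [0.5, 0.5]    # l^1 norm 1, two nonzero entries
sparse = [1.0, 0.0]   # l^1 norm 1, one nonzero entry

dense_cost = lp_penalty(dense, 0.5)
sparse_cost = lp_penalty(sparse, 0.5)
```

An $\ell^1$ penalty cannot tell these two vectors apart, whereas the quasinorm charges the dense vector strictly more.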
Article 775
Title@2025-05-27 (2): Faster Rates for Private Adversarial Bandits
Title: Faster Rates for Private Adversarial Bandits | Schnellere Preise für private Adversarial Bandits | 私人反盗贼的速率 2505.21790v1 |
Authors: Hilal Asi, Vinod Raman, Kunal Talwar
We design new differentially private algorithms for the problems of adversarial bandits and bandits with expert advice. For adversarial bandits, we give a simple and efficient conversion of any non-private bandit algorithm to a private bandit algorithm. Instantiating our conversion with existing non-private bandit algorithms gives a regret upper bound of $O\left(\frac{\sqrt{KT}}{\sqrt{\epsilon}}\right)$, improving upon the existing upper bound $O\left(\frac{\sqrt{KT \log(KT)}}{\epsilon}\right)$ for all $\epsilon \leq 1$. In particular, our algorithms allow for sublinear expected regret even when $\epsilon \leq \frac{1}{\sqrt{T}}$, establishing the first known separation between central and local differential privacy for this problem. For bandits with expert advice, we give the first differentially private algorithms, with expected regret $O\left(\frac{\sqrt{NT}}{\sqrt{\epsilon}}\right)$, $O\left(\frac{\sqrt{KT\log(N)}\log(KT)}{\epsilon}\right)$, and $\tilde{O}\left(\frac{N^{1/6}K^{1/2}T^{2/3}\log(NT)}{\epsilon^{1/3}} + \frac{N^{1/2}\log(NT)}{\epsilon}\right)$, where $K$ and $N$ are the number of actions and experts respectively. These rates allow us to get sublinear regret for different combinations of small and large $K$, $N$, and $\epsilon$.
Article 776
Title@2025-05-27 (2): Wanda++: Pruning Large Language Models via Regional Gradients
Title: Wanda++: Pruning Large Language Models via Regional Gradients | Wanda++: Beschneiden großer Sprachmodelle über regionale Gradienten | Wanda+++:通过区域渐变来保护大语言模式 2503.04992v3 |
Authors: Yifan Yang, Kai Zhen, Bhavana Ganesh, Aram Galstyan, Goeric Huybrechts, Markus Müller, Jonas M. Kübler, Rupak Vignesh Swaminathan, Athanasios Mouchtaris, Sravan Babu Bodapati, Nathan Susanj, Zheng Zhang, Jack FitzGerald, Abhishek Kumar
Large Language Model (LLM) pruning seeks to remove unimportant weights for inference speedup with minimal accuracy impact. However, existing methods often suffer from accuracy degradation without full-model sparsity-aware fine-tuning. This paper presents Wanda++, a novel pruning framework that outperforms state-of-the-art methods by utilizing decoder-block-level regional gradients. Specifically, Wanda++ improves the pruning score with regional gradients for the first time and proposes an efficient regional optimization method to minimize pruning-induced discrepancies between the dense and sparse decoder outputs. Notably, Wanda++ improves perplexity by up to 32% over Wanda in the language modeling task and generalizes effectively to downstream tasks. Moreover, despite updating weights with regional optimization, Wanda++ remains orthogonal to sparsity-aware fine-tuning, and LoRA further reduces its perplexity to a great extent. Our approach is lightweight, pruning a 7B LLaMA model in under 10 minutes on a single H100 GPU.
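For context, the sketch below shows the baseline Wanda pruning score that Wanda++ builds on, as it is commonly described: each weight magnitude is multiplied by the L2 norm of the matching input feature over a calibration batch, and the lowest-scoring weights in each output row are zeroed. The regional-gradient term and the regional optimization step of Wanda++ are not specified in the abstract and are deliberately left out. The function names and the toy data are assumptions.

```python
import math

def wanda_scores(weight, activations):
    """Score each weight as |W_ij| * ||X_j||_2 (baseline Wanda, as commonly described).

    weight: rows are output neurons, columns are input features.
    activations: rows are calibration tokens, columns are input features.
    """
    n_in = len(weight[0])
    norms = [math.sqrt(sum(tok[j] ** 2 for tok in activations)) for j in range(n_in)]
    return [[abs(w) * norms[j] for j, w in enumerate(row)] for row in weight]

def prune_rows(weight, scores, sparsity):
    """Zero out the lowest-scoring fraction of weights within each output row."""
    pruned = []
    for row, score_row in zip(weight, scores):
        k = int(len(row) * sparsity)
        drop = set(sorted(range(len(row)), key=lambda j: score_row[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned

# Toy layer: one output neuron, four inputs; input 0 is strongly activated, input 1 barely.
weight = [[0.1, -2.0, 0.5, 1.0]]
activations = [[10.0, 0.1, 1.0, 1.0],
               [10.0, 0.1, 1.0, 1.0]]

scores = wanda_scores(weight, activations)
pruned = prune_rows(weight, scores, sparsity=0.5)
```

In this toy layer the largest-magnitude weight (-2.0) is removed because its input is almost never active, while the small weight on the strongly activated input survives. Pure magnitude pruning would do the opposite.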
Article 777
Title@2025-05-27 (2): Born a Transformer – Always a Transformer?
Title: Born a Transformer – Always a Transformer? | Geboren ein Transformer - immer ein Transformer? | 天生的变形人 - - 总是变形人? 2505.21785v1 |
Authors: Yana Veitsman, Mayank Jobanputra, Yash Sarrof, Aleksandra Bakalova, Vera Demberg, Ellie Pavlick, Michael Hahn
Transformers have theoretical limitations in modeling certain sequence-to-sequence tasks, yet it remains largely unclear if these limitations play a role in large-scale pretrained LLMs, or whether LLMs might effectively overcome these constraints in practice due to the scale of both the models themselves and their pretraining data. We explore how these architectural constraints manifest after pretraining, by studying a family of retrieval and copying tasks inspired by Liu et al. [2024]. We use the recently proposed C-RASP framework for studying length generalization [Huang et al., 2025b] to provide guarantees for each of our settings. Empirically, we observe an induction-versus-anti-induction asymmetry, where pretrained models are better at retrieving tokens to the right (induction) rather than the left (anti-induction) of a query token. This asymmetry disappears upon targeted fine-tuning if length-generalization is guaranteed by theory. Mechanistic analysis reveals that this asymmetry is connected to the differences in the strength of induction versus anti-induction circuits within pretrained Transformers. We validate our findings through practical experiments on real-world tasks demonstrating reliability risks. Our results highlight that pretraining selectively enhances certain Transformer capabilities, but does not overcome fundamental length-generalization limits.
Article 778
Title@2025-05-27 (2): Universal Approximation of Mean-Field Models via Transformers
Title: Universal Approximation of Mean-Field Models via Transformers | Universelle Annäherung von Mittelwert-Feld-Modellen über Transformer | 通过变压器实现平均实地模型普遍接近 2410.16295v2 |
Authors: Shiba Biswal, Karthik Elamvazhuthi, Rishi Sonthalia
This paper investigates the use of transformers to approximate the mean-field dynamics of interacting particle systems exhibiting collective behavior. Such systems are fundamental in modeling phenomena across physics, biology, and engineering, including opinion formation, biological networks, and swarm robotics. The key characteristic of these systems is that the particles are indistinguishable, leading to permutation-equivariant dynamics. First, we empirically demonstrate that transformers are well-suited for approximating a variety of mean-field models, including the Cucker-Smale model for flocking and milling, and the mean-field system for training two-layer neural networks. We validate our numerical experiments via mathematical theory. Specifically, we prove that if a finite-dimensional transformer effectively approximates the finite-dimensional vector field governing the particle system, then the $L_2$ distance between the expected transformer and the infinite-dimensional mean-field vector field can be uniformly bounded by a function of the number of particles observed during training. Leveraging this result, we establish theoretical bounds on the distance between the true mean-field dynamics and those obtained using the transformer.
Article 779
Title@2025-05-27 (2): Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
Title: Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models | Wasserzeichen im Sand: Unmöglichkeit der starken Wasserzeichen für generative Modelle | 沙沙中的水印:在生成模型中使用强水标志的可能性 2311.04378v5 |
Authors: Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, Boaz Barak
Watermarking generative models consists of planting a statistical signal (watermark) in a model’s output so that it can be later verified that the output was generated by the given model. A strong watermarking scheme satisfies the property that a computationally bounded attacker cannot erase the watermark without causing significant quality degradation. In this paper, we study the (im)possibility of strong watermarking schemes. We prove that, under well-specified and natural assumptions, strong watermarking is impossible to achieve. This holds even in the private detection algorithm setting, where the watermark insertion and detection algorithms share a secret key, unknown to the attacker. To prove this result, we introduce a generic efficient watermark attack; the attacker is not required to know the private key of the scheme or even which scheme is used. Our attack is based on two assumptions: (1) The attacker has access to a “quality oracle” that can evaluate whether a candidate output is a high-quality response to a prompt, and (2) The attacker has access to a “perturbation oracle” which can modify an output with a nontrivial probability of maintaining quality, and which induces an efficiently mixing random walk on high-quality outputs. We argue that both assumptions can be satisfied in practice by an attacker with weaker computational capabilities than the watermarked model itself, to which the attacker has only black-box access. Furthermore, our assumptions will likely only be easier to satisfy over time as models grow in capabilities and modalities. We demonstrate the feasibility of our attack by instantiating it to attack three existing watermarking schemes for large language models: Kirchenbauer et al. (2023), Kuditipudi et al. (2023), and Zhao et al. (2023). The same attack successfully removes the watermarks planted by all three schemes, with only minor quality degradation.
Article 780
Title@2025-05-27 (2): P-DROP: Poisson-Based Dropout for Graph Neural Networks
Title: P-DROP: Poisson-Based Dropout for Graph Neural Networks | P-DROP: Poisson-basiertes Dropout für Graphen-Neural-Netzwerke | PDROP: 石形神经网络的 Poisson-Poisson 辍学 2505.21783v1 |
Authors: Hyunsik Yun
Over-smoothing remains a major challenge in Graph Neural Networks (GNNs), where repeated message passing causes node representations to converge and lose discriminative power. To address this, we propose a novel node selection strategy based on Poisson processes, introducing stochastic but structure-aware updates. Specifically, we equip each node with an independent Poisson clock, enabling asynchronous and localized updates that preserve structural diversity. We explore two applications of this strategy: as a replacement for dropout-based regularization and as a dynamic subgraph training scheme. Experimental results on standard benchmarks (Cora, Citeseer, Pubmed) demonstrate that our Poisson-based method yields competitive or improved accuracy compared to traditional Dropout, DropEdge, and DropNode approaches, particularly in later training stages.
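The idea of giving every node its own Poisson clock can be sketched in a few lines: each node draws independent exponential waiting times, and the merged, time-ordered event stream says which node is updated when. How the paper turns these ticks into GNN updates or subgraphs is not described in the abstract. The function name, rate, and horizon below are illustrative assumptions.

```python
import random

def poisson_clock_updates(num_nodes, rate, horizon, seed=0):
    """Return time-ordered (time, node) update events on [0, horizon).

    Each node fires from its own independent Poisson process with the given rate,
    simulated by summing exponential inter-arrival times.
    """
    rng = random.Random(seed)
    events = []
    for node in range(num_nodes):
        t = rng.expovariate(rate)
        while t < horizon:
            events.append((t, node))
            t += rng.expovariate(rate)
    events.sort()
    return events

events = poisson_clock_updates(num_nodes=5, rate=2.0, horizon=3.0)
# Nodes whose clock rang during the first unit of time: one possible "active set".
active_first_window = sorted({node for t, node in events if t < 1.0})
```

Because the clocks are independent, nodes update asynchronously, and the expected number of updates per node over the horizon is `rate * horizon`.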
Article 781
Title@2025-05-27 (2): Diffusion Adversarial Post-Training for One-Step Video Generation
Title: Diffusion Adversarial Post-Training for One-Step Video Generation | Diffusions-Adversarial-Post-Training für die One-Step-Videogenerierung | 单步制录像制作单步制片后培训 2501.08316v2 |
Authors: Shanchuan Lin, Xin Xia, Yuxi Ren, Ceyuan Yang, Xuefeng Xiao, Lu Jiang
Diffusion models are widely used for image and video generation, but their iterative generation process is slow and expensive. While existing distillation approaches have demonstrated the potential for one-step generation in the image domain, they still suffer from significant quality degradation. In this work, we propose Adversarial Post-Training (APT) against real data following diffusion pre-training for one-step video generation. To improve the training stability and quality, we introduce several improvements to the model architecture and training procedures, along with an approximated R1 regularization objective. Empirically, our experiments show that our adversarial post-trained model, Seaweed-APT, can generate 2-second, 1280x720, 24fps videos in real time using a single forward evaluation step. Additionally, our model is capable of generating 1024px images in a single step, achieving quality comparable to state-of-the-art methods.
Article 782
Title@2025-05-27 (2): Memorization to Generalization: Emergence of Diffusion Models from Associative Memory
Title: Memorization to Generalization: Emergence of Diffusion Models from Associative Memory | Erinnerung an die Verallgemeinerung: Entstehung von Diffusionsmodellen aus dem assoziativen Gedächtnis | 记忆化为普遍化:共同内存传播模型的出现 2505.21777v1 |
Authors: Bao Pham, Gabriel Raya, Matteo Negri, Mohammed J. Zaki, Luca Ambrogioni, Dmitry Krotov
Hopfield networks are associative memory (AM) systems, designed for storing and retrieving patterns as local minima of an energy landscape. In the classical Hopfield model, an interesting phenomenon occurs when the amount of training data reaches its critical memory load: spurious states, or unintended stable points, emerge at the end of the retrieval dynamics, leading to incorrect recall. In this work, we examine diffusion models, commonly used in generative modeling, from the perspective of AMs. The training phase of a diffusion model is conceptualized as memory encoding (training data is stored in the memory). The generation phase is viewed as an attempt at memory retrieval. In the small data regime, the diffusion model exhibits a strong memorization phase, where the network creates distinct basins of attraction around each sample in the training set, akin to the Hopfield model below the critical memory load. In the large data regime, a different phase appears where an increase in the size of the training set fosters the creation of new attractor states that correspond to manifolds of the generated samples. Spurious states appear at the boundary of this transition and correspond to emergent attractor states, which are absent in the training set but, at the same time, have distinct basins of attraction around them. Our findings provide a novel perspective on the memorization-generalization phenomenon in diffusion models via the lens of AMs, a theoretical prediction of the existence of spurious states, and empirical validation of this prediction in commonly used diffusion models.
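The storage-and-retrieval picture can be made concrete with a classical Hopfield network on a handful of bits. The sketch below stores two orthogonal patterns with the Hebbian rule and recovers one of them from a corrupted cue by asynchronous sign updates. It illustrates the regime below the critical memory load only. Pattern choice and function names are assumptions, and nothing here models a diffusion network.

```python
def train_hebbian(patterns):
    """Hebbian weights w_ij = (1/n) * sum over patterns of p_i * p_j, zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, state, sweeps=5):
    """Asynchronous retrieval: each unit in turn takes the sign of its local field."""
    s = list(state)
    for _ in range(sweeps):
        for i in range(len(s)):
            field = sum(w[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if field >= 0 else -1
    return s

# Two orthogonal +/-1 patterns stored in an 8-unit network.
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = train_hebbian([p1, p2])

cue = list(p1)
cue[0] = -cue[0]            # flip one bit of the first pattern
retrieved = recall(w, cue)
```

With only two patterns in eight units the corrupted cue falls inside the basin of attraction of `p1`. Storing many more patterns than the network can hold is what produces the spurious stable states discussed above.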
nan
Article 783
Title@2025-05-27 (2): DualSchool: How Reliable are LLMs for Optimization Education?
Title: DualSchool: How Reliable are LLMs for Optimization Education? | DualSchool: Wie zuverlässig sind LLMs für die Optimierungsbildung? | 两所学校:优化教育LLMs有多可靠? 2505.21775v1 |
Authors: Michael Klamkin, Arnaud Deza, Sikai Cheng, Haoruo Zhao, Pascal Van Hentenryck
Consider the following task taught in introductory optimization courses which addresses challenges articulated by the community at the intersection of (generative) AI and OR: generate the dual of a linear program. LLMs, being trained at web-scale, have the conversion process and many instances of Primal to Dual Conversion (P2DC) at their disposal. Students may thus reasonably expect that LLMs would perform well on the P2DC task. To assess this expectation, this paper introduces DualSchool, a comprehensive framework for generating and verifying P2DC instances. The verification procedure of DualSchool uses the Canonical Graph Edit Distance, going well beyond existing evaluation methods for optimization models, which exhibit many false positives and negatives when applied to P2DC. Experiments performed by DualSchool reveal interesting findings. Although LLMs can recite the conversion procedure accurately, state-of-the-art open LLMs fail to consistently produce correct duals. This finding holds even for the smallest two-variable instances and for derivative tasks, such as correctness, verification, and error classification. The paper also discusses the implications for educators, students, and the development of large reasoning systems.
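For readers unfamiliar with the P2DC task, the textbook conversion for one canonical form can be sketched as follows. This is only an illustration under the assumption that the primal is written as min c^T x subject to A x >= b, x >= 0; the function name `dual_of_standard_lp` is hypothetical and is not part of DualSchool, whose instance generator and graph-based verifier are not described in enough detail here to reproduce.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def dual_of_standard_lp(c: Vector, A: Matrix, b: Vector) -> Tuple[Vector, Matrix, Vector]:
    """Dualize a primal LP given in the assumed canonical form.

    Primal:  min c^T x   s.t.  A x >= b,    x >= 0
    Dual:    max b^T y   s.t.  A^T y <= c,  y >= 0

    Returns (dual objective, dual constraint matrix, dual right-hand side).
    """
    if len(A) != len(b) or any(len(row) != len(c) for row in A):
        raise ValueError("A must have len(b) rows and len(c) columns")
    # Each primal constraint becomes a dual variable, and each primal
    # variable becomes a dual constraint, so the matrix is transposed.
    A_transposed = [list(column) for column in zip(*A)]
    return list(b), A_transposed, list(c)


if __name__ == "__main__":
    objective, constraints, rhs = dual_of_standard_lp(
        c=[10.0, 11.0],
        A=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
        b=[7.0, 8.0, 9.0],
    )
    print("max", objective, "s.t.", constraints, "<=", rhs)
```

Other primal forms (equality constraints, free variables, maximization) change the signs and variable domains of the dual, which is where recited procedures tend to go wrong in practice.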
nan
Article 784
Title@2025-05-27 (2): Backdoors in DRL: Four Environments Focusing on In-distribution Triggers
Title: Backdoors in DRL: Four Environments Focusing on In-distribution Triggers | Hintertüren in DRL: Vier Umgebungen mit Fokus auf In-Distribution Trigger | DRL的后门:四个环境,侧重于内部分配触发器 2505.17248v2 |
Authors: Chace Ashcraft, Ted Staley, Josh Carney, Cameron Hickert, Kiran Karra, Nathan Drenkow
Backdoor attacks, or trojans, pose a security risk by concealing undesirable behavior in deep neural network models. Open-source neural networks are downloaded from the internet daily, possibly containing backdoors, and third-party model developers are common. To advance research on backdoor attack mitigation, we develop several trojans for deep reinforcement learning (DRL) agents. We focus on in-distribution triggers, which occur within the agent’s natural data distribution, since they pose a more significant security threat than out-of-distribution triggers due to their ease of activation by the attacker during model deployment. We implement backdoor attacks in four reinforcement learning (RL) environments: LavaWorld, Randomized LavaWorld, Colorful Memory, and Modified Safety Gymnasium. We train various models, both clean and backdoored, to characterize these attacks. We find that in-distribution triggers can require additional effort to implement and be more challenging for models to learn, but are nevertheless viable threats in DRL even using basic data poisoning attacks.
nan
Article 785
Title@2025-05-27 (2): Beyond 1D: Vision Transformers and Multichannel Signal Images for PPG-to-ECG Reconstruction
Title: Beyond 1D: Vision Transformers and Multichannel Signal Images for PPG-to-ECG Reconstruction | Beyond 1D: Vision Transformers und Multichannel Signal Images für PPG-zu-ECG-Rekonstruktion | 1D之后:为重建PPPG至ECG提供愿景变形器和多通道信号图像 2505.21767v1 |
Authors: Xiaoyan Li, Shixin Xu, Faisal Habib, Arvind Gupta, Huaxiong Huang
Reconstructing ECG from PPG is a promising yet challenging task. While recent advancements in generative models have significantly improved ECG reconstruction, accurately capturing fine-grained waveform features remains a key challenge. To address this, we propose a novel PPG-to-ECG reconstruction method that leverages a Vision Transformer (ViT) as the core network. Unlike conventional approaches that rely on single-channel PPG, our method employs a four-channel signal image representation, incorporating the original PPG, its first-order difference, second-order difference, and area under the curve. This multi-channel design enriches feature extraction by preserving both temporal and physiological variations within the PPG. By leveraging the self-attention mechanism in ViT, our approach effectively captures both inter-beat and intra-beat dependencies, leading to more robust and accurate ECG reconstruction. Experimental results demonstrate that our method consistently outperforms existing 1D convolution-based approaches, achieving up to 29% reduction in PRD and 15% reduction in RMSE. The proposed approach also produces improvements in other evaluation metrics, highlighting its robustness and effectiveness in reconstructing ECG signals. Furthermore, to ensure a clinically relevant evaluation, we introduce new performance metrics, including QRS area error, PR interval error, RT interval error, and RT amplitude difference error. Our findings suggest that integrating a four-channel signal image representation with the self-attention mechanism of ViT enables more effective extraction of informative PPG features and improved modeling of beat-to-beat variations for PPG-to-ECG mapping. Beyond demonstrating the potential of PPG as a viable alternative for heart activity monitoring, our approach opens new avenues for cyclic signal analysis and prediction.
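The four channels named in the abstract can be illustrated with a small sketch. The abstract does not say how the differences are padded, how the area under the curve is computed, or how the channels are arranged into an image, so the choices below (zero padding, a running trapezoidal integral with unit sample spacing, and the name `four_channel_features`) are assumptions made purely for illustration.

```python
from typing import List


def four_channel_features(ppg: List[float]) -> List[List[float]]:
    """Return [signal, first difference, second difference, running area].

    All four channels have the same length as the input. Differences are
    zero-padded at the start; the area channel is a cumulative trapezoidal
    integral assuming unit spacing between samples.
    """
    n = len(ppg)
    signal = [float(v) for v in ppg]
    if n == 0:
        return [[], [], [], []]
    first = [0.0] + [signal[i] - signal[i - 1] for i in range(1, n)]
    second = ([0.0, 0.0] + [
        signal[i] - 2.0 * signal[i - 1] + signal[i - 2] for i in range(2, n)
    ])[:n]
    area = [0.0]
    for i in range(1, n):
        area.append(area[-1] + 0.5 * (signal[i] + signal[i - 1]))
    return [signal, first, second, area]


if __name__ == "__main__":
    for channel in four_channel_features([1.0, 2.0, 4.0, 7.0]):
        print(channel)
```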
nan
Article 786
Title@2025-05-27 (2): Explainable Multi-modal Time Series Prediction with LLM-in-the-Loop
Title: Explainable Multi-modal Time Series Prediction with LLM-in-the-Loop | Erklärbare multimodale Zeitreihenvorhersage mit LLM-in-the-Loop | 与LLM in-Loop的可解释的多时时间序列预测 2503.01013v2 |
Authors: Yushan Jiang, Wenchao Yu, Geon Lee, Dongjin Song, Kijung Shin, Wei Cheng, Yanchi Liu, Haifeng Chen
Time series analysis provides essential insights for real-world system dynamics and informs downstream decision-making, yet most existing methods overlook the rich contextual signals present in auxiliary modalities. To bridge this gap, we introduce TimeXL, a multi-modal prediction framework that integrates a prototype-based time series encoder with three collaborating Large Language Models (LLMs) to deliver more accurate predictions and interpretable explanations. First, a multi-modal prototype-based encoder processes both time series and textual inputs to generate preliminary forecasts alongside case-based rationales. These outputs then feed into a prediction LLM, which refines the forecasts by reasoning over the encoder’s predictions and explanations. Next, a reflection LLM compares the predicted values against the ground truth, identifying textual inconsistencies or noise. Guided by this feedback, a refinement LLM iteratively enhances text quality and triggers encoder retraining. This closed-loop workflow – prediction, critique (reflect), and refinement – continuously boosts the framework’s performance and interpretability. Empirical evaluations on four real-world datasets demonstrate that TimeXL achieves up to 8.9% improvement in AUC and produces human-centric, multi-modal explanations, highlighting the power of LLM-driven reasoning for time series prediction.
nan
Article 787
Title@2025-05-27 (2): TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster
Title: TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster | TS-RAG: Retrieval-Augmented Generation basierte Time Series Foundation Modelle sind stärker Zero-Shot Forecaster | TS-RAG:基于时间序列的回收-养殖一代基于时间序列的基础模型是更强的零热预测仪 2503.07649v3 |
Authors: Kanghui Ning, Zijie Pan, Yu Liu, Yushan Jiang, James Y. Zhang, Kashif Rasul, Anderson Schneider, Lintao Ma, Yuriy Nevmyvaka, Dongjin Song
Large Language Models (LLMs) and Foundation Models (FMs) have recently become prevalent for time series forecasting tasks. While fine-tuning LLMs enables domain adaptation, they often struggle to generalize across diverse and unseen datasets. Moreover, existing Time Series Foundation Models (TSFMs) still face challenges in handling non-stationary dynamics and distribution shifts, largely due to the lack of effective mechanisms for adaptation. To this end, we present TS-RAG, a retrieval-augmented generation framework for time series forecasting that enhances the generalization and interpretability of TSFMs. Specifically, TS-RAG leverages pre-trained time series encoders to retrieve semantically relevant segments from a dedicated knowledge base, enriching the contextual representation of the input query. Furthermore, we propose an Adaptive Retrieval Mixer (ARM) module that dynamically fuses the retrieved patterns with the TSFM’s internal representation, improving forecasting accuracy without requiring task-specific fine-tuning. Thorough empirical studies on seven public benchmark datasets demonstrate that TS-RAG achieves state-of-the-art zero-shot forecasting performance, outperforming the existing TSFMs by up to 6.84% across diverse domains while also providing desirable interpretability.
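The retrieval step described above can be pictured as a nearest-neighbor lookup over segment embeddings. The sketch below assumes cosine similarity and a plain in-memory list as the knowledge base; both are illustrative assumptions, the names `cosine` and `retrieve_top_k` are hypothetical, and the Adaptive Retrieval Mixer that fuses the retrieved segments is not sketched because the abstract does not specify it.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_top_k(query: Sequence[float], bank: List[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k bank embeddings most similar to the query."""
    ranked = sorted(range(len(bank)), key=lambda i: cosine(query, bank[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    knowledge_base = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    print(retrieve_top_k([1.0, 0.0], knowledge_base, k=2))
```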
nan
Article 788
Title@2025-05-27 (2): Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization
Title: Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization | Puristische Korrelationen in der hochdimensionalen Regression: Die Rollen der Regularisierung, der Einfachheit Bias und der Überparameterisierung | 高度倒退中的纯净误值:常规化、简易生物和过度计量化的作用 2502.01347v2 |
Authors: Simone Bombari, Marco Mondelli
Learning models have been shown to rely on spurious correlations between non-predictive features and the associated labels in the training data, with negative implications on robustness, bias and fairness. In this work, we provide a statistical characterization of this phenomenon for high-dimensional regression, when the data contains a predictive core feature $x$ and a spurious feature $y$. Specifically, we quantify the amount of spurious correlations $C$ learned via linear regression, in terms of the data covariance and the strength $\lambda$ of the ridge regularization. As a consequence, we first capture the simplicity of $y$ through the spectrum of its covariance, and its correlation with $x$ through the Schur complement of the full data covariance. Next, we prove a trade-off between $C$ and the in-distribution test loss $L$, by showing that the value of $\lambda$ that minimizes $L$ lies in an interval where $C$ is increasing. Finally, we investigate the effects of over-parameterization via the random features model, by showing its equivalence to regularized linear regression. Our theoretical results are supported by numerical experiments on Gaussian, Color-MNIST, and CIFAR-10 datasets.
nan
Article 789
Title@2025-05-27 (2): FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering
Title: FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | FRAMES-VQA: Benchmarking Fine-Tuning Robustheit über Multi-Modal Shifts in der visuellen Fragestellung | FRAMES-VQA:确定视觉问题解答中多模式变化的精确调整强度基准 2505.21755v1 |
Authors: Chengyue Huang, Brisa Maneechotesuwan, Shivang Chopra, Zsolt Kira
Visual question answering (VQA) systems face significant challenges when adapting to real-world data shifts, especially in multi-modal contexts. While robust fine-tuning strategies are essential for maintaining performance across in-distribution (ID) and out-of-distribution (OOD) scenarios, current evaluation settings are primarily unimodal or particular to some types of OOD, offering limited insight into the complexities of multi-modal contexts. In this work, we propose a new benchmark FRAMES-VQA (Fine-Tuning Robustness across Multi-Modal Shifts in VQA) for evaluating robust fine-tuning for VQA tasks. We utilize ten existing VQA benchmarks, including VQAv2, IV-VQA, VQA-CP, OK-VQA and others, and categorize them into ID, near and far OOD datasets covering uni-modal, multi-modal and adversarial distribution shifts. We first conduct a comprehensive comparison of existing robust fine-tuning methods. We then quantify the distribution shifts by calculating the Mahalanobis distance using uni-modal and multi-modal embeddings extracted from various models. Further, we perform an extensive analysis to explore the interactions between uni- and multi-modal shifts as well as modality importance for ID and OOD samples. These analyses offer valuable guidance on developing more robust fine-tuning methods to handle multi-modal distribution shifts. The code is available at https://github.com/chengyuehuang511/FRAMES-VQA .
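As a reminder of how a Mahalanobis distance quantifies shift, the sketch below fits a mean and covariance on reference embeddings and scores a new embedding against them. It is a generic pure-Python illustration, not code from the FRAMES-VQA repository; the helper names are hypothetical, and the choice of which embeddings and which models to use is left to the paper.

```python
import math
from typing import List, Tuple


def mean_and_cov(X: List[List[float]]) -> Tuple[List[float], List[List[float]]]:
    """Sample mean and unbiased sample covariance of row vectors (needs >= 2 rows)."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    cov = [
        [sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in X) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    return mu, cov


def solve(S: List[List[float]], v: List[float]) -> List[float]:
    """Solve S z = v by Gaussian elimination with partial pivoting."""
    d = len(v)
    M = [list(S[i]) + [v[i]] for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            factor = M[r][col] / M[col][col]
            for k in range(col, d + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * d
    for i in reversed(range(d)):
        z[i] = (M[i][d] - sum(M[i][k] * z[k] for k in range(i + 1, d))) / M[i][i]
    return z


def mahalanobis(x: List[float], mu: List[float], cov: List[List[float]]) -> float:
    """sqrt((x - mu)^T cov^{-1} (x - mu)), computed without forming the inverse."""
    diff = [a - b for a, b in zip(x, mu)]
    z = solve(cov, diff)
    return math.sqrt(max(0.0, sum(a * b for a, b in zip(diff, z))))


if __name__ == "__main__":
    reference = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    mu, cov = mean_and_cov(reference)
    print(mahalanobis([3.0, 1.0], mu, cov))
```

A larger distance for a test set's embeddings, relative to the fine-tuning data, indicates a stronger shift along the directions in which the reference data varies little.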
nan
Article 790
Title@2025-05-27 (2): Path Planning for Masked Diffusion Model Sampling
Title: Path Planning for Masked Diffusion Model Sampling | Pfadplanung für maskierte Diffusions-Modell-Probenahme | 蒙面扩散模型取样规划路径 2502.03540v4 |
Authors: Fred Zhangzhi Peng, Zachary Bezemek, Sawan Patel, Jarrid Rector-Brooks, Sherwood Yao, Avishek Joey Bose, Alexander Tong, Pranam Chatterjee
Any-order generation of discrete data using masked diffusion models (MDMs) offers a compelling alternative to traditional autoregressive models, especially in domains that lack a natural causal ordering of data. However, current popular MDMs depart from their successful continuous diffusion model counterparts with simplified masked inference wherein unmasked tokens cannot be iteratively refined – even if there is a mistake. In this paper, we extract the full power of MDMs by introducing a novel inference sampling strategy termed Path Planning (P2) that decomposes each generation step into two sub-stages: planning and denoising. Under P2, the planner at every step selects appropriate tokens that are marked to be updated, which can then be sampled using the denoiser. We demonstrate that P2 generalizes all existing sampling strategies for MDMs and critically enhances generative quality through the new capability of refining and updating existing unmasked tokens. We theoretically prove that P2 establishes a (new) expanded evidence lower bound (ELBO) on the log marginal likelihood of data. We instantiate P2 with a family of planners including 1) Self-Planning, 2) BERT-Planning, and 3) Trained-Planning with a learned planner, leading to SOTA generative performance for MDMs on a suite of domains. Specifically, solely using P2 inference, we observe relative improvements of 22% in protein sequence foldability, 8% in RNA sequence pLDDT, 4% in math reasoning, 68% in story generation (ROUGE score), and 33% in code generation for the challenging pass@1 metric.
nan
Article 791
Title@2025-05-27 (2): Hierarchical Reinforcement Learning with Uncertainty-Guided Diffusional Subgoals
Title: Hierarchical Reinforcement Learning with Uncertainty-Guided Diffusional Subgoals | Hierarchisches Stärkungslernen mit unsicheren, diffusionalen Unterzielen | 具有不确定性的梯级强化学习,有不确定的辅助分传播目标 2505.21750v1 |
Authors: Vivienne Huiling Wang, Tinghuai Wang, Joni Pajarinen
Hierarchical reinforcement learning (HRL) learns to make decisions on multiple levels of temporal abstraction. A key challenge in HRL is that the low-level policy changes over time, making it difficult for the high-level policy to generate effective subgoals. To address this issue, the high-level policy must capture a complex subgoal distribution while also accounting for uncertainty in its estimates. We propose an approach that trains a conditional diffusion model regularized by a Gaussian Process (GP) prior to generate a complex variety of subgoals while leveraging principled GP uncertainty quantification. Building on this framework, we develop a strategy that selects subgoals from both the diffusion policy and GP’s predictive mean. Our approach outperforms prior HRL methods in both sample efficiency and performance on challenging continuous control benchmarks.
nan
Article 792
Title@2025-05-27 (2): Revisiting Bi-Linear State Transitions in Recurrent Neural Networks
Title: Revisiting Bi-Linear State Transitions in Recurrent Neural Networks | Bi-Lineare State Transitions in recurrenten neuralen Netzwerken erneut besuchen | 在经常性神经网络中重新审查双利那尔州过渡 2505.21749v1 |
Authors: M. Reza Ebrahimi, Roland Memisevic
The role of hidden units in recurrent neural networks is typically seen as modeling memory, with research focusing on enhancing information retention through gating mechanisms. A less explored perspective views hidden units as active participants in the computation performed by the network, rather than passive memory stores. In this work, we revisit bi-linear operations, which involve multiplicative interactions between hidden units and input embeddings. We demonstrate theoretically and empirically that they constitute a natural inductive bias for representing the evolution of hidden states in state tracking tasks. These are the simplest type of task that require hidden units to actively contribute to the behavior of the network. We also show that bi-linear state updates form a natural hierarchy corresponding to state tracking tasks of increasing complexity, with popular linear recurrent networks such as Mamba residing at the lowest-complexity center of that hierarchy.
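A toy instance makes the bi-linear update concrete. In the sketch below the next hidden state is h'[i] = sum over j, k of W[i][j][k] * h[j] * x[k], and the tensor W is written by hand so that a two-dimensional one-hot state tracks the parity of a bit stream. The hand-set weights and the function names are assumptions for illustration only; the paper studies learned bi-linear models and a broader family of state tracking tasks.

```python
from typing import List

Tensor3 = List[List[List[float]]]


def bilinear_step(W: Tensor3, h: List[float], x: List[float]) -> List[float]:
    """One bi-linear update: h'[i] = sum_{j,k} W[i][j][k] * h[j] * x[k]."""
    return [
        sum(W[i][j][k] * h[j] * x[k] for j in range(len(h)) for k in range(len(x)))
        for i in range(len(W))
    ]


def parity_tensor() -> Tensor3:
    """Hand-set weights: state j and input bit k move to state j XOR k."""
    return [
        [[1.0 if i == (j ^ k) else 0.0 for k in range(2)] for j in range(2)]
        for i in range(2)
    ]


def track_parity(bits: List[int]) -> int:
    """Run the bi-linear recurrence over a bit sequence and read out the parity."""
    W = parity_tensor()
    h = [1.0, 0.0]  # one-hot state for "even number of ones so far"
    for bit in bits:
        x = [1.0, 0.0] if bit == 0 else [0.0, 1.0]  # one-hot input embedding
        h = bilinear_step(W, h, x)
    return h.index(max(h))


if __name__ == "__main__":
    print(track_parity([1, 0, 1, 1]))
```

For a fixed input the update is a linear map on the hidden state, so each input symbol selects a state-transition matrix; that is what makes the form a natural fit for tracking a finite-state machine.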
nan
Article 793
Title@2025-05-27 (2): Privacy for Free in the Overparameterized Regime
Title: Privacy for Free in the Overparameterized Regime | Privatsphäre kostenlos im überparameterisierten Regime | 过度计量制度中的免费隐私 2410.14787v2 |
Authors: Simone Bombari, Marco Mondelli
Differentially private gradient descent (DP-GD) is a popular algorithm to train deep learning models with provable guarantees on the privacy of the training data. In the last decade, the problem of understanding its performance cost with respect to standard GD has received remarkable attention from the research community, which formally derived upper bounds on the excess population risk $R_{P}$ in different learning settings. However, existing bounds typically degrade with over-parameterization, i.e., as the number of parameters $p$ gets larger than the number of training samples $n$ – a regime which is ubiquitous in current deep-learning practice. As a result, the lack of theoretical insights leaves practitioners without clear guidance, leading some to reduce the effective number of trainable parameters to improve performance, while others use larger models to achieve better results through scale. In this work, we show that in the popular random features model with quadratic loss, for any sufficiently large $p$, privacy can be obtained for free, i.e., $|R_{P}| = o(1)$, not only when the privacy parameter $\varepsilon$ has constant order, but also in the strongly private setting $\varepsilon = o(1)$. This challenges the common wisdom that over-parameterization inherently hinders performance in private learning.
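For orientation, one full-batch step of the commonly used clip-and-noise recipe for private gradient descent can be sketched as follows, here for a linear model with quadratic loss. This is a generic illustration and an assumption about the setup: the abstract does not spell out the exact DP-GD variant analyzed, the function names are hypothetical, and no privacy accounting (the mapping from the noise multiplier to $\varepsilon$) is included.

```python
import math
import random
from typing import List


def clip(grad: List[float], max_norm: float) -> List[float]:
    """Rescale grad so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in grad))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [v * scale for v in grad]


def dp_gd_step(
    theta: List[float],
    X: List[List[float]],
    y: List[float],
    lr: float,
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """One noisy full-batch gradient step on the loss 0.5 * (theta . x - y)^2."""
    n, p = len(X), len(theta)
    total = [0.0] * p
    for xi, yi in zip(X, y):
        residual = sum(t * v for t, v in zip(theta, xi)) - yi
        per_sample_grad = [residual * v for v in xi]
        # Clipping bounds each sample's influence on the summed gradient.
        for j, v in enumerate(clip(per_sample_grad, clip_norm)):
            total[j] += v
    # Gaussian noise is scaled to the clipping norm, then the sum is averaged.
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(t + rng.gauss(0.0, sigma)) / n for t in total]
    return [t - lr * g for t, g in zip(theta, noisy_mean)]


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.0, 0.0]
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    y = [1.0, 2.0, 3.0]
    for _ in range(200):
        theta = dp_gd_step(theta, X, y, lr=0.5, clip_norm=5.0, noise_multiplier=0.01, rng=rng)
    print(theta)
```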
nan
Article 794
Title@2025-05-27 (2): Learning to See More: UAS-Guided Super-Resolution of Satellite Imagery for Precision Agriculture
Title: Learning to See More: UAS-Guided Super-Resolution of Satellite Imagery for Precision Agriculture | Mehr erfahren: UAS-geführte Super-Resolution von Satellitenbildern für Präzisionslandwirtschaft | 学习更多见:UAS-UAS指导的精密农业卫星图像超级分辨率 2505.21746v1 |
Authors: Arif Masrur, Peder A. Olsen, Paul R. Adler, Carlan Jackson, Matthew W. Myers, Nathan Sedghi, Ray R. Weil
Unmanned Aircraft Systems (UAS) and satellites are key data sources for precision agriculture, yet each presents trade-offs. Satellite data offer broad spatial, temporal, and spectral coverage but lack the resolution needed for many precision farming applications, while UAS provide high spatial detail but are limited by coverage and cost, especially for hyperspectral data. This study presents a novel framework that fuses satellite and UAS imagery using super-resolution methods. By integrating data across spatial, spectral, and temporal domains, we leverage the strengths of both platforms cost-effectively. We use estimation of cover crop biomass and nitrogen (N) as a case study to evaluate our approach. By spectrally extending UAS RGB data to the vegetation red edge and near-infrared regions, we generate high-resolution Sentinel-2 imagery and improve biomass and N estimation accuracy by 18% and 31%, respectively. Our results show that UAS data need only be collected from a subset of fields and time points. Farmers can then 1) enhance the spectral detail of UAS RGB imagery; 2) increase the spatial resolution by using satellite data; and 3) extend these enhancements spatially and across the growing season at the frequency of the satellite flights. Our SRCNN-based spectral extension model shows considerable promise for model transferability over other cropping systems in the Upper and Lower Chesapeake Bay regions. Additionally, it remains effective even when cloud-free satellite data are unavailable, relying solely on the UAS RGB input. The spatial extension model produces better biomass and N predictions than models built on raw UAS RGB images. Once trained with targeted UAS RGB data, the spatial extension model allows farmers to stop repeated UAS flights. While we introduce super-resolution advances, the core contribution is a lightweight and scalable system for affordable on-farm use.
nan
Article 795
Title@2025-05-27 (2): Simulating the Unseen: Crash Prediction Must Learn from What Did Not Happen
Title: Simulating the Unseen: Crash Prediction Must Learn from What Did Not Happen | Das Unsichtbare simulieren: Crash Prediction muss lernen, was nicht passiert ist | 模拟看不见:崩溃预测必须从没有发生的事情中吸取教训 2505.21743v1 |
Authors: Zihao Li, Xinyuan Cao, Xiangbo Gao, Kexin Tian, Keshu Wu, Mohammad Anis, Hao Zhang, Keke Long, Jiwan Jiang, Xiaopeng Li, Yunlong Zhang, Tianbao Yang, Dominique Lord, Zhengzhong Tu, Yang Zhou
Traffic safety science has long been hindered by a fundamental data paradox: the crashes we most wish to prevent are precisely those events we rarely observe. Existing crash-frequency models and surrogate safety metrics rely heavily on sparse, noisy, and under-reported records, while even sophisticated, high-fidelity simulations undersample the long-tailed situations that trigger catastrophic outcomes such as fatalities. We argue that the path to achieving Vision Zero, i.e., the complete elimination of traffic fatalities and severe injuries, requires a paradigm shift from traditional crash-only learning to a new form of counterfactual safety learning: reasoning not only about what happened, but also about the vast set of plausible yet perilous scenarios that could have happened under slightly different circumstances. To operationalize this shift, our proposed agenda bridges macro to micro. Guided by crash-rate priors, generative scene engines, diverse driver models, and causal learning, near-miss events are synthesized and explained. A crash-focused digital twin testbed links micro scenes to macro patterns, while a multi-objective validator ensures that simulations maintain statistical realism. This pipeline transforms sparse crash data into rich signals for crash prediction, enabling the stress-testing of vehicles, roads, and policies before deployment. By learning from crashes that almost happened, we can shift traffic safety from reactive forensics to proactive prevention, advancing Vision Zero.
nan
Article 796
Title@2025-05-27 (2): Outlier-Robust Linear System Identification Under Heavy-tailed Noise
Title: Outlier-Robust Linear System Identification Under Heavy-tailed Noise | Ausreißer-Robust Lineare System-Identifikation unter stark verdichtetem Lärm | 在重尾噪音下识别线性系统 2501.00421v2 |
Authors: Vinay Kanakeri, Aritra Mitra
We consider the problem of estimating the state transition matrix of a linear time-invariant (LTI) system, given access to multiple independent trajectories sampled from the system. Several recent papers have conducted a non-asymptotic analysis of this problem, relying crucially on the assumption that the process noise is either Gaussian or sub-Gaussian, i.e., “light-tailed”. In sharp contrast, we work under a significantly weaker noise model, assuming nothing more than the existence of the fourth moment of the noise distribution. For this setting, we provide the first set of results demonstrating that one can obtain sample-complexity bounds for linear system identification that are nearly of the same order as under sub-Gaussian noise. To achieve such results, we develop a novel robust system identification algorithm that relies on constructing multiple weakly-concentrated estimators, and then boosting their performance using suitable tools from high-dimensional robust statistics. Interestingly, our analysis reveals how the kurtosis of the noise distribution, a measure of heavy-tailedness, affects the number of trajectories needed to achieve desired estimation error bounds. Finally, we show that our algorithm and analysis technique can be easily extended to account for scenarios where an adversary can arbitrarily corrupt a small fraction of the collected trajectory data. Our work takes the first steps towards building a robust statistical learning theory for control under non-ideal assumptions on the data-generating process.
nan
Article 797
Title@2025-05-27 (2): What is Adversarial Training for Diffusion Models?
Title: What is Adversarial Training for Diffusion Models? | Was ist ein Adversarial Training für Diffusionsmodelle? | 传播模型的反向培训是什么? 2505.21742v1 |
Authors: Briglia Maria Rosaria, Mujtaba Hussain Mirza, Giuseppe Lisanti, Iacopo Masi
We answer the question in the title, showing that adversarial training (AT) for diffusion models (DMs) differs fundamentally from AT for classifiers: while AT in classifiers enforces output invariance, AT in DMs requires equivariance to keep the diffusion process aligned with the data distribution. AT is a way to enforce smoothness in the diffusion flow, improving robustness to outliers and corrupted data. Unlike prior art, our method makes no assumptions about the noise model and integrates seamlessly into diffusion training by adding random noise, similar to randomized smoothing, or adversarial noise, akin to AT. This enables intrinsic capabilities such as handling noisy data, dealing with extreme variability such as outliers, preventing memorization, and improving robustness. We rigorously evaluate our approach with proof-of-concept datasets with known distributions in low- and high-dimensional space, thereby taking a perfect measure of errors; we further evaluate on standard benchmarks such as CIFAR-10, CelebA and LSUN Bedroom, showing strong performance under severe noise, data corruption, and iterative adversarial attacks.
nan
Article 798
Title@2025-05-27 (2): Polynomial Chaos Expanded Gaussian Process
Title: Polynomial Chaos Expanded Gaussian Process | Polynomisches Chaos erweiterter Gauß-Prozess | 扩大的高斯进程 2405.01052v2 |
Authors: Dominik Polke, Tim Kösters, Elmar Ahle, Dirk Söffker
In complex and unknown processes, global models are initially generated over the entire experimental space but often fail to provide accurate predictions in local areas. A common approach is to use local models, which requires partitioning the experimental space and training multiple models, adding significant complexity. Recognizing this limitation, this study addresses the need for models that effectively represent both global and local experimental spaces. It introduces a novel machine learning (ML) approach: Polynomial Chaos Expanded Gaussian Process (PCEGP), leveraging polynomial chaos expansion (PCE) to calculate input-dependent hyperparameters of the Gaussian process (GP). This provides a mathematically interpretable approach that incorporates non-stationary covariance functions and heteroscedastic noise estimation to generate locally adapted models. The model performance is compared to different algorithms in benchmark tests for regression tasks. The results demonstrate low prediction errors of the PCEGP, highlighting model performance that is often competitive with or better than previous methods. A key advantage of the presented model is its interpretable hyperparameters along with training and prediction runtimes comparable to those of a GP.
nan
Article 799
Title@2025-05-27 (2): Moment kernels: a simple and scalable approach for equivariance to rotations and reflections in deep convolutional networks
Title: Moment kernels: a simple and scalable approach for equivariance to rotations and reflections in deep convolutional networks | Momentkerne: ein einfacher und skalierbarer Ansatz für Gleichmäßigkeit zu Rotationen und Reflexionen in tiefen konvolutionären Netzwerken | 动力核心:一种简单和可伸缩的方法,在深刻的革命网络中,对轮换和反射的等同性采取简单和可伸缩的办法 2505.21736v1 |
Authors: Zachary Schlamowitz, Andrew Bennecke, Daniel J. Tward
The principle of translation equivariance (if an input image is translated, an output image should be translated by the same amount) led to the development of convolutional neural networks that revolutionized machine vision. Other symmetries, like rotations and reflections, play a similarly critical role, especially in biomedical image analysis, but exploiting these symmetries has not seen wide adoption. We hypothesize that this is partially due to the mathematical complexity of methods used to exploit these symmetries, which often rely on representation theory, a bespoke concept in differential geometry and group theory. In this work, we show that the same equivariance can be achieved using a simple form of convolution kernels that we call "moment kernels," and prove that all equivariant kernels must take this form. These are a set of radially symmetric functions of a spatial position $x$, multiplied by powers of the components of $x$ or the identity matrix. We implement equivariant neural networks using standard convolution modules, and provide architectures to execute several biomedical image analysis tasks that depend on equivariance principles: classification (outputs are invariant under orthogonal transforms), 3D image registration (outputs transform like a vector), and cell segmentation (quadratic forms defining ellipses transform like a matrix).
nan
Article 800
Title@2025-05-27 (2): Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
Title: Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization | Adressierung von Konzept-Mislabeling in Konzept-Bottleneck-Modellen durch Preference-Optimierung | 通过优先优化处理概念瓶颈模式中的概念误贴标签问题 2504.18026v2 |
Authors: Emiliano Penaloza, Tianyue H. Zhan, Laurent Charlin, Mateo Espinosa Zarlenga
Concept Bottleneck Models (CBMs) propose to enhance the trustworthiness of AI systems by constraining their decisions on a set of human-understandable concepts. However, CBMs typically assume that datasets contain accurate concept labels, an assumption often violated in practice, which we show can significantly degrade performance (by 25% in some cases). To address this, we introduce the Concept Preference Optimization (CPO) objective, a new loss function based on Direct Preference Optimization, which effectively mitigates the negative impact of concept mislabeling on CBM performance. We provide an analysis of some key properties of the CPO objective, showing that it directly optimizes for the concept’s posterior distribution, and contrast it against Binary Cross Entropy (BCE), where we show CPO is inherently less sensitive to concept noise. We empirically confirm our analysis, finding that CPO consistently outperforms BCE on three real-world datasets with and without added label noise.
nan
Article 801
Title@2025-05-27 (2): Non-Markovian Discrete Diffusion with Causal Language Models
Title: Non-Markovian Discrete Diffusion with Causal Language Models | Nicht-Markovianische Diskrepanz mit kausalen Sprachmodellen | 非马尔科维语非马尔科维语分辨语言模式的传播 2502.09767v2 |
Authors: Yangtian Zhang, Sizhuang He, Daniel Levine, Lawrence Zhao, David Zhang, Syed A Rizvi, Emanuele Zappala, Rex Ying, David van Dijk
Discrete diffusion models offer a flexible, controllable approach to structured sequence generation, yet they still lag behind causal language models in expressive power. A key limitation lies in their reliance on the Markovian assumption, which restricts each step to condition only on the current state, leading to potential uncorrectable error accumulation. In this paper, we introduce CaDDi, a discrete diffusion model that conditions on the entire generative trajectory, thereby lifting the Markov constraint and allowing the model to revisit and improve past states. By unifying sequential (causal) and temporal (diffusion) reasoning in a single non-Markovian transformer, CaDDi also treats standard causal language models as a special case and permits the direct reuse of pretrained LLM weights with no architectural changes. Empirically, CaDDi outperforms state-of-the-art discrete diffusion baselines on natural-language benchmarks, substantially narrowing the remaining gap to large autoregressive transformers.
Article 802
Title@2025-05-27 (2): MIND-Stack: Modular, Interpretable, End-to-End Differentiability for Autonomous Navigation
Title: MIND-Stack: Modular, Interpretable, End-to-End Differentiability for Autonomous Navigation | MIND-Stack: Modular, interpretierbar, End-to-End-Unterscheidbarkeit für die autonome Navigation | MIND-Stack: 自主航行的模块、可解释、端到端至端差异 2505.21734v1 |
Authors: Felix Jahncke, Johannes Betz
Developing robust, efficient navigation algorithms is challenging. Rule-based methods offer interpretability and modularity but struggle with learning from large datasets, while end-to-end neural networks excel in learning but lack transparency and modularity. In this paper, we present MIND-Stack, a modular software stack consisting of a localization network and a Stanley Controller with intermediate human interpretable state representations and end-to-end differentiability. Our approach enables the upstream localization module to reduce the downstream control error, extending its role beyond state estimation. Unlike existing research on differentiable algorithms that either lack modules of the autonomous stack to span from sensor input to actuator output or real-world implementation, MIND-Stack offers both capabilities. We conduct experiments that demonstrate the ability of the localization module to reduce the downstream control loss through its end-to-end differentiability while offering better performance than state-of-the-art algorithms. We showcase sim-to-real capabilities by deploying the algorithm on a real-world embedded autonomous platform with limited computation power and demonstrate simultaneous training of both the localization and controller towards one goal. While MIND-Stack shows good results, we discuss the incorporation of additional modules from the autonomous navigation pipeline in the future, promising even greater stability and performance in the next iterations of the framework.
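For readers unfamiliar with the Stanley Controller mentioned above, the following is a minimal sketch of the commonly cited Stanley steering law: the steering angle is the heading error plus an arctangent term driven by the cross-track error and speed. The sign conventions, the gain, the softening constant, and the steering limit are assumptions chosen for illustration; the abstract does not specify how MIND-Stack parameterizes its controller.

```python
import math

def stanley_steering(heading_error, cross_track_error, speed,
                     gain=1.0, softening=1e-3, max_steer=math.radians(30)):
    """Return a steering angle in radians.

    heading_error:     path heading minus vehicle heading (rad), assumed convention
    cross_track_error: signed lateral offset from the path (m), assumed convention
    speed:             forward speed (m/s)
    gain, softening, max_steer: hypothetical tuning values
    """
    # Cross-track correction; the softening term avoids division by zero at standstill.
    correction = math.atan2(gain * cross_track_error, speed + softening)
    steer = heading_error + correction
    # Saturate to the assumed actuator limits.
    return max(-max_steer, min(max_steer, steer))
```

Away from the saturation limits, every operation here is smooth in its inputs, which is what allows a control loss to be backpropagated into an upstream localization network in an end-to-end differentiable stack.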
Article 803
Title@2025-05-27 (2): LaX: Boosting Low-Rank Training of Foundation Models via Latent Crossing
Title: LaX: Boosting Low-Rank Training of Foundation Models via Latent Crossing | LaX: Förderung der Low-Rank-Schulung von Stiftungsmodellen durch Latent Crossing | LaX:通过中转交叉促进基金会模型的低射速培训 2505.21732v1 |
Authors: Ruijie Zhang, Ziyue Liu, Zhengyang Wang, Zheng Zhang
Training foundation models such as ViTs and LLMs incurs tremendous computing cost. Low-rank matrix or tensor factorization offers a parameter-efficient alternative, but often downgrades performance due to the restricted parameter space. In this work, we introduce Latent Crossing (LaX), a simple yet effective plug-and-play module that enhances the capacity of low-rank models by enabling information flow across low-rank subspaces. We extensively validate the benefits of LaX on pre-training tasks with ViT-Base/Large and LLaMA-like models ranging from 60M to 1B parameters. LaX boosts low-rank model performance to match or exceed the full-rank baselines while using 2-3$\times$ fewer parameters. When equipped with low-rank adapters (i.e., LoRA) for fine-tuning LLaMA-7/13B, LaX consistently improves performance on arithmetic and common sense reasoning tasks with negligible cost.
Article 804
Title@2025-05-27 (2): Deep Reinforcement Learning Agents are not even close to Human Intelligence
Title: Deep Reinforcement Learning Agents are not even close to Human Intelligence | Deep Enforcement Learning Agents sind nicht einmal der menschlichen Intelligenz nahe | 深强化学习代理机构甚至离人类情报机构不近 2505.21731v1 |
Authors: Quentin Delfosse, Jannis Blüml, Fabian Tatai, Théo Vincent, Bjarne Gregori, Elisabeth Dillies, Jan Peters, Constantin Rothkopf, Kristian Kersting
Deep reinforcement learning (RL) agents achieve impressive results in a wide variety of tasks, but they lack zero-shot adaptation capabilities. While most robustness evaluations focus on task complexifications, on which humans also struggle to maintain performance, no evaluation has been performed on task simplifications. To tackle this issue, we introduce HackAtari, a set of task variations of the Arcade Learning Environments. We use it to demonstrate that, contrary to humans, RL agents systematically exhibit huge performance drops on simpler versions of their training tasks, uncovering agents' consistent reliance on shortcuts. Our analysis across multiple algorithms and architectures highlights the persistent gap between RL agents and human behavioral intelligence, underscoring the need for new benchmarks and methodologies that enforce systematic generalization testing beyond static evaluation protocols. Training and testing in the same environment is not enough to obtain agents equipped with human-like intelligence.
Article 805
Title@2025-05-27 (2): Are Statistical Methods Obsolete in the Era of Deep Learning?
Title: Are Statistical Methods Obsolete in the Era of Deep Learning? | Sind statistische Methoden im Zeitalter des tiefen Lernens überholt? | 统计方法是否在深层学习时代过时? 2505.21723v1 |
Authors: Skyler Wu, Shihao Yang, S. C. Kou
In the era of AI, neural networks have become increasingly popular for modeling, inference, and prediction, largely due to their potential for universal approximation. With the proliferation of such deep learning models, a question arises: are leaner statistical methods still relevant? To shed light on this question, we employ the mechanistic nonlinear ordinary differential equation (ODE) inverse problem as a testbed, using the physics-informed neural network (PINN) as a representative of the deep learning paradigm and manifold-constrained Gaussian process inference (MAGI) as a representative of statistically principled methods. Through case studies involving the SEIR model from epidemiology and the Lorenz model from chaotic dynamics, we demonstrate that statistical methods are far from obsolete, especially when working with sparse and noisy observations. On tasks such as parameter inference and trajectory reconstruction, statistically principled methods consistently achieve lower bias and variance, while using far fewer parameters and requiring less hyperparameter tuning. Statistical methods can also decisively outperform deep learning models on out-of-sample future prediction, where the absence of relevant data often leads overparameterized models astray. Additionally, we find that statistically principled approaches are more robust to the accumulation of numerical imprecision and can represent the underlying system more faithfully to the true governing ODEs.
Article 806
Title@2025-05-27 (2): Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape
Title: Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape | Sattel-zu-Sattel-Dynamik in Deep ReLU Networks: Low-Rank Bias bei der ersten Sattelflucht | 深 ReLU 网络中的套装到套接的动态动态: 第一次套装逃跑中的低兰克比亚 2505.21722v1 |
Authors: Ioannis Bantzis, James B. Simon, Arthur Jacot
When a deep ReLU network is initialized with small weights, GD is at first dominated by the saddle at the origin in parameter space. We study the so-called escape directions, which play a similar role as the eigenvectors of the Hessian for strict saddles. We show that the optimal escape direction features a low-rank bias in its deeper layers: the first singular value of the $\ell$-th layer weight matrix is at least $\ell^{\frac{1}{4}}$ larger than any other singular value. We also prove a number of related results about these escape directions. We argue that this result is a first step in proving Saddle-to-Saddle dynamics in deep ReLU networks, where GD visits a sequence of saddles with increasing bottleneck rank.
Article 807
Title@2025-05-27 (2): CTBENCH: A Library and Benchmark for Certified Training
Title: CTBENCH: A Library and Benchmark for Certified Training | CTBENCH: Eine Bibliothek und Benchmark für zertifizierte Ausbildung | CTBENCH: 注册培训的图书馆和基准 2406.04848v4 |
Authors: Yuhao Mao, Stefan Balauca, Martin Vechev
Training certifiably robust neural networks is an important but challenging task. While many algorithms for (deterministic) certified training have been proposed, they are often evaluated on different training schedules, certification methods, and systematically under-tuned hyperparameters, making it difficult to compare their performance. To address this challenge, we introduce CTBench, a unified library and a high-quality benchmark for certified training that evaluates all algorithms under fair settings and systematically tuned hyperparameters. We show that (1) almost all algorithms in CTBench surpass the corresponding reported performance in the literature in the magnitude of algorithmic improvements, thus establishing a new state of the art, and (2) the claimed advantage of recent algorithms drops significantly when we enhance the outdated baselines with a fair training schedule, a fair certification method, and well-tuned hyperparameters. Based on CTBench, we provide new insights into the current state of certified training, including (1) certified models have a less fragmented loss surface, (2) certified models share many mistakes, (3) certified models have more sparse activations, (4) reducing regularization cleverly is crucial for certified training, especially for large radii, and (5) certified training has the potential to improve out-of-distribution generalization. We are confident that CTBench will serve as a benchmark and testbed for future research in certified training.
Article 808
Title@2025-05-27 (2): Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference
Title: Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference | Nahezu dimensionsunabhängige Konvergenz des mittleren Feldes Black-Box Variationale Schlussfolgerung | 中 - 现场黑 - 生物- 黑 - 生物- 黑 - 生物- 2505.21721v1 |
Authors: Kyurae Kim, Yi-An Ma, Trevor Campbell, Jacob R. Gardner
We prove that, given a mean-field location-scale variational family, black-box variational inference (BBVI) with the reparametrization gradient converges at an almost dimension-independent rate. Specifically, for strongly log-concave and log-smooth targets, the number of iterations for BBVI with a sub-Gaussian family to achieve an objective $\epsilon$-close to the global optimum is $\mathrm{O}(\log d)$, which improves over the $\mathrm{O}(d)$ dependence of full-rank location-scale families. For heavy-tailed families, we provide a weaker $\mathrm{O}(d^{2/k})$ dimension dependence, where $k$ is the number of finite moments. Additionally, if the Hessian of the target log-density is constant, the complexity is free of any explicit dimension dependence. We also prove that our bound on the gradient variance, which is key to our result, cannot be improved using only spectral bounds on the Hessian of the target log-density.
Article 809
Title@2025-05-27 (2): Simple Guidance Mechanisms for Discrete Diffusion Models
Title: Simple Guidance Mechanisms for Discrete Diffusion Models | Einfache Leitmechanismen für diskrete Diffusionsmodelle | 分辨传播模型的简单指导机制 2412.10193v3 |
Authors: Yair Schiff, Subham Sekhar Sahoo, Hao Phung, Guanghan Wang, Sam Boshar, Hugo Dalla-torre, Bernardo P. de Almeida, Alexander Rush, Thomas Pierrot, Volodymyr Kuleshov
Diffusion models for continuous data gained widespread adoption owing to their high quality generation and control mechanisms. However, controllable diffusion on discrete data faces challenges given that continuous guidance methods do not directly apply to discrete diffusion. Here, we provide a straightforward derivation of classifier-free and classifier-based guidance for discrete diffusion, as well as a new class of diffusion models that leverage uniform noise and that are more guidable because they can continuously edit their outputs. We improve the quality of these models with a novel continuous-time variational lower bound that yields state-of-the-art performance, especially in settings involving guidance or fast generation. Empirically, we demonstrate that our guidance mechanisms combined with uniform noise diffusion improve controllable generation relative to autoregressive and diffusion baselines on several discrete data domains, including genomic sequences, small molecule design, and discretized image generation.
Article 810
Title@2025-05-27 (2): Training Dynamics of In-Context Learning in Linear Attention
Title: Training Dynamics of In-Context Learning in Linear Attention | Trainingsdynamik des In-Context-Lernens in linearer Aufmerksamkeit | 线线性关注的内文学习培训动态 2501.16265v2 |
Authors: Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe
While attention-based models have demonstrated the remarkable ability of in-context learning (ICL), the theoretical understanding of how these models acquired this ability through gradient descent training is still preliminary. Towards answering this question, we study the gradient descent dynamics of multi-head linear self-attention trained for in-context linear regression. We examine two parametrizations of linear self-attention: one with the key and query weights merged as a single matrix (common in theoretical studies), and one with separate key and query matrices (closer to practical settings). For the merged parametrization, we show that the training dynamics has two fixed points and the loss trajectory exhibits a single, abrupt drop. We derive an analytical time-course solution for a certain class of datasets and initialization. For the separate parametrization, we show that the training dynamics has exponentially many fixed points and the loss exhibits saddle-to-saddle dynamics, which we reduce to scalar ordinary differential equations. During training, the model implements principal component regression in context with the number of principal components increasing over training time. Overall, we provide a theoretical description of how ICL abilities evolve during gradient descent training of linear attention, revealing abrupt acquisition or progressive improvements depending on how the key and query are parametrized.
Article 811
Title@2025-05-27 (2): Network classification through random walks
Title: Network classification through random walks | Netzwerkklassifizierung durch zufällige Spaziergänge | 通过随机行走进行网络分类 2505.21706v1 |
Authors: Gonzalo Travieso, Joao Merenda, Odemir M. Bruno
Network models have been widely used to study diverse systems and analyze their dynamic behaviors. Given the structural variability of networks, an intriguing question arises: Can we infer the type of system represented by a network based on its structure? This classification problem involves extracting relevant features from the network. Existing literature has proposed various methods that combine structural measurements and dynamical processes for feature extraction. In this study, we introduce a novel approach to characterize networks using statistics from random walks, which can be particularly informative about network properties. We present the employed statistical metrics and compare their performance on multiple datasets with other state-of-the-art feature extraction methods. Our results demonstrate that the proposed method is effective in many cases, often outperforming existing approaches, although some limitations are observed across certain datasets.
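To make the idea of random-walk features concrete, here is a minimal sketch that runs a simple random walk on an adjacency-list graph and computes one statistic: the fraction of nodes visited within a fixed number of steps. The abstract does not name the statistics the authors use, so `coverage` is a hypothetical feature chosen only to show the general workflow.

```python
import random

def random_walk(adj, start, steps, rng):
    """Return the node sequence of a simple random walk of the given length."""
    node = start
    path = [node]
    for _ in range(steps):
        node = rng.choice(adj[node])  # move to a uniformly chosen neighbor
        path.append(node)
    return path

def coverage(adj, start, steps, seed=0):
    """Fraction of distinct nodes visited; an illustrative (assumed) walk statistic."""
    rng = random.Random(seed)  # seeded for reproducibility
    path = random_walk(adj, start, steps, rng)
    return len(set(path)) / len(adj)

# Example graph: a ring of six nodes, each connected to its two neighbors.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
ring_coverage = coverage(ring, start=0, steps=20)
```

In a classification pipeline, several such statistics computed over many walks and start nodes would be concatenated into a feature vector per network and passed to a standard classifier.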
Article 812
Title@2025-05-27 (2): AMSFL: Adaptive Multi-Step Federated Learning via Gradient Difference-Based Error Modeling
Title: AMSFL: Adaptive Multi-Step Federated Learning via Gradient Difference-Based Error Modeling | AMSFL: Adaptives Multi-Step-Federated Learning über gradient Difference-based Error Modeling | AMSFL:通过基于差异的渐进错误建模进行适应性多阶段联邦学习 2505.21695v1 |
Authors: Ganglou Xu
Federated learning faces critical challenges in balancing communication efficiency and model accuracy. One key issue lies in the approximation of update errors without incurring high computational costs. In this paper, we propose a lightweight yet effective method called Gradient Difference Approximation (GDA), which leverages first-order information to estimate local error trends without computing the full Hessian matrix. The proposed method forms a key component of the Adaptive Multi-Step Federated Learning (AMSFL) framework and provides a unified error modeling strategy for large-scale multi-step adaptive training environments.
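The abstract does not give the GDA formula, so the sketch below shows only the generic first-order identity that such methods typically build on: a difference of two gradients approximates a Hessian-vector product, g(w + d) - g(w) ≈ H d, without ever forming H. For a quadratic objective the identity is exact, which makes it easy to check. The objective, the matrix `A`, and the function names are assumptions for illustration.

```python
def grad_quadratic(w, A, b):
    """Gradient of f(w) = 0.5 * w^T A w - b^T w for symmetric A, i.e. A w - b."""
    n = len(w)
    return [sum(A[i][j] * w[j] for j in range(n)) - b[i] for i in range(n)]

def gradient_difference(w, d, A, b):
    """Approximate H d using two gradient evaluations, g(w + d) - g(w)."""
    shifted = [wi + di for wi, di in zip(w, d)]
    g_shifted = grad_quadratic(shifted, A, b)
    g_base = grad_quadratic(w, A, b)
    return [gs - gb for gs, gb in zip(g_shifted, g_base)]

def hessian_vector_product(A, d):
    """Exact H d for the quadratic above, where the Hessian is A itself."""
    n = len(d)
    return [sum(A[i][j] * d[j] for j in range(n)) for i in range(n)]

A = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical symmetric Hessian
b = [0.0, 0.0]
w = [1.0, 2.0]
d = [0.5, -1.0]
approx = gradient_difference(w, d, A, b)
exact = hessian_vector_product(A, d)
```

Two gradient evaluations cost on the order of the parameter count, whereas storing the full Hessian is quadratic in it, which is the computational saving the abstract alludes to.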
Article 813
Title@2025-05-27 (2): What Data Enables Optimal Decisions? An Exact Characterization for Linear Optimization
Title: What Data Enables Optimal Decisions? An Exact Characterization for Linear Optimization | Welche Daten ermöglichen optimale Entscheidungen? Eine genaue Charakterisierung für lineare Optimierung | 什么数据能使最佳决定实现最佳决定? 线性优化的精确属性 2505.21692v1 |
Authors: Omar Bennouna, Amine Bennouna, Saurabh Amin, Asuman Ozdaglar
We study the fundamental question of how informative a dataset is for solving a given decision-making task. In our setting, the dataset provides partial information about unknown parameters that influence task outcomes. Focusing on linear programs, we characterize when a dataset is sufficient to recover an optimal decision, given an uncertainty set on the cost vector. Our main contribution is a sharp geometric characterization that identifies the directions of the cost vector that matter for optimality, relative to the task constraints and uncertainty set. We further develop a practical algorithm that, for a given task, constructs a minimal or least-costly sufficient dataset. Our results reveal that small, well-chosen datasets can often fully determine optimal decisions – offering a principled foundation for task-aware data selection.
Article 814
Title@2025-05-27 (2): LLMPR: A Novel LLM-Driven Transfer Learning based Petition Ranking Model
Title: LLMPR: A Novel LLM-Driven Transfer Learning based Petition Ranking Model | LLMPR: Ein neuartiges LLM-getriebenes Transfer-Learning-basiertes Petitions-Ranking-Modell | LLMPR:基于请愿排级的新式LLM-驱动转移学习模式 2505.21689v1 |
Authors: Avijit Gayen, Somyajit Chakraborty, Mainak Sen, Soham Paul, Angshuman Jana
The persistent accumulation of unresolved legal cases, especially within the Indian judiciary, significantly hampers the timely delivery of justice. Manual methods of prioritizing petitions are often prone to inefficiencies and subjective biases, further exacerbating delays. To address this issue, we propose LLMPR (Large Language Model-based Petition Ranking), an automated framework that utilizes transfer learning and machine learning to assign priority rankings to legal petitions based on their contextual urgency. Leveraging the ILDC dataset comprising 7,593 annotated petitions, we process unstructured legal text and extract features through various embedding techniques, including DistilBERT, LegalBERT, and MiniLM. These textual embeddings are combined with quantitative indicators such as gap days, rank scores, and word counts to train multiple machine learning models, including Random Forest, Decision Tree, XGBoost, LightGBM, and CatBoost. Our experiments demonstrate that Random Forest and Decision Tree models yield superior performance, with accuracy exceeding 99% and a Spearman rank correlation of 0.99. Notably, models using only numerical features achieve nearly optimal ranking results ($R^2$ = 0.988, $\rho$ = 0.998), while LLM-based embeddings offer only marginal gains. These findings suggest that automated petition ranking can effectively streamline judicial workflows, reduce case backlog, and improve fairness in legal prioritization.
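Since the results above are reported as a Spearman rank correlation, a short sketch of that metric may help. It uses the textbook formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is valid only when there are no tied values; the example scores are invented and unrelated to the paper's data.

```python
def ranks(values):
    """Assign ranks 1..n by ascending value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank correlation for two equal-length sequences without ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical predicted vs. reference priority scores for four petitions.
predicted = [0.9, 0.4, 0.7, 0.1]
reference = [0.8, 0.5, 0.6, 0.2]
rho = spearman_rho(predicted, reference)
```

A value near 1 means the predicted ordering of petitions almost matches the reference ordering, regardless of how far apart the raw scores are.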
Article 815
Title@2025-05-27 (2): Empirical analysis of binding precedent efficiency in Brazilian Supreme Court via case classification
Title: Empirical analysis of binding precedent efficiency in Brazilian Supreme Court via case classification | Empirische Analyse der verbindlichen Präzedenzeffizienz im brasilianischen Obersten Gerichtshof über die Fallklassifizierung | 通过案件分类对巴西最高法院具有约束力的先例效率进行经验分析 2407.07004v3 |
Authors: Raphaël Tinarrage, Henrique Ennes, Lucas Resck, Lucas T. Gomes, Jean R. Ponciano, Jorge Poco
Binding precedents (súmulas vinculantes) constitute a juridical instrument unique to the Brazilian legal system and whose objectives include the protection of the Federal Supreme Court against repetitive demands. Studies of the effectiveness of these instruments in decreasing the Court’s exposure to similar cases, however, indicate that they tend to fail in such a direction, with some of the binding precedents seemingly creating new demands. We empirically assess the legal impact of five binding precedents, 11, 14, 17, 26, and 37, at the highest Court level through their effects on the legal subjects they address. This analysis is only possible through the comparison of the Court’s ruling about the precedents’ themes before they are created, which means that these decisions should be detected through techniques of Similar Case Retrieval, which we tackle from the angle of Case Classification. The contributions of this article are therefore twofold: on the mathematical side, we compare the use of different methods of Natural Language Processing – TF-IDF, LSTM, Longformer, and regex – for Case Classification, whereas on the legal side, we contrast the inefficiency of these binding precedents with a set of hypotheses that may justify their repeated usage. We observe that the TF-IDF models performed slightly better than LSTM and Longformer when compared through common metrics; however, the deep learning models were able to detect certain important legal events that TF-IDF missed. On the legal side, we argue that the reasons for binding precedents to fail in responding to repetitive demand are heterogeneous and case-dependent, making it impossible to single out a specific cause. We identify five main hypotheses, which are found in different combinations in each of the precedents studied.
Article 816
Title@2025-05-27 (2): Probabilistic Reasoning with LLMs for k-anonymity Estimation
Title: Probabilistic Reasoning with LLMs for k-anonymity Estimation | Probabilistische Begründung mit LLMs für k-Anonymitätsschätzung | K-匿名性估计法LLMs的概率推理 2503.09674v3 |
Authors: Jonathan Zheng, Sauvik Das, Alan Ritter, Wei Xu
Probabilistic reasoning is a key aspect of both human and artificial intelligence that allows for handling uncertainty and ambiguity in decision-making. In this paper, we introduce a new numerical reasoning task under uncertainty for large language models, focusing on estimating the privacy risk of user-generated documents containing privacy-sensitive information. We propose BRANCH, a new LLM methodology that estimates the k-privacy value of a text, i.e., the size of the population matching the given information. BRANCH factorizes a joint probability distribution of personal information as random variables. The probability of each factor in a population is estimated separately using a Bayesian network and combined to compute the final k-value. Our experiments show that this method successfully estimates the k-value 73% of the time, a 13% increase compared to o3-mini with chain-of-thought reasoning. We also find that LLM uncertainty is a good indicator for accuracy, as high-variance predictions are 37.47% less accurate on average.
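The arithmetic behind a factorized k-value estimate can be sketched as follows. Using the chain rule, the joint probability of a set of attributes is a product of (conditional) factor probabilities, and k is that product scaled by the population size. The population and the probabilities below are invented for illustration; how BRANCH actually obtains each factor from a Bayesian network and an LLM is not reproduced here.

```python
def estimate_k(population, factor_probabilities):
    """Expected number of people matching all attributes.

    factor_probabilities: chain-rule factors such as
    P(a1), P(a2 | a1), P(a3 | a1, a2), each in [0, 1].
    """
    k = float(population)
    for p in factor_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each factor must be a probability in [0, 1]")
        k *= p
    return k

# Hypothetical example: a city of one million people and three disclosed attributes.
population = 1_000_000
factors = [0.5, 0.1, 0.02]  # assumed values, e.g. P(gender), P(job | gender), P(age | job, gender)
k_value = estimate_k(population, factors)
```

A small k means few people match the disclosed attributes, so the text carries a higher re-identification risk.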
Article 817
Title@2025-05-27 (2): Improving User Behavior Prediction: Leveraging Annotator Metadata in Supervised Machine Learning Models
Title: Improving User Behavior Prediction: Leveraging Annotator Metadata in Supervised Machine Learning Models | Verbesserung der Benutzerverhaltensvorhersage: Annotator-Metadaten in überwachten Machine Learning-Modellen nutzen | 改进用户行为预测:在受监督的机器学习模型中利用标记元数据 2503.21000v2 |
Authors: Lynnette Hui Xian Ng, Kokil Jaidka, Kaiyuan Tay, Hansin Ahuja, Niyati Chhaya
Supervised machine-learning models often underperform in predicting user behaviors from conversational text, hindered by poor crowdsourced label quality and low NLP task accuracy. We introduce the Metadata-Sensitive Weighted-Encoding Ensemble Model (MSWEEM), which integrates annotator meta-features like fatigue and speeding. First, our results show MSWEEM outperforms standard ensembles by 14% on held-out data and 12% on an alternative dataset. Second, we find that incorporating signals of annotator behavior, such as speed and fatigue, significantly boosts model performance. Third, we find that annotators with higher qualifications, such as Master’s, deliver more consistent and faster annotations. Given the increasing uncertainty over annotation quality, our experiments show that understanding annotator patterns is crucial for enhancing model accuracy in user behavior prediction.
Article 818
Title@2025-05-27 (2): tenSVD algorithm for compression
Title: tenSVD algorithm for compression | tenSVD-Algorithmus zur Kompression | 用于压缩的 10SVD 算法 2505.21686v1 |
Authors: Michele Gallo
Tensors provide a robust framework for managing high-dimensional data. Consequently, tensor analysis has emerged as an active research area in various domains, including machine learning, signal processing, computer vision, graph analysis, and data mining. This study introduces an efficient image storage approach utilizing tensors, aiming to minimize the memory needed to store, the bandwidth needed to transmit, and the energy needed to process images. The proposed method organizes original data into a higher-order tensor and applies the Tucker model for compression. Implemented in R, this method is compared to a baseline algorithm. The evaluation focuses on the efficiency of the algorithm, measured in terms of computational time and the quality of information preserved, using both simulated and real datasets. A detailed analysis of the results is conducted, employing established quantitative metrics, with significant attention paid to sustainability in terms of energy consumption across algorithms.
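The storage saving of a Tucker model follows from simple counting: a tensor of shape (I1, ..., IN) compressed to multilinear ranks (r1, ..., rN) stores a core of r1 * ... * rN entries plus one I_n by r_n factor matrix per mode. The sketch below computes that count in Python (the paper's own implementation is in R); the image shape and ranks are hypothetical.

```python
from math import prod

def tucker_storage(shape, ranks):
    """Number of stored values for a Tucker model: core plus factor matrices."""
    if len(shape) != len(ranks):
        raise ValueError("shape and ranks must have the same number of modes")
    core = prod(ranks)
    factors = sum(n * r for n, r in zip(shape, ranks))
    return core + factors

def compression_ratio(shape, ranks):
    """Ratio of original entries to Tucker entries (larger means more compression)."""
    return prod(shape) / tucker_storage(shape, ranks)

# Hypothetical 256 x 256 RGB image, compressed to ranks (32, 32, 3).
shape = (256, 256, 3)
ranks = (32, 32, 3)
stored = tucker_storage(shape, ranks)
ratio = compression_ratio(shape, ranks)
```

This counts only stored values; the reconstruction quality at a given rank depends on the data and is what the paper's evaluation measures.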
Article 819
Title@2025-05-27 (2): Edit Distance Robust Watermarks via Indexing Pseudorandom Codes
Title: Edit Distance Robust Watermarks via Indexing Pseudorandom Codes | Entfernung bearbeiten Robuste Wasserzeichen über Indexierung Pseudorandom Codes | 通过索引化 Peredorandom 代码编辑远程硬体水印 2406.02633v2 |
Authors: Noah Golowich, Ankur Moitra
Motivated by the problem of detecting AI-generated text, we consider the problem of watermarking the output of language models with provable guarantees. We aim for watermarks which satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn & Zamir (2024) which stipulates that it is computationally hard to distinguish watermarked language model outputs from the model’s actual output distribution; and (b) robustness to channels which introduce a constant fraction of adversarial insertions, substitutions, and deletions to the watermarked text. Earlier schemes could only handle stochastic substitutions and deletions, and thus we are aiming for a more natural and appealing robustness guarantee that holds with respect to edit distance. Our main result is a watermarking scheme which achieves both undetectability and robustness to edits when the alphabet size for the language model is allowed to grow as a polynomial in the security parameter. To derive such a scheme, we follow an approach introduced by Christ & Gunn (2024), which proceeds via first constructing pseudorandom codes satisfying undetectability and robustness properties analogous to those above; our key idea is to handle adversarial insertions and deletions by interpreting the symbols as indices into the codeword, which we call indexing pseudorandom codes. Additionally, our codes rely on weaker computational assumptions than used in previous work. Then we show that there is a generic transformation from such codes over large alphabets to watermarking schemes for arbitrary language models.
Article 820
Title@2025-05-27 (2): Incentivizing Permissionless Distributed Learning of LLMs
Title: Incentivizing Permissionless Distributed Learning of LLMs | Anreize für das unbefugte Lernen von LLMs | 激励对LLMM的无自由分配的学习 2505.21684v1 |
Authors: Joel Lidin, Amir Sarfi, Evangelos Pappas, Samuel Dare, Eugene Belilovsky, Jacob Steeves
We describe an incentive system for distributed deep learning of foundational models where peers are rewarded for contributions. The incentive system, Gauntlet, has been deployed on the Bittensor blockchain and used to train a 1.2B LLM with completely permissionless contributions of pseudo-gradients: no control over the users that can register or their hardware. Gauntlet can be applied to any synchronous distributed training scheme that relies on aggregating updates or pseudo-gradients. We rely on a two-stage mechanism for fast filtering of peer uptime, reliability, and synchronization, combined with the core component that estimates the loss before and after individual pseudo-gradient contributions. We utilized an OpenSkill rating system to track competitiveness of pseudo-gradient scores across time. Finally, we introduce a novel mechanism to ensure peers on the network perform unique computations. Our live 1.2B run, which has paid out real-valued tokens to participants based on the value of their contributions, yielded a competitive (on a per-iteration basis) 1.2B model that demonstrates the utility of our incentive system.
Article 821
Title@2025-05-27 (2): multivariateGPT: a decoder-only transformer for multivariate categorical and numeric data
Title: multivariateGPT: a decoder-only transformer for multivariate categorical and numeric data | multivariateGPT: ein nur Decoder-Transformator für multivariate kategoriale und numerische Daten | 多个变量GPT: 用于多变量绝对数据和数字数据的解码器专用变压器 2505.21680v1 |
Authors: Andrew J. Loza, Jun Yup Kim, Shangzheng Song, Yihang Liu, Joseph J. Y. Sung, R Andrew Taylor, Dennis L. Shung
Real-world processes often generate data that are a mix of categorical and numeric values that are recorded at irregular and informative intervals. Discrete token-based approaches are limited in numeric representation capacity while methods like neural ordinary differential equations are not well suited for categorical data or informative sampling and require augmentation to handle certain classes of trajectories. Here, we present multivariateGPT, a single architecture for modeling sequences of mixed categorical (including tokenized text) and numeric data. This is accomplished with an autoregressive sequence decomposition, embedding scheme, and loss function that extend the next token prediction task to likelihood estimation of the joint distribution of next token class and value. We demonstrate how this approach can efficiently learn to generalize patterns in simple physical systems and model complex time series including electrocardiograms and multivariate electronic health record data. This work extends the utility of transformer based models to additional classes of data.
Article 822
Title@2025-05-27 (2): Fast meta-solvers for 3D complex-shape scatterers using neural operators trained on a non-scattering problem
Title: Fast meta-solvers for 3D complex-shape scatterers using neural operators trained on a non-scattering problem | Schnelle Meta-Lösung für 3D-Komplex-Spritzer mit neuronalen Operatoren, die auf einem nicht-streuenden Problem geschult sind | 使用神经操作员就非碎裂问题接受培训的3D复合碎片散散射器快速元解析器 2405.12380v2 |
Authors: Youngkyu Lee, Shanqing Liu, Zongren Zou, Adar Kahana, Eli Turkel, Rishikesh Ranade, Jay Pathak, George Em Karniadakis
Three-dimensional target identification using scattering techniques requires high-accuracy solutions and very fast computations for real-time predictions in some critical applications. We first train a deep neural operator (DeepONet) to solve wave propagation problems described by the Helmholtz equation in a domain without scatterers but at different wavenumbers and with a complex absorbing boundary condition. We then design two classes of fast meta-solvers by combining DeepONet with either relaxation methods, such as Jacobi and Gauss-Seidel, or with Krylov methods, such as GMRES and BiCGStab, using the trunk basis of DeepONet as a coarse-scale preconditioner. We leverage the spectral bias of neural networks to account for the lower part of the spectrum in the error distribution while the upper part is handled inexpensively using relaxation methods or fine-scale preconditioners. The meta-solvers are then applied to solve scattering problems with scatterers of different shapes, at no extra training cost. We first demonstrate that the resulting meta-solvers are shape-agnostic, fast, and robust, whereas the standard standalone solvers may even fail to converge without the DeepONet. We then apply both classes of meta-solvers to scattering from a submarine, a complex three-dimensional problem. We achieve very fast solutions, especially with the DeepONet-Krylov methods, which require orders of magnitude fewer iterations than any of the standalone solvers.
Article 823
Title@2025-05-27 (2): Robust LLM Alignment via Distributionally Robust Direct Preference Optimization
Title: Robust LLM Alignment via Distributionally Robust Direct Preference Optimization | Robuste LLM-Ausrichtung über distributiv robuste Direktpräferenzoptimierung | 通过分布式强力直接首选项优化对齐 2502.01930v2 |
Authors: Zaiyan Xu, Sushil Vemuri, Kishan Panaganti, Dileep Kalathil, Rahul Jain, Deepak Ramachandran
A major challenge in aligning large language models (LLMs) with human preferences is the issue of distribution shift. LLM alignment algorithms rely on static preference datasets, assuming that they accurately represent real-world user preferences. However, user preferences vary significantly across geographical regions, demographics, linguistic patterns, and evolving cultural trends. This preference distribution shift leads to catastrophic alignment failures in many real-world applications. We address this problem using the principled framework of distributionally robust optimization, and develop two novel distributionally robust direct preference optimization (DPO) algorithms, namely, Wasserstein DPO (WDPO) and Kullback-Leibler DPO (KLDPO). We characterize the sample complexity of learning the optimal policy parameters for WDPO and KLDPO. Moreover, we propose scalable gradient descent-style learning algorithms by developing suitable approximations for the challenging minimax loss functions of WDPO and KLDPO. Our empirical experiments using benchmark data sets and LLMs demonstrate the superior performance of WDPO and KLDPO in substantially improving the alignment when there is a preference distribution shift.
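For orientation, the standard (non-robust) DPO loss that WDPO and KLDPO build on can be sketched for a single preference pair as follows. This is only the well-known base objective; the distributionally robust minimax versions from the abstract are not shown, and the argument names and the default `beta` are illustrative assumptions.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) response pair.

    logp_*     : log-probabilities under the policy being trained
    ref_logp_* : log-probabilities under the frozen reference policy
    beta       : regularization strength (assumed default, not from the paper)
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss falls as the policy raises the preferred response's log-probability relative to the reference more than it raises the rejected one's.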
Article 824
Title@2025-05-27 (2): What happens when generative AI models train recursively on each others’ generated outputs?
Title: What happens when generative AI models train recursively on each others’ generated outputs? | Was passiert, wenn generative KI-Modelle rekursiv auf den jeweils anderen generierten Ausgängen trainieren? | 当基因化的AI模型对彼此产生的产出进行回溯性培训时会怎样呢? 2505.21677v1 |
Authors: Hung Ahn Vu, Galen Reeves, Emily Wenger
The internet is full of AI-generated content while also serving as a common source of training data for generative AI (genAI) models. This duality raises the possibility that future genAI models may be trained on other models’ generated outputs. Prior work has studied consequences of models training on their own generated outputs, but limited work has considered what happens if models ingest content produced by other models. Given society’s increasing dependence on genAI tools, understanding downstream effects of such data-mediated model interactions is critical. To this end, we provide empirical evidence for how data-mediated interactions might unfold in practice, develop a theoretical model for this interactive training process, and show experimentally possible long-term results of such interactions. We find that data-mediated interactions can benefit models by exposing them to novel concepts perhaps missed in original training data, but also can homogenize their performance on shared tasks.
Article 825
Title@2025-05-27 (2): In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention
Title: In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention | In-Context Lineare Regression Demystified: Trainingsdynamik und mechanistische Interpretierbarkeit von Multi-Head Softmax Achtung | 内负线倒退:对多头软体注意力进行动态和机械解释的培训 2503.12734v2 |
Authors: Jianliang He, Xintian Pan, Siyu Chen, Zhuoran Yang
We study how multi-head softmax attention models are trained to perform in-context learning on linear data. Through extensive empirical experiments and rigorous theoretical analysis, we demystify the emergence of elegant attention patterns: a diagonal and homogeneous pattern in the key-query (KQ) weights, and a last-entry-only and zero-sum pattern in the output-value (OV) weights. Remarkably, these patterns consistently appear from gradient-based training starting from random initialization. Our analysis reveals that such emergent structures enable multi-head attention to approximately implement a debiased gradient descent predictor, one that outperforms single-head attention and nearly achieves Bayesian optimality up to a proportional factor. Furthermore, compared to linear transformers, the softmax attention readily generalizes to sequences longer than those seen during training. We also extend our study to scenarios with anisotropic covariates and multi-task linear regression. In the former, multi-head attention learns to implement a form of pre-conditioned gradient descent. In the latter, we uncover an intriguing regime where the interplay between head number and task number triggers a superposition phenomenon that efficiently resolves multi-task in-context learning. Our results reveal that in-context learning ability emerges from the trained transformer as an aggregated effect of its architecture and the underlying data distribution, paving the way for deeper understanding and broader applications of in-context learning.
Article 826
Title@2025-05-27 (2): Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations
Title: Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations | Schnelles lebenslanges Adaptives Inverses Verstärktes Lernen aus Demonstrationen | 从示范活动中学习 2209.11908v8 |
Authors: Letian Chen, Sravan Jayanthi, Rohan Paleja, Daniel Martin, Viacheslav Zakharov, Matthew Gombolay
Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are not capable of fast adaptation to heterogeneous human demonstrations or of large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement Learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization; (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average of 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find users rate FLAIR as having higher task (p<.05) and personalization (p<.05) performance.
Article 827
Title@2025-05-27 (2): Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing
Title: Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing | Adaptive Frontier Exploration von Graphen mit Anwendungen für netzwerkbasierte Krankheitstests | 适应性边界探索应用网络基疾病测试图图的适应性边界探索 2505.21671v1 |
Authors: Davin Choo, Yuqi Pan, Tonghan Wang, Milind Tambe, Alastair van Heerden, Cheryl Johnson
We study a sequential decision-making problem on an $n$-node graph $G$ where each node has an unknown label from a finite set $\mathbf{\Sigma}$, drawn from a joint distribution $P$ that is Markov with respect to $G$. At each step, selecting a node reveals its label and yields a label-dependent reward. The goal is to adaptively choose nodes to maximize expected accumulated discounted rewards. We impose a frontier exploration constraint, where actions are limited to neighbors of previously selected nodes, reflecting practical constraints in settings such as contact tracing and robotic exploration. We design a Gittins index-based policy that applies to general graphs and is provably optimal when $G$ is a forest. Our implementation runs in $O(n^2 \cdot | \mathbf{\Sigma} | ^2)$ time while using $O(n \cdot | \mathbf{\Sigma} | ^2)$ oracle calls to $P$ and $O(n^2 \cdot | \mathbf{\Sigma} | )$ space. Experiments on synthetic and real-world graphs show that our method consistently outperforms natural baselines, including in non-tree, budget-limited, and undiscounted settings. For example, in HIV testing simulations on real-world sexual interaction networks, our policy detects nearly all positive cases with only half the population tested, substantially outperforming other baselines.
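The frontier exploration constraint described in the abstract can be illustrated with a small sketch that computes which nodes are selectable at a given step. The adjacency-dict representation and the rule that any node may be chosen when nothing has been selected yet are assumptions made for illustration; the Gittins index policy itself is not shown.

```python
def frontier_actions(adjacency, selected):
    """Return the nodes that may be selected next.

    adjacency : dict mapping each node to an iterable of its neighbors
    selected  : set of nodes already selected
    """
    if not selected:
        # Assumption: with no prior selections, any node may start the exploration.
        return set(adjacency)
    allowed = set()
    for node in selected:
        allowed.update(adjacency[node])
    # Only unselected neighbors of previously selected nodes are valid actions.
    return allowed - set(selected)


graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(frontier_actions(graph, {0}))     # neighbors of node 0
print(frontier_actions(graph, {0, 1}))  # frontier grows as node 1 is added
```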
Article 828
Title@2025-05-27 (2): Efficient Controllable Diffusion via Optimal Classifier Guidance
Title: Efficient Controllable Diffusion via Optimal Classifier Guidance | Effiziente steuerbare Diffusion über Optimal Classifier Guidance | 通过最佳分类指南有效控制可控扩散 2505.21666v1 |
Authors: Owen Oertell, Shikun Sun, Yiding Chen, Jin Peng Zhou, Zhiyong Wang, Wen Sun
The controllable generation of diffusion models aims to steer the model to generate samples that optimize some given objective functions. It is desirable for a variety of applications including image generation, molecule generation, and DNA/sequence generation. Reinforcement Learning (RL) based fine-tuning of the base model is a popular approach but it can overfit the reward function while requiring significant resources. We frame controllable generation as a problem of finding a distribution that optimizes a KL-regularized objective function. We present SLCD – Supervised Learning based Controllable Diffusion, which iteratively generates online data and trains a small classifier to guide the generation of the diffusion model. Similar to the standard classifier-guided diffusion, SLCD’s key computation primitive is classification and does not involve any complex concepts from RL or control. Via a reduction to no-regret online learning analysis, we show that under KL divergence, the output from SLCD provably converges to the optimal solution of the KL-regularized objective. Further, we empirically demonstrate that SLCD can generate high quality samples with nearly the same inference time as the base model in both image generation with continuous diffusion and biological sequence generation with discrete diffusion. Our code is available at https://github.com/Owen-Oertell/slcd
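As a worked illustration of a KL-regularized objective, the sketch below uses the common form $\mathbb{E}_p[r] - \frac{1}{\eta}\mathrm{KL}(p \,\|\, p_{\text{ref}})$ on a small discrete space, where the maximizer has the closed form $p^*(x) \propto p_{\text{ref}}(x)\exp(\eta\, r(x))$. The exact objective used by SLCD is not given in the abstract, so this form, the symbol $\eta$, and the toy numbers are assumptions.

```python
import math

def kl_regularized_optimum(p_ref, rewards, eta):
    """Closed-form maximizer of E_p[r] - (1/eta) * KL(p || p_ref) on a finite set."""
    weights = [q * math.exp(eta * r) for q, r in zip(p_ref, rewards)]
    total = sum(weights)
    return [w / total for w in weights]

def objective(p, p_ref, rewards, eta):
    """Value of the KL-regularized objective at distribution p."""
    expected_reward = sum(pi * r for pi, r in zip(p, rewards))
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, p_ref) if pi > 0)
    return expected_reward - kl / eta


p_ref = [0.5, 0.3, 0.2]     # toy base-model distribution (hypothetical)
rewards = [0.0, 1.0, 2.0]   # toy rewards (hypothetical)
p_star = kl_regularized_optimum(p_ref, rewards, eta=1.0)
print(p_star, objective(p_star, p_ref, rewards, eta=1.0))
```

The optimum tilts the base distribution toward high-reward outcomes while staying close to it; larger `eta` weakens the KL penalty.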
Article 829
Title@2025-05-27 (2): Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning
Title: Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning | Constraint-Adaptive Policy Switching für Offline-sicheres Ausbau-Lernen | 离线安全强化学习约束性强化政策转换 2412.18946v2 |
Authors: Yassine Chemingui, Aryan Deshwal, Honghao Wei, Alan Fern, Janardhan Rao Doppa
Offline safe reinforcement learning (OSRL) involves learning a decision-making policy from a fixed batch of training data that maximizes rewards while satisfying pre-defined safety constraints. However, adapting to varying safety constraints during deployment without retraining remains an under-explored challenge. To address this challenge, we introduce constraint-adaptive policy switching (CAPS), a wrapper framework around existing offline RL algorithms. During training, CAPS uses offline data to learn multiple policies with a shared representation that optimize different reward and cost trade-offs. During testing, CAPS switches between those policies by selecting at each state the policy that maximizes future rewards among those that satisfy the current cost constraint. Our experiments on 38 tasks from the DSRL benchmark demonstrate that CAPS consistently outperforms existing methods, establishing a strong wrapper-based baseline for OSRL. The code is publicly available at https://github.com/yassineCh/CAPS.
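The test-time switching rule described above can be sketched as a filter-then-argmax over candidate policies. This is a minimal illustration, not the released CAPS code: the tuple layout, the per-state reward and cost estimates, and the fallback to the lowest-cost policy when nothing is feasible are all assumptions.

```python
def select_policy(candidates, cost_budget):
    """Pick a policy for the current state.

    candidates  : list of (name, estimated_future_reward, estimated_future_cost)
    cost_budget : current cost constraint
    """
    feasible = [c for c in candidates if c[2] <= cost_budget]
    if not feasible:
        # Assumption: if no policy satisfies the constraint, use the cheapest one.
        return min(candidates, key=lambda c: c[2])[0]
    # Among feasible policies, take the one with the highest estimated reward.
    return max(feasible, key=lambda c: c[1])[0]


policies = [("aggressive", 10.0, 8.0), ("balanced", 7.0, 4.0), ("cautious", 3.0, 1.0)]
print(select_policy(policies, cost_budget=5.0))
```

Because the budget is an argument rather than a training-time constant, the same learned policies can serve different constraints at deployment.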
Article 830
Title@2025-05-27 (2): PreGenie: An Agentic Framework for High-quality Visual Presentation Generation
Title: PreGenie: An Agentic Framework for High-quality Visual Presentation Generation | PreGenie: Agentisches Framework für hochwertige visuelle Präsentationsgeneration | PreGenie:高质量视觉演示制作的代理框架 2505.21660v1 |
Authors: Xiaojie Xu, Xinli Xu, Sirui Chen, Haoyu Chen, Fan Zhang, Ying-Cong Chen
Visual presentations are vital for effective communication. Early attempts to automate their creation using deep learning often faced issues such as poorly organized layouts, inaccurate text summarization, and a lack of image understanding, leading to mismatched visuals and text. These limitations restrict their application in formal contexts like business and scientific research. To address these challenges, we propose PreGenie, an agentic and modular framework powered by multimodal large language models (MLLMs) for generating high-quality visual presentations. PreGenie is built on the Slidev presentation framework, where slides are rendered from Markdown code. It operates in two stages: (1) Analysis and Initial Generation, which summarizes multimodal input and generates initial code, and (2) Review and Re-generation, which iteratively reviews intermediate code and rendered slides to produce final, high-quality presentations. Each stage leverages multiple MLLMs that collaborate and share information. Comprehensive experiments demonstrate that PreGenie excels in multimodal understanding, outperforming existing models in both aesthetics and content consistency, while aligning more closely with human design preferences.
Article 831
Title@2025-05-27 (2): STACI: Spatio-Temporal Aleatoric Conformal Inference
Title: STACI: Spatio-Temporal Aleatoric Conformal Inference | STACI: Spatio-Temporale aleatorische Konforme Schlussfolgerung | STACI: 斯帕迪奥-时空空气迁移 2505.21658v1 |
Authors: Brandon R. Feng, David Keetae Park, Xihaier Luo, Arantxa Urdangarin, Shinjae Yoo, Brian J. Reich
Fitting Gaussian Processes (GPs) provides interpretable aleatoric uncertainty quantification for estimation of spatio-temporal fields. Spatio-temporal deep learning models, while scalable, typically assume a simplistic independent covariance matrix for the response, failing to capture the underlying correlation structure. However, spatio-temporal GPs suffer from issues of scalability and various forms of approximation bias resulting from restrictive assumptions of the covariance kernel function. We propose STACI, a novel framework consisting of a variational Bayesian neural network approximation of non-stationary spatio-temporal GP along with a novel spatio-temporal conformal inference algorithm. STACI is highly scalable, taking advantage of GPU training capabilities for neural network models, and provides statistically valid prediction intervals for uncertainty quantification. STACI outperforms competing GPs and deep methods in accurately approximating spatio-temporal processes and we show it easily scales to datasets with millions of observations.
Article 832
Title@2025-05-27 (2): Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations
Title: Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations | Erklärbarkeit großer Sprachmodelle mit SMILE: Statistische Modell-agnostische Interpretierbarkeit mit lokalen Erklärungen | 使用SMILE解释大语言模型的可解释性:统计模型 – – 与当地解释的可解释性 2505.21657v1 |
Authors: Zeinab Dehghani, Koorosh Aslansefat, Adil Khan, Mohammed Naveed Akram
Large language models like GPT, LLAMA, and Claude have become incredibly powerful at generating text, but they are still black boxes, so it is hard to understand how they decide what to say. That lack of transparency can be problematic, especially in fields where trust and accountability matter. To help with this, we introduce SMILE, a new method that explains how these models respond to different parts of a prompt. SMILE is model-agnostic and works by slightly changing the input, measuring how the output changes, and then highlighting which words had the most impact. It then creates simple visual heat maps showing which parts of a prompt matter the most. We tested SMILE on several leading LLMs and used metrics such as accuracy, consistency, stability, and fidelity to show that it gives clear and reliable explanations. By making these models easier to understand, SMILE brings us one step closer to making AI more transparent and trustworthy.
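The perturb-and-measure idea can be illustrated with the simplest possible variant: drop one word at a time and record how much a scalar output changes. This is not SMILE itself, whose perturbation scheme and distance measures are not specified in the abstract; `toy_score` is a hypothetical stand-in for a model response metric.

```python
def word_importance(prompt, score_fn):
    """Leave-one-word-out attribution: |score(prompt) - score(prompt without word)|."""
    words = prompt.split()
    base = score_fn(prompt)
    result = []
    for i, word in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        result.append((word, abs(base - score_fn(perturbed))))
    return result

def toy_score(text):
    # Hypothetical stand-in for a model output: counts the word "urgent".
    return float(text.lower().split().count("urgent"))


for word, impact in word_importance("please send urgent report", toy_score):
    print(f"{word:>8}: {impact}")
```

The per-word impacts are exactly the values a heat map over the prompt would color.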
Article 833
Title@2025-05-27 (2): BACON: A fully explainable AI model with graded logic for decision making problems
Title: BACON: A fully explainable AI model with graded logic for decision making problems | BACON: Ein voll erklärbares KI-Modell mit abgestufter Logik für Entscheidungsprobleme | 具有决策问题分级逻辑的完全可解释的AI模型 2505.14510v3 |
Authors: Haishi Bai, Jozo Dujmovic, Jianwu Wang
As machine learning models and autonomous agents are increasingly deployed in high-stakes, real-world domains such as healthcare, security, finance, and robotics, the need for transparent and trustworthy explanations has become critical. To ensure end-to-end transparency of AI decisions, we need models that are not only accurate but also fully explainable and human-tunable. We introduce BACON, a novel framework for automatically training explainable AI models for decision making problems using graded logic. BACON achieves high predictive accuracy while offering full structural transparency and precise, logic-based symbolic explanations, enabling effective human-AI collaboration and expert-guided refinement. We evaluate BACON with a diverse set of scenarios: classic Boolean approximation, Iris flower classification, house purchasing decisions and breast cancer diagnosis. In each case, BACON provides high-performance models while producing compact, human-verifiable decision logic. These results demonstrate BACON’s potential as a practical and principled approach for delivering crisp, trustworthy explainable AI.
Article 834
Title@2025-05-27 (2): AutoSGD: Automatic Learning Rate Selection for Stochastic Gradient Descent
Title: AutoSGD: Automatic Learning Rate Selection for Stochastic Gradient Descent | AutoSGD: Automatische Lernrate-Auswahl für stochastische Gradient Descent | AutoSGD: 存储渐变后代自动学习率选择 2505.21651v1 |
Authors: Nikola Surjanovic, Alexandre Bouchard-Côté, Trevor Campbell
The learning rate is an important tuning parameter for stochastic gradient descent (SGD) and can greatly influence its performance. However, appropriate selection of a learning rate schedule across all iterations typically requires a non-trivial amount of user tuning effort. To address this, we introduce AutoSGD: an SGD method that automatically determines whether to increase or decrease the learning rate at a given iteration and then takes appropriate action. We introduce theory supporting the convergence of AutoSGD, along with its deterministic counterpart for standard gradient descent. Empirical results suggest strong performance of the method on a variety of traditional optimization problems and machine learning tasks.
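To make the "increase or decrease" decision concrete, here is a deliberately simple grow-or-shrink heuristic on deterministic gradient descent. It is an illustrative assumption, not the AutoSGD rule from the paper: at each iteration it tries a larger and a smaller learning rate and keeps whichever trial step yields the lower objective value.

```python
def adaptive_gd(f, grad, x, lr=0.1, factor=2.0, steps=50):
    """Gradient descent that doubles or halves the learning rate every iteration."""
    for _ in range(steps):
        g = grad(x)
        candidates = [lr * factor, lr / factor]
        # Keep the candidate rate whose trial step gives the lower objective.
        lr = min(candidates, key=lambda a: f(x - a * g))
        x = x - lr * g
    return x, lr


# One-dimensional quadratic with its minimum at 0.
x_final, lr_final = adaptive_gd(lambda x: x * x, lambda x: 2.0 * x, x=5.0)
print(x_final, lr_final)
```

A stochastic version would need to compare noisy objective estimates, which is where a method with convergence theory such as the one described above becomes necessary.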
Article 835
Title@2025-05-27 (2): QuARI: Query Adaptive Retrieval Improvement
Title: QuARI: Query Adaptive Retrieval Improvement | QUARI: Abfrage Adaptive Verbesserung des Retrievals | QuARI: 查询适应性检索改进 2505.21647v1 |
Authors: Eric Xing, Abby Stylianou, Robert Pless, Nathan Jacobs
Massive-scale pretraining has made vision-language models increasingly popular for image-to-image and text-to-image retrieval across a broad collection of domains. However, these models do not perform well when used for challenging retrieval tasks, such as instance retrieval in very large-scale image collections. Recent work has shown that linear transformations of VLM features trained for instance retrieval can improve performance by emphasizing subspaces that relate to the domain of interest. In this paper, we explore a more extreme version of this specialization by learning to map a given query to a query-specific feature space transformation. Because this transformation is linear, it can be applied with minimal computational cost to millions of image embeddings, making it effective for large-scale retrieval or re-ranking. Results show that this method consistently outperforms state-of-the-art alternatives, including those that require many orders of magnitude more computation at query time.
Article 836
Title@2025-05-27 (2): PrivATE: Differentially Private Confidence Intervals for Average Treatment Effects
Title: PrivATE: Differentially Private Confidence Intervals for Average Treatment Effects | Private: Differenzielle private Vertrauensintervalle für durchschnittliche Behandlungseffekte | 普里瓦特:对平均待遇影响有区别的私人信任互换 2505.21641v1 |
Authors: Maresa Schröder, Justin Hartenstein, Stefan Feuerriegel
The average treatment effect (ATE) is widely used to evaluate the effectiveness of drugs and other medical interventions. In safety-critical applications like medicine, reliable inferences about the ATE typically require valid uncertainty quantification, such as through confidence intervals (CIs). However, estimating treatment effects in these settings often involves sensitive data that must be kept private. In this work, we present PrivATE, a novel machine learning framework for computing CIs for the ATE under differential privacy. Specifically, we focus on deriving valid privacy-preserving CIs for the ATE from observational data. Our PrivATE framework consists of three steps: (i) estimating a differentially private ATE through output perturbation; (ii) estimating the differentially private variance through a truncated output perturbation mechanism; and (iii) constructing the CIs while accounting for the uncertainty from both the estimation and privatization steps. Our PrivATE framework is model agnostic, doubly robust, and ensures valid CIs. We demonstrate the effectiveness of our framework using synthetic and real-world medical datasets. To the best of our knowledge, we are the first to derive a general, doubly robust framework for valid CIs of the ATE under ($\varepsilon$, $\delta$)-differential privacy.
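Steps (i) and (iii) can be sketched with the classical Gaussian mechanism: noise is added to the point estimate, and the interval is widened to account for that noise as well as the sampling error. This is a simplified illustration, not the PrivATE procedure: the sensitivity value is a hypothetical input, the standard error is treated as non-private, and the noise calibration used is the textbook one, which assumes $\varepsilon < 1$.

```python
import math
import random

def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (valid for epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def private_ci(estimate, std_error, sensitivity, epsilon, delta, z=1.96, rng=None):
    """Noisy ATE estimate with an interval covering sampling and privacy noise."""
    rng = rng or random.Random()
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    noisy = estimate + rng.gauss(0.0, sigma)           # output perturbation
    half_width = z * math.sqrt(std_error ** 2 + sigma ** 2)
    return noisy - half_width, noisy + half_width


low, high = private_ci(estimate=1.0, std_error=0.1, sensitivity=0.05,
                       epsilon=0.5, delta=1e-5, rng=random.Random(0))
print(low, high)
```

The interval is always wider than the non-private one, and the gap shrinks as `epsilon` grows or the sensitivity falls.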
Article 837
Title@2025-05-27 (2): Efficient Diffusion Models for Symmetric Manifolds
Title: Efficient Diffusion Models for Symmetric Manifolds | Effiziente Diffusionsmodelle für symmetrische Manifolds | 高效扩散对称操纵模型 2505.21640v1 |
Authors: Oren Mangoubi, Neil He, Nisheeth K. Vishnoi
We introduce a framework for designing efficient diffusion models for $d$-dimensional symmetric-space Riemannian manifolds, including the torus, sphere, special orthogonal group and unitary group. Existing manifold diffusion models often depend on heat kernels, which lack closed-form expressions and require either $d$ gradient evaluations or exponential-in-$d$ arithmetic operations per training step. We introduce a new diffusion model for symmetric manifolds with a spatially-varying covariance, allowing us to leverage a projection of Euclidean Brownian motion to bypass heat kernel computations. Our training algorithm minimizes a novel efficient objective derived via Ito’s Lemma, allowing each step to run in $O(1)$ gradient evaluations and nearly-linear-in-$d$ ($O(d^{1.19})$) arithmetic operations, reducing the gap between diffusions on symmetric manifolds and Euclidean space. Manifold symmetries ensure the diffusion satisfies an “average-case” Lipschitz condition, enabling accurate and efficient sample generation. Empirically, our model outperforms prior methods in training speed and improves sample quality on synthetic datasets on the torus, special orthogonal group, and unitary group.
Article 838
Title@2025-05-27 (2): Apprenticeship learning with prior beliefs using inverse optimization
Title: Apprenticeship learning with prior beliefs using inverse optimization | Lehrlingsstudium mit früheren Überzeugungen mit inverser Optimierung | 利用反向优化进行具有先入先信的学徒学习 2505.21639v1 |
Authors: Mauricio Junca, Esteban Leiva
The relationship between inverse reinforcement learning (IRL) and inverse optimization (IO) for Markov decision processes (MDPs) has been relatively underexplored in the literature, even though the two address the same problem. In this work, we revisit the relationship between the IO framework for MDPs, IRL, and apprenticeship learning (AL). We incorporate prior beliefs on the structure of the cost function into the IRL and AL problems, and demonstrate that the convex-analytic view of the AL formalism (Kamoutsi et al., 2021) emerges as a relaxation of our framework. Notably, the AL formalism is a special case in our framework when the regularization term is absent. Focusing on the suboptimal expert setting, we formulate the AL problem as a regularized min-max problem. The regularizer plays a key role in addressing the ill-posedness of IRL by guiding the search for plausible cost functions. To solve the resulting regularized convex-concave min-max problem, we use stochastic mirror descent (SMD) and establish convergence bounds for the proposed method. Numerical experiments highlight the critical role of regularization in learning cost vectors and apprentice policies.
Article 839
Title@2025-05-27 (2): Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives
Title: Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives | Ist Ihr LLM überladen Sie? Tokenization, Transparenz, und Incentives | 您的法学硕士是否对你太过苛刻? 2505.21627v1 |
Authors: Ander Artola Velasco, Stratis Tsirtsis, Nastaran Okati, Manuel Gomez-Rodriguez
State-of-the-art large language models require specialized hardware and substantial energy to operate. As a consequence, cloud-based services that provide access to large language models have become very popular. In these services, the price users pay for an output provided by a model depends on the number of tokens the model uses to generate it – they pay a fixed price per token. In this work, we show that this pricing mechanism creates a financial incentive for providers to strategize and misreport the (number of) tokens a model used to generate an output, and users cannot prove, or even know, whether a provider is overcharging them. However, we also show that, if an unfaithful provider is obliged to be transparent about the generative process used by the model, misreporting optimally without raising suspicion is hard. Nevertheless, as a proof-of-concept, we introduce an efficient heuristic algorithm that allows providers to significantly overcharge users without raising suspicion, highlighting the vulnerability of users under the current pay-per-token pricing mechanism. Further, to completely eliminate the financial incentive to strategize, we introduce a simple incentive-compatible token pricing mechanism. Under this mechanism, the price users pay for an output provided by a model depends on the number of characters of the output – they pay a fixed price per character. Along the way, to illustrate and complement our theoretical results, we conduct experiments with several large language models from the $\texttt{Llama}$, $\texttt{Gemma}$ and $\texttt{Ministral}$ families, and input prompts from the LMSYS Chatbot Arena platform.
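The incentive problem and the proposed fix can be seen in a toy comparison: two token sequences that decode to the same string cost different amounts under pay-per-token but the same under pay-per-character. The token splits and the integer rates below are made up for illustration.

```python
def price_per_token(tokens, rate):
    """Charge a fixed rate for every reported token."""
    return len(tokens) * rate

def price_per_character(tokens, rate):
    """Charge a fixed rate for every character of the decoded output."""
    return len("".join(tokens)) * rate


honest = ["token", "ization"]            # 2 tokens
inflated = ["to", "ken", "iz", "ation"]  # 4 tokens, same decoded string

assert "".join(honest) == "".join(inflated)
print(price_per_token(honest, 3), price_per_token(inflated, 3))          # differs
print(price_per_character(honest, 1), price_per_character(inflated, 1))  # identical
```

Since the user can count the characters they receive, the per-character price cannot be changed by reporting a different tokenization.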
Article 840
Title@2025-05-27 (2): Localized Weather Prediction Using Kolmogorov-Arnold Network-Based Models and Deep RNNs
Title: Localized Weather Prediction Using Kolmogorov-Arnold Network-Based Models and Deep RNNs | Lokalisierte Wettervorhersage mit Kolmogorov-Arnold-Netzwerk-basierten Modellen und tiefen RNNs | 利用Kolmogorov-Arnold网络模型和深区域网网 2505.22686v1 |
Authors: Ange-Clement Akazan, Verlon Roel Mbingui, Gnankan Landry Regis N’guessan, Issa Karambal
Weather forecasting is crucial for managing risks and economic planning, particularly in tropical Africa, where extreme events severely impact livelihoods. Yet, existing forecasting methods often struggle with the region’s complex, non-linear weather patterns. This study benchmarks deep recurrent neural networks such as $\texttt{LSTM}$, $\texttt{GRU}$, $\texttt{BiLSTM}$, and $\texttt{BiGRU}$, and Kolmogorov-Arnold-based models ($\texttt{KAN}$ and $\texttt{TKAN}$) for daily forecasting of temperature, precipitation, and pressure in two tropical cities: Abidjan, Cote d’Ivoire (Ivory Coast) and Kigali (Rwanda). We further introduce two customized variants of $\texttt{TKAN}$ that replace its original $\texttt{SiLU}$ activation function with $\texttt{GeLU}$ and $\texttt{MiSH}$, respectively. Using station-level meteorological data spanning from 2010 to 2024, we evaluate all the models on standard regression metrics. $\texttt{KAN}$ achieves highly accurate temperature prediction ($R^2=0.9986$ in Abidjan, $0.9998$ in Kigali, $\texttt{MSE} < 0.0014\,^\circ\mathrm{C}^2$), while $\texttt{TKAN}$ variants minimize absolute errors for precipitation forecasting in low-rainfall regimes. The customized $\texttt{TKAN}$ models demonstrate improvements over the standard $\texttt{TKAN}$ across both datasets. Classical $\texttt{RNNs}$ remain highly competitive for atmospheric pressure ($R^2 \approx 0.83{-}0.86$), outperforming $\texttt{KAN}$-based models in this task. These results highlight the potential of spline-based neural architectures for efficient and data-efficient forecasting.
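For readers less familiar with the two regression metrics quoted above, a minimal reference implementation of MSE and $R^2$ follows. The sample values are made up and unrelated to the paper's data.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]   # hypothetical observations
forecast = [1.1, 1.9, 3.2, 3.8]   # hypothetical forecasts
print(mse(observed, forecast), r2(observed, forecast))
```

An $R^2$ of 1 means a perfect fit, while 0 means the forecast does no better than always predicting the mean of the observations.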
Article 841
Title@2025-05-27 (2): Learning Where to Learn: Training Distribution Selection for Provable OOD Performance
Title: Learning Where to Learn: Training Distribution Selection for Provable OOD Performance | Lernen, wo man lernen kann: Training Distribution Selection for Provable OOD Performance | 学习从何学习:选择培训分布,以选择可实现的OOD业绩 2505.21626v1 |
Authors: Nicolas Guerra, Nicholas H. Nelsen, Yunan Yang
Out-of-distribution (OOD) generalization remains a fundamental challenge in machine learning. Models trained on one data distribution often experience substantial performance degradation when evaluated on shifted or unseen domains. To address this challenge, the present paper studies the design of training data distributions that maximize average-case OOD performance. First, a theoretical analysis establishes a family of generalization bounds that quantify how the choice of training distribution influences OOD error across a predefined family of target distributions. These insights motivate the introduction of two complementary algorithmic strategies: (i) directly formulating OOD risk minimization as a bilevel optimization problem over the space of probability measures and (ii) minimizing a theoretical upper bound on OOD error. Last, the paper evaluates the two approaches across a range of function approximation and operator learning examples. The proposed methods significantly improve OOD accuracy over standard empirical risk minimization with a fixed distribution. These results highlight the potential of distribution-aware training as a principled and practical framework for robust OOD generalization.
Article 842
Title@2025-05-27 (2): VideoMarkBench: Benchmarking Robustness of Video Watermarking
Title: VideoMarkBench: Benchmarking Robustness of Video Watermarking | VideoMarkBench: Benchmarking Robustheit von Video Watermarking | 视频MarkBench:视频水标记基准的坚实性 2505.21620v1 |
Authors: Zhengyuan Jiang, Moyang Guo, Kecen Li, Yuepeng Hu, Yupu Wang, Zhicong Huang, Cheng Hong, Neil Zhenqiang Gong
The rapid development of video generative models has led to a surge in highly realistic synthetic videos, raising ethical concerns related to disinformation and copyright infringement. Recently, video watermarking has been proposed as a mitigation strategy by embedding invisible marks into AI-generated videos to enable subsequent detection. However, the robustness of existing video watermarking methods against both common and adversarial perturbations remains underexplored. In this work, we introduce VideoMarkBench, the first systematic benchmark designed to evaluate the robustness of video watermarks under watermark removal and watermark forgery attacks. Our study encompasses a unified dataset generated by three state-of-the-art video generative models, across three video styles, incorporating four watermarking methods and seven aggregation strategies used during detection. We comprehensively evaluate 12 types of perturbations under white-box, black-box, and no-box threat models. Our findings reveal significant vulnerabilities in current watermarking approaches and highlight the urgent need for more robust solutions. Our code is available at https://github.com/zhengyuan-jiang/VideoMarkBench.
Article 843
Title@2025-05-27 (2): Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making
Title: Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making | Schweigen ist kein Konsens: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making | 沉默不是共识:通过用于临床决策的Catfish代理商在多方代理LLMs中破坏协议的偏见 2505.21503v1 |
Authors: Yihan Wang, Qiao Yan, Zhenghao Xing, Lihao Liu, Junjun He, Chi-Wing Fu, Xiaowei Hu, Pheng-Ann Heng
Large language models (LLMs) have demonstrated strong potential in clinical question answering, with recent multi-agent frameworks further improving diagnostic accuracy via collaborative reasoning. However, we identify a recurring issue of Silent Agreement, where agents prematurely converge on diagnoses without sufficient critical analysis, particularly in complex or ambiguous cases. We present a new concept called Catfish Agent, a role-specialized LLM designed to inject structured dissent and counter silent agreement. Inspired by the “catfish effect” in organizational psychology, the Catfish Agent is designed to challenge emerging consensus to stimulate deeper reasoning. We formulate two mechanisms to encourage effective and context-aware interventions: (i) a complexity-aware intervention that modulates agent engagement based on case difficulty, and (ii) a tone-calibrated intervention articulated to balance critique and collaboration. Evaluations on nine medical Q&A and three medical VQA benchmarks show that our approach consistently outperforms both single- and multi-agent LLM frameworks, including leading commercial models such as GPT-4o and DeepSeek-R1.
Article 844
Title@2025-05-27 (2): UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
Title: UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents | UI-Genie: Ein selbstverbesserender Ansatz zur iterativen Steigerung von MLLM-basierten mobilen GUI-Agenten | UI-Genie: 一种自我改进的方法,用于在刺激下促进基于MLLLM的移动图形界面工具 2505.21496v1 |
Authors: Han Xiao, Guozhi Wang, Yuxiang Chai, Zimu Lu, Weifeng Lin, Hao He, Lue Fan, Liuyang Bian, Rui Hu, Liang Liu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Aojun Zhou, Hongsheng Li
In this paper, we introduce UI-Genie, a self-improving framework addressing two key challenges in GUI agents: verification of trajectory outcome is challenging and high-quality training data are not scalable. These challenges are addressed by a reward model and a self-improving pipeline, respectively. The reward model, UI-Genie-RM, features an image-text interleaved architecture that efficiently processes historical context and unifies action-level and task-level rewards. To support the training of UI-Genie-RM, we develop deliberately-designed data generation strategies including rule-based verification, controlled trajectory corruption, and hard negative mining. To address the second challenge, a self-improvement pipeline progressively expands solvable complex GUI tasks by enhancing both the agent and reward models through reward-guided exploration and outcome verification in dynamic environments. For training the model, we generate UI-Genie-RM-517k and UI-Genie-Agent-16k, establishing the first reward-specific dataset for GUI agents while demonstrating high-quality synthetic trajectory generation without manual annotation. Experimental results show that UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks with three generations of data-model self-improvement. We open-source our complete framework implementation and generated datasets to facilitate further research in https://github.com/Euphoria16/UI-Genie.
Article 845
Title@2025-05-27 (2): Reinforcing General Reasoning without Verifiers
Title: Reinforcing General Reasoning without Verifiers | Verstärkung der allgemeinen Vernunft ohne Prüfer | 加强一般理由说明,无验证人 2505.21493v1 |
Authors: Xiangxin Zhou, Zichen Liu, Anya Sims, Haonan Wang, Tianyu Pang, Chongxuan Li, Liang Wang, Min Lin, Chao Du
The recent paradigm shift towards training large language models (LLMs) using DeepSeek-R1-Zero-style reinforcement learning (RL) on verifiable rewards has led to impressive advancements in code and mathematical reasoning. However, this methodology is limited to tasks where rule-based answer verification is possible and does not naturally extend to real-world domains such as chemistry, healthcare, engineering, law, biology, business, and economics. Current practical workarounds use an additional LLM as a model-based verifier; however, this introduces issues such as reliance on a strong verifier LLM, susceptibility to reward hacking, and the practical burden of maintaining the verifier model in memory during training. To address this and extend DeepSeek-R1-Zero-style training to general reasoning domains, we propose a verifier-free method (VeriFree) that bypasses answer verification and instead uses RL to directly maximize the probability of generating the reference answer. We compare VeriFree with verifier-based methods and demonstrate that, in addition to its significant practical benefits and reduced compute requirements, VeriFree matches and even surpasses verifier-based methods on extensive evaluations across MMLU-Pro, GPQA, SuperGPQA, and math-related benchmarks. Moreover, we provide insights into this method from multiple perspectives: as an elegant integration of training both the policy and implicit verifier in a unified model, and as a variational optimization approach. Code is available at https://github.com/sail-sg/VeriFree.
Article 846
Title@2025-05-27 (2): Be Decisive: Noise-Induced Layouts for Multi-Subject Generation
Title: Be Decisive: Noise-Induced Layouts for Multi-Subject Generation | Entscheidend sein: Lärminduzierte Layouts für die mehrteilige Generierung | Be Decisive: 多主题生成的噪音生成布局 2505.21488v1 |
Authors: Omer Dahary, Yehonathan Cohen, Or Patashnik, Kfir Aberman, Daniel Cohen-Or
Generating multiple distinct subjects remains a challenge for existing text-to-image diffusion models. Complex prompts often lead to subject leakage, causing inaccuracies in quantities, attributes, and visual features. Preventing leakage among subjects necessitates knowledge of each subject’s spatial location. Recent methods provide these spatial locations via an external layout control. However, enforcing such a prescribed layout often conflicts with the innate layout dictated by the sampled initial noise, leading to misalignment with the model’s prior. In this work, we introduce a new approach that predicts a spatial layout aligned with the prompt, derived from the initial noise, and refines it throughout the denoising process. By relying on this noise-induced layout, we avoid conflicts with externally imposed layouts and better preserve the model’s prior. Our method employs a small neural network to predict and refine the evolving noise-induced layout at each denoising step, ensuring clear boundaries between subjects while maintaining consistency. Experimental results show that this noise-aligned strategy achieves improved text-image alignment and more stable multi-subject generation compared to existing layout-guided techniques, while preserving the rich diversity of the model’s original distribution.
Article 847
Title@2025-05-27 (2): Hardware-Efficient Attention for Fast Decoding
Title: Hardware-Efficient Attention for Fast Decoding | Hardware-Effiziente Aufmerksamkeit für schnelle Dekodierung | 快速下标记的硬件高效关注 2505.21487v1 |
Authors: Ted Zadouri, Hubert Strauss, Tri Dao
LLM decoding is bottlenecked for large batches and long contexts by loading the key-value (KV) cache from high-bandwidth memory, which inflates per-token latency, while the sequential nature of decoding limits parallelism. We analyze the interplay among arithmetic intensity, parallelization, and model quality and question whether current architectures fully exploit modern hardware. This work redesigns attention to perform more computation per byte loaded from memory to maximize hardware efficiency without trading off parallel scalability. We first propose Grouped-Tied Attention (GTA), a simple variant that combines and reuses key and value states, reducing memory transfers without compromising model quality. We then introduce Grouped Latent Attention (GLA), a parallel-friendly latent attention paired with low-level optimizations for fast decoding while maintaining high model quality. Experiments show that GTA matches Grouped-Query Attention (GQA) quality while using roughly half the KV cache and that GLA matches Multi-head Latent Attention (MLA) and is easier to shard. Our optimized GLA kernel is up to 2$\times$ faster than FlashMLA, for example, in a speculative decoding setting when the query length exceeds one. Furthermore, by fetching a smaller KV cache per device, GLA reduces end-to-end latency and increases throughput in online serving benchmarks by up to 2$\times$.
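To make the memory-traffic argument concrete, the following back-of-the-envelope sketch computes the size of a standard KV cache for multi-head attention (all heads keep their own keys and values) versus grouped-query attention (fewer KV heads). The model configuration is hypothetical, and the sketch does not model GTA or GLA themselves, whose exact layouts the abstract does not specify.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bytes_per_elem=2):
    """Bytes needed to hold keys and values for every layer and every cached token."""
    # Factor 2 accounts for storing both a key and a value vector per KV head.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch_size


# Hypothetical configuration: 32 layers, head_dim 128, fp16, 8192-token context, batch 16.
mha = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=8192, batch_size=16)
gqa = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192, batch_size=16)

print(f"MHA cache: {mha / 2**30:.1f} GiB")
print(f"GQA cache: {gqa / 2**30:.1f} GiB")
```

Because each decoding step must read the whole cache from high-bandwidth memory, any scheme that shrinks it (fewer KV heads, shared key and value states, or a latent representation) reduces the bytes loaded per generated token.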
Article 848
Title@2025-05-27 (2): Algorithms and SQ Lower Bounds for Robustly Learning Real-valued Multi-index Models
Title: Algorithms and SQ Lower Bounds for Robustly Learning Real-valued Multi-index Models | Algorithmen und SQ Lower Bounds für robustes Lernen Real-valuierte Multi-Index-Modelle | 强力学习实时估价多指数模型的等级和 SQ 下角宽度 2505.21475v1 |
Authors: Ilias Diakonikolas, Giannis Iakovidis, Daniel M. Kane, Lisheng Ren
We study the complexity of learning real-valued Multi-Index Models (MIMs) under the Gaussian distribution. A $K$-MIM is a function $f:\mathbb{R}^d\to \mathbb{R}$ that depends only on the projection of its input onto a $K$-dimensional subspace. We give a general algorithm for PAC learning a broad class of MIMs with respect to the square loss, even in the presence of adversarial label noise. Moreover, we establish a nearly matching Statistical Query (SQ) lower bound, providing evidence that the complexity of our algorithm is qualitatively optimal as a function of the dimension. Specifically, we consider the class of bounded variation MIMs with the property that degree at most $m$ distinguishing moments exist with respect to projections onto any subspace. In the presence of adversarial label noise, the complexity of our learning algorithm is $d^{O(m)}2^{\mathrm{poly}(K/\epsilon)}$. For the realizable and independent noise settings, our algorithm incurs complexity $d^{O(m)}2^{\mathrm{poly}(K)}(1/\epsilon)^{O(K)}$. To complement our upper bound, we show that if for some subspace degree-$m$ distinguishing moments do not exist, then any SQ learner for the corresponding class of MIMs requires complexity $d^{\Omega(m)}$. As an application, we give the first efficient learner for the class of positive-homogeneous $L$-Lipschitz $K$-MIMs. The resulting algorithm has complexity $\mathrm{poly}(d) 2^{\mathrm{poly}(KL/\epsilon)}$. This gives a new PAC learning algorithm for Lipschitz homogeneous ReLU networks with complexity independent of the network size, removing the exponential dependence incurred in prior work.
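The definition of a $K$-MIM can be illustrated with a small sketch: the function value depends on the input only through its projection onto a $K$-dimensional subspace, so moving the input in orthogonal directions leaves the output unchanged. The projection matrix `W` and the link function (a sum of two ReLUs) are hypothetical choices for illustration.

```python
import random


def make_mim(W, link):
    """Return f(x) = link(W x), where W is a K x d matrix given as a list of rows."""
    def f(x):
        projection = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]
        return link(projection)
    return f


# Hypothetical 2-MIM in dimension d = 5: only the first two coordinates matter.
W = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
]
f = make_mim(W, link=lambda z: max(z[0], 0.0) + max(z[1], 0.0))  # sum of two ReLUs

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(5)]      # Gaussian input, as in the abstract
x_shifted = x[:2] + [v + 10.0 for v in x[2:]]       # change only the orthogonal coordinates

print(f(x), f(x_shifted))  # the two values coincide
```

The chosen link is positive-homogeneous and Lipschitz, so this toy function belongs to the kind of class mentioned in the application at the end of the abstract.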
Article 849
Title@2025-05-27 (2): Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions
Title: Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions | Annealing Flow Generative Modelle zur Probenahme hochdimensionaler und multi-Modalen Verteilungen | 用于取样的高多样性和多模式分布和多模式分布的Ananining流程生成模型 2409.20547v4 |
Authors: Dongze Wu, Yao Xie
Sampling from high-dimensional, multi-modal distributions remains a fundamental challenge across domains such as statistical Bayesian inference and physics-based machine learning. In this paper, we propose Annealing Flow (AF), a method built on Continuous Normalizing Flow (CNF) for sampling from high-dimensional and multi-modal distributions. AF is trained with a dynamic Optimal Transport (OT) objective incorporating Wasserstein regularization, and guided by annealing procedures, facilitating effective exploration of modes in high-dimensional spaces. Compared to recent NF methods, AF greatly improves training efficiency and stability, with minimal reliance on MC assistance. We demonstrate the superior performance of AF compared to state-of-the-art methods through experiments on various challenging distributions and real-world datasets, particularly in high-dimensional and multi-modal settings. We also highlight AF’s potential for sampling the least favorable distributions.
Article 850
Title@2025-05-27 (2): SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge
Title: SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge | SOSBENCH: Benchmarking der Sicherheitsausrichtung auf wissenschaftliche Erkenntnisse | SOSBENCH:以科学知识为安全协调基准 2505.21605v1 |
Authors: Fengqing Jiang, Fengbo Ma, Zhangchen Xu, Yuetai Li, Bhaskar Ramasubramanian, Luyao Niu, Bo Li, Xianyan Chen, Zhen Xiang, Radha Poovendran
Large language models (LLMs) exhibit advancing capabilities in complex tasks, such as reasoning and graduate-level question answering, yet their resilience against misuse, particularly involving scientifically sophisticated risks, remains underexplored. Existing safety benchmarks typically focus either on instructions requiring minimal knowledge comprehension (e.g., “tell me how to build a bomb”) or utilize prompts that are relatively low-risk (e.g., multiple-choice or classification tasks about hazardous content). Consequently, they fail to adequately assess model safety when handling knowledge-intensive, hazardous scenarios. To address this critical gap, we introduce SOSBench, a regulation-grounded, hazard-focused benchmark encompassing six high-risk scientific domains: chemistry, biology, medicine, pharmacology, physics, and psychology. The benchmark comprises 3,000 prompts derived from real-world regulations and laws, systematically expanded via an LLM-assisted evolutionary pipeline that introduces diverse, realistic misuse scenarios (e.g., detailed explosive synthesis instructions involving advanced chemical formulas). We evaluate frontier models within a unified evaluation framework using our SOSBench. Despite their alignment claims, advanced models consistently disclose policy-violating content across all domains, demonstrating alarmingly high rates of harmful responses (e.g., 79.1% for Deepseek-R1 and 47.3% for GPT-4.1). These results highlight significant safety alignment deficiencies and underscore urgent concerns regarding the responsible deployment of powerful LLMs.
Article 851
Title@2025-05-27 (2): Guide your favorite protein sequence generative model
Title: Guide your favorite protein sequence generative model | Führen Sie Ihre Lieblings-Protein-Sequenz generative Modell | 指导您最喜爱的蛋白质序列基因模型 2505.04823v2 |
Authors: Junhao Xiong, Hunter Nisonoff, Maria Lukarska, Ishan Gaur, Luke M. Oltrogge, David F. Savage, Jennifer Listgarten
Generative machine learning models on sequences are transforming protein engineering. However, no principled framework exists for conditioning these models on auxiliary information, such as experimental data, in a plug-and-play manner. Herein, we present ProteinGuide – a principled and general method for conditioning – by unifying a broad class of protein generative models under a single framework. We demonstrate the applicability of ProteinGuide by guiding two protein generative models, ProteinMPNN and ESM3, to generate amino acid and structure token sequences, conditioned on several user-specified properties such as enhanced stability, enzyme classes, and CATH-labeled folds. We also used ProteinGuide with inverse folding models and our own experimental assay to design adenine base editor sequences for high activity.
Article 852
Title@2025-05-27 (2): When Are Concepts Erased From Diffusion Models?
Title: When Are Concepts Erased From Diffusion Models? | Wann werden Konzepte von Diffusionsmodellen ausgelöscht? | 概念何时从传播模型中消失? 2505.17013v3 |
Authors: Kevin Lu, Nicky Kriplani, Rohit Gandikota, Minh Pham, David Bau, Chinmay Hegde, Niv Cohen
Concept erasure, the ability to selectively prevent a model from generating specific concepts, has attracted growing interest, with various approaches emerging to address the challenge. However, it remains unclear how thoroughly these methods erase the target concept. We begin by proposing two conceptual models for the erasure mechanism in diffusion models: (i) reducing the likelihood of generating the target concept, and (ii) interfering with the model’s internal guidance mechanisms. To thoroughly assess whether a concept has been truly erased from the model, we introduce a suite of independent evaluations. Our evaluation framework includes adversarial attacks, novel probing techniques, and analysis of the model’s alternative generations in place of the erased concept. Our results shed light on the tension between minimizing side effects and maintaining robustness to adversarial prompts. Broadly, our work underlines the importance of comprehensive evaluation for erasure in diffusion models.
Article 853
Title@2025-05-27 (2): On the Robustness of Adversarial Training Against Uncertainty Attacks
Title: On the Robustness of Adversarial Training Against Uncertainty Attacks | Über die Robustheit des zweifelhaften Trainings gegen Ungewissheitsangriffe | 关于防止不确定袭击的反逆训练的有力性 2410.21952v2 |
Authors: Emanuele Ledda, Giovanni Scodeller, Daniele Angioni, Giorgio Piras, Antonio Emanuele Cinà, Giorgio Fumera, Battista Biggio, Fabio Roli
In learning problems, the noise inherent to the task at hand hinders the possibility to infer without a certain degree of uncertainty. Quantifying this uncertainty, regardless of its wide use, assumes high relevance for security-sensitive applications. Within these scenarios, it becomes fundamental to guarantee good (i.e., trustworthy) uncertainty measures, which downstream modules can securely employ to drive the final decision-making process. However, an attacker may be interested in forcing the system to produce either (i) highly uncertain outputs jeopardizing the system’s availability or (ii) low uncertainty estimates, making the system accept uncertain samples that would instead require a careful inspection (e.g., human intervention). Therefore, it becomes fundamental to understand how to obtain robust uncertainty estimates against these kinds of attacks. In this work, we reveal both empirically and theoretically that defending against adversarial examples, i.e., carefully perturbed samples that cause misclassification, additionally guarantees a more secure, trustworthy uncertainty estimate under common attack scenarios without the need for an ad-hoc defense strategy. To support our claims, we evaluate multiple adversarial-robust models from the publicly available benchmark RobustBench on the CIFAR-10 and ImageNet datasets.
Article 854
Title@2025-05-27 (2): Causal Posterior Estimation
Title: Causal Posterior Estimation | Kausale hintere Schätzung | Causal Posides 估计值 2505.21468v1 |
Authors: Simon Dirmeier, Antonietta Mira
We present Causal Posterior Estimation (CPE), a novel method for Bayesian inference in simulator models, i.e., models where the evaluation of the likelihood function is intractable or too computationally expensive, but where one can simulate model outputs given parameter values. CPE utilizes a normalizing flow-based (NF) approximation to the posterior distribution which carefully incorporates the conditional dependence structure induced by the graphical representation of the model into the neural network. Thereby it is possible to improve the accuracy of the approximation. We introduce both discrete and continuous NF architectures for CPE and propose a constant-time sampling procedure for the continuous case which reduces the computational complexity of drawing samples to O(1) as for discrete NFs. We show, through an extensive experimental evaluation, that by incorporating the conditional dependencies induced by the graphical model directly into the neural network, rather than learning them from data, CPE is able to conduct highly accurate posterior inference either outperforming or matching the state of the art in the field.
Article 855
Title@2025-05-27 (2): GeLLMO: Generalizing Large Language Models for Multi-property Molecule Optimization
Title: GeLLMO: Generalizing Large Language Models for Multi-property Molecule Optimization | GeLLMO: Verallgemeinern von großen Sprachmodellen für Multi-Property-Molekül-Optimierung | GELLMO:通用多财产分子优化大语言模型 2502.13398v2 |
Authors: Vishal Dey, Xiao Hu, Xia Ning
Despite recent advancements, most computational methods for molecule optimization are constrained to single- or double-property optimization tasks and suffer from poor scalability and generalizability to novel optimization tasks. Meanwhile, Large Language Models (LLMs) demonstrate remarkable out-of-domain generalizability to novel tasks. To demonstrate LLMs’ potential for molecule optimization, we introduce MuMOInstruct, the first high-quality instruction-tuning dataset specifically focused on complex multi-property molecule optimization tasks. Leveraging MuMOInstruct, we develop GeLLMOs, a series of instruction-tuned LLMs for molecule optimization. Extensive evaluations across 5 in-domain and 5 out-of-domain tasks demonstrate that GeLLMOs consistently outperform state-of-the-art baselines. GeLLMOs also exhibit outstanding zero-shot generalization to unseen tasks, significantly outperforming powerful closed-source LLMs. Such strong generalizability demonstrates the tremendous potential of GeLLMOs as foundational models for molecule optimization, thereby tackling novel optimization tasks without resource-intensive retraining. MuMOInstruct, models, and code are accessible through https://github.com/ninglab/GeLLMO.
Article 856
Title@2025-05-27 (2): High-Dimensional Calibration from Swap Regret
Title: High-Dimensional Calibration from Swap Regret | Hochdimensionale Kalibrierung aus Swap-Regret | 从 Swap Regret 进行高维校准 2505.21460v1 |
Authors: Maxwell Fishelson, Noah Golowich, Mehryar Mohri, Jon Schneider
We study the online calibration of multi-dimensional forecasts over an arbitrary convex set $\mathcal{P} \subset \mathbb{R}^d$ relative to an arbitrary norm $\Vert\cdot\Vert$. We connect this with the problem of external regret minimization for online linear optimization, showing that if it is possible to guarantee $O(\sqrt{\rho T})$ worst-case regret after $T$ rounds when actions are drawn from $\mathcal{P}$ and losses are drawn from the dual $\Vert \cdot \Vert_*$ unit norm ball, then it is also possible to obtain $\epsilon$-calibrated forecasts after $T = \exp(O(\rho /\epsilon^2))$ rounds. When $\mathcal{P}$ is the $d$-dimensional simplex and $\Vert \cdot \Vert$ is the $\ell_1$-norm, the existence of $O(\sqrt{T\log d})$-regret algorithms for learning with experts implies that it is possible to obtain $\epsilon$-calibrated forecasts after $T = \exp(O(\log{d}/\epsilon^2)) = d^{O(1/\epsilon^2)}$ rounds, recovering a recent result of Peng (2025). Interestingly, our algorithm obtains this guarantee without requiring access to any online linear optimization subroutine or knowledge of the optimal rate $\rho$ – in fact, our algorithm is identical for every setting of $\mathcal{P}$ and $\Vert \cdot \Vert$. Instead, we show that the optimal regularizer for the above OLO problem can be used to upper bound the above calibration error by a swap regret, which we then minimize by running the recent TreeSwap algorithm with Follow-The-Leader as a subroutine. Finally, we prove that any online calibration algorithm that guarantees $\epsilon T$ $\ell_1$-calibration error over the $d$-dimensional simplex requires $T \geq \exp(\mathrm{poly}(1/\epsilon))$ (assuming $d \geq \mathrm{poly}(1/\epsilon)$). This strengthens the corresponding $d^{\Omega(\log{1/\epsilon})}$ lower bound of Peng, and shows that an exponential dependence on $1/\epsilon$ is necessary.
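The simplex special case follows from the general bound by direct substitution. Writing $c$ for the constant hidden in the $O(\cdot)$ (notation introduced here for illustration), the $O(\sqrt{T\log d})$ regret for learning with experts corresponds to $\rho = \log d$, and the step from the exponential form to the polynomial-in-$d$ form is:

```latex
\begin{align*}
T &= \exp\!\left(\frac{c\,\rho}{\epsilon^{2}}\right)
   = \exp\!\left(\frac{c\,\log d}{\epsilon^{2}}\right)
   = \left(e^{\log d}\right)^{c/\epsilon^{2}}
   = d^{\,c/\epsilon^{2}}
   = d^{O(1/\epsilon^{2})}.
\end{align*}
```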
Article 857
Title@2025-05-27 (2): Designing Cyclic Peptides via Harmonic SDE with Atom-Bond Modeling
Title: Designing Cyclic Peptides via Harmonic SDE with Atom-Bond Modeling | Konzipieren von Cyclic Peptides über Harmonische SDE mit Atom-Bond-Modellierung | 通过使用原子-体型建模的波力SDE, 设计圆性五氯苯并配有原子-体型建模 2505.21452v1 |
Authors: Xiangxin Zhou, Mingyu Li, Yi Xiao, Jiahan Li, Dongyu Xue, Zaixiang Zheng, Jianzhu Ma, Quanquan Gu
Cyclic peptides offer inherent advantages in pharmaceuticals. For example, cyclic peptides are more resistant to enzymatic hydrolysis compared to linear peptides and usually exhibit excellent stability and affinity. Although deep generative models have achieved great success in linear peptide design, several challenges prevent the development of computational methods for designing diverse types of cyclic peptides. These challenges include the scarcity of 3D structural data on target proteins and associated cyclic peptide ligands, the geometric constraints that cyclization imposes, and the involvement of non-canonical amino acids in cyclization. To address the above challenges, we introduce CpSDE, which consists of two key components: AtomSDE, a generative structure prediction model based on harmonic SDE, and ResRouter, a residue type predictor. Utilizing a routed sampling algorithm that alternates between these two models to iteratively update sequences and structures, CpSDE facilitates the generation of cyclic peptides. By employing explicit all-atom and bond modeling, CpSDE overcomes existing data limitations and is proficient in designing a wide variety of cyclic peptides. Our experimental results demonstrate that the cyclic peptides designed by our method exhibit reliable stability and affinity.
Article 858
Title@2025-05-27 (2): Training neural control variates using correlated configurations
Title: Training neural control variates using correlated configurations | Ausbildung von Neuralsteuerungsvariaten mit korrelierten Konfigurationen | 使用相关配置的培训神经控制变异 2505.07719v2 |
Authors: Hyunwoo Oh
Neural control variates (NCVs) have emerged as a powerful tool for variance reduction in Monte Carlo (MC) simulations, particularly in high-dimensional problems where traditional control variates are difficult to construct analytically. By training neural networks to learn auxiliary functions correlated with the target observable, NCVs can significantly reduce estimator variance while preserving unbiasedness. However, a critical but often overlooked aspect of NCV training is the role of autocorrelated samples generated by Markov Chain Monte Carlo (MCMC). While such samples are typically discarded for error estimation due to their statistical redundancy, they may contain useful information about the structure of the underlying probability distribution that can benefit the training process. In this work, we systematically examine the effect of using correlated configurations in training neural control variates. We demonstrate, both conceptually and numerically, that training on correlated data can improve control variate performance, especially in settings with limited computational resources. Our analysis includes empirical results from $U(1)$ gauge theory and scalar field theory, illustrating when and how autocorrelated samples enhance NCV construction. These findings provide practical guidance for the efficient use of MCMC data in training neural networks.
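For readers unfamiliar with control variates, the sketch below shows the classical (non-neural) version of the idea on a toy integral. It estimates $E[e^x]$ for $x \sim \mathrm{Uniform}(0,1)$ and subtracts a hand-picked auxiliary function $g(x) = x - 0.5$ whose mean is known to be zero. In an NCV, a neural network would play the role of $g$; the fixed $g$, the toy target, and the independent samples used here are assumptions for illustration and are not the setup studied in the paper.

```python
import math
import random
import statistics

random.seed(0)
xs = [random.random() for _ in range(20_000)]   # x ~ Uniform(0, 1)
f_vals = [math.exp(x) for x in xs]              # target observable, E[f] = e - 1
g_vals = [x - 0.5 for x in xs]                  # control variate with E[g] = 0 exactly

# Coefficient c = Cov(f, g) / Var(g), estimated from the samples.
mean_f = statistics.fmean(f_vals)
mean_g = statistics.fmean(g_vals)
cov_fg = sum((a - mean_f) * (b - mean_g) for a, b in zip(f_vals, g_vals)) / (len(xs) - 1)
c = cov_fg / statistics.variance(g_vals)

# Corrected observable: same expectation as f (since E[g] = 0), smaller variance.
corrected = [a - c * b for a, b in zip(f_vals, g_vals)]

plain_var = statistics.variance(f_vals)
cv_var = statistics.variance(corrected)
print(f"plain estimate:     {mean_f:.5f}  (sample variance {plain_var:.5f})")
print(f"corrected estimate: {statistics.fmean(corrected):.5f}  (sample variance {cv_var:.5f})")
```

Estimating `c` from the same samples introduces a small bias that vanishes as the sample size grows; using a fixed `c` or a held-out batch avoids it.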
Article 859
Title@2025-05-27 (2): When Two LLMs Debate, Both Think They’ll Win
Title: When Two LLMs Debate, Both Think They’ll Win | Wenn zwei LLMs diskutieren, denken beide, dass sie gewinnen werden | 当两个LLM 辩论, 双方都认为他们会赢 2505.19184v2 |
Authors: Pradyumna Shyama Prasad, Minh Nhat Nguyen
Can LLMs accurately adjust their confidence when facing opposition? Building on previous studies measuring calibration on static fact-based question-answering tasks, we evaluate Large Language Models (LLMs) in a dynamic, adversarial debate setting, uniquely combining two realistic factors: (a) a multi-turn format requiring models to update beliefs as new information emerges, and (b) a zero-sum structure to control for task-related uncertainty, since mutual high-confidence claims imply systematic overconfidence. We organized 60 three-round policy debates among ten state-of-the-art LLMs, with models privately rating their confidence (0-100) in winning after each round. We observed five concerning patterns: (1) Systematic overconfidence: models began debates with average initial confidence of 72.9% vs. a rational 50% baseline. (2) Confidence escalation: rather than reducing confidence as debates progressed, debaters increased their win probabilities, averaging 83% by the final round. (3) Mutual overestimation: in 61.7% of debates, both sides simultaneously claimed >=75% probability of victory, a logical impossibility. (4) Persistent self-debate bias: models debating identical copies increased confidence from 64.1% to 75.2%; even when explicitly informed their chance of winning was exactly 50%, confidence still rose (from 50.0% to 57.1%). (5) Misaligned private reasoning: models’ private scratchpad thoughts sometimes differed from their public confidence ratings, raising concerns about faithfulness of chain-of-thought reasoning. These results suggest LLMs lack the ability to accurately self-assess or update their beliefs in dynamic, multi-turn tasks; a major concern as LLM outputs are deployed without careful review in assistant roles or agentic settings.
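The zero-sum argument can be stated as simple arithmetic: exactly one side wins, so calibrated win probabilities for the two debaters must sum to 100%. The helper below is a hypothetical illustration of that check and is not code from the study.

```python
def excess_confidence(conf_a, conf_b):
    """Percentage points by which two debaters' stated win probabilities exceed 100."""
    return max(0.0, conf_a + conf_b - 100.0)


# Both sides claiming 75% leaves 50 percentage points that cannot both be right.
print(excess_confidence(75.0, 75.0))
# A rational pair of equally matched debaters has no excess.
print(excess_confidence(50.0, 50.0))
```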
Article 860
Title@2025-05-27 (2): Leveraging XP and CRISP-DM for Agile Data Science Projects
Title: Leveraging XP and CRISP-DM for Agile Data Science Projects | Nutzung von XP und CRISP-DM für agile Data Science Projekte | 利用XP和CRISP-DM为敏感数据科学项目发挥杠杆作用 2505.21603v1 |
Authors: Andre Massahiro Shimaoka, Renato Cordeiro Ferreira, Alfredo Goldman
This study explores the integration of eXtreme Programming (XP) and the Cross-Industry Standard Process for Data Mining (CRISP-DM) in agile Data Science projects. We conducted a case study at the e-commerce company Elo7 to answer the research question: How can the agility of the XP method be integrated with CRISP-DM in Data Science projects? Data was collected through interviews and questionnaires with a Data Science team consisting of data scientists, ML engineers, and data product managers. The results show that 86% of the team frequently or always applies CRISP-DM, while 71% adopt XP practices in their projects. Furthermore, the study demonstrates that it is possible to combine CRISP-DM with XP in Data Science projects, providing a structured and collaborative approach. Finally, the study generated improvement recommendations for the company.
Article 861
Title@2025-05-27 (2): Can Large Reasoning Models Self-Train?
Title: Can Large Reasoning Models Self-Train? | Können sich große vernünftigen Modelle selbst entwickeln? | 大理由模型能够自我培训吗? 2505.21444v1 |
Authors: Sheikh Shafayat, Fahim Tajwar, Ruslan Salakhutdinov, Jeff Schneider, Andrea Zanette
Scaling the performance of large language models (LLMs) increasingly depends on methods that reduce reliance on human supervision. Reinforcement learning from automated verification offers an alternative, but it incurs scalability limitations due to dependency upon human-designed verifiers. Self-training, where the model’s own judgment provides the supervisory signal, presents a compelling direction. We propose an online self-training reinforcement learning algorithm that leverages the model’s self-consistency to infer correctness signals and train without any ground-truth supervision. We apply the algorithm to challenging mathematical reasoning tasks and show that it quickly reaches performance levels rivaling reinforcement-learning methods trained explicitly on gold-standard answers. Additionally, we analyze inherent limitations of the algorithm, highlighting how the self-generated proxy reward initially correlated with correctness can incentivize reward hacking, where confidently incorrect outputs are favored. Our results illustrate how self-supervised improvement can achieve significant performance gains without external labels, while also revealing its fundamental challenges.
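A self-consistency signal of this kind can be sketched as a majority vote over sampled answers. This is a minimal illustration under the assumption that agreement with the most common answer is used as a binary proxy reward; the abstract does not spell out the exact rule, and `self_consistency_rewards` is a hypothetical name.

```python
from collections import Counter


def self_consistency_rewards(answers):
    """Reward 1.0 for samples that match the majority answer, else 0.0."""
    if not answers:
        return []
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]


# Hypothetical final answers sampled from the model for one prompt.
sampled = ["42", "42", "41", "42", "7"]
rewards = self_consistency_rewards(sampled)
print(rewards)
```

Note that the reward never consults a ground-truth label: if the model agrees with itself on a wrong answer, that answer is still rewarded, which is the reward-hacking risk the abstract points to.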
Article 862
Title@2025-05-27 (2): Autoencoding Random Forests
Title: Autoencoding Random Forests | Zufällige Wälder automatisch kodieren | 自动编码随机森林 2505.21441v1 |
Authors: Binh Duc Vu, Jan Kapar, Marvin Wright, David S. Watson
We propose a principled method for autoencoding with random forests. Our strategy builds on foundational results from nonparametric statistics and spectral graph theory to learn a low-dimensional embedding of the model that optimally represents relationships in the data. We provide exact and approximate solutions to the decoding problem via constrained optimization, split relabeling, and nearest neighbors regression. These methods effectively invert the compression pipeline, establishing a map from the embedding space back to the input space using splits learned by the ensemble’s constituent trees. The resulting decoders are universally consistent under common regularity assumptions. The procedure works with supervised or unsupervised models, providing a window into conditional or joint distributions. We demonstrate various applications of this autoencoder, including powerful new tools for visualization, compression, clustering, and denoising. Experiments illustrate the ease and utility of our method in a wide range of settings, including tabular, image, and genomic data.
Article 863
Title@2025-05-27 (2): ANCHOLIK-NER: A Benchmark Dataset for Bangla Regional Named Entity Recognition
Title: ANCHOLIK-NER: A Benchmark Dataset for Bangla Regional Named Entity Recognition | ANCHOLIK-NER: Ein Benchmark-Datensatz für Bangla Regional Named Entity Recognition | ANCHOLIK-NER:孟加拉地区命名实体识别基准数据集 2502.11198v3 |
Authors: Bidyarthi Paul, Faika Fairuj Preotee, Shuvashis Sarker, Shamim Rahim Refat, Shifat Islam, Tashreef Muhammad, Mohammad Ashraful Hoque, Shahriar Manzoor
Named Entity Recognition (NER) in regional dialects is a critical yet underexplored area in Natural Language Processing (NLP), especially for low-resource languages like Bangla. While NER systems for Standard Bangla have made progress, no existing resources or models specifically address the challenge of regional dialects such as Barishal, Chittagong, Mymensingh, Noakhali, and Sylhet, which exhibit unique linguistic features that existing models fail to handle effectively. To fill this gap, we introduce ANCHOLIK-NER, the first benchmark dataset for NER in Bangla regional dialects, comprising 17,405 sentences distributed across five regions. The dataset was sourced from publicly available resources and supplemented with manual translations, ensuring alignment of named entities across dialects. We evaluate three transformer-based models - Bangla BERT, Bangla BERT Base, and BERT Base Multilingual Cased - on this dataset. Our findings demonstrate that BERT Base Multilingual Cased performs best in recognizing named entities across regions, with notable performance in Mymensingh, where it reaches an F1-score of 82.611%. Despite strong overall performance, challenges remain in regions like Chittagong, where the models show lower precision and recall. Since no previous NER systems for Bangla regional dialects exist, our work represents a foundational step in addressing this gap. Future work will focus on improving model performance in underperforming regions and expanding the dataset to include more dialects, enhancing the development of dialect-aware NER systems.
Article 864
Title@2025-05-27 (2): Measuring Fine-Grained Relatedness in Multitask Learning via Data Attribution
Title: Measuring Fine-Grained Relatedness in Multitask Learning via Data Attribution | Messung der feinkörnigen Verbundenheit im Multitasking-Lernen über Datenzuweisung | 通过数据归责衡量多任务学习中的细微关联 2505.21438v1 |
Authors: Yiwen Tu, Ziqi Liu, Jiaqi W. Ma, Weijing Tang
Measuring task relatedness and mitigating negative transfer remain a critical open challenge in Multitask Learning (MTL). This work extends data attribution – which quantifies the influence of individual training data points on model predictions – to the MTL setting for measuring task relatedness. We propose the MultiTask Influence Function (MTIF), a method that adapts influence functions to MTL models with hard or soft parameter sharing. Compared to conventional task relatedness measurements, MTIF provides a fine-grained, instance-level relatedness measure beyond the entire-task level. This fine-grained relatedness measure enables a data selection strategy to effectively mitigate negative transfer in MTL. Through extensive experiments, we demonstrate that the proposed MTIF efficiently and accurately approximates the performance of models trained on data subsets. Moreover, the data selection strategy enabled by MTIF consistently improves model performance in MTL. Our work establishes a novel connection between data attribution and MTL, offering an efficient and fine-grained solution for measuring task relatedness and enhancing MTL models.
Article 865
Title@2025-05-27 (2): Distributional Scaling for Emergent Capabilities
Title: Distributional Scaling for Emergent Capabilities | Verteilungsskalierung für Emergent Capabilities | 新兴市场能力分配比例 2502.17356v3 |
Authors: Rosie Zhao, Tian Qin, David Alvarez-Melis, Sham Kakade, Naomi Saphra
This paper explores the nature of sudden breakthroughs in language model performance at scale, which stand in contrast to smooth improvements governed by scaling laws. While advocates of “emergence” view breakthroughs as unlocked capabilities, others attribute them to thresholding effects on noncontinuous metrics. We propose that breakthroughs are instead driven by continuous changes in the probability distribution of training outcomes when performance is bimodally distributed across random seeds. In synthetic length generalization tasks, we show that different random seeds can produce either highly linear or emergent scaling trends. We reveal that sharp breakthroughs in metrics are produced by underlying continuous changes in their distribution across seeds. Furthermore, we provide a case study of inverse scaling. We validate our distributional scaling framework on realistic settings by measuring MMLU performance in LM populations. These insights emphasize the role of random variation in the effect of scale on LM capabilities.
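The distributional view can be illustrated with a toy simulation. Everything below is an assumption made for illustration (the two accuracy modes, the logistic curve, and its midpoint at a log-scale of 5); it is not the paper's experimental setup.

```python
import math
import random

LOW, HIGH = 0.05, 0.95  # hypothetical accuracies of the two modes


def p_success(log_scale):
    """Smooth (logistic) probability that a seed lands in the high mode."""
    return 1.0 / (1.0 + math.exp(-(log_scale - 5.0)))


def sample_run(log_scale, rng):
    """Accuracy of one training run: bimodal, either LOW or HIGH."""
    return HIGH if rng.random() < p_success(log_scale) else LOW


def expected_accuracy(log_scale):
    """Population mean over seeds, which changes continuously with scale."""
    p = p_success(log_scale)
    return p * HIGH + (1.0 - p) * LOW


rng = random.Random(0)
for log_scale in range(2, 9):
    runs = [sample_run(log_scale, rng) for _ in range(5)]
    print(log_scale, round(expected_accuracy(log_scale), 3), runs)
```

Each individual run sits at one of the two modes, so a single seed can look like an abrupt breakthrough, while the mean over seeds moves smoothly as scale grows.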
Article 866
Title@2025-05-27 (2): Attribute-Efficient PAC Learning of Sparse Halfspaces with Constant Malicious Noise Rate
Title: Attribute-Efficient PAC Learning of Sparse Halfspaces with Constant Malicious Noise Rate | Effizientes PAC-Lernen von Sparse-Halbräumen mit konstanter bösartiger Lärmrate | 以常态恶意噪音率学习粗微半空空间的属性- 有效 PAC 学习 2505.21430v1 |
Authors: Shiwei Zeng, Jie Shen
Attribute-efficient learning of sparse halfspaces has been a fundamental problem in machine learning theory. In recent years, machine learning algorithms have faced prevalent data corruptions and even adversarial attacks. It is of central interest to design efficient algorithms that are robust to noise corruptions. In this paper, we consider the setting where there is a constant amount of malicious noise in the data, and the goal is to learn an underlying $s$-sparse halfspace $w^* \in \mathbb{R}^d$ with $\text{poly}(s,\log d)$ samples. Specifically, we follow a recent line of work and assume that the underlying distribution satisfies a certain concentration condition and a margin condition at the same time. Under such conditions, we show that attribute-efficiency can be achieved by simple variants of existing hinge loss minimization programs. Our key contributions include: 1) an attribute-efficient PAC learning algorithm that works under a constant malicious noise rate; 2) a new gradient analysis that carefully handles the sparsity constraint in hinge loss minimization.
Article 867
Title@2025-05-27 (2): QuForge: A Library for Qudits Simulation
Title: QuForge: A Library for Qudits Simulation | QuForge: Eine Bibliothek für Qudits Simulation | Quforge: Quits 模拟图书馆 2409.17716v2 |
Authors: Tiago de Souza Farias, Lucas Friedrich, Jonas Maziero
Quantum computing with qudits, an extension of qubits to multiple levels, is a research field less mature than qubit-based quantum computing. However, qudits can offer some advantages over qubits, by representing information with fewer separated components. In this article, we present QuForge, a Python-based library designed to simulate quantum circuits with qudits. This library provides the necessary quantum gates for implementing quantum algorithms, tailored to any chosen qudit dimension. Built on top of differentiable frameworks, QuForge supports execution on accelerating devices such as GPUs and TPUs, significantly speeding up simulations. It also supports sparse operations, leading to a reduction in memory consumption compared to other libraries. Additionally, by constructing quantum circuits as differentiable graphs, QuForge facilitates the implementation of quantum machine learning algorithms, enhancing the capabilities and flexibility of quantum computing research.
Article 868
Title@2025-05-27 (2): Stochastic Online Conformal Prediction with Semi-Bandit Feedback
Title: Stochastic Online Conformal Prediction with Semi-Bandit Feedback | Stochastische Online-Konforme Vorhersage mit Halbbandit Feedback | 具有半银行反馈的在线非正式预测 2405.13268v3 |
Authors: Haosen Ge, Hamsa Bastani, Osbert Bastani
Conformal prediction has emerged as an effective strategy for uncertainty quantification by modifying a model to output sets of labels instead of a single label. These prediction sets come with the guarantee that they contain the true label with high probability. However, conformal prediction typically requires a large calibration dataset of i.i.d. examples. We consider the online learning setting, where examples arrive over time, and the goal is to construct prediction sets dynamically. Departing from existing work, we assume semi-bandit feedback, where we only observe the true label if it is contained in the prediction set. For instance, consider calibrating a document retrieval model to a new domain; in this setting, a user would only be able to provide the true label if the target document is in the prediction set of retrieved documents. We propose a novel conformal prediction algorithm targeted at this setting, and prove that it obtains sublinear regret compared to the optimal conformal predictor. We evaluate our algorithm on a retrieval task, an image classification task, and an auction price-setting task, and demonstrate that it empirically achieves good performance compared to several baselines.
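The semi-bandit feedback model described above can be sketched in a few lines. This only illustrates how prediction sets and feedback interact; it is not the paper's algorithm, and the threshold rule, the function names, and the document scores are all assumptions.

```python
def prediction_set(scores, tau):
    """Labels whose model score is at least the threshold tau."""
    return {label for label, s in scores.items() if s >= tau}


def semi_bandit_feedback(pred_set, true_label):
    """The learner sees the true label only if the set contains it."""
    return true_label if true_label in pred_set else None


# Hypothetical retrieval scores for three candidate documents.
scores = {"doc_a": 0.6, "doc_b": 0.3, "doc_c": 0.1}

wide = prediction_set(scores, tau=0.2)
narrow = prediction_set(scores, tau=0.5)

seen = semi_bandit_feedback(wide, "doc_b")      # label is revealed
missed = semi_bandit_feedback(narrow, "doc_b")  # nothing is revealed
print(wide, narrow, seen, missed)
```

The trade-off is visible even here: a lower threshold yields larger sets that reveal the label more often, while a higher threshold gives smaller sets at the price of receiving no feedback when the true label is excluded.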
Article 869
Title@2025-05-27 (2): R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Title: R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing | R2R: Effizientes Navigieren unterschiedlicher Vernunftpfade mit klein-großen Model Token Routing | R2R: 以小型模型调速器有效导航差异性理性路径 2505.21600v1 |
Authors: Tianyu Fu, Yi Ge, Yichen You, Enshu Liu, Zhihang Yuan, Guohao Dai, Shengen Yan, Huazhong Yang, Yu Wang
Large Language Models (LLMs) achieve impressive reasoning capabilities at the cost of substantial inference overhead, posing substantial deployment challenges. Although distilled Small Language Models (SLMs) significantly enhance efficiency, their performance suffers as they fail to follow LLMs’ reasoning paths. Luckily, we reveal that only a small fraction of tokens genuinely diverge reasoning paths between LLMs and SLMs. Most generated tokens are either identical or exhibit neutral differences, such as minor variations in abbreviations or expressions. Leveraging this insight, we introduce Roads to Rome (R2R), a neural token routing method that selectively utilizes LLMs only for these critical, path-divergent tokens, while leaving the majority of token generation to the SLM. We also develop an automatic data generation pipeline that identifies divergent tokens and generates token-level routing labels to train the lightweight router. We apply R2R to combine R1-1.5B and R1-32B models from the DeepSeek family, and evaluate on challenging math, coding, and QA benchmarks. With an average activated parameter size of 5.6B, R2R surpasses the average accuracy of R1-7B by 1.6x, outperforming even the R1-14B model. Compared to R1-32B, it delivers a 2.8x wall-clock speedup with comparable performance, advancing the Pareto frontier of test-time scaling efficiency. Our code is available at https://github.com/thu-nics/R2R.
Article 870
Title@2025-05-27 (2): Policy Induction: Predicting Startup Success via Explainable Memory-Augmented In-Context Learning
Title: Policy Induction: Predicting Startup Success via Explainable Memory-Augmented In-Context Learning | Politische Induktion: Vorhersage des Startup-Erfolgs durch erklärbares Memory-Augmented In-Context Learning | 政策介绍:通过可解释的记忆增强的内文学习预测启动成功 2505.21427v1 |
Authors: Xianling Mu, Joseph Ternasky, Fuat Alican, Yigit Ihlamur
Early-stage startup investment is a high-risk endeavor characterized by scarce data and uncertain outcomes. Traditional machine learning approaches often require large, labeled datasets and extensive fine-tuning, yet remain opaque and difficult for domain experts to interpret or improve. In this paper, we propose a transparent and data-efficient investment decision framework powered by memory-augmented large language models (LLMs) using in-context learning (ICL). Central to our method is a natural language policy embedded directly into the LLM prompt, enabling the model to apply explicit reasoning patterns and allowing human experts to easily interpret, audit, and iteratively refine the logic. We introduce a lightweight training process that combines few-shot learning with an in-context learning loop, enabling the LLM to update its decision policy iteratively based on structured feedback. With only minimal supervision and no gradient-based optimization, our system predicts startup success far more accurately than existing benchmarks. It is over 20x more precise than random chance, which succeeds 1.9% of the time. It is also 7.1x more precise than the typical 5.6% success rate of top-tier venture capital (VC) firms.
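The two ratios quoted in the abstract can be cross-checked with simple arithmetic. The implied precision below is inferred from those ratios and is not a figure stated in the abstract; the variable names are illustrative.

```python
random_rate = 0.019  # random-chance success rate quoted in the abstract
vc_rate = 0.056      # typical top-tier VC success rate quoted in the abstract

# "7.1x more precise" than the VC rate implies this precision (about 0.398).
implied_precision = 7.1 * vc_rate

# Ratio of that implied precision to random chance (about 20.9).
lift_over_random = implied_precision / random_rate

print(f"implied precision: {implied_precision:.4f}")
print(f"lift over random:  {lift_over_random:.1f}x")
```

Under this reading the two claims agree with each other: a precision of roughly 40% is both about 7.1 times the VC rate and a little over 20 times random chance.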
Article 871
Title@2025-05-27 (2): Learning Individual Behavior in Agent-Based Models with Graph Diffusion Networks
Title: Learning Individual Behavior in Agent-Based Models with Graph Diffusion Networks | Individuelles Verhalten in agentenbasierten Modellen mit Graph Diffusionsnetzwerken lernen | 具有图表传播网络的基于代理模型的学习个人行为 2505.21426v1 |
Authors: Francesco Cozzi, Marco Pangallo, Alan Perotti, André Panisson, Corrado Monti
Agent-Based Models (ABMs) are powerful tools for studying emergent properties in complex systems. In ABMs, agent behaviors are governed by local interactions and stochastic rules. However, these rules are, in general, non-differentiable, limiting the use of gradient-based methods for optimization, and thus integration with real-world data. We propose a novel framework to learn a differentiable surrogate of any ABM by observing its generated data. Our method combines diffusion models to capture behavioral stochasticity and graph neural networks to model agent interactions. Distinct from prior surrogate approaches, our method introduces a fundamental shift: rather than approximating system-level outputs, it models individual agent behavior directly, preserving the decentralized, bottom-up dynamics that define ABMs. We validate our approach on two ABMs (Schelling’s segregation model and a Predator-Prey ecosystem) showing that it replicates individual-level patterns and accurately forecasts emergent dynamics beyond training. Our results demonstrate the potential of combining diffusion models and graph learning for data-driven ABM simulation.
Article 872
Title@2025-05-27 (2): GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning
Title: GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning | GenPO: Generative Diffusionsmodelle treffen auf On-Policy-Verstärkungs-Lernen | GENPO: 符合政策强化学习的生成传播模式 2505.18763v2 |
Authors: Shutong Ding, Ke Hu, Shan Zhong, Haoyang Luo, Weinan Zhang, Jingya Wang, Jun Wang, Ye Shi
Recent advances in reinforcement learning (RL) have demonstrated the powerful exploration capabilities and multimodality of generative diffusion-based policies. While substantial progress has been made in offline RL and off-policy RL settings, integrating diffusion policies into on-policy frameworks like PPO remains underexplored. This gap is particularly significant given the widespread use of large-scale parallel GPU-accelerated simulators, such as IsaacLab, which are optimized for on-policy RL algorithms and enable rapid training of complex robotic tasks. A key challenge lies in computing state-action log-likelihoods under diffusion policies, which is straightforward for Gaussian policies but intractable for flow-based models due to irreversible forward-reverse processes and discretization errors (e.g., Euler-Maruyama approximations). To bridge this gap, we propose GenPO, a generative policy optimization framework that leverages exact diffusion inversion to construct invertible action mappings. GenPO introduces a novel doubled dummy action mechanism that enables invertibility via alternating updates, resolving log-likelihood computation barriers. Furthermore, we also use the action log-likelihood for unbiased entropy and KL divergence estimation, enabling KL-adaptive learning rates and entropy regularization in on-policy updates. Extensive experiments on eight IsaacLab benchmarks, including legged locomotion (Ant, Humanoid, Anymal-D, Unitree H1, Go2), dexterous manipulation (Shadow Hand), aerial control (Quadcopter), and robotic arm tasks (Franka), demonstrate GenPO’s superiority over existing RL baselines. Notably, GenPO is the first method to successfully integrate diffusion policies into on-policy RL, unlocking their potential for large-scale parallelized training and real-world robotic deployment.
Article 873
Title@2025-05-27 (2): A Lightweight Method to Disrupt Memorized Sequences in LLM
Title: A Lightweight Method to Disrupt Memorized Sequences in LLM | Eine leichte Methode zum Disruptieren von gemerkten Sequenzen in LLM | LLM 中破坏记忆序列的轻量方法 2502.05159v2 |
Authors: Parjanya Prajakta Prashant, Kaustubh Ponkshe, Babak Salimi
As language models scale, their performance improves dramatically across a wide range of tasks, but so does their tendency to memorize and regurgitate parts of their training data verbatim. This tradeoff poses serious legal, ethical, and safety concerns, especially in real-world deployments. Existing mitigation techniques, such as differential privacy or model unlearning, often require retraining or access to internal weights making them impractical for most users. In this work, we introduce TokenSwap, a lightweight, post-hoc defense designed for realistic settings where the user can only access token-level outputs. Our key insight is that while large models are necessary for high task performance, small models (e.g., DistilGPT-2) are often sufficient to assign fluent, grammatically plausible probabilities to common function words - and crucially, they memorize far less. By selectively swapping token probabilities between models, TokenSwap preserves the capabilities of large models while reducing their propensity for verbatim reproduction. Evaluations on Pythia-6.9B and Llama-3-8B show up to a 10$\times$ drop in exact memorization with negligible task degradation. Our method offers a practical, accessible solution for mitigating memorized generation in deployed LLMs.
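One way to picture the swapping idea is on a single next-token distribution. The sketch below is an assumption-laden illustration, not the TokenSwap procedure itself: the abstract does not give the exact swapping rule, so this version simply takes the small model's probability for a chosen token subset and renormalizes. The function name and all probabilities are hypothetical.

```python
def swap_probabilities(large, small, swap_tokens):
    """Use small-model probabilities for swap_tokens, then renormalize."""
    mixed = {
        tok: (small.get(tok, 0.0) if tok in swap_tokens else p)
        for tok, p in large.items()
    }
    total = sum(mixed.values())
    if total <= 0.0:
        raise ValueError("no probability mass left after swapping")
    return {tok: p / total for tok, p in mixed.items()}


# Hypothetical next-token distributions from a large and a small model.
large = {"the": 0.70, "a": 0.10, "quantum": 0.20}
small = {"the": 0.30, "a": 0.30, "quantum": 0.40}

mixed = swap_probabilities(large, small, swap_tokens={"the", "a"})
print(mixed)
```

Only output probabilities are touched, which matches the setting in the abstract where the user has access to token-level outputs but not to model weights.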
Article 874
Title@2025-05-27 (2): Can Large Language Models Understand Symbolic Graphics Programs?
Title: Can Large Language Models Understand Symbolic Graphics Programs? | Können große Sprachmodelle symbolische Grafikprogramme verstehen? | 大语言模型能理解符号图形程序吗? 2408.08313v4 |
Authors: Zeju Qiu, Weiyang Liu, Haiwen Feng, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf
Against the backdrop of enthusiasm for large language models (LLMs), there is a growing need to scientifically assess their capabilities and shortcomings. This is nontrivial in part because it is difficult to find tasks which the models have not encountered during training. Utilizing symbolic graphics programs, we propose a domain well-suited to test multiple spatial-semantic reasoning skills of LLMs. Popular in computer graphics, these programs procedurally generate visual data. While LLMs exhibit impressive skills in general program synthesis and analysis, symbolic graphics programs offer a new layer of evaluation: they allow us to test an LLM’s ability to answer semantic questions about the images or 3D geometries without a vision encoder. To semantically understand the symbolic programs, LLMs would need to possess the ability to “imagine” and reason about how the corresponding graphics content would look with only the symbolic description of the local curvatures and strokes. We use this task to evaluate LLMs by creating a large benchmark for the semantic visual understanding of symbolic graphics programs, built procedurally with minimal human effort. Particular emphasis is placed on transformations of images that leave the image-level semantics invariant while introducing significant changes to the underlying program. We evaluate commercial and open-source LLMs on our benchmark to assess their ability to reason about the visual output of programs, finding that LLMs considered stronger at reasoning generally perform better. Lastly, we introduce a novel method to improve this ability – Symbolic Instruction Tuning (SIT), in which the LLM is finetuned with pre-collected instruction data on symbolic graphics programs. Interestingly, we find that SIT not only improves LLMs’ understanding of symbolic programs, but also improves general reasoning ability on various other benchmarks.
Article 875
Title@2025-05-27 (2): Optimizing Deep Learning for Skin Cancer Classification: A Computationally Efficient CNN with Minimal Accuracy Trade-Off
Title: Optimizing Deep Learning for Skin Cancer Classification: A Computationally Efficient CNN with Minimal Accuracy Trade-Off | Deep Learning für Hautkrebs-Klassifikation optimieren: Ein Computational Efficient CNN mit minimaler Genauigkeit Trade-Off | 最优化皮肤癌症分类深层学习:计算效率高的有线电视新闻网与最低准确性交易 2505.21597v1 |
Authors: Abdullah Al Mamun, Pollob Chandra Ray, Md Rahat Ul Nasib, Akash Das, Jia Uddin, Md Nurul Absur
The rapid advancement of deep learning in medical image analysis has greatly enhanced the accuracy of skin cancer classification. However, current state-of-the-art models, especially those based on transfer learning like ResNet50, come with significant computational overhead, rendering them impractical for deployment in resource-constrained environments. This study proposes a custom CNN model that achieves a 96.7% reduction in parameters (from 23.9 million in ResNet50 to 692,000) while maintaining a classification accuracy deviation of less than 0.022%. Our empirical analysis of the HAM10000 dataset reveals that although transfer learning models provide a marginal accuracy improvement of approximately 0.022%, they result in a staggering 13,216.76% increase in FLOPs, considerably raising computational costs and inference latency. In contrast, our lightweight CNN architecture, which encompasses only 30.04 million FLOPs compared to ResNet50’s 4.00 billion, significantly reduces energy consumption, memory footprint, and inference time. These findings underscore the trade-off between the complexity of deep models and their real-world feasibility, positioning our optimized CNN as a practical solution for mobile and edge-based skin cancer diagnostics.
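The percentage figures follow from the parameter and FLOP counts by simple ratios. The sketch below recomputes them from the rounded counts quoted in the abstract, so the results land near the reported values rather than matching them digit for digit; the variable names are illustrative.

```python
# Rounded counts as quoted in the abstract.
resnet_params, custom_params = 23.9e6, 692e3
resnet_flops, custom_flops = 4.00e9, 30.04e6

# Relative reduction in parameters when moving from ResNet50 to the custom CNN.
param_reduction_pct = 100.0 * (1.0 - custom_params / resnet_params)

# Relative increase in FLOPs when moving from the custom CNN to ResNet50.
flops_increase_pct = 100.0 * (resnet_flops / custom_flops - 1.0)

print(f"parameter reduction: {param_reduction_pct:.1f}%")
print(f"FLOPs increase:      {flops_increase_pct:.0f}%")
```

Note the asymmetry in the baselines: the parameter figure is a reduction relative to ResNet50, while the FLOPs figure is an increase relative to the lightweight model, which is why the second number is so much larger.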
Article 876
Title@2025-05-27 (2): Learning optimal treatment strategies for intraoperative hypotension using deep reinforcement learning
Title: Learning optimal treatment strategies for intraoperative hypotension using deep reinforcement learning | Optimale Therapiestrategien für intraoperative Hypotonie mit Deep-Enforcement-Lernen | 利用深强化学习学习,学习采用最佳治疗战略,以弥补职业内衰退 2505.21596v1 |
Authors: Esra Adiyeke, Tianqi Liu, Venkata Sai Dheeraj Naganaboina, Han Li, Tyler J. Loftus, Yuanfang Ren, Benjamin Shickel, Matthew M. Ruppert, Karandeep Singh, Ruogu Fang, Parisa Rashidi, Azra Bihorac, Tezcan Ozrazgat-Baslanti
Traditional methods of surgical decision making rely heavily on human experience and prompt actions, which are variable. A data-driven system generating treatment recommendations based on patient states can be a substantial asset in perioperative decision-making, as in cases of intraoperative hypotension, for which suboptimal management is associated with acute kidney injury (AKI), a common and morbid postoperative complication. We developed a Reinforcement Learning (RL) model to recommend the optimum dose of intravenous (IV) fluid and vasopressors during surgery to avoid intraoperative hypotension and postoperative AKI. We retrospectively analyzed 50,021 surgeries from 42,547 adult patients who underwent major surgery at a quaternary care hospital between June 2014 and September 2020. Of these, 34,186 surgeries were used for model training and 15,835 surgeries were reserved for testing. We developed a Deep Q-Network-based RL model using 16 variables, including intraoperative physiologic time series and the total dose of IV fluid and vasopressors, extracted for every 15-minute epoch. The model replicated 69% of physicians’ decisions for the dosage of vasopressors and proposed a higher or lower dosage of vasopressors than received in 10% and 21% of the treatments, respectively. In terms of IV fluids, the model’s recommendations were within 0.05 ml/kg/15 min of the actual dose in 41% of the cases, with higher or lower doses recommended for 27% and 32% of the treatments, respectively. The model resulted in a higher estimated policy value compared to the physicians’ actual treatments, as well as random and zero-drug policies. AKI prevalence was the lowest in patients receiving medication dosages that aligned with the model’s decisions. Our findings suggest that implementation of the model’s policy has the potential to reduce postoperative AKI and improve other outcomes driven by intraoperative hypotension.
Article 877
Title@2025-05-27 (2): Relevance-driven Input Dropout: an Explanation-guided Regularization Technique
Title: Relevance-driven Input Dropout: an Explanation-guided Regularization Technique | Relevanz-gesteuerter Input Dropout: eine Erklärungs-geführte Regularisierungstechnik | 由相关性驱动的 “ 投入辍学:解释指导规范化技术 “ 2505.21595v1 |
Authors: Shreyas Gururaj, Lars Grüne, Wojciech Samek, Sebastian Lapuschkin, Leander Weber
Overfitting is a well-known issue extending even to state-of-the-art (SOTA) Machine Learning (ML) models, resulting in reduced generalization and a significant train-test performance gap. Mitigation measures include a combination of dropout, data augmentation, weight decay, and other regularization techniques. Among the various data augmentation strategies, occlusion is a prominent technique that typically focuses on randomly masking regions of the input during training. Most of the existing literature emphasizes randomness in selecting and modifying the input features instead of regions that strongly influence model decisions. We propose Relevance-driven Input Dropout (RelDrop), a novel data augmentation method which selectively occludes the most relevant regions of the input, nudging the model to use other important features in the prediction process, thus improving model generalization through informed regularization. We further conduct qualitative and quantitative analyses to study how RelDrop affects model decision-making. Through a series of experiments on benchmark datasets, we demonstrate that our approach improves robustness towards occlusion, results in models utilizing more features within the region of interest, and boosts inference-time generalization performance. Our code is available at https://github.com/Shreyas-Gururaj/LRP_Relevance_Dropout.
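Relevance-guided occlusion can be sketched on a flat feature vector. This is a minimal illustration, not the authors' implementation: it assumes relevance scores are already available from some attribution method and deterministically masks the top-k entries, whereas the real method may select regions differently. The function name and inputs are hypothetical.

```python
def occlude_most_relevant(inputs, relevance, k, fill=0.0):
    """Return a copy of inputs with the k highest-relevance entries masked."""
    if len(inputs) != len(relevance):
        raise ValueError("inputs and relevance must have the same length")
    # Indices sorted from most to least relevant; keep the first k.
    order = sorted(range(len(inputs)), key=lambda i: relevance[i], reverse=True)
    masked = list(inputs)
    for i in order[:k]:
        masked[i] = fill
    return masked


# Hypothetical input features and their relevance scores.
x = [0.2, 0.9, 0.5, 0.7]
r = [0.1, 0.8, 0.05, 0.6]

x_aug = occlude_most_relevant(x, r, k=2)
print(x_aug)
```

Training on `x_aug` instead of `x` removes the features the model currently leans on most, which is the mechanism the abstract describes for pushing the model towards other informative features.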
Article 878
Title@2025-05-27 (2): Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges
Title: Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges | Benchmarking Spatiotemporal Reasoning in LLMs und Reasoning Models: Fähigkeiten und Herausforderungen | 确定LLM和理由模型的偏差理由基准:能力和挑战 2505.11618v2 |
Authors: Pengrui Quan, Brian Wang, Kang Yang, Liying Han, Mani Srivastava
Spatiotemporal reasoning plays a key role in Cyber-Physical Systems (CPS). Despite advances in Large Language Models (LLMs) and Large Reasoning Models (LRMs), their capacity to reason about complex spatiotemporal signals remains underexplored. This paper proposes a hierarchical SpatioTemporal reAsoning benchmaRK, STARK, to systematically evaluate LLMs across three levels of reasoning complexity: state estimation (e.g., predicting field variables, localizing and tracking events in space and time), spatiotemporal reasoning over states (e.g., inferring spatial-temporal relationships), and world-knowledge-aware reasoning that integrates contextual and domain knowledge (e.g., intent prediction, landmark-aware navigation). We curate 26 distinct spatiotemporal tasks with diverse sensor modalities, comprising 14,552 challenges where models answer directly or through a Python code interpreter. Evaluating 3 LRMs and 8 LLMs, we find LLMs achieve limited success in tasks requiring geometric reasoning (e.g., multilateration or triangulation), particularly as complexity increases. Surprisingly, LRMs show robust performance across tasks with various levels of difficulty, often competing with or surpassing traditional first-principle-based methods. Our results show that in reasoning tasks requiring world knowledge, the performance gap between LLMs and LRMs narrows, with some LLMs even surpassing LRMs. However, the LRM o3 model continues to achieve leading performance across all evaluated tasks, a result attributed primarily to the larger size of the reasoning models. STARK motivates future innovations in model architectures and reasoning paradigms for intelligent CPS by providing a structured framework to identify limitations in the spatiotemporal reasoning of LLMs and LRMs.
Article 879
Title@2025-05-27 (2): Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization
Title: Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization | Widersprüchliche Biasen am Rande der Stabilität: Norm versus Schärfe Regularisierung | 稳定边缘的冲突两重冲突:规范与尖锐的规范化 2505.21423v1 |
Authors: Vit Fojtik, Maria Matveev, Hung-Hsu Chou, Gitta Kutyniok, Johannes Maly
A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this theoretically, recent works examine gradient descent and its variants in simplified training settings, often assuming vanishing learning rates. These studies reveal various forms of implicit regularization, such as $\ell_1$-norm minimizing parameters in regression and max-margin solutions in classification. Concurrently, empirical findings show that moderate to large learning rates exceeding standard stability thresholds lead to faster, albeit oscillatory, convergence in the so-called Edge-of-Stability regime, and induce an implicit bias towards minima of low sharpness (norm of training loss Hessian). In this work, we argue that a comprehensive understanding of the generalization performance of gradient descent requires analyzing the interaction between these various forms of implicit regularization. We empirically demonstrate that the learning rate balances between low parameter norm and low sharpness of the trained model. We furthermore prove for diagonal linear networks trained on a simple regression task that neither implicit bias alone minimizes the generalization error. These findings demonstrate that focusing on a single implicit bias is insufficient to explain good generalization, and they motivate a broader view of implicit regularization that captures the dynamic trade-off between norm and sharpness induced by non-negligible learning rates.
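The "standard stability threshold" referred to above can be illustrated on a one-dimensional quadratic, where sharpness is simply the second derivative. This is a textbook sketch, not the paper's experiment; the function name and the chosen constants are assumptions.

```python
def gradient_descent_quadratic(sharpness, lr, x0=1.0, steps=50):
    """Run GD on f(x) = 0.5 * sharpness * x**2 and return |x| after `steps`."""
    x = x0
    for _ in range(steps):
        x -= lr * sharpness * x  # gradient of f is sharpness * x
    return abs(x)

sharpness = 10.0
threshold = 2.0 / sharpness  # classical stability limit for the step size
below = gradient_descent_quadratic(sharpness, 0.9 * threshold)
above = gradient_descent_quadratic(sharpness, 1.1 * threshold)
```

Here `below` shrinks towards zero while `above` blows up. For a fixed step size, only minima with sharpness below 2 divided by the learning rate are stable, which is one way to see why larger learning rates favour flatter solutions.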
Article 880
Title@2025-05-27 (2): When Shift Happens - Confounding Is to Blame
Title: When Shift Happens - Confounding Is to Blame | Wenn es zu einer Verschiebung kommt - Verwirren ist die Schuld | 发生变迁时 - 令人不安的是责怪 2505.21422v1 |
Authors: Abbavaram Gowtham Reddy, Celia Rubio-Madrigal, Rebekka Burkholz, Krikamol Muandet
Distribution shifts introduce uncertainty that undermines the robustness and generalization capabilities of machine learning models. While conventional wisdom suggests that learning causal-invariant representations enhances robustness to such shifts, recent empirical studies present a counterintuitive finding: (i) empirical risk minimization (ERM) can rival or even outperform state-of-the-art out-of-distribution (OOD) generalization methods, and (ii) its OOD generalization performance improves when all available covariates, not just causal ones, are utilized. Drawing on both empirical and theoretical evidence, we attribute this phenomenon to hidden confounding. Shifts in hidden confounding induce changes in data distributions that violate assumptions commonly made by existing OOD generalization approaches. Under such conditions, we prove that effective generalization requires learning environment-specific relationships, rather than relying solely on invariant ones. Furthermore, we show that models augmented with proxies for hidden confounders can mitigate the challenges posed by hidden confounding shifts. These findings offer new theoretical insights and practical guidance for designing robust OOD generalization algorithms and principled covariate selection strategies.
Article 881
Title@2025-05-27 (2): A Physics-Augmented GraphGPS Framework for the Reconstruction of 3D Riemann Problems from Sparse Data
Title: A Physics-Augmented GraphGPS Framework for the Reconstruction of 3D Riemann Problems from Sparse Data | Ein physikgestütztes GraphGPS-Framework für den Wiederaufbau von 3D Riemann-Problemen aus Sparse-Daten | 物理辅助图形GPS框架,用于从简简数据中重建3D里伊曼问题 2505.21421v1 |
Authors: Rami Cassia, Rich Kerswell
In compressible fluid flow, reconstructing shocks, discontinuities, rarefactions, and their interactions from sparse measurements is an important inverse problem with practical applications. Moreover, physics-informed machine learning has recently become an increasingly popular approach for performing reconstruction tasks. In this work, we explore a machine learning recipe, known as GraphGPS, for reconstructing canonical compressible flows known as 3D Riemann problems from sparse observations, in a physics-informed manner. The GraphGPS framework combines the benefits of positional encodings, local message-passing of graphs, and global contextual awareness, and we explore the latter two components through an ablation study. Furthermore, we modify the aggregation step of message-passing such that it is aware of shocks and discontinuities, resulting in sharper reconstructions of these features. Additionally, we modify message-passing such that information flows strictly from known nodes only, which results in computational savings, better training convergence, and no degradation of reconstruction accuracy. We also show that the GraphGPS framework outperforms numerous machine learning benchmarks.
Article 882
Title@2025-05-27 (2): From Continual Learning to SGD and Back: Better Rates for Continual Linear Models
Title: From Continual Learning to SGD and Back: Better Rates for Continual Linear Models | Vom kontinuierlichen Lernen bis hin zu SGD und Back: Bessere Preise für kontinuierliche lineare Modelle | 从持续学习到SGD和后退:持续线性模型的更好比率 2504.04579v2 |
Authors: Itay Evron, Ran Levinstein, Matan Schliserman, Uri Sherman, Tomer Koren, Daniel Soudry, Nathan Srebro
We theoretically study the common continual learning setup where an overparameterized model is sequentially fitted to a set of jointly realizable tasks. We analyze the forgetting, i.e., loss on previously seen tasks, after $k$ iterations. For continual linear models, we prove that fitting a task is equivalent to a single stochastic gradient descent (SGD) step on a modified objective. We develop novel last-iterate SGD upper bounds in the realizable least squares setup, which we then leverage to derive new results for continual learning. Focusing on random orderings over $T$ tasks, we establish universal forgetting rates, whereas existing rates depend on the problem dimensionality or complexity. Specifically, in continual regression with replacement, we improve the best existing rate from $O((d-r)/k)$ to $O(\min(k^{-1/4}, \sqrt{d-r}/k, \sqrt{Tr}/k))$, where $d$ is the dimensionality and $r$ the average task rank. Furthermore, we establish the first rate for random task orderings without replacement. The obtained rate of $O(\min(T^{-1/4}, (d-r)/T))$ proves for the first time that randomization alone, with no task repetition, can prevent catastrophic forgetting in sufficiently long task sequences. Finally, we prove a matching $O(k^{-1/4})$ forgetting rate for continual linear classification on separable data. Our universal rates apply for broader projection methods, such as block Kaczmarz and POCS, illuminating their loss convergence under i.i.d. and one-pass orderings.
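The link between fitting a task and an SGD step is easiest to see in the rank-one special case, where each task is a single linear equation. The sketch below is an illustration under that assumption, with a cyclic ordering chosen for determinism (the abstract analyses random orderings); the helper names and the toy data are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def fit_task(w, a, b):
    """Minimum-change update that fits a.w = b exactly (a Kaczmarz projection).

    This equals one SGD step on 0.5 * (a.w - b)**2 with step size 1 / ||a||^2.
    """
    scale = (b - dot(a, w)) / dot(a, a)
    return [wi + scale * ai for wi, ai in zip(w, a)]

def forgetting(w, tasks):
    """Mean squared residual over the given tasks."""
    return sum((dot(a, w) - b) ** 2 for a, b in tasks) / len(tasks)

# Jointly realizable tasks: every target is generated by the same w_star.
w_star = [1.0, -2.0, 0.5]
directions = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
tasks = [(a, dot(a, w_star)) for a in directions]

w = [0.0, 0.0, 0.0]
history = []
for k in range(30):  # cyclic ordering over the T = 3 tasks
    a, b = tasks[k % len(tasks)]
    w = fit_task(w, a, b)
    history.append(forgetting(w, tasks))
```

The entries of `history` track how much loss remains on all tasks after each fit; in this toy problem the value decays as the iterates approach the joint solution.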
Article 883
Title@2025-05-27 (2): Efficiently Scaling LLM Reasoning with Certaindex
Title: Efficiently Scaling LLM Reasoning with Certaindex | Effiziente Skalierung der LLM-Vernunft mit bestimmtem Dex | 高效扩增 LLM 使用 emitedex 说明 2412.20993v2 |
Authors: Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhongdongming Dai, Yonghao Zhuang, Yian Ma, Aurick Qiao, Tajana Rosing, Ion Stoica, Hao Zhang
Test-time reasoning algorithms such as chain-of-thought, self-consistency, and MCTS enhance LLM problem-solving but can wastefully generate many tokens without improving accuracy. At the same time, we observe that these algorithms exhibit answer stabilization: their intermediate solutions often cease to change after a certain point, and further investment of compute does not change their final answer. To quantify this phenomenon, we introduce Certaindex, an algorithm-agnostic metric measuring this evolving stability, signaling when further computation is unlikely to alter the final result. Certaindex is lightweight, can accelerate reasoning program inference via early exit, and further enables dynamic token allocation, gang scheduling, and many opportunities when integrated with real-world LLM serving systems. To quantify real-world benefits, we built Certaindex as a scheduler into Dynasor, our reasoning-aware LLM serving system, and demonstrate up to 50% compute savings and 3.3x higher throughput in real workloads with no accuracy drop. Our code is available at https://github.com/hao-ai-lab/Dynasor.git
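The early-exit idea can be sketched for self-consistency sampling as follows. The score used here, the share of the current majority answer, is a simple stand-in and not the paper's Certaindex metric; the function name, `min_samples`, and `threshold` are assumptions made for illustration.

```python
from collections import Counter

def sample_until_stable(answers, min_samples=3, threshold=0.8):
    """Consume sampled answers one by one; stop once the majority share is high.

    Returns (majority_answer, samples_used). The share of the most common
    answer is a stand-in certainty score, not the paper's Certaindex.
    """
    counts = Counter()
    used = 0
    for answer in answers:
        counts[answer] += 1
        used += 1
        top_answer, top_count = counts.most_common(1)[0]
        if used >= min_samples and top_count / used >= threshold:
            return top_answer, used
    if used == 0:
        raise ValueError("no answers were provided")
    return counts.most_common(1)[0][0], used

# Eight sampled answers; the loop stops as soon as the majority is stable.
answer, used = sample_until_stable(["42", "42", "41", "42", "42", "42", "42", "42"])
```

In a serving system, `answers` would be a generator that triggers further decoding only when the next item is requested, so stopping early is what saves compute.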
Article 884
Title@2025-05-27 (2): A Framework for Adversarial Analysis of Decision Support Systems Prior to Deployment
Title: A Framework for Adversarial Analysis of Decision Support Systems Prior to Deployment | Ein Rahmen für die strittige Analyse von Entscheidungsunterstützungssystemen vor der Einführung | 在部署之前对决定支助系统进行反对分析的框架 2505.21414v1 |
Authors: Brett Bissey, Kyle Gatesman, Walker Dimon, Mohammad Alam, Luis Robaina, Joseph Weissman
This paper introduces a comprehensive framework designed to analyze and secure decision-support systems trained with Deep Reinforcement Learning (DRL), prior to deployment, by providing insights into learned behavior patterns and vulnerabilities discovered through simulation. The introduced framework aids in the development of precisely timed and targeted observation perturbations, enabling researchers to assess adversarial attack outcomes within a strategic decision-making context. We validate our framework, visualize agent behavior, and evaluate adversarial outcomes within the context of a custom-built strategic game, CyberStrike. Utilizing the proposed framework, we introduce a method for systematically discovering and ranking the impact of attacks on various observation indices and time-steps, and we conduct experiments to evaluate the transferability of adversarial attacks across agent architectures and DRL training algorithms. The findings underscore the critical need for robust adversarial defense mechanisms to protect decision-making policies in high-stakes environments.
Article 885
Title@2025-05-27 (2): Comparison of the Cox proportional hazards model and Random Survival Forest algorithm for predicting patient-specific survival probabilities in clinical trial data
Title: Comparison of the Cox proportional hazards model and Random Survival Forest algorithm for predicting patient-specific survival probabilities in clinical trial data | Vergleich des Cox-Proportional-Hazards-Modells und des Random Survival Forest-Algorithmus zur Vorhersage patientenspezifischer Überlebenswahrscheinlichkeiten in klinischen Studiendaten | 比较Cox按比例比例危害模型和随机生存森林算法,以预测临床试验数据中特定患者生存概率 2502.03119v2 |
Authors: Ricarda Graf, Susan Todd, M. Fazil Baksh
The Cox proportional hazards model is often used to analyze data from Randomized Controlled Trials (RCT) with time-to-event outcomes. Random survival forest (RSF) is a machine-learning algorithm known for its high predictive performance. We conduct a comprehensive neutral comparison study to compare the performance of Cox regression and RSF in various simulation scenarios based on two reference datasets from RCTs. The motivation is to identify settings in which one method is preferable over the other when comparing different aspects of performance using measures according to the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) recommendations. Our results show that conclusions solely based on the C index, a performance measure that has been predominantly used in previous studies comparing predictive accuracy of the Cox-PH and RSF model based on real-world observational time-to-event data and that has been criticized by methodologists, may not be generalizable to other aspects of predictive performance. We found that measures of overall performance may generally give more reasonable results, and that the standard log-rank splitting rule used for the RSF may be outperformed by alternative splitting rules, in particular in nonproportional hazards settings. In our simulations, performance of the RSF suffers less in data with treatment-covariate interactions compared to data where these are absent. Performance of the Cox-PH model is affected by the violation of the proportional hazards assumption.
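For readers unfamiliar with the C index discussed above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is a plain reference implementation for illustration, not the evaluation code used in the study, and the toy data are invented.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) and a strictly shorter time than subject j.
    Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: the third subject is censored (event indicator 0).
times = [5, 8, 10, 12]
events = [1, 1, 0, 1]
c_good = concordance_index(times, events, [0.9, 0.7, 0.3, 0.4])
c_flat = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

Because the index depends only on the ordering of risk scores, it says nothing about calibration, which is one reason to report additional measures alongside it.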
Article 886
Title@2025-05-27 (2): MRSD: Multi-Resolution Skill Discovery for HRL Agents
Title: MRSD: Multi-Resolution Skill Discovery for HRL Agents | MRSD: Multi-Resolution Skill Discovery für HRL-Agenten | MRSD: HRL代理机构多分辨率技能发现 2505.21410v1 |
Authors: Shashank Sharma, Janina Hoffmann, Vinay Namboodiri
Hierarchical reinforcement learning (HRL) relies on abstract skills to solve long-horizon tasks efficiently. While existing skill discovery methods learn these skills automatically, they are limited to a single skill per task. In contrast, humans learn and use both fine-grained and coarse motor skills simultaneously. Inspired by human motor control, we propose Multi-Resolution Skill Discovery (MRSD), an HRL framework that learns multiple skill encoders at different temporal resolutions in parallel. A high-level manager dynamically selects among these skills, enabling adaptive control strategies over time. We evaluate MRSD on tasks from the DeepMind Control Suite and show that it outperforms prior state-of-the-art skill discovery and HRL methods, achieving faster convergence and higher final performance. Our findings highlight the benefits of integrating multi-resolution skills in HRL, paving the way for more versatile and efficient agents.
Article 887
Title@2025-05-27 (2): Dual Natural Gradient Descent for Scalable Training of Physics-Informed Neural Networks
Title: Dual Natural Gradient Descent for Scalable Training of Physics-Informed Neural Networks | Dual Natural Gradient Descent für skalierbare Ausbildung von physikinformierten Neuronalen Netzwerken | 物理内成形神经网络可缩放培训 2505.21404v1 |
Authors: Anas Jnini, Flavio Vella
Natural-gradient methods markedly accelerate the training of Physics-Informed Neural Networks (PINNs), yet their Gauss–Newton update must be solved in the parameter space, incurring a prohibitive $O(n^3)$ time complexity, where $n$ is the number of trainable network weights. We show that exactly the same step can instead be formulated in a generally smaller residual space of size $m = \sum_{\gamma} N_{\gamma} d_{\gamma}$, where each residual class $\gamma$ (e.g., PDE interior, boundary, initial data) contributes $N_{\gamma}$ collocation points of output dimension $d_{\gamma}$. Building on this insight, we introduce Dual Natural Gradient Descent (D-NGD). D-NGD computes the Gauss–Newton step in residual space, augments it with a geodesic-acceleration correction at negligible extra cost, and provides both a dense direct solver for modest $m$ and a Nyström-preconditioned conjugate-gradient solver for larger $m$. Experimentally, D-NGD scales second-order PINN optimization to networks with up to 12.8 million parameters, delivers a final $L^2$ error one to three orders of magnitude lower than first-order methods (Adam, SGD) and quasi-Newton methods, and, crucially, enables natural-gradient training of PINNs at this scale on a single GPU.
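The equivalence between a parameter-space and a residual-space solve can be checked numerically on a toy Jacobian. The sketch below uses a damped Gauss–Newton step and the identity $(J^\top J + \lambda I)^{-1} J^\top r = J^\top (J J^\top + \lambda I)^{-1} r$; the damping $\lambda$, the toy matrices, and all helper names are assumptions for illustration and do not reproduce the D-NGD implementation.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def add_damping(a, lam):
    return [[v + (lam if i == j else 0.0) for j, v in enumerate(row)]
            for i, row in enumerate(a)]

# Toy Jacobian: m = 2 residuals, n = 4 parameters, so residual space is smaller.
J = [[1.0, 2.0, 0.0, -1.0],
     [0.5, -1.0, 3.0, 2.0]]
r = [1.0, -2.0]
lam = 1e-3
Jt = [list(col) for col in zip(*J)]

# Parameter-space step: solve the n x n system (J^T J + lam I) step = J^T r.
primal = solve(add_damping(matmul(Jt, J), lam), matvec(Jt, r))

# Residual-space step: solve the m x m system (J J^T + lam I) y = r, then map back.
dual = matvec(Jt, solve(add_damping(matmul(J, Jt), lam), r))
```

Both routes give the same step up to rounding, but the second one only factorises an $m \times m$ matrix, which is the source of the savings when $m$ is much smaller than $n$.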
Article 888
Title@2025-05-27 (2): A Convergence Theory for Diffusion Language Models: An Information-Theoretic Perspective
Title: A Convergence Theory for Diffusion Language Models: An Information-Theoretic Perspective | Eine Konvergenztheorie für Diffusions-Sprachmodelle: Eine informationstheoretische Perspektive | 传播语言模型集成理论:信息理论视角 2505.21400v1 |
Authors: Gen Li, Changxiao Cai
Diffusion models have emerged as a powerful paradigm for modern generative modeling, demonstrating strong potential for large language models (LLMs). Unlike conventional autoregressive (AR) models that generate tokens sequentially, diffusion models enable parallel token sampling, leading to faster generation and eliminating left-to-right generation constraints. Despite their empirical success, the theoretical understanding of diffusion model approaches remains underdeveloped. In this work, we develop convergence guarantees for diffusion language models from an information-theoretic perspective. Our analysis demonstrates that the sampling error, measured by the Kullback-Leibler (KL) divergence, decays inversely with the number of iterations $T$ and scales linearly with the mutual information between tokens in the target text sequence. In particular, we establish matching upper and lower bounds, up to some constant factor, to demonstrate the tightness of our convergence analysis. These results offer novel theoretical insights into the practical effectiveness of diffusion language models.
Article 889
Title@2025-05-27 (2): Factual Self-Awareness in Language Models: Representation, Robustness, and Scaling
Title: Factual Self-Awareness in Language Models: Representation, Robustness, and Scaling | Factual Self-Awareness in Sprachmodellen: Repräsentation, Robustheit und Skalierung | 语言模式中的事实自觉意识:代表性、强力和比例 2505.21399v1 |
Authors: Hovhannes Tamoyan, Subhabrata Dutta, Iryna Gurevych
Factual incorrectness in generated content is one of the primary concerns in ubiquitous deployment of large language models (LLMs). Prior findings suggest LLMs can (sometimes) detect factual incorrectness in their generated content (i.e., fact-checking post-generation). In this work, we provide evidence supporting the presence of an internal compass in LLMs that dictates the correctness of factual recall at the time of generation. We demonstrate that for a given subject entity and a relation, LLMs internally encode linear features in the Transformer’s residual stream that dictate whether the model will be able to recall the correct attribute (that forms a valid entity-relation-attribute triplet). This self-awareness signal is robust to minor formatting variations. We investigate the effects of context perturbation via different example selection strategies. Scaling experiments across model sizes and training dynamics highlight that self-awareness emerges rapidly during training and peaks in intermediate layers. These findings uncover intrinsic self-monitoring capabilities within LLMs, contributing to their interpretability and reliability.
Article 890
Title@2025-05-27 (2): Square$χ$PO: Differentially Private and Robust $χ^2$-Preference Optimization in Offline Direct Alignment
Title: Square$χ$PO: Differentially Private and Robust $χ^2$-Preference Optimization in Offline Direct Alignment | Square$χ$PO: Differential privat und robust $χ^2$-Preference Optimierung in Offline Direct Alignment | Square$χ$PO:离线直接对齐中差分隐私且鲁棒的 $χ^2$ 偏好优化 2505.21395v1 |
Authors: Xingyu Zhou, Yulian Wu, Wenqian Weng, Francesco Orabona
In this paper, we theoretically study the offline alignment of language models with human preference feedback, under both preference label corruption and privacy protections. To this end, we propose Square$\chi$PO, a simple one-line change to $\chi$PO where the standard log-loss is replaced by a new square loss over probability. Thanks to the inherent properties of this new loss, we have advanced the state-of-the-art of differentially private and robust offline direct alignment. Specifically, for the local model of label privacy, Square$\chi$PO is the first algorithm that attains an optimal rate based on single-policy concentrability even with general function approximations. It also gives the first result under the central model of privacy protection over both prompts (responses) and labels. On the robustness side against Huber label corruption, Square$\chi$PO is the first alignment method that has a meaningful theoretical guarantee under general function approximations. More importantly, Square$\chi$PO can address privacy protection and corruption simultaneously, where an interesting separation is observed, implying that the order of privacy and corruption matters. Furthermore, we show that Square$\chi$PO can also be easily extended to handle the scenario of the general preference model with state-of-the-art guarantees under corruption and privacy. Last but not least, all of our theoretical guarantees enjoy a unified analysis, building upon a new result on the generalization error bounds of least-square regression under corruption and privacy constraints, which we believe is of independent interest to the community.
Article 891
Title@2025-05-27 (2): Foundation Models on a Budget: Approximating Blocks in Large Vision Models
Title: Foundation Models on a Budget: Approximating Blocks in Large Vision Models | Basismodelle auf einem Budget: Annähernde Blöcke in großen Visionsmodellen | 预算模式基础模式:大愿景模式中类似障碍 2410.04941v5 |
Authors: Irene Cannistraci, Simone Antonelli, Emanuele Palumbo, Thomas M. Sutter, Emanuele Rodolà, Bastian Rieck, Julia E. Vogt
Foundation Models have shown impressive performance in various tasks and domains, yet they require massive computational resources, raising concerns about accessibility and sustainability. Previous attempts to reduce foundation model size fall short of fully addressing the problem, as they end up increasing computational load through additional training steps. Recent works reveal that deep neural networks exhibit internal representation similarities. While inter-network similarities have enabled techniques such as model stitching and merging, intra-network similarities remain underexplored for improving efficiency. In this paper, we propose Transformer Blocks Approximation (TBA), a novel method that leverages intra-network similarities to identify and approximate transformer blocks in large vision models. TBA replaces these blocks using lightweight, closed-form transformations, without retraining or fine-tuning the rest of the model. The proposed method reduces the number of parameters while having minimal impact on the downstream task. We validate the effectiveness and generalizability of TBA through extensive experiments across multiple datasets (e.g., ImageNet-1k and CIFAR100) and state-of-the-art pretrained vision models (e.g., ViT, DINOv2, and DeiT).
Article 892
Title@2025-05-27 (2): Leveraging the Power of Conversations: Optimal Key Term Selection in Conversational Contextual Bandits
Title: Leveraging the Power of Conversations: Optimal Key Term Selection in Conversational Contextual Bandits | Die Macht der Gespräche nutzen: Optimale Auswahl der Schlüsselbegriffe in konversatorischen Kontextbanditen | 利用对话的力量:在对话背景强盗中最佳关键条件选择 2505.21393v1 |
Authors: Maoli Liu, Zhuohua Li, Xiangxiang Dai, John C. S. Lui
Conversational recommender systems proactively query users with relevant “key terms” and leverage the feedback to elicit users’ preferences for personalized recommendations. Conversational contextual bandits, a prevalent approach in this domain, aim to optimize preference learning by balancing exploitation and exploration. However, several limitations hinder their effectiveness in real-world scenarios. First, existing algorithms employ key term selection strategies with insufficient exploration, often failing to thoroughly probe users’ preferences and resulting in suboptimal preference estimation. Second, current algorithms typically rely on deterministic rules to initiate conversations, causing unnecessary interactions when preferences are well-understood and missed opportunities when preferences are uncertain. To address these limitations, we propose three novel algorithms: CLiSK, CLiME, and CLiSK-ME. CLiSK introduces smoothed key term contexts to enhance exploration in preference learning, CLiME adaptively initiates conversations based on preference uncertainty, and CLiSK-ME integrates both techniques. We theoretically prove that all three algorithms achieve a tighter regret upper bound of $O(\sqrt{dT\log{T}})$ with respect to the time horizon $T$, improving upon existing methods. Additionally, we provide a matching lower bound $\Omega(\sqrt{dT})$ for conversational bandits, demonstrating that our algorithms are nearly minimax optimal. Extensive evaluations on both synthetic and real-world datasets show that our approaches achieve at least a 14.6% improvement in cumulative regret.
Article 893
Title@2025-05-27 (2): Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features
Title: Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features | Finite-Probenanalyse von linearen zeitlichen Unterschieden Lernen mit willkürlichen Funktionen | 具有任意地貌特征的线性时间上差异学习的简单抽样分析 2505.21391v1 |
Authors: Zixuan Xie, Xinyu Liu, Rohan Chandra, Shangtong Zhang
Linear TD($\lambda$) is one of the most fundamental reinforcement learning algorithms for policy evaluation. Previously, convergence rates were typically established under the assumption of linearly independent features, which does not hold in many practical scenarios. This paper instead establishes the first $L^2$ convergence rates for linear TD($\lambda$) operating under arbitrary features, without making any algorithmic modification or additional assumptions. Our results apply to both the discounted and average-reward settings. To address the potential non-uniqueness of solutions resulting from arbitrary features, we develop a novel stochastic approximation result featuring convergence rates to the solution set instead of a single point.
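The setting with linearly dependent features can be tried out on a two-state chain. The sketch below runs standard discounted linear TD($\lambda$) with accumulating traces on features whose first two columns coincide; the chain, step size, and function name are assumptions chosen for illustration, not the paper's experiments.

```python
import random

def linear_td_lambda(features, rewards, gamma=0.9, lam=0.5, alpha=0.005,
                     steps=100_000, seed=0):
    """Linear TD(lambda) with accumulating traces on a chain with uniform transitions."""
    rng = random.Random(seed)
    n_states, dim = len(features), len(features[0])
    w = [0.0] * dim
    z = [0.0] * dim  # eligibility trace
    s = 0
    for _ in range(steps):
        s_next = rng.randrange(n_states)  # next state drawn uniformly at random
        v = sum(wi * fi for wi, fi in zip(w, features[s]))
        v_next = sum(wi * fi for wi, fi in zip(w, features[s_next]))
        delta = rewards[s] + gamma * v_next - v
        z = [gamma * lam * zi + fi for zi, fi in zip(z, features[s])]
        w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
        s = s_next
    return w

# Columns 0 and 1 are identical, so the features are linearly dependent.
features = [[1.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
rewards = [1.0, 0.0]
w = linear_td_lambda(features, rewards)
values = [sum(wi * fi for wi, fi in zip(w, phi)) for phi in features]
```

For this chain the true values are 5.5 and 4.5. Many weight vectors reproduce them (only the sum of the first two weights matters), which is the non-uniqueness that motivates measuring convergence to a solution set rather than to a single point.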
Article 894
Title@2025-05-27 (2): DeCAF: Decentralized Consensus-And-Factorization for Low-Rank Adaptation of Foundation Models
Title: DeCAF: Decentralized Consensus-And-Factorization for Low-Rank Adaptation of Foundation Models | DeCAF: Dezentrale Konsens-und-Factorisierung für Low-Rank-Anpassung von Stiftungsmodellen | DeCAF: 基金会模式的低成本改造的分散化共识和因素 2505.21382v1 |
Authors: Nastaran Saadati, Zhanhong Jiang, Joshua R. Waite, Shreyan Ganguly, Aditya Balu, Chinmay Hegde, Soumik Sarkar
Low-Rank Adaptation (LoRA) has emerged as one of the most effective, computationally tractable fine-tuning approaches for training Vision-Language Models (VLMs) and Large Language Models (LLMs). LoRA accomplishes this by freezing the pre-trained model weights and injecting trainable low-rank matrices, allowing for efficient learning of these foundation models even on edge devices. However, LoRA in decentralized settings remains underexplored, particularly its theoretical underpinnings, owing to the lack of a smoothness guarantee and to model consensus interference (defined formally below). This work improves the convergence rate of decentralized LoRA (DLoRA) to match the rate of decentralized SGD by ensuring gradient smoothness. We also introduce DeCAF, a novel algorithm integrating DLoRA with truncated singular value decomposition (TSVD)-based matrix factorization to resolve consensus interference. Theoretical analysis shows that TSVD’s approximation error is bounded and that consensus differences between DLoRA and DeCAF vanish as rank increases, yielding DeCAF’s matching convergence rate. Extensive experiments across vision/language tasks demonstrate that our algorithms outperform local training and rival federated learning under both IID and non-IID data distributions.
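One plausible reading of consensus interference is that averaging the LoRA factors separately does not give the average of the low-rank updates. The sketch below shows this gap for two agents with rank-1 factors; this interpretation, the toy matrices, and the helper names are assumptions, and the TSVD re-factorisation step of DeCAF is not implemented here.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def mean_matrices(mats):
    """Element-wise mean of a list of equally shaped matrices."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / n for j in range(cols)]
            for i in range(rows)]

# Two agents, each holding rank-1 LoRA factors B_i (2x1) and A_i (1x2).
B = [[[1.0], [0.0]], [[0.0], [1.0]]]
A = [[[1.0, 0.0]], [[0.0, 1.0]]]

# Average of the updates B_i @ A_i versus the product of the averaged factors.
mean_of_products = mean_matrices([matmul(b, a) for b, a in zip(B, A)])
product_of_means = matmul(mean_matrices(B), mean_matrices(A))
gap = max(abs(p - q)
          for row_p, row_q in zip(mean_of_products, product_of_means)
          for p, q in zip(row_p, row_q))
```

In this example the averaged update has rank 2 while the factors are rank 1, so no pair of rank-1 factors reproduces it exactly; a truncated SVD would give the closest rank-1 approximation.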
Article 895
Title@2025-05-27 (2): Securing Federated Learning against Backdoor Threats with Foundation Model Integration
Title: Securing Federated Learning against Backdoor Threats with Foundation Model Integration | Sichern von Federated Learning gegen Hintertürbedrohungen durch die Integration von Foundation-Modellen | 安全联邦学习应对后门威胁,采用基金会模式一体化模式 2410.17573v3 |
Authors: Xiaohuan Bi, Xi Li
Federated Learning (FL) enables decentralized model training while preserving privacy. Recently, the integration of Foundation Models (FMs) into FL has enhanced performance but introduced a novel backdoor attack mechanism. Attackers can exploit FM vulnerabilities to embed backdoors into synthetic data generated by FMs. During global model fusion, these backdoors are transferred to the global model through compromised synthetic data, subsequently infecting all client models. Existing FL backdoor defenses are ineffective against this novel attack due to its fundamentally different mechanism compared to classic ones. In this work, we propose a novel data-free defense strategy that addresses both classic and novel backdoor attacks in FL. The shared attack pattern lies in the abnormal activations within the hidden feature space during model aggregation. Hence, we propose to constrain internal activations to remain within reasonable ranges, effectively mitigating attacks while preserving model functionality. The activation constraints are optimized using synthetic data alongside FL training. Extensive experiments demonstrate its effectiveness against both novel and classic backdoor attacks, outperforming existing defenses.
Article 896
Title@2025-05-27 (2): Linear $Q$-Learning Does Not Diverge in $L^2$: Convergence Rates to a Bounded Set
Title: Linear $Q$-Learning Does Not Diverge in $L^2$: Convergence Rates to a Bounded Set | Lineares $Q$-Lernen divergiert nicht in $L^2$: Konvergenzraten zu einer beschränkten Menge | 线性 $Q$ 学习在 $L^2$ 中不发散:到有界集的收敛速率 2501.19254v4 |
Authors: Xinyu Liu, Zixuan Xie, Shangtong Zhang
$Q$-learning is one of the most fundamental reinforcement learning algorithms. It was widely believed that $Q$-learning with linear function approximation (i.e., linear $Q$-learning) may diverge, until the recent work of Meyn (2024) established the ultimate almost sure boundedness of the iterates of linear $Q$-learning. Building on this success, this paper further establishes the first $L^2$ convergence rate of linear $Q$-learning iterates (to a bounded set). Similar to Meyn (2024), we do not make any modification to the original linear $Q$-learning algorithm, do not make any Bellman completeness assumption, and do not make any near-optimality assumption on the behavior policy. All we need is an $\epsilon$-softmax behavior policy with an adaptive temperature. The key to our analysis is a general result on stochastic approximation under Markovian noise with fast-changing transition functions. As a by-product, we also use this general result to establish the $L^2$ convergence rate of tabular $Q$-learning with an $\epsilon$-softmax behavior policy, for which we rely on a novel pseudo-contraction property of the weighted Bellman optimality operator.
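A common way to define an $\epsilon$-softmax policy is to mix a uniform distribution with a softmax over the action values. The sketch below follows that definition as an assumption; the paper's exact form and its adaptive temperature schedule are not reproduced, and the temperature is a fixed parameter here.

```python
import math

def epsilon_softmax(q_values, epsilon=0.1, temperature=1.0):
    """Mix a uniform distribution (weight epsilon) with a softmax over q_values."""
    peak = max(q_values)
    # Subtracting the maximum keeps the exponentials numerically stable.
    exps = [math.exp((q - peak) / temperature) for q in q_values]
    total = sum(exps)
    n = len(q_values)
    return [epsilon / n + (1.0 - epsilon) * e / total for e in exps]

probs = epsilon_softmax([1.0, 2.0, 3.0], epsilon=0.3)
```

Every action keeps probability at least `epsilon / n`, so the behavior policy continues to explore regardless of how the value estimates evolve.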
Article 897
Title@2025-05-27 (2): Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
Title: Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment | Chain-of-Zoom: Extreme Super-Resolution über Scale Autoregression und Preference Alignment | 缩放链 缩放链 : 通过 缩放自动递减和偏好对齐, 极超分辨率 2505.18600v2 |
Authors: Bryan Sangwoo Kim, Jeongsol Kim, Jong Chul Ye
Modern single-image super-resolution (SISR) models deliver photo-realistic results at the scale factors on which they are trained, but collapse when asked to magnify far beyond that regime. We address this scalability bottleneck with Chain-of-Zoom (CoZ), a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts. CoZ repeatedly re-uses a backbone SR model, decomposing the conditional probability into tractable sub-problems to achieve extreme resolutions without additional training. Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a vision-language model (VLM). The prompt extractor itself is fine-tuned using Generalized Reward Policy Optimization (GRPO) with a critic VLM, aligning text guidance towards human preference. Experiments show that a standard 4x diffusion SR model wrapped in CoZ attains beyond 256x enlargement with high perceptual quality and fidelity. Project Page: https://bryanswkim.github.io/chain-of-zoom/ .
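The arithmetic behind the chain is that a fixed 4x backbone applied four times in sequence gives 4^4 = 256x. The sketch below shows only this chaining structure, with nearest-neighbour enlargement standing in for the SR model; the region cropping, VLM prompts, and GRPO fine-tuning described above are omitted, and all function names are hypothetical.

```python
def upscale_nearest(image, factor):
    """Placeholder for a backbone SR model: nearest-neighbour enlargement."""
    return [[pixel for pixel in row for _ in range(factor)]
            for row in image for _ in range(factor)]

def chain_of_zoom(image, backbone_factor, steps):
    """Apply the same fixed-factor upscaler `steps` times in sequence."""
    for _ in range(steps):
        image = upscale_nearest(image, backbone_factor)
    return image

def steps_needed(target_factor, backbone_factor):
    """Smallest number of chained steps whose combined factor reaches the target."""
    steps, total = 0, 1
    while total < target_factor:
        total *= backbone_factor
        steps += 1
    return steps

zoomed = chain_of_zoom([[1, 2], [3, 4]], backbone_factor=4, steps=2)
n_steps = steps_needed(256, 4)
```

Two chained 4x steps turn the 2x2 input into a 32x32 grid, and `steps_needed(256, 4)` confirms that four steps reach the 256x regime reported in the abstract.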
Article 898
Title@2025-05-27 (2): Improving LLM-based Global Optimization with Search Space Partitioning
Title: Improving LLM-based Global Optimization with Search Space Partitioning | Verbesserung der globalen Optimierung auf LLM-Basis mit Search Space Partitioning | 改进以LLM为基础的全球最佳利用搜索空间分割法 2505.21372v1 |
Authors: Andrej Schwanke, Lyubomir Ivanov, David Salinas, Fabio Ferreira, Aaron Klein, Frank Hutter, Arber Zela
Large Language Models (LLMs) have recently emerged as effective surrogate models and candidate generators within global optimization frameworks for expensive blackbox functions. Despite promising results, LLM-based methods often struggle in high-dimensional search spaces or when lacking domain-specific priors, leading to sparse or uninformative suggestions. To overcome these limitations, we propose HOLLM, a novel global optimization algorithm that enhances LLM-driven sampling by partitioning the search space into promising subregions. Each subregion acts as a "meta-arm" selected via a bandit-inspired scoring mechanism that effectively balances exploration and exploitation. Within each selected subregion, an LLM then proposes high-quality candidate points, without any explicit domain knowledge. Empirical evaluation on standard optimization benchmarks shows that HOLLM consistently matches or surpasses leading Bayesian optimization and trust-region methods, while substantially outperforming global LLM-based sampling strategies.
Article 899
Title@2025-05-27 (2): PLANETALIGN: A Comprehensive Python Library for Benchmarking Network Alignment
Title: PLANETALIGN: A Comprehensive Python Library for Benchmarking Network Alignment | PLANETALIGN: Eine umfassende Python-Bibliothek für die Ausrichtung von Benchmarking-Netzwerken | PlanETALIGN: 用于基准确定网络协调的综合性俾顿图书馆 2505.21366v1 |
Authors: Qi Yu, Zhichen Zeng, Yuchen Yan, Zhining Liu, Baoyu Jing, Ruizhong Qiu, Ariful Azad, Hanghang Tong
Network alignment (NA) aims to identify node correspondence across different networks and serves as a critical cornerstone behind various downstream multi-network learning tasks. Despite growing research in NA, there is still no comprehensive library that facilitates the systematic development and benchmarking of NA methods. In this work, we introduce PLANETALIGN, a comprehensive Python library for network alignment that features a rich collection of built-in datasets, methods, and evaluation pipelines with easy-to-use APIs. Specifically, PLANETALIGN integrates 18 datasets and 14 NA methods with extensible APIs for easy use and development of NA methods. Our standardized evaluation pipeline encompasses a wide range of metrics, enabling a systematic assessment of the effectiveness, scalability, and robustness of NA methods. Through extensive comparative studies, we reveal practical insights into the strengths and limitations of existing NA methods. We hope that PLANETALIGN can foster a deeper understanding of the NA problem and facilitate the development and benchmarking of more effective, scalable, and robust methods in the future. The source code of PLANETALIGN is available at https://github.com/yq-leo/PlanetAlign.
Article 900
Title@2025-05-27 (2): Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders
Title: Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders | Auf dem Weg zur Verdolmetschbarkeit ohne Opfer: treue Dense-Layer-Zersetzung mit Mischung aus Decodern | 实现无牺牲的解释性:忠实的高密度层分解与代谢物混合 2505.21364v1 |
Authors: James Oldfield, Shawn Im, Yixuan Li, Mihalis A. Nicolaou, Ioannis Patras, Grigorios G Chrysos
Multilayer perceptrons (MLPs) are an integral part of large language models, yet their dense representations render them difficult to understand, edit, and steer. Recent methods learn interpretable approximations via neuron-level sparsity, yet fail to faithfully reconstruct the original mapping, significantly increasing the model's next-token cross-entropy loss. In this paper, we advocate for moving to layer-level sparsity to overcome the accuracy trade-off in sparse layer approximation. Under this paradigm, we introduce Mixture of Decoders (MxDs). MxDs generalize MLPs and Gated Linear Units, expanding pre-trained dense layers into tens of thousands of specialized sublayers. Through a flexible form of tensor factorization, each sparsely activating MxD sublayer implements a linear transformation with full-rank weights, preserving the original decoders' expressive capacity even under heavy sparsity. Experimentally, we show that MxDs significantly outperform state-of-the-art methods (e.g., Transcoders) on the sparsity-accuracy frontier in language models with up to 3B parameters. Further evaluations on sparse probing and feature steering demonstrate that MxDs learn similarly specialized features of natural language, opening up a promising new avenue for designing interpretable yet faithful decompositions. Our code is included at: https://github.com/james-oldfield/MxD/.
Article 901
Title@2025-05-27 (2): CRISP-NAM: Competing Risks Interpretable Survival Prediction with Neural Additive Models
Title: CRISP-NAM: Competing Risks Interpretable Survival Prediction with Neural Additive Models | CRISP-NAM: Konkurrenzfähige Risiken interpretierbare Überlebensvorhersage mit neuralen Additivenmodellen | CRIISP-NAM: 与神经添加模型相竞争的风险解释性生存预测 2505.21360v1 |
Authors: Dhanesh Ramachandram
Competing risks are crucial considerations in survival modelling, particularly in healthcare domains where patients may experience multiple distinct event types. We propose CRISP-NAM (Competing Risks Interpretable Survival Prediction with Neural Additive Models), an interpretable neural additive model for competing risks survival analysis which extends the neural additive architecture to model cause-specific hazards while preserving feature-level interpretability. Each feature contributes independently to risk estimation through dedicated neural networks, allowing for visualization of complex non-linear relationships between covariates and each competing risk. We demonstrate competitive performance on multiple datasets compared to existing approaches.
Article 902
Title@2025-05-27 (2): Learning with Selectively Labeled Data from Multiple Decision-makers
Title: Learning with Selectively Labeled Data from Multiple Decision-makers | Lernen mit selektiv beschrifteten Daten von mehreren Entscheidungsträgern | 学习来自多个决策者的选择性标签数据 2306.07566v4 |
Authors: Jian Chen, Zhehao Li, Xiaojie Mao
We study the problem of classification with selectively labeled data, whose distribution may differ from the full population due to historical decision-making. We exploit the fact that in many applications historical decisions were made by multiple decision-makers, each with different decision rules. We analyze this setup under a principled instrumental variable (IV) framework and rigorously study the identification of classification risk. We establish conditions for the exact identification of classification risk and derive tight partial identification bounds when exact identification fails. We further propose a unified cost-sensitive learning (UCL) approach to learn classifiers robust to selection bias in both identification settings. Finally, we theoretically and numerically validate the efficacy of our proposed method.
Article 903
Title@2025-05-27 (2): Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning
Title: Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning | Nutzung von großen Sprachmodellen für Bengalische Mathematik-Wort-Probleme bei der Lösung der Kette der Gedankenveranlagung | 利用大语言模型解决孟加拉语数学字词与思维链理性的解决问题 2505.21354v1 |
Authors: Bidyarthi Paul, Jalisha Jashim Era, Mirazur Rahman Zim, Tahmid Sattar Aothoi, Faisal Muhammad Shah
Solving Bengali Math Word Problems (MWPs) remains a major challenge in natural language processing (NLP) due to the language’s low-resource status and the multi-step reasoning required. Existing models struggle with complex Bengali MWPs, largely because no human-annotated Bengali dataset has previously addressed this task. This gap has limited progress in Bengali mathematical reasoning. To address this, we created SOMADHAN, a dataset of 8792 complex Bengali MWPs with manually written, step-by-step solutions. We designed this dataset to support reasoning-focused evaluation and model development in a linguistically underrepresented context. Using SOMADHAN, we evaluated a range of large language models (LLMs) - including GPT-4o, GPT-3.5 Turbo, LLaMA series models, Deepseek, and Qwen - through both zero-shot and few-shot prompting with and without Chain of Thought (CoT) reasoning. CoT prompting consistently improved performance over standard prompting, especially in tasks requiring multi-step logic. LLaMA-3.3 70B achieved the highest accuracy of 88% with few-shot CoT prompting. We also applied Low-Rank Adaptation (LoRA) to fine-tune models efficiently, enabling them to adapt to Bengali MWPs with minimal computational cost. Our work fills a critical gap in Bengali NLP by providing a high-quality reasoning dataset and a scalable framework for solving complex MWPs. We aim to advance equitable research in low-resource languages and enhance reasoning capabilities in educational and language technologies.
Article 904
Title@2025-05-27 (2): Diffusion Predictive Control with Constraints
Title: Diffusion Predictive Control with Constraints | Diffusion Predictive Control mit Einschränkungen | 受限制的预测控制 2412.09342v2 |
Authors: Ralf Römer, Alexander von Rohr, Angela P. Schoellig
Diffusion models have become popular for policy learning in robotics due to their ability to capture high-dimensional and multimodal distributions. However, diffusion policies are stochastic and typically trained offline, limiting their ability to handle unseen and dynamic conditions where novel constraints not represented in the training data must be satisfied. To overcome this limitation, we propose diffusion predictive control with constraints (DPCC), an algorithm for diffusion-based control with explicit state and action constraints that can deviate from those in the training data. DPCC incorporates model-based projections into the denoising process of a trained trajectory diffusion model and uses constraint tightening to account for model mismatch. This allows us to generate constraint-satisfying, dynamically feasible, and goal-reaching trajectories for predictive control. We show through simulations of a robot manipulator that DPCC outperforms existing methods in satisfying novel test-time constraints while maintaining performance on the learned control task.
Article 905
Title@2025-05-27 (2): An Uncertainty-Aware ED-LSTM for Probabilistic Suffix Prediction
Title: An Uncertainty-Aware ED-LSTM for Probabilistic Suffix Prediction | Eine unsichere ED-LSTM für probabilistische Suffix-Vorhersage | 用于概率后置物后置物预测的不确定性( ED-LSTM) 的不确定性警告 ED-LSTM 2505.21339v1 |
Authors: Henryk Mustroph, Michel Kunkler, Stefanie Rinderle-Ma
Suffix prediction of business processes forecasts the remaining sequence of events until process completion. Current approaches focus on predicting a single, most likely suffix. However, if the future course of a process is exposed to uncertainty or has high variability, the expressiveness of a single suffix prediction can be limited. To address this limitation, we propose probabilistic suffix prediction, a novel approach that approximates a probability distribution of suffixes. The proposed approach is based on an Uncertainty-Aware Encoder-Decoder LSTM (U-ED-LSTM) and a Monte Carlo (MC) suffix sampling algorithm. We capture epistemic uncertainties via MC dropout and aleatoric uncertainties as learned loss attenuation. This technical report provides a detailed evaluation of the U-ED-LSTM’s predictive performance and assesses its calibration on four real-life event logs with three different hyperparameter settings. The results show that i) the U-ED-LSTM has reasonable predictive performance across various datasets, ii) aggregating probabilistic suffix predictions into mean values can outperform most likely predictions, particularly for rare prefixes or longer suffixes, and iii) the approach effectively captures uncertainties present in event logs.
Article 906
Title@2025-05-27 (2): Controlling Participation in Federated Learning with Feedback
Title: Controlling Participation in Federated Learning with Feedback | Mit Feedback die Teilnahme am Föderierten Lernen kontrollieren | 控制参加有反馈的联邦学习 2411.19242v2 |
Authors: Michael Cummins, Guner Dilsad Er, Michael Muehlebach
We address the problem of client participation in federated learning, where traditional methods typically rely on a random selection of a small subset of clients for each training round. In contrast, we propose FedBack, a deterministic approach that leverages control-theoretic principles to manage client participation in ADMM-based federated learning. FedBack models client participation as a discrete-time dynamical system and employs an integral feedback controller to adjust each client's participation rate individually, based on the client's optimization dynamics. We provide global convergence guarantees for our approach by building on recent federated learning research. Numerical experiments on federated image classification demonstrate that FedBack achieves up to 50% improvement in communication and computational efficiency over algorithms that rely on a random selection of clients.
Article 907
Title@2025-05-27 (2): PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning
Title: PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning | PeerGuard: Verteidigen von Multi-Agenten-Systemen gegen Hintertürangriffe durch gegenseitige Vernunft | 同伴保护:捍卫多机构系统,防止通过相互理由进行后门攻击 2505.11642v2 |
Authors: Falong Fan, Xi Li
Multi-agent systems leverage advanced AI models as autonomous agents that interact, cooperate, or compete to complete complex tasks across applications such as robotics and traffic management. Despite their growing importance, safety in multi-agent systems remains largely underexplored, with most research focusing on single AI models rather than interacting agents. This work investigates backdoor vulnerabilities in multi-agent systems and proposes a defense mechanism based on agent interactions. By leveraging reasoning abilities, each agent evaluates responses from others to detect illogical reasoning processes, which indicate poisoned agents. Experiments on LLM-based multi-agent systems, including ChatGPT series and Llama 3, demonstrate the effectiveness of the proposed method, achieving high accuracy in identifying poisoned agents while minimizing false positives on clean agents. We believe this work provides insights into multi-agent system safety and contributes to the development of robust, trustworthy AI interactions.
Article 908
Title@2025-05-27 (2): Adaptive Sample Sharing for Multi Agent Linear Bandits
Title: Adaptive Sample Sharing for Multi Agent Linear Bandits | Adaptive Probenfreigabe für Multi Agent Linear Bandits | 多剂线性强盗的适应性样本共享 2309.08710v3 |
Authors: Hamza Cherkaoui, Merwan Barlier, Igor Colin
The multi-agent linear bandit setting is a well-known setting for which designing efficient collaboration between agents remains challenging. This paper studies the impact of data sharing among agents on regret minimization. Unlike most existing approaches, our contribution does not rely on any assumptions about the structure of the bandit parameters. Our main result formalizes the trade-off between the bias and uncertainty of the bandit parameter estimation for efficient collaboration. This result is the cornerstone of the Bandit Adaptive Sample Sharing (BASS) algorithm, whose efficiency over the current state-of-the-art is validated through theoretical analysis and empirical evaluations on both synthetic and real-world datasets. Furthermore, we demonstrate that, when agents' parameters display a cluster structure, our algorithm accurately recovers them.
Article 909
Title@2025-05-27 (2): Sign Operator for Coping with Heavy-Tailed Noise in Non-Convex Optimization: High Probability Bounds Under $(L_0, L_1)$-Smoothness
Title: Sign Operator for Coping with Heavy-Tailed Noise in Non-Convex Optimization: High Probability Bounds Under $(L_0, L_1)$-Smoothness | Sign-Operator für den Umgang mit schwerfälligen Geräuschen in Nicht-Konvex-Optimierung: Hohe Wahrscheinlichkeitsgrenzen unter $(L_0, L_1)$-Smoothness | 在非Convex优化情况下处理重故障噪音的签名操作员: 高概率弹道低于$(L_0, L_1), 低于$(L_1) 2502.07923v2 |
Authors: Nikita Kornilov, Philip Zmushko, Andrei Semenov, Mark Ikonnikov, Alexander Gasnikov, Alexander Beznosikov
In recent years, non-convex optimization problems have increasingly been described by the generalized $(L_0, L_1)$-smoothness assumption rather than the standard one. Meanwhile, severely corrupted data used in these problems has increased the demand for methods capable of handling heavy-tailed noise, i.e., noise with a bounded $\kappa$-th moment. Motivated by these real-world trends and challenges, we explore sign-based methods in this setup and demonstrate their effectiveness in comparison with other popular solutions like clipping or normalization. In theory, we prove the first known high-probability convergence bounds under $(L_0, L_1)$-smoothness and heavy-tailed noise with mild parameter dependencies. In the case of standard smoothness, these bounds are novel for sign-based methods as well. In particular, SignSGD with batching achieves sample complexity $\tilde{O}\left(\left(\frac{\Delta L_0d}{\varepsilon^2} + \frac{\Delta L_1d^\frac{3}{2}}{\varepsilon}\right)\left[1 + \left(\frac{\sigma}{\varepsilon}\right)^\frac{\kappa}{\kappa-1}\right]\right), \kappa \in (1,2]$. Under the assumption of symmetric noise, SignSGD with Majority Voting can robustly work on the whole range of $\kappa \in (0,2]$ with complexity $\tilde{O}\left(\left(\frac{\Delta L_0d}{\varepsilon^2} + \frac{\Delta L_1d^\frac{3}{2}}{\varepsilon}\right)\left[\frac{1}{\kappa^2} + \frac{\sigma^2}{\varepsilon^2}\right]\right)$. We also obtain results for parameter-agnostic setups, Polyak-Lojasiewicz functions and momentum-based methods (in expectation). Our theoretical findings are supported by the superior performance of sign-based methods in training Large Language Models compared to clipping and normalization.
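To make the majority-voting variant concrete, here is a minimal sketch of one SignSGD step in which each stochastic gradient casts a per-coordinate sign vote and the update follows the sign of the vote total. The function names and the plain-list representation are assumptions for illustration; step sizes, batching, and the paper's exact procedure are not reproduced here.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def majority_vote_sign(grads):
    """Per-coordinate sign of the summed sign votes over a list of gradient vectors."""
    dim = len(grads[0])
    return [sign(sum(sign(g[i]) for g in grads)) for i in range(dim)]

def signsgd_majority_step(params, grads, lr):
    """Move each parameter by lr against the majority-voted gradient sign."""
    vote = majority_vote_sign(grads)
    return [p - lr * v for p, v in zip(params, vote)]
```

Because each sample contributes only its sign, a single extreme gradient (for example `-100.0` among small positive values) cannot outweigh the other votes, which is the intuition behind robustness to heavy-tailed noise.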
Article 910
Title@2025-05-27 (2): Joint Learning in the Gaussian Single Index Model
Title: Joint Learning in the Gaussian Single Index Model | Gemeinsames Lernen im Gaussischen Einzelindexmodell | Gaussian单一指数模式联合学习 2505.21336v1 |
Authors: Loucas Pillaud-Vivien, Adrien Schertzer
We consider the problem of jointly learning a one-dimensional projection and a univariate function in high-dimensional Gaussian models. Specifically, we study predictors of the form $f(x)=\varphi^\star(\langle w^\star, x \rangle)$, where both the direction $w^\star \in \mathcal{S}_{d-1}$, the sphere of $\mathbb{R}^d$, and the function $\varphi^\star: \mathbb{R} \to \mathbb{R}$ are learned from Gaussian data. This setting captures a fundamental non-convex problem at the intersection of representation learning and nonlinear regression. We analyze the gradient flow dynamics of a natural alternating scheme and prove convergence, with a rate controlled by the information exponent reflecting the \textit{Gaussian regularity} of the function $\varphi^\star$. Strikingly, our analysis shows that convergence still occurs even when the initial direction is negatively correlated with the target. On the practical side, we demonstrate that such joint learning can be effectively implemented using a Reproducing Kernel Hilbert Space (RKHS) adapted to the structure of the problem, enabling efficient and flexible estimation of the univariate function. Our results offer both theoretical insight and practical methodology for learning low-dimensional structure in high-dimensional settings.
Article 911
Title@2025-05-27 (2): DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents
Title: DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents | DHP: Diskrete Hierarchische Planung für Hierarchische Verstärkungs-Learning Agents | DHP: 等级加强学习代理的分级分级规划 2502.01956v2 |
Authors: Shashank Sharma, Janina Hoffmann, Vinay Namboodiri
Hierarchical Reinforcement Learning (HRL) agents often struggle with long-horizon visual planning due to their reliance on error-prone distance metrics. We propose Discrete Hierarchical Planning (DHP), a method that replaces continuous distance estimates with discrete reachability checks to evaluate subgoal feasibility. DHP recursively constructs tree-structured plans by decomposing long-term goals into sequences of simpler subtasks, using a novel advantage estimation strategy that inherently rewards shorter plans and generalizes beyond training depths. In addition, to address the data efficiency challenge, we introduce an exploration strategy that generates targeted training examples for the planning modules without needing expert data. Experiments in 25-room navigation environments demonstrate $100\%$ success rate (vs $82\%$ baseline) and $73$-step average episode length (vs $158$-step baseline). The method also generalizes to momentum-based control tasks and requires only $\log N$ steps for replanning. Theoretical analysis and ablations validate our design choices.
Article 912
Title@2025-05-27 (2): Structure from Collision
Title: Structure from Collision | Struktur aus Kollision | 来自碰撞的结构 2505.21335v1 |
Authors: Takuhiro Kaneko
Recent advancements in neural 3D representations, such as neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS), have enabled the accurate estimation of 3D structures from multiview images. However, this capability is limited to estimating the visible external structure, and identifying the invisible internal structure hidden behind the surface is difficult. To overcome this limitation, we address a new task called Structure from Collision (SfC), which aims to estimate the structure (including the invisible internal structure) of an object from appearance changes during collision. To solve this problem, we propose a novel model called SfC-NeRF that optimizes the invisible internal structure of an object through a video sequence under physical, appearance (i.e., visible external structure)-preserving, and keyframe constraints. In particular, to avoid falling into undesirable local optima owing to its ill-posed nature, we propose volume annealing; that is, searching for global optima by repeatedly reducing and expanding the volume. Extensive experiments on 115 objects involving diverse structures (i.e., various cavity shapes, locations, and sizes) and material properties revealed the properties of SfC and demonstrated the effectiveness of the proposed SfC-NeRF.
Article 913
Title@2025-05-27 (2): Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach
Title: Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach | Robustheit und Genauigkeit in der Mischung von Experten optimieren: Ein Dual-Model-Ansatz | 优化专家混合中的力量和准确性:双模式办法 2502.06832v3 |
Authors: Xu Zhang, Kaidi Xu, Ziqing Hu, Ren Wang
Mixture of Experts (MoE) models have shown remarkable success in leveraging specialized expert networks for complex machine learning tasks. However, their susceptibility to adversarial attacks presents a critical challenge for deployment in robust applications. This paper addresses the critical question of how to incorporate robustness into MoEs while maintaining high natural accuracy. We begin by analyzing the vulnerability of MoE components, finding that expert networks are notably more susceptible to adversarial attacks than the router. Based on this insight, we propose a targeted robust training technique that integrates a novel loss function to enhance the adversarial robustness of MoE, requiring only the robustification of one additional expert without compromising training or inference efficiency. Building on this, we introduce a dual-model strategy that linearly combines a standard MoE model with our robustified MoE model using a smoothing parameter. This approach allows for flexible control over the robustness-accuracy trade-off. We further provide theoretical foundations by deriving certified robustness bounds for both the single MoE and the dual-model. To push the boundaries of robustness and accuracy, we propose a novel joint training strategy, JTDMoE, for the dual-model. This joint training enhances both robustness and accuracy beyond what is achievable with separate models. Experimental results on CIFAR-10 and TinyImageNet datasets using ResNet18 and Vision Transformer (ViT) architectures demonstrate the effectiveness of our proposed methods. The code is publicly available at https://github.com/TIML-Group/Robust-MoE-Dual-Model.
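The dual-model strategy described above can be pictured as a convex combination of two models' outputs. The sketch below is an assumption-laden illustration: it mixes class-probability vectors, and both the function name `dual_model_output` and the convention that `alpha` weights the robust model are hypothetical; the abstract does not say which outputs are combined or how the smoothing parameter is oriented.

```python
def dual_model_output(standard_probs, robust_probs, alpha):
    """Linearly combine two probability vectors with smoothing parameter alpha in [0, 1].

    alpha = 0 returns the standard model's output, alpha = 1 the robust model's.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(standard_probs) != len(robust_probs):
        raise ValueError("both models must output the same number of classes")
    return [(1.0 - alpha) * s + alpha * r
            for s, r in zip(standard_probs, robust_probs)]
```

Sweeping `alpha` from 0 to 1 traces out intermediate predictors between the two models, which is one way to read the "flexible control over the robustness-accuracy trade-off" mentioned in the abstract.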
Article 914
Title@2025-05-27 (2): Wrapped Gaussian on the manifold of Symmetric Positive Definite Matrices
Title: Wrapped Gaussian on the manifold of Symmetric Positive Definite Matrices | Eingewickelt Gaussian auf der Mannigfaltigkeit der Symmetrischen Positiven Definiten Matrizen | 以正负负负负下方矩阵的方块包装高森 2502.01512v3 |
Authors: Thibault de Surrel, Fabien Lotte, Sylvain Chevallier, Florian Yger
Circular and non-flat data distributions are prevalent across diverse domains of data science, yet their specific geometric structures often remain underutilized in machine learning frameworks. A principled approach to accounting for the underlying geometry of such data is pivotal, particularly when extending statistical models, like the pervasive Gaussian distribution. In this work, we tackle these issues by focusing on the manifold of symmetric positive definite (SPD) matrices, a key focus in information geometry. We introduce a non-isotropic wrapped Gaussian by leveraging the exponential map, derive theoretical properties of this distribution, and propose a maximum likelihood framework for parameter estimation. Furthermore, we reinterpret established classifiers on SPD matrices through a probabilistic lens and introduce new classifiers based on the wrapped Gaussian model. Experiments on synthetic and real-world datasets demonstrate the robustness and flexibility of this geometry-aware distribution, underscoring its potential to advance manifold-based data analysis. This work lays the groundwork for extending classical machine learning and statistical methods to more complex and structured data.
Article 915
Title@2025-05-27 (2): Scheduling with Uncertain Holding Costs and its Application to Content Moderation
Title: Scheduling with Uncertain Holding Costs and its Application to Content Moderation | Planung mit unsicheren Holdingkosten und deren Anwendung auf Content Moderation | 与不确定的控股成本及其对内容调节应用的时间安排 2505.21331v1 |
Authors: Caner Gocmen, Thodoris Lykouris, Deeksha Sinha, Wentao Weng
In content moderation for social media platforms, the cost of delaying the review of a piece of content is proportional to its view trajectory, which fluctuates and is a priori unknown. Motivated by such uncertain holding costs, we consider a queueing model where job states evolve based on a Markov chain with state-dependent instantaneous holding costs. We demonstrate that in the presence of such uncertain holding costs, the two canonical algorithmic principles, instantaneous-cost ($c\mu$-rule) and expected-remaining-cost ($c\mu/\theta$-rule), are suboptimal. By viewing each job as a Markovian ski-rental problem, we develop a new index-based algorithm, Opportunity-adjusted Remaining Cost (OaRC), that adjusts to the opportunity of serving jobs in the future when uncertainty partly resolves. We show that the regret of OaRC scales as $\tilde{O}(L^{1.5}\sqrt{N})$, where $L$ is the maximum length of a job's holding cost trajectory and $N$ is the system size. This regret bound shows that OaRC achieves asymptotic optimality when the system size $N$ scales to infinity. Moreover, its regret is independent of the state-space size, which is a desirable property when job states contain contextual information. We corroborate our results with an extensive simulation study based on two holding cost patterns (online ads and user-generated content) that arise in content moderation for social media platforms. Our simulations based on synthetic and real datasets demonstrate that OaRC consistently outperforms existing practice, which is based on the two canonical algorithmic principles.
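For readers unfamiliar with the baseline the abstract argues against, the classical $c\mu$-rule is an index policy: serve the waiting job whose current holding-cost rate times service rate is largest. The sketch below shows only that baseline; the dictionary keys and function names are hypothetical, and OaRC's own index is not reproduced because the abstract does not define it.

```python
def c_mu_index(job):
    """Classical c-mu index: instantaneous holding-cost rate times service rate."""
    return job["cost_rate"] * job["service_rate"]

def pick_next_job(jobs):
    """Serve the waiting job with the largest c-mu index."""
    if not jobs:
        raise ValueError("no jobs are waiting")
    return max(jobs, key=c_mu_index)
```

This rule uses only the current cost rate, so it ignores how a job's cost may evolve later, which is the gap that motivates an index adjusted for future opportunities.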
Article 916
Title@2025-05-27 (2): UGCE: User-Guided Incremental Counterfactual Exploration
Title: UGCE: User-Guided Incremental Counterfactual Exploration | UGCE: User-Guided Incremental Counterfactual Exploration | UGCE: 用户指导的递增反事实探索 2505.21330v1 |
Authors: Christos Fragkathoulas, Evaggelia Pitoura
Counterfactual explanations (CFEs) are a popular approach for interpreting machine learning predictions by identifying minimal feature changes that alter model outputs. However, in real-world settings, users often refine feasibility constraints over time, requiring counterfactual generation to adapt dynamically. Existing methods fail to support such iterative updates, instead recomputing explanations from scratch with each change, an inefficient and rigid approach. We propose User-Guided Incremental Counterfactual Exploration (UGCE), a genetic algorithm-based framework that incrementally updates counterfactuals in response to evolving user constraints. Experimental results across five benchmark datasets demonstrate that UGCE significantly improves computational efficiency while maintaining high-quality solutions compared to a static, non-incremental approach. Our evaluation further shows that UGCE supports stable performance under varying constraint sequences, benefits from an efficient warm-start strategy, and reveals how different constraint types may affect search behavior.
Article 917
Title@2025-05-27 (2): Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization
Title: Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization | Bencher: Einfaches und reproduzierbares Benchmarking für Black-Box-Optimierung | 座谈人: 简化和可复制的黑箱优化基准 2505.21321v1 |
Authors: Leonard Papenmeier, Luigi Nardi
We present Bencher, a modular benchmarking framework for black-box optimization that fundamentally decouples benchmark execution from optimization logic. Unlike prior suites that focus on combining many benchmarks in a single project, Bencher introduces a clean abstraction boundary: each benchmark is isolated in its own virtual Python environment and accessed via a unified, version-agnostic remote procedure call (RPC) interface. This design eliminates dependency conflicts and simplifies the integration of diverse, real-world benchmarks, which often have complex and conflicting software requirements. Bencher can be deployed locally or remotely via Docker or on high-performance computing (HPC) clusters via Singularity, providing a containerized, reproducible runtime for any benchmark. Its lightweight client requires minimal setup and supports drop-in evaluation of 80 benchmarks across continuous, categorical, and binary domains.
Article 918
Title@2025-05-27 (2): A Cross Modal Knowledge Distillation & Data Augmentation Recipe for Improving Transcriptomics Representations through Morphological Features
Title: A Cross Modal Knowledge Distillation & Data Augmentation Recipe for Improving Transcriptomics Representations through Morphological Features | Ein Cross Modal Knowledge Destillation & Data Augmentation Rezept zur Verbesserung von Transkriptionsdarstellungen durch morphologische Merkmale | 一种交叉模式知识蒸馏和数据增强休息室,以通过生理特征改进转基因医学的表现形式 2505.21317v1 |
Authors: Ihab Bendidi, Yassir El Mesbahi, Alisandra K. Denton, Karush Suri, Kian Kenyon-Dean, Auguste Genovesio, Emmanuel Noutahi
Understanding cellular responses to stimuli is crucial for biological discovery and drug development. Transcriptomics provides interpretable, gene-level insights, while microscopy imaging offers rich predictive features but is harder to interpret. Weakly paired datasets, where samples share biological states, enable multimodal learning but are scarce, limiting their utility for training and multimodal inference. We propose a framework to enhance transcriptomics by distilling knowledge from microscopy images. Using weakly paired data, our method aligns and binds modalities, enriching gene expression representations with morphological information. To address data scarcity, we introduce (1) Semi-Clipped, an adaptation of CLIP for cross-modal distillation using pretrained foundation models, achieving state-of-the-art results, and (2) PEA (Perturbation Embedding Augmentation), a novel augmentation technique that enhances transcriptomics data while preserving inherent biological information. These strategies improve the predictive power and retain the interpretability of transcriptomics, enabling rich unimodal representations for complex biological tasks.
Article 919
Title@2025-05-27 (2): It’s complicated. The relationship of algorithmic fairness and non-discrimination regulations for high-risk systems in the EU AI Act
Title: It’s complicated. The relationship of algorithmic fairness and non-discrimination regulations for high-risk systems in the EU AI Act | Es ist kompliziert. Das Verhältnis algorithmischer Fairness- und Nichtdiskriminierungsvorschriften für Hochrisikosysteme im EU-AI-Gesetz | 这很复杂,在欧盟的AI法案中, 高风险系统的算法公正和不歧视规定之间的关系。 2501.12962v3 |
Authors: Kristof Meding
What constitutes a fair decision? This question is not only difficult for humans but becomes more challenging when Artificial Intelligence (AI) models are used. In light of discriminatory algorithmic behaviors, the EU has recently passed the AI Act, which mandates specific rules for high-risk systems, incorporating both traditional legal non-discrimination regulations and machine learning based algorithmic fairness concepts. This paper aims to bridge these two different concepts in the AI Act through: First, a necessary high-level introduction of both concepts targeting legal and computer science-oriented scholars, and second, an in-depth analysis of the AI Act’s relationship between legal non-discrimination regulations and algorithmic fairness. Our analysis reveals three key findings: (1.) Most non-discrimination regulations target only high-risk AI systems. (2.) The regulation of high-risk systems encompasses both data input requirements and output monitoring, though these regulations are partly inconsistent and raise questions of computational feasibility. (3.) Finally, we consider the possible (future) interaction of classical EU non-discrimination law and the AI Act regulations. We recommend developing more specific auditing and testing methodologies for AI systems. This paper aims to serve as a foundation for future interdisciplinary collaboration between legal scholars and computer science-oriented machine learning researchers studying discrimination in AI systems.
Article 920
Title@2025-05-27 (2): Item Cluster-aware Prompt Learning for Session-based Recommendation
Title: Item Cluster-aware Prompt Learning for Session-based Recommendation | Artikel Cluster-aware Prompt Learning für sitzungsbasierte Empfehlung | 项目 集群意识快速学习促进基于会议的建议 2410.04756v2 |
Authors: Wooseong Yang, Chen Wang, Zihe Song, Weizhi Zhang, Philip S. Yu
Session-based recommendation (SBR) aims to capture dynamic user preferences by analyzing item sequences within individual sessions. However, most existing approaches focus mainly on intra-session item relationships, neglecting the connections between items across different sessions (inter-session relationships), which limits their ability to fully capture complex item interactions. While some methods incorporate inter-session information, they often suffer from high computational costs, leading to longer training times and reduced efficiency. To address these challenges, we propose the CLIP-SBR (Cluster-aware Item Prompt learning for Session-Based Recommendation) framework. CLIP-SBR is composed of two modules: 1) an item relationship mining module that builds a global graph to effectively model both intra- and inter-session relationships, and 2) an item cluster-aware prompt learning module that uses soft prompts to integrate these relationships into SBR models efficiently. We evaluate CLIP-SBR across eight SBR models and three benchmark datasets, consistently demonstrating improved recommendation performance and establishing CLIP-SBR as a robust solution for session-based recommendation tasks.
Article 921
Title@2025-05-27 (2): Overcoming Spurious Solutions in Semi-Dual Neural Optimal Transport: A Smoothing Approach for Learning the Optimal Transport Plan
Title: Overcoming Spurious Solutions in Semi-Dual Neural Optimal Transport: A Smoothing Approach for Learning the Optimal Transport Plan | Überwinden von sauberen Lösungen im halbdualen Neural Optimalen Verkehr: Ein glättender Ansatz für das Lernen des optimalen Verkehrsplans | 克服半双轨神经优化运输中的纯净解决方案:学习最佳运输计划的平滑方法 2502.04583v2 |
Authors: Jaemoo Choi, Jaewoong Choi, Dohyun Kwon
We address the convergence problem in learning the Optimal Transport (OT) map, where the OT Map refers to a map from one distribution to another while minimizing the transport cost. Semi-dual Neural OT, a widely used approach for learning OT Maps with neural networks, often generates spurious solutions that fail to transfer one distribution to another accurately. We identify a sufficient condition under which the max-min solution of Semi-dual Neural OT recovers the true OT Map. Moreover, to address cases when this sufficient condition is not satisfied, we propose a novel method, OTP, which learns both the OT Map and the Optimal Transport Plan, representing the optimal coupling between two distributions. Under sharp assumptions on the distributions, we prove that our model eliminates the spurious solution issue and correctly solves the OT problem. Our experiments show that the OTP model recovers the optimal transport map where existing methods fail and outperforms current OT-based models in image-to-image translation tasks. Notably, the OTP model can learn stochastic transport maps when deterministic OT Maps do not exist, such as one-to-many tasks like colorization.
Article 922
Title@2025-05-27 (2): Interlocking-free Selective Rationalization Through Genetic-based Learning
Title: Interlocking-free Selective Rationalization Through Genetic-based Learning | Interlocking-free Selektive Rationalisierung durch gentechnisch-basiertes Lernen | 通过基于遗传的学习实现互连、无互闭和无互换的选择性合理化 2412.10312v2 |
Authors: Federico Ruggeri, Gaetano Signorelli
A popular end-to-end architecture for selective rationalization is the select-then-predict pipeline, comprising a generator that extracts highlights and feeds them to a predictor. Such a cooperative system suffers from suboptimal equilibrium minima due to the dominance of one of the two modules, a phenomenon known as interlocking. While several contributions have aimed to address interlocking, they only mitigate its effect, often by introducing feature-based heuristics, sampling, and ad-hoc regularizations. We present GenSPP, the first interlocking-free architecture for selective rationalization that does not require any of the learning overhead mentioned above. GenSPP avoids interlocking by performing disjoint training of the generator and predictor via genetic global search. Experiments on a synthetic and a real-world benchmark show that our model outperforms several state-of-the-art competitors.
Article 923
Title@2025-05-27 (2): Optimizing fMRI Data Acquisition for Decoding Natural Speech with Limited Participants
Title: Optimizing fMRI Data Acquisition for Decoding Natural Speech with Limited Participants | Optimierung der fMRI-Datenerfassung für die Dekodierung von Natural Speech mit begrenzten Teilnehmern | 优化FMRI数据获取,以便与有限参加者进行自然演讲 2505.21304v1 |
Authors: Louis Jalouzot, Alexis Thual, Yair Lakretz, Christophe Pallier, Bertrand Thirion
We investigate optimal strategies for decoding perceived natural speech from fMRI data acquired from a limited number of participants. Leveraging Lebel et al. (2023)’s dataset of 8 participants, we first demonstrate the effectiveness of training deep neural networks to predict LLM-derived text representations from fMRI activity. Then, in this data regime, we observe that multi-subject training does not improve decoding accuracy compared to a single-subject approach. Furthermore, training on similar or different stimuli across subjects has a negligible effect on decoding accuracy. Finally, we find that our decoders model syntactic features better than semantic ones, and that stories containing sentences with complex syntax or rich semantic content are more challenging to decode. While our results demonstrate the benefits of having extensive data per participant (deep phenotyping), they suggest that leveraging multi-subject data for natural speech decoding likely requires deeper phenotyping or a substantially larger cohort.
Article 924
Title@2025-05-27 (2): Large Language Models Miss the Multi-Agent Mark
Title: Large Language Models Miss the Multi-Agent Mark | Große Sprachmodelle vermissen das Multi-Agent Mark | 大语言模型 2505.21298v1 |
Authors: Emanuele La Malfa, Gabriele La Malfa, Samuele Marro, Jie M. Zhang, Elizabeth Black, Micheal Luck, Philip Torr, Michael Wooldridge
Recent interest in Multi-Agent Systems of Large Language Models (MAS LLMs) has led to an increase in frameworks leveraging multiple LLMs to tackle complex tasks. However, much of this literature appropriates the terminology of MAS without engaging with its foundational principles. In this position paper, we highlight critical discrepancies between MAS theory and current MAS LLM implementations, focusing on four key areas: the social aspect of agency, environment design, coordination and communication protocols, and measuring emergent behaviours. Our position is that many MAS LLMs lack multi-agent characteristics such as autonomy, social interaction, and structured environments, and often rely on oversimplified, LLM-centric architectures. The field may slow down and lose traction by revisiting problems the MAS literature has already addressed. Therefore, we systematically analyse this issue and outline associated research opportunities; we advocate for better integrating established MAS concepts and more precise terminology to avoid mischaracterisation and missed opportunities.
Article 925
Title@2025-05-27 (2): Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation
Title: Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation | Auf dem Weg zur Anpassung von Open Source großen Sprachmodellen für die Erstellung klinischer Notizen auf Expertenebene | 努力调整用于专家级临床笔记制作的开放源大语言模型 2405.00715v6 |
Authors: Hanyin Wang, Chufan Gao, Bolun Liu, Qiping Xu, Guleid Hussein, Mohamad El Labban, Kingsley Iheasirim, Hariprasad Korsapati, Chuck Outcalt, Jimeng Sun
Proprietary Large Language Models (LLMs) such as GPT-4 and Gemini have demonstrated promising capabilities in clinical text summarization tasks. However, due to patient data privacy concerns and computational costs, many healthcare providers prefer using small, locally-hosted models over external generic LLMs. This study presents a comprehensive domain- and task-specific adaptation process for the open-source LLaMA-2 13 billion parameter model, enabling it to generate high-quality clinical notes from outpatient patient-doctor dialogues. Our process incorporates continued pretraining, supervised fine-tuning, and reinforcement learning from both AI and human feedback. We introduced a new approach, DistillDirect, for performing on-policy reinforcement learning with Gemini 1.0 Pro as the teacher model. Our resulting model, LLaMA-Clinic, can generate clinical notes comparable in quality to those authored by physicians. In a blinded physician reader study, the majority (92.8%) of individual evaluations rated the notes generated by LLaMA-Clinic as “acceptable” or higher across three criteria: real-world readiness, completeness, and accuracy. In the more challenging “Assessment and Plan” section, LLaMA-Clinic matched physician-authored notes in real-world readiness score. We highlight key considerations for future clinical note-generation tasks, emphasizing the importance of pre-defining a “best practice” note format, rather than relying on LLMs to determine this for clinical practice.
Article 926
Title@2025-05-27 (2): LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
Title: LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning | LoFT: Low-Rank-Anpassung, die sich wie Full-Fine-Tuning verhält | LOFT: 行为如完全精美调整的低朗适应 2505.21289v1 |
Authors: Nurbek Tastan, Stefanos Laskaridis, Martin Takac, Karthik Nandakumar, Samuel Horvath
Large pre-trained models are commonly adapted to downstream tasks using parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA), which injects small trainable low-rank matrices instead of updating all weights. While LoRA dramatically reduces trainable parameters with little overhead, it can still underperform full fine-tuning in accuracy and often converges more slowly. We introduce LoFT, a novel low-rank adaptation method that behaves like full fine-tuning by aligning the optimizer’s internal dynamics with those of updating all model weights. LoFT not only learns weight updates in a low-rank subspace (like LoRA) but also properly projects the optimizer’s first and second moments (Adam’s momentum and variance) into the same subspace, mirroring full-model updates. By aligning the low-rank update itself with the full update, LoFT eliminates the need for tuning extra hyperparameters, e.g., LoRA scaling factor $\alpha$. Empirically, this approach substantially narrows the performance gap between adapter-based tuning and full fine-tuning and consistently outperforms standard LoRA-style methods, all without increasing inference cost.
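For reference, the standard LoRA parameterization that the abstract contrasts against can be sketched as below. This shows plain LoRA only (a frozen weight plus a scaled low-rank product), not LoFT's projection of optimizer moments; the helper names `matmul` and `lora_effective_weight` and the tiny matrices are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the effective weight under standard LoRA."""
    scale = alpha / rank  # the LoRA scaling factor mentioned in the abstract
    delta = matmul(b, a)  # low-rank update with the same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Frozen 2x2 weight with a rank-1 adapter (B is 2x1, A is 1x2); values are made up.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, -0.5]]
W_eff = lora_effective_weight(W, B, A, alpha=2.0, rank=1)
```

Only `B` and `A` would be trained in this setup; `alpha` is the extra hyperparameter that, according to the abstract, LoFT no longer needs to tune.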
Article 927
Title@2025-05-27 (2): GSAT: Graph Structure Attention Networks
Title: GSAT: Graph Structure Attention Networks | GSAT: Grafische Struktur | GSAT: 图表结构关注网络 2505.21288v1 |
Authors: Farshad Noravesh, Reza Haffari, Layki Soon, Arghya Pal
Graph Neural Networks (GNNs) have emerged as a powerful tool for processing graph-structured data, achieving remarkable success across a wide range of applications. However, one important type of feature is often overlooked in modeling: a structural representation of each node that encodes rich local topological information about its neighbourhood, which can further improve performance on graph classification benchmarks. Neglecting this structural information has resulted in a high number of layers being needed to pass messages between distant nodes, which in turn produces other problems such as oversmoothing. In the present paper, we leverage this structural information, modeled by anonymous random walks (ARWs), and introduce the graph structure attention network (GSAT), a generalization of the graph attention network (GAT) that integrates the original attributes and the structural representation so that the model automatically finds patterns for attending to different edges in the node neighbourhood, enriching the graph representation. Our experiments show that GSAT slightly improves on the SOTA on some graph classification benchmarks.
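To make the notion of an anonymous random walk concrete, here is a minimal sketch: each node in a walk is replaced by the index of its first appearance, so that walks with the same revisit pattern map to the same sequence regardless of node identities. The function name and the 0-based indexing are assumptions for illustration (some formulations count from 1), and how GSAT turns these patterns into node features is not specified in the abstract.

```python
def anonymize_walk(walk):
    """Map each node to the order in which it first appears in the walk."""
    first_seen = {}
    pattern = []
    for node in walk:
        if node not in first_seen:
            # A node seen for the first time receives the next unused index.
            first_seen[node] = len(first_seen)
        pattern.append(first_seen[node])
    return tuple(pattern)


# Two walks over different nodes that share the same structural pattern.
pattern_a = anonymize_walk(["a", "b", "a", "c"])
pattern_b = anonymize_walk([7, 3, 7, 9])
```

Both walks above leave a node, return to it, and then move to a new node, so they yield the same anonymous pattern.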
Article 928
Title@2025-05-27 (2): Learnable Kernel Density Estimation for Graphs
Title: Learnable Kernel Density Estimation for Graphs | Erlernbare Kerneldichteschätzung für Graphen | 可学习的内核密度 2505.21285v1 |
Authors: Xudong Wang, Ziheng Sun, Chris Ding, Jicong Fan
This work proposes a framework LGKDE that learns kernel density estimation for graphs. The key challenge in graph density estimation lies in effectively capturing both structural patterns and semantic variations while maintaining theoretical guarantees. Combining graph kernels and kernel density estimation (KDE) is a standard approach to graph density estimation, but has unsatisfactory performance due to the handcrafted and fixed features of kernels. Our method LGKDE leverages graph neural networks to represent each graph as a discrete distribution and utilizes maximum mean discrepancy to learn the graph metric for multi-scale KDE, where all parameters are learned by maximizing the density of graphs relative to the density of their well-designed perturbed counterparts. The perturbations are conducted on both node features and graph spectra, which helps better characterize the boundary of normal density regions. Theoretically, we establish consistency and convergence guarantees for LGKDE, including bounds on the mean integrated squared error, robustness, and complexity. We validate LGKDE by demonstrating its effectiveness in recovering the underlying density of synthetic graph distributions and applying it to graph anomaly detection across diverse benchmark datasets. Extensive empirical evaluation shows that LGKDE demonstrates superior performance compared to state-of-the-art baselines on most benchmark datasets.
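As a rough illustration of the maximum mean discrepancy (MMD) between two discrete distributions, the sketch below computes the biased squared-MMD estimate with a Gaussian kernel between two sets of vectors. Treating each graph as a set of node embeddings, the fixed bandwidth, and the function names are assumptions made here; the abstract does not give LGKDE's exact formulation.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""

    def mean_kernel(ps, qs):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in ps for q in qs)
        return total / (len(ps) * len(qs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical node embeddings for two small graphs.
graph_a = [[0.0, 0.0], [1.0, 0.0]]
graph_b = [[3.0, 3.0], [4.0, 3.0]]
distance_same = mmd_squared(graph_a, graph_a)
distance_diff = mmd_squared(graph_a, graph_b)
```

A distance of this kind could then serve as the metric inside a kernel density estimate over graphs; in the paper the embeddings are learned rather than fixed.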
Article 929
Title@2025-05-27 (2): Optimal Pricing for Data-Augmented AutoML Marketplaces
Title: Optimal Pricing for Data-Augmented AutoML Marketplaces | Optimale Preise für datengesteigerte AutoML-Märkte | 数据增强自动自动ML 市场最佳定价 2310.17843v2 |
Authors: Minbiao Han, Jonathan Light, Steven Xia, Sainyam Galhotra, Raul Castro Fernandez, Haifeng Xu
Organizations often lack sufficient data to effectively train machine learning (ML) models, while others possess valuable data that remains underutilized. Data markets promise to unlock substantial value by matching data suppliers with demand from ML consumers. However, market design involves addressing intricate challenges, including data pricing, fairness, robustness, and strategic behavior. In this paper, we propose a pragmatic data-augmented AutoML market that seamlessly integrates with existing cloud-based AutoML platforms such as Google’s Vertex AI and Amazon’s SageMaker. Unlike standard AutoML solutions, our design automatically augments buyer-submitted training data with valuable external datasets, pricing the resulting models based on their measurable performance improvements rather than on computational costs, as is the status quo. Our key innovation is a pricing mechanism grounded in the instrumental value of externally sourced data, that is, the marginal improvement in model quality it provides. This approach bypasses direct dataset pricing complexities, mitigates strategic buyer behavior, and accommodates diverse buyer valuations through menu-based options. By integrating automated data and model discovery, our solution not only enhances ML outcomes but also establishes an economically sustainable framework for monetizing external data.
Article 930
Title@2025-05-27 (2): Accelerated Parallel Tempering via Neural Transports
Title: Accelerated Parallel Tempering via Neural Transports | Beschleunigung des parallelen Temperierens über neurale Transporte | 通过神经运输加速平行探险 2502.10328v2 |
Authors: Leo Zhang, Peter Potaptchik, Jiajun He, Yuanqi Du, Arnaud Doucet, Francisco Vargas, Hai-Dang Dau, Saifuddin Syed
Markov Chain Monte Carlo (MCMC) algorithms are essential tools in computational statistics for sampling from unnormalised probability distributions, but can be fragile when targeting high-dimensional, multimodal, or complex target distributions. Parallel Tempering (PT) enhances MCMC’s sample efficiency through annealing and parallel computation, propagating samples from tractable reference distributions to intractable targets via state swapping across interpolating distributions. The effectiveness of PT is limited by the often minimal overlap between adjacent distributions in challenging problems, which requires increasing the computational resources to compensate. We introduce a framework that accelerates PT by leveraging neural samplers (including normalising flows, diffusion models, and controlled diffusions) to reduce the required overlap. Our approach utilises neural samplers in parallel, circumventing the computational burden of neural samplers while preserving the asymptotic consistency of classical PT. We demonstrate theoretically and empirically on a variety of multimodal sampling problems that our method improves sample quality, reduces the computational cost compared to classical PT, and enables efficient estimation of free energies and normalising constants.
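The state-swapping step of classical PT can be sketched as follows, assuming the common tempered form in which chain k targets a density proportional to exp(-beta_k * U(x)). The function name and the numbers are illustrative; the neural-transport swaps proposed in the paper are not shown.

```python
import math


def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    """Metropolis probability of exchanging the states of two tempered chains.

    Chain i holds a state with energy `energy_i` at inverse temperature `beta_i`,
    and likewise for chain j. The acceptance ratio is
    exp((beta_i - beta_j) * (energy_i - energy_j)), capped at 1.
    """
    log_ratio = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


# The colder chain (beta=1.0) holds the higher-energy state: the swap is always accepted.
p_favourable = swap_acceptance(1.0, 0.5, energy_i=2.0, energy_j=1.0)
# The colder chain already holds the lower-energy state: the swap is accepted less often.
p_unfavourable = swap_acceptance(1.0, 0.5, energy_i=1.0, energy_j=2.0)
```

When adjacent distributions overlap poorly, the energy gap between their typical states is large and this probability becomes small, which is the bottleneck the abstract describes.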
Article 931
Title@2025-05-27 (2): Dual-Directed Algorithm Design for Efficient Pure Exploration
Title: Dual-Directed Algorithm Design for Efficient Pure Exploration | Dual-Directed-Algorithm-Design für effizientes Pure-Exploring | 高效纯勘探的双重稀释算法设计 2310.19319v3 |
Authors: Chao Qin, Wei You
While experimental design often focuses on selecting the single best alternative from a finite set (e.g., in ranking and selection or best-arm identification), many pure-exploration problems pursue richer goals. Given a specific goal, adaptive experimentation aims to achieve it by strategically allocating sampling effort, with the underlying sample complexity characterized by a maximin optimization problem. By introducing dual variables, we derive necessary and sufficient conditions for an optimal allocation, yielding a unified algorithm design principle that extends the top-two approach beyond best-arm identification. This principle gives rise to Information-Directed Selection, a hyperparameter-free rule that dynamically evaluates and chooses among candidates based on their current informational value. We prove that, when combined with Information-Directed Selection, top-two Thompson sampling attains asymptotic optimality for Gaussian best-arm identification, resolving a notable open question in the pure-exploration literature. Furthermore, our framework produces asymptotically optimal algorithms for pure-exploration thresholding bandits and $\varepsilon$-best-arm identification (i.e., ranking and selection with probability-of-good-selection guarantees), and more generally establishes a recipe for adapting Thompson sampling across a broad class of pure-exploration problems. Extensive numerical experiments highlight the efficiency of our proposed algorithms compared to existing methods.
Article 932
Title@2025-05-27 (2): Taylor expansion-based Kolmogorov-Arnold network for blind image quality assessment
Title: Taylor expansion-based Kolmogorov-Arnold network for blind image quality assessment | Taylor-expansionsbasiertes Kolmogorov-Arnold-Netzwerk für blinde Bildqualitätsbewertung | 以泰勒为扩展基地的Kolmogorov-Arnold盲人图像质量评估网络 2505.21592v1 |
Authors: Ze Chen, Shaode Yu
The Kolmogorov-Arnold Network (KAN) has attracted growing interest for its strong function approximation capability. In our previous work, KAN and its variants were explored in score regression for blind image quality assessment (BIQA). However, these models encounter challenges when processing high-dimensional features, leading to limited performance gains and increased computational cost. To address these issues, we propose TaylorKAN, which leverages Taylor expansions as learnable activation functions to enhance local approximation capability. To improve computational efficiency, network depth reduction and feature dimensionality compression are integrated into the TaylorKAN-based score regression pipeline. On five databases (BID, CLIVE, KonIQ, SPAQ, and FLIVE) with authentic distortions, extensive experiments demonstrate that TaylorKAN consistently outperforms the other KAN-related models, indicating that local approximation via Taylor expansions is more effective than global approximation using orthogonal functions. Its generalization capacity is validated through inter-database experiments. The findings highlight the potential of TaylorKAN as an efficient and robust model for high-dimensional score regression.
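One way to picture a Taylor expansion used as a learnable activation is a truncated polynomial in (x - center) whose coefficients are the trainable parameters. The sketch below only evaluates such a polynomial; the function name, the expansion order, and the coefficient values are assumptions, and the actual TaylorKAN layer design is not described in the abstract.

```python
def taylor_activation(x, coeffs, center=0.0):
    """Evaluate sum_k coeffs[k] * (x - center)**k using Horner's rule."""
    t = x - center
    result = 0.0
    # Work from the highest-order coefficient down to the constant term.
    for c in reversed(coeffs):
        result = result * t + c
    return result


# Second-order example: 1 + t + 0.5 * t**2, the truncated expansion of exp(t) at 0.
coeffs = [1.0, 1.0, 0.5]
value_at_two = taylor_activation(2.0, coeffs)
value_at_center = taylor_activation(3.0, coeffs, center=3.0)
```

In a trained network the entries of `coeffs` would be learned per edge or per feature rather than fixed in advance.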
Article 933
Title@2025-05-27 (2): Minimizing False-Positive Attributions in Explanations of Non-Linear Models
Title: Minimizing False-Positive Attributions in Explanations of Non-Linear Models | Minimierung falsch-positiver Attribute in Erklärungen nicht-linearer Modelle | 尽量减少解释非碱模型中的虚假动机归属 2505.11210v2 |
Authors: Anders Gjølbye, Stefan Haufe, Lars Kai Hansen
Suppressor variables can influence model predictions without being dependent on the target outcome and they pose a significant challenge for Explainable AI (XAI) methods. These variables may cause false-positive feature attributions, undermining the utility of explanations. Although effective remedies exist for linear models, their extension to non-linear models and to instance-based explanations has remained limited. We introduce PatternLocal, a novel XAI technique that addresses this gap. PatternLocal begins with a locally linear surrogate, e.g. LIME, KernelSHAP, or gradient-based methods, and transforms the resulting discriminative model weights into a generative representation, thereby suppressing the influence of suppressor variables while preserving local fidelity. In extensive hyperparameter optimization on the XAI-TRIS benchmark, PatternLocal consistently outperformed other XAI methods and reduced false-positive attributions when explaining non-linear tasks, thereby enabling more reliable and actionable insights.
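The idea of turning discriminative weights into a generative representation can be illustrated in the purely linear case with a known covariance-based transformation: multiplying the weight vector by the data covariance. The sketch below is that linear-model remedy applied to a toy suppressor setup, not the PatternLocal procedure itself; the function names and the data are assumptions.

```python
def covariance_matrix(rows):
    """Population covariance matrix of samples given as a list of feature rows."""
    n = len(rows)
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n for j in range(dim)]
        for i in range(dim)
    ]


def weights_to_pattern(rows, weights):
    """Map linear weights w to the pattern Cov(X) @ w (up to a scale factor)."""
    cov = covariance_matrix(rows)
    return [sum(c * w for c, w in zip(cov_row, weights)) for cov_row in cov]


# Toy data: feature 0 = signal + distractor, feature 1 = distractor only (a suppressor).
signal = [1.0, -1.0, 1.0, -1.0]
distractor = [1.0, 1.0, -1.0, -1.0]
samples = [[s + d, d] for s, d in zip(signal, distractor)]

# The weights [1, -1] recover the signal exactly, so the suppressor gets a nonzero weight.
weights = [1.0, -1.0]
pattern = weights_to_pattern(samples, weights)
```

Here the weight on the suppressor feature is nonzero, yet its entry in the resulting pattern is zero, which is the kind of false-positive attribution the abstract is concerned with.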
Article 934
Title@2025-05-27 (2): ResKoopNet: Learning Koopman Representations for Complex Dynamics with Spectral Residuals
Title: ResKoopNet: Learning Koopman Representations for Complex Dynamics with Spectral Residuals | ResKoopNet: Koopman-Repräsentanzen für komplexe Dynamiken mit Spektralresidualen lernen | ResKoopNet:学习 Koopman 代表器, 用于使用光谱残余物的复杂动态 2501.00701v4 |
Authors: Yuanchao Xu, Kaidi Shao, Nikos Logothetis, Zhongwei Shen
Analyzing the long-term behavior of high-dimensional nonlinear dynamical systems remains a significant challenge. While the Koopman operator framework provides a powerful global linearization tool, current methods for approximating its spectral components often face theoretical limitations and depend on predefined dictionaries. Residual Dynamic Mode Decomposition (ResDMD) advanced the field by introducing the spectral residual to assess Koopman operator approximation accuracy; however, its approach of only filtering precomputed spectra prevents the discovery of the operator’s complete spectral information, a limitation known as the ‘spectral inclusion’ problem. We introduce ResKoopNet (Residual-based Koopman-learning Network), a novel method that directly addresses this by explicitly minimizing the spectral residual to compute Koopman eigenpairs. This enables the identification of a more precise and complete Koopman operator spectrum. Using neural networks, our approach provides theoretical guarantees while maintaining computational adaptability. Experiments on a variety of physical and biological systems show that ResKoopNet achieves more accurate spectral approximations than existing methods, particularly for high-dimensional systems and those with continuous spectra, which demonstrates its effectiveness as a tool for analyzing complex dynamical systems.
Article 935
Title@2025-05-27 (2): Mitigating Molecular Aggregation in Drug Discovery with Predictive Insights from Explainable AI
Title: Mitigating Molecular Aggregation in Drug Discovery with Predictive Insights from Explainable AI | Mildernde molekulare Aggregation in der Drogenentdeckung mit vorausschauenden Erkenntnissen von erklärbarer KI | 利用可解释的人工智能的预测洞察力减轻药物发现中的分子聚合 2306.02206v2 |
Authors: Hunter Sturm, Jonas Teufel, Kaitlin A. Isfeld, Pascal Friederich, Rebecca L. Davis
Herein, we present the application of MEGAN, our explainable AI (xAI) model, for the identification of small colloidally aggregating molecules (SCAMs). This work offers solutions to the long-standing problem of false positives caused by SCAMs in high throughput screening for drug discovery and demonstrates the power of xAI in the classification of molecular properties that are not chemically intuitive based on our current understanding. We leverage xAI insights and molecular counterfactuals to design alternatives to problematic compounds in drug screening libraries. Additionally, we experimentally validate the MEGAN prediction classification for one of the counterfactuals and demonstrate the utility of counterfactuals for altering the aggregation properties of a compound through minor structural modifications. The integration of this method in high-throughput screening approaches will help combat and circumvent false positives, providing better lead molecules more rapidly and thus accelerating drug discovery cycles.
Article 936
Title@2025-05-27 (2): BindEnergyCraft: Casting Protein Structure Predictors as Energy-Based Models for Binder Design
Title: BindEnergyCraft: Casting Protein Structure Predictors as Energy-Based Models for Binder Design | BindEnergyCraft: Proteinstrukturvorhersagen als energiebasierte Modelle für Binder-Design | Bind EnergyCraft: 将蛋白结构预测器作为Binder设计以能源为基础的模型 2505.21241v1 |
Authors: Divya Nori, Anisha Parsan, Caroline Uhler, Wengong Jin
Protein binder design has been transformed by hallucination-based methods that optimize structure prediction confidence metrics, such as the interface predicted TM-score (ipTM), via backpropagation. However, these metrics do not reflect the statistical likelihood of a binder-target complex under the learned distribution and yield sparse gradients for optimization. In this work, we propose a method to extract such likelihoods from structure predictors by reinterpreting their confidence outputs as an energy-based model (EBM). By leveraging the Joint Energy-based Modeling (JEM) framework, we introduce pTMEnergy, a statistical energy function derived from predicted inter-residue error distributions. We incorporate pTMEnergy into BindEnergyCraft (BECraft), a design pipeline that maintains the same optimization framework as BindCraft but replaces ipTM with our energy-based objective. BECraft outperforms BindCraft, RFDiffusion, and ESM3 across multiple challenging targets, achieving higher in silico binder success rates while reducing structural clashes. Furthermore, pTMEnergy establishes a new state-of-the-art in structure-based virtual screening tasks for miniprotein and RNA aptamer binders.
Article 937
Title@2025-05-27 (2): Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies
Title: Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies | Breaking the Performance Ceiling in komplexen Verstärkungs-Lernen erfordert Inferenz-Strategien | 综合加强学习中业绩上限的打破需要推断战略 2505.21236v1 |
Authors: Felix Chalumeau, Daniel Rajaonarivonivelomanantsoa, Ruan de Kock, Claude Formanek, Sasha Abramowitz, Oumayma Mahjoub, Wiem Khlifi, Simon Du Toit, Louay Ben Nessir, Refiloe Shabe, Arnol Fokam, Siddarth Singh, Ulrich Mbou Sob, Arnu Pretorius
Reinforcement learning (RL) systems have countless applications, from energy-grid management to protein design. However, such real-world scenarios are often extremely difficult, combinatorial in nature, and require complex coordination between multiple agents. This level of complexity can cause even state-of-the-art RL systems, trained until convergence, to hit a performance ceiling which they are unable to break out of with zero-shot inference. Meanwhile, many digital or simulation-based applications allow for an inference phase that utilises a specific time and compute budget to explore multiple attempts before outputting a final solution. In this work, we show that such an inference phase employed at execution time, and the choice of a corresponding inference strategy, are key to breaking the performance ceiling observed in complex multi-agent RL problems. Our main result is striking: we can obtain up to a 126% and, on average, a 45% improvement over the previous state-of-the-art across 17 tasks, using only a couple of seconds of extra wall-clock time during execution. We also demonstrate promising compute scaling properties, supported by over 60k experiments, making this the largest study on inference strategies for complex RL to date. Our experimental data and code are available at https://sites.google.com/view/inf-marl.
nan
Article 938
Title@2025-05-27 (2): STRAP: Spatio-Temporal Pattern Retrieval for Out-of-Distribution Generalization
Title: STRAP: Spatio-Temporal Pattern Retrieval for Out-of-Distribution Generalization | STRAP: Spatio-Temporal Pattern Retrieval für Out-of-Distribution-Verallgemeinerung | STRAP: 普遍分发的Spadio-Temporal 样板回收 2505.19547v2 |
Authors: Haoyu Zhang, Wentao Zhang, Hao Miao, Xinke Jiang, Yuchen Fang, Yifan Zhang
Spatio-Temporal Graph Neural Networks (STGNNs) have emerged as a powerful tool for modeling dynamic graph-structured data across diverse domains. However, they often fail to generalize in Spatio-Temporal Out-of-Distribution (STOOD) scenarios, where both temporal dynamics and spatial structures evolve beyond the training distribution. To address this problem, we propose an innovative Spatio-Temporal Retrieval-Augmented Pattern Learning framework, STRAP, which enhances model generalization by integrating retrieval-augmented learning into the STGNN continual learning pipeline. The core of STRAP is a compact and expressive pattern library that stores representative spatio-temporal patterns enriched with historical, structural, and semantic information, which is obtained and optimized during the training phase. During inference, STRAP retrieves relevant patterns from this library based on similarity to the current input and injects them into the model via a plug-and-play prompting mechanism. This not only strengthens spatio-temporal representations but also mitigates catastrophic forgetting. Moreover, STRAP introduces a knowledge-balancing objective to harmonize new information with retrieved knowledge. Extensive experiments across multiple real-world streaming graph datasets show that STRAP consistently outperforms state-of-the-art STGNN baselines on STOOD tasks, demonstrating its robustness, adaptability, and strong generalization capability without task-specific fine-tuning.
nan
Article 939
Title@2025-05-27 (2): FRIREN: Beyond Trajectories – A Spectral Lens on Time
Title: FRIREN: Beyond Trajectories – A Spectral Lens on Time | FRIREN: Jenseits von Trajektorien – Eine Spektrallinse auf Zeit | 在轨迹之外 – – 时光透镜 2505.17370v2 |
Authors: Qilin Wang
Long-term time-series forecasting (LTSF) models are often presented as general-purpose solutions that can be applied across domains, implicitly assuming that all data is pointwise predictable. Using chaotic systems such as Lorenz-63 as a case study, we argue that geometric structure - not pointwise prediction - is the right abstraction for a dynamic-agnostic foundational model. Minimizing the Wasserstein-2 distance (W2), which captures geometric changes, and providing a spectral view of dynamics are essential for long-horizon forecasting. Our model, FRIREN (Flow-inspired Representations via Interpretable Eigen-networks), implements an augmented normalizing-flow block that embeds data into a normally distributed latent representation. It then generates a W2-efficient optimal path that can be decomposed into rotation, scaling, inverse rotation, and translation. This architecture yields locally generated, geometry-preserving predictions that are independent of the underlying dynamics, and a global spectral representation that functions as a finite Koopman operator with a small modification. This enables practitioners to identify which modes grow, decay, or oscillate, both locally and system-wide. FRIREN achieves an MSE of 11.4, MAE of 1.6, and SWD of 0.96 on Lorenz-63 in a 336-in, 336-out, dt=0.01 setting, surpassing TimeMixer (MSE 27.3, MAE 2.8, SWD 2.1). The model maintains effective prediction for 274 out of 336 steps, approximately 2.5 Lyapunov times. On Rossler (96-in, 336-out), FRIREN achieves an MSE of 0.0349, MAE of 0.0953, and SWD of 0.0170, outperforming TimeMixer’s MSE of 4.3988, MAE of 0.886, and SWD of 3.2065. FRIREN is also competitive on standard LTSF datasets such as ETT and Weather. By connecting modern generative flows with classical spectral analysis, FRIREN makes long-term forecasting both accurate and interpretable, setting a new benchmark for LTSF model design.
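The decomposition into rotation, scaling, inverse rotation, and translation matches the standard closed form of the Wasserstein-2 optimal map between two Gaussians. The block below states that textbook result as a reference point. It is an assumption that FRIREN's latent path has exactly this form, and the symbols $m_0, \Sigma_0, m_1, \Sigma_1, Q, \Lambda$ are introduced here for illustration only.

```latex
% W2-optimal map between Gaussians N(m_0, \Sigma_0) and N(m_1, \Sigma_1)
T(x) = m_1 + A\,(x - m_0), \qquad
A = \Sigma_0^{-1/2}\bigl(\Sigma_0^{1/2}\,\Sigma_1\,\Sigma_0^{1/2}\bigr)^{1/2}\Sigma_0^{-1/2}.

% A is symmetric positive definite, so it has an eigendecomposition
A = Q \Lambda Q^{\top}, \qquad Q^{\top} Q = I, \quad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_d), \ \lambda_i > 0,

% which reads, from right to left: rotate (Q^T), scale (\Lambda), rotate back (Q), translate
T(x) = Q \Lambda Q^{\top} (x - m_0) + m_1 .
```

Under this reading, the eigenvalues $\lambda_i$ indicate which directions are stretched ($\lambda_i > 1$) or contracted ($\lambda_i < 1$) by the transport step.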
nan
Article 940
Title@2025-05-27 (2): Is Hyperbolic Space All You Need for Medical Anomaly Detection?
Title: Is Hyperbolic Space All You Need for Medical Anomaly Detection? | Ist hyperbolischer Raum alles, was Sie für medizinische Anomalie-Erkennung benötigen? | 超双曲空间 是否所有你需要的 医疗异常检测? 2505.21228v1 |
Authors: Alvaro Gonzalez-Jimenez, Simone Lionetti, Ludovic Amruthalingam, Philippe Gottfrois, Fabian Gröger, Marc Pouly, Alexander A. Navarini
Medical anomaly detection has emerged as a promising solution to challenges in data availability and labeling constraints. Traditional methods extract features from different layers of pre-trained networks in Euclidean space; however, Euclidean representations fail to effectively capture the hierarchical relationships within these features, leading to suboptimal anomaly detection performance. We propose a novel yet simple approach that projects feature representations into hyperbolic space, aggregates them based on confidence levels, and classifies samples as healthy or anomalous. Our experiments demonstrate that hyperbolic space consistently outperforms Euclidean-based frameworks, achieving higher AUROC scores at both image and pixel levels across multiple medical benchmark datasets. Additionally, we show that hyperbolic space exhibits resilience to parameter variations and excels in few-shot scenarios, where healthy images are scarce. These findings underscore the potential of hyperbolic space as a powerful alternative for medical anomaly detection. The project website can be found at https://hyperbolic-anomalies.github.io
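One common way to project Euclidean features into hyperbolic space is the exponential map at the origin of the Poincaré ball. The sketch below assumes that model with curvature -1. The abstract does not say which hyperbolic model or projection the authors use, so `expmap0` and `poincare_dist` are illustrative names, not the paper's API.

```python
import math

def expmap0(v):
    """Exponential map at the origin of the unit Poincare ball (curvature -1)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(norm) / norm  # keeps the image strictly inside the unit ball
    return [scale * x for x in v]

def poincare_dist(x, y):
    """Geodesic distance between two points of the unit Poincare ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    nx = sum(a * a for a in x)
    ny = sum(b * b for b in y)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - nx) * (1.0 - ny)))

# A Euclidean feature with norm 0.5 lands at hyperbolic distance 2 * 0.5 from the origin.
point = expmap0([0.3, 0.4])
```

Distances grow rapidly near the boundary of the ball, which is the property usually cited as making hyperbolic space suited to hierarchical structure.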
nan
Article 941
Title@2025-05-27 (2): Why Do More Experts Fail? A Theoretical Analysis of Model Merging
Title: Why Do More Experts Fail? A Theoretical Analysis of Model Merging | Warum scheitern weitere Experten? Eine theoretische Analyse der Modellzusammenführung | 为何有更多的专家失败?对模式合并的理论分析 2505.21226v1 |
Authors: Zijing Wang, Xingle Xu, Yongkang Liu, Yiqun Zhang, Peiqin Lin, Shi Feng, Xiaocui Yang, Daling Wang, Hinrich Schütze
Model merging dramatically reduces storage and computational resources by combining multiple expert models into a single multi-task model. Although recent model merging methods have shown promising results, they struggle to maintain performance gains as the number of merged models increases. In this paper, we investigate the key obstacles that limit the scalability of model merging when integrating a large number of expert models. First, we prove that there is an upper bound on model merging. Further theoretical analysis reveals that the limited effective parameter space imposes a strict constraint on the number of models that can be successfully merged. Gaussian Width shows that the marginal benefit of merging additional models diminishes according to a strictly concave function. This implies that the effective parameter space becomes rapidly saturated as the number of merged models increases. Furthermore, using Approximate Kinematics Theory, we prove the existence of a unique optimal threshold beyond which adding more models does not yield significant performance improvements. At the same time, we introduce a straightforward Reparameterized Heavy-Tailed method (RHT) to extend the coverage of the merged model, thereby enhancing its performance. Empirical results on 12 benchmarks, including both knowledge-intensive and general-purpose tasks, validate our theoretical analysis. We believe that these results will spark further research beyond the current scope of model merging. The source code is in the anonymous GitHub repository https://github.com/wzj1718/ModelMergingAnalysis.
nan
Article 942
Title@2025-05-27 (2): The dark side of the forces: assessing non-conservative force models for atomistic machine learning
Title: The dark side of the forces: assessing non-conservative force models for atomistic machine learning | Die dunkle Seite der Kräfte: Bewertung nicht konservativer Kraftmodelle für atomistisches maschinelles Lernen | 部队的黑暗面:评估非保守力量模型,以进行原子学机器学习 2412.11569v3 |
Authors: Filippo Bigi, Marcel Langer, Michele Ceriotti
The use of machine learning to estimate the energy of a group of atoms, and the forces that drive them to more stable configurations, has revolutionized the fields of computational chemistry and materials discovery. In this domain, rigorous enforcement of symmetry and conservation laws has traditionally been considered essential. For this reason, interatomic forces are usually computed as the derivatives of the potential energy, ensuring energy conservation. Several recent works have questioned this physically constrained approach, suggesting that directly predicting the forces yields a better trade-off between accuracy and computational efficiency – and that energy conservation can be learned during training. This work investigates the applicability of such non-conservative models in microscopic simulations. We identify and demonstrate several fundamental issues, from ill-defined convergence of geometry optimization to instability in various types of molecular dynamics. Contrary to the case of rotational symmetry, energy conservation is hard to learn, monitor, and correct for. The best approach to exploit the acceleration afforded by direct force prediction might be to use it in tandem with a conservative model, reducing – rather than eliminating – the additional cost of backpropagation, but avoiding the pathological behavior associated with non-conservative forces.
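The difference between conservative and non-conservative forces can be checked numerically: a force that is the negative gradient of an energy does zero net work around any closed loop, while a directly predicted force with a rotational component does not. The sketch below uses a toy 2D field with a made-up rotational strength `a`. It illustrates the physical principle only and does not represent any of the models assessed in the paper.

```python
import math

def conservative_force(x, y):
    """F = -grad E for the toy energy E(x, y) = 0.5 * (x**2 + y**2)."""
    return -x, -y

def non_conservative_force(x, y, a=0.1):
    """Same force plus a small rotational term that is not the gradient of any energy."""
    return -x - a * y, -y + a * x

def loop_work(force, n_steps=1000):
    """Midpoint-rule line integral of force . dr around the unit circle, counterclockwise."""
    work = 0.0
    for k in range(n_steps):
        t0 = 2 * math.pi * k / n_steps
        t1 = 2 * math.pi * (k + 1) / n_steps
        x0, y0 = math.cos(t0), math.sin(t0)
        x1, y1 = math.cos(t1), math.sin(t1)
        fx, fy = force((x0 + x1) / 2, (y0 + y1) / 2)
        work += fx * (x1 - x0) + fy * (y1 - y0)
    return work
```

For the rotational field, the loop integral approaches 2 * pi * a, so a particle driven around the loop gains energy on every cycle. This is the kind of drift that makes long molecular dynamics runs unstable.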
nan
Article 943
Title@2025-05-27 (2): Wavelet Flow For Extragalactic Foreground Simulations
Title: Wavelet Flow For Extragalactic Foreground Simulations | Wavelet Flow für extragalaktische Foreground Simulationen | 用于外星际前景模拟的波浪流 2505.21220v1 |
Authors: M. Mebratu, W. L. K. Wu
Extragalactic foregrounds in cosmic microwave background (CMB) observations are both a source of cosmological and astrophysical information and a nuisance to the CMB. Effective field-level modeling that captures their non-Gaussian statistical distributions is increasingly important for optimal information extraction, particularly given the precise and low-noise observations from current and upcoming experiments. We explore the use of Wavelet Flow (WF) models to tackle the novel task of modeling the field-level probability distributions of multi-component CMB secondaries. Specifically, we jointly train correlated CMB lensing convergence ($\kappa$) and cosmic infrared background (CIB) maps with a WF model and obtain a network that statistically recovers the input to high accuracy – the trained network generates samples of $\kappa$ and CIB fields whose average power spectra are within a few percent of the inputs across all scales, and whose Minkowski functionals are similarly accurate compared to the inputs. Leveraging the multiscale architecture of these models, we fine-tune both the model parameters and the priors at each scale independently, optimizing performance across different resolutions. These results demonstrate that WF models can accurately simulate correlated components of CMB secondaries, supporting improved analysis of cosmological data. Our code and trained models can be found here (https://github.com/matiwosm/HybridPriorWavletFlow.git).
nan
Article 944
Title@2025-05-27 (2): Addressing Data Quality Decompensation in Federated Learning via Dynamic Client Selection
Title: Addressing Data Quality Decompensation in Federated Learning via Dynamic Client Selection | Adressierung von Datenqualitätsentkompensation im Federated Learning über Dynamic Client Selection | 通过动态客户选择解决联邦学习中的数据质量补偿问题 2505.21219v1 |
Authors: Qinjun Fei, Nuria Rodríguez-Barroso, María Victoria Luzón, Zhongliang Zhang, Francisco Herrera
In cross-silo Federated Learning (FL), client selection is critical to ensure high model performance, yet it remains challenging due to data quality decompensation, budget constraints, and incentive compatibility. As training progresses, these factors exacerbate client heterogeneity and degrade global performance. Most existing approaches treat these challenges in isolation, making jointly optimizing multiple factors difficult. To address this, we propose Shapley-Bid Reputation Optimized Federated Learning (SBRO-FL), a unified framework integrating dynamic bidding, reputation modeling, and cost-aware selection. Clients submit bids based on their perceived data quality, and their contributions are evaluated using Shapley values to quantify their marginal impact on the global model. A reputation system, inspired by prospect theory, captures historical performance while penalizing inconsistency. The client selection problem is formulated as a 0-1 integer program that maximizes reputation-weighted utility under budget constraints. Experiments on FashionMNIST, EMNIST, CIFAR-10, and SVHN datasets show that SBRO-FL improves accuracy, convergence speed, and robustness, even in adversarial and low-bid interference scenarios. Our results highlight the importance of balancing data reliability, incentive compatibility, and cost efficiency to enable scalable and trustworthy FL deployments.
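The selection step described above is a knapsack-style 0-1 program: choose binary indicators x_i to maximize the sum of reputation-weighted utilities subject to the total bid staying within budget. A minimal sketch follows, assuming each client's utility is simply its reputation score and its cost is its bid. The function name, the inputs, and the brute-force solver are illustrative choices, not the paper's formulation or solver.

```python
from itertools import product

def select_clients(reputation, bids, budget):
    """Exhaustively solve max sum(r_i * x_i) s.t. sum(b_i * x_i) <= budget, x_i in {0, 1}."""
    best_choice = tuple(0 for _ in bids)
    best_value = 0.0
    for choice in product((0, 1), repeat=len(bids)):
        cost = sum(b * x for b, x in zip(bids, choice))
        if cost > budget:
            continue  # infeasible: exceeds the budget
        value = sum(r * x for r, x in zip(reputation, choice))
        if value > best_value:
            best_choice, best_value = choice, value
    return best_choice, best_value

# Made-up example: four clients, budget of 6 units.
choice, value = select_clients([0.9, 0.6, 0.7, 0.2], [4, 2, 3, 2], budget=6)
```

Exhaustive search is only practical for a handful of clients. A real deployment would more plausibly hand the same objective and constraint to an integer-programming solver.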
nan
Article 945
Title@2025-05-27 (2): Transfer learning for multifidelity simulation-based inference in cosmology
Title: Transfer learning for multifidelity simulation-based inference in cosmology | Transfer-Lernen für Multifidelity-Simulationsbasierte Schlussfolgerungen in der Kosmologie | 在宇宙学中进行多种不贞行为模拟推论的转让性学习 2505.21215v1 |
Authors: Alex A. Saoulis, Davide Piras, Niall Jeffrey, Alessio Spurio Mancini, Ana M. G. Ferreira, Benjamin Joachimi
Simulation-based inference (SBI) enables cosmological parameter estimation when closed-form likelihoods or models are unavailable. However, SBI relies on machine learning for neural compression and density estimation. This requires large training datasets which are prohibitively expensive for high-quality simulations. We overcome this limitation with multifidelity transfer learning, combining less expensive, lower-fidelity simulations with a limited number of high-fidelity simulations. We demonstrate our methodology on dark matter density maps from two separate simulation suites in the hydrodynamical CAMELS Multifield Dataset. Pre-training on dark-matter-only $N$-body simulations reduces the required number of high-fidelity hydrodynamical simulations by a factor between $8$ and $15$, depending on the model complexity, posterior dimensionality, and performance metrics used. By leveraging cheaper simulations, our approach enables performant and accurate inference on high-fidelity models while substantially reducing computational costs.
nan
Article 946
Title@2025-05-27 (2): Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning
Title: Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning | Auf dem Weg zur Enthüllung der Wirksamkeit von Klein-Scale-Fine-Tuning im R1-Stil Verstärktes Lernen | 提高R1型强化学习中小规模微调的效力 2505.17988v2 |
Authors: Yutong Chen, Jiandong Gao, Ji Wu
R1-style Reinforcement Learning (RL) significantly enhances Large Language Models’ reasoning capabilities, yet the mechanism behind rule-based RL remains unclear. We found that small-scale SFT has a significant influence on RL but shows poor efficiency. To explain our observations, we propose an analytical framework and compare the efficiency of SFT and RL by measuring sample effect. Hypothetical analysis shows that SFT efficiency is limited by training data. Guided by our analysis, we propose Re-distillation, a technique that fine-tunes the pretrained model through small-scale distillation from the RL-trained policy. Experiments on Knight & Knave and MATH datasets demonstrate re-distillation’s surprising efficiency: re-distilled models match RL performance with far fewer samples and less computation. Empirical verification shows that sample effect is a good indicator of performance improvements. As a result, on the K&K dataset, our re-distilled Qwen2.5-1.5B model surpasses DeepSeek-V3-0324 with only 1K SFT samples. On MATH, Qwen2.5-1.5B fine-tuned with 500 re-distilled samples matches its instruct-tuned variant without RL. Our work explains several interesting phenomena in R1-style RL, shedding light on the mechanisms behind its empirical success. Code is available at: https://github.com/on1262/deep-reasoning
nan
Article 947
Title@2025-05-27 (2): Input Convex Kolmogorov Arnold Networks
Title: Input Convex Kolmogorov Arnold Networks | Input Convex Kolmogorov Arnold Networks | 投入 Convex Kolmogorov Arnold 网络 2505.21208v1 |
Authors: Thomas Deschatre, Xavier Warin
This article presents an input convex neural network architecture using Kolmogorov-Arnold networks (ICKAN). Two specific networks are presented: the first is based on a low-order, piecewise-linear representation of functions, and a universal approximation theorem is provided. The second is based on cubic splines, for which only numerical results support convergence. We demonstrate on simple tests that these networks perform competitively with classical input convex neural networks (ICNNs). In a second part, we use the networks to solve some optimal transport problems needing a convex approximation of functions and demonstrate their effectiveness. Comparisons with ICNNs show that cubic ICKANs produce results similar to those of classical ICNNs.
nan
Article 948
Title@2025-05-27 (2): Towards Identifiability of Interventional Stochastic Differential Equations
Title: Towards Identifiability of Interventional Stochastic Differential Equations | Zur Identifizierbarkeit interventioneller stochastischer Differentialgleichungen | 实现干预性斯托卡差异等同的可识别性 2505.15987v2 |
Authors: Aaron Zweig, Zaikang Lin, Elham Azizi, David Knowles
We study identifiability of stochastic differential equation (SDE) models under multiple interventions. Our results give the first provable bounds for unique recovery of SDE parameters given samples from their stationary distributions. We give tight bounds on the number of necessary interventions for linear SDEs, and upper bounds for nonlinear SDEs in the small noise regime. We experimentally validate the recovery of true parameters in synthetic data, and motivated by our theoretical results, demonstrate the advantage of parameterizations with learnable activation functions.
nan
Article 949
Title@2025-05-27 (2): Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs
Title: Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs | Universal Reasoner: Ein einfacher, komponierbarer Plug-and-Play-Reasoner für gefrorene LLMs | 通用理由:冻结长效LMs的单一、可合成插管和布局理由 2505.19075v2 |
Authors: Jaemin Kim, Hangeol Chang, Hyunmin Hwang, Choonghan Kim, Jong Chul Ye
Large Language Models (LLMs) have demonstrated remarkable general capabilities, but enhancing skills such as reasoning often demands substantial computational resources and may compromise their generalization. While Parameter-Efficient Fine-Tuning (PEFT) methods offer a more resource-conscious alternative, they typically require retraining for each LLM backbone due to architectural dependencies. To address these challenges, here we propose Universal Reasoner (UniR) - a single, lightweight, composable, and plug-and-play reasoning module that can be used with any frozen LLM to endow it with specialized reasoning capabilities. Specifically, UniR decomposes the reward into a standalone reasoning module that is trained independently using predefined rewards, effectively translating trajectory-level signals into token-level guidance. Once trained, UniR can be combined with any frozen LLM at inference time by simply adding its output logits to those of the LLM backbone. This additive structure naturally enables modular composition: multiple UniR modules trained for different tasks can be jointly applied by summing their logits, enabling complex reasoning via composition. Experimental results on mathematical reasoning and machine translation tasks show that UniR significantly outperforms existing baseline fine-tuning methods using the Llama3.2 model. Furthermore, UniR demonstrates strong weak-to-strong generalization: reasoning modules trained on smaller models effectively guide much larger LLMs. This makes UniR a cost-efficient, adaptable, and robust solution for enhancing reasoning in LLMs without compromising their core capabilities. Code is open-sourced at https://github.com/hangeol/UniR
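The additive combination described above amounts to summing next-token logits over a shared vocabulary before the softmax. The sketch below shows that arithmetic on a toy three-token vocabulary. `combine_logits` and the example numbers are hypothetical, and the released UniR code may organize this differently.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combine_logits(backbone_logits, module_logits_list):
    """Add the logits of one or more reasoning modules to the frozen backbone's logits."""
    combined = list(backbone_logits)
    for module_logits in module_logits_list:
        if len(module_logits) != len(combined):
            raise ValueError("all logit vectors must share the same vocabulary size")
        combined = [c + m for c, m in zip(combined, module_logits)]
    return combined

# Toy next-token logits over a three-token vocabulary (made-up numbers).
backbone = [2.0, 1.0, 0.0]
reasoner = [0.0, 2.0, 0.0]
probs = softmax(combine_logits(backbone, [reasoner]))
```

Here the backbone alone prefers token 0, while the summed logits prefer token 1. Because addition is order-independent, passing several modules in the list composes them with no further machinery, provided they share a tokenizer and vocabulary.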
nan
Article 950
Title@2025-05-27 (2): Developing hybrid mechanistic and data-driven personalized prediction models for platelet dynamics
Title: Developing hybrid mechanistic and data-driven personalized prediction models for platelet dynamics | Entwicklung hybrider mechanistischer und datengesteuerter personalisierter Vorhersagemodelle für Thrombozytendynamik | 开发混合机械和数据驱动的小板板动力学混合机械和个人化预测模型 2505.21204v1 |
Authors: Marie Steinacker, Yuri Kheifetz, Markus Scholz
Hematotoxicity, drug-induced damage to the blood-forming system, is a frequent side effect of cytotoxic chemotherapy and poses a significant challenge in clinical practice due to its high inter-patient variability and limited predictability. Current mechanistic models often struggle to accurately forecast outcomes for patients with irregular or atypical trajectories. In this study, we develop and compare hybrid mechanistic and data-driven approaches for individualized time series modeling of platelet counts during chemotherapy. We consider hybrid models that combine mechanistic models with neural networks, known as universal differential equations. As a purely data-driven alternative, we utilize a nonlinear autoregressive exogenous model using gated recurrent units as the underlying architecture. These models are evaluated across a range of real patient scenarios, varying in data availability and sparsity, to assess predictive performance. Our findings demonstrate that data-driven methods, when provided with sufficient data, significantly improve prediction accuracy, particularly for high-risk patients with irregular platelet dynamics. This highlights the potential of data-driven approaches in enhancing clinical decision-making. In contrast, hybrid and mechanistic models are superior in scenarios with limited or sparse data. The proposed modeling and comparison framework is generalizable and could be extended to predict other treatment-related toxicities, offering broad applicability in personalized medicine.
nan
Article 951
Title@2025-05-27 (2): Implicit Dynamical Flow Fusion (IDFF) for Generative Modeling
Title: Implicit Dynamical Flow Fusion (IDFF) for Generative Modeling | Implizite Dynamische Flussfusion (IDFF) für generative Modellierung | 用于产生建模的隐含动态流动融合(IDFF) 2409.14599v4 |
Authors: Mohammad R. Rezaei, Milos R. Popovic, Milad Lankarany, Rahul G. Krishnan
Conditional Flow Matching (CFM) models can generate high-quality samples from a non-informative prior, but they can be slow, often needing hundreds of network evaluations (NFE). To address this, we propose Implicit Dynamical Flow Fusion (IDFF); IDFF learns a new vector field with an additional momentum term that enables taking longer steps during sample generation while maintaining the fidelity of the generated distribution. Consequently, IDFFs reduce the NFEs by a factor of ten (relative to CFMs) without sacrificing sample quality, enabling rapid sampling and efficient handling of image and time-series data generation tasks. We evaluate IDFF on standard benchmarks such as CIFAR-10 and CelebA for image generation, where we achieve likelihood and quality performance comparable to CFMs and diffusion-based models with fewer NFEs. IDFF also shows superior performance on time-series dataset modeling, including molecular simulation and sea surface temperature (SST) datasets, highlighting its versatility and effectiveness across different domains. GitHub repository: https://github.com/MrRezaeiUofT/IDFF
nan
Article 952
Title@2025-05-27 (2): Crop recommendation with machine learning: leveraging environmental and economic factors for optimal crop selection
Title: Crop recommendation with machine learning: leveraging environmental and economic factors for optimal crop selection | Kulturempfehlung mit maschinellem Lernen: Nutzung ökologischer und wirtschaftlicher Faktoren für eine optimale Ernteauswahl | 采用机械学习的作物建议:利用环境和经济因素优化作物选择 2505.21201v1 |
Authors: Steven Sam, Silima Marshal DAbreo
Agriculture constitutes a primary source of food production, economic growth and employment in India, but the sector is confronted with low farm productivity and yields aggravated by increased pressure on natural resources and adverse climate change variability. Efforts involving green revolution, land irrigations, improved seeds and organic farming have yielded suboptimal outcomes. The adoption of computational tools like crop recommendation systems offers a new way to provide insights and help farmers tackle low productivity. However, most agricultural recommendation systems in India focus narrowly on environmental factors and regions, limiting accurate predictions of high-yield, profitable crops. This study uses environmental and economic factors with 19 crops across 15 states to develop and evaluate Random Forest and SVM models using 10-fold Cross Validation, Time-series Split, and Lag Variables. The 10-fold cross validation showed high accuracy (RF: 99.96%, SVM: 94.71%) but raised overfitting concerns. Introducing temporal order, better reflecting real-world conditions, reduced performance (RF: 78.55%, SVM: 71.18%) in the Time-series Split. To further increase the model accuracy while maintaining the temporal order, the Lag Variables approach was employed, which resulted in improved performance (RF: 83.62%, SVM: 74.38%) compared to the 10-fold cross validation approach. Overall, the models in the Time-series Split and Lag Variable Approaches offer practical insights by handling temporal dependencies and enhancing their adaptability to changing agricultural conditions over time. Consequently, the study shows the Random Forest model developed based on the Lag Variables as the most preferred algorithm for optimal crop recommendation in the Indian context.
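The two temporal techniques named above can be sketched in a few lines: lag variables turn past observations into features, and a time-ordered split guarantees that every test sample comes after all training samples, which shuffled 10-fold cross validation does not. This is a simplified, dependency-free illustration with made-up numbers and hypothetical helper names, not the study's pipeline or feature set.

```python
def add_lags(series, n_lags):
    """Build (features, targets) where each feature row holds the n_lags values before its target."""
    features, targets = [], []
    for t in range(n_lags, len(series)):
        features.append(series[t - n_lags:t])
        targets.append(series[t])
    return features, targets

def time_ordered_split(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs with an expanding training window and later test block."""
    fold = n_samples // (n_splits + 1)
    if fold == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        train_end = k * fold
        test_end = train_end + fold if k < n_splits else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))

# Made-up yearly yields for one crop in one state.
yields = [10, 12, 11, 15, 14, 18, 17]
X, y = add_lags(yields, n_lags=2)
splits = list(time_ordered_split(len(X), n_splits=2))
```

Each row of `X` contains only values observed before its target, so no future information leaks into training. That leakage is one plausible reason shuffled cross validation can report inflated accuracy on temporal data.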
nan
Article 953
Title@2025-05-27 (2): Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning
Title: Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning | Pioniere 4-Bit FP-Quantisierung für Diffusionsmodelle: Mixup-Sign-Quantisierung und Timestep-Aware Feintuning | 推出4-Bit FP 扩散模型量化:混合- Sign 量度和时间步骤- 软件精美调试 2505.21591v1 |
Authors: Maosen Zhao, Pengtao Chen, Chong Yu, Yan Wen, Xudong Tan, Tao Chen
Model quantization reduces the bit-width of weights and activations, improving memory efficiency and inference speed in diffusion models. However, achieving 4-bit quantization remains challenging. Existing methods, primarily based on integer quantization and post-training quantization fine-tuning, struggle with inconsistent performance. Inspired by the success of floating-point (FP) quantization in large language models, we explore low-bit FP quantization for diffusion models and identify key challenges: the failure of signed FP quantization to handle asymmetric activation distributions, the insufficient consideration of temporal complexity in the denoising process during fine-tuning, and the misalignment between fine-tuning loss and quantization error. To address these challenges, we propose the mixup-sign floating-point quantization (MSFP) framework, first introducing unsigned FP quantization in model quantization, along with timestep-aware LoRA (TALoRA) and denoising-factor loss alignment (DFA), which ensure precise and stable fine-tuning. Extensive experiments show that we are the first to achieve superior performance in 4-bit FP quantization for diffusion models, outperforming existing PTQ fine-tuning methods in 4-bit INT quantization.
nan
Article 954
Title@2025-05-27 (2): Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM’s Instruction-Following Capabilities
Title: Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM’s Instruction-Following Capabilities | Enthüllen von instruction-spezifischen Neuronen & Experten: Ein analytischer Rahmen für die instruction-following Fähigkeiten von LLM | 具体未完成的指示性具体神经和专家:LLM教学-执行能力分析框架 2505.21191v1 |
Authors: Junyan Zhang, Yubo Gao, Yibo Yan, Jungang Li, Zhaorui Hou, Sicheng Tao, Shuliang Liu, Song Dai, Yonghua Hei, Junzhuo Li, Xuming Hu
The fine-tuning of Large Language Models (LLMs) has significantly advanced their instruction-following capabilities, yet the underlying computational mechanisms driving these improvements remain poorly understood. This study systematically examines how fine-tuning reconfigures LLM computations by isolating and analyzing instruction-specific sparse components, i.e., neurons in dense models and both neurons and experts in Mixture-of-Experts (MoE) architectures. In particular, we introduce HexaInst, a carefully curated and balanced instructional dataset spanning six distinct categories, and propose SPARCOM, a novel analytical framework comprising three key contributions: (1) a method for identifying these sparse components, (2) an evaluation of their functional generality and uniqueness, and (3) a systematic comparison of their alterations. Through experiments, we demonstrate functional generality, uniqueness, and the critical role of these components in instruction execution. By elucidating the relationship between fine-tuning-induced adaptations and sparse computational substrates, this work provides deeper insights into how LLMs internalize instruction-following behavior for the trustworthy LLM community.
nan
Article 955
Title@2025-05-27 (2): Exploring the Latent Capacity of LLMs for One-Step Text Generation
Title: Exploring the Latent Capacity of LLMs for One-Step Text Generation | Erforschung der Latent-Kapazität von LLMs für die einstufige Textgenerierung | 探索单步制文本生成LLMs的原始能力 2505.21189v1 |
Authors: Gleb Mezentsev, Ivan Oseledets
A recent study showed that large language models (LLMs) can reconstruct surprisingly long texts - up to thousands of tokens - via autoregressive generation from just one specially trained input embedding. In this work, we explore whether such reconstruction is possible without autoregression. We show that frozen LLMs can generate hundreds of accurate tokens in just one forward pass, when provided with only two learned embeddings. This reveals a surprising and underexplored capability of LLMs - multi-token generation without iterative decoding. We investigate the behaviour of these embeddings and provide insight into the type of information they encode. We also empirically show that although these representations are not unique for a given text, they form connected and local regions in embedding space - a property that suggests the potential of learning a dedicated encoder into that space.
nan
Article 956
Title@2025-05-27 (2): Equivariant Representation Learning for Symmetry-Aware Inference with Guarantees
Title: Equivariant Representation Learning for Symmetry-Aware Inference with Guarantees | Gleichwertiges Repräsentationslernen für Symmetrie-Bewusstschluss mit Garantien | 关于有担保的对称-软件推断的等同代表制学习 2505.19809v2 |
Authors: Daniel Ordoñez-Apraez, Vladimir Kostić, Alek Fröhlich, Vivien Brandt, Karim Lounici, Massimiliano Pontil
In many real-world applications of regression, conditional probability estimation, and uncertainty quantification, exploiting symmetries rooted in physics or geometry can dramatically improve generalization and sample efficiency. While geometric deep learning has made significant empirical advances by incorporating group-theoretic structure, less attention has been given to statistical learning guarantees. In this paper, we introduce an equivariant representation learning framework that simultaneously addresses regression, conditional probability estimation, and uncertainty quantification while providing first-of-its-kind non-asymptotic statistical learning guarantees. Grounded in operator and group representation theory, our framework approximates the spectral decomposition of the conditional expectation operator, building representations that are both equivariant and disentangled along independent symmetry subgroups. Empirical evaluations on synthetic datasets and real-world robotics applications confirm the potential of our approach, matching or outperforming existing equivariant baselines in regression while additionally providing well-calibrated parametric uncertainty estimates.
Article 957
Title@2025-05-27 (2): PoisonSwarm: Universal Harmful Information Synthesis via Model Crowdsourcing
Title: PoisonSwarm: Universal Harmful Information Synthesis via Model Crowdsourcing | GiftSwarm: Universal Harmful Information Synthesis via Model Crowdsourcing | 毒物群:通过示范众包普及有害信息合成 2505.21184v1 |
Authors: Yu Yan, Sheng Sun, Zhifei Zheng, Ziji Hao, Teli Liu, Min Liu
To construct responsible and secure AI applications, harmful information data is widely utilized for adversarial testing and the development of safeguards. Existing studies mainly leverage Large Language Models (LLMs) to synthesize data to obtain high-quality task datasets at scale, thereby avoiding costly human annotation. However, limited by the safety alignment mechanisms of LLMs, the synthesis of harmful data still faces challenges in generation reliability and content diversity. In this study, we propose a novel harmful information synthesis framework, PoisonSwarm, which applies the model crowdsourcing strategy to generate diverse harmful data while maintaining a high success rate. Specifically, we generate abundant benign data as base templates in a counterfactual manner. Subsequently, we decompose each base template into multiple semantic units and perform unit-by-unit toxification and final refinement through dynamic model switching, thus ensuring the success of synthesis. Experimental results demonstrate that PoisonSwarm achieves state-of-the-art performance in synthesizing different categories of harmful data with high scalability and diversity.
Article 958
Title@2025-05-27 (2): Learning What to Do and What Not To Do: Offline Imitation from Expert and Undesirable Demonstrations
Title: Learning What to Do and What Not To Do: Offline Imitation from Expert and Undesirable Demonstrations | Lernen, was zu tun ist und was nicht: Offline-Imitation von Experten und unerwünschten Demonstrationen | 学会做什么做什么和不做什么:专家的脱线模仿和不受欢迎的示威 2505.21182v1 |
Authors: Huy Hoang, Tien Mai, Pradeep Varakantham, Tanvi Verma
Offline imitation learning typically learns from expert and unlabeled demonstrations, yet often overlooks the valuable signal in explicitly undesirable behaviors. In this work, we study offline imitation learning from contrasting behaviors, where the dataset contains both expert and undesirable demonstrations. We propose a novel formulation that optimizes a difference of KL divergences over the state-action visitation distributions of expert and undesirable (or bad) data. Although the resulting objective is a DC (Difference-of-Convex) program, we prove that it becomes convex when expert demonstrations outweigh undesirable demonstrations, enabling a practical and stable non-adversarial training objective. Our method avoids adversarial training and handles both positive and negative demonstrations in a unified framework. Extensive experiments on standard offline imitation learning benchmarks demonstrate that our approach consistently outperforms state-of-the-art baselines.
Article 959
Title@2025-05-27 (2): Latent label distribution grid representation for modeling uncertainty
Title: Latent label distribution grid representation for modeling uncertainty | Latent Label Distribution Grid Darstellung für Modellierung Unsicherheit | 用于模拟不确定性模型的延迟标签分配网格代表 2505.21180v1 |
Authors: ShuNing Sun, YinSong Xiong, Yu Zhang, Zhuoran Zheng
Although Label Distribution Learning (LDL) has promising representation capabilities for characterizing the polysemy of an instance, the complexity and high cost of label distribution annotation lead to inexactness in the construction of the label space. The existence of a large number of inexact labels generates a label space with uncertainty, which misleads the LDL algorithm into making incorrect decisions. To alleviate this problem, we model the uncertainty of label distributions by constructing a Latent Label Distribution Grid (LLDG) to form a low-noise representation space. Specifically, we first construct a label correlation matrix based on the differences between labels, and then expand each value of the matrix into a vector that obeys a Gaussian distribution, thus building an LLDG to model the uncertainty of the label space. Finally, the LLDG is reconstructed by the LLDG-Mixer to generate an accurate label distribution. Note that we enforce a customized low-rank scheme on this grid, which assumes that the label relations may be noisy and therefore performs noise reduction with the help of a Tucker reconstruction technique. Furthermore, we attempt to evaluate the effectiveness of the LLDG by considering its generation as an upstream task to achieve the classification of the objects. Extensive experimental results show that our approach performs competitively on several benchmarks.
Article 960
Title@2025-05-27 (2): Improved Online Confidence Bounds for Multinomial Logistic Bandits
Title: Improved Online Confidence Bounds for Multinomial Logistic Bandits | Verbesserte Online-Konfidenzgrenzen für multinomiale Logistische Banditen | 提高多军后勤大盗的在线信任度 2502.10020v4 |
Authors: Joongkyu Lee, Min-hwan Oh
In this paper, we propose an improved online confidence bound for multinomial logistic (MNL) models and apply this result to MNL bandits, achieving variance-dependent optimal regret. Recently, Lee & Oh (2024) established an online confidence bound for MNL models and achieved nearly minimax-optimal regret in MNL bandits. However, their results still depend on the norm-boundedness of the unknown parameter $B$ and the maximum size of possible outcomes $K$. To address this, we first derive an online confidence bound of $O\left(\sqrt{d \log t} + B \right)$, which is a significant improvement over the previous bound of $O (B \sqrt{d} \log t \log K )$ (Lee & Oh, 2024). This is mainly achieved by establishing tighter self-concordant properties of the MNL loss and introducing a novel intermediary term to bound the estimation error. Using this new online confidence bound, we propose a constant-time algorithm, OFU-MNL++, which achieves a variance-dependent regret bound of $O \Big( d \log T \sqrt{ \sum_{t=1}^T \sigma_t^2 } \Big) $ for sufficiently large $T$, where $\sigma_t^2$ denotes the variance of the rewards at round $t$, $d$ is the dimension of the contexts, and $T$ is the total number of rounds. Furthermore, we introduce a Maximum Likelihood Estimation (MLE)-based algorithm, OFU-MN$^2$L, which achieves an anytime poly(B)-free regret of $O \Big( d \log (BT) \sqrt{ \sum_{t=1}^T \sigma_t^2 } \Big) $.
Article 961
Title@2025-05-27 (2): Topological Deep Learning for Speech Data
Title: Topological Deep Learning for Speech Data | Topologisches Deep Learning für Sprachdaten | 为语音数据进行地形深层学习 2505.21173v1 |
Authors: Zhiwang Yu
Topological data analysis (TDA) offers novel mathematical tools for deep learning. Inspired by Carlsson et al., this study designs topology-aware convolutional kernels that significantly improve speech recognition networks. Theoretically, by investigating orthogonal group actions on kernels, we establish a fiber-bundle decomposition of matrix spaces, enabling new filter generation methods. Practically, our proposed Orthogonal Feature (OF) layer achieves superior performance in phoneme recognition, particularly in low-noise scenarios, while demonstrating cross-domain adaptability. This work reveals TDA’s potential in neural network optimization, opening new avenues for mathematics-deep learning interdisciplinary studies.
Article 962
Title@2025-05-27 (2): Parameter Efficient Continual Learning with Dynamic Low-Rank Adaptation
Title: Parameter Efficient Continual Learning with Dynamic Low-Rank Adaptation | Parameter Effizientes kontinuierliches Lernen mit dynamischer Low-Rank-Anpassung | 具有动态低Rank适应性的持续学习 2505.11998v2 |
Authors: Prashant Shivaram Bhat, Shakib Yazdani, Elahe Arani, Bahram Zonooz
Catastrophic forgetting has remained a critical challenge for deep neural networks in Continual Learning (CL) as it undermines consolidated knowledge when learning new tasks. Parameter efficient fine tuning CL techniques are gaining traction for their effectiveness in addressing catastrophic forgetting with a lightweight training schedule while avoiding degradation of consolidated knowledge in pre-trained models. However, low rank adapters (LoRA) in these approaches are highly sensitive to rank selection which can lead to sub-optimal resource allocation and performance. To this end, we introduce PEARL, a rehearsal-free CL framework that entails dynamic rank allocation for LoRA components during CL training. Specifically, PEARL leverages reference task weights and adaptively determines the rank of task-specific LoRA components based on the current tasks’ proximity to reference task weights in parameter space. To demonstrate the versatility of PEARL, we evaluate it across three vision architectures (ResNet, Separable Convolutional Network and Vision Transformer) and a multitude of CL scenarios, and show that PEARL outperforms all considered baselines by a large margin.
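To make the role of the LoRA rank concrete, here is a minimal sketch of a low-rank adapter applied to a frozen weight matrix. It uses the commonly cited LoRA form y = W x + (alpha / r) B A x. The abstract does not state PEARL's rank allocation rule, so the rank r is simply an input here, and the helper names (`matvec`, `lora_forward`, `lora_param_count`) are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Return W x + (alpha / r) * B (A x).

    W is the frozen d_out x d_in weight, A is r x d_in, B is d_out x r.
    The rank r is read from the number of rows in A.
    """
    r = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def lora_param_count(d_in, d_out, r):
    """Number of trainable parameters in one rank-r adapter (A plus B)."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]   # frozen weight (illustrative values)
    A = [[1.0, 1.0]]               # rank-1 down-projection
    B = [[1.0], [0.0]]             # rank-1 up-projection
    print(lora_forward(W, A, B, [2.0, 3.0], alpha=2.0))
    # Adapter size grows linearly with the rank, which is why rank choice matters.
    for rank in (2, 8, 32):
        print(rank, lora_param_count(768, 768, rank))
```

Because the adapter size scales linearly in r, a fixed rank either wastes parameters on easy tasks or underfits hard ones. That trade-off is what a dynamic allocation scheme would aim to balance.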
Article 963
Title@2025-05-27 (2): STEB: In Search of the Best Evaluation Approach for Synthetic Time Series
Title: STEB: In Search of the Best Evaluation Approach for Synthetic Time Series | STEB: Auf der Suche nach dem besten Bewertungsansatz für die Synthetische Zeitreihe | STEB:寻求合成时间系列的最佳评价方法 2505.21160v1 |
Authors: Michael Stenger, Robert Leppich, André Bauer, Samuel Kounev
The growing need for synthetic time series, due to data augmentation or privacy regulations, has led to numerous generative models, frameworks, and evaluation measures alike. Objectively comparing these measures on a large scale remains an open challenge. We propose the Synthetic Time series Evaluation Benchmark (STEB) – the first benchmark framework that enables comprehensive and interpretable automated comparisons of synthetic time series evaluation measures. Using 10 diverse datasets, randomness injection, and 13 configurable data transformations, STEB computes indicators for measure reliability and score consistency. It tracks running time, test errors, and features sequential and parallel modes of operation. In our experiments, we determine a ranking of 41 measures from literature and confirm that the choice of upstream time series embedding heavily impacts the final score.
Article 964
Title@2025-05-27 (2): Model as Loss: A Self-Consistent Training Paradigm
Title: Model as Loss: A Self-Consistent Training Paradigm | Modell als Verlust: Ein selbstkonsistentes Trainingsparadigma | 损失模型:自我协调培训模型 2505.21156v1 |
Authors: Saisamarth Rajesh Phaye, Milos Cernak, Andrew Harper
Conventional methods for speech enhancement rely on handcrafted loss functions (e.g., time or frequency domain losses) or deep feature losses (e.g., using WavLM or wav2vec), which often fail to capture subtle signal properties essential for optimal performance. To address this, we propose Model as Loss, a novel training paradigm that utilizes the encoder from the same model as a loss function to guide the training. The Model as Loss paradigm leverages the encoder’s task-specific feature space, optimizing the decoder to produce output consistent with perceptual and task-relevant characteristics of the clean signal. By using the encoder’s learned features as a loss function, this framework enforces self-consistency between the clean reference speech and the enhanced model output. Our approach outperforms pre-trained deep feature losses on standard speech enhancement benchmarks, offering better perceptual quality and robust generalization to both in-domain and out-of-domain datasets.
Article 965
Title@2025-05-27 (2): FlexiReg: Flexible Urban Region Representation Learning
Title: FlexiReg: Flexible Urban Region Representation Learning | FlexiReg: Flexibles Stadtraum-Repräsentanz-Lernen | 灵活的城市地区代表性学习:灵活的城市地区代表性学习 2503.09128v2 |
Authors: Fengze Sun, Yanchuan Chang, Egemen Tanin, Shanika Karunasekera, Jianzhong Qi
The increasing availability of urban data offers new opportunities for learning region representations, which can be used as input to machine learning models for downstream tasks such as check-in or crime prediction. While existing solutions have produced promising results, an issue is their fixed formation of regions and fixed input region features, which may not suit the needs of different downstream tasks. To address this limitation, we propose a model named FlexiReg for urban region representation learning that is flexible with both the formation of urban regions and the input region features. FlexiReg is based on a spatial grid partitioning over the spatial area of interest. It learns representations for the grid cells, leveraging publicly accessible data, including POI, land use, satellite imagery, and street view imagery. We propose adaptive aggregation to fuse the cell representations and prompt learning techniques to tailor the representations towards different tasks, addressing the needs of varying formations of urban regions and downstream tasks. Extensive experiments on five real-world datasets demonstrate that FlexiReg outperforms state-of-the-art models by up to 202% in terms of the accuracy of four diverse downstream tasks using the produced urban region representations.
Article 966
Title@2025-05-27 (2): Predicate Invention for Bilevel Planning
Title: Predicate Invention for Bilevel Planning | Prädikat Erfindung für Bilevel-Planung | 双级规划预发明 2203.09634v3 |
Authors: Tom Silver, Rohan Chitnis, Nishanth Kumar, Willie McClinton, Tomas Lozano-Perez, Leslie Pack Kaelbling, Joshua Tenenbaum
Efficient planning in continuous state and action spaces is fundamentally hard, even when the transition model is deterministic and known. One way to alleviate this challenge is to perform bilevel planning with abstractions, where a high-level search for abstract plans is used to guide planning in the original transition space. Previous work has shown that when state abstractions in the form of symbolic predicates are hand-designed, operators and samplers for bilevel planning can be learned from demonstrations. In this work, we propose an algorithm for learning predicates from demonstrations, eliminating the need for manually specified state abstractions. Our key idea is to learn predicates by optimizing a surrogate objective that is tractable but faithful to our real efficient-planning objective. We use this surrogate objective in a hill-climbing search over predicate sets drawn from a grammar. Experimentally, we show across four robotic planning environments that our learned abstractions are able to quickly solve held-out tasks, outperforming six baselines. Code: https://tinyurl.com/predicators-release
Article 967
Title@2025-05-27 (2): Semi-Supervised Conformal Prediction With Unlabeled Nonconformity Score
Title: Semi-Supervised Conformal Prediction With Unlabeled Nonconformity Score | Halbüberwachte konforme Vorhersage mit nicht markiertem Nonkonformity Score | 带有未贴标签的不合规分数的半超半常规预测 2505.21147v1 |
Authors: Xuanning Zhou, Hao Zeng, Xiaobo Xia, Bingyi Jing, Hongxin Wei
Conformal prediction (CP) is a powerful framework for uncertainty quantification, providing prediction sets with coverage guarantees when calibrated on sufficient labeled data. However, in real-world applications where labeled data is often limited, standard CP can lead to coverage deviation and output overly large prediction sets. In this paper, we extend CP to the semi-supervised setting and propose SemiCP, leveraging both labeled data and unlabeled data for calibration. Specifically, we introduce a novel nonconformity score function, NNM, designed for unlabeled data. This function selects labeled data with similar pseudo-label scores to estimate nonconformity scores, integrating them into the calibration process to overcome sample size limitations. We theoretically demonstrate that, under mild assumptions, SemiCP provides an asymptotic coverage guarantee for prediction sets. Extensive experiments further validate that our approach effectively reduces instability and inefficiency under limited calibration data, can be adapted to conditional coverage settings, and integrates seamlessly with existing CP methods.
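For readers unfamiliar with the calibration step that SemiCP extends, the following is a minimal sketch of standard split conformal prediction for classification. It is not the SemiCP or NNM procedure, whose details the abstract does not give. The nonconformity score 1 - p(true label) and the calibration values are assumptions chosen for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # 1-indexed rank ceil((n + 1) * (1 - alpha)) among the sorted scores.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every label must be kept.
        return math.inf
    return sorted(scores)[rank - 1]


def prediction_set(probs, threshold):
    """Keep every label whose score 1 - p(label) does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


if __name__ == "__main__":
    # Hypothetical calibration data: model probability assigned to the true label.
    cal_true_label_probs = [0.9, 0.8, 0.75, 0.6, 0.95, 0.7, 0.85, 0.5, 0.65]
    cal_scores = [1.0 - p for p in cal_true_label_probs]
    q_hat = conformal_quantile(cal_scores, alpha=0.1)
    print(q_hat, prediction_set([0.55, 0.35, 0.10], q_hat))
```

The `math.inf` branch shows the small-sample problem the abstract describes: with very few labeled calibration points the threshold becomes uninformative and the prediction set contains every label.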
Article 968
Title@2025-05-27 (2): A Distributional Treatment of Real2Sim2Real for Object-Centric Agent Adaptation in Vision-Driven Deformable Linear Object Manipulation
Title: A Distributional Treatment of Real2Sim2Real for Object-Centric Agent Adaptation in Vision-Driven Deformable Linear Object Manipulation | Eine distributive Behandlung von Real2Sim2Real für die Anpassung an Objekt-Zentrische Agenten in visionsgetriebener, deformierbarer linearer Objektmanipulation | 在视觉-驱动式可变线性物体操纵中用于物体中心剂适应的Real2Sim2Real的分布式处理法 2502.18615v2 |
Authors: Georgios Kamaras, Subramanian Ramamoorthy
We present an integrated (or end-to-end) framework for the Real2Sim2Real problem of manipulating deformable linear objects (DLOs) based on visual perception. Working with a parameterised set of DLOs, we use likelihood-free inference (LFI) to compute the posterior distributions for the physical parameters, with which we can approximately simulate the behaviour of each specific DLO. We use these posteriors for domain randomisation while training, in simulation, object-specific visuomotor policies (i.e. assuming only visual and proprioceptive sensory input) for a DLO reaching task, using model-free reinforcement learning. We demonstrate the utility of this approach by deploying sim-trained DLO manipulation policies in the real world in a zero-shot manner, i.e. without any further fine-tuning. In this context, we evaluate the capacity of a prominent LFI method to perform fine classification over the parametric set of DLOs, using only visual and proprioceptive data obtained in a dynamic manipulation trajectory. We then study the implications of the resulting domain distributions in sim-based policy learning and real-world performance.
Article 969
Title@2025-05-27 (2): Hallucinations are inevitable but can be made statistically negligible. The “innate” inevitability of hallucinations cannot explain practical LLM issues
Title: Hallucinations are inevitable but can be made statistically negligible. The “innate” inevitability of hallucinations cannot explain practical LLM issues | Halluzinationen sind unvermeidlich, können aber statistisch vernachlässigbar gemacht werden. Die “angeborene” Unvermeidbarkeit von Halluzinationen kann praktische LLM-Probleme nicht erklären | 幻觉的“内在”不可避免性无法解释实际的LLM问题。 2502.12187v2 |
Authors: Atsushi Suzuki, Yulan He, Feng Tian, Zhongyuan Wang
Hallucinations, a phenomenon where a language model (LM) generates nonfactual content, pose a significant challenge to the practical deployment of LMs. While many empirical methods have been proposed to mitigate hallucinations, recent studies established a computability-theoretic result showing that any LM will inevitably generate hallucinations on an infinite set of inputs, regardless of the quality and quantity of training datasets and the choice of the language model architecture and training and inference algorithms. Although the computability-theoretic result may seem pessimistic, its significance in practical viewpoints has remained unclear. This paper claims that those “innate” inevitability results from computability theory and diagonal argument, in principle, cannot explain practical issues of LLMs. We demonstrate this claim by presenting a positive theoretical result from a probabilistic perspective. Specifically, we prove that hallucinations can be made statistically negligible, provided that the quality and quantity of the training data are sufficient. Interestingly, our positive result coexists with the computability-theoretic result, implying that while hallucinations on an infinite set of inputs cannot be entirely eliminated, their probability can always be reduced by improving algorithms and training data. By evaluating the two seemingly contradictory results through the lens of information theory, we argue that our probability-theoretic positive result better reflects practical considerations than the computability-theoretic negative result.
Article 970
Title@2025-05-27 (2): A Predicting Phishing Websites Using Support Vector Machine and MultiClass Classification Based on Association Rule Techniques
Title: A Predicting Phishing Websites Using Support Vector Machine and MultiClass Classification Based on Association Rule Techniques | Eine Vorhersage Phishing-Websites mit Unterstützung Vektor-Maschine und Multi-Klasse Klassifizierung basierend auf Assoziation Regel Techniken | 基于协会规则技术的利用辅助病媒机和多类分类的预测钓鱼网站 2505.21141v1 |
Authors: Nancy C. Woods, Virtue Ene Agada, Adebola K. Ojo
Phishing is a semantic attack that targets the user rather than the computer. It is a new Internet crime in comparison with other forms such as viruses and hacking. Considering the damage phishing websites have caused to various economies by collapsing organizations, stealing information, and diverting funds, researchers have pursued different ways of detecting phishing websites, but there has been no agreement on the best algorithm for prediction. This study integrates the strengths of two algorithms, Support Vector Machines (SVM) and Multi-Class Classification Rules based on Association Rules (MCAR), to establish a stronger means of predicting phishing websites. A total of 11,056 websites from both PhishTank and the Yahoo directory were used to verify the effectiveness of this approach. Feature extraction and rule generation were done by the MCAR technique; classification and prediction were done by the SVM technique. The results showed that the technique achieved 98.30% classification accuracy with a computation time of 2205.33s and a minimal error rate. It achieved an Area under the Curve (AUC) of 98%, indicating the proportion of accuracy in classifying phishing websites. The model explained 82.84% of the variance in the prediction of phishing websites, based on the coefficient of determination. Using the two techniques together in detecting phishing websites produced a more accurate result, as it combined the strengths of both. This research capitalized on this advantage by building a hybrid of the two techniques to help produce a more accurate result.
Article 971
Title@2025-05-27 (2): HeteroBA: A Structure-Manipulating Backdoor Attack on Heterogeneous Graphs
Title: HeteroBA: A Structure-Manipulating Backdoor Attack on Heterogeneous Graphs | HeteroBA: Ein strukturmanipulierender Backdoor-Angriff auf Heterogene Graphen | 异型BA:结构调节式后门对异种图的后门攻击 2505.21140v1 |
Authors: Honglin Gao, Xiang Li, Lan Zhao, Gaoxi Xiao
Heterogeneous graph neural networks (HGNNs) have recently drawn increasing attention for modeling complex multi-relational data in domains such as recommendation, finance, and social networks. While existing research has been largely focused on enhancing HGNNs’ predictive performance, their robustness and security, especially under backdoor attacks, remain underexplored. In this paper, we propose a novel Heterogeneous Backdoor Attack (HeteroBA) framework for node classification tasks on heterogeneous graphs. HeteroBA inserts carefully crafted trigger nodes with realistic features and targeted structural connections, leveraging attention-based and clustering-based strategies to select influential auxiliary nodes for effective trigger propagation, thereby causing the model to misclassify specific nodes into a target label while maintaining accuracy on clean data. Experimental results on three datasets and various HGNN architectures demonstrate that HeteroBA achieves high attack success rates with minimal impact on the clean accuracy. Our method sheds light on potential vulnerabilities in HGNNs and calls for more robust defenses against backdoor threats in multi-relational graph scenarios.
Article 972
Title@2025-05-27 (2): Identifying Heart Attack Risk in Vulnerable Population: A Machine Learning Approach
Title: Identifying Heart Attack Risk in Vulnerable Population: A Machine Learning Approach | Identifikation von Herzinfarktrisiko in gefährdeter Bevölkerung: Ein Ansatz zum maschinellen Lernen | 查明弱势人口中的心脏攻击风险:机械学习方法 2505.21139v1 |
Authors: Subhagata Chattopadhyay, Amit K Chattopadhyay
The COVID-19 pandemic has significantly increased the incidence of post-infection cardiovascular events, particularly myocardial infarction, in individuals over 40. While the underlying mechanisms remain elusive, this study employs a hybrid machine learning approach to analyze epidemiological data in assessing 13 key heart attack risk factors and their susceptibility. Based on a unique dataset that combines demographic, biochemical, ECG, and thallium stress-test data, this study categorizes distinct subpopulations against varying risk profiles and then divides the population into ‘at-risk’ (AR) and ‘not-at-risk’ (NAR) groups using clustering algorithms. The study reveals a strong association between the likelihood of experiencing a heart attack and the 13 risk factors studied. The aggravated risk for postmenopausal patients indicates compromised individual risk factors due to estrogen depletion, which may be further compromised by extraneous stress impacts, like anxiety and fear, aspects that have traditionally eluded data modeling predictions.
Article 973
Title@2025-05-27 (2): Learning Single Index Models with Diffusion Priors
Title: Learning Single Index Models with Diffusion Priors | Einzelindexmodelle mit Diffusion Priors lernen | 具有传播前版本的学习单一指数模式 2505.21135v1 |
Authors: Anqi Tang, Youming Chen, Shuchen Xue, Zhaoqiang Liu
Diffusion models (DMs) have demonstrated remarkable ability to generate diverse and high-quality images by efficiently modeling complex data distributions. They have also been explored as powerful generative priors for signal recovery, resulting in a substantial improvement in the quality of reconstructed signals. However, existing research on signal recovery with diffusion models either focuses on specific reconstruction problems or is unable to handle nonlinear measurement models with discontinuous or unknown link functions. In this work, we focus on using DMs to achieve accurate recovery from semi-parametric single index models, which encompass a variety of popular nonlinear models that may have discontinuous and unknown link functions. We propose an efficient reconstruction method that only requires one round of unconditional sampling and (partial) inversion of DMs. Theoretical analysis on the effectiveness of the proposed methods has been established under appropriate conditions. We perform numerical experiments on image datasets for different nonlinear measurement models. We observe that compared to competing methods, our approach can yield more accurate reconstructions while utilizing significantly fewer neural function evaluations.
Article 974
Title@2025-05-27 (2): Robust and Computation-Aware Gaussian Processes
Title: Robust and Computation-Aware Gaussian Processes | Robuste und rechnergestützte Gaußsche Prozesse | 强力和计算- 软件软件高斯进程 2505.21133v1 |
Authors: Marshal Arijona Sinaga, Julien Martinelli, Samuel Kaski
Gaussian processes (GPs) are widely used for regression and optimization tasks such as Bayesian optimization (BO) due to their expressiveness and principled uncertainty estimates. However, in settings with large datasets corrupted by outliers, standard GPs and their sparse approximations struggle with computational tractability and robustness. We introduce Robust Computation-aware Gaussian Process (RCaGP), a novel GP model that jointly addresses these challenges by combining a principled treatment of approximation-induced uncertainty with robust generalized Bayesian updating. The key insight is that robustness and approximation-awareness are not orthogonal but intertwined: approximations can exacerbate the impact of outliers, and mitigating one without the other is insufficient. Unlike previous work that focuses narrowly on either robustness or approximation quality, RCaGP combines both in a principled and scalable framework, thus effectively managing both outliers and computational uncertainties introduced by approximations such as low-rank matrix multiplications. Our model ensures more conservative and reliable uncertainty estimates, a property we rigorously demonstrate. Additionally, we establish a robustness property and show that the mean function is key to preserving it, motivating a tailored model selection scheme for robust mean functions. Empirical results confirm that solving these challenges jointly leads to superior performance across both clean and outlier-contaminated settings, both on regression and high-throughput Bayesian optimization benchmarks.
Article 975
Title@2025-05-27 (2): Backpropagation-free Spiking Neural Networks with the Forward-Forward Algorithm
Title: Backpropagation-free Spiking Neural Networks with the Forward-Forward Algorithm | Rückpropagierungsfreie Spiking-Neural-Netzwerke mit dem vorwärts-vorwärts-Algorithmus | 带有前向前向演算法的无后向反向反向光谱反向神经网络 2502.20411v2 |
Authors: Mohammadnavid Ghader, Saeed Reza Kheradpisheh, Bahar Farahani, Mahmood Fazlali
Spiking Neural Networks (SNNs) offer a biologically inspired computational paradigm that emulates neuronal activity through discrete spike-based processing. Despite their advantages, training SNNs with traditional backpropagation (BP) remains challenging due to computational inefficiencies and a lack of biological plausibility. This study explores the Forward-Forward (FF) algorithm as an alternative learning framework for SNNs. Unlike backpropagation, which relies on forward and backward passes, the FF algorithm employs two forward passes, enabling layer-wise localized learning, enhanced computational efficiency, and improved compatibility with neuromorphic hardware. We introduce an FF-based SNN training framework and evaluate its performance across both non-spiking (MNIST, Fashion-MNIST, Kuzushiji-MNIST) and spiking (Neuro-MNIST, SHD) datasets. Experimental results demonstrate that our model surpasses existing FF-based SNNs on evaluated static datasets with a much lighter architecture while achieving accuracy comparable to state-of-the-art backpropagation-trained SNNs. On more complex spiking tasks such as SHD, our approach outperforms other SNN models and remains competitive with leading backpropagation-trained SNNs. These findings highlight the FF algorithm’s potential to advance SNN training methodologies by addressing some key limitations of backpropagation.
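The layer-local objective behind the Forward-Forward idea can be sketched as follows. This is a non-spiking illustration using a plain ReLU layer, not the authors' SNN framework. It assumes the commonly described "goodness" measure (sum of squared activations) and a logistic loss around a threshold `theta`; all function names and values are hypothetical.

```python
import math


def relu_layer(x, weights):
    """Apply one dense ReLU layer; weights is a list of rows, one per output unit."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def goodness(activations):
    """Goodness of a layer's response: the sum of squared activations."""
    return sum(a * a for a in activations)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def ff_layer_loss(pos_input, neg_input, weights, theta):
    """Local loss from two forward passes: one on positive data, one on negative data.

    The loss is small when goodness is above theta for the positive sample
    and below theta for the negative sample. No backward pass through other
    layers is needed to evaluate it.
    """
    g_pos = goodness(relu_layer(pos_input, weights))
    g_neg = goodness(relu_layer(neg_input, weights))
    return softplus(theta - g_pos) + softplus(g_neg - theta)


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0]]  # illustrative identity layer
    print(ff_layer_loss([2.0, 0.0], [0.5, 0.0], weights, theta=1.0))
```

Each layer can minimize this loss using only its own inputs and weights, which is the property that makes the approach attractive for neuromorphic hardware.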
Article 976
Title@2025-05-27 (2): MetaGS: A Meta-Learned Gaussian-Phong Model for Out-of-Distribution 3D Scene Relighting
Title: MetaGS: A Meta-Learned Gaussian-Phong Model for Out-of-Distribution 3D Scene Relighting | MetaGS: Ein meta-erlerntes Gaussian-Phong-Modell für 3D-Szenen-Erhellung im Out-of-Distribution-Bereich | MetaGS: 3D号场景光化模型 2405.20791v2 |
Authors: Yumeng He, Yunbo Wang, Xiaokang Yang
Out-of-distribution (OOD) 3D relighting requires novel view synthesis under unseen lighting conditions that differ significantly from the observed images. Existing relighting methods, which assume consistent light source distributions between training and testing, often degrade in OOD scenarios. We introduce MetaGS to tackle this challenge from two perspectives. First, we propose a meta-learning approach to train 3D Gaussian splatting, which explicitly promotes learning generalizable Gaussian geometries and appearance attributes across diverse lighting conditions, even with biased training data. Second, we embed fundamental physical priors from the Blinn-Phong reflection model into Gaussian splatting, which enhances the decoupling of shading components and leads to more accurate 3D scene reconstruction. Results on both synthetic and real-world datasets demonstrate the effectiveness of MetaGS in challenging OOD relighting tasks, supporting efficient point-light relighting and generalizing well to unseen environment lighting maps.
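As background for the physical prior mentioned above, here is a minimal sketch of the classic Blinn-Phong reflection model for a single point light and a scalar intensity. How MetaGS attaches these terms to individual Gaussians is not described in the abstract, so the coefficients `ka`, `kd`, `ks` and `shininess` below are illustrative assumptions only.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Return v scaled to unit length, or the zero vector if v has no length."""
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v) if length > 0 else (0.0, 0.0, 0.0)


def blinn_phong(normal, to_light, to_viewer, ka=0.1, kd=0.6, ks=0.3, shininess=32):
    """Scalar intensity: ambient + diffuse + specular (Blinn-Phong half-vector form)."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_viewer)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:
        # Surface faces away from the light: only the ambient term remains.
        return ka
    # Half vector between the light and view directions.
    h = normalize(tuple(a + b for a, b in zip(l, v)))
    specular = max(0.0, dot(n, h)) ** shininess
    return ka + kd * n_dot_l + ks * specular


if __name__ == "__main__":
    up = (0.0, 0.0, 1.0)
    print(blinn_phong(up, to_light=up, to_viewer=up))
    print(blinn_phong(up, to_light=(1.0, 0.0, 1.0), to_viewer=up))
```

Separating the ambient, diffuse, and specular terms is what allows the shading components to be decoupled, so that lighting can change at test time while geometry and material terms stay fixed.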
Article 977
Title@2025-05-27 (2): Universal Value-Function Uncertainties
Title: Universal Value-Function Uncertainties | Universelle Wert-Funktions-Unsicherheiten | 通用价值-功能不确定性 2505.21119v1 |
Authors: Moritz A. Zanger, Max Weltevrede, Yaniv Oren, Pascal R. Van der Vaart, Caroline Horsch, Wendelin Böhmer, Matthijs T. J. Spaan
Estimating epistemic uncertainty in value functions is a crucial challenge for many aspects of reinforcement learning (RL), including efficient exploration, safe decision-making, and offline RL. While deep ensembles provide a robust method for quantifying value uncertainty, they come with significant computational overhead. Single-model methods, while computationally favorable, often rely on heuristics and typically require additional propagation mechanisms for myopic uncertainty estimates. In this work we introduce universal value-function uncertainties (UVU), which, similar in spirit to random network distillation (RND), quantify uncertainty as squared prediction errors between an online learner and a fixed, randomly initialized target network. Unlike RND, UVU errors reflect policy-conditional value uncertainty, incorporating the future uncertainties any given policy may encounter. This is due to the training procedure employed in UVU: the online network is trained using temporal difference learning with a synthetic reward derived from the fixed, randomly initialized target network. We provide an extensive theoretical analysis of our approach using neural tangent kernel (NTK) theory and show that in the limit of infinite network width, UVU errors are exactly equivalent to the variance of an ensemble of independent universal value functions. Empirically, we show that UVU achieves equal performance to large ensembles on challenging multi-task offline RL settings, while offering simplicity and substantial computational savings.
Article 978
Title@2025-05-27 (2): A Lightweight Multi-Expert Generative Language Model System for Engineering Information and Knowledge Extraction
Title: A Lightweight Multi-Expert Generative Language Model System for Engineering Information and Knowledge Extraction | Ein leichtes Multi-Expert Generatives Sprachmodellsystem für Engineering Information and Knowledge Extraction | 工程信息和知识采掘轻量多专家生成语言示范系统 2505.21109v1 |
Authors: Bogdan Bogachov, Yaoyao Fiona Zhao
Despite recent advancements in domain adaptation techniques for large language models, these methods remain computationally intensive, and the resulting models can still exhibit hallucination issues. Most existing adaptation methods do not prioritize reducing the computational resources required for fine-tuning and inference of language models. Hallucination issues have gradually decreased with each new model release. However, they remain prevalent in engineering contexts, where generating well-structured text with minimal errors and inconsistencies is critical. This work introduces a novel approach called the Small Language Graph (SLG), which is a lightweight adaptation solution designed to address the two key challenges outlined above. The system is structured in the form of a graph, where each node represents a lightweight expert - a small language model fine-tuned on specific and concise texts. The results of this study have shown that SLG was able to surpass conventional fine-tuning methods on the Exact Match metric by 3 times. Additionally, the fine-tuning process was 1.7 times faster compared to that of a larger stand-alone language model. These findings introduce a potential for small to medium-sized engineering companies to confidently use generative AI technologies, such as LLMs, without the necessity to invest in expensive computational resources. Also, the graph architecture and the small size of expert nodes offer a possible opportunity for distributed AI systems, thus potentially diverting the global need for expensive centralized compute clusters.
Article 979
Title@2025-05-27 (2): Conditional Diffusion Models with Classifier-Free Gibbs-like Guidance
Title: Conditional Diffusion Models with Classifier-Free Gibbs-like Guidance | Bedingte Diffusionsmodelle mit klassifikatorfreier Gibbs-ähnlicher Anleitung | 有条件传播模式,附有无分类者免费吉布布斯类指南 2505.21101v1 |
Authors: Badr Moufad, Yazid Janati, Alain Durmus, Ahmed Ghorbel, Eric Moulines, Jimmy Olsson
Classifier-Free Guidance (CFG) is a widely used technique for improving conditional diffusion models by linearly combining the outputs of conditional and unconditional denoisers. While CFG enhances visual quality and improves alignment with prompts, it often reduces sample diversity, leading to a challenging trade-off between quality and diversity. To address this issue, we make two key contributions. First, CFG generally does not correspond to a well-defined denoising diffusion model (DDM). In particular, contrary to common intuition, CFG does not yield samples from the target distribution associated with the limiting CFG score as the noise level approaches zero – where the data distribution is tilted by a power $w \gt 1$ of the conditional distribution. We identify the missing component: a Rényi divergence term that acts as a repulsive force and is required to correct CFG and render it consistent with a proper DDM. Our analysis shows that this correction term vanishes in the low-noise limit. Second, motivated by this insight, we propose a Gibbs-like sampling procedure to draw samples from the desired tilted distribution. This method starts with an initial sample from the conditional diffusion model without CFG and iteratively refines it, preserving diversity while progressively enhancing sample quality. We evaluate our approach on both image and text-to-audio generation tasks, demonstrating substantial improvements over CFG across all considered metrics. The code is available at https://github.com/yazidjanati/cfgig
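The linear combination at the heart of standard CFG can be sketched in a few lines. This shows only the usual guidance rule with scale `w`, not the Gibbs-like sampler proposed in the abstract. The function name `cfg_combine` and the list-of-floats representation of denoiser outputs are assumptions made for illustration.

```python
def cfg_combine(eps_cond, eps_uncond, w):
    """Combine conditional and unconditional denoiser outputs.

    Computes (1 - w) * eps_uncond + w * eps_cond element-wise.
    w = 1 recovers the purely conditional output; w > 1 extrapolates
    away from the unconditional prediction.
    """
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]

# Example: guidance scale 2 doubles the conditional-minus-unconditional offset.
guided = cfg_combine([1.0, 2.0], [0.0, 1.0], 2.0)
```

With `w > 1` the combined output no longer equals either denoiser's prediction, which is the regime the abstract analyzes.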
Article 980
Title@2025-05-27 (2): Random Walk Diffusion for Efficient Large-Scale Graph Generation
Title: Random Walk Diffusion for Efficient Large-Scale Graph Generation | Random Walk Diffusion für effiziente großformatige Graphengeneration | 高效大型图表生成的随机漫步扩散 2408.04461v2 |
Authors: Tobias Bernecker, Ghalia Rehawi, Francesco Paolo Casale, Janine Knauer-Arloth, Annalisa Marsico
Graph generation addresses the problem of generating new graphs that have a data distribution similar to real-world graphs. While previous diffusion-based graph generation methods have shown promising results, they often struggle to scale to large graphs. In this work, we propose ARROW-Diff (AutoRegressive RandOm Walk Diffusion), a novel random walk-based diffusion approach for efficient large-scale graph generation. Our method encompasses two components in an iterative process of random walk sampling and graph pruning. We demonstrate that ARROW-Diff can scale to large graphs efficiently, surpassing other baseline methods in terms of both generation time and multiple graph statistics, reflecting the high quality of the generated graphs.
Article 981
Title@2025-05-27 (2): Do you see what I see? An Ambiguous Optical Illusion Dataset exposing limitations of Explainable AI
Title: Do you see what I see? An Ambiguous Optical Illusion Dataset exposing limitations of Explainable AI | Sehen Sie, was ich sehe? Ein Ambiguous Optical Illusion Dataset, das Beschränkungen der erklärbaren KI aufdeckt | 你看到我所看到的吗?一个模糊的光学幻影数据集暴露了可解释的人工智能的局限性。 2505.21589v1 |
Authors: Carina Newen, Luca Hinkamp, Maria Ntonti, Emmanuel Müller
From uncertainty quantification to real-world object detection, we recognize the importance of machine learning algorithms, particularly in safety-critical domains such as autonomous driving or medical diagnostics. Ambiguous data plays an important role in various machine learning domains. Optical illusions present a compelling area of study in this context, as they offer insight into the limitations of both human and machine perception. Despite this relevance, optical illusion datasets remain scarce. In this work, we introduce a novel dataset of optical illusions featuring intermingled animal pairs designed to evoke perceptual ambiguity. We identify generalizable visual concepts, particularly gaze direction and eye cues, as subtle yet impactful features that significantly influence model accuracy. By confronting models with perceptual ambiguity, our findings underscore the importance of concepts in visual learning and provide a foundation for studying bias and alignment between human and machine vision. To make this dataset useful for general purposes, we generate optical illusions systematically with different concepts discussed in our bias mitigation section. The dataset is accessible on Kaggle via https://kaggle.com/datasets/693bf7c6dd2cb45c8a863f9177350c8f9849a9508e9d50526e2ffcc5559a8333. Our source code can be found at https://github.com/KDD-OpenSource/Ambivision.git.
Article 982
Title@2025-05-27 (2): Sequential Function-Space Variational Inference via Gaussian Mixture Approximation
Title: Sequential Function-Space Variational Inference via Gaussian Mixture Approximation | Sequentielle Funktions-Raum Variationelle Schlussfolgerung über Gaußsche Mischungsannäherung | 通过高森混ixture近似加速发生序列函数-空间空间变动推断 2503.07114v2 |
Authors: Menghao Waiyan William Zhu, Pengcheng Hao, Ercan Engin Kuruoğlu
Continual learning in neural networks aims to learn new tasks without forgetting old tasks. Sequential function-space variational inference (SFSVI) uses a Gaussian variational distribution to approximate the distribution of the outputs of the neural network corresponding to a finite number of selected inducing points. Since the posterior distribution of a neural network is multi-modal, a Gaussian distribution could only match one mode of the posterior distribution, and a Gaussian mixture distribution could be used to better approximate the posterior distribution. We propose an SFSVI method based on a Gaussian mixture variational distribution. We also compare different types of variational inference methods with a fixed pre-trained feature extractor (where continual learning is performed on the final layer) and without a fixed pre-trained feature extractor (where continual learning is performed on all layers). We find that in terms of final average accuracy, likelihood-focused Gaussian mixture SFSVI outperforms other sequential variational inference methods, especially in the latter case.
Article 983
Title@2025-05-27 (2): Thinker: Learning to Think Fast and Slow
Title: Thinker: Learning to Think Fast and Slow | Denker: Schnell und langsam denken lernen | 思考者:学会快速和缓慢思考 2505.21097v1 |
Authors: Stephen Chung, Wenyu Du, Jie Fu
Recent studies show that the reasoning capabilities of Large Language Models (LLMs) can be improved by applying Reinforcement Learning (RL) to question-answering (QA) tasks in areas such as math and coding. With a long context length, LLMs may learn to perform search, as indicated by the self-correction behavior observed in DeepSeek R1. However, this search behavior is often imprecise and lacks confidence, resulting in long, redundant responses and highlighting deficiencies in intuition and verification. Inspired by the Dual Process Theory in psychology, we introduce a simple modification to the QA task that includes four stages: Fast Thinking, where the LLM must answer within a strict token budget; Verification, where the model evaluates its initial response; Slow Thinking, where it refines the initial response with more deliberation; and Summarization, where it distills the refinement from the previous stage into precise steps. Our proposed task improves average accuracy from 24.9% to 27.9% for Qwen2.5-1.5B, and from 45.9% to 49.8% for DeepSeek-R1-Qwen-1.5B. Notably, for Qwen2.5-1.5B, the Fast Thinking mode alone achieves 26.8% accuracy using fewer than 1000 tokens, demonstrating substantial inference efficiency gains. These findings suggest that intuition and deliberative reasoning are distinct, complementary systems benefiting from targeted training.
Article 984
Title@2025-05-27 (2): Improved Impossible Tuning and Lipschitz-Adaptive Universal Online Learning with Gradient Variations
Title: Improved Impossible Tuning and Lipschitz-Adaptive Universal Online Learning with Gradient Variations | Verbessertes Unmögliches Tuning und Lipschitz-Adaptives Universal Online-Lernen mit gradienten Variationen | 改进不可能的图金和利普施维茨-适应性通用在线学习,有渐进变异 2505.21095v1 |
Authors: Kei Takemura, Ryuta Matsuno, Keita Sakuma
A central goal in online learning is to achieve adaptivity to unknown problem characteristics, such as environmental changes captured by gradient variation (GV), function curvature (universal online learning, UOL), and gradient scales (Lipschitz adaptivity, LA). Simultaneously achieving these with optimal performance is a major challenge, partly due to limitations in algorithms for prediction with expert advice. These algorithms often serve as meta-algorithms in online ensemble frameworks, and their sub-optimality hinders overall UOL performance. Specifically, existing algorithms addressing the “impossible tuning” issue incur an excess $\sqrt{\log T}$ factor in their regret bound compared to the lower bound. To solve this problem, we propose a novel optimistic online mirror descent algorithm with an auxiliary initial round using large learning rates. This design enables a refined analysis where a generated negative term cancels the gap-related factor, resolving the impossible tuning issue up to $\log\log T$ factors. Leveraging our improved algorithm as a meta-algorithm, we develop the first UOL algorithm that simultaneously achieves state-of-the-art GV bounds and LA under standard assumptions. Our UOL result overcomes key limitations of prior works, notably resolving the conflict between LA mechanisms and regret analysis for GV bounds – an open problem highlighted by Xie et al.
Article 985
Title@2025-05-27 (2): Recurrent Memory for Online Interdomain Gaussian Processes
Title: Recurrent Memory for Online Interdomain Gaussian Processes | Recurrent Speicher für Online-Interdomain Gaussian Prozesse | Gaussian 在线内部进程经常性内存 2502.08736v3 |
Authors: Wenlong Chen, Naoki Kiyohara, Harrison Bo Hua Zhu, Jacob Curran-Sebastian, Samir Bhatt, Yingzhen Li
We propose a novel online Gaussian process (GP) model that is capable of capturing long-term memory in sequential data in an online learning setting. Our model, Online HiPPO Sparse Variational Gaussian Process (OHSVGP), leverages the HiPPO (High-order Polynomial Projection Operators) framework, which was popularized in the RNN domain due to its long-range memory modeling capabilities. We interpret the HiPPO time-varying orthogonal projections as inducing variables with time-dependent orthogonal polynomial basis functions, which allows the SVGP inducing variables to memorize the process history. We show that the HiPPO framework fits naturally into the interdomain GP framework and demonstrate that the kernel matrices can also be updated online in a recurrence form based on the ODE evolution of HiPPO. We evaluate OHSVGP on online prediction for 1D time series, continual learning in a discriminative GP model for data with multidimensional inputs, and deep generative modeling with a sparse Gaussian process variational autoencoder, showing that it outperforms existing online GP methods in terms of predictive performance, long-term memory preservation, and computational efficiency.
Article 986
Title@2025-05-27 (2): Out of the Shadows: Exploring a Latent Space for Neural Network Verification
Title: Out of the Shadows: Exploring a Latent Space for Neural Network Verification | Out of the Shadows: Erforschen eines latenten Raumes für neurale Netzwerkverifizierung | 暗影外:探索神经网络的原始空间核查 2505.17854v2 |
Authors: Lukas Koller, Tobias Ladner, Matthias Althoff
Neural networks are ubiquitous. However, they are often sensitive to small input changes. Hence, to prevent unexpected behavior in safety-critical applications, their formal verification – a notoriously hard problem – is necessary. Many state-of-the-art verification algorithms use reachability analysis or abstract interpretation to enclose the set of possible outputs of a neural network. Often, the verification is inconclusive due to the conservatism of the enclosure. To address this problem, we design a novel latent space for formal verification that enables the transfer of output specifications to the input space for an iterative specification-driven input refinement, i.e., we iteratively reduce the set of possible inputs to only enclose the unsafe ones. The latent space is constructed from a novel view of projection-based set representations, e.g., zonotopes, which are commonly used in reachability analysis of neural networks. A projection-based set representation is a “shadow” of a higher-dimensional set – a latent space – that does not change during a set propagation through a neural network. Hence, the input set and the output enclosure are “shadows” of the same latent space that we can use to transfer constraints. We present an efficient verification tool for neural networks that uses our iterative refinement to significantly reduce the number of subproblems in a branch-and-bound procedure. Using zonotopes as a set representation, unlike many other state-of-the-art approaches, our approach can be realized by only using matrix operations, which enables a significant speed-up through efficient GPU acceleration. We demonstrate that our tool achieves competitive performance, which would place it among the top-ranking tools of the last neural network verification competition (VNN-COMP’24).
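The claim that zonotope propagation needs only matrix operations can be illustrated for a single affine layer. The sketch below represents a zonotope as a center plus a generator matrix, i.e. the set of points `c + G @ beta` with every entry of `beta` in [-1, 1]. The function names are illustrative and nonlinear activations are deliberately left out, so this is not the verification tool described in the abstract.

```python
def matvec(W, x):
    """Matrix-vector product with plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix-matrix product with plain lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def affine_map(W, b, center, generators):
    """Exact image of a zonotope under x -> W x + b.

    generators is an n x p matrix whose columns are the generators.
    """
    new_center = [s + bi for s, bi in zip(matvec(W, center), b)]
    new_generators = matmul(W, generators)
    return new_center, new_generators

def interval_bounds(center, generators):
    """Tightest axis-aligned box enclosing the zonotope."""
    radius = [sum(abs(g) for g in row) for row in generators]
    lower = [c - r for c, r in zip(center, radius)]
    upper = [c + r for c, r in zip(center, radius)]
    return lower, upper
```

Note that the coefficients `beta` never change under `affine_map`; only the center and generators do. This matches the abstract's picture of the input set and the output enclosure as two "shadows" of one shared higher-dimensional set.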
Article 987
Title@2025-05-27 (2): Efficient Large Language Model Inference with Neural Block Linearization
Title: Efficient Large Language Model Inference with Neural Block Linearization | Effiziente großsprachige Modellinferenz mit neuraler Blocklinearisierung | 高效大语言模型与神经区块线性线性结合的推断 2505.21077v1 |
Authors: Mete Erdogan, Francesco Tonin, Volkan Cevher
The high inference demands of transformer-based Large Language Models (LLMs) pose substantial challenges in their deployment. To this end, we introduce Neural Block Linearization (NBL), a novel framework for accelerating transformer model inference by replacing self-attention layers with linear approximations derived from Linear Minimum Mean Squared Error estimators. NBL leverages Canonical Correlation Analysis to compute a theoretical upper bound on the approximation error. Then, we use this bound as a criterion for substitution, selecting the LLM layers with the lowest linearization error. NBL can be efficiently applied to pre-trained LLMs without the need for fine-tuning. In experiments, NBL achieves notable computational speed-ups while preserving competitive accuracy on multiple reasoning benchmarks. For instance, applying NBL to 12 self-attention layers in DeepSeek-R1-Distill-Llama-8B increases the inference speed by 32% with less than 1% accuracy trade-off, making it a flexible and promising solution to improve the inference efficiency of LLMs.
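A Linear Minimum Mean Squared Error (LMMSE) estimator is the best affine predictor of an output from an input, computed from their means and covariances. The sketch below covers only the scalar case, where the slope is Cov(x, y) / Var(x). In the vector case the slope becomes the matrix Σ_YX Σ_XX⁻¹. The function names are illustrative, and the sketch does not reproduce how NBL collects layer inputs and outputs or selects layers.

```python
def lmmse_fit(xs, ys):
    """Fit the scalar LMMSE estimator y_hat = slope * x + intercept.

    xs, ys: paired samples of a block's input and output (toy scalars here).
    Requires xs to have non-zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def lmmse_predict(slope, intercept, x):
    """Apply the fitted linear replacement to a new input."""
    return slope * x + intercept
```

When the sampled relation is exactly linear, the fit recovers it with zero error; the more nonlinear the block, the larger the residual error of the linear replacement.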
Article 988
Title@2025-05-27 (2): Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling
Title: Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling | Red-Teaming Text-to-Image-Systeme durch regelbasiertes Preference-Modelling | 通过基于规则的首选模式建立红色团队式文本到图像系统 2505.21074v1 |
Authors: Yichuan Cao, Yibo Miao, Xiao-Shan Gao, Yinpeng Dong
Text-to-image (T2I) models raise ethical and safety concerns due to their potential to generate inappropriate or harmful images. Evaluating these models’ security through red-teaming is vital, yet white-box approaches are limited by their need for internal access, complicating their use with closed-source models. Moreover, existing black-box methods often assume knowledge about the model’s specific defense mechanisms, limiting their utility in real-world commercial API scenarios. A significant challenge is how to evade unknown and diverse defense mechanisms. To overcome this difficulty, we propose a novel Rule-based Preference modeling Guided Red-Teaming (RPG-RT), which iteratively employs LLM to modify prompts to query and leverages feedback from T2I systems for fine-tuning the LLM. RPG-RT treats the feedback from each iteration as a prior, enabling the LLM to dynamically adapt to unknown defense mechanisms. Given that the feedback is often labeled and coarse-grained, making it difficult to utilize directly, we further propose rule-based preference modeling, which employs a set of rules to evaluate desired or undesired feedback, facilitating finer-grained control over the LLM’s dynamic adaptation process. Extensive experiments on nineteen T2I systems with varied safety mechanisms, three online commercial API services, and T2V models verify the superiority and practicality of our approach.
Article 989
Title@2025-05-27 (2): A domain adaptation neural network for digital twin-supported fault diagnosis
Title: A domain adaptation neural network for digital twin-supported fault diagnosis | Ein neuronales Netzwerk für die Domänenanpassung für die digitale Doppel-unterstützte Fehlerdiagnose | 数字双支持缺陷诊断领域适应性神经神经网络 2505.21046v1 |
Authors: Zhenling Chen, Haiwei Fu, Zhiguo Zeng
Digital twins offer a promising solution to the lack of sufficient labeled data in deep learning-based fault diagnosis by generating simulated data for model training. However, discrepancies between simulation and real-world systems can lead to a significant drop in performance when models are applied in real scenarios. To address this issue, we propose a fault diagnosis framework based on Domain-Adversarial Neural Networks (DANN), which enables knowledge transfer from simulated (source domain) to real-world (target domain) data. We evaluate the proposed framework using a publicly available robotics fault diagnosis dataset, which includes 3,600 sequences generated by a digital twin model and 90 real sequences collected from physical systems. The DANN method is compared with commonly used lightweight deep learning models such as CNN, TCN, Transformer, and LSTM. Experimental results show that incorporating domain adaptation significantly improves the diagnostic performance. For example, applying DANN to a baseline CNN model improves its accuracy from 70.00% to 80.22% on real-world test data, demonstrating the effectiveness of domain adaptation in bridging the sim-to-real gap.
Article 990
Title@2025-05-27 (2): Scalable and adaptive prediction bands with kernel sum-of-squares
Title: Scalable and adaptive prediction bands with kernel sum-of-squares | Skalierbare und adaptive Vorhersagebänder mit Kernel-Summe von Quadraten | 可缩放和适应性预测带带内核和平方总和的可缩放和适应性预测波段 2505.21039v1 |
Authors: Louis Allain, Sébastien da Veiga, Brian Staber
Conformal Prediction (CP) is a popular framework for constructing prediction bands with valid coverage in finite samples, while being free of any distributional assumption. A well-known limitation of conformal prediction is the lack of adaptivity, although several works introduced practically efficient alternate procedures. In this work, we build upon recent ideas that rely on recasting the CP problem as a statistical learning problem, directly targeting coverage and adaptivity. This statistical learning problem is based on reproducing kernel Hilbert spaces (RKHS) and kernel sum-of-squares (SoS) methods. First, we extend previous results with a general representer theorem and exhibit the dual formulation of the learning problem. Crucially, such dual formulation can be solved efficiently by accelerated gradient methods with several hundreds or thousands of samples, unlike previous strategies based on off-the-shelf semidefinite programming algorithms. Second, we introduce a new hyperparameter tuning strategy tailored specifically to target adaptivity through bounds on test-conditional coverage. This strategy, based on the Hilbert-Schmidt Independence Criterion (HSIC), is introduced here to tune kernel lengthscales in our framework, but has broader applicability since it could be used in any CP algorithm where the score function is learned. Finally, extensive experiments are conducted to show how our method compares to related work. All figures can be reproduced with the accompanying code.
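As background for the coverage guarantee mentioned above, here is a minimal sketch of the standard split conformal baseline with absolute-residual scores. It produces bands of constant width, which is exactly the lack of adaptivity the abstract refers to. It is not the kernel sum-of-squares method of the paper, and the function names are illustrative.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected quantile of calibration scores.

    scores: nonconformity scores |y_i - y_hat_i| on a held-out calibration set.
    alpha: target miscoverage level, e.g. 0.1 for 90% coverage.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this level.
        return math.inf
    return sorted(scores)[k - 1]

def prediction_interval(y_hat, q):
    """Constant-width band around a point prediction."""
    return (y_hat - q, y_hat + q)
```

For example, with nine calibration scores and `alpha = 0.5`, the rank is `ceil(10 * 0.5) = 5`, so the fifth smallest score becomes the half-width of every interval.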
Article 991
Title@2025-05-27 (2): Unraveling Indirect In-Context Learning Using Influence Functions
Title: Unraveling Indirect In-Context Learning Using Influence Functions | Indirektes In-Context-Lernen mit Einflussfunktionen entschlüsseln | 利用影响功能进行分散的间接间接内文学习 2501.01473v2 |
Authors: Hadi Askari, Shivanshu Gupta, Terry Tong, Fei Wang, Anshuman Chhabra, Muhao Chen
In this work, we introduce a novel paradigm for generalized In-Context Learning (ICL), termed Indirect In-Context Learning. In Indirect ICL, we explore demonstration selection strategies tailored for two distinct real-world scenarios: Mixture of Tasks and Noisy ICL. We systematically evaluate the effectiveness of Influence Functions (IFs) as a selection tool for these settings, highlighting the potential of IFs to better capture the informativeness of examples within the demonstration pool. For the Mixture of Tasks setting, demonstrations are drawn from 28 diverse tasks, including MMLU, BigBench, StrategyQA, and CommonsenseQA. We demonstrate that combining BertScore-Recall (BSR) with an IF surrogate model can further improve performance, leading to average absolute accuracy gains of 0.37% and 1.45% for 3-shot and 5-shot setups when compared to traditional ICL metrics. In the Noisy ICL setting, we examine scenarios where demonstrations might be mislabeled or have adversarial noise. Our experiments show that reweighting traditional ICL selectors (BSR and Cosine Similarity) with IF-based selectors boosts accuracy by an average of 2.90% for Cosine Similarity and 2.94% for BSR on noisy GLUE benchmarks. For the adversarial sub-setting, we show the utility of using IFs for task-agnostic demonstration selection for backdoor attack mitigation, showing a 32.89% reduction in Attack Success Rate compared to task-aware methods. In sum, we propose a robust framework for demonstration selection that generalizes beyond traditional ICL, offering valuable insights into the role of IFs for Indirect ICL.
Article 992
Title@2025-05-27 (2): CellCLAT: Preserving Topology and Trimming Redundancy in Self-Supervised Cellular Contrastive Learning
Title: CellCLAT: Preserving Topology and Trimming Redundancy in Self-Supervised Cellular Contrastive Learning | CellCLAT: Topologie und Trimming Redundanz im selbstüberwachten zellulären Kontrastiven Lernen erhalten | CellCLAT: 在自我维持的细胞抵触学习中保留地形学和三角再利用 2505.21587v1 |
Authors: Bin Qin, Qirui Ji, Jiangmeng Li, Yupeng Wang, Xuesong Wu, Jianwen Cao, Fanjiang Xu
Self-supervised topological deep learning (TDL) represents a nascent but underexplored area with significant potential for modeling higher-order interactions in simplicial complexes and cellular complexes to derive representations of unlabeled graphs. Compared to simplicial complexes, cellular complexes exhibit greater expressive power. However, the advancement in self-supervised learning for cellular TDL is largely hindered by two core challenges: extrinsic structural constraints inherent to cellular complexes, and intrinsic semantic redundancy in cellular representations. The first challenge highlights that traditional graph augmentation techniques may compromise the integrity of higher-order cellular interactions, while the second underscores that topological redundancy in cellular complexes potentially diminishes task-relevant information. To address these issues, we introduce Cellular Complex Contrastive Learning with Adaptive Trimming (CellCLAT), a twofold framework designed to adhere to the combinatorial constraints of cellular complexes while mitigating informational redundancy. Specifically, we propose a parameter perturbation-based augmentation method that injects controlled noise into cellular interactions without altering the underlying cellular structures, thereby preserving cellular topology during contrastive learning. Additionally, a cellular trimming scheduler is employed to mask gradient contributions from task-irrelevant cells through a bi-level meta-learning approach, effectively removing redundant topological elements while maintaining critical higher-order semantics. We provide theoretical justification and empirical validation to demonstrate that CellCLAT achieves substantial improvements over existing self-supervised graph learning methods, marking a significant attempt in this domain.
Article 993
Title@2025-05-27 (2): Directed Semi-Simplicial Learning with Applications to Brain Activity Decoding
Title: Directed Semi-Simplicial Learning with Applications to Brain Activity Decoding | Direktes Semi-Simplizielles Lernen mit Anwendungen zur Entschlüsselung der Gehirnaktivität | 定向半简化学习,应用脑活动解码 2505.17939v2 |
Authors: Manuel Lecha, Andrea Cavallo, Francesca Dominici, Ran Levi, Alessio Del Bue, Elvin Isufi, Pietro Morerio, Claudio Battiloro
Graph Neural Networks (GNNs) excel at learning from pairwise interactions but often overlook multi-way and hierarchical relationships. Topological Deep Learning (TDL) addresses this limitation by leveraging combinatorial topological spaces. However, existing TDL models are restricted to undirected settings and fail to capture the higher-order directed patterns prevalent in many complex systems, e.g., brain networks, where such interactions are both abundant and functionally significant. To fill this gap, we introduce Semi-Simplicial Neural Networks (SSNs), a principled class of TDL models that operate on semi-simplicial sets – combinatorial structures that encode directed higher-order motifs and their directional relationships. To enhance scalability, we propose Routing-SSNs, which dynamically select the most informative relations in a learnable manner. We prove that SSNs are strictly more expressive than standard graph and TDL models. We then introduce a new principled framework for brain dynamics representation learning, grounded in the ability of SSNs to provably recover topological descriptors shown to successfully characterize brain activity. Empirically, SSNs achieve state-of-the-art performance on brain dynamics classification tasks, outperforming the second-best model by up to 27%, and message passing GNNs by up to 50% in accuracy. Our results highlight the potential of principled topological models for learning from structured brain data, establishing a unique real-world case study for TDL. We also test SSNs on standard node classification and edge regression tasks, showing competitive performance. We will make the code and data publicly available.
Article 994
Title@2025-05-27 (2): LLaMEA-BO: A Large Language Model Evolutionary Algorithm for Automatically Generating Bayesian Optimization Algorithms
Title: LLaMEA-BO: A Large Language Model Evolutionary Algorithm for Automatically Generating Bayesian Optimization Algorithms | LLaMEA-BO: Ein evolutionärer Algorithmus für die automatische Generierung Bayesischer Optimierungsalgorithmen | LLAMEA-BO:用于自动生成贝耶斯优化优化生成的大型语言模型进化演化算法 2505.21034v1 |
Authors: Wenhu Li, Niki van Stein, Thomas Bäck, Elena Raponi
Bayesian optimization (BO) is a powerful class of algorithms for optimizing expensive black-box functions, but designing effective BO algorithms remains a manual, expertise-driven task. Recent advancements in Large Language Models (LLMs) have opened new avenues for automating scientific discovery, including the automatic design of optimization algorithms. While prior work has used LLMs within optimization loops or to generate non-BO algorithms, we tackle a new challenge: using LLMs to automatically generate full BO algorithm code. Our framework uses an evolution strategy to guide an LLM in generating Python code that preserves the key components of BO algorithms: an initial design, a surrogate model, and an acquisition function. The LLM is prompted to produce multiple candidate algorithms, which are evaluated on the established Black-Box Optimization Benchmarking (BBOB) test suite from the COmparing Continuous Optimizers (COCO) platform. Based on their performance, top candidates are selected, combined, and mutated via controlled prompt variations, enabling iterative refinement. Despite no additional fine-tuning, the LLM-generated algorithms outperform state-of-the-art BO baselines on 19 of the 24 BBOB functions in dimension 5 and generalize well to higher dimensions and to different tasks (from the Bayesmark framework). This work demonstrates that LLMs can serve as algorithmic co-designers, offering a new paradigm for automating BO development and accelerating the discovery of novel algorithmic combinations. The source code is provided at https://github.com/Ewendawi/LLaMEA-BO.
Article 995
Title@2025-05-27 (2): Optimizing Case-Based Reasoning System for Functional Test Script Generation with Large Language Models
Title: Optimizing Case-Based Reasoning System for Functional Test Script Generation with Large Language Models | Optimierung des Case-Based-Reasoning-Systems für die Generierung funktionaler Testskripte mit großen Sprachmodellen | 为具有大语言模型的功能测试脚本生成优化基于个案的理由说明系统 2503.20576v3 |
Authors: Siyuan Guo, Huiwu Liu, Xiaolong Chen, Yuming Xie, Liang Zhang, Tao Han, Hechang Chen, Yi Chang, Jun Wang
In this work, we explore the potential of large language models (LLMs) for generating functional test scripts, which necessitates understanding the dynamically evolving code structure of the target software. To achieve this, we propose a case-based reasoning (CBR) system utilizing a 4R cycle (i.e., retrieve, reuse, revise, and retain), which maintains and leverages a case bank of test intent descriptions and corresponding test scripts to facilitate LLMs for test script generation. To improve user experience further, we introduce Re4, an optimization method for the CBR system, comprising reranking-based retrieval finetuning and reinforced reuse finetuning. Specifically, we first identify positive examples with high semantic and script similarity, providing reliable pseudo-labels for finetuning the retriever model without costly labeling. Then, we apply supervised finetuning, followed by a reinforcement learning finetuning stage, to align LLMs with our production scenarios, ensuring the faithful reuse of retrieved cases. Extensive experimental results on two product development units from Huawei Datacom demonstrate the superiority of the proposed CBR+Re4. Notably, we also show that the proposed Re4 method can help alleviate the repetitive generation issues with LLMs.
Article 996
Title@2025-05-27 (2): Generalizable and Robust Spectral Method for Multi-view Representation Learning
Title: Generalizable and Robust Spectral Method for Multi-view Representation Learning | Verallgemeinerbare und robuste Spektralmethode für Multi-View Representative Learning | 多视角代表制学习通用和强力光谱方法 2411.02138v3 |
Authors: Amitai Yacobi, Ofir Lindenbaum, Uri Shaham
Multi-view representation learning (MvRL) has garnered substantial attention in recent years, driven by the increasing demand for applications that can effectively process and analyze data from multiple sources. In this context, graph Laplacian-based MvRL methods have demonstrated remarkable success in representing multi-view data. However, these methods often struggle with generalization to new data and face challenges with scalability. Moreover, in many practical scenarios, multi-view data is contaminated by noise or outliers. In such cases, modern deep-learning-based MvRL approaches that rely on alignment or contrastive objectives present degraded performance in downstream tasks, as they may impose incorrect consistency between clear and corrupted data sources. We introduce $\textit{SpecRaGE}$, a novel fusion-based framework that integrates the strengths of graph Laplacian methods with the power of deep learning to overcome these challenges. SpecRaGE uses neural networks to learn a parametric mapping that approximates a joint diagonalization of graph Laplacians. This solution bypasses the need for alignment while enabling generalizable and scalable learning of informative and meaningful representations. Moreover, it incorporates a meta-learning fusion module that dynamically adapts to data quality, ensuring robustness against outliers and noisy views. Our extensive experiments demonstrate that SpecRaGE outperforms state-of-the-art methods, particularly in scenarios with data contamination, paving the way for more reliable and efficient multi-view learning.
Article 997
Title@2025-05-27 (2): FeatInv: Spatially resolved mapping from feature space to input space using conditional diffusion models
Title: FeatInv: Spatially resolved mapping from feature space to input space using conditional diffusion models | FeatInv: Räumlich aufgelöstes Mapping vom Feature Space zum Input Space mit bedingten Diffusionsmodellen | FeatInv:使用有条件扩散模型从地物空间到输入空间的空间空间的空间分辨率绘图 2505.21032v1 |
Authors: Nils Neukirch, Johanna Vielhaben, Nils Strodthoff
Internal representations are crucial for understanding deep neural networks, such as their properties and reasoning patterns, but remain difficult to interpret. While mapping from feature space to input space aids in interpreting the former, existing approaches often rely on crude approximations. We propose using a conditional diffusion model - a pretrained high-fidelity diffusion model conditioned on spatially resolved feature maps - to learn such a mapping in a probabilistic manner. We demonstrate the feasibility of this approach across various pretrained image classifiers from CNNs to ViTs, showing excellent reconstruction capabilities. Through qualitative comparisons and robustness analysis, we validate our method and showcase possible applications, such as the visualization of concept steering in input space or investigations of the composite nature of the feature space. This approach has broad potential for improving feature space understanding in computer vision models.
Article 998
Title@2025-05-27 (2): TabAttackBench: A Benchmark for Adversarial Attacks on Tabular Data
Title: TabAttackBench: A Benchmark for Adversarial Attacks on Tabular Data | TabAttackBench: Ein Benchmark für feindliche Angriffe auf Tabellendaten | TabAttack Bench: 表格数据对抗性攻击基准 2505.21027v1 |
Authors: Zhipeng He, Chun Ouyang, Lijie Wen, Cong Liu, Catarina Moreira
Adversarial attacks pose a significant threat to machine learning models by inducing incorrect predictions through imperceptible perturbations to input data. While these attacks have been extensively studied in unstructured data like images, their application to tabular data presents new challenges. These challenges arise from the inherent heterogeneity and complex feature interdependencies in tabular data, which differ significantly from those in image data. To address these differences, it is crucial to consider imperceptibility as a key criterion specific to tabular data. Most current research focuses primarily on achieving effective adversarial attacks, often overlooking the importance of maintaining imperceptibility. To address this gap, we propose a new benchmark for adversarial attacks on tabular data that evaluates both effectiveness and imperceptibility. In this study, we assess the effectiveness and imperceptibility of five adversarial attacks across four models using eleven tabular datasets, including both mixed and numerical-only datasets. Our analysis explores how these factors interact and influence the overall performance of the attacks. We also compare the results across different dataset types to understand the broader implications of these findings. The findings from this benchmark provide valuable insights for improving the design of adversarial attack algorithms, thereby advancing the field of adversarial machine learning on tabular data.
Article 999
Title@2025-05-27 (2): PaSa: An LLM Agent for Comprehensive Academic Paper Search
Title: PaSa: An LLM Agent for Comprehensive Academic Paper Search | PaSa: Ein LLM-Agent für umfassende wissenschaftliche Papiersuche | Pasa: 法学硕士全面学术论文搜索代理 2501.10120v2 |
Authors: Yichen He, Guanhua Huang, Peiyuan Feng, Yuan Lin, Yuchen Zhang, Hang Li, Weinan E
We introduce PaSa, an advanced Paper Search agent powered by large language models. PaSa can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant references, to ultimately obtain comprehensive and accurate results for complex scholar queries. We optimize PaSa using reinforcement learning with a synthetic dataset, AutoScholarQuery, which includes 35k fine-grained academic queries and corresponding papers sourced from top-tier AI conference publications. Additionally, we develop RealScholarQuery, a benchmark collecting real-world academic queries to assess PaSa performance in more realistic scenarios. Despite being trained on synthetic data, PaSa significantly outperforms existing baselines on RealScholarQuery, including Google, Google Scholar, Google with GPT-4o for paraphrased queries, ChatGPT (search-enabled GPT-4o), GPT-o1, and PaSa-GPT-4o (PaSa implemented by prompting GPT-4o). Notably, PaSa-7B surpasses the best Google-based baseline, Google with GPT-4o, by 37.78% in recall@20 and 39.90% in recall@50, and exceeds PaSa-GPT-4o by 30.36% in recall and 4.25% in precision. Model, datasets, and code are available at https://github.com/bytedance/pasa.
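The recall@k and precision figures quoted in the abstract above follow the usual retrieval definitions, which can be sketched as below. This is a generic illustration; the paper identifiers are hypothetical and the paper's exact evaluation protocol is not reproduced here.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant papers that appear in the top-k ranked results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Fraction of the retrieved papers that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical ranked search output and ground-truth relevant set.
ranked_results = ["paper_a", "paper_b", "paper_c", "paper_d"]
relevant_papers = {"paper_b", "paper_d", "paper_e"}

r_at_2 = recall_at_k(ranked_results, relevant_papers, 2)  # 1 of 3 relevant found
r_at_4 = recall_at_k(ranked_results, relevant_papers, 4)  # 2 of 3 relevant found
prec = precision(ranked_results, relevant_papers)         # 2 of 4 retrieved relevant
```

Recall rewards finding more of the relevant set as k grows, while precision penalises padding the result list with irrelevant papers, which is why the abstract reports both.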
Article 1000
Title@2025-05-27 (2): Multi-Mode Process Control Using Multi-Task Inverse Reinforcement Learning
Title: Multi-Mode Process Control Using Multi-Task Inverse Reinforcement Learning | Multi-Mode-Prozesssteuerung mit Multi-Task Inverse Verstärkungslernen | 利用多任务反向强化学习进行多模式程序控制 2505.21026v1 |
Authors: Runze Lin, Junghui Chen, Biao Huang, Lei Xie, Hongye Su
In the era of Industry 4.0 and smart manufacturing, process systems engineering must adapt to digital transformation. While reinforcement learning offers a model-free approach to process control, its applications are limited by the dependence on accurate digital twins and well-designed reward functions. To address these limitations, this paper introduces a novel framework that integrates inverse reinforcement learning (IRL) with multi-task learning for data-driven, multi-mode control design. Using historical closed-loop data as expert demonstrations, IRL extracts optimal reward functions and control policies. A latent-context variable is incorporated to distinguish modes, enabling the training of mode-specific controllers. Case studies on a continuous stirred tank reactor and a fed-batch bioreactor validate the effectiveness of this framework in handling multi-mode data and training adaptable controllers.
Article 1001
Title@2025-05-27 (2): Text-Queried Audio Source Separation via Hierarchical Modeling
Title: Text-Queried Audio Source Separation via Hierarchical Modeling | Textbefragte Audioquelle Trennung über Hierarchische Modellierung | 通过等级制建模模式对文本查询的音频源分离 2505.21025v1 |
Authors: Xinlei Yin, Xiulian Peng, Xue Jiang, Zhiwei Xiong, Yan Lu
Target audio source separation with natural language queries presents a promising paradigm for extracting arbitrary audio events through arbitrary text descriptions. Existing methods mainly face two challenges: the difficulty in jointly modeling acoustic-textual alignment and semantic-aware separation within a blindly-learned single-stage architecture, and the reliance on large-scale accurately-labeled training data to compensate for inefficient cross-modal learning and separation. To address these challenges, we propose a hierarchical decomposition framework, HSM-TSS, that decouples the task into global-local semantic-guided feature separation and structure-preserving acoustic reconstruction. Our approach introduces a dual-stage mechanism for semantic separation, operating on distinct global and local semantic feature spaces. We first perform global-semantic separation through a global semantic feature space aligned with text queries. A Q-Audio architecture is employed to align audio and text modalities, serving as pretrained global-semantic encoders. Conditioned on the predicted global feature, we then perform the second-stage local-semantic separation on AudioMAE features that preserve time-frequency structures, followed by acoustic reconstruction. We also propose an instruction processing pipeline to parse arbitrary text queries into structured operations (extraction or removal) coupled with audio descriptions, enabling flexible sound manipulation. Our method achieves state-of-the-art separation performance with data-efficient training while maintaining superior semantic consistency with queries in complex auditory scenes.
Article 1002
Title@2025-05-27 (2): Pause Tokens Strictly Increase the Expressivity of Constant-Depth Transformers
Title: Pause Tokens Strictly Increase the Expressivity of Constant-Depth Transformers | Pause Tokens erhöhen streng die Expressivität der konstant-tiefen Transformer | 严格提高常数面变换器的表达性 2505.21024v1 |
Authors: Charles London, Varun Kanade
Pause tokens, simple filler symbols such as “…”, consistently improve Transformer performance on both language and mathematical tasks, yet their theoretical effect remains unexplained. We provide the first formal separation result, proving that adding pause tokens to constant-depth, logarithmic-width Transformers strictly increases their computational expressivity. With bounded-precision activations, Transformers without pause tokens compute only a strict subset of $\mathsf{AC}^0$ functions, while adding a polynomial number of pause tokens allows them to express the entire class. For logarithmic-precision Transformers, we show that adding pause tokens achieves expressivity equivalent to $\mathsf{TC}^0$, matching known upper bounds. Empirically, we demonstrate that two-layer causally masked Transformers can learn parity when supplied with pause tokens, a function that they appear unable to learn without them. Our results provide a rigorous theoretical explanation for prior empirical findings, clarify how pause tokens interact with width, depth, and numeric precision, and position them as a distinct mechanism, complementary to chain-of-thought prompting, for enhancing Transformer reasoning.
Article 1003
Title@2025-05-27 (2): NeuralOM: Neural Ocean Model for Subseasonal-to-Seasonal Simulation
Title: NeuralOM: Neural Ocean Model for Subseasonal-to-Seasonal Simulation | NeuralOM: Neurales Ozeanmodell für die Simulation von Subsaisonal-zu-Seasonal | 神经力OM:次季节到季节模拟神经海洋模型 2505.21020v1 |
Authors: Yuan Gao, Ruiqi Shu, Hao Wu, Fan Xu, Yanfei Xiang, Ruijian Gou, Qingsong Wen, Xian Wu, Xiaomeng Huang
Accurate Subseasonal-to-Seasonal (S2S) ocean simulation is critically important for marine research, yet remains challenging due to the ocean's substantial thermal inertia and extended time delay. Machine learning (ML)-based models have demonstrated significant advancements in simulation accuracy and computational efficiency compared to traditional numerical methods. Nevertheless, a significant limitation of current ML models for S2S ocean simulation is their inadequate incorporation of physical consistency and the slow-changing properties of the ocean system. In this work, we propose a neural ocean model (NeuralOM) for S2S ocean simulation with a multi-scale interactive graph neural network to emulate diverse physical phenomena associated with ocean systems effectively. Specifically, we propose a multi-stage framework tailored to model the ocean’s slowly changing nature. Additionally, we introduce a multi-scale interactive messaging module to capture complex dynamical behaviors, such as gradient changes and multiplicative coupling relationships inherent in ocean dynamics. Extensive experimental evaluations confirm that our proposed NeuralOM outperforms state-of-the-art models in S2S and extreme event simulation. The code is available at https://github.com/YuanGao-YG/NeuralOM.
Article 1004
Title@2025-05-27 (2): Cardiac Digital Twins at Scale from MRI: Open Tools and Representative Models from ~55000 UK Biobank Participants
Title: Cardiac Digital Twins at Scale from MRI: Open Tools and Representative Models from ~55000 UK Biobank Participants | Cardiac Digital Twins auf Scale von MRI: Offene Werkzeuge und repräsentative Modelle von ~55000 britischen Biobank-Teilnehmern | 来自MRI的大规模心脏病数字双对:来自~55000英国生物库参与者的开放工具和代表模型 2505.21019v1 |
Authors: Devran Ugurlu, Shuang Qian, Elliot Fairweather, Charlene Mauger, Bram Ruijsink, Laura Dal Toso, Yu Deng, Marina Strocchi, Reza Razavi, Alistair Young, Pablo Lamata, Steven Niederer, Martin Bishop
A cardiac digital twin is a virtual replica of a patient’s heart for screening, diagnosis, prognosis, risk assessment, and treatment planning of cardiovascular diseases. This requires an anatomically accurate patient-specific 3D structural representation of the heart, suitable for electro-mechanical simulations or study of disease mechanisms. However, generation of cardiac digital twins at scale is demanding and there are no public repositories of models across demographic groups. We describe an automatic open-source pipeline for creating patient-specific left and right ventricular meshes from cardiovascular magnetic resonance images, its application to a large cohort of ~55000 participants from UK Biobank, and the construction of the most comprehensive cohort of adult heart models to date, comprising 1423 representative meshes across sex (male, female), body mass index (range: 16 - 42 kg/m$^2$) and age (range: 49 - 80 years). Our code is available at https://github.com/cdttk/biv-volumetric-meshing/tree/plos2025 , and pre-trained networks, representative volumetric meshes with fibers and UVCs will be made available soon.
Article 1005
Title@2025-05-27 (2): Federated Instrumental Variable Analysis via Federated Generalized Method of Moments
Title: Federated Instrumental Variable Analysis via Federated Generalized Method of Moments | Federated Instrumental Variable Analysis via Federated Generalized Method of Moments | 通过联邦通用时数方法进行的联邦仪器变量分析 2505.21012v1 |
Authors: Geetika, Somya Tyagi, Bapi Chatterjee
Instrumental variables (IV) analysis is an important applied tool for areas such as healthcare and consumer economics. For IV analysis in high-dimensional settings, the Generalized Method of Moments (GMM) using deep neural networks offers an efficient approach. With non-i.i.d. data sourced from scattered decentralized clients, federated learning is a popular paradigm for training the models while promising data privacy. However, to our knowledge, no federated algorithm for either GMM or IV analysis exists to date. In this work, we introduce federated instrumental variables analysis (FedIV) via the federated generalized method of moments (FedGMM). We formulate FedGMM as a federated zero-sum game defined by a federated non-convex non-concave minimax optimization problem, which is solved using the federated gradient descent ascent (FedGDA) algorithm. One key challenge arises in theoretically characterizing the federated local optimality. To address this, we present properties and existence results of clients’ local equilibria via FedGDA limit points. Thereby, we show that the federated solution consistently estimates the local moment conditions of every participating client. The proposed algorithm is backed by extensive experiments to demonstrate the efficacy of our approach.
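To make the minimax structure in the abstract above concrete, the following is a toy sketch of gradient descent ascent with local client steps and server averaging. It is an assumption-laden illustration, not the paper's FedGDA: the client objectives are hypothetical convex-concave quadratics (the paper treats the non-convex non-concave case), and the aggregation rule is plain averaging.

```python
def local_gda(x, y, a, b, lr, steps):
    """Simultaneous gradient descent on x and ascent on y for one client.

    Hypothetical client objective: f(x, y) = 0.5*a*x**2 + b*x*y - 0.5*a*y**2,
    which has its saddle point at (0, 0).
    """
    for _ in range(steps):
        grad_x = a * x + b * y
        grad_y = b * x - a * y
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y


def fed_gda(clients, x=1.0, y=1.0, lr=0.05, local_steps=5, rounds=100):
    """Each round: every client runs local GDA from the shared iterate,
    then the server averages the returned (x, y) pairs."""
    for _ in range(rounds):
        updates = [local_gda(x, y, a, b, lr, local_steps) for a, b in clients]
        x = sum(u[0] for u in updates) / len(updates)
        y = sum(u[1] for u in updates) / len(updates)
    return x, y


# Three hypothetical clients with different curvature (a) and coupling (b).
x_final, y_final = fed_gda([(1.0, 0.5), (2.0, 1.5), (0.5, 1.0)])
```

Because all toy clients share the same saddle point, the averaged iterate contracts toward (0, 0). With heterogeneous equilibria, local steps would pull clients apart, which is the regime where the local-equilibrium analysis mentioned in the abstract becomes relevant.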
Article 1006
Title@2025-05-27 (2): Unified Alignment Protocol: Making Sense of the Unlabeled Data in New Domains
Title: Unified Alignment Protocol: Making Sense of the Unlabeled Data in New Domains | Unified Alignment Protocol: Sense der unmarkierten Daten in neuen Domains | 统一对齐协议: 在新域域中感知无标签数据 2505.21010v1 |
Authors: Sabbir Ahmed, Mamshad Nayeem Rizve, Abdullah Al Arafat, Jacqueline Liu, Rahim Hossain, Mohaiminul Al Nahian, Adnan Siraj Rakin
Semi-Supervised Federated Learning (SSFL) is gaining popularity over conventional Federated Learning in many real-world applications. Because labeled data is scarce on the client side, SSFL assumes that participating clients train with unlabeled data and that only the central server has the resources to access limited labeled data, making it an ideal fit for real-world applications (e.g., healthcare). However, traditional SSFL assumes that the data distributions in the training and testing phases are the same. In practice, domain shifts frequently occur, making it essential for SSFL to incorporate generalization capabilities to remain practical. The core challenge is improving model generalization to new, unseen domains while the clients participate in SSFL. The decentralized setup of SSFL and unsupervised client training necessitate innovation to achieve improved generalization across domains. To achieve this, we propose a novel framework called the Unified Alignment Protocol (UAP), which consists of an alternating two-stage training process. The first stage involves training the server model to learn and align the features with a parametric distribution, which is subsequently communicated to clients without additional communication overhead. The second stage proposes a novel training algorithm that utilizes the server feature distribution to align client features accordingly. Our extensive experiments on standard domain generalization benchmark datasets across multiple model architectures reveal that the proposed UAP achieves SOTA generalization performance in the SSFL setting.
Article 1007
Title@2025-05-27 (2): Transformers in Protein: A Survey
Title: Transformers in Protein: A Survey | Transformer in Protein: Eine Umfrage | 蛋白质变换器:调查 2505.20098v2 |
Authors: Xiaowen Ling, Zhiqiang Li, Yanbin Wang, Zhuhong You
As protein informatics advances rapidly, the demand for enhanced predictive accuracy, structural analysis, and functional understanding has intensified. Transformer models, as powerful deep learning architectures, have demonstrated unprecedented potential in addressing diverse challenges across protein research. However, a comprehensive review of Transformer applications in this field remains lacking. This paper bridges this gap by surveying over 100 studies, offering an in-depth analysis of practical implementations and research progress of Transformers in protein-related tasks. Our review systematically covers critical domains, including protein structure prediction, function prediction, protein-protein interaction analysis, functional annotation, and drug discovery/target identification. To contextualize these advancements across various protein domains, we adopt a domain-oriented classification system. We first introduce foundational concepts: the Transformer architecture and attention mechanisms, categorize Transformer variants tailored for protein science, and summarize essential protein knowledge. For each research domain, we outline its objectives and background, critically evaluate prior methods and their limitations, and highlight transformative contributions enabled by Transformer models. We also curate and summarize pivotal datasets and open-source code resources to facilitate reproducibility and benchmarking. Finally, we discuss persistent challenges in applying Transformers to protein informatics and propose future research directions. This review aims to provide a consolidated foundation for the synergistic integration of Transformer and protein informatics, fostering further innovation and expanded applications in the field.
Article 1008
Title@2025-05-27 (2): Fairness in Federated Learning: Fairness for Whom?
Title: Fairness in Federated Learning: Fairness for Whom? | Fairness im Federated Learning: Fairness für wen? | 联邦学习中的公平性:谁的公平性? 2505.21584v1 |
Authors: Afaf Taik, Khaoula Chehbouni, Golnoosh Farnadi
Fairness in federated learning has emerged as a rapidly growing area of research, with numerous works proposing formal definitions and algorithmic interventions. Yet, despite this technical progress, fairness in FL is often defined and evaluated in ways that abstract away from the sociotechnical contexts in which these systems are deployed. In this paper, we argue that existing approaches tend to optimize narrow system-level metrics, such as performance parity or contribution-based rewards, while overlooking how harms arise throughout the FL lifecycle and how they impact diverse stakeholders. We support this claim through a critical analysis of the literature, based on a systematic annotation of papers for their fairness definitions, design decisions, evaluation practices, and motivating use cases. Our analysis reveals five recurring pitfalls: 1) fairness framed solely through the lens of server-client architecture, 2) a mismatch between simulations and motivating use cases and contexts, 3) definitions that conflate protecting the system with protecting its users, 4) interventions that target isolated stages of the lifecycle while neglecting upstream and downstream effects, and 5) a lack of multi-stakeholder alignment where multiple fairness definitions can be relevant at once. Building on these insights, we propose a harm-centered framework that links fairness definitions to concrete risks and stakeholder vulnerabilities. We conclude with recommendations for more holistic, context-aware, and accountable fairness research in FL.
Article 1009
Title@2025-05-27 (2): Efficient and Unbiased Sampling from Boltzmann Distributions via Variance-Tuned Diffusion Models
Title: Efficient and Unbiased Sampling from Boltzmann Distributions via Variance-Tuned Diffusion Models | Effiziente und unvoreingenommene Probenahme von Boltzmann Distributionen über Variance-Tuned Diffusion Modelle | Boltzmann分销公司通过差异传播模型进行高效和无偏见的抽样 2505.21005v1 |
Authors: Fengzhe Zhang, Laurence I. Midgley, José Miguel Hernández-Lobato
Score-based diffusion models (SBDMs) are powerful amortized samplers for Boltzmann distributions; however, imperfect score estimates bias downstream Monte Carlo estimates. Classical importance sampling (IS) can correct this bias, but computing exact likelihoods requires solving the probability-flow ordinary differential equation (PF-ODE), a procedure that is prohibitively costly and scales poorly with dimensionality. We introduce Variance-Tuned Diffusion Importance Sampling (VT-DIS), a lightweight post-training method that adapts the per-step noise covariance of a pretrained SBDM by minimizing the $\alpha$-divergence ($\alpha=2$) between its forward diffusion and reverse denoising trajectories. VT-DIS assigns a single trajectory-wise importance weight to the joint forward-reverse process, yielding unbiased expectation estimates at test time with negligible overhead compared to standard sampling. On the DW-4, LJ-13, and alanine-dipeptide benchmarks, VT-DIS achieves effective sample sizes of approximately 80 %, 35 %, and 3.5 %, respectively, while using only a fraction of the computational budget required by vanilla diffusion + IS or PF-ODE-based IS.
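The effective-sample-size percentages in the abstract above are easier to read with the standard importance-sampling quantities in hand. The sketch below computes self-normalised importance weights and the Kish effective sample size as a fraction of the sample count. It assumes the paper's ESS follows this common definition, and it uses the self-normalised estimator (consistent, but not exactly unbiased at finite sample size), so it does not reproduce the VT-DIS trajectory-wise weighting.

```python
import math


def normalized_weights(log_target, log_proposal):
    """Self-normalised importance weights from (unnormalised) log densities."""
    log_w = [t - q for t, q in zip(log_target, log_proposal)]
    shift = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def ess_fraction(weights):
    """Kish effective sample size divided by N: 1 / (N * sum_i w_i**2)."""
    n = len(weights)
    return 1.0 / (n * sum(w * w for w in weights))


def is_estimate(values, weights):
    """Self-normalised importance-sampling estimate of an expectation."""
    return sum(v * w for v, w in zip(values, weights))


# Proposal matches the target up to a constant: every sample counts fully.
uniform = normalized_weights([-1.0, -1.0, -1.0, -1.0], [0.0, 0.0, 0.0, 0.0])
# One sample dominates: the effective sample size collapses to a single sample.
skewed = normalized_weights([0.0, -1000.0], [0.0, 0.0])
```

An ESS fraction of 1.0 means the sampler already matches the target; values such as the 3.5 % reported for alanine dipeptide mean that only a small share of the drawn samples effectively contributes to the reweighted estimate.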
Article 1010
Title@2025-05-27 (2): BIPNN: Learning to Solve Binary Integer Programming via Hypergraph Neural Networks
Title: BIPNN: Learning to Solve Binary Integer Programming via Hypergraph Neural Networks | BIPNN: Lernen, Binäre Integer-Programmierung über Hypergraph Neuronale Netzwerke zu lösen | BIPNN: 学习通过超光速神经网络解决二元整数编程 2505.20997v1 |
Authors: Sen Bai, Chunqi Yang, Xin Bai, Xin Zhang, Zhengang Jiang
Binary (0-1) integer programming (BIP) is pivotal in scientific domains requiring discrete decision-making. With the advance of AI computing, recent works explore neural network-based solvers for integer linear programming (ILP) problems. Yet, they lack scalability for tackling nonlinear challenges. To handle nonlinearities, state-of-the-art Branch-and-Cut solvers employ linear relaxations, leading to exponential growth in auxiliary variables and severe computation limitations. To overcome these limitations, we propose BIPNN (Binary Integer Programming Neural Network), an unsupervised learning framework to solve nonlinear BIP problems via hypergraph neural networks (HyperGNN). Specifically, BIPNN reformulates BIPs, which are constrained, discrete, and nonlinear (sin, log, exp) optimization problems, into unconstrained, differentiable, and polynomial loss functions. The reformulation stems from the observation of a precise one-to-one mapping between polynomial BIP objectives and hypergraph structures, enabling the unsupervised training of HyperGNN to optimize BIP problems in an end-to-end manner. On this basis, we propose a GPU-accelerated and continuous-annealing-enhanced training pipeline for BIPNN. The pipeline enables BIPNN to optimize large-scale nonlinear terms in BIPs fully in parallel via straightforward gradient descent, thus significantly reducing the training cost while ensuring the generation of discrete, high-quality solutions. Extensive experiments on synthetic and real-world datasets highlight the superiority of our approach.
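The mapping between polynomial objectives and hypergraphs mentioned in the abstract above can be illustrated with a small sketch: each monomial becomes one weighted hyperedge over the variables it multiplies, and relaxing each binary variable to a value in [0, 1] yields a differentiable loss. The data layout and function names are hypothetical and are not taken from the BIPNN code; each variable is assumed to appear at most once per monomial, since x**2 = x for binary x.

```python
from math import prod

# Objective over binary variables, one (coefficient, variable indices) per monomial:
#   3*x0*x1*x2 - 2*x1*x3 + x0
polynomial = [(3.0, (0, 1, 2)), (-2.0, (1, 3)), (1.0, (0,))]


def to_hypergraph(poly):
    """One weighted hyperedge per monomial, spanning the variables it multiplies."""
    return [{"nodes": frozenset(idx), "weight": coef} for coef, idx in poly]


def relaxed_loss(poly, p):
    """Polynomial objective with each binary x_i relaxed to p[i] in [0, 1]."""
    return sum(coef * prod(p[i] for i in idx) for coef, idx in poly)


def loss_gradient(poly, p):
    """Analytic gradient of the relaxed loss with respect to each p[i]."""
    grad = [0.0] * len(p)
    for coef, idx in poly:
        for i in idx:
            grad[i] += coef * prod(p[j] for j in idx if j != i)
    return grad


hyperedges = to_hypergraph(polynomial)
loss_all_ones = relaxed_loss(polynomial, [1.0, 1.0, 1.0, 1.0])  # 3 - 2 + 1
grad_all_ones = loss_gradient(polynomial, [1.0, 1.0, 1.0, 1.0])
```

On binary inputs the relaxed loss coincides with the original objective, so gradient steps on `p` followed by rounding give candidate discrete solutions. In the paper a HyperGNN produces the relaxed assignment; here `p` is simply passed in by hand.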
Article 1011
Title@2025-05-27 (2): Efficient Identity and Position Graph Embedding via Spectral-Based Random Feature Aggregation
Title: Efficient Identity and Position Graph Embedding via Spectral-Based Random Feature Aggregation | Effiziente Einbettung von Identitäts- und Positionsdiagrammen über spektralbasierte Random Feature Aggregation | 通过光谱-基于随机地物聚合的高效身份和位置图嵌入 2505.20992v1 |
Authors: Meng Qin, Jiahong Liu, Irwin King
Graph neural networks (GNNs), which capture graph structures via a feature aggregation mechanism following the graph embedding framework, have demonstrated a powerful ability to support various tasks. According to the topology properties (e.g., structural roles or community memberships of nodes) to be preserved, graph embedding can be categorized into identity and position embedding. However, it is unclear for most GNN-based methods which property they can capture. Some of them may also suffer from low efficiency and scalability caused by several time- and space-consuming procedures (e.g., feature extraction and training). From a perspective of graph signal processing, we find that high- and low-frequency information in the graph spectral domain may characterize node identities and positions, respectively. Based on this investigation, we propose random feature aggregation (RFA) for efficient identity and position embedding, serving as an extreme ablation study regarding GNN feature aggregation. RFA (i) adopts a spectral-based GNN without learnable parameters as its backbone, (ii) only uses random noises as inputs, and (iii) derives embeddings via just one feed-forward propagation (FFP). Inspired by degree-corrected spectral clustering, we further introduce a degree correction mechanism to the GNN backbone. Surprisingly, our experiments demonstrate that two variants of RFA with high- and low-pass filters can respectively derive informative identity and position embeddings via just one FFP (i.e., without any training). As a result, RFA can achieve a better trade-off between quality and efficiency for both identity and position embedding over various baselines.
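The three properties of RFA listed in the abstract above (parameter-free spectral backbone, random-noise inputs, a single forward propagation) can be sketched as follows. The specific filters are assumptions made for illustration: (I + Â)/2 as the low-pass step and (I - Â)/2 as the high-pass step, with Â the symmetrically normalised adjacency. The paper's filters and its degree correction mechanism are not reproduced.

```python
import math
import random


def propagate(adj, feats, low_pass=True):
    """One parameter-free filtering step on an undirected graph.

    adj maps each node to its neighbour list (no isolated nodes assumed).
    """
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    sign = 1.0 if low_pass else -1.0
    out = {}
    for u, nbrs in adj.items():
        agg = [0.0] * len(feats[u])
        for v in nbrs:
            scale = 1.0 / math.sqrt(deg[u] * deg[v])
            for d, value in enumerate(feats[v]):
                agg[d] += scale * value
        out[u] = [0.5 * (x + sign * a) for x, a in zip(feats[u], agg)]
    return out


def rfa_embed(adj, dim=8, steps=3, low_pass=True, seed=0):
    """Random Gaussian inputs pushed through `steps` filtering steps, no training."""
    rng = random.Random(seed)
    feats = {u: [rng.gauss(0.0, 1.0) for _ in range(dim)] for u in adj}
    for _ in range(steps):
        feats = propagate(adj, feats, low_pass)
    return feats


# Two triangles joined by the bridge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
position_emb = rfa_embed(graph, low_pass=True)   # smooth across neighbourhoods
identity_emb = rfa_embed(graph, low_pass=False)  # emphasises local differences
```

Low-pass filtering pulls the embeddings of densely connected nodes together, matching the abstract's link between low-frequency information and node positions, while the high-pass variant keeps the component that differs from the neighbourhood average.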
Article 1012
Title@2025-05-27 (2): Identifying Super Spreaders in Multilayer Networks
Title: Identifying Super Spreaders in Multilayer Networks | Identifizieren von Superspreizern in Multilayer-Netzwerken | 识别多层网络中的超级传播器 2505.20980v1 |
Authors: Michał Czuba, Mateusz Stolarski, Adam Piróg, Piotr Bielak, Piotr Bródka
Identifying super-spreaders can be framed as a subtask of the influence maximisation problem. It seeks to pinpoint agents within a network that, if selected as single diffusion seeds, disseminate information most effectively. Multilayer networks, a specific class of heterogeneous graphs, can capture diverse types of interactions (e.g., physical-virtual or professional-social), and thus offer a more accurate representation of complex relational structures. In this work, we introduce a novel approach to identifying super-spreaders in such networks by leveraging graph neural networks. To this end, we construct a dataset by simulating information diffusion across hundreds of networks - to the best of our knowledge, the first of its kind tailored specifically to multilayer networks. We further formulate the task as a variation of the ranking prediction problem based on a four-dimensional vector that quantifies each agent’s spreading potential: (i) the number of activations; (ii) the duration of the diffusion process; (iii) the peak number of activations; and (iv) the simulation step at which this peak occurs. Our model, TopSpreadersNetwork, comprises a relationship-agnostic encoder and a custom aggregation layer. This design enables generalisation to previously unseen data and adapts to varying graph sizes. In an extensive evaluation, we compare our model against classic centrality-based heuristics and competitive deep learning methods. The results, obtained across a broad spectrum of real-world and synthetic multilayer networks, demonstrate that TopSpreadersNetwork achieves superior performance in identifying high-impact nodes, while also offering improved interpretability through its structured output.
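The four-dimensional spreading-potential vector described in the abstract above can be illustrated by simulating a diffusion from a single seed and recording the four quantities. The sketch assumes an independent-cascade model on a single-layer graph and counts the seed among the activations. These are simplifying assumptions: the paper works on multilayer networks and its diffusion model is not specified in the abstract.

```python
import random


def spreading_vector(adj, seed_node, prob=0.5, rng=None):
    """Return (activations, duration, peak activations, peak step) for one seed.

    adj maps each node to its neighbour list. Each newly active node gets one
    chance to activate each inactive neighbour with probability `prob`.
    """
    rng = rng or random.Random(0)
    active = {seed_node}
    frontier = [seed_node]
    new_per_step = []  # number of newly activated nodes at steps 1, 2, ...
    while frontier:
        newly_active = []
        for u in frontier:
            for v in adj[u]:
                if v not in active and rng.random() < prob:
                    active.add(v)
                    newly_active.append(v)
        if newly_active:
            new_per_step.append(len(newly_active))
        frontier = newly_active
    duration = len(new_per_step)
    peak = max(new_per_step, default=0)
    peak_step = new_per_step.index(peak) + 1 if new_per_step else 0
    return (len(active), duration, peak, peak_step)


# A star graph: node 0 is the hub, nodes 1-3 are leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
from_hub = spreading_vector(star, 0, prob=1.0)
from_leaf = spreading_vector(star, 1, prob=1.0)
```

With certain transmission, seeding the hub reaches every node in one step, while seeding a leaf needs two steps and peaks later. Ranking nodes on such vectors, averaged over many stochastic runs, gives the kind of target the abstract describes.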
Article 1013
Title@2025-05-27 (2): Deep k-grouping: An Unsupervised Learning Framework for Combinatorial Optimization on Graphs and Hypergraphs
Title: Deep k-grouping: An Unsupervised Learning Framework for Combinatorial Optimization on Graphs and Hypergraphs | Deep k-grouping: Ein unüberwachter Lernrahmen für die kombinatorische Optimierung von Graphen und Hypergraphen | 深 k 组: 图形和高光谱组合优化的无人监督的学习框架 2505.20972v1 |
Authors: Sen Bai, Chunqi Yang, Xin Bai, Xin Zhang, Zhengang Jiang
As AI computing gains prominence in scientific discovery, its potential in the combinatorial optimization (CO) domain has also emerged in recent years. Yet existing unsupervised neural network solvers struggle to solve $k$-grouping problems (e.g., coloring, partitioning) on large-scale graphs and hypergraphs, due to limited computational frameworks. In this work, we propose Deep $k$-grouping, an unsupervised learning-based CO framework. Specifically, we contribute: (i) a novel one-hot encoded polynomial unconstrained binary optimization (OH-PUBO) formulation for modeling $k$-grouping problems on graphs and hypergraphs (e.g., graph/hypergraph coloring and partitioning); (ii) GPU-accelerated algorithms for large-scale $k$-grouping CO problems: Deep $k$-grouping employs the relaxation of large-scale OH-PUBO objectives as differentiable loss functions and trains to optimize them in an unsupervised manner, and, to ensure scalability, it leverages GPU-accelerated algorithms to unify the training pipeline; and (iii) a Gini coefficient-based continuous relaxation annealing strategy to enforce discreteness of solutions while preventing convergence to local optima. Experimental results demonstrate that Deep $k$-grouping outperforms existing neural network solvers and classical heuristics such as SCIP and Tabu.
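To make the idea of relaxing a one-hot objective into a differentiable loss concrete, here is a minimal sketch for graph coloring. It assumes the common relaxation in which each node holds a softmax distribution over $k$ colors and the loss sums, over edges, the probability that both endpoints share a color. This is an illustrative stand-in, not the paper's OH-PUBO formulation or its Gini-based annealing; the names `relaxed_coloring_loss` and `logits` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one node's color scores."""
    shift = max(row)
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_coloring_loss(edges: Sequence[Tuple[int, int]],
                          logits: Sequence[Sequence[float]]) -> float:
    """Expected number of monochromatic edges under independent soft assignments.

    logits[u][c] is an unnormalised score for giving node u color c.
    When every row is (near) one-hot, the loss counts conflicting edges.
    """
    probs = [softmax(row) for row in logits]
    return sum(
        sum(pu * pv for pu, pv in zip(probs[u], probs[v]))
        for u, v in edges
    )


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    proper = [[50.0, 0.0, 0.0], [0.0, 50.0, 0.0], [0.0, 0.0, 50.0]]
    clashing = [[50.0, 0.0, 0.0]] * 3
    print(relaxed_coloring_loss(triangle, proper))    # close to 0: no conflicts
    print(relaxed_coloring_loss(triangle, clashing))  # close to 3: every edge conflicts
```

Because the loss is a smooth function of the logits, it can be minimised with gradient descent without labels, which is the sense in which such solvers are unsupervised.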
Article 1014
Title@2025-05-27 (2): Semantic Communication meets System 2 ML: How Abstraction, Compositionality and Emergent Languages Shape Intelligence
Title: Semantic Communication meets System 2 ML: How Abstraction, Compositionality and Emergent Languages Shape Intelligence | Semantische Kommunikation trifft System 2 ML: Wie Abstraktion, Kompositionalität und Emergente Sprachen Formintelligenz | 语义通信满足系统2 ML:如何抽象、组成和新兴语言形式情报 2505.20964v1 |
Authors: Mehdi Bennis, Salem Lahlou
The trajectories of 6G and AI are set for a creative collision. However, current visions for 6G remain largely incremental evolutions of 5G, while progress in AI is hampered by brittle, data-hungry models that lack robust reasoning capabilities. This paper argues for a foundational paradigm shift, moving beyond the purely technical level of communication toward systems capable of semantic understanding and effective, goal-oriented interaction. We propose a unified research vision rooted in the principles of System-2 cognition, built upon three pillars: Abstraction, enabling agents to learn meaningful world models from raw sensorimotor data; Compositionality, providing the algebraic tools to combine learned concepts and subsystems; and Emergent Communication, allowing intelligent agents to create their own adaptive and grounded languages. By integrating these principles, we lay the groundwork for truly intelligent systems that can reason, adapt, and collaborate, unifying advances in wireless communications, machine learning, and robotics under a single coherent framework.
Article 1015
Title@2025-05-27 (2): Resampling Filter Design for Multirate Neural Audio Effect Processing
Title: Resampling Filter Design for Multirate Neural Audio Effect Processing | Resampling Filter Design für Multirate Neural Audio Effect Processing | 多立体神经音频效果处理的抽取过滤器设计 2501.18470v2 |
Authors: Alistair Carson, Vesa Välimäki, Alec Wright, Stefan Bilbao
Neural networks have become ubiquitous in audio effects modelling, especially for guitar amplifiers and distortion pedals. One limitation of such models is that the sample rate of the training data is implicitly encoded in the model weights and therefore not readily adjustable at inference. Recent work explored modifications to recurrent neural network architecture to approximate a sample rate independent system, enabling audio processing at a rate that differs from the original training rate. This method works well for integer oversampling and can reduce aliasing caused by nonlinear activation functions. For small fractional changes in sample rate, fractional delay filters can be used to approximate sample rate independence, but in some cases this method fails entirely. Here, we explore the use of real-time signal resampling at the input and output of the neural network as an alternative solution. We investigate several resampling filter designs and show that a two-stage design consisting of a half-band IIR filter cascaded with a Kaiser window FIR filter can give results similar to or better than those of the previously proposed model adjustment method, with many fewer filtering operations per sample and less than one millisecond of latency at typical audio rates. Furthermore, we investigate interpolation and decimation filters for the task of integer oversampling and show that cascaded half-band IIR and FIR designs can be used in conjunction with the model adjustment method to reduce aliasing in a range of distortion effect models.
Article 1016
Title@2025-05-27 (2): Efficient and Microphone-Fault-Tolerant 3D Sound Source Localization
Title: Efficient and Microphone-Fault-Tolerant 3D Sound Source Localization | Effiziente und Mikrofon-Fehler-Tolerante 3D-Soundquelle Lokalisierung | 高效的麦克风和麦克风-默认的 3D 声音源源本地化 2505.20961v1 |
Authors: Yiyuan Yang, Shitong Xu, Niki Trigoni, Andrew Markham
Sound source localization (SSL) is a critical technology for determining the position of sound sources in complex environments. However, existing methods face challenges such as high computational costs and precise calibration requirements, limiting their deployment in dynamic or resource-constrained environments. This paper introduces a novel 3D SSL framework, which uses sparse cross-attention, pretraining, and adaptive signal coherence metrics, to achieve accurate and computationally efficient localization with fewer input microphones. The framework is also fault-tolerant to unreliable or even unknown microphone position inputs, ensuring its applicability in real-world scenarios. Preliminary experiments demonstrate its scalability for multi-source localization without requiring additional hardware. This work advances SSL by balancing the model’s performance and efficiency and improving its robustness for real-world scenarios.
Article 1017
Title@2025-05-27 (2): Personalized Clustering via Targeted Representation Learning
Title: Personalized Clustering via Targeted Representation Learning | Personalisiertes Clustering über gezieltes Repräsentationslernen | 通过有针对性的代表学习进行个性化集群组合 2412.13690v3 |
Authors: Xiwen Geng, Suyun Zhao, Yixin Yu, Borui Peng, Pan Du, Hong Chen, Cuiping Li, Mengdie Wang
Clustering traditionally aims to reveal a natural grouping structure within unlabeled data. However, this structure may not always align with users’ preferences. In this paper, we propose a personalized clustering method that explicitly performs targeted representation learning by interacting with users via a modicum of task information (e.g., $\textit{must-link}$ or $\textit{cannot-link}$ pairs) to guide the clustering direction. We query users with the most informative pairs, i.e., those pairs hardest to cluster and those easiest to miscluster, to facilitate representation learning in terms of the clustering preference. Moreover, by exploiting an attention mechanism, the targeted representation is learned and augmented. By leveraging the targeted representation together with a constrained contrastive loss, personalized clustering is obtained. Theoretically, we verify that the risk of personalized clustering is tightly bounded, guaranteeing that active queries to users do mitigate the clustering risk. Experimentally, extensive results show that our method performs well across different clustering tasks and datasets, even when only a limited number of queries are available.
Article 1018
Title@2025-05-27 (2): Unveiling Impact of Frequency Components on Membership Inference Attacks for Diffusion Models
Title: Unveiling Impact of Frequency Components on Membership Inference Attacks for Diffusion Models | Auswirkungen von Frequenzkomponenten auf Mitgliedschafts-Inferenzangriffe für Diffusionsmodelle enthüllen | 频率组成部分对传播模型的传播成员推断攻击的不懈影响 2505.20955v1 |
Authors: Puwei Lian, Yujun Cai, Songze Li
Diffusion models have achieved tremendous success in image generation, but they also raise significant concerns regarding privacy and copyright issues. Membership Inference Attacks (MIAs) are designed to ascertain whether specific data were utilized during a model’s training phase. As current MIAs for diffusion models typically exploit the model’s image prediction ability, we formalize them into a unified general paradigm which computes the membership score for membership identification. Under this paradigm, we empirically find that existing attacks overlook the inherent deficiency in how diffusion models process high-frequency information. Consequently, this deficiency leads to member data with more high-frequency content being misclassified as hold-out data, and to hold-out data with less high-frequency content being misclassified as member data. Moreover, we theoretically demonstrate that this deficiency reduces the membership advantage of attacks, thereby interfering with the effective discrimination of member data and hold-out data. Based on this insight, we propose a plug-and-play high-frequency filter module to mitigate the adverse effects of the deficiency, which can be seamlessly integrated into any attack within this general paradigm without additional time costs. Extensive experiments corroborate that this module significantly improves the performance of baseline attacks across different datasets and models.
Article 1019
Title@2025-05-27 (2): More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives
Title: More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives | Mehr ist nicht immer besser? Viel-Shot-In-Context-Lernen mit differenzierten und neugewichtigen Zielen verbessern | 越多越好,越多越好?用差异化和再加权目标,加强多热化的内流学习 2501.04070v3 |
Authors: Xiaoqing Zhang, Ang Lv, Yuhan Liu, Flood Sung, Wei Liu, Jian Luan, Shuo Shang, Xiuying Chen, Rui Yan
Large language models (LLMs) excel at few-shot in-context learning (ICL) without requiring parameter updates. However, as ICL demonstrations increase from a few to many, performance tends to plateau and eventually decline. We identify two primary causes for this trend: the suboptimal negative log-likelihood (NLL) optimization objective and the incremental data noise. To address these issues, we introduce DrICL, a novel optimization method that enhances model performance through Differentiated and Reweighting objectives. Globally, DrICL utilizes differentiated learning to optimize the NLL objective, ensuring that many-shot performance surpasses zero-shot levels. Locally, it dynamically adjusts the weighting of many-shot demonstrations by leveraging cumulative advantages inspired by reinforcement learning, thereby mitigating the impact of noisy data. Recognizing the lack of multi-task datasets with diverse many-shot distributions, we develop the Many-Shot ICL Benchmark (ICL-50), a large-scale benchmark of 50 tasks that cover shot numbers from 1 to 350 within sequences of up to 8,000 tokens, for both fine-tuning and evaluation purposes. Experimental results demonstrate that LLMs enhanced with DrICL achieve significant improvements in many-shot setups across various tasks, including both in-domain and out-of-domain scenarios. We release the code and dataset at https://github.com/xiaoqzhwhu/DrICL, hoping to facilitate further research in many-shot ICL.
Article 1020
Title@2025-05-27 (2): Double Descent Meets Out-of-Distribution Detection: Theoretical Insights and Empirical Analysis on the role of model complexity
Title: Double Descent Meets Out-of-Distribution Detection: Theoretical Insights and Empirical Analysis on the role of model complexity | Doppelter Abstieg trifft auf Out-of-Distribution Detection: Theoretische Erkenntnisse und empirische Analyse zur Rolle der Modellkomplexität | 双重人种与分配外探测:关于模型复杂性作用的理论洞察和经验分析 2411.02184v2 |
Authors: Mouïn Ben Ammar, David Brellmann, Arturo Mendoza, Antoine Manzanera, Gianni Franchi
Out-of-distribution (OOD) detection is essential for ensuring the reliability and safety of machine learning systems. In recent years, it has received increasing attention, particularly through post-hoc detection and training-based methods. In this paper, we focus on post-hoc OOD detection, which enables identifying OOD samples without altering the model’s training procedure or objective. Our primary goal is to investigate the relationship between model capacity and its OOD detection performance. Specifically, we aim to answer the following question: Does the Double Descent phenomenon manifest in post-hoc OOD detection? This question is crucial, as it can reveal whether overparameterization, which is already known to benefit generalization, can also enhance OOD detection. Despite the growing interest in these topics by the classic supervised machine learning community, this intersection remains unexplored for OOD detection. We empirically demonstrate that the Double Descent effect does indeed appear in post-hoc OOD detection. Furthermore, we provide theoretical insights to explain why this phenomenon emerges in such a setting. Finally, we show that the overparameterized regime does not yield superior results consistently, and we propose a method to identify the optimal regime for OOD detection based on our observations.
Article 1021
Title@2025-05-27 (2): Recovering Fairness Directly from Modularity: a New Way for Fair Community Partitioning
Title: Recovering Fairness Directly from Modularity: a New Way for Fair Community Partitioning | Fairness direkt aus Modularität zu gewinnen: ein neuer Weg für faire Gemeinschaftspartitionierung | 直接从模式中恢复公平:公平社区分割的新途径 2505.22684v1 |
Authors: Yufeng Wang, Yiguang Bai, Tianqing Zhu, Ismail Ben Ayed, Jing Yuan
Community partitioning is crucial in network analysis, with modularity optimization being the prevailing technique. However, traditional modularity-based methods often overlook fairness, a critical aspect in real-world applications. To address this, we introduce protected group networks and propose a novel fairness-modularity metric. This metric extends traditional modularity by explicitly incorporating fairness, and we prove that minimizing it yields naturally fair partitions for protected groups while maintaining theoretical soundness. We develop a general optimization framework for fairness partitioning and design the efficient Fair Fast Newman (FairFN) algorithm, enhancing the Fast Newman (FN) method to optimize both modularity and fairness. Experiments show FairFN achieves significantly improved fairness and high-quality partitions compared to state-of-the-art methods, especially on unbalanced datasets.
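For reference, the classical modularity score that the abstract builds on can be computed as below. This is a minimal sketch of standard Newman modularity for an undirected, unweighted graph; the fairness term of the proposed fairness-modularity metric is not given in the abstract and is therefore not reproduced. The function name and the edge-list input format are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def modularity(edges: Sequence[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, Hashable]) -> float:
    """Newman modularity Q = sum_c [ e_c / m - (d_c / (2m))^2 ].

    e_c: number of edges with both endpoints in community c,
    d_c: sum of node degrees in community c, m: total number of edges.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = defaultdict(int)
    internal = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    degree_sum = defaultdict(int)
    for node, d in degree.items():
        degree_sum[community[node]] += d
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(modularity(edges, split))  # 5/14, roughly 0.357
```

Greedy agglomerative methods in the Fast Newman family repeatedly merge the pair of communities that most improves a score of this kind.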
Article 1022
Title@2025-05-27 (2): Scattering Networks on Noncommutative Finite Groups
Title: Scattering Networks on Noncommutative Finite Groups | Streunetze für nichtkommutative Finite-Gruppen | 关于非调解性有限集团的散射网络 2505.20950v1 |
Authors: Maria Teresa Arias, Davide Barbieri, Eugenio Hernández
Scattering Networks were initially designed to elucidate the behavior of early layers in Convolutional Neural Networks (CNNs) over Euclidean spaces and are grounded in wavelets. In this work, we introduce a scattering transform on an arbitrary finite group (not necessarily abelian) within the context of group-equivariant convolutional neural networks (G-CNNs). We present wavelets on finite groups and analyze their similarity to classical wavelets. We demonstrate that, under certain conditions on the wavelet coefficients, the scattering transform is non-expansive, stable under deformations, energy-preserving, and equivariant with respect to left and right group translations, and that, as depth increases, the scattering coefficients become less sensitive to group translations of the signal, all desirable properties of convolutional neural networks. Furthermore, we provide examples illustrating the application of the scattering transform to classify data with domains involving abelian and nonabelian groups.
Article 1023
Title@2025-05-27 (2): shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python
Title: shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python | shapr: Erklären von Machine Learning-Modellen mit bedingten Shapley-Werten in R und Python | Shapr:解释R和Python中带有有条件阴影值的机器学习模型 2504.01842v2 |
Authors: Martin Jullum, Lars Henry Berge Olsen, Jon Lachmann, Annabelle Redelmeier
This paper introduces the shapr R package, a versatile tool for generating Shapley value based prediction explanations for machine learning and statistical regression models. Moreover, the shaprpy Python library brings the core capabilities of shapr to the Python ecosystem. Shapley values originate from cooperative game theory in the 1950s, but have over the past few years become a widely used method for quantifying how a model’s features/covariates contribute to specific prediction outcomes. The shapr package emphasizes conditional Shapley value estimates, providing a comprehensive range of approaches for accurately capturing feature dependencies – a crucial aspect for correct model explanation, typically lacking in similar software. In addition to regular tabular data, the shapr R package includes specialized functionality for explaining time series forecasts. The package offers a minimal set of user functions with sensible default values for most use cases while providing extensive flexibility for advanced users to fine-tune computations. Additional features include parallelized computations, iterative estimation with convergence detection, and rich visualization tools. shapr also extends its functionality to compute causal and asymmetric Shapley values when causal information is available. Overall, the shapr and shaprpy packages aim to enhance the interpretability of predictive models within a powerful and user-friendly framework.
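As background for the Shapley value definition the abstract relies on, the following minimal sketch computes exact Shapley values by enumerating all coalitions. It does not use the shapr or shaprpy APIs; `shapley_values` and `toy_value` are hypothetical names, and the toy value function merely stands in for the conditional expectation of the model output that a conditional Shapley method would estimate.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(features: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values; cost grows as 2^n, so only suitable for few features."""
    n = len(features)
    phi = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {j}) - value(s))
        phi[j] = total
    return phi


def toy_value(s: FrozenSet[str]) -> float:
    """Hypothetical contribution function with an x1-x2 interaction; x3 is irrelevant."""
    v = 0.0
    if "x1" in s:
        v += 2.0
    if "x2" in s:
        v += 1.0
    if "x1" in s and "x2" in s:
        v += 1.0
    return v


if __name__ == "__main__":
    print(shapley_values(["x1", "x2", "x3"], toy_value))
```

The interaction term is split evenly between x1 and x2, and the values sum to the difference between the full and empty coalitions, which is the efficiency property that makes Shapley values attractive for attributing a prediction.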
Article 1024
Title@2025-05-27 (2): Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
Title: Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training | Zwei Experten sind alles, was Sie zum Lenken Denken brauchen: Kognitive Bemühungen in MoE-Reasoning-Modellen ohne zusätzliches Training verstärken | 两位专家是指导思考所需要的两个专家:在没有额外培训的情况下加强教育部理由说明模式中的认知努力 2505.14681v2 |
Authors: Mengru Wang, Xingyu Chen, Yue Wang, Zhiwei He, Jiahao Xu, Tian Liang, Qiuzhi Liu, Yunzhi Yao, Wenxuan Wang, Ruotian Ma, Haitao Mi, Ningyu Zhang, Zhaopeng Tu, Xiaolong Li, Dong Yu
Mixture-of-Experts (MoE) architectures within Large Reasoning Models (LRMs) have achieved impressive reasoning capabilities by selectively activating experts to facilitate structured cognitive processes. Despite notable advances, existing reasoning models often suffer from cognitive inefficiencies like overthinking and underthinking. To address these limitations, we introduce a novel inference-time steering methodology called Reinforcing Cognitive Experts (RICE), designed to improve reasoning performance without additional training or complex heuristics. Leveraging normalized Pointwise Mutual Information (nPMI), we systematically identify specialized experts, termed ‘‘cognitive experts’’ that orchestrate meta-level reasoning operations characterized by tokens like ‘‘
Article 1025
Title@2025-05-27 (2): Efficient Spectral Control of Partially Observed Linear Dynamical Systems
Title: Efficient Spectral Control of Partially Observed Linear Dynamical Systems | Effiziente Spektralsteuerung teilweise beobachteter linearer dynamischer Systeme | 局部观察线性动态系统的有效光谱控制 2505.20943v1 |
Authors: Anand Brahmbhatt, Gon Buzaglo, Sofiia Druchyna, Elad Hazan
We propose a new method for the problem of controlling linear dynamical systems under partial observation and adversarial disturbances. Our new algorithm, Double Spectral Control (DSC), matches the best known regret guarantees while exponentially improving runtime complexity over previous approaches in its dependence on the system’s stability margin. Our key innovation is a two-level spectral approximation strategy, leveraging double convolution with a universal basis of spectral filters, enabling efficient and accurate learning of the best linear dynamical controllers.
Article 1026
Title@2025-05-27 (2): Towards Training One-Step Diffusion Models Without Distillation
Title: Towards Training One-Step Diffusion Models Without Distillation | Auf dem Weg zum Training von Ein-Schritt-Diffusionsmodellen ohne Destillation | 培训不蒸馏的单级传播模型 2502.08005v3 |
Authors: Mingtian Zhang, Wenlin Chen, Jiajun He, Zijing Ou, José Miguel Hernández-Lobato, Bernhard Schölkopf, David Barber
Recent advances in training one-step diffusion models typically follow a two-stage pipeline: first training a teacher diffusion model and then distilling it into a one-step student model. This process often depends on both the teacher’s score function for supervision and its weights for initializing the student model. In this paper, we explore whether one-step diffusion models can be trained directly without this distillation procedure. We introduce a family of new training methods that entirely forgo teacher score supervision, yet outperform most teacher-guided distillation approaches. This suggests that score supervision is not essential for effective training of one-step diffusion models. However, we find that initializing the student model with the teacher’s weights remains critical. Surprisingly, the key advantage of teacher initialization is not due to better latent-to-output mappings, but rather the rich set of feature representations across different noise levels that the teacher diffusion model provides. These insights take us one step closer towards training one-step diffusion models without distillation and provide a better understanding of the roles of teacher supervision and initialization in the distillation process.
Article 1027
Title@2025-05-27 (2): Revisiting Sparsity Constraint Under High-Rank Property in Partial Multi-Label Learning
Title: Revisiting Sparsity Constraint Under High-Rank Property in Partial Multi-Label Learning | Überprüfung der Sparsamkeitsbeschränkungen unter Hochrangigem Eigentum im Teil-Multi-Label-Lernen | 重新审视部分多标签学习中高等级属性下的平等限制 2505.20938v1 |
Authors: Chongjie Si, Yidan Cui, Fuchao Yang, Xiaokang Yang, Wei Shen
Partial Multi-Label Learning (PML) extends the multi-label learning paradigm to scenarios where each sample is associated with a candidate label set containing both ground-truth labels and noisy labels. Existing PML methods commonly rely on two assumptions: sparsity of the noise label matrix and low-rankness of the ground-truth label matrix. However, these assumptions are inherently conflicting and impractical for real-world scenarios, where the true label matrix is typically full-rank or close to full-rank. To address these limitations, we demonstrate that the sparsity constraint contributes to the high-rank property of the predicted label matrix. Based on this, we propose a novel method, Schirn, which introduces a sparsity constraint on the noise label matrix while enforcing a high-rank property on the predicted label matrix. Extensive experiments demonstrate the superior performance of Schirn compared to state-of-the-art methods, validating its effectiveness in tackling real-world PML challenges.
Article 1028
Title@2025-05-27 (2): EPIC: Efficient Position-Independent Caching for Serving Large Language Models
Title: EPIC: Efficient Position-Independent Caching for Serving Large Language Models | EPIC: Effizientes positionsunabhängiges Caching für das Servieren großer Sprachmodelle | EPIC: 高效的、独立定位的为大语言模式服务的工作 2410.15332v3 |
Authors: Junhao Hu, Wenrui Huang, Weidong Wang, Haoyi Wang, Tiancheng Hu, Qin Zhang, Hao Feng, Xusheng Chen, Yizhou Shan, Tao Xie
Large Language Models (LLMs) show great capabilities in a wide range of applications, but serving them efficiently becomes increasingly challenging as requests (prompts) become more complex. Context caching improves serving performance by reusing Key-Value (KV) vectors, the intermediate representations of tokens that are repeated across requests. However, existing context caching requires exact prefix matches across requests, limiting reuse cases in settings such as few-shot learning and retrieval-augmented generation, where immutable content (e.g., documents) remains unchanged across requests but is preceded by varying prefixes. Position-Independent Caching (PIC) addresses this issue by enabling modular reuse of the KV vectors regardless of prefixes. We formalize PIC and advance prior work by introducing EPIC, a serving system incorporating our new LegoLink algorithm, which mitigates the inappropriate “attention sink” effect at every document beginning, to maintain accuracy with minimal computation. Experiments show that EPIC achieves up to 8x improvements in Time-To-First-Token (TTFT) and 7x throughput gains over existing systems, with negligible or no accuracy loss.
Article 1029
Title@2025-05-27 (2): Linear Bandits with Non-i.i.d. Noise
Title: Linear Bandits with Non-i.i.d. Noise | Lineare Banditen mit Non-i.i.d. Lärm | 带有非i.i.d. 噪音的线形强盗 2505.20017v2 |
Authors: Baptiste Abélès, Eugenio Clerico, Hamish Flynn, Gergely Neu
We study the linear stochastic bandit problem, relaxing the standard i.i.d. assumption on the observation noise. As an alternative to this restrictive assumption, we allow the noise terms across rounds to be sub-Gaussian but interdependent, with dependencies that decay over time. To address this setting, we develop new confidence sequences using a recently introduced reduction scheme to sequential probability assignment, and use these to derive a bandit algorithm based on the principle of optimism in the face of uncertainty. We provide regret bounds for the resulting algorithm, expressed in terms of the decay rate of the strength of dependence between observations. Among other results, we show that our bounds recover the standard rates up to a factor of the mixing time for geometrically mixing observation noise.
Article 1030
Title@2025-05-27 (2): NatADiff: Adversarial Boundary Guidance for Natural Adversarial Diffusion
Title: NatADiff: Adversarial Boundary Guidance for Natural Adversarial Diffusion | NatADiff: Adversariale Grenzführung für natürliche Adversariale Diffusion | NatadADiff: 自然反向扩散反向边界指南 2505.20934v1 |
Authors: Max Collins, Jordan Vice, Tim French, Ajmal Mian
Adversarial samples exploit irregularities in the manifold "learned" by deep learning models to cause misclassifications. The study of these adversarial samples provides insight into the features a model uses to classify inputs, which can be leveraged to improve robustness against future attacks. However, much of the existing literature focuses on constrained adversarial samples, which do not accurately reflect test-time errors encountered in real-world settings. To address this, we propose "NatADiff", an adversarial sampling scheme that leverages denoising diffusion to generate natural adversarial samples. Our approach is based on the observation that natural adversarial samples frequently contain structural elements from the adversarial class. Deep learning models can exploit these structural elements to shortcut the classification process, rather than learning to genuinely distinguish between classes. To leverage this behavior, we guide the diffusion trajectory towards the intersection of the true and adversarial classes, combining time-travel sampling with augmented classifier guidance to enhance attack transferability while preserving image fidelity. Our method achieves attack success rates comparable to current state-of-the-art techniques, while exhibiting significantly higher transferability across model architectures and better alignment with natural test-time errors as measured by FID. These results demonstrate that NatADiff produces adversarial samples that not only transfer more effectively across models, but also more faithfully resemble naturally occurring test-time errors.
Article 1031
Title@2025-05-27 (2): MLMC-based Resource Adequacy Assessment with Active Learning Trained Surrogate Models
Title: MLMC-based Resource Adequacy Assessment with Active Learning Trained Surrogate Models | MLMC-basierte Ressourcenadäquatitätsbewertung mit aktiven Learning-Trained-Surrogate-Modellen | 以MLMC为基础的基于MLMC的资源充足性评估,与积极学习、经过培训的代用模型进行资源充足性评估 2505.20930v1 |
Authors: Ruiqi Zhang, Simon H. Tindemans
Multilevel Monte Carlo (MLMC) is a flexible and effective variance reduction technique for accelerating reliability assessments of complex power systems. Recently, data-driven surrogate models have been proposed as lower-level models in the MLMC framework due to their high correlation and negligible execution time once trained. However, in resource adequacy assessments, pre-labeled datasets are typically unavailable. For large-scale systems, the efficiency gains from surrogate models are often offset by the substantial time required for labeling training data. Therefore, this paper introduces a speed metric that accounts for training time in evaluating MLMC efficiency. Given that the total time budget is limited, a vote-by-committee active learning approach is proposed to reduce the required labeling calls. A case study demonstrates that, within practical variance thresholds, active learning enables significantly improved MLMC efficiency with reduced training effort, compared to regular surrogate modelling approaches.
Article 1032
Title@2025-05-27 (2): Label Leakage in Federated Inertial-based Human Activity Recognition
Title: Label Leakage in Federated Inertial-based Human Activity Recognition | Label-Leakage in Föderated Inertial-based Human Activity Recognition | 以联邦为本的人类活动确认中 联邦内地人类活动确认中的Label渗漏 2505.20924v1 |
Authors: Marius Bock, Maximilian Hopp, Kristof Van Laerhoven, Michael Moeller
While prior work has shown that Federated Learning updates can leak sensitive information, label reconstruction attacks, which aim to recover input labels from shared gradients, have not yet been examined in the context of Human Activity Recognition (HAR). Given the sensitive nature of activity labels, this study evaluates the effectiveness of state-of-the-art gradient-based label leakage attacks on HAR benchmark datasets. Our findings show that the number of activity classes, sampling strategy, and class imbalance are critical factors influencing the extent of label leakage, with reconstruction accuracies reaching up to 90% on two benchmark datasets, even for trained models. Moreover, we find that Local Differential Privacy techniques such as gradient noise and clipping offer only limited protection, as certain attacks still reliably infer both majority and minority class labels. We conclude by offering practical recommendations for the privacy-aware deployment of federated HAR systems and identify open challenges for future research. Code to reproduce our experiments is publicly available via github.com/mariusbock/leakage_har.
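Why shared gradients can reveal labels at all is easiest to see in the single-sample case. The sketch below relies on a standard fact: for softmax cross-entropy, the gradient with respect to the logits (and hence the output-layer bias) is `softmax(z) - onehot(y)`, so its only negative entry sits at the true label. This is a simplified illustration under that single-sample assumption, not the batch-level attacks evaluated in the paper; all function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_gradient(logits: Sequence[float], label: int) -> List[float]:
    """Gradient of the cross-entropy loss w.r.t. the logits for one sample."""
    probs = softmax(logits)
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]


def infer_label(shared_gradient: Sequence[float]) -> int:
    """Recover the label as the index of the most negative gradient entry."""
    return min(range(len(shared_gradient)), key=lambda i: shared_gradient[i])


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5, 0.1]  # arbitrary model outputs for four activity classes
    for true_label in range(len(logits)):
        grad = logit_gradient(logits, true_label)
        print(true_label, infer_label(grad))
```

With larger batches the per-sample gradients are summed, so attacks must estimate label counts rather than read off a sign, which is one reason batch composition and class imbalance matter for how much leaks.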
Article 1033
Title@2025-05-27 (2): Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
Title: Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective | Multi-Agenten-Weltmodellierung aus einer diffusionsinspirierten Perspektive Revue passieren | 从传播启发的视角重新审视多股权世界建模 2505.20922v1 |
Authors: Yang Zhang, Xinran Li, Jianing Ye, Delin Qu, Shuang Qiu, Chongjie Zhang, Xiu Li, Chenjia Bai
World models have recently attracted growing interest in Multi-Agent Reinforcement Learning (MARL) due to their ability to improve sample efficiency for policy learning. However, accurately modeling environments in MARL is challenging due to the exponentially large joint action space and highly uncertain dynamics inherent in multi-agent systems. To address this, we reduce modeling complexity by shifting from jointly modeling the entire state-action transition dynamics to focusing on the state space alone at each timestep through sequential agent modeling. Specifically, our approach enables the model to progressively resolve uncertainty while capturing the structured dependencies among agents, providing a more accurate representation of how agents influence the state. Interestingly, this sequential revelation of agents’ actions in a multi-agent system aligns with the reverse process in diffusion models – a class of powerful generative models known for their expressiveness and training stability compared to autoregressive or latent variable models. Leveraging this insight, we develop a flexible and robust world model for MARL using diffusion models. Our method, Diffusion-Inspired Multi-Agent world model (DIMA), achieves state-of-the-art performance across multiple multi-agent control benchmarks, including MAMuJoCo and Bi-DexHands, significantly outperforming prior world models in terms of final return and sample efficiency. DIMA establishes a new paradigm for constructing multi-agent world models, advancing the frontier of MARL research.
nan
Article 1034
Title@2025-05-27 (2): Humble AI in the real-world: the case of algorithmic hiring
Title: Humble AI in the real-world: the case of algorithmic hiring | Humble KI in der realen Welt: der Fall der algorithmischen Einstellung | 现实世界中的黄土人工智能:算法雇用案例 2505.20918v1 |
Authors: Rahul Nair, Inge Vejsbjerg, Elizabeth Daly, Christos Varytimidis, Bran Knowles
Humble AI (Knowles et al., 2023) argues for cautiousness in AI development and deployments through scepticism (accounting for limitations of statistical learning), curiosity (accounting for unexpected outcomes), and commitment (accounting for multifaceted values beyond performance). We present a real-world case study for humble AI in the domain of algorithmic hiring. Specifically, we evaluate virtual screening algorithms in a widely used hiring platform that matches candidates to job openings. There are several challenges in misrecognition and stereotyping in such contexts that are difficult to assess through standard fairness and trust frameworks; e.g., someone with a non-traditional background is less likely to rank highly. We demonstrate technical feasibility of how humble AI principles can be translated to practice through uncertainty quantification of ranks, entropy estimates, and a user experience that highlights algorithmic unknowns. We describe preliminary discussions with focus groups made up of recruiters. Future user studies seek to evaluate whether the higher cognitive load of a humble AI system fosters a climate of trust in its outcomes.
nan
Article 1035
Title@2025-05-27 (2): A Kernelised Stein Discrepancy for Assessing the Fit of Inhomogeneous Random Graph Models
Title: A Kernelised Stein Discrepancy for Assessing the Fit of Inhomogeneous Random Graph Models | Eine zerkleinerte Stein-Diskrepanz für die Beurteilung der Passform von inhomogenen Zufallsgraphenmodellen | 用于评估不相容随机图模型是否适合的内核化石 Stein 差异性评估 2505.21580v1 |
Authors: Anum Fatima, Gesine Reinert
Complex data are often represented as a graph, which in turn can often be viewed as a realisation of a random graph model, such as an inhomogeneous random graph (IRG) model. For general fast goodness-of-fit tests in high dimensions, kernelised Stein discrepancy (KSD) tests are a powerful tool. Here, we develop, test, and analyse a KSD-type goodness-of-fit test for IRG models that can be carried out with a single observation of the network. The test is applicable to a network of any size and does not depend on the asymptotic distribution of the test statistic. We also provide theoretical guarantees.
nan
Article 1036
Title@2025-05-27 (2): Exploring the Boundary of Diffusion-based Methods for Solving Constrained Optimization
Title: Exploring the Boundary of Diffusion-based Methods for Solving Constrained Optimization | Erforschung der Grenzen von diffusionsbasierten Methoden zur Lösung eingeschränkter Optimierung | 探索以传播为基础的解决受限制的优化的解决方法的界限 2502.10330v3 |
Authors: Shutong Ding, Yimiao Zhou, Ke Hu, Xi Yao, Junchi Yan, Xiaoying Tang, Ye Shi
Diffusion models have achieved remarkable success in generative tasks such as image and video synthesis, and in control domains like robotics, owing to their strong generalization capabilities and proficiency in fitting complex multimodal distributions. However, their full potential in solving Continuous Constrained Optimization problems remains largely underexplored. Our work commences by investigating a two-dimensional constrained quadratic optimization problem as an illustrative example to explore the inherent challenges and issues when applying diffusion models to such optimization tasks and providing theoretical analyses for these observations. To address the identified gaps and harness diffusion models for Continuous Constrained Optimization, we build upon this analysis to propose a novel diffusion-based framework for optimization problems called DiOpt. This framework operates in two distinct phases: an initial warm-start phase, implemented via supervised learning, followed by a bootstrapping phase. This dual-phase architecture is designed to iteratively refine solutions, thereby improving the objective function while rigorously satisfying problem constraints. Finally, multiple candidate solutions are sampled, and the optimal one is selected through a screening process. We present extensive experiments detailing the training dynamics of DiOpt, its performance across a diverse set of Continuous Constrained Optimization problems, and an analysis of the impact of DiOpt’s various hyperparameters.
nan
Article 1037
Title@2025-05-27 (2): A data augmentation strategy for deep neural networks with application to epidemic modelling
Title: A data augmentation strategy for deep neural networks with application to epidemic modelling | Eine Datenvergrößerungsstrategie für tiefe neuronale Netzwerke mit Anwendung in der Epidemiemodellierung | 用于流行病建模的深层神经网络数据增强战略 2502.21033v2 |
Authors: Muhammad Awais, Abu Safyan Ali, Giacomo Dimarco, Federica Ferrarese, Lorenzo Pareschi
In this work, we integrate the predictive capabilities of compartmental disease dynamics models with the ability of machine learning to analyze complex, high-dimensional data and uncover patterns that conventional models may overlook. Specifically, we present a proof of concept demonstrating the application of data-driven methods and deep neural networks to a recently introduced Susceptible-Infected-Recovered type model with social features, including a saturated incidence rate, to improve epidemic prediction and forecasting. Our results show that a robust data augmentation strategy through suitable data-driven models can improve the reliability of Feed-Forward Neural Networks and Nonlinear Autoregressive Networks, providing a complementary strategy to Physics-Informed Neural Networks, particularly in settings where data augmentation from mechanistic models can enhance learning. This approach enhances the ability to handle nonlinear dynamics and offers scalable, data-driven solutions for epidemic forecasting, prioritizing predictive accuracy over the constraints of physics-based models. Numerical simulations of the lockdown and post-lockdown phase of the COVID-19 epidemic in Italy and Spain validate our methodology.
nan
Article 1038
Title@2025-05-27 (2): “Oh LLM, I’m Asking Thee, Please Give Me a Decision Tree”: Zero-Shot Decision Tree Induction and Embedding with Large Language Models
Title: “Oh LLM, I’m Asking Thee, Please Give Me a Decision Tree”: Zero-Shot Decision Tree Induction and Embedding with Large Language Models | “Oh LLM, ich frage dich, bitte gib mir einen Entscheidungsbaum”: Nullschnelle Entscheidungsbauminduktion und Einbettung mit großen Sprachmodellen | “哦,LLM,我问你,请给我一棵决定树”: “零热决定树上演和嵌入大语言模型” 2409.18594v2 |
Authors: Ricardo Knauer, Mario Koddenbrock, Raphael Wallsberger, Nicholas M. Brisson, Georg N. Duda, Deborah Falla, David W. Evans, Erik Rodner
Large language models (LLMs) provide powerful means to leverage prior knowledge for predictive modeling when data is limited. In this work, we demonstrate how LLMs can use their compressed world knowledge to generate intrinsically interpretable machine learning models, i.e., decision trees, without any training data. We find that these zero-shot decision trees can even surpass data-driven trees on some small-sized tabular datasets and that embeddings derived from these trees perform better than data-driven tree-based embeddings on average. Our decision tree induction and embedding approaches can therefore serve as new knowledge-driven baselines for data-driven machine learning methods in the low-data regime. Furthermore, they offer ways to harness the rich world knowledge within LLMs for tabular machine learning tasks. Our code and results are available at https://github.com/ml-lab-htw/llm-trees.
nan
Article 1039
Title@2025-05-27 (2): Music Foundation Model as Generic Booster for Music Downstream Tasks
Title: Music Foundation Model as Generic Booster for Music Downstream Tasks | Music Foundation Modell als Generic Booster für Downstream-Aufgaben | 音乐基金会模式,作为音乐下流任务通用推进器 2411.01135v3 |
Authors: WeiHsiang Liao, Yuhta Takida, Yukara Ikemiya, Zhi Zhong, Chieh-Hsin Lai, Giorgio Fabbro, Kazuki Shimada, Keisuke Toyama, Kinwai Cheuk, Marco A. Martínez-Ramírez, Shusuke Takahashi, Stefan Uhlich, Taketo Akama, Woosung Choi, Yuichiro Koyama, Yuki Mitsufuji
We demonstrate the efficacy of using intermediate representations from a single foundation model to enhance various music downstream tasks. We introduce SoniDo, a music foundation model (MFM) designed to extract hierarchical features from target music samples. By leveraging hierarchical intermediate features, SoniDo constrains the information granularity, leading to improved performance across various downstream tasks including both understanding and generative tasks. We specifically evaluated this approach on representative tasks such as music tagging, music transcription, music source separation, and music mixing. Our results reveal that the features extracted from foundation models provide valuable enhancements in training downstream task models. This highlights the capability of using features extracted from music foundation models as a booster for downstream tasks. Our approach not only benefits existing task-specific models but also supports music downstream tasks constrained by data scarcity. This paves the way for more effective and accessible music processing solutions.
nan
Article 1040
Title@2025-05-27 (2): Simple Relative Deviation Bounds for Covariance and Gram Matrices
Title: Simple Relative Deviation Bounds for Covariance and Gram Matrices | Einfache relative Abweichungen für Kovarianz und Gram Matrices | 常数和小数母体的简单相对偏差宽度 2410.05754v3 |
Authors: Daniel Barzilai, Ohad Shamir
We provide non-asymptotic, relative deviation bounds for the eigenvalues of empirical covariance and Gram matrices in general settings. Unlike typical uniform bounds, which may fail to capture the behavior of smaller eigenvalues, our results provide sharper control across the spectrum. Our analysis is based on a general-purpose theorem that allows one to convert existing uniform bounds into relative ones. The theorems and techniques emphasize simplicity and should be applicable across various settings.
nan
Article 1041
Title@2025-05-27 (2): Enhancing Performance of Explainable AI Models with Constrained Concept Refinement
Title: Enhancing Performance of Explainable AI Models with Constrained Concept Refinement | Leistungssteigerung erklärbarer KI-Modelle mit eingeschränkter Konzeptverfeinerung | 增强可解释的AI 概念改进模型的绩效 2502.06775v2 |
Authors: Geyu Liang, Senne Michielssen, Salar Fattahi
The trade-off between accuracy and interpretability has long been a challenge in machine learning (ML). This tension is particularly significant for emerging interpretable-by-design methods, which aim to redesign ML algorithms for trustworthy interpretability but often sacrifice accuracy in the process. In this paper, we address this gap by investigating the impact of deviations in concept representations – an essential component of interpretable models – on prediction performance and propose a novel framework to mitigate these effects. The framework builds on the principle of optimizing concept embeddings under constraints that preserve interpretability. Using a generative model as a test-bed, we rigorously prove that our algorithm achieves zero loss while progressively enhancing the interpretability of the resulting model. Additionally, we evaluate the practical performance of our proposed framework in generating explainable predictions for image classification tasks across various benchmarks. Compared to existing explainable methods, our approach not only improves prediction accuracy while preserving model interpretability across various large-scale benchmarks but also achieves this with significantly lower computational cost.
nan
Article 1042
Title@2025-05-27 (2): Achieving binary weight and activation for LLMs using Post-Training Quantization
Title: Achieving binary weight and activation for LLMs using Post-Training Quantization | Erreichen des binären Gewichts und Aktivierung für LLMs mit Post-Training Quantization | 利用培训后量化办法使LLMMs实现二进制加权和激活 2504.05352v2 |
Authors: Siqing Song, Chuang Wang, Ruiqi Wang, Yi Yang, Xuyao Zhang
Quantizing large language models (LLMs) to 1-bit precision significantly reduces computational costs, but existing quantization techniques suffer from noticeable performance degradation when using weight and activation precisions below 4 bits (W4A4). In this paper, we propose a post-training quantization framework with W(1+1)A(1*4) configuration, where weights are quantized to 1 bit with an additional 1 bit for fine-grained grouping and activations are quantized to 1 bit with a 4-fold increase in the number of channels. For weight quantization, we propose utilizing Hessian-aware fine-grained grouping along with an EM-based quantization scheme. For activation quantization, we equivalently decompose INT4-quantized activations into a 4 * INT1 format and simultaneously smooth the scaling factors based on quantization errors, which further reduces the quantization errors in activations. Our method surpasses state-of-the-art (SOTA) LLM quantization baselines on W2A4 across multiple tasks, pushing the boundaries of existing LLM quantization methods toward fully binarized models. Code is available at https://github.com/JimmyCrave/LLM-PTQ-binarization.
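The equivalence between one INT4 activation and four INT1 activations can be illustrated with plain bit-plane arithmetic. The sketch below is an assumption-laden illustration, not the paper's implementation: it assumes unsigned 4-bit values in [0, 15], ignores the scaling-factor smoothing described above, and uses hypothetical helper names (`decompose_int4`, `recompose`). It shows that a matrix product with INT4 activations equals a power-of-two weighted sum of four products with binary activations, which corresponds to the 4-fold increase in channels.

```python
import numpy as np

def decompose_int4(q):
    """Split unsigned 4-bit integers into 4 binary planes (least significant bit first)."""
    q = np.asarray(q, dtype=np.int64)
    return np.stack([(q >> k) & 1 for k in range(4)], axis=0)

def recompose(planes):
    """Inverse of decompose_int4: q = sum_k 2**k * planes[k]."""
    scales = 2 ** np.arange(4)
    return np.tensordot(scales, planes, axes=1)

rng = np.random.default_rng(0)
q = rng.integers(0, 16, size=8)       # INT4 activations (assumed unsigned)
W = rng.choice([-1, 1], size=(3, 8))  # toy binary weights

planes = decompose_int4(q)            # shape (4, 8): four INT1 "channels"
direct = W @ q
binary = sum((2 ** k) * (W @ planes[k]) for k in range(4))
print(np.array_equal(direct, binary))  # the two products match exactly
```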
nan
Article 1043
Title@2025-05-27 (2): Frequency-Aware Masked Autoencoders for Human Activity Recognition using Accelerometers
Title: Frequency-Aware Masked Autoencoders for Human Activity Recognition using Accelerometers | Frequency-Aware Maskierte Autoencoder für die Erkennung menschlicher Aktivität mit Beschleunigungsmessern | 使用加速计识别人类活动的频率软件 2502.17477v2 |
Authors: Niels R. Lorenzen, Poul J. Jennum, Emmanuel Mignot, Andreas Brink-Kjaer
Wearable accelerometers are widely used for continuous monitoring of physical activity. Supervised machine learning and deep learning algorithms have long been used to extract meaningful activity information from raw accelerometry data, but progress has been hampered by the limited amount of labeled data that is publicly available. Exploiting large unlabeled datasets using self-supervised pretraining is a relatively new and underexplored approach in the field of human activity recognition (HAR). We used a time-series transformer masked autoencoder (MAE) approach to self-supervised pretraining and propose two novel spectrogram-based loss functions: the log-scale mean magnitude (LMM) and log-scale magnitude variance (LMV) losses. We compared these losses with the mean squared error (MSE) loss for MAE training. We leveraged the large unlabeled UK Biobank accelerometry dataset (n = 109k) for pretraining and evaluated downstream HAR performance using a linear classifier in a smaller labelled dataset. We found that pretraining with the LMM loss improved performance compared to an MAE pretrained with the MSE loss, with a 12.7% increase in subject-wise F1 score when using linear probing. Compared with a state-of-the-art ResNet-based HAR model, our LMM-pretrained transformer models performed better (+9.8% F1) with linear probing and comparably when fine-tuned using an LSTM classifier. The addition of the LMV to the LMM loss decreased performance compared to the LMM loss alone. These findings establish the LMM loss as a robust and effective method for pretraining MAE models on accelerometer data for HAR and show the potential of pretraining sequence-based models for free-living HAR.
nan
Article 1044
Title@2025-05-27 (2): How Do Transformers Learn Variable Binding in Symbolic Programs?
Title: How Do Transformers Learn Variable Binding in Symbolic Programs? | Wie lernen Transformer variable Bindungen in Symbolischen Programmen? | 变换者如何在符号程序中学习变数绑定 ? 2505.20896v1 |
Authors: Yiwei Wu, Atticus Geiger, Raphaël Millière
Variable binding – the ability to associate variables with values – is fundamental to symbolic computation and cognition. Although classical architectures typically implement variable binding via addressable memory, it is not well understood how modern neural networks lacking built-in binding operations may acquire this capacity. We investigate this by training a Transformer to dereference queried variables in symbolic programs where variables are assigned either numerical constants or other variables. Each program requires following chains of variable assignments up to four steps deep to find the queried value, and also contains irrelevant chains of assignments acting as distractors. Our analysis reveals a developmental trajectory with three distinct phases during training: (1) random prediction of numerical constants, (2) a shallow heuristic prioritizing early variable assignments, and (3) the emergence of a systematic mechanism for dereferencing assignment chains. Using causal interventions, we find that the model learns to exploit the residual stream as an addressable memory space, with specialized attention heads routing information across token positions. This mechanism allows the model to dynamically track variable bindings across layers, resulting in accurate dereferencing. Our results show how Transformer models can learn to implement systematic variable binding without explicit architectural support, bridging connectionist and symbolic approaches.
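The dereferencing task described above can be sketched as a small reference solver. This is only an illustration of the task, not the authors' data generator: the program syntax (`name=value` strings), the helper name `dereference`, and the assumption that each variable is assigned exactly once are all hypothetical.

```python
def dereference(program, query, max_depth=4):
    """Follow a chain of variable assignments from `query` to a numerical constant."""
    # Build the binding table; assumes each variable is assigned exactly once.
    env = {}
    for line in program:
        name, value = line.split("=")
        env[name.strip()] = value.strip()

    current = query
    for _ in range(max_depth):
        value = env[current]
        if value.isdigit():   # reached a numerical constant
            return int(value)
        current = value       # otherwise hop to the referenced variable
    raise ValueError("assignment chain deeper than max_depth")

# The chain c -> b -> a -> 7 is queried; x and y form an irrelevant distractor chain.
program = ["a=7", "x=3", "b=a", "y=x", "c=b"]
print(dereference(program, "c"))  # 7
```

A symbolic solver like this uses an explicit lookup table (`env`) as addressable memory, which is the capacity the abstract reports the Transformer learning to emulate in its residual stream.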
nan
Article 1045
Title@2025-05-27 (2): DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition
Title: DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition | DeepConvContext: Ein mehrstufiger Ansatz zur Zeitreihenklassifizierung in der Anerkennung menschlicher Aktivität | 深刻信念:人类活动确认中的时间序列分类的多比额表办法 2505.20894v1 |
Authors: Marius Bock, Michael Moeller, Kristof Van Laerhoven
Despite recognized limitations in modeling long-range temporal dependencies, Human Activity Recognition (HAR) has traditionally relied on a sliding window approach to segment labeled datasets. Deep learning models like the DeepConvLSTM typically classify each window independently, thereby restricting learnable temporal context to within-window information. To address this constraint, we propose DeepConvContext, a multi-scale time series classification framework for HAR. Drawing inspiration from the vision-based Temporal Action Localization community, DeepConvContext models both intra- and inter-window temporal patterns by processing sequences of time-ordered windows. Unlike recent HAR models that incorporate attention mechanisms, DeepConvContext relies solely on LSTMs – with ablation studies demonstrating the superior performance of LSTMs over attention-based variants for modeling inertial sensor data. Across six widely-used HAR benchmarks, DeepConvContext achieves an average 10% improvement in F1-score over the classic DeepConvLSTM, with gains of up to 21%. Code to reproduce our experiments is publicly available via github.com/mariusbock/context_har.
nan
Article 1046
Title@2025-05-27 (2): One-Time Soft Alignment Enables Resilient Learning without Weight Transport
Title: One-Time Soft Alignment Enables Resilient Learning without Weight Transport | One-Time Soft Alignment ermöglicht resilientes Lernen ohne Gewicht Transport | 一次性软对齐使有弹性的学习无需体力运输 2505.20892v1 |
Authors: Jeonghwan Cheon, Jaehyuk Bae, Se-Bum Paik
Backpropagation is the cornerstone of deep learning, but its reliance on symmetric weight transport and global synchronization makes it computationally expensive and biologically implausible. Feedback alignment offers a promising alternative by approximating error gradients through fixed random feedback, thereby avoiding symmetric weight transport. However, this approach often struggles with poor learning performance and instability, especially in deep networks. Here, we show that a one-time soft alignment between forward and feedback weights at initialization enables deep networks to achieve performance comparable to backpropagation, without requiring weight transport during learning. This simple initialization condition guides stable error minimization in the loss landscape, improving network trainability. Spectral analyses further reveal that initial alignment promotes smoother gradient flow and convergence to flatter minima, resulting in better generalization and robustness. Notably, we also find that allowing moderate deviations from exact weight symmetry can improve adversarial robustness compared to standard backpropagation. These findings demonstrate that a simple initialization strategy can enable effective learning in deep networks in a biologically plausible and resource-efficient manner.
nan
Article 1047
Title@2025-05-27 (2): ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention
Title: ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention | ComplexEhemaliger: Disruptived Advance Transformer Inferenz-Fähigkeit über Head-Specific Complex Vector Achtung | 复杂形式:通过头部特定复杂矢量的注意,干扰推进变压器推断能力 2505.10222v2 |
Authors: Jintian Shao, Hongyi Huang, Jiayi Wu, Beiwen Zhang, ZhiYu Wu, You Shan, MingKai Zheng
Transformer models rely on self-attention to capture token dependencies but face challenges in effectively integrating positional information while allowing multi-head attention (MHA) flexibility. Prior methods often model semantic and positional differences disparately or apply uniform positional adjustments across heads, potentially limiting representational capacity. This paper introduces ComplexFormer, featuring Complex Multi-Head Attention (CMHA). CMHA empowers each head to independently model semantic and positional differences unified within the complex plane, representing interactions as rotations and scaling. ComplexFormer incorporates two key improvements: (1) a per-head Euler transformation, converting real-valued query/key projections into polar-form complex vectors for head-specific complex subspace operation; and (2) a per-head adaptive differential rotation mechanism, exp[i(Adapt(ASmn,i) + Delta(Pmn),i)], allowing each head to learn distinct strategies for integrating semantic angle differences (ASmn,i) with relative positional encodings (Delta(Pmn),i). Extensive experiments on language modeling, text generation, code generation, and mathematical reasoning show ComplexFormer achieves superior performance, significantly lower generation perplexity, and improved long-context coherence compared to strong baselines like RoPE-Transformers. ComplexFormer demonstrates strong parameter efficiency, offering a more expressive, adaptable attention mechanism.
nan
Article 1048
Title@2025-05-27 (2): Power-Law Decay Loss for Large Language Model Finetuning: Focusing on Information Sparsity to Enhance Generation Quality
Title: Power-Law Decay Loss for Large Language Model Finetuning: Focusing on Information Sparsity to Enhance Generation Quality | Macht-Rechts-Dekay-Verlust für große Sprachmodell Finetuning: Fokussierung auf Informationssparsität zur Verbesserung der Generationsqualität | 大语言模型调整的功率法减退损失:侧重于信息平等以提高世代质量 2505.16900v3 |
Authors: Jintian Shao, Yiming Cheng, Hongyi Huang, Jiayi Wu, Beiwen Zhang, Zhiyu Wu, You Shan, Mingkai Zheng
During the finetuning stage of text generation tasks, standard cross-entropy loss treats all tokens equally. This can lead models to overemphasize high-frequency, low-information tokens, neglecting lower-frequency tokens crucial for specificity and informativeness in generated content. This paper introduces a novel loss function, Power-Law Decay Loss (PDL), specifically designed to optimize the finetuning process for text generation. The core motivation for PDL stems from observations in information theory and linguistics: the informativeness of a token is often inversely proportional to its frequency of occurrence. PDL re-weights the contribution of each token in the standard cross-entropy loss based on its frequency in the training corpus, following a power-law decay. Specifically, the weights for high-frequency tokens are reduced, while low-frequency, information-dense tokens are assigned higher weights. This mechanism guides the model during finetuning to focus more on learning and generating tokens that convey specific and unique information, thereby enhancing the quality, diversity, and informativeness of the generated text. We theoretically elaborate on the motivation and construction of PDL and discuss its potential applications and advantages across various text generation finetuning tasks, such as abstractive summarization, dialogue systems, and style transfer.
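A frequency-based re-weighting of the cross-entropy loss along these lines might look as follows. This is a hedged NumPy sketch, not the paper's reference code: the weight form w(t) = 1 / (freq(t) + eps)^alpha, the default `alpha=0.5`, the normalization by the sum of weights, and the function names are all assumptions made for illustration.

```python
import numpy as np

def power_law_weights(token_freq, alpha=0.5, eps=1e-8):
    # Assumed weighting: w(t) = 1 / (freq(t) + eps) ** alpha.
    freq = np.asarray(token_freq, dtype=float)
    return 1.0 / (freq + eps) ** alpha

def weighted_cross_entropy(logits, targets, token_freq, alpha=0.5):
    """Token-level cross-entropy re-weighted by a power law of token frequency."""
    logits = np.asarray(logits, dtype=float)   # shape (T, V)
    targets = np.asarray(targets)              # shape (T,)
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    w = power_law_weights(token_freq, alpha)[targets]
    # Normalizing by the weight sum (an assumption) keeps the loss scale comparable.
    return float((w * nll).sum() / w.sum())

# Toy vocabulary of 4 tokens with decreasing corpus frequency.
token_freq = np.array([0.6, 0.3, 0.09, 0.01])
weights = power_law_weights(token_freq)
print(weights)  # rarer tokens receive larger weights
```

Setting `alpha=0` makes every weight equal to 1 and recovers the standard mean cross-entropy, so `alpha` controls how strongly rare tokens are emphasized.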
nan
Article 1049
Title@2025-05-27 (2): Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective
Title: Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective | Auf dem Weg zur Analyse und dem Verständnis der Grenzen von VAPO: Eine theoretische Perspektive | 分析和理解VAPO的局限性:理论视角 2505.17997v2 |
Authors: Jintian Shao, Yiming Cheng, Hongyi Huang, Beiwen Zhang, Zhiyu Wu, You Shan, Mingkai Zheng
The VAPO framework has demonstrated significant empirical success in enhancing the efficiency and reliability of reinforcement learning for long chain-of-thought (CoT) reasoning tasks with large language models (LLMs). By systematically addressing challenges such as value model bias, heterogeneous sequence lengths, and sparse reward signals, VAPO achieves state-of-the-art performance. While its practical benefits are evident, a deeper theoretical understanding of its underlying mechanisms and potential limitations is crucial for guiding future advancements. This paper aims to initiate such a discussion by exploring VAPO from a theoretical perspective, highlighting areas where its assumptions might be challenged and where further investigation could yield more robust and generalizable reasoning agents. We delve into the intricacies of value function approximation in complex reasoning spaces, the optimality of adaptive advantage estimation, the impact of token-level optimization, and the enduring challenges of exploration and generalization.
nan
Article 1050
Title@2025-05-27 (2): Fedivertex: a Graph Dataset based on Decentralized Social Networks for Trustworthy Machine Learning
Title: Fedivertex: a Graph Dataset based on Decentralized Social Networks for Trustworthy Machine Learning | Fedivertex: ein Graph Dataset auf Basis dezentralisierter sozialer Netzwerke für vertrauenswürdiges maschinelles Lernen | Fedivertex:基于分散社会网络的图表数据集,用于可信赖的机器学习 2505.20882v1 |
Authors: Marc Damie, Edwige Cyffers
Decentralized machine learning – where each client keeps its own data locally and uses its own computational resources to collaboratively train a model by exchanging peer-to-peer messages – is increasingly popular, as it enables better scalability and control over the data. A major challenge in this setting is that learning dynamics depend on the topology of the communication graph, which motivates the use of real graph datasets for benchmarking decentralized algorithms. Unfortunately, existing graph datasets are largely limited to for-profit social networks crawled at a fixed point in time and often collected at the user scale, where links are heavily influenced by the platform and its recommendation algorithms. The Fediverse, which includes several free and open-source decentralized social media platforms such as Mastodon, Misskey, and Lemmy, offers an interesting real-world alternative. We introduce Fedivertex, a new dataset of 182 graphs, covering seven social networks from the Fediverse, crawled weekly over 14 weeks. We release the dataset along with a Python package to facilitate its use, and illustrate its utility on several tasks, including a new defederation task, which captures a process of link deletion observed on these networks.
nan
Article 1051
Title@2025-05-27 (2): Generalizable Heuristic Generation Through Large Language Models with Meta-Optimization
Title: Generalizable Heuristic Generation Through Large Language Models with Meta-Optimization | Generalisierbare Heuristische Generation durch große Sprachmodelle mit Meta-Optimierung | 通过配有元-优化的大型语言模型实现可普遍实现的超营养代 2505.20881v1 |
Authors: Yiding Shi, Jianan Zhou, Wen Song, Jieyi Bi, Yaoxin Wu, Jie Zhang
Heuristic design with large language models (LLMs) has emerged as a promising approach for tackling combinatorial optimization problems (COPs). However, existing approaches often rely on manually predefined evolutionary computation (EC) optimizers and single-task training schemes, which may constrain the exploration of diverse heuristic algorithms and hinder the generalization of the resulting heuristics. To address these issues, we propose Meta-Optimization of Heuristics (MoH), a novel framework that operates at the optimizer level, discovering effective optimizers through the principle of meta-learning. Specifically, MoH leverages LLMs to iteratively refine a meta-optimizer that autonomously constructs diverse optimizers through (self-)invocation, thereby eliminating the reliance on a predefined EC optimizer. These constructed optimizers subsequently evolve heuristics for downstream tasks, enabling broader heuristic exploration. Moreover, MoH employs a multi-task training scheme to promote its generalization capability. Experiments on classic COPs demonstrate that MoH constructs an effective and interpretable meta-optimizer, achieving state-of-the-art performance across various downstream tasks, particularly in cross-size settings.
nan
Article 1052
Title@2025-05-27 (2): Conditional Distribution Compression via the Kernel Conditional Mean Embedding
Title: Conditional Distribution Compression via the Kernel Conditional Mean Embedding | Conditional Distribution Compression über den Kernel Conditional Mean Embedding | 通过内核有条件平均嵌入式压缩有条件分发 2504.10139v2 |
Authors: Dominic Broadbent, Nick Whiteley, Robert Allison, Tom Lovett
Existing distribution compression methods, like Kernel Herding (KH), were originally developed for unlabelled data. However, no existing approach directly compresses the conditional distribution of labelled data. To address this gap, we first introduce the Average Maximum Conditional Mean Discrepancy (AMCMD), a natural metric for comparing conditional distributions. We then derive a consistent estimator for the AMCMD and establish its rate of convergence. Next, we make a key observation: in the context of distribution compression, the cost of constructing a compressed set targeting the AMCMD can be reduced from $\mathcal{O}(n^3)$ to $\mathcal{O}(n)$. Building on this, we extend the idea of KH to develop Average Conditional Kernel Herding (ACKH), a linear-time greedy algorithm that constructs a compressed set targeting the AMCMD. To better understand the advantages of directly compressing the conditional distribution rather than doing so via the joint distribution, we introduce Joint Kernel Herding (JKH), a straightforward adaptation of KH designed to compress the joint distribution of labelled data. While herding methods provide a simple and interpretable selection process, they rely on a greedy heuristic. To explore alternative optimisation strategies, we propose Joint Kernel Inducing Points (JKIP) and Average Conditional Kernel Inducing Points (ACKIP), which jointly optimise the compressed set while maintaining linear complexity. Experiments show that directly preserving conditional distributions with ACKIP outperforms both joint distribution compression (via JKH and JKIP) and the greedy selection used in ACKH. Moreover, we see that JKIP consistently outperforms JKH.
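For orientation, standard Kernel Herding, the unlabelled-data baseline that the abstract extends, can be sketched as a greedy loop. This is not ACKH or any of the paper's conditional methods: it is the classical algorithm restricted to choosing points from the dataset itself, with an assumed RBF kernel and a precomputed kernel matrix (so this sketch costs O(n^2), unlike the linear-time methods claimed above). Function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2 * lengthscale**2))

def kernel_herding(x, m, lengthscale=1.0):
    """Greedily pick m indices whose kernel mean embedding tracks that of x."""
    K = rbf_kernel(x, x, lengthscale)
    mean_embedding = K.mean(axis=1)   # estimate of E_{x'}[k(x_i, x')] at each candidate
    running = np.zeros(len(x))        # sum of k(x_i, s) over already selected points s
    selected = []
    for t in range(m):
        # Herding score: stay close to the data mean embedding,
        # but move away from points that were already selected.
        scores = mean_embedding - running / (t + 1)
        idx = int(np.argmax(scores))
        selected.append(idx)
        running += K[:, idx]
    return selected

# Two well-separated clusters; a compressed set of size 2 should cover both.
x = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
selected = kernel_herding(x, m=2)
print(selected)
```

The second pick lands in the other cluster because the penalty term `running / (t + 1)` suppresses candidates near the first pick. This is the greedy heuristic that the inducing-point variants (JKIP, ACKIP) replace with joint optimisation.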
Article 1053
Title@2025-05-27 (2): Machine Learning - Driven Materials Discovery: Unlocking Next-Generation Functional Materials – A minireview
Title: Machine Learning - Driven Materials Discovery: Unlocking Next-Generation Functional Materials – A minireview | Machine Learning - Driven Materials Discovery: Locking Next-Generation Functional Materials – Eine Minireview | 机器学习 – – 驱动材料发现:解锁下一轮启动功能材料 – – 小型审查 2503.18975v2 |
Authors: Dilshod Nematov, Mirabbos Hojamberdiev
The rapid advancement of machine learning and artificial intelligence (AI)-driven techniques is revolutionizing materials discovery, property prediction, and material design by minimizing human intervention and accelerating scientific progress. This review provides a comprehensive overview of smart, machine learning (ML)-driven approaches, emphasizing their role in predicting material properties, discovering novel compounds, and optimizing material structures. Key methodologies, ranging from deep learning, graph neural networks, and Bayesian optimization to automated generative models such as generative adversarial networks (GANs) and variational autoencoders (VAEs), enable the autonomous design of materials with tailored functionalities. By leveraging AutoML frameworks (e.g., AutoGluon, TPOT, and H2O.ai), researchers can automate model selection, hyperparameter tuning, and feature engineering, significantly improving the efficiency of materials informatics. Furthermore, the integration of AI-driven robotic laboratories and high-throughput computing has established a fully automated pipeline for rapid synthesis and experimental validation, drastically reducing the time and cost of material discovery. This review highlights real-world applications of automated ML-driven approaches in predicting mechanical, thermal, electrical, and optical properties of materials, demonstrating successful cases in superconductors, catalysts, photovoltaics, and energy storage systems. We also address key challenges, such as data quality, interpretability, and the integration of AutoML with quantum computing, which are essential for future advancements. Ultimately, the synergy between AI, automated experimentation, and computational modeling transforms the way materials are discovered, optimized, and designed, paving the way for next-generation innovations in energy, electronics, and nanotechnology.
Article 1054
Title@2025-05-27 (2): In Context Learning with Vision Transformers: Case Study
Title: In Context Learning with Vision Transformers: Case Study | Im Kontext Lernen mit Vision Transformers: Fallstudie | 与愿景变异者进行背景学习:案例研究 2505.20872v1 |
Authors: Antony Zhao, Alex Proshkin, Fergal Hennessy, Francesco Crivelli
Large transformer models have been shown to be capable of in-context learning. Given examples in a prompt together with a query, they can perform few-shot, one-shot, or zero-shot learning to output the answer to that query. Of particular interest to us is that these transformer models have been shown to be capable of learning general classes of functions, such as linear functions and small 2-layer neural networks, on random data (Garg et al., 2023). We aim to extend this to the image space and analyze their capability to learn more complex functions in context, such as convolutional neural networks and other methods.
Article 1055
Title@2025-05-27 (2): RL-SPH: Learning to Achieve Feasible Solutions for Integer Linear Programs
Title: RL-SPH: Learning to Achieve Feasible Solutions for Integer Linear Programs | RL-SPH: Lernen, um durchführbare Lösungen für Integer-Lineare-Programme zu erreichen | RL-SPH:学习为整数线性方案找到可行的解决办法 2411.19517v5 |
Authors: Tae-Hoon Lee, Min-Soo Kim
Integer linear programming (ILP) is widely utilized for various combinatorial optimization problems. Primal heuristics play a crucial role in quickly finding feasible solutions for NP-hard ILP. Although end-to-end learning-based primal heuristics (E2EPH) have recently been proposed, they are typically unable to independently generate feasible solutions and mainly focus on binary variables. Ensuring feasibility is critical, especially when handling non-binary integer variables. To address this challenge, we propose RL-SPH, a novel reinforcement learning-based start primal heuristic capable of independently generating feasible solutions, even for ILP involving non-binary integers. Experimental results demonstrate that RL-SPH rapidly obtains high-quality feasible solutions, achieving on average a 44x lower primal gap and a 2.3x lower primal integral compared to existing primal heuristics.
Article 1056
Title@2025-05-27 (2): Leveraging Diffusion Models for Parameterized Quantum Circuit Generation
Title: Leveraging Diffusion Models for Parameterized Quantum Circuit Generation | Nutzung von Diffusionsmodellen für die parameterisierte Quantum Circuit Generation | 利用可计量量子电路生成的传播模型 2505.20863v1 |
Authors: Daniel Barta, Darya Martyniuk, Johannes Jung, Adrian Paschke
Quantum computing holds immense potential, yet its practical success depends on multiple factors, including advances in quantum circuit design. In this paper, we introduce a generative approach based on denoising diffusion models (DMs) to synthesize parameterized quantum circuits (PQCs). Extending the recent diffusion model pipeline of Fürrutter et al. [1], our model effectively conditions the synthesis process, enabling the simultaneous generation of circuit architectures and their continuous gate parameters. We demonstrate our approach in synthesizing PQCs optimized for generating high-fidelity Greenberger-Horne-Zeilinger (GHZ) states and achieving high accuracy in quantum machine learning (QML) classification tasks. Our results indicate a strong generalization across varying gate sets and scaling qubit counts, highlighting the versatility and computational efficiency of diffusion-based methods. This work illustrates the potential of generative models as a powerful tool for accelerating and optimizing the design of PQCs, supporting the development of more practical and scalable quantum applications.
Article 1057
Title@2025-05-27 (2): Model Agnostic Differentially Private Causal Inference
Title: Model Agnostic Differentially Private Causal Inference | Modell Agnostisch unterschiedliche private Kausalableitung | 示范性Agnistic 区分法私人原因推断 2505.19589v2 |
Authors: Christian Lebeda, Mathieu Even, Aurélien Bellet, Julie Josse
Estimating causal effects from observational data is essential in fields such as medicine, economics and social sciences, where privacy concerns are paramount. We propose a general, model-agnostic framework for differentially private estimation of average treatment effects (ATE) that avoids strong structural assumptions on the data-generating process or the models used to estimate propensity scores and conditional outcomes. In contrast to prior work, which enforces differential privacy by directly privatizing these nuisance components and results in a privacy cost that scales with model complexity, our approach decouples nuisance estimation from privacy protection. This separation allows the use of flexible, state-of-the-art black-box models, while differential privacy is achieved by perturbing only predictions and aggregation steps within a fold-splitting scheme with ensemble techniques. We instantiate the framework for three classical estimators – the G-formula, inverse propensity weighting (IPW), and augmented IPW (AIPW) – and provide formal utility and privacy guarantees. Empirical results show that our methods maintain competitive performance under realistic privacy budgets. We further extend our framework to support meta-analysis of multiple private ATE estimates. Our results bridge a critical gap between causal inference and privacy-preserving data analysis.
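As a reminder of what the three classical estimators named above compute, here is a minimal, non-private sketch. It takes the nuisance predictions (propensity scores `e`, outcome predictions `mu1` and `mu0`) as given inputs and adds no noise, so it illustrates only the textbook G-formula, IPW, and AIPW formulas, not the paper's differentially private procedure; the function name and argument layout are assumptions.

```python
def ate_estimates(y, t, e, mu1, mu0):
    """Textbook (non-private) ATE estimators.

    y   : observed outcomes
    t   : binary treatment indicators (0 or 1)
    e   : estimated propensity scores P(T=1 | X), strictly between 0 and 1
    mu1 : predicted outcomes under treatment
    mu0 : predicted outcomes under control
    """
    n = len(y)
    # G-formula: average difference of the outcome-model predictions.
    g_formula = sum(m1 - m0 for m1, m0 in zip(mu1, mu0)) / n
    # IPW: reweight observed outcomes by inverse propensity.
    ipw = sum(
        ti * yi / ei - (1 - ti) * yi / (1 - ei)
        for yi, ti, ei in zip(y, t, e)
    ) / n
    # AIPW: G-formula plus an inverse-propensity-weighted residual correction.
    aipw = sum(
        m1 - m0 + ti * (yi - m1) / ei - (1 - ti) * (yi - m0) / (1 - ei)
        for yi, ti, ei, m1, m0 in zip(y, t, e, mu1, mu0)
    ) / n
    return {"g_formula": g_formula, "ipw": ipw, "aipw": aipw}


if __name__ == "__main__":
    print(ate_estimates(
        y=[3.0, 1.0, 4.0, 2.0],
        t=[1, 0, 1, 0],
        e=[0.5, 0.5, 0.5, 0.5],
        mu1=[3.0, 3.0, 4.0, 4.0],
        mu0=[1.0, 1.0, 2.0, 2.0],
    ))
```

When the outcome models fit the observed outcomes exactly, the residual correction in AIPW vanishes and it coincides with the G-formula.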
Article 1058
Title@2025-05-27 (2): UOD: Unseen Object Detection in 3D Point Cloud
Title: UOD: Unseen Object Detection in 3D Point Cloud | UOD: Unsichtbare Objekterkennung in 3D-Punkt-Cloud | UOD: 3D点云中未见物体探测 2401.03846v2 |
Authors: Hyunjun Choi, Daeho Um, Hawook Jeong
Existing 3D object detectors encounter extreme challenges in localizing unseen 3D objects and recognizing them as unseen, which is a crucial technology in autonomous driving in the wild. To address these challenges, we propose practical methods to enhance the performance of 3D detection and Out-Of-Distribution (OOD) classification for unseen objects. The proposed methods include anomaly sample augmentation, learning of universal objectness, learning of detecting unseen objects, and learning of distinguishing unseen objects. To demonstrate the effectiveness of our approach, we propose the KITTI Misc benchmark and two additional synthetic OOD benchmarks: the Nuscenes OOD benchmark and the SUN-RGBD OOD benchmark. The proposed methods consistently enhance performance by a large margin across all existing methods, giving insight for future work on unseen 3D object detection in the wild.
Article 1059
Title@2025-05-27 (2): Aggregation Buffer: Revisiting DropEdge with a New Parameter Block
Title: Aggregation Buffer: Revisiting DropEdge with a New Parameter Block | Aggregation Buffer: DropEdge mit einem neuen Parameterblock erneut aufrufen | 聚合缓冲:用新参数块重新检查下坡面 2505.20840v1 |
Authors: Dooho Lee, Myeong Kong, Sagad Hamid, Cheonwoo Lee, Jaemin Yoo
We revisit DropEdge, a data augmentation technique for GNNs which randomly removes edges to expose diverse graph structures during training. Although it is a promising approach for reducing overfitting to specific connections in the graph, we observe that its potential performance gain in supervised learning tasks is significantly limited. To understand why, we provide a theoretical analysis showing that the limited performance of DropEdge comes from a fundamental limitation that exists in many GNN architectures. Based on this analysis, we propose Aggregation Buffer, a parameter block specifically designed to improve the robustness of GNNs by addressing the limitation of DropEdge. Our method is compatible with any GNN model, and shows consistent performance improvements on multiple datasets. Moreover, our method effectively addresses well-known problems such as degree bias or structural disparity as a unifying solution. Code and datasets are available at https://github.com/dooho00/agg-buffer.
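The DropEdge augmentation described above can be sketched in a few lines. This is a generic illustration of random edge removal on an edge list, not the Aggregation Buffer method and not code from the linked repository; the function name, the per-edge independent drop probability, and the representation of each undirected edge as a single tuple are assumptions.

```python
import random


def drop_edge(edges, drop_prob, rng):
    """Return a copy of `edges` with each edge removed independently
    with probability `drop_prob` (resampled at every training step)."""
    # rng.random() lies in [0.0, 1.0), so drop_prob=0 keeps every edge
    # and drop_prob=1 removes every edge.
    return [edge for edge in edges if rng.random() >= drop_prob]


if __name__ == "__main__":
    graph = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    rng = random.Random(0)  # seeded only to make the demo repeatable
    for epoch in range(3):
        # Each epoch sees a different random subgraph.
        print(epoch, drop_edge(graph, 0.4, rng))
```

Only the graph passed to the model during training is perturbed; at evaluation time the full edge list is normally used.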
Article 1060
Title@2025-05-27 (2): Tuning LLM Judge Design Decisions for 1/1000 of the Cost
Title: Tuning LLM Judge Design Decisions for 1/1000 of the Cost | Tuning LLM Richter Design Entscheidungen für 1/1000 der Kosten | 1 000美元费用1 000美元法官设计决定 2501.17178v4 |
Authors: David Salinas, Omar Swelam, Frank Hutter
Evaluating Large Language Models (LLMs) often requires costly human annotations. To address this, LLM-based judges have been proposed, which compare the outputs of two LLMs, enabling the ranking of models without human intervention. While several approaches have been proposed, many confounding factors are present between different papers. For instance, the model, the prompt, and other hyperparameters are typically changed at the same time, making apples-to-apples comparisons challenging. In this paper, we propose to systematically analyze and tune the hyperparameters of LLM judges. To alleviate the high cost of evaluating a judge, we propose to leverage a multi-objective, multi-fidelity approach, which allows us to find judges that trade accuracy for cost and also significantly reduces the cost of the search. Our method identifies judges that not only outperform existing benchmarks in accuracy and cost-efficiency but also utilize open-weight models, ensuring greater accessibility and reproducibility. The code to reproduce our experiments is available at https://github.com/geoalgo/judgetuning.
Article 1061
Title@2025-05-27 (2): HAD: Hybrid Architecture Distillation Outperforms Teacher in Genomic Sequence Modeling
Title: HAD: Hybrid Architecture Distillation Outperforms Teacher in Genomic Sequence Modeling | HAD: Hybride Architektur Destillation übertrifft Lehrer in genomischer Sequenzmodellierung | HAD:混合结构蒸馏(混合结构蒸馏) 2505.20836v1 |
Authors: Hexiong Yang, Mingrui Chen, Huaibo Huang, Junxian Duan, Jie Cao, Zhen Zhou, Ran He
Inspired by the great success of Masked Language Modeling (MLM) in the natural language domain, the paradigm of self-supervised pre-training and fine-tuning has also achieved remarkable progress in the field of DNA sequence modeling. However, previous methods often relied on massive pre-training data or large-scale base models with huge parameter counts, imposing a significant computational burden. To address this, many works attempted to use more compact models to achieve similar outcomes but still fell short by a considerable margin. In this work, we propose a Hybrid Architecture Distillation (HAD) approach, leveraging both distillation and reconstruction tasks for more efficient and effective pre-training. Specifically, we employ the NTv2-500M as the teacher model and devise a grouping masking strategy to align the feature embeddings of visible tokens while concurrently reconstructing the invisible tokens during MLM pre-training. To validate the effectiveness of our proposed method, we conducted comprehensive experiments on the Nucleotide Transformer Benchmark and Genomic Benchmark. Compared to models with similar parameter counts, our model achieved excellent performance. More surprisingly, on some sub-tasks it even surpassed the teacher model, the nominal ceiling for distillation, which is more than 500$\times$ larger. Lastly, we utilize t-SNE for more intuitive visualization, which shows that our model can gain a sophisticated understanding of the intrinsic representation pattern in genomic sequences.
Article 1062
Title@2025-05-27 (2): Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens
Title: Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens | Jenseits von Semantik: Die unvernünftige Wirksamkeit von vernünftigen Zwischenmarken | 超越语义:无理性中肯的不合理效力 2505.13775v2 |
Authors: Kaya Stechly, Karthik Valmeekam, Atharva Gundawar, Vardhan Palod, Subbarao Kambhampati
Recent impressive results from large reasoning models have been interpreted as a triumph of Chain of Thought (CoT), and especially of the process of training on CoTs sampled from base LLMs in order to help find new reasoning patterns. In this paper, we critically examine that interpretation by investigating how the semantics of intermediate tokens (often anthropomorphized as “thoughts” or reasoning traces, and claimed to display behaviors like backtracking and self-verification) actually influence model performance. We train transformer models on formally verifiable reasoning traces and solutions, constraining both intermediate steps and final outputs to align with those of a formal solver (in our case, A* search). By constructing a formal interpreter of the semantics of our problems and intended algorithm, we systematically evaluate not only solution accuracy but also the correctness of intermediate traces, thus allowing us to evaluate whether the latter causally influences the former. We notice that, despite significant improvements on the solution-only baseline, models trained on entirely correct traces still produce invalid reasoning traces when arriving at correct solutions. To further show that trace accuracy is only loosely connected to solution accuracy, we then train models on noisy, corrupted traces which have no relation to the specific problem each is paired with, and find that performance not only remains largely consistent with models trained on correct data, but in some cases improves upon it and generalizes more robustly on out-of-distribution tasks. These results challenge the assumption that intermediate tokens or “Chains of Thought” induce predictable reasoning behaviors and caution against anthropomorphizing such outputs or over-interpreting them (despite their mostly correct forms) as evidence of human-like or algorithmic behaviors in language models.
Article 1063
Title@2025-05-27 (2): Concentration Distribution Learning from Label Distributions
Title: Concentration Distribution Learning from Label Distributions | Konzentrationsverteilung Lernen von Etikettenverteilungen | 从标签分发中学习 2505.21576v1 |
Authors: Jiawei Tang, Yuheng Jia
Label distribution learning (LDL) is an effective method to predict the relative label description degree (a.k.a. label distribution) of a sample. However, the label distribution is not a complete representation of an instance because it overlooks the absolute intensity of each label. Specifically, it is impossible to obtain the total description degree of hidden labels that are not in the label space, which leads to loss of information and confusion between instances. To solve this problem, we introduce a new concept named background concentration to serve as the absolute description degree term of the label distribution and incorporate it into the LDL process, forming the improved paradigm of concentration distribution learning. Moreover, we propose a novel model based on probabilistic methods and neural networks to learn label distributions and background concentrations from existing LDL datasets. Extensive experiments show that the proposed approach is able to extract background concentrations from label distributions while producing more accurate prediction results than the state-of-the-art LDL methods. The code is available at https://github.com/seutjw/CDL-LD.
Article 1064
Title@2025-05-27 (2): The Third Pillar of Causal Analysis? A Measurement Perspective on Causal Representations
Title: The Third Pillar of Causal Analysis? A Measurement Perspective on Causal Representations | Die dritte Säule der Kausalanalyse? Eine Messperspektive auf Kausaldarstellungen | Causal 分析的第三个支柱? Causal 代表比例的衡量观点 2505.17708v2 |
Authors: Dingling Yao, Shimeng Huang, Riccardo Cadei, Kun Zhang, Francesco Locatello
Causal reasoning and discovery, two fundamental tasks of causal analysis, often face challenges in applications due to the complexity, noisiness, and high-dimensionality of real-world data. Despite recent progress in identifying latent causal structures using causal representation learning (CRL), what makes learned representations useful for causal downstream tasks and how to evaluate them are still not well understood. In this paper, we reinterpret CRL using a measurement model framework, where the learned representations are viewed as proxy measurements of the latent causal variables. Our approach clarifies the conditions under which learned representations support downstream causal reasoning and provides a principled basis for quantitatively assessing the quality of representations using a new Test-based Measurement EXclusivity (T-MEX) score. We validate T-MEX across diverse causal inference scenarios, including numerical simulations and real-world ecological video analysis, demonstrating that the proposed framework and corresponding score effectively assess the identification of learned representations and their usefulness for causal downstream tasks.
Article 1065
Title@2025-05-27 (2): HybridLinker: Topology-Guided Posterior Sampling for Enhanced Diversity and Validity in 3D Molecular Linker Generation
Title: HybridLinker: Topology-Guided Posterior Sampling for Enhanced Diversity and Validity in 3D Molecular Linker Generation | HybridLinker: Topologie-geführte hintere Probenahme für verbesserte Diversität und Validität in der 3D-Molekularlinker-Generation | GlubLinker: 3D 分子联系器生成中加强多样性和有效性的地形学-指导外表抽样 2502.17349v3 |
Authors: Minyeong Hwang, Ziseok Lee, Kwang-Soo Kim, Kyungsu Kim, Eunho Yang
Linker generation is critical in drug discovery applications such as lead optimization and PROTAC design, where molecular fragments are assembled into diverse drug candidates via a molecular linker. Existing methods fall into point cloud-free and point cloud-aware categories based on their use of fragments’ 3D poses alongside their topologies in sampling the linker’s topology. Point cloud-free models prioritize sample diversity but suffer from lower validity due to overlooking fragments’ spatial constraints, while point cloud-aware models ensure higher validity but restrict diversity by enforcing strict spatial constraints. To overcome these trade-offs without additional training, we propose HybridLinker, a framework that enhances point cloud-aware inference by providing diverse bonding topologies from a pretrained point cloud-free model as guidance. At its core, we propose LinkerDPS, the first diffusion posterior sampling (DPS) method operating across point cloud-free and point cloud-aware spaces, bridging molecular topology with 3D point clouds via an energy-inspired function. By transferring the diverse sampling distribution of point cloud-free models into the point cloud-aware distribution, HybridLinker significantly surpasses baselines, improving both validity and diversity in foundational molecular design and applied drug optimization tasks, and establishing a new DPS framework in molecular domains beyond imaging.
Article 1066
Title@2025-05-27 (2): Do We Need All the Synthetic Data? Towards Targeted Synthetic Image Augmentation via Diffusion Models
Title: Do We Need All the Synthetic Data? Towards Targeted Synthetic Image Augmentation via Diffusion Models | Brauchen wir alle synthetischen Daten? Auf dem Weg zu einer gezielten Synthetischen Bildvergrößerung über Diffusionsmodelle | 我们需要所有合成数据吗?通过扩散模型实现有针对性的合成图像增强 2505.21574v1 |
Authors: Dang Nguyen, Jiping Li, Jinghao Zheng, Baharan Mirzasoleiman
Synthetically augmenting training datasets with diffusion models has been an effective strategy for improving generalization of image classifiers. However, existing techniques struggle to ensure the diversity of generation and increase the size of the data by up to 10-30x to improve the in-distribution performance. In this work, we show that synthetically augmenting part of the data that is not learned early in training outperforms augmenting the entire dataset. By analyzing a two-layer CNN, we prove that this strategy improves generalization by promoting homogeneity in feature learning speed without amplifying noise. Our extensive experiments show that by augmenting only 30%-40% of the data, our method boosts the performance by up to 2.8% in a variety of scenarios, including training ResNet, ViT and DenseNet on CIFAR-10, CIFAR-100, and TinyImageNet, with a range of optimizers including SGD and SAM. Notably, our method applied with SGD outperforms the SOTA optimizer, SAM, on CIFAR-100 and TinyImageNet. It can also easily stack with existing weak and strong augmentation strategies to further boost the performance.
Article 1067
Title@2025-05-27 (2): Spectral-inspired Neural Operator for Data-efficient PDE Simulation in Physics-agnostic Regimes
Title: Spectral-inspired Neural Operator for Data-efficient PDE Simulation in Physics-agnostic Regimes | Spektral-inspirierter Neuraloperator für dateneffiziente PDE-Simulation in physik-agnostischen Regimes | 物理 – – 不可知系统数据高效PDE模拟光导神经操作器 2505.21573v1 |
Authors: Han Wan, Rui Zhang, Hao Sun
Partial differential equations (PDEs) govern the spatiotemporal evolution of various physical systems. Classical numerical solvers, while accurate, require fine discretization and full knowledge of the governing PDEs, limiting their applicability when the physics is unknown or fast inference is required. Data-driven neural PDE solvers alleviate these constraints by learning from data but demand large training datasets and perform poorly in data-scarce regimes. Physics-aware methods mitigate data requirements by incorporating physical knowledge yet rely on known PDE terms or local numerical schemes, restricting their ability to handle unknown or globally coupled systems. In this work, we propose the Spectral-inspired Neural Operator (SINO), a novel framework that learns PDE operators from limited trajectories (as few as 2-5), without any known PDE terms. SINO operates in the frequency domain and introduces a Frequency-to-Vector module to learn spectral representations analogous to derivative multipliers. To model nonlinear physical interactions, we design a nonlinear operator block that includes a $\Pi$-Block with low-pass filtering to prevent aliasing. Finally, we introduce an operator distillation technique to distill the trained model for efficient inference. SINO achieves state-of-the-art results across multiple PDE benchmarks, demonstrating strong discretization invariance and robust generalization to out-of-distribution initial conditions. To our knowledge, SINO is the first physics-aware method capable of accurately simulating globally coupled systems (e.g., the Navier-Stokes equations) from limited data without any explicit PDE terms.
Article 1068
Title@2025-05-27 (2): Convergence of Clipped-SGD for Convex $(L_0,L_1)$-Smooth Optimization with Heavy-Tailed Noise
Title: Convergence of Clipped-SGD for Convex $(L_0,L_1)$-Smooth Optimization with Heavy-Tailed Noise | Konvergenz von Clipped-SGD für Convex $(L_0,L_1)$-Smooth-Optimierung mit schwerfälligem Lärm | 使用 Cllipped-SGD 组合(L_0,L_1) $- 与重故障噪音平滑优化 2505.20817v1 |
Authors: Savelii Chezhegov, Aleksandr Beznosikov, Samuel Horváth, Eduard Gorbunov
Gradient clipping is a widely used technique in Machine Learning and Deep Learning (DL), known for its effectiveness in mitigating the impact of heavy-tailed noise, which frequently arises in the training of large language models. Additionally, first-order methods with clipping, such as Clip-SGD, exhibit stronger convergence guarantees than SGD under the $(L_0,L_1)$-smoothness assumption, a property observed in many DL tasks. However, the high-probability convergence of Clip-SGD under both assumptions – heavy-tailed noise and $(L_0,L_1)$-smoothness – has not been fully addressed in the literature. In this paper, we bridge this critical gap by establishing the first high-probability convergence bounds for Clip-SGD applied to convex $(L_0,L_1)$-smooth optimization with heavy-tailed noise. Our analysis extends prior results by recovering known bounds for the deterministic case and the stochastic setting with $L_1 = 0$ as special cases. Notably, our rates avoid exponentially large factors and do not rely on restrictive sub-Gaussian noise assumptions, significantly broadening the applicability of gradient clipping.
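For readers unfamiliar with the update being analysed, the following is a minimal sketch of one step of SGD with norm clipping, $x \leftarrow x - \gamma \min(1, \lambda / \lVert g \rVert)\, g$. It shows only the standard clipping operator; the step size and clipping level used in the paper's analysis are not given in the abstract, and the names `lr` and `clip_level` are assumptions.

```python
import math


def clipped_sgd_step(x, grad, lr, clip_level):
    """One SGD step in which the stochastic gradient is rescaled so that
    its Euclidean norm never exceeds `clip_level`."""
    norm = math.sqrt(sum(g * g for g in grad))
    # Gradients inside the ball are left untouched; larger ones are shrunk
    # onto its surface, which bounds the effect of heavy-tailed noise.
    scale = 1.0 if norm <= clip_level else clip_level / norm
    return [xi - lr * scale * gi for xi, gi in zip(x, grad)]


if __name__ == "__main__":
    # A gradient of norm 5 is scaled down to norm 1 before the step.
    print(clipped_sgd_step([0.0, 0.0], [3.0, 4.0], lr=1.0, clip_level=1.0))
```

Because the direction of the gradient is preserved and only its length is capped, a single extreme noise sample can move the iterate by at most `lr * clip_level`.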
Article 1069
Title@2025-05-27 (2): Mixture of Low Rank Adaptation with Partial Parameter Sharing for Time Series Forecasting
Title: Mixture of Low Rank Adaptation with Partial Parameter Sharing for Time Series Forecasting | Mischung aus Low-Rank-Anpassung mit Teilparameter-Sharing für Zeitreihen-Prognose | 低级别适应与时间序列预测部分参数共享混合 2505.17872v2 |
Authors: Licheng Pan, Zhichao Chen, Haoxuan Li, Guangyi Liu, Zhijian Xu, Zhaoran Liu, Hao Wang, Ying Wei
Multi-task forecasting has become the standard approach for time-series forecasting (TSF). However, we show that it suffers from an Expressiveness Bottleneck, where predictions at different time steps share the same representation, leading to unavoidable errors even with optimal representations. To address this issue, we propose a two-stage framework: first, pre-train a foundation model for one-step-ahead prediction; then, adapt it using step-specific LoRA modules. This design enables the foundation model to handle any number of forecast steps while avoiding the expressiveness bottleneck. We further introduce the Mixture-of-LoRA (MoLA) model, which employs adaptively weighted LoRA experts to achieve partial parameter sharing across steps. This approach enhances both efficiency and forecasting performance by exploiting interdependencies between forecast steps. Experiments show that MoLA significantly improves model expressiveness and outperforms state-of-the-art time-series forecasting methods. Code is available at https://anonymous.4open.science/r/MoLA-BC92.
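The idea of weighting several LoRA experts on top of a shared frozen layer can be sketched generically as $y = Wx + \sum_k g_k B_k A_k x$. The abstract does not specify how MoLA computes its adaptive weights or where the adapters sit, so the hand-set `gates`, the function names, and the single-layer setup below are all assumptions; this is a LoRA-style illustration, not the authors' architecture.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def mixture_of_lora(x, w, experts, gates):
    """Frozen base layer plus a gate-weighted sum of low-rank updates.

    w       : frozen base weight matrix (d_out x d_in)
    experts : list of (A, B) pairs with A of shape (r x d_in), B of (d_out x r)
    gates   : one mixing weight per expert (hypothetical fixed values here;
              a step-dependent gating rule would supply them in practice)
    """
    out = matvec(w, x)
    for (a, b), gate in zip(experts, gates):
        low_rank = matvec(b, matvec(a, x))  # B (A x), rank at most r
        out = [o + gate * d for o, d in zip(out, low_rank)]
    return out


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    expert_1 = ([[1.0, 1.0]], [[1.0], [0.0]])
    expert_2 = ([[1.0, -1.0]], [[0.0], [1.0]])
    # Different gate vectors give different effective weights per forecast step
    # while the base matrix and the experts themselves are shared.
    print(mixture_of_lora([1.0, 2.0], base, [expert_1, expert_2], [0.5, 0.5]))
```

Setting all gates to zero recovers the frozen base layer, which is what makes the adapters cheap to add or swap per forecast step.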
Article 1070
Title@2025-05-27 (2): Interpretable Credit Default Prediction with Ensemble Learning and SHAP
Title: Interpretable Credit Default Prediction with Ensemble Learning and SHAP | Interpretierbare Credit Default Vorhersage mit Ensemble Learning und SHAP | 组合学习和SHAP的可解释信用默认预测 2505.20815v1 |
Authors: Shiqi Yang, Ziyi Huang, Wengran Xiao, Xinyu Shen
This study focuses on the problem of credit default prediction, builds a modeling framework based on machine learning, and conducts comparative experiments on a variety of mainstream classification algorithms. Through preprocessing, feature engineering, and model training of the Home Credit dataset, the performance of multiple models including logistic regression, random forest, XGBoost, LightGBM, etc. in terms of accuracy, precision, and recall is evaluated. The results show that the ensemble learning method has obvious advantages in predictive performance, especially in dealing with complex nonlinear relationships between features and data imbalance problems. It shows strong robustness. At the same time, the SHAP method is used to analyze the importance and dependency of features, and it is found that the external credit score variable plays a dominant role in model decision making, which helps to improve the model’s interpretability and practical application value. The research results provide effective reference and technical support for the intelligent development of credit risk control systems.
Article 1071
Title@2025-05-27 (2): Geometry Aware Operator Transformer as an Efficient and Accurate Neural Surrogate for PDEs on Arbitrary Domains
Title: Geometry Aware Operator Transformer as an Efficient and Accurate Neural Surrogate for PDEs on Arbitrary Domains | Geometry Aware Operator Transformer als effizientes und präzises Neural Surrogate für PDEs auf willkürlichen Domains | 操作者变异器作为任意域中PDEs的高效和准确神经外壳 2505.18781v2 |
Authors: Shizheng Wen, Arsh Kumbhat, Levi Lingsch, Sepehr Mousavi, Yizhou Zhao, Praveen Chandrashekar, Siddhartha Mishra
The very challenging task of learning solution operators of PDEs on arbitrary domains accurately and efficiently is of vital importance to engineering and industrial simulations. Despite the existence of many operator learning algorithms to approximate such PDEs, we find that accurate models are not necessarily computationally efficient and vice versa. We address this issue by proposing a geometry aware operator transformer (GAOT) for learning PDEs on arbitrary domains. GAOT combines novel multiscale attentional graph neural operator encoders and decoders, together with geometry embeddings and (vision) transformer processors, to accurately map information about the domain and the inputs into a robust approximation of the PDE solution. Multiple innovations in the implementation of GAOT also ensure computational efficiency and scalability. We demonstrate this significant gain in both accuracy and efficiency of GAOT over several baselines on a large number of learning tasks from a diverse set of PDEs, including achieving state-of-the-art performance on a large-scale three-dimensional industrial CFD dataset.
Article 1072
Title@2025-05-27 (2): Thickness-aware E(3)-Equivariant 3D Mesh Neural Networks
Title: Thickness-aware E(3)-Equivariant 3D Mesh Neural Networks | Dicke bewusst E(3)-Equivariante 3D-Mesh-Neurale Netze | E(3)-等离 3D 3D 气象神经网络 2505.21572v1 |
Authors: Sungwon Kim, Namkyeong Lee, Yunyoung Doh, Seungmin Shin, Guimok Cho, Seung-Won Jeon, Sangkook Kim, Chanyoung Park
Mesh-based 3D static analysis methods have recently emerged as efficient alternatives to traditional computational numerical solvers, significantly reducing computational costs and runtime for various physics-based analyses. However, these methods primarily focus on surface topology and geometry, often overlooking the inherent thickness of real-world 3D objects, which exhibits high correlations and similar behavior between opposing surfaces. This limitation arises from the disconnected nature of these surfaces and the absence of internal edge connections within the mesh. In this work, we propose a novel framework, the Thickness-aware E(3)-Equivariant 3D Mesh Neural Network (T-EMNN), that effectively integrates the thickness of 3D objects while maintaining the computational efficiency of surface meshes. Additionally, we introduce data-driven coordinates that encode spatial information while preserving E(3)-equivariance or invariance properties, ensuring consistent and robust analysis. Evaluations on a real-world industrial dataset demonstrate the superior performance of T-EMNN in accurately predicting node-level 3D deformations, effectively capturing thickness effects while maintaining computational efficiency.
Article 1073
Title@2025-05-27 (2): Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs
Title: Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs | Schrittweise adaptive Integration von überwachtem Feinabstimmungs- und Verstärkungslernen für aufgabenspezifische LLMs | 监督特定任务专责性微调和强化学习的渐进式适应性整合 2505.13026v2 |
Authors: Jack Chen, Fazhong Liu, Naruto Liu, Yuhan Luo, Erqu Qin, Harry Zheng, Tian Dong, Haojin Zhu, Yan Meng, Xiao Wang
Large language models (LLMs) excel at mathematical reasoning and logical problem-solving. The current popular training paradigms primarily use supervised fine-tuning (SFT) and reinforcement learning (RL) to enhance the models’ reasoning abilities. However, when using SFT or RL alone, there are respective challenges: SFT may suffer from overfitting, while RL is prone to mode collapse. The state-of-the-art methods have proposed hybrid training schemes. However, static switching faces challenges such as poor generalization across different tasks and high dependence on data quality. In response to these challenges, inspired by the curriculum learning-quiz mechanism in human reasoning cultivation, we propose SASR, a step-wise adaptive hybrid training framework that theoretically unifies SFT and RL and dynamically balances the two throughout optimization. SASR uses SFT for initial warm-up to establish basic reasoning skills, and then uses an adaptive dynamic adjustment algorithm based on gradient norm and divergence relative to the original distribution to seamlessly integrate SFT with the online RL method GRPO. By monitoring the training status of LLMs and adjusting the training process in sequence, SASR ensures a smooth transition between training schemes, maintaining core reasoning abilities while exploring different paths. Experimental results demonstrate that SASR outperforms SFT, RL, and static hybrid training methods.
Article 1074
Title@2025-05-27 (2): Simple yet Effective Graph Distillation via Clustering
Title: Simple yet Effective Graph Distillation via Clustering | Einfache und dennoch effektive Graphendestillation über Clustering | 通过集群进行简单而有效的图形蒸馏 2505.20807v1 |
Authors: Yurui Lai, Taiyan Zhang, Renchi Yang
Despite plentiful successes achieved by graph representation learning in various domains, the training of graph neural networks (GNNs) still remains tenaciously challenging due to the tremendous computational overhead needed for sizable graphs in practice. Recently, graph data distillation (GDD), which seeks to distill large graphs into compact and informative ones, has emerged as a promising technique to enable efficient GNN training. However, most existing GDD works rely on heuristics that align model gradients or representation distributions on condensed and original graphs, leading to compromised result quality, expensive training for distilling large graphs, or both. Motivated by this, this paper presents an efficient and effective GDD approach, ClustGDD. Under the hood, ClustGDD resorts to synthesizing the condensed graph and node attributes through fast and theoretically-grounded clustering that minimizes the within-cluster sum of squares and maximizes the homophily on the original graph. The fundamental idea is inspired by our empirical and theoretical findings unveiling the connection between clustering and empirical condensation quality using Fréchet Inception Distance, a well-known quality metric for synthetic images. Furthermore, to mitigate the adverse effects caused by the homophily-based clustering, ClustGDD refines the nodal attributes of the condensed graph with a small augmentation learned via class-aware graph sampling and consistency loss. Our extensive experiments exhibit that GNNs trained over condensed graphs output by ClustGDD consistently achieve superior or comparable performance to state-of-the-art GDD methods in terms of node classification on five benchmark datasets, while being orders of magnitude faster.
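As a rough illustration of the clustering idea described in this abstract (not the authors' ClustGDD code), the sketch below condenses a toy graph from a given hard cluster assignment. Condensed node attributes are cluster means, condensed edge weights count the original edges between clusters, and the within-cluster sum of squares is reported. The function name `condense`, the toy data, and the use of a fixed assignment with non-empty clusters are all assumptions made for illustration.

```python
def condense(features, edges, assign, k):
    """Condense a graph given a hard cluster assignment (illustrative only).

    features: list of per-node feature vectors
    edges:    list of undirected (u, v) pairs
    assign:   assign[i] is the cluster index of node i (every cluster non-empty)
    k:        number of clusters (condensed nodes)
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, c in zip(features, assign):
        counts[c] += 1
        for j, v in enumerate(x):
            sums[c][j] += v
    # Condensed attributes: the mean feature vector of each cluster.
    centroids = [[s / counts[c] for s in sums[c]] for c in range(k)]

    # Within-cluster sum of squares, the quantity k-means minimizes.
    wcss = sum(
        (v - centroids[c][j]) ** 2
        for x, c in zip(features, assign)
        for j, v in enumerate(x)
    )

    # Condensed adjacency: number of original edges between each cluster pair.
    adj = [[0] * k for _ in range(k)]
    for u, v in edges:
        cu, cv = assign[u], assign[v]
        adj[cu][cv] += 1
        if cu != cv:
            adj[cv][cu] += 1
    return centroids, adj, wcss


features = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
edges = [(0, 1), (2, 3), (1, 2)]
centroids, adj, wcss = condense(features, edges, assign=[0, 0, 1, 1], k=2)
```

In this toy case four nodes collapse into two condensed nodes; a real pipeline would also need to choose the assignment itself, which the sketch takes as given.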
Article 1075
Title@2025-05-27 (2): FCOS: A Two-Stage Recoverable Model Pruning Framework for Automatic Modulation Recognition
Title: FCOS: A Two-Stage Recoverable Model Pruning Framework for Automatic Modulation Recognition | FCOS: Ein zweistufiges, wiederherstellbares Modell-Beschneidungs-Framework für die automatische Modulationserkennung | FCOS: 自动调整识别的双层可回收模型保护框架 2505.21571v1 |
Authors: Yao Lu, Tengfei Ma, Zeyu Wang, Zhuangzhi Chen, Dongwei Xu, Yun Lin, Qi Xuan, Guan Gui
With the rapid development of wireless communications and the growing complexity of digital modulation schemes, traditional manual modulation recognition methods struggle to extract reliable signal features and meet real-time requirements in modern scenarios. Recently, deep learning based Automatic Modulation Recognition (AMR) approaches have greatly improved classification accuracy. However, their large model sizes and high computational demands hinder deployment on resource-constrained devices. Model pruning provides a general approach to reduce model complexity, but existing weight, channel, and layer pruning techniques each present a trade-off between compression rate, hardware acceleration, and accuracy preservation. To this end, in this paper, we introduce FCOS, a novel Fine-to-COarse two-Stage pruning framework that combines channel-level pruning with layer-level collapse diagnosis to achieve extreme compression, high performance and efficient inference. In the first stage of FCOS, hierarchical clustering and parameter fusion are applied to channel weights to achieve channel-level pruning. Then a Layer Collapse Diagnosis (LaCD) module uses linear probing to identify layer collapse and removes the collapsed layers due to high channel compression ratio. Experiments on multiple AMR benchmarks demonstrate that FCOS outperforms existing channel and layer pruning methods. Specifically, FCOS achieves 95.51% FLOPs reduction and 95.31% parameter reduction while still maintaining performance close to the original ResNet56, with only a 0.46% drop in accuracy on Sig2019-12. Code is available at https://github.com/yaolu-zjut/FCOS.
Article 1076
Title@2025-05-27 (2): Quantum Machine Learning in Healthcare: Evaluating QNN and QSVM Models
Title: Quantum Machine Learning in Healthcare: Evaluating QNN and QSVM Models | Quantum Machine Learning in Healthcare: Bewertung von QNN- und QSVM-Modellen | QNN和QSVM模型评估 QNN和QSVM模型 2505.20804v1 |
Authors: Antonio Tudisco, Deborah Volpe, Giovanna Turvani
Effective and accurate diagnosis of diseases such as cancer, diabetes, and heart failure is crucial for timely medical intervention and improving patient survival rates. Machine learning has revolutionized diagnostic methods in recent years by developing classification models that detect diseases based on selected features. However, these classification tasks are often highly imbalanced, limiting the performance of classical models. Quantum models offer a promising alternative, exploiting their ability to express complex patterns by operating in a higher-dimensional computational space through superposition and entanglement. These unique properties make quantum models potentially more effective in addressing the challenges of imbalanced datasets. This work evaluates the potential of quantum classifiers in healthcare, focusing on Quantum Neural Networks (QNNs) and Quantum Support Vector Machines (QSVMs), comparing them with popular classical models. The study is based on three well-known healthcare datasets – Prostate Cancer, Heart Failure, and Diabetes. The results indicate that QSVMs outperform QNNs across all datasets, owing to the QNNs’ susceptibility to overfitting. Furthermore, quantum models show the ability to outperform classical models in scenarios with high dataset imbalance. Although preliminary, these findings highlight the potential of quantum models in healthcare classification tasks and lead the way for further research in this domain.
Article 1077
Title@2025-05-27 (2): Sentiment Reasoning for Healthcare
Title: Sentiment Reasoning for Healthcare | Sentiment Reasoning für die Gesundheitsversorgung | 保健的情感理由 2407.21054v4 |
Authors: Khai-Nguyen Nguyen, Khai Le-Duc, Bach Phan Tat, Duy Le, Long Vo-Dang, Truong-Son Hy
Transparency in AI healthcare decision-making is crucial. By incorporating rationales that explain the reason for each predicted label, users can understand the reasoning of Large Language Models (LLMs) and make better decisions. In this work, we introduce a new task, Sentiment Reasoning, for both speech and text modalities, together with our proposed multimodal multitask framework and the world’s largest multimodal sentiment analysis dataset. Sentiment Reasoning is an auxiliary task in sentiment analysis where the model both predicts the sentiment label and generates the rationale behind it based on the input transcript. Our study, conducted on both human transcripts and Automatic Speech Recognition (ASR) transcripts, shows that Sentiment Reasoning helps improve model transparency by providing rationales for model predictions with quality semantically comparable to humans while also improving the model’s classification performance (+2% increase in both accuracy and macro-F1) via rationale-augmented fine-tuning. Also, there is no significant difference in the semantic quality of generated rationales between human and ASR transcripts. All code, data (five languages: Vietnamese, English, Chinese, German, and French) and models are published online: https://github.com/leduckhai/Sentiment-Reasoning
Article 1078
Title@2025-05-27 (2): Leaner Transformers: More Heads, Less Depth
Title: Leaner Transformers: More Heads, Less Depth | Leaner Transformer: Mehr Köpfe, weniger Tiefe | 皮质变形器: 更多的头, 更少深度 2505.20802v1 |
Authors: Hemanth Saratchandran, Damien Teney, Simon Lucey
Transformers have reshaped machine learning by utilizing attention mechanisms to capture complex patterns in large datasets, leading to significant improvements in performance. This success has contributed to the belief that “bigger means better”, leading to ever-increasing model sizes. This paper challenges this ideology by showing that many existing transformers might be unnecessarily oversized. We discover a theoretical principle that redefines the role of multi-head attention. An important benefit of the multiple heads is in improving the conditioning of the attention block. We exploit this theoretical insight and redesign popular architectures with an increased number of heads. The improvement in the conditioning proves so significant in practice that model depth can be decreased, reducing the parameter count by up to 30-50% while maintaining accuracy. We obtain consistent benefits across a variety of transformer-based architectures of various scales, on tasks in computer vision (ImageNet-1k) as well as language and sequence modeling (GLUE benchmark, TinyStories, and the Long-Range Arena benchmark).
Article 1079
Title@2025-05-27 (2): Multi-VQC: A Novel QML Approach for Enhancing Healthcare Classification
Title: Multi-VQC: A Novel QML Approach for Enhancing Healthcare Classification | Multi-VQC: Ein neuartiger QML-Ansatz zur Verbesserung der Gesundheitsklassifikation | 多VQC:加强保健分类的新QML方法 2505.20797v1 |
Authors: Antonio Tudisco, Deborah Volpe, Giovanna Turvani
Accurate and reliable diagnosis of diseases is crucial in enabling timely medical treatment and enhancing patient survival rates. In recent years, Machine Learning has revolutionized diagnostic practices by creating classification models capable of identifying diseases. However, these classification problems often suffer from significant class imbalances, which can inhibit the effectiveness of traditional models. Therefore, the interest in Quantum models has arisen, driven by the captivating promise of overcoming the limitations of the classical counterpart thanks to their ability to express complex patterns by mapping data in a higher-dimensional computational space.
Article 1080
Title@2025-05-27 (2): A Graph Perspective to Probe Structural Patterns of Knowledge in Large Language Models
Title: A Graph Perspective to Probe Structural Patterns of Knowledge in Large Language Models | Eine Graphenperspektive zur Untersuchung struktureller Wissensmuster in großen Sprachmodellen | 《大语言模式知识结构模式研究图示展望》 2505.19286v2 |
Authors: Utkarsh Sahu, Zhisheng Qi, Yongjia Lei, Ryan A. Rossi, Franck Dernoncourt, Nesreen K. Ahmed, Mahantesh M Halappanavar, Yao Ma, Yu Wang
Large language models have been extensively studied as neural knowledge bases for their knowledge access, editability, reasoning, and explainability. However, few works focus on the structural patterns of their knowledge. Motivated by this gap, we investigate these structural patterns from a graph perspective. We quantify the knowledge of LLMs at both the triplet and entity levels, and analyze how it relates to graph structural properties such as node degree. Furthermore, we uncover the knowledge homophily, where topologically close entities exhibit similar levels of knowledgeability, which further motivates us to develop graph machine learning models to estimate entity knowledge based on its local neighbors. This model further enables valuable knowledge checking by selecting triplets less known to LLMs. Empirical results show that using selected triplets for fine-tuning leads to superior performance.
Article 1081
Title@2025-05-27 (2): Amortized Bayesian Workflow
Title: Amortized Bayesian Workflow | Amortisierter Bayesischer Workflow | 摊还的贝耶斯人工作流量 2409.04332v2 |
Authors: Chengkun Li, Aki Vehtari, Paul-Christian Bürkner, Stefan T. Radev, Luigi Acerbi, Marvin Schmitt
Bayesian inference often faces a trade-off between computational speed and sampling accuracy. We propose an adaptive workflow that integrates rapid amortized inference with gold-standard MCMC techniques to achieve a favorable combination of both speed and accuracy when performing inference on many observed datasets. Our approach uses principled diagnostics to guide the choice of inference method for each dataset, moving along the Pareto front from fast amortized sampling via generative neural networks to slower but guaranteed-accurate MCMC when needed. By reusing computations across steps, our workflow synergizes amortized and MCMC-based inference. We demonstrate the effectiveness of this integrated approach on several synthetic and real-world problems with tens of thousands of datasets, showing efficiency gains while maintaining high posterior quality.
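The routing logic described in this abstract can be sketched as follows: try the fast amortized sampler first, check a diagnostic, and fall back to MCMC only when the check fails. This is a minimal two-stage sketch, not the authors' workflow. The function `run_workflow`, the three callables, and the toy diagnostic are hypothetical placeholders, and the abstract does not say which diagnostics are used.

```python
def run_workflow(datasets, amortized_sampler, diagnostic_ok, mcmc_sampler):
    """Route each dataset to the cheapest inference method that passes a check.

    All three callables are placeholders supplied by the caller.
    Returns {dataset_name: (method_used, draws)}.
    """
    results = {}
    for name, data in datasets.items():
        draws = amortized_sampler(data)          # fast path, tried first
        if diagnostic_ok(data, draws):
            results[name] = ("amortized", draws)
        else:
            results[name] = ("mcmc", mcmc_sampler(data))  # slow fallback
    return results


# Toy stand-ins: the "amortized" estimator is only trusted on small-magnitude data.
datasets = {"a": [0.1, -0.2, 0.3], "b": [9.0, 11.0, 10.0]}
amortized = lambda data: [sum(data) / len(data)]
mcmc = lambda data: [sum(data) / len(data)] * 3
in_range = lambda data, draws: all(abs(x) < 5 for x in data)

results = run_workflow(datasets, amortized, in_range, mcmc)
methods = {name: method for name, (method, _) in results.items()}
```

Only the datasets that fail the diagnostic pay the MCMC cost, which is where the efficiency gain over running MCMC everywhere would come from.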
Article 1082
Title@2025-05-27 (2): Where You Place the Norm Matters: From Prejudiced to Neutral Initializations
Title: Where You Place the Norm Matters: From Prejudiced to Neutral Initializations | Wo Sie die Norm-Materien platzieren: Von voreingenommenen zu neutralen Initialisierungen | 将规范问题放在哪里: 从偏见到中立初始化 2505.11312v3 |
Authors: Emanuele Francazi, Francesco Pinto, Aurelien Lucchi, Marco Baity-Jesi
Normalization layers, such as Batch Normalization and Layer Normalization, are central components in modern neural networks, widely adopted to improve training stability and generalization. While their practical effectiveness is well documented, a detailed theoretical understanding of how normalization affects model behavior, starting from initialization, remains an important open question. In this work, we investigate how both the presence and placement of normalization within hidden layers influence the statistical properties of network predictions before training begins. In particular, we study how these choices shape the distribution of class predictions at initialization, which can range from unbiased (Neutral) to highly concentrated (Prejudiced) toward a subset of classes. Our analysis shows that normalization placement induces systematic differences in the initial prediction behavior of neural networks, which in turn shape the dynamics of learning. By linking architectural choices to prediction statistics at initialization, our work provides a principled understanding of how normalization can influence early training behavior and offers guidance for more controlled and interpretable network design.
Article 1083
Title@2025-05-27 (2): Enhancing Wearable Tap Water Audio Detection through Subclass Annotation in the HD-Epic Dataset
Title: Enhancing Wearable Tap Water Audio Detection through Subclass Annotation in the HD-Epic Dataset | Verbesserung der tragbaren Wasserhahn-Audioerkennung durch Unterklasse-Annotation im HD-Epic-Datensatz | 通过在HD-Epic数据集中分级注解,加强穿戴式塔普水音频探测 2505.20788v1 |
Authors: Robin Burchard, Kristof Van Laerhoven
Wearable human activity recognition has been shown to benefit from the inclusion of acoustic data, as the sounds around a person often contain valuable context. However, due to privacy concerns, it is usually not ethically feasible to record and save microphone data from the device, since the audio could, for instance, also contain private conversations. Rather, the data should be processed locally, which in turn requires processing power and consumes energy on the wearable device. One special use case of contextual information that can be utilized to augment special tasks in human activity recognition is water flow detection, which can, e.g., be used to aid wearable hand washing detection. We created a new label called tap water for the recently released HD-Epic data set, creating 717 hand-labeled annotations of tap water flow, based on existing annotations of the water class. We analyzed the relation of tap water and water in the dataset and additionally trained and evaluated two lightweight classifiers to evaluate the newly added label class, showing that the new class can be learned more easily.
Article 1084
Title@2025-05-27 (2): LIB-KD: Learning Inductive Bias, Not Just Parameters A New Perspective on Knowledge Distillations
Title: LIB-KD: Learning Inductive Bias, Not Just Parameters A New Perspective on Knowledge Distillations | LIB-KD: Induktive Bias lernen, nicht nur Parameter Eine neue Perspektive auf Wissensdestillationen | LIB-KD:学习感性偏见,而不仅仅是知识蒸馏的新视角参数 2310.00369v3 |
Authors: Gousia Habib, Tausifa Jan Saleem, Ishfaq Ahmad Malik, Brejesh Lall
With the rapid development of computer vision, Vision Transformers (ViTs) offer the tantalizing prospect of unified information processing across visual and textual domains. But due to the lack of inherent inductive biases in ViTs, they require an enormous amount of data for training. To make their applications practical, we introduce an innovative ensemble-based distillation approach that distills inductive bias from complementary lightweight teacher models. Prior systems relied solely on convolution-based teaching. However, this method incorporates an ensemble of light teachers with different architectural tendencies, such as convolution and involution, to instruct the student transformer jointly. Because of these unique inductive biases, instructors can accumulate a wide range of knowledge, even from readily identifiable stored datasets, which leads to enhanced student performance. Our proposed framework also involves precomputing and storing logits in advance, essentially the unnormalized predictions of the model. This optimization can accelerate the distillation process by eliminating the need for repeated forward passes during knowledge distillation, significantly reducing the computational burden and enhancing efficiency.
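The logit-caching step mentioned in this abstract can be illustrated with a small sketch: each teacher runs once per sample, the raw logits are stored, and later epochs build soft targets from the cache alone. The toy teachers, the call counter, the temperature of 2.0, and averaging the teachers' softened distributions are assumptions for illustration, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def precompute_logits(teachers, dataset):
    """Run every teacher once per sample and store the raw logits."""
    return {
        sample_id: [teacher(x) for teacher in teachers]
        for sample_id, x in dataset.items()
    }


def soft_target(cache, sample_id, temperature=2.0):
    """Average the teachers' softened distributions from the cache."""
    probs = [softmax(lg, temperature) for lg in cache[sample_id]]
    return [sum(p[i] for p in probs) / len(probs) for i in range(len(probs[0]))]


calls = {"n": 0}


def make_teacher(weights):
    def teacher(x):
        calls["n"] += 1                       # count forward passes
        return [w * x for w in weights]
    return teacher


teachers = [make_teacher([1.0, -1.0]), make_teacher([2.0, 0.0])]
dataset = {"s0": 1.0, "s1": -1.0}
cache = precompute_logits(teachers, dataset)

# Several "epochs" reuse the cache without calling any teacher again.
for _ in range(3):
    targets = {sid: soft_target(cache, sid) for sid in dataset}
```

The counter stays at one forward pass per teacher per sample no matter how many epochs run, which is the saving the abstract refers to.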
Article 1085
Title@2025-05-27 (2): Low-Rank Adapting Models for Sparse Autoencoders
Title: Low-Rank Adapting Models for Sparse Autoencoders | Low-Rank Anpassungsmodelle für Sparse Autoencoder | 普通自动解析器低 Rank 适应模型 2501.19406v2 |
Authors: Matthew Chen, Joshua Engels, Max Tegmark
Sparse autoencoders (SAEs) decompose language model representations into a sparse set of linear latent vectors. Recent works have improved SAEs using language model gradients, but these techniques require many expensive backward passes during training and still cause a significant increase in cross entropy loss when SAE reconstructions are inserted into the model. In this work, we improve on these limitations by taking a fundamentally different approach: we use low-rank adaptation (LoRA) to finetune the language model itself around a previously trained SAE. We analyze our method across SAE sparsity, SAE width, language model size, LoRA rank, and model layer on the Gemma Scope family of SAEs. In these settings, our method reduces the cross entropy loss gap by 30% to 55% when SAEs are inserted during the forward pass. We also find that compared to end-to-end (e2e) SAEs, our approach achieves the same downstream cross entropy loss 3× to 20× faster on Gemma and 2× to 10× faster on Llama. We further show that our technique improves downstream metrics and can adapt multiple SAEs at once without harming general language model capabilities. Our results demonstrate that improving model interpretability is not limited to post-hoc SAE training; Pareto improvements can also be achieved by directly optimizing the model itself.
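For readers unfamiliar with LoRA, the sketch below shows the generic low-rank update that the abstract builds on: a frozen weight W plus a trainable rank-r correction, y = W x + (alpha / r) B (A x). It does not reproduce the paper's training setup around an SAE; the helper names and the toy matrices are assumptions for illustration.

```python
def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_forward(W, A, B, x, alpha=1.0):
    """Compute y = W x + (alpha / r) * B (A x) with W frozen.

    W: d_out x d_in frozen base weight
    A: r x d_in trainable down-projection
    B: d_out x r trainable up-projection
    """
    r = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 base weight
A = [[1.0, 1.0]]               # rank r = 1
B_zero = [[0.0], [0.0]]        # zero-initialized B makes the adapter a no-op
B = [[0.5], [-0.5]]            # a hypothetical B after some training
x = [2.0, 3.0]

y_init = lora_forward(W, A, B_zero, x)
y_adapted = lora_forward(W, A, B, x)
```

Only A and B would receive gradients, so the number of trainable parameters scales with r rather than with the full size of W.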
Article 1086
Title@2025-05-27 (2): STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation
Title: STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation | STITCH-OPE: Trajektorienstiche mit geführter Diffusion für Off-Policy-Bewertung | STITCH-OPE: 非政策评价的引导传播的轨迹 2505.20781v1 |
Authors: Hossein Goli, Michael Gimelfarb, Nathan Samuel de Lara, Haruki Nishimura, Masha Itkina, Florian Shkurti
Off-policy evaluation (OPE) estimates the performance of a target policy using offline data collected from a behavior policy, and is crucial in domains such as robotics or healthcare where direct interaction with the environment is costly or unsafe. Existing OPE methods are ineffective for high-dimensional, long-horizon problems, due to exponential blow-ups in variance from importance weighting or compounding errors from learned dynamics models. To address these challenges, we propose STITCH-OPE, a model-based generative framework that leverages denoising diffusion for long-horizon OPE in high-dimensional state and action spaces. Starting with a diffusion model pre-trained on the behavior data, STITCH-OPE generates synthetic trajectories from the target policy by guiding the denoising process using the score function of the target policy. STITCH-OPE proposes two technical innovations that make it advantageous for OPE: (1) prevents over-regularization by subtracting the score of the behavior policy during guidance, and (2) generates long-horizon trajectories by stitching partial trajectories together end-to-end. We provide a theoretical guarantee that under mild assumptions, these modifications result in an exponential reduction in variance versus long-horizon trajectory diffusion. Experiments on the D4RL and OpenAI Gym benchmarks show substantial improvement in mean squared error, correlation, and regret metrics compared to state-of-the-art OPE methods.
Article 1087
Title@2025-05-27 (2): SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences
Title: SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences | SpecExtend: Ein Drop-in-Enhancement für spekulative Decoding von langen Sequenzen | 外观:对长期序列的投机性代谢的减少增强 2505.20776v1 |
Authors: Jungyoub Cha, Hyunjong Kim, Sungzoon Cho
Speculative decoding is a widely adopted technique for accelerating inference in large language models (LLMs), but its performance degrades on long inputs due to increased attention cost and reduced draft accuracy. We introduce SpecExtend, a drop-in enhancement that improves the performance of speculative decoding on long sequences without any additional training. SpecExtend integrates efficient attention mechanisms such as FlashAttention and Hybrid Tree Attention into both the draft and target models, reducing latency across all stages. To improve draft accuracy and speed, we propose Cross-model Retrieval, a novel KV cache update strategy that uses the target model’s attention scores to dynamically select relevant context for the draft model. Extensive evaluations on three long-context understanding datasets show that SpecExtend accelerates standard tree-based speculative decoding by up to 2.22x for inputs up to 16K tokens, providing an effective solution for speculative decoding of long sequences. The code is available at https://github.com/jycha98/SpecExtend .
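As background for this abstract, the sketch below shows plain greedy speculative decoding with toy next-token functions: a cheap draft model proposes k tokens, the target model verifies them, and the longest agreeing prefix is accepted. It deliberately omits everything specific to SpecExtend (tree-based drafting, efficient attention, Cross-model Retrieval). The toy models and the function name are assumptions.

```python
def speculative_decode(target_next, draft_next, prompt, max_new, k=4):
    """Greedy speculative decoding with toy next-token functions.

    target_next / draft_next: callables mapping a token list to the next token.
    Returns the generated tokens and the number of verification rounds.
    """
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new:
        # 1) Draft model proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2) Target model checks each proposed position (one batched pass
        #    in a real system; a plain loop here).
        accepted = []
        for tok in proposal:
            expected = target_next(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace first mismatch and stop
                break
            accepted.append(tok)
        else:
            # All k drafts accepted: the target contributes one bonus token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
        rounds += 1
    return tokens[len(prompt):len(prompt) + max_new], rounds


def target_next(ctx):
    return (ctx[-1] + 1) % 10          # toy target: count upward mod 10


def draft_next(ctx):
    nxt = (ctx[-1] + 1) % 10
    return 0 if nxt == 5 else nxt      # toy draft: wrong whenever target says 5


out, rounds = speculative_decode(target_next, draft_next, [0], max_new=8, k=3)

# Reference: decode the same 8 tokens with the target model alone.
baseline, ctx = [], [0]
for _ in range(8):
    ctx.append(target_next(ctx))
    baseline.append(ctx[-1])
```

The output matches target-only greedy decoding token for token, while needing fewer target verification rounds than generated tokens. That gap is what shrinks on long inputs when draft accuracy drops.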
Article 1088
Title@2025-05-27 (2): T-REX: Mixture-of-Rank-One-Experts with Semantic-aware Intuition for Multi-task Large Language Model Finetuning
Title: T-REX: Mixture-of-Rank-One-Experts with Semantic-aware Intuition for Multi-task Large Language Model Finetuning | T-REX: Mixture-of-Rank-One-Experts mit semantischer Intuition für Multi-Task Large Language Model Finetuning | T-REX:多任务大语言模型微调中具有语义认知度的多任务大语言模型微调混合型兰克单方专家 2404.08985v2 |
Authors: Rongyu Zhang, Yijiang Liu, Huanrui Yang, Shenli Zheng, Dan Wang, Yuan Du, Li Du, Shanghang Zhang
Large language models (LLMs) encounter significant adaptation challenges in diverse multitask finetuning. Mixture-of-experts (MoE) provides a promising solution with a dynamic architecture, enabling effective task decoupling. However, scaling up the number of MoE experts incurs substantial parameter and computational overheads and suffers from limited performance gain due to naive routing mechanisms. In this paper, we design a novel framework, mixTure-of-Rank-onE-eXperts (T-REX), which leverages the combination of ultra-low rank experts to construct LoRA weights on pretrained LLMs. The rank-1 experts enable a mix-and-match mechanism to quadratically expand the vector subspace of experts with linear parameter overheads, achieving approximate error reduction with optimal efficiency. In addition, T-REX offers implicit guidance to the router, leveraging the inherent semantic clustering of training embeddings as prior knowledge, enabling optimized feature allocation across experts for a smoother convergence. Extensive theoretical and empirical results demonstrate that T-REX achieves superior efficiency and generalizability across diverse tasks. Compared with other LoRA-based methods, T-REX achieves up to 1.78% mean accuracy improvement with around 30%-40% fewer trainable parameters across 14 public datasets. Code is available at https://github.com/RoyZry98/T-REX-Pytorch.
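The "quadratic expansion with linear overhead" claim can be made concrete with a small sketch: storing n column vectors and n row vectors (2n vectors in total) allows n² distinct rank-1 experts u_i v_jᵀ to be formed by pairing them. This is one reading of the abstract, not the released T-REX code. The function names, the toy vectors, and the gate given as an explicit dictionary are assumptions.

```python
from itertools import product


def outer(u, v):
    """Rank-1 matrix u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]


def mix_rank_one(us, vs, gate):
    """Weighted sum of rank-1 experts u_i v_j^T.

    us:   n vectors of length d_out
    vs:   n vectors of length d_in
    gate: dict mapping (i, j) index pairs to routing weights
    """
    d_out, d_in = len(us[0]), len(vs[0])
    delta = [[0.0] * d_in for _ in range(d_out)]
    for (i, j), w in gate.items():
        expert = outer(us[i], vs[j])
        for r in range(d_out):
            for c in range(d_in):
                delta[r][c] += w * expert[r][c]
    return delta


us = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vs = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]

stored_vectors = len(us) + len(vs)                      # grows linearly in n
pairs = list(product(range(len(us)), range(len(vs))))
available_experts = len(pairs)                          # grows as n^2

# A hypothetical router output that activates two of the nine pairings.
delta = mix_rank_one(us, vs, gate={(0, 1): 0.5, (2, 0): 2.0})
```

With n = 3 the sketch stores 6 vectors but can address 9 experts; the resulting `delta` plays the role of a LoRA-style weight update added to a frozen layer.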
Article 1089
Title@2025-05-27 (2): Non-invasive maturity assessment of iPSC-CMs based on optical maturity characteristics using interpretable AI
Title: Non-invasive maturity assessment of iPSC-CMs based on optical maturity characteristics using interpretable AI | Nicht-invasive Bewertung der Laufzeit von iPSC-CMs auf der Grundlage optischer Reifemerkmale unter Verwendung interpretierbarer KI | 使用可解释的AI根据光学成熟度特性对iPSC-CMMs进行非侵入性成熟度评估 2505.20775v1 |
Authors: Fabian Scheurer, Alexander Hammer, Mario Schubert, Robert-Patrick Steiner, Oliver Gamm, Kaomei Guan, Frank Sonntag, Hagen Malberg, Martin Schmidt
Human induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) are an important resource for the identification of new therapeutic targets and cardioprotective drugs. After differentiation, iPSC-CMs show an immature, fetal-like phenotype. Cultivation of iPSC-CMs in lipid-supplemented maturation medium (MM) strongly enhances their structural, metabolic and functional phenotype. Nevertheless, assessing iPSC-CM maturation state remains challenging, as most methods are time-consuming and entail cell damage or loss of the sample. To address this issue, we developed a non-invasive approach for automated classification of iPSC-CM maturity through interpretable artificial intelligence (AI)-based analysis of beat characteristics derived from video-based motion analysis. In a prospective study, we evaluated 230 video recordings of early-state, immature iPSC-CMs on day 21 after differentiation (d21) and more mature iPSC-CMs cultured in MM (d42, MM). For each recording, 10 features were extracted using Maia motion analysis software and entered into a support vector machine (SVM). The hyperparameters of the SVM were optimized in a grid search on 80 % of the data using 5-fold cross-validation. The optimized model achieved an accuracy of 99.5 ± 1.1 % on a hold-out test set. Shapley Additive Explanations (SHAP) identified displacement, relaxation-rise time and beating duration as the most relevant features for assessing maturity level. Our results suggest that non-invasive, optical motion analysis combined with AI-based methods can serve as a tool to assess iPSC-CM maturity and could be applied before performing functional readouts or drug testing. This may reduce the variability and improve the reproducibility of experimental studies.
Article 1090
Title@2025-05-27 (2): TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state
Title: TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state | TimePro: Effiziente Multivariate Langzeit-Zeitreihen-Prognose mit variabler und zeitversetzter Hyperstate | 具有可变和时间warware超状态预测的高效多变长期时间序列 2505.20774v1 |
Authors: Xiaowen Ma, Zhenliang Ni, Shuai Xiao, Xinghao Chen
In long-term time series forecasting, different variables often influence the target variable over distinct time intervals, a challenge known as the multi-delay issue. Traditional models typically process all variables or time points uniformly, which limits their ability to capture complex variable relationships and obtain non-trivial time representations. To address this issue, we propose TimePro, an innovative Mamba-based model that constructs variate- and time-aware hyper-states. Unlike conventional approaches that merely transfer plain states across variable or time dimensions, TimePro preserves the fine-grained temporal features of each variate token and adaptively selects the focused time points to tune the plain state. The reconstructed hyper-state can perceive both variable relationships and salient temporal information, which helps the model make accurate forecasting. In experiments, TimePro performs competitively on eight real-world long-term forecasting benchmarks with satisfactory linear complexity. Code is available at https://github.com/xwmaxwma/TimePro.
Article 1091
Title@2025-05-27 (2): MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning
Title: MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning | MetaSlot: Durchbruch durch die feste Anzahl von Slots im Objekt-Zentrischen Lernen | MetaSlot: 打破对象中心学习中的固定空格数 2505.20772v1 |
Authors: Hongjia Liu, Rongzhen Zhao, Haohan Chen, Joni Pajarinen
Learning object-level, structured representations is widely regarded as a key to better generalization in vision and underpins the design of next-generation Pre-trained Vision Models (PVMs). Mainstream Object-Centric Learning (OCL) methods adopt Slot Attention or its variants to iteratively aggregate objects’ super-pixels into a fixed set of query feature vectors, termed slots. However, their reliance on a static slot count leads to an object being represented as multiple parts when the number of objects varies. We introduce MetaSlot, a plug-and-play Slot Attention variant that adapts to variable object counts. MetaSlot (i) maintains a codebook that holds prototypes of objects in a dataset by vector-quantizing the resulting slot representations; (ii) removes duplicate slots from the traditionally aggregated slots by quantizing them with the codebook; and (iii) injects progressively weaker noise into the Slot Attention iterations to accelerate and stabilize the aggregation. MetaSlot is a general Slot Attention variant that can be seamlessly integrated into existing OCL architectures. Across multiple public datasets and tasks–including object discovery and recognition–models equipped with MetaSlot achieve significant performance gains and markedly interpretable slot representations, compared with existing Slot Attention variants.
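Step (ii) of this abstract, removing duplicate slots by quantizing them against a codebook, can be sketched as follows: each slot is mapped to its nearest prototype, and slots that land on the same prototype are treated as duplicates. This is a simplified illustration, not the MetaSlot implementation. The squared-Euclidean distance, the keep-first rule, and the toy vectors are assumptions.

```python
def nearest_code(slot, codebook):
    """Index of the codebook prototype closest to slot (squared Euclidean)."""
    dists = [sum((s - c) ** 2 for s, c in zip(slot, code)) for code in codebook]
    return dists.index(min(dists))


def dedupe_slots(slots, codebook):
    """Keep one slot per matched prototype, preserving first-seen order."""
    kept, seen = [], set()
    for slot in slots:
        idx = nearest_code(slot, codebook)
        if idx not in seen:
            seen.add(idx)
            kept.append(slot)
    return kept


codebook = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
slots = [[0.1, -0.2], [4.8, 5.1], [0.3, 0.1], [5.2, 4.9]]
kept = dedupe_slots(slots, codebook)
```

Here four slots reduce to two because two pairs quantize to the same prototype, so the number of surviving slots follows the number of distinct objects rather than a fixed slot count.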
nan
Article 1092
Title@2025-05-27 (2): ChemHAS: Hierarchical Agent Stacking for Enhancing Chemistry Tools
Title: ChemHAS: Hierarchical Agent Stacking for Enhancing Chemistry Tools | ChemHAS: Hierarchische Agenzien-Stacking zur Verbesserung von Chemiewerkzeugen | ChemHAS:加强化学工具的等级代理人 2505.21569v1 |
Authors: Zhucong Li, Bowei Zhang, Jin Xiao, Zhijian Zhou, Fenglei Cao, Jiaqing Liang, Yuan Qi
Large Language Model (LLM)-based agents have demonstrated the ability to improve performance in chemistry-related tasks by selecting appropriate tools. However, their effectiveness remains limited by the inherent prediction errors of chemistry tools. In this paper, we take a step further by exploring how LLM-based agents can, in turn, be leveraged to reduce prediction errors of the tools. To this end, we propose ChemHAS (Chemical Hierarchical Agent Stacking), a simple yet effective method that enhances chemistry tools through optimizing agent-stacking structures from limited data. ChemHAS achieves state-of-the-art performance across four fundamental chemistry tasks, demonstrating that our method can effectively compensate for prediction errors of the tools. Furthermore, we identify and characterize four distinct agent-stacking behaviors, potentially improving interpretability and revealing new possibilities for AI agent applications in scientific research. Our code and dataset are publicly available at https://anonymous.4open.science/r/ChemHAS-01E4/README.md.
nan
Article 1093
Title@2025-05-27 (2): Divide-Fuse-Conquer: Eliciting “Aha Moments” in Multi-Scenario Games
Title: Divide-Fuse-Conquer: Eliciting “Aha Moments” in Multi-Scenario Games | Divide-Fuse-Conquer: Eliciting “Aha Momente” in Multi-Szenario-Spiele | 分裂-裂变:在多种场景运动会中激发“哈动力” 2505.16401v2 |
Authors: Xiaoqing Zhang, Huabin Zheng, Ang Lv, Yuhan Liu, Zirui Song, Flood Sung, Xiuying Chen, Rui Yan
Large language models (LLMs) have been observed to suddenly exhibit advanced reasoning abilities during reinforcement learning (RL), resembling an “aha moment” triggered by simple outcome-based rewards. While RL has proven effective in eliciting such breakthroughs in tasks involving mathematics, coding, and vision, it faces significant challenges in multi-scenario games. The diversity of game rules, interaction modes, and environmental complexities often leads to policies that perform well in one scenario but fail to generalize to others. Simply combining multiple scenarios during training introduces additional challenges, such as training instability and poor performance. To overcome these challenges, we propose Divide-Fuse-Conquer, a framework designed to enhance generalization in multi-scenario RL. This approach starts by heuristically grouping games based on characteristics such as rules and difficulties. Specialized models are then trained for each group to excel at the games in that group; this is what we refer to as the divide step. Next, we fuse model parameters from different groups into a new model and continue training it on multiple groups until the scenarios in all groups are conquered. Experiments across 18 TextArena games show that Qwen2.5-32B-Align trained with the Divide-Fuse-Conquer strategy reaches a performance level comparable to Claude3.5, achieving 7 wins and 4 draws. We hope our approach can inspire future research on using reinforcement learning to improve the generalization of LLMs.
nan
Article 1094
Title@2025-05-27 (2): Robust and Explainable Detector of Time Series Anomaly via Augmenting Multiclass Pseudo-Anomalies
Title: Robust and Explainable Detector of Time Series Anomaly via Augmenting Multiclass Pseudo-Anomalies | Robuster und erklärbarer Detektor der Zeitreihenanomalie durch Augmenting-Multiclass-Pseudoanomalien | 通过增强多级优度反射器反射反射器,对时间序列时间序列进行强力和可解释的探测器 2505.20765v1 |
Authors: Kohei Obata, Yasuko Matsubara, Yasushi Sakurai
Unsupervised anomaly detection in time series has been a pivotal research area for decades. Current mainstream approaches focus on learning normality, on the assumption that all or most of the samples in the training set are normal. However, anomalies in the training set (i.e., anomaly contamination) can be misleading. Recent studies employ data augmentation to generate pseudo-anomalies and learn the boundary separating the training samples from the augmented samples. Although this approach mitigates anomaly contamination if augmented samples mimic unseen real anomalies, it suffers from several limitations. (1) Covering a wide range of time series anomalies is challenging. (2) It disregards augmented samples that resemble normal samples (i.e., false anomalies). (3) It places too much trust in the labels of training and augmented samples. In response, we propose RedLamp, which employs diverse data augmentations to generate multiclass pseudo-anomalies and learns the multiclass boundary. Such multiclass pseudo-anomalies cover a wide variety of time series anomalies. We conduct multiclass classification using soft labels, which prevents the model from being overconfident and ensures its robustness against contaminated/false anomalies. The learned latent space is inherently explainable as it is trained to separate pseudo-anomalies into multiple classes. Extensive experiments demonstrate the effectiveness of RedLamp in anomaly detection and its robustness against anomaly contamination.
nan
Article 1095
Title@2025-05-27 (2): ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval
Title: ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval | ConText-CIR: Von Konzepten lernen im Text für das komponierte Bild-Retrieval | ConText-CIR:从合成图像检索文本中的概念学习 2505.20764v1 |
Authors: Eric Xing, Pranavi Kolouju, Robert Pless, Abby Stylianou, Nathan Jacobs
Composed image retrieval (CIR) is the task of retrieving a target image specified by a query image and a relative text that describes a semantic modification to the query image. Existing methods in CIR struggle to accurately represent the image and the text modification, resulting in subpar performance. To address this limitation, we introduce a CIR framework, ConText-CIR, trained with a Text Concept-Consistency loss that encourages the representations of noun phrases in the text modification to better attend to the relevant parts of the query image. To support training with this loss function, we also propose a synthetic data generation pipeline that creates training data from existing CIR datasets or unlabeled images. We show that these components together enable stronger performance on CIR tasks, setting a new state-of-the-art in composed image retrieval in both the supervised and zero-shot settings on multiple benchmark datasets, including CIRR and CIRCO. Source code, model checkpoints, and our new datasets are available at https://github.com/mvrl/ConText-CIR.
nan
Article 1096
Title@2025-05-27 (2): Learning to Explain Air Traffic Situation
Title: Learning to Explain Air Traffic Situation | Erklären der Lage im Luftverkehr | 学习解释空中交通状况 2502.10764v2 |
Authors: Hong-ah Chai, Seokbin Yoon, Keumjin Lee
Understanding how air traffic controllers construct a mental ‘picture’ of complex air traffic situations is crucial but remains a challenge due to the inherently intricate, high-dimensional interactions between aircraft, pilots, and controllers. Previous work on modeling the strategies of air traffic controllers and their mental image of traffic situations often centers on specific air traffic control tasks or pairwise interactions between aircraft, neglecting to capture the comprehensive dynamics of an air traffic situation. To address this issue, we propose a machine learning-based framework for explaining air traffic situations. Specifically, we employ a Transformer-based multi-agent trajectory model that encapsulates both the spatio-temporal movement of aircraft and social interaction between them. By deriving attention scores from the model, we can quantify the influence of individual aircraft on overall traffic dynamics. This provides explainable insights into how air traffic controllers perceive and understand the traffic situation. Trained on real-world air traffic surveillance data collected from the terminal airspace around Incheon International Airport in South Korea, our framework effectively explicates air traffic situations. This could potentially support and enhance the decision-making and situational awareness of air traffic controllers.
nan
Article 1097
Title@2025-05-27 (2): Practical estimation of the optimal classification error with soft labels and calibration
Title: Practical estimation of the optimal classification error with soft labels and calibration | Praktische Schätzung des optimalen Klassifizierungsfehlers mit Softlabels und Kalibrierung | 用软标签和校准校准对最佳分类错误的实际估计 2505.20761v1 |
Authors: Ryota Ushio, Takashi Ishida, Masashi Sugiyama
While the performance of machine learning systems has experienced significant improvement in recent years, relatively little attention has been paid to the fundamental question: to what extent can we improve our models? This paper provides a means of answering this question in the setting of binary classification, which is practical and theoretically supported. We extend a previous work that utilizes soft labels for estimating the Bayes error, the optimal error rate, in two important ways. First, we theoretically investigate the properties of the bias of the hard-label-based estimator discussed in the original work. We reveal that the decay rate of the bias is adaptive to how well the two class-conditional distributions are separated, and it can decay significantly faster than the previous result suggested as the number of hard labels per instance grows. Second, we tackle a more challenging problem setting: estimation with corrupted soft labels. One might be tempted to use calibrated soft labels instead of clean ones. However, we reveal that calibration guarantee is not enough, that is, even perfectly calibrated soft labels can result in a substantially inaccurate estimate. Then, we show that isotonic calibration can provide a statistically consistent estimator under an assumption weaker than that of the previous work. Our method is instance-free, i.e., we do not assume access to any input instances. This feature allows it to be adopted in practical scenarios where the instances are not available due to privacy issues. Experiments with synthetic and real-world datasets show the validity of our methods and theory.
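For readers unfamiliar with the soft-label idea this abstract builds on: in binary classification the Bayes error equals the expectation of min(η(x), 1 − η(x)), where η(x) is the positive-class posterior. With clean soft labels it can therefore be estimated by a plain average, without touching the inputs. The sketch below illustrates only that identity; the function name and the sample values are hypothetical, and it does not cover the paper's bias analysis for hard labels or its isotonic-calibration step.

```python
def bayes_error_from_soft_labels(soft_labels):
    """Average of min(p, 1 - p) over soft labels p = P(y = 1 | x).

    Instance-free: only the soft labels are needed, never the inputs x.
    """
    if not soft_labels:
        raise ValueError("need at least one soft label")
    for p in soft_labels:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"soft label {p} is outside [0, 1]")
    return sum(min(p, 1.0 - p) for p in soft_labels) / len(soft_labels)


# Hypothetical soft labels for five instances (illustrative values only).
labels = [0.9, 0.2, 0.5, 1.0, 0.7]
estimate = bayes_error_from_soft_labels(labels)
```

A label of 0.5 contributes the maximum possible 0.5 to the average, while a label of 0 or 1 contributes nothing, which matches the intuition that the Bayes error measures irreducible class overlap.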
nan
Article 1098
Title@2025-05-27 (2): Multi-Stage Speaker Diarization for Noisy Classrooms
Title: Multi-Stage Speaker Diarization for Noisy Classrooms | Mehrstufige Speaker-Diarisierung für Lärmklassenräume | 多级发言人 多级发言人 吵闹教室的响声 2505.10879v2 |
Authors: Ali Sartaz Khan, Tolulope Ogunremi, Ahmed Adel Attia, Dorottya Demszky
Speaker diarization, the process of identifying “who spoke when” in audio recordings, is essential for understanding classroom dynamics. However, classroom settings present distinct challenges, including poor recording quality, high levels of background noise, overlapping speech, and the difficulty of accurately capturing children’s voices. This study investigates the effectiveness of multi-stage diarization models using Nvidia’s NeMo diarization pipeline. We assess the impact of denoising on diarization accuracy and compare various voice activity detection (VAD) models, including self-supervised transformer-based frame-wise VAD models. We also explore a hybrid VAD approach that integrates Automatic Speech Recognition (ASR) word-level timestamps with frame-level VAD predictions. We conduct experiments using two datasets from English speaking classrooms to separate teacher vs. student speech and to separate all speakers. Our results show that denoising significantly improves the Diarization Error Rate (DER) by reducing the rate of missed speech. Additionally, training on both denoised and noisy datasets leads to substantial performance gains in noisy conditions. The hybrid VAD model leads to further improvements in speech detection, achieving a DER as low as 17% in teacher-student experiments and 45% in all-speaker experiments. However, we also identified trade-offs between voice activity detection and speaker confusion. Overall, our study highlights the effectiveness of multi-stage diarization models and integrating ASR-based information for enhancing speaker diarization in noisy classroom environments.
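The Diarization Error Rate reported above is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The short sketch below shows that arithmetic; the function name and the durations are hypothetical and are not taken from the paper's experiments.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """DER = (missed + false alarm + speaker confusion) / total reference speech.

    All arguments are durations in the same unit (for example, seconds).
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical durations in seconds, for illustration only.
der = diarization_error_rate(missed=6.0, false_alarm=4.0, confusion=7.0, total_speech=100.0)
```

Because the three error terms are summed, lowering missed speech (for example through denoising or a more permissive VAD) can raise false alarms or confusion, which is the kind of trade-off the abstract mentions.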
nan
Article 1099
Title@2025-05-27 (2): Pairwise Optimal Transports for Training All-to-All Flow-Based Condition Transfer Model
Title: Pairwise Optimal Transports for Training All-to-All Flow-Based Condition Transfer Model | Paarweise Optimale Transporte für Training All-to-All Flow-Based Condition Transfer Modell | 以对等方式最佳运输培训全到所有流动条件转让模式 2504.03188v2 |
Authors: Kotaro Ikeda, Masanori Koyama, Jinzhe Zhang, Kohei Hayashi, Kenji Fukumizu
In this paper, we propose a flow-based method for learning all-to-all transfer maps among conditional distributions that approximates pairwise optimal transport. The proposed method addresses the challenge of handling the case of continuous conditions, which often involve a large set of conditions with sparse empirical observations per condition. We introduce a novel cost function that enables simultaneous learning of optimal transports for all pairs of conditional distributions. Our method is supported by a theoretical guarantee that, in the limit, it converges to the pairwise optimal transports among infinite pairs of conditional distributions. The learned transport maps are subsequently used to couple data points in conditional flow matching. We demonstrate the effectiveness of this method on synthetic and benchmark datasets, as well as on chemical datasets in which continuous physical properties are defined as conditions.
nan
Article 1100
Title@2025-05-27 (2): Scalable Model Merging with Progressive Layer-wise Distillation
Title: Scalable Model Merging with Progressive Layer-wise Distillation | Skalierbares Modell Zusammenführen mit progressiver schichtweiser Destillation | 可缩放模型与递进图层蒸馏法合并 2502.12706v2 |
Authors: Jing Xu, Jiazheng Li, Jingzhao Zhang
Model merging offers an effective way to integrate the capabilities of multiple fine-tuned models. However, the performance degradation of the merged model remains a challenge, particularly when little or no data is available. This paper first highlights the necessity of domain-specific data for model merging by proving that data-agnostic algorithms can have arbitrarily bad worst-case performance. Building on this theoretical insight, we explore the relationship between model merging and distillation, introducing a novel few-shot merging algorithm, ProDistill (Progressive Layer-wise Distillation). Contrary to the common belief that layer-wise training hurts performance, we show that layer-wise teacher-student distillation not only enhances scalability but also improves model merging performance. We conduct extensive experiments to show that, compared to existing few-shot merging methods, ProDistill achieves state-of-the-art performance, with up to 6.14% and 6.61% improvements in vision and NLU tasks. Furthermore, we extend the experiments to models with over 10B parameters, showcasing the exceptional scalability of ProDistill.
nan
Article 1101
Title@2025-05-27 (2): Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction
Title: Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction | Uni-Instruct: Einstufiges Diffusionsmodell durch Unified Diffusion Divergence Instruction | Uni- Instruct: 通过统一扩散分散指令单步扩散模型 2505.20755v1 |
Authors: Yifei Wang, Weimin Bai, Colin Zhang, Debing Zhang, Weijian Luo, He Sun
In this paper, we unify more than 10 existing one-step diffusion distillation approaches, such as Diff-Instruct, DMD, SIM, SiD, $f$-distill, etc., inside a theory-driven framework which we name Uni-Instruct. Uni-Instruct is motivated by our proposed diffusion expansion theory of the $f$-divergence family. Then we introduce key theories that overcome the intractability issue of the original expanded $f$-divergence, resulting in an equivalent yet tractable loss that effectively trains one-step diffusion models by minimizing the expanded $f$-divergence family. The novel unification introduced by Uni-Instruct not only offers new theoretical contributions that help understand existing approaches from a high-level perspective but also leads to state-of-the-art one-step diffusion generation performance. On the CIFAR10 generation benchmark, Uni-Instruct achieves record-breaking Frechet Inception Distance (FID) values of 1.46 for unconditional generation and 1.38 for conditional generation. On the ImageNet-$64\times 64$ generation benchmark, Uni-Instruct achieves a new SoTA one-step generation FID of 1.02, which outperforms its 79-step teacher diffusion with a significant improvement margin of 1.33 (1.02 vs 2.35). We also apply Uni-Instruct to broader tasks like text-to-3D generation. For text-to-3D generation, Uni-Instruct gives decent results, slightly outperforming previous methods, such as SDS and VSD, in terms of both generation quality and diversity. Both the solid theoretical and empirical contributions of Uni-Instruct will potentially help future studies on one-step diffusion distillation and knowledge transfer of diffusion models.
nan
Article 1102
Title@2025-05-27 (2): Stationary MMD Points for Cubature
Title: Stationary MMD Points for Cubature | Stationäre MMD-Punkte für Kubature | Cubature 固定的 MMMD点 2505.20754v1 |
Authors: Zonghao Chen, Toni Karvonen, Heishiro Kanagawa, François-Xavier Briol, Chris. J. Oates
Approximation of a target probability distribution using a finite set of points is a problem of fundamental importance, arising in cubature, data compression, and optimisation. Several authors have proposed to select points by minimising a maximum mean discrepancy (MMD), but the non-convexity of this objective precludes global minimisation in general. Instead, we consider stationary points of the MMD which, in contrast to points globally minimising the MMD, can be accurately computed. Our main theoretical contribution is the (perhaps surprising) result that, for integrands in the associated reproducing kernel Hilbert space, the cubature error of stationary MMD points vanishes faster than the MMD. Motivated by this super-convergence property, we consider discretised gradient flows as a practical strategy for computing stationary points of the MMD, presenting a refined convergence analysis that establishes a novel non-asymptotic finite-particle error bound, which may be of independent interest.
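To make the objective concrete, the sketch below evaluates a squared MMD between a candidate point set and samples standing in for the target, using the standard expansion into three kernel averages. It is a minimal illustration under assumptions: the Gaussian kernel, the one-dimensional setting, the bandwidth, and the function names are choices made here, and the target is represented by samples rather than in closed form. It does not implement the paper's gradient flow.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on the real line."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(points, target_samples, bandwidth=1.0):
    """Biased (V-statistic) estimate of MMD^2 between two empirical measures.

    MMD^2 = mean k(x, x') - 2 * mean k(x, y) + mean k(y, y').
    """
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return (
        mean_kernel(points, points)
        - 2.0 * mean_kernel(points, target_samples)
        + mean_kernel(target_samples, target_samples)
    )


# Hypothetical data: a well-spread point set versus one collapsed far from the target.
target = [-1.5, -0.5, 0.0, 0.5, 1.5]
good_points = [-1.0, 0.0, 1.0]
bad_points = [3.0, 3.0, 3.0]
good_score = mmd_squared(good_points, target)
bad_score = mmd_squared(bad_points, target)
```

A point set that covers the target's mass scores lower than one concentrated away from it, which is why minimising (or finding stationary points of) this quantity yields useful cubature nodes.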
nan
Article 1103
Title@2025-05-27 (2): EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models
Title: EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models | EaqVLA: Kodierungsorientierte Quantisierung für Vision-Language-Action-Modelle | EaqVLA: 愿景-语言-行动模式的编码和一致的量化 2505.21567v1 |
Authors: Feng Jiang, Zihao Zheng, Xiuping Cui, Maoliang Li, JIayu Chen, Xiang Chen
With the development of embodied artificial intelligence, end-to-end control policies such as the Vision-Language-Action (VLA) model have become mainstream. Existing VLA models face expensive computing and storage costs, which need to be optimized. Quantization is considered the most effective method, as it can not only reduce memory cost but also accelerate computation. However, we find that the token alignment of VLA models hinders the application of existing quantization methods. To address this, we propose an optimized framework called EaqVLA, which applies encoding-aligned quantization to VLA models. Specifically, we propose a complete analysis method to find the misalignment at various granularities. Based on the analysis results, we propose a mixed-precision quantization scheme that is aware of encoding alignment. Experiments show that the proposed EaqVLA achieves better quantization performance (with the minimal quantization loss for end-to-end action control and xxx times acceleration) than existing quantization methods.
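As background for the mixed-precision idea, the sketch below shows generic symmetric uniform quantization and how the reconstruction error depends on the bit width. It is not EaqVLA's encoding-aligned scheme, whose details are not given in the abstract; the function names, the per-tensor scaling, and the weight values are hypothetical.

```python
def quantize_dequantize(values, bits):
    """Symmetric uniform quantization to signed `bits`-bit integers, then back to floats."""
    if bits < 2:
        raise ValueError("bits must be at least 2")
    qmax = 2 ** (bits - 1) - 1  # largest representable integer level
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax  # one scale for the whole tensor (an assumption)
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def max_quantization_error(values, bits):
    """Largest absolute difference between the original and dequantized values."""
    restored = quantize_dequantize(values, bits)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical weights, for illustration only.
weights = [0.31, -0.52, 0.07, 0.98, -0.11]
error_8bit = max_quantization_error(weights, 8)
error_4bit = max_quantization_error(weights, 4)
```

A mixed-precision scheme assigns more bits to the parts of a model where this error hurts most and fewer bits elsewhere; how those parts are identified is where a method such as the one described above would differ.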
nan
Article 1104
Title@2025-05-27 (2): Map Space Belief Prediction for Manipulation-Enhanced Mapping
Title: Map Space Belief Prediction for Manipulation-Enhanced Mapping | Karte Raum Glaube Vorhersage für manipulations-verbesserte Mapping | 人工-增强绘图的地图空间信仰预测 2502.20606v2 |
Authors: Joao Marcos Correia Marques, Nils Dengler, Tobias Zaenker, Jesper Mucke, Shenlong Wang, Maren Bennewitz, Kris Hauser
Searching for objects in cluttered environments requires selecting efficient viewpoints and manipulation actions to remove occlusions and reduce uncertainty in object locations, shapes, and categories. In this work, we address the problem of manipulation-enhanced semantic mapping, where a robot has to efficiently identify all objects in a cluttered shelf. Although Partially Observable Markov Decision Processes (POMDPs) are standard for decision-making under uncertainty, representing unstructured interactive worlds remains challenging in this formalism. To tackle this, we define a POMDP whose belief is summarized by a metric-semantic grid map and propose a novel framework that uses neural networks to perform map-space belief updates to reason efficiently and simultaneously about object geometries, locations, categories, occlusions, and manipulation physics. Further, to enable accurate information gain analysis, the learned belief updates should maintain calibrated estimates of uncertainty. Therefore, we propose Calibrated Neural-Accelerated Belief Updates (CNABUs) to learn a belief propagation model that generalizes to novel scenarios and provides confidence-calibrated predictions for unknown areas. Our experiments show that our novel POMDP planner improves map completeness and accuracy over existing methods in challenging simulations and successfully transfers to real-world cluttered shelves in zero-shot fashion.
nan
Article 1105
Title@2025-05-27 (2): MOLLM: Multi-Objective Large Language Model for Molecular Design – Optimizing with Experts
Title: MOLLM: Multi-Objective Large Language Model for Molecular Design – Optimizing with Experts | MOLLM: Multi-Objective Large Language Model for Molecular Design – Optimierung mit Experten | MOLLM: 分子设计多目标大语言模型 – – 与专家优化 2502.12845v2 |
Authors: Nian Ran, Yue Wang, Richard Allmendinger
Molecular design plays a critical role in advancing fields such as drug discovery, materials science, and chemical engineering. This work introduces the Multi-Objective Large Language Model for Molecular Design (MOLLM), a novel framework that combines domain-specific knowledge with the adaptability of large language models to optimize molecular properties across multiple objectives. Leveraging in-context learning and multi-objective optimization, MOLLM achieves superior performance and innovation, consistently surpassing state-of-the-art (SOTA) methods. We significantly improve the efficiency of our framework, making it 14 times faster and substantially more cost-effective without compromising performance compared to the latest similar work. Our results demonstrate that MOLLM consistently outperforms SOTA models across experiments and excels on the PMO benchmark. In addition, we provide extensive ablation studies and analysis to evaluate the effectiveness of each component and the quality of the output molecules.
nan
Article 1106
Title@2025-05-27 (2): ‘Hello, World!’: Making GNNs Talk with LLMs
Title: ‘Hello, World!’: Making GNNs Talk with LLMs | “Hallo, Welt!”: GNNs mit LLMs sprechen zu lassen | “你好,世界!” “让GNNs和LLMs说话” 2505.20742v1 |
Authors: Sunwoo Kim, Soo Yong Lee, Jaemin Yoo, Kijung Shin
While graph neural networks (GNNs) have shown remarkable performance across diverse graph-related tasks, their high-dimensional hidden representations render them black boxes. In this work, we propose Graph Lingual Network (GLN), a GNN built on large language models (LLMs), with hidden representations in the form of human-readable text. Through careful prompt design, GLN incorporates not only the message passing module of GNNs but also advanced GNN techniques, including graph attention and initial residual connection. The comprehensibility of GLN’s hidden representations enables an intuitive analysis of how node representations change (1) across layers and (2) under advanced GNN techniques, shedding light on the inner workings of GNNs. Furthermore, we demonstrate that GLN achieves strong zero-shot performance on node classification and link prediction, outperforming existing LLM-based baseline methods.
nan
Article 1107
Title@2025-05-27 (2): Can Small Language Models Learn, Unlearn, and Retain Noise Patterns?
Title: Can Small Language Models Learn, Unlearn, and Retain Noise Patterns? | Können kleine Sprachmodelle Geräuschmuster lernen, nicht lernen und erhalten? | 小语言模型能够学习、不学习和保留噪音模式吗? 2407.00996v3 |
Authors: Nicy Scaria, Silvester John Joseph Kennedy, Deepak Subramani
With the growing need for efficient language models in resource-constrained environments, Small Language Models (SLMs) have emerged as compact and practical alternatives to Large Language Models (LLMs). While studies have explored noise handling in LLMs, little is known about how SLMs handle noise, a critical factor for their reliable real-world deployment. This study investigates the ability of SLMs with parameters between 1 and 3 billion to learn, retain, and subsequently eliminate different types of noise (word flip, character flip, transliteration, irrelevant content, and contradictory information). Four pretrained SLMs (Olmo 1B, Qwen1.5 1.8B, Gemma1.1 2B, and Phi2 2.7B) were instruction-tuned on noise-free data and tested with in-context examples to assess noise learning. Subsequently, noise patterns were introduced in instruction tuning to assess their adaptability. The results revealed differences in how models handle noise, with smaller models like Olmo quickly adapting to noise patterns. Phi2’s carefully curated, structured, and high-quality pretraining data enabled resistance to character level, transliteration, and counterfactual noise, while Gemma adapted successfully to transliteration noise through its multilingual pretraining. Subsequent clean data training effectively mitigated noise effects. These findings provide practical strategies for developing robust SLMs for real-world applications.
nan
Article 1108
Title@2025-05-27 (2): Detecting Informative Channels: ActionFormer
Title: Detecting Informative Channels: ActionFormer | Informative Kanäle erkennen: AktionEhemaliger | 检测信息渠道:行动前 2505.20739v1 |
Authors: Kunpeng Zhao, Asahi Miyazaki, Tsuyoshi Okita
Human Activity Recognition (HAR) has recently witnessed advancements with Transformer-based models. In particular, ActionFormer offers a new perspective for HAR in that it produces additional outputs that detect the boundaries of activities as well as the activity labels. ActionFormer was originally proposed with image/video input, but it has since been adapted to take sensor signals as input as well. We analyze this extensively in terms of deep learning architectures. Prior reports point to high temporal dynamics, which limit the model’s ability to capture subtle changes effectively, and to interdependencies between the spatial and temporal features. We therefore propose a modified ActionFormer that reduces these defects for sensor signals. The key to our approach is to follow the Sequence-and-Excitation strategy to minimize the increase in additional parameters and to opt for the swish activation function to retain information about direction in the negative range. Experiments on the WEAR dataset show that our method achieves a substantial improvement of 16.01% in terms of average mAP for inertial data.
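The remark about the negative range can be seen directly from the definition swish(x) = x · sigmoid(x): unlike ReLU, which maps every negative input to zero, swish returns a small negative value, so the sign of the input is not lost. The sketch below uses the plain form of swish without a learnable scaling parameter, which is an assumption about the variant used.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x):
    """swish(x) = x * sigmoid(x); negative inputs give small negative outputs."""
    return x * sigmoid(x)


def relu(x):
    """ReLU for comparison: every negative input becomes exactly zero."""
    return max(0.0, x)


negative_input = -1.0
swish_output = swish(negative_input)
relu_output = relu(negative_input)
```

For accelerometer or gyroscope signals, where negative readings carry directional meaning, keeping a nonzero response for negative pre-activations is a plausible motivation for this choice.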
nan
Article 1109
Title@2025-05-27 (2): Adversarial bandit optimization for approximately linear functions
Title: Adversarial bandit optimization for approximately linear functions | Adversariale Bandit-Optimierung für etwa lineare Funktionen | 大约直线功能的对面土匪优化 2505.20734v1 |
Authors: Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto
We consider a bandit optimization problem for nonconvex and non-smooth functions, where in each trial the loss function is the sum of a linear function and a small but arbitrary perturbation chosen after observing the player’s choice. We give both expected and high probability regret bounds for the problem. Our result also implies an improved high-probability regret bound for the bandit linear optimization, a special case with no perturbation. We also give a lower bound on the expected regret.
nan
Article 1110
Title@2025-05-27 (2): SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution
Title: SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution | SPA-RL: Verstärkung der LLM-Agenten durch schrittweise Fortschrittszuweisung | SPA-RL:通过逐步推进加强LLM代理 2505.20732v1 |
Authors: Hanlin Wang, Chak Tou Leong, Jiashuo Wang, Jian Wang, Wenjie Li
Reinforcement learning (RL) holds significant promise for training LLM agents to handle complex, goal-oriented tasks that require multi-step interactions with external environments. However, a critical challenge when applying RL to these agentic tasks arises from delayed rewards: feedback signals are typically available only after the entire task is completed. This makes it non-trivial to assign delayed rewards to earlier actions, providing insufficient guidance regarding environmental constraints and hindering agent training. In this work, we draw on the insight that the ultimate completion of a task emerges from the cumulative progress an agent makes across individual steps. We propose Stepwise Progress Attribution (SPA), a general reward redistribution framework that decomposes the final reward into stepwise contributions, each reflecting its incremental progress toward overall task completion. To achieve this, we train a progress estimator that accumulates stepwise contributions over a trajectory to match the task completion. During policy optimization, we combine the estimated per-step contribution with a grounding signal for actions executed in the environment as the fine-grained, intermediate reward for effective agent training. Extensive experiments on common agent benchmarks (including Webshop, ALFWorld, and VirtualHome) demonstrate that SPA consistently outperforms the state-of-the-art method in both success rate (+2.5% on average) and grounding accuracy (+1.9% on average). Further analyses demonstrate that our method remarkably provides more effective intermediate rewards for RL training. Our code is available at https://github.com/WangHanLinHenry/SPA-RL-Agent.
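The two ingredients described above can be sketched in a few lines: a training objective that asks the per-step contributions to add up to the final task reward, and an intermediate reward that mixes each contribution with a grounding signal. This is an illustration under assumptions, not the released implementation: the squared-error loss, the additive combination, the `weight` parameter, and all numbers are hypothetical.

```python
def progress_loss(step_contributions, final_reward):
    """Squared error between the accumulated stepwise contributions and the final reward."""
    return (sum(step_contributions) - final_reward) ** 2


def intermediate_rewards(step_contributions, grounding_signals, weight=0.5):
    """Per-step reward = estimated contribution + weight * grounding signal.

    Assumption: an additive mix with a fixed weight; the abstract does not state the form.
    """
    if len(step_contributions) != len(grounding_signals):
        raise ValueError("need one grounding signal per step")
    return [c + weight * g for c, g in zip(step_contributions, grounding_signals)]


# Hypothetical three-step trajectory that ends in success (final reward 1.0).
contributions = [0.2, 0.3, 0.5]
grounded = [1.0, 0.0, 1.0]  # 1.0 if the action could be executed, else 0.0
loss = progress_loss(contributions, final_reward=1.0)
rewards = intermediate_rewards(contributions, grounded)
```

The point of the redistribution is that every step now receives its own learning signal instead of waiting for the single delayed reward at the end of the episode.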
nan
Article 1111
Title@2025-05-27 (2): Semi-supervised Clustering Through Representation Learning of Large-scale EHR Data
Title: Semi-supervised Clustering Through Representation Learning of Large-scale EHR Data | Halbüberwachtes Clustering durch Repräsentationslernen von EHR-Großdaten | 通过代表学习大规模电子人力资源数据,进行半监督的集群组合 2505.20731v1 |
Authors: Linshanshan Wang, Mengyan Li, Zongqi Xia, Molei Liu, Tianxi Cai
Electronic Health Records (EHR) offer rich real-world data for personalized medicine, providing insights into disease progression, treatment responses, and patient outcomes. However, their sparsity, heterogeneity, and high dimensionality make them difficult to model, while the lack of standardized ground truth further complicates predictive modeling. To address these challenges, we propose SCORE, a semi-supervised representation learning framework that captures multi-domain disease profiles through patient embeddings. SCORE employs a Poisson-Adapted Latent factor Mixture (PALM) Model with pre-trained code embeddings to characterize codified features and extract meaningful patient phenotypes and embeddings. To handle the computational challenges of large-scale data, it introduces a hybrid Expectation-Maximization (EM) and Gaussian Variational Approximation (GVA) algorithm, leveraging limited labeled data to refine estimates on a vast pool of unlabeled samples. We theoretically establish the convergence of this hybrid approach, quantify GVA errors, and derive SCORE’s error rate under diverging embedding dimensions. Our analysis shows that incorporating unlabeled data enhances accuracy and reduces sensitivity to label scarcity. Extensive simulations confirm SCORE’s superior finite-sample performance over existing methods. Finally, we apply SCORE to predict disability status for patients with multiple sclerosis (MS) using partially labeled EHR data, demonstrating that it produces more informative and predictive patient embeddings for multiple MS-related conditions compared to existing approaches.
Article 1112
Title@2025-05-27 (2): What LLMs Miss in Recommendations: Bridging the Gap with Retrieval-Augmented Collaborative Signals
Title: What LLMs Miss in Recommendations: Bridging the Gap with Retrieval-Augmented Collaborative Signals | Was LLMs in Empfehlungen vermissen: Die Lücke mit retrieval-Augmented Collaborative Signals überbrücken | 在建议中错过了什么的LLM女士:用检索增强的合作信号弥合差距 2505.20730v1 |
Authors: Shahrooz Pouryousef
User-item interactions contain rich collaborative signals that form the backbone of many successful recommender systems. While recent work has explored the use of large language models (LLMs) for recommendation, it remains unclear whether LLMs can effectively reason over this type of collaborative information. In this paper, we conduct a systematic comparison between LLMs and classical matrix factorization (MF) models to assess LLMs’ ability to leverage user-item interaction data. We further introduce a simple retrieval-augmented generation (RAG) method that enhances LLMs by grounding their predictions in structured interaction data. Our experiments reveal that current LLMs often fall short in capturing collaborative patterns inherent to MF models, but that our RAG-based approach substantially improves recommendation quality, highlighting a promising direction for future LLM-based recommenders.
Article 1113
Title@2025-05-27 (2): Energy-based generator matching: A neural sampler for general state space
Title: Energy-based generator matching: A neural sampler for general state space | Energiebasierte Generator-Matching: Ein neuronaler Sampler für den allgemeinen Zustandsraum | 基于能源的发电机匹配:一般状态空间的神经取样器 2505.19646v2 |
Authors: Dongyeop Woo, Minsu Kim, Minkyu Kim, Kiyoung Seong, Sungsoo Ahn
We propose Energy-based generator matching (EGM), a modality-agnostic approach to train generative models from energy functions in the absence of data. Extending the recently proposed generator matching, EGM enables training of arbitrary continuous-time Markov processes, e.g., diffusion, flow, and jump, and can generate data from continuous, discrete, and a mixture of two modalities. To this end, we propose estimating the generator matching loss using self-normalized importance sampling with an additional bootstrapping trick to reduce variance in the importance weight. We validate EGM on both discrete and multimodal tasks up to 100 and 20 dimensions, respectively.
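As a rough illustration of the self-normalized importance sampling mentioned in the abstract (not the EGM loss or its bootstrapping trick), the sketch below estimates an expectation under an unnormalized target exp(-E(x)) from proposal samples. The energy, the Gaussian proposal, and all function names are assumptions chosen for illustration.

```python
import math
import random

def snis_expectation(f, energy, proposal_sample, proposal_logpdf, n, rng):
    """Estimate E_p[f(x)] for p(x) proportional to exp(-energy(x))."""
    xs = [proposal_sample(rng) for _ in range(n)]
    # Unnormalized log-weights: log p~(x) - log q(x).
    log_w = [-energy(x) - proposal_logpdf(x) for x in xs]
    m = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    # Self-normalization: the unknown normalizing constant cancels out.
    return sum(wi * f(x) for wi, x in zip(w, xs)) / total

# Hypothetical example: target N(1, 1), proposal N(0, 2^2).
def energy(x):
    return 0.5 * (x - 1.0) ** 2

def proposal_sample(rng):
    return rng.gauss(0.0, 2.0)

def proposal_logpdf(x):
    return -0.5 * (x / 2.0) ** 2 - math.log(2.0 * math.sqrt(2.0 * math.pi))

estimate = snis_expectation(lambda x: x, energy, proposal_sample,
                            proposal_logpdf, 20000, random.Random(0))
```

Here `estimate` approximates the target mean of 1; the variance of the importance weights is what the abstract's bootstrapping trick is meant to reduce.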
Article 1114
Title@2025-05-27 (2): A reinforcement learning agent for maintenance of deteriorating systems with increasingly imperfect repairs
Title: A reinforcement learning agent for maintenance of deteriorating systems with increasingly imperfect repairs | Ein Verstärkungs-Lernmittel für die Instandhaltung von verschlechternden Systemen mit zunehmend unvollkommenen Reparaturen | 强化学习代理,用于维护修理越来越不完善的恶化系统 2505.20725v1 |
Authors: Alberto Pliego Marugán, Jesús M. Pinar-Pérez, Fausto Pedro García Márquez
Efficient maintenance has always been essential for the successful application of engineering systems. However, the challenges to be overcome in the implementation of Industry 4.0 necessitate new paradigms of maintenance optimization. Machine learning techniques are becoming increasingly used in engineering and maintenance, with reinforcement learning being one of the most promising. In this paper, we propose a gamma degradation process together with a novel maintenance model in which repairs are increasingly imperfect, i.e., the beneficial effect of system repairs decreases as more repairs are performed, reflecting the degradation behavior of real-world systems. To generate maintenance policies for this system, we developed a reinforcement-learning-based agent using a Double Deep Q-Network architecture. This agent presents two important advantages: it works without a predefined preventive threshold, and it can operate in a continuous degradation state space. Our agent learns to behave in different scenarios, showing great flexibility. In addition, we performed an analysis of how changes in the main parameters of the environment affect the maintenance policy proposed by the agent. The proposed approach is demonstrated to be appropriate and to significantly improve long-run cost as compared with other common maintenance strategies.
Article 1115
Title@2025-05-27 (2): LeDiFlow: Learned Distribution-guided Flow Matching to Accelerate Image Generation
Title: LeDiFlow: Learned Distribution-guided Flow Matching to Accelerate Image Generation | LeDiFlow: Erlernter, verteilungsgeführter Fluss passend zur beschleunigten Bildgenerierung | LediFlow:为加速图像生成而实现的派发指导流动匹配 2505.20723v1 |
Authors: Pascal Zwick, Nils Friederich, Maximilian Beichter, Lennart Hilbert, Ralf Mikut, Oliver Bringmann
Enhancing the efficiency of high-quality image generation using Diffusion Models (DMs) is a significant challenge due to the iterative nature of the process. Flow Matching (FM) is emerging as a powerful generative modeling paradigm based on a simulation-free training objective instead of a score-based one used in DMs. Typical FM approaches rely on a Gaussian distribution prior, which induces curved, conditional probability paths between the prior and target data distribution. These curved paths pose a challenge for the Ordinary Differential Equation (ODE) solver, requiring a large number of inference calls to the flow prediction network. To address this issue, we present Learned Distribution-guided Flow Matching (LeDiFlow), a novel scalable method for training FM-based image generation models using a better-suited prior distribution learned via a regression-based auxiliary model. By initializing the ODE solver with a prior closer to the target data distribution, LeDiFlow enables the learning of more computationally tractable probability paths. These paths directly translate to fewer solver steps needed for high-quality image generation at inference time. Our method utilizes a State-Of-The-Art (SOTA) transformer architecture combined with latent space sampling and can be trained on a consumer workstation. We empirically demonstrate that LeDiFlow remarkably outperforms the respective FM baselines. For instance, when operating directly on pixels, our model accelerates inference by up to 3.75x compared to the corresponding pixel-space baseline. Simultaneously, our latent FM model enhances image quality on average by 1.32x in CLIP Maximum Mean Discrepancy (CMMD) metric against its respective baseline.
Article 1116
Title@2025-05-27 (2): Diffusion Model-based Activity Completion for AI Motion Capture from Videos
Title: Diffusion Model-based Activity Completion for AI Motion Capture from Videos | Diffusion Modellbasierte Aktivitätsvervollständigung für AI Motion Capture aus Videos | AI 从视频中抓取 AI 运动的传播示范活动完成 2505.21566v1 |
Authors: Gao Huayu, Huang Tengjiu, Ye Xiaolong, Tsuyoshi Okita
AI-based motion capture is an emerging technology that offers a cost-effective alternative to traditional motion capture systems. However, current AI motion capture methods rely entirely on observed video sequences, similar to conventional motion capture. This means that all human actions must be predefined, and movements outside the observed sequences are not possible. To address this limitation, we aim to apply AI motion capture to virtual humans, where flexible actions beyond the observed sequences are required. We assume that while many action fragments exist in the training data, the transitions between them may be missing. To bridge these gaps, we propose a diffusion-model-based action completion technique that generates complementary human motion sequences, ensuring smooth and continuous movements. By introducing a gate module and a position-time embedding module, our approach achieves competitive results on the Human3.6M dataset. Our experimental results show that (1) MDC-Net outperforms existing methods in ADE, FDE, and MMADE but is slightly less accurate in MMFDE, (2) MDC-Net has a smaller model size (16.84M) compared to HumanMAC (28.40M), and (3) MDC-Net generates more natural and coherent motion sequences. Additionally, we propose a method for extracting sensor data, including acceleration and angular velocity, from human motion sequences.
Article 1117
Title@2025-05-27 (2): Recurrent Neural Operators: Stable Long-Term PDE Prediction
Title: Recurrent Neural Operators: Stable Long-Term PDE Prediction | Recurrent Neural Operators: Stabile Langzeit-PDE-Vorhersage | 经常性神经操作员:稳定的长期PDE预测 2505.20721v1 |
Authors: Zaijun Ye, Chen-Song Zhang, Wansheng Wang
Neural operators have emerged as powerful tools for learning solution operators of partial differential equations. However, in time-dependent problems, standard training strategies such as teacher forcing introduce a mismatch between training and inference, leading to compounding errors in long-term autoregressive predictions. To address this issue, we propose Recurrent Neural Operators (RNOs), a novel framework that integrates recurrent training into neural operator architectures. Instead of conditioning each training step on ground-truth inputs, RNOs recursively apply the operator to their own predictions over a temporal window, effectively simulating inference-time dynamics during training. This alignment mitigates exposure bias and enhances robustness to error accumulation. Theoretically, we show that recurrent training can reduce the worst-case exponential error growth typical of teacher forcing to linear growth. Empirically, we demonstrate that recurrently trained Multigrid Neural Operators significantly outperform their teacher-forced counterparts in long-term accuracy and stability on standard benchmarks. Our results underscore the importance of aligning training with inference dynamics for robust temporal generalization in neural operator learning.
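The contrast between teacher forcing and recurrent training can be sketched on a scalar toy problem. This is not the RNO architecture: the dynamics, the slightly wrong `model`, and the function names are assumptions made only to show how the two losses differ.

```python
def teacher_forced_loss(step, traj):
    """Mean squared one-step error, each step conditioned on the true state."""
    errs = [(step(traj[t]) - traj[t + 1]) ** 2 for t in range(len(traj) - 1)]
    return sum(errs) / len(errs)

def recurrent_loss(step, traj):
    """Mean squared error when the model is fed its own predictions."""
    x = traj[0]
    errs = []
    for t in range(len(traj) - 1):
        x = step(x)  # the model consumes its previous output
        errs.append((x - traj[t + 1]) ** 2)
    return sum(errs) / len(errs)

# Hypothetical toy dynamics x_{t+1} = 1.1 * x_t and a slightly wrong model.
traj = [1.1 ** t for t in range(21)]
model = lambda x: 1.12 * x

tf = teacher_forced_loss(model, traj)
rec = recurrent_loss(model, traj)
```

For this toy model `rec` is larger than `tf`, because the rollout loss exposes the accumulated error that the one-step loss never sees; training against the rollout loss is the alignment with inference that the abstract describes.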
Article 1118
Title@2025-05-27 (2): ProgCo: Program Helps Self-Correction of Large Language Models
Title: ProgCo: Program Helps Self-Correction of Large Language Models | ProgCo: Programm hilft bei der Selbstkorrektur großer Sprachmodelle | ProgC:帮助大语言模式自我校正方案 2501.01264v2 |
Authors: Xiaoshuai Song, Yanan Wu, Weixun Wang, Jiaheng Liu, Wenbo Su, Bo Zheng
Self-Correction aims to enable large language models (LLMs) to self-verify and self-refine their initial responses without external feedback. However, LLMs often fail to effectively self-verify and generate correct feedback, further misleading refinement and leading to the failure of self-correction, especially in complex reasoning tasks. In this paper, we propose Program-driven Self-Correction (ProgCo). First, program-driven verification (ProgVe) achieves complex verification logic and extensive validation through self-generated, self-executing verification pseudo-programs. Then, program-driven refinement (ProgRe) receives feedback from ProgVe and conducts dual reflection and refinement on both responses and verification programs to mitigate the misleading effect of incorrect feedback in complex reasoning tasks. Experiments on three instruction-following and mathematical benchmarks indicate that ProgCo achieves effective self-correction and can further enhance performance when combined with real program tools. We release our code at https://github.com/songxiaoshuai/progco.
Article 1119
Title@2025-05-27 (2): LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multimodal Large Language Models
Title: LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multimodal Large Language Models | LatentExplainer: Erklären von latenten Darstellungen in tiefgenerativen Modellen mit multimodalen großen Sprachmodellen | 前任Explainer:在多模式大语言模型的深创模型中解释前述表述 2406.14862v6 |
Authors: Mengdan Zhu, Raasikh Kanjiani, Jiahui Lu, Andrew Choi, Qirui Ye, Liang Zhao
Deep generative models like VAEs and diffusion models have advanced various generation tasks by leveraging latent variables to learn data distributions and generate high-quality samples. Despite the field of explainable AI making strides in interpreting machine learning models, understanding latent variables in generative models remains challenging. This paper introduces LatentExplainer, a framework for automatically generating semantically meaningful explanations of latent variables in deep generative models. LatentExplainer tackles three main challenges: inferring the meaning of latent variables, aligning explanations with inductive biases, and handling varying degrees of explainability. Our approach perturbs latent variables, interprets the resulting changes in generated data, and uses multimodal large language models (MLLMs) to produce human-understandable explanations. We evaluate our proposed method on several real-world and synthetic datasets, and the results demonstrate superior performance in generating high-quality explanations for latent variables. The results highlight the effectiveness of incorporating inductive biases and uncertainty quantification, significantly enhancing model interpretability.
Article 1120
Title@2025-05-27 (2): PCDCNet: A Surrogate Model for Air Quality Forecasting with Physical-Chemical Dynamics and Constraints
Title: PCDCNet: A Surrogate Model for Air Quality Forecasting with Physical-Chemical Dynamics and Constraints | PCDCNet: Ein Surrogate-Modell für die Luftqualitätsprognose mit physikalisch-chemischer Dynamik und Einschränkungen | PCDCNet:利用物理化学动态和制约因素进行空气质量预测的替代模型 2505.19842v2 |
Authors: Shuo Wang, Yun Cheng, Qingye Meng, Olga Saukh, Jiang Zhang, Jingfang Fan, Yuanting Zhang, Xingyuan Yuan, Lothar Thiele
Air quality forecasting (AQF) is critical for public health and environmental management, yet remains challenging due to the complex interplay of emissions, meteorology, and chemical transformations. Traditional numerical models, such as CMAQ and WRF-Chem, provide physically grounded simulations but are computationally expensive and rely on uncertain emission inventories. Deep learning models, while computationally efficient, often struggle with generalization due to their lack of physical constraints. To bridge this gap, we propose PCDCNet, a surrogate model that integrates numerical modeling principles with deep learning. PCDCNet explicitly incorporates emissions, meteorological influences, and domain-informed constraints to model pollutant formation, transport, and dissipation. By combining graph-based spatial transport modeling, recurrent structures for temporal accumulation, and representation enhancement for local interactions, PCDCNet achieves state-of-the-art (SOTA) performance in 72-hour station-level PM2.5 and O3 forecasting while significantly reducing computational costs. Furthermore, our model is deployed in an online platform, providing free, real-time air quality forecasts, demonstrating its scalability and societal impact. By aligning deep learning with physical consistency, PCDCNet offers a practical and interpretable solution for AQF, enabling informed decision-making for both personal and regulatory applications.
Article 1121
Title@2025-05-27 (2): What is Fair? Defining Fairness in Machine Learning for Health
Title: What is Fair? Defining Fairness in Machine Learning for Health | Was ist fair? Fairness im maschinellen Lernen für die Gesundheit definieren | 什么是公平?界定机器保健学习的公平性 2406.09307v5 |
Authors: Jianhui Gao, Benson Chou, Zachary R. McCaw, Hilary Thurston, Paul Varghese, Chuan Hong, Jessica Gronsbell
Ensuring that machine learning (ML) models are safe, effective, and equitable across all patients is critical for clinical decision-making and for preventing the amplification of existing health disparities. In this work, we examine how fairness is conceptualized in ML for health, including why ML models may lead to unfair decisions and how fairness has been measured in diverse real-world applications. We review commonly used fairness notions within group, individual, and causal-based frameworks. We also discuss the outlook for future research and highlight opportunities and challenges in operationalizing fairness in health-focused applications.
Article 1122
Title@2025-05-27 (2): Are Data Embeddings effective in time series forecasting?
Title: Are Data Embeddings effective in time series forecasting? | Sind Daten-Embeddings in der Zeitreihenvorhersage wirksam? | 数据嵌入在时间序列预测中是否有效? 2505.20716v1 |
Authors: Reza Nematirad, Anil Pahwa, Balasubramaniam Natarajan
Time series forecasting plays a crucial role in many real-world applications, and numerous complex forecasting models have been proposed in recent years. Despite their architectural innovations, most state-of-the-art models report only marginal improvements – typically just a few thousandths in standard error metrics. These models often incorporate complex data embedding layers to transform raw inputs into higher-dimensional representations to enhance accuracy. But are data embedding techniques actually effective in time series forecasting? Through extensive ablation studies across fifteen state-of-the-art models and four benchmark datasets, we find that removing data embedding layers from many state-of-the-art models does not degrade forecasting performance. In many cases, it improves both accuracy and computational efficiency. The gains from removing embedding layers often exceed the performance differences typically reported between competing models. Code available at: https://github.com/neuripsdataembedidng/DataEmbedding
Article 1123
Title@2025-05-27 (2): Wideband RF Radiance Field Modeling Using Frequency-embedded 3D Gaussian Splatting
Title: Wideband RF Radiance Field Modeling Using Frequency-embedded 3D Gaussian Splatting | Wideband RF Radiance Field Modellierung mit Frequenz eingebettet 3D Gaussian Splatting | 使用频率组合的 3D 高斯平面 2505.20714v1 |
Authors: Zechen Li, Lanqing Yang, Yiheng Bian, Hao Pan, Yongjian Fu, Yezhou Wang, Yi-Chao Chen, Guangtao Xue, Ju Ren
This paper presents an innovative frequency-embedded 3D Gaussian splatting (3DGS) algorithm for wideband radio-frequency (RF) radiance field modeling, offering an advancement over the existing works limited to single-frequency modeling. Grounded in fundamental physics, we uncover the complex relationship between EM wave propagation behaviors and RF frequencies. Inspired by this, we design an EM feature network with attenuation and radiance modules to learn the complex relationships between RF frequencies and the key properties of each 3D Gaussian, specifically the attenuation factor and RF signal intensity. By training the frequency-embedded 3DGS model, we can efficiently reconstruct RF radiance fields at arbitrary unknown frequencies within a given 3D environment. Finally, we propose a large-scale power angular spectrum (PAS) dataset containing 50000 samples ranging from 1 to 100 GHz in 6 indoor environments, and conduct extensive experiments to verify the effectiveness of our method. Our approach achieves an average Structural Similarity Index Measure (SSIM) up to 0.72, and a significant improvement up to 17.8% compared to the current state-of-the-art (SOTA) methods trained on individual test frequencies. Additionally, our method achieves an SSIM of 0.70 without prior training on these frequencies, which represents only a 2.8% performance drop compared to models trained with full PAS data. This demonstrates our model’s capability to estimate PAS at unknown frequencies. For related code and datasets, please refer to https://github.com/sim-2-real/Wideband3DGS.
Article 1124
Title@2025-05-27 (2): Does Graph Prompt Work? A Data Operation Perspective with Theoretical Analysis
Title: Does Graph Prompt Work? A Data Operation Perspective with Theoretical Analysis | Funktioniert Graph Prompt? Eine Datenbetriebsperspektive mit theoretischer Analyse | 《图表迅速工作吗? 带有理论分析的数据操作视角》 2410.01635v2 |
Authors: Qunzhong Wang, Xiangguo Sun, Hong Cheng
In recent years, graph prompting has emerged as a promising research direction, enabling the learning of additional tokens or subgraphs appended to the original graphs without requiring retraining of pre-trained graph models across various applications. This novel paradigm, shifting from the traditional pretraining and finetuning to pretraining and prompting, has shown significant empirical success in simulating graph data operations, with applications ranging from recommendation systems to biological networks and graph transferring. However, despite its potential, the theoretical underpinnings of graph prompting remain underexplored, raising critical questions about its fundamental effectiveness. The lack of rigorous theoretical proof of why and how well it works hangs like a dark cloud over the graph prompting area and hinders further progress. To fill this gap, this paper introduces a theoretical framework that rigorously analyzes graph prompting from a data operation perspective. Our contributions are threefold: First, we provide a formal guarantee theorem, demonstrating the capacity of graph prompts to approximate graph transformation operators, effectively linking upstream and downstream tasks. Second, we derive upper bounds on the error of these data operations by graph prompts for a single graph and extend this discussion to batches of graphs, which are common in graph model training. Third, we analyze the distribution of data operation errors, extending our theoretical findings from linear graph models (e.g., GCN) to non-linear graph models (e.g., GAT). Extensive experiments support our theoretical results and confirm the practical implications of these guarantees.
Article 1125
Title@2025-05-27 (2): Time-Series Learning for Proactive Fault Prediction in Distributed Systems with Deep Neural Structures
Title: Time-Series Learning for Proactive Fault Prediction in Distributed Systems with Deep Neural Structures | Time-Series Learning für proaktive Fehlervorhersage in verteilten Systemen mit tiefen neuralen Strukturen | 深心神经结构分布系统预发性故障预测时间序列学习 2505.20705v1 |
Authors: Yang Wang, Wenxuan Zhu, Xuehui Quan, Heyi Wang, Chang Liu, Qiyuan Wu
This paper addresses the challenges of fault prediction and delayed response in distributed systems by proposing an intelligent prediction method based on temporal feature learning. The method takes multi-dimensional performance metric sequences as input. We use a Gated Recurrent Unit (GRU) to model the evolution of system states over time. An attention mechanism is then applied to enhance key temporal segments, improving the model’s ability to identify potential faults. On this basis, a feedforward neural network is designed to perform the final classification, enabling early warning of system failures. To validate the effectiveness of the proposed approach, comparative experiments and ablation analyses were conducted using data from a large-scale real-world cloud system. The experimental results show that the model outperforms various mainstream time-series models in terms of Accuracy, F1-Score, and AUC. This demonstrates strong prediction capability and stability. Furthermore, the loss function curve confirms the convergence and reliability of the training process. It indicates that the proposed method effectively learns system behavior patterns and achieves efficient fault detection.
Article 1126
Title@2025-05-27 (2): NeUQI: Near-Optimal Uniform Quantization Parameter Initialization
Title: NeUQI: Near-Optimal Uniform Quantization Parameter Initialization | NeUQI: Beinahe-optimale einheitliche Quantisierung Parameter Initialisierung | NeUQI: 近最佳统一量化参数初始化 2505.17595v2 |
Authors: Li Lin, Xinyu Hu, Xiaojun Wan
Large language models (LLMs) achieve impressive performance across domains but face significant challenges when deployed on consumer-grade GPUs or personal devices such as laptops, due to high memory consumption and inference costs. Post-training quantization (PTQ) of LLMs offers a promising solution that reduces their memory footprint and decoding latency. In practice, PTQ with uniform quantization representation is favored for its efficiency and ease of deployment since uniform quantization is widely supported by mainstream hardware and software libraries. Recent studies on $\geq 2$-bit uniform quantization have led to noticeable improvements in post-quantization model performance; however, they primarily focus on quantization methodologies, while the initialization of quantization parameters is underexplored and still relies on the suboptimal Min-Max strategies. In this work, we propose NeUQI, a method devoted to efficiently determining near-optimal initial parameters for uniform quantization. NeUQI is orthogonal to prior quantization methodologies and can seamlessly integrate with them. The experiments with the LLaMA and Qwen families on various tasks demonstrate that our NeUQI consistently outperforms existing methods. Furthermore, when combined with a lightweight distillation strategy, NeUQI can achieve superior performance to PV-tuning, a much more resource-intensive approach.
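For context, the Min-Max initialization that the abstract describes as suboptimal can be sketched as follows. This is the baseline only, not NeUQI; the asymmetric scale and zero-point parameterization and the function names are assumptions for illustration.

```python
def minmax_init(weights, bits):
    """Min-Max initialization: derive scale and zero-point from the weight range."""
    levels = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point

def quantize(weights, scale, zero_point, bits):
    """Map each weight to an integer code, clipped to [0, 2^bits - 1]."""
    levels = 2 ** bits - 1
    return [min(levels, max(0, round(w / scale) + zero_point)) for w in weights]

def dequantize(codes, scale, zero_point):
    """Map integer codes back to approximate real values."""
    return [(q - zero_point) * scale for q in codes]

# Hypothetical weight group with one large value stretching the range.
weights = [-0.9, -0.2, 0.0, 0.1, 0.4, 1.5]
scale, zp = minmax_init(weights, bits=2)
codes = quantize(weights, scale, zp, bits=2)
recon = dequantize(codes, scale, zp)
max_err = max(abs(w - r) for w, r in zip(weights, recon))
```

Because the scale is set by the extreme values alone, a single outlier coarsens the grid for every other weight, which is the kind of inefficiency a better initialization would aim to avoid.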
Article 1127
Title@2025-05-27 (2): Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Title: Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases | Zwischen Circuits und Chomsky: Pre-Pretraining auf Formal Languages Imparts Linguistic Biases | 巡回巡回和乔姆斯基之间:正式语言语言语言预科培训 2502.19249v2 |
Authors: Michael Y. Hu, Jackson Petty, Chuan Shi, William Merrill, Tal Linzen
Pretraining language models on formal language can improve their acquisition of natural language. Which features of the formal language impart an inductive bias that leads to effective transfer? Drawing on insights from linguistics and complexity theory, we hypothesize that effective transfer occurs when two conditions are met: the formal language should capture the dependency structures present in natural language, and it should remain within the computational limitations of the model architecture. We experiment with pre-pretraining (training on formal language before natural languages) on transformers and find that formal languages capturing hierarchical dependencies indeed enable language models to achieve lower loss on natural language and better linguistic generalization compared to other formal languages. We also find modest support for the hypothesis that the formal language should fall within the computational limitations of the architecture. Strikingly, pre-pretraining reduces loss more efficiently than training on a matched amount of natural language. For a 1B-parameter language model trained on roughly 1.6B tokens of natural language, pre-pretraining achieves the same loss and better linguistic generalization with a 33% smaller token budget. Finally, we also give mechanistic evidence of transfer from formal to natural language: attention heads acquired during pre-pretraining remain crucial for the model’s performance on syntactic evaluations.
Article 1128
Title@2025-05-27 (2): vCache: Verified Semantic Prompt Caching
Title: vCache: Verified Semantic Prompt Caching | vCache: Verifizierter semantischer Prompt-Caching | vCache: 校验语义快速缓冲 2502.03771v3 |
Authors: Luis Gaspar Schroeder, Aditya Desai, Alejandro Cuadron, Kyle Chu, Shu Liu, Mark Zhao, Stephan Krusche, Alfons Kemper, Matei Zaharia, Joseph E. Gonzalez
Semantic caches return cached LLM-generated responses for semantically similar prompts to reduce inference latency and cost. They embed cached prompts and store them alongside their response in a vector database. Embedding similarity metrics assign a numerical score to quantify the similarity between a request and its nearest neighbor prompt from the cache. Existing systems use the same static similarity threshold across all requests to determine whether two prompts can share similar responses. However, we observe that static thresholds do not give formal correctness guarantees, can result in unexpected error rates, and lead to suboptimal cache hit rates. This paper proposes vCache, the first verified semantic cache with user-defined error rate guarantees. It employs an online learning algorithm to estimate an optimal threshold for each cached prompt, enabling reliable cache responses without additional training. Our experiments show that vCache consistently meets the specified error bounds while outperforming state-of-the-art static-threshold and fine-tuned embedding baselines. We release the vCache implementation and benchmarks to support future research.
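The static-threshold lookup used by the existing systems that the abstract criticizes can be sketched as below. This is not vCache's per-prompt learned threshold; the class name, the toy two-dimensional embeddings, and the 0.9 threshold are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class StaticThresholdCache:
    """Semantic cache with one global similarity threshold for all requests."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = []  # list of (prompt embedding, cached response)

    def put(self, embedding, response):
        self.entries.append((embedding, response))

    def get(self, embedding):
        if not self.entries:
            return None
        # Nearest neighbor by cosine similarity over all cached prompts.
        score, response = max(
            ((cosine(embedding, e), r) for e, r in self.entries),
            key=lambda pair: pair[0],
        )
        return response if score >= self.threshold else None

cache = StaticThresholdCache(threshold=0.9)
cache.put([1.0, 0.0], "cached answer")
hit = cache.get([0.98, 0.1])   # close to the cached prompt
miss = cache.get([0.0, 1.0])   # orthogonal to the cached prompt
```

The single `threshold` is the weak point: no one value is right for every cached prompt, which motivates estimating a threshold per prompt as the abstract proposes.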
Article 1129
Title@2025-05-27 (2): Multi-instance Learning as Downstream Task of Self-Supervised Learning-based Pre-trained Model
Title: Multi-instance Learning as Downstream Task of Self-Supervised Learning-based Pre-trained Model | Multi-Instance-Lernen als Downstream-Aufgabe des selbstüberwachten Learning-basierten vortrainierten Modells | 将多机构学习作为自监督学习模式培训前模式的下游任务 2505.21564v1 |
Authors: Koki Matsuishi, Tsuyoshi Okita
In deep multi-instance learning, the number of applicable instances depends on the data set. In histopathology images, deep multi-instance learners usually assume there are hundreds to thousands of instances in a bag. However, when the number of instances in a bag increases to 256 in brain hematoma CT, learning becomes extremely difficult. To address this drawback, we propose using a pre-trained model with self-supervised learning for the multi-instance learner as a downstream task. With this method, even when the original target task suffers from the spurious correlation problem, we show improvements of 5% to 13% in accuracy and 40% to 55% in the F1 measure for the hypodensity marker classification of brain hematoma CT.
Article 1130
Title@2025-05-27 (2): Sparsified State-Space Models are Efficient Highway Networks
Title: Sparsified State-Space Models are Efficient Highway Networks | Sparsifizierte State-Space-Modelle sind effiziente Highway-Netzwerke | 国家空间模型是高效公路网 2505.20698v1 |
Authors: Woomin Song, Jihoon Tack, Sangwoo Mo, Seunghyuk Oh, Jinwoo Shin
State-space models (SSMs) offer a promising architecture for sequence modeling, providing an alternative to Transformers by replacing expensive self-attention with linear recurrences. In this paper, we propose a simple yet effective trick to enhance SSMs within given computational budgets by sparsifying them. Our intuition is that tokens in SSMs are highly redundant due to gradual recurrent updates, and dense recurrence operations block the delivery of past information. In particular, we observe that upper layers of SSMs tend to be more redundant as they encode global information, while lower layers encode local information. Motivated by this, we introduce Simba, a hierarchical sparsification method for SSMs based on token pruning. Simba sparsifies upper layers more than lower layers, encouraging the upper layers to behave like highways. To achieve this, we propose a novel token pruning criterion for SSMs, measuring the global impact of tokens on the final output by accumulating local recurrences. We demonstrate that Simba outperforms the baseline model, Mamba, with the same FLOPS in various natural language tasks. Moreover, we illustrate the effect of highways, showing that Simba not only enhances efficiency but also improves the information flow across long sequences. Code is available at https://github.com/woominsong/Simba.
Article 1131
Title@2025-05-27 (2): Token-level Accept or Reject: A Micro Alignment Approach for Large Language Models
Title: Token-level Accept or Reject: A Micro Alignment Approach for Large Language Models | Token-Level Akzeptieren oder ablehnen: Ein Micro Alignment-Ansatz für große Sprachmodelle | 接受或拒绝时肯级别:大语言模式微调整方法 2505.19743v2 |
Authors: Yang Zhang, Yu Yu, Bo Tang, Yu Zhu, Chuxiong Sun, Wenqiang Wei, Jie Hu, Zipeng Xie, Zhiyu Li, Feiyu Xiong, Edward Chung
With the rapid development of Large Language Models (LLMs), aligning these models with human preferences and values is critical to ensuring ethical and safe applications. However, existing alignment techniques such as RLHF or DPO often require direct fine-tuning on LLMs with billions of parameters, resulting in substantial computational costs and inefficiencies. To address this, we propose the Micro token-level Accept-Reject Aligning (MARA) approach, designed to operate independently of the language models. MARA simplifies the alignment process by decomposing sentence-level preference learning into token-level binary classification, where a compact three-layer fully-connected network determines whether candidate tokens are “Accepted” or “Rejected” as part of the response. Extensive experiments across seven different LLMs and three open-source datasets show that MARA achieves significant improvements in alignment performance while reducing computational costs. The source code and implementation details are publicly available at https://github.com/IAAR-Shanghai/MARA, and the trained models are released at https://huggingface.co/IAAR-Shanghai/MARA_AGENTS.
Article 1132
Title@2025-05-27 (2): Generating Hypotheses of Dynamic Causal Graphs in Neuroscience: Leveraging Generative Factor Models of Observed Time Series
Title: Generating Hypotheses of Dynamic Causal Graphs in Neuroscience: Leveraging Generative Factor Models of Observed Time Series | Generieren von Hypothesen dynamischer Kausalgraphen in der Neurowissenschaft: Nutzung generativer Faktorenmodelle beobachteter Zeitreihen | 在神经科学中生成动态因果图的假设:利用观测时间序列的生成因数模型 2505.20697v1 |
Authors: Zachary C. Brown, David Carlson
The field of hypothesis generation promises to reduce costs in neuroscience by narrowing the range of interventional studies needed to study various phenomena. Existing machine learning methods can generate scientific hypotheses from complex datasets, but many approaches assume causal relationships are static over time, limiting their applicability to systems with dynamic, state-dependent behavior, such as the brain. While some techniques attempt dynamic causal discovery through factor models, they often restrict relationships to linear patterns or impose other simplifying assumptions. We propose a novel method that models dynamic graphs as a conditionally weighted superposition of static graphs, where each static graph can capture nonlinear relationships. This approach enables the detection of complex, time-varying interactions between variables beyond linear limitations. Our method improves F1-scores of predicted dynamic causal patterns by roughly 22-28% on average over baselines in some of our experiments, with some improvements reaching well over 60%. A case study on real brain data demonstrates our method’s ability to uncover relationships linked to specific behavioral states, offering valuable insights into neural dynamics.
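The phrase "conditionally weighted superposition of static graphs" can be made concrete with a small sketch. Here each static graph is reduced to a plain adjacency matrix and the conditional weights come from a softmax over per-state scores; both choices are simplifying assumptions for illustration (the paper's static graphs capture nonlinear relationships, which a fixed matrix does not).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_graph(static_graphs, state_scores):
    """Mix static adjacency matrices with weights conditioned on the current state."""
    weights = softmax(state_scores)
    n = len(static_graphs[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, static_graphs)) for j in range(n)]
        for i in range(n)
    ]

graph_a = [[0.0, 1.0], [0.0, 0.0]]  # edge x0 -> x1 only
graph_b = [[0.0, 0.0], [1.0, 0.0]]  # edge x1 -> x0 only

# Equal scores: both directions contribute equally at this time step.
mixed = dynamic_graph([graph_a, graph_b], [0.0, 0.0])
# Scores strongly favouring the first state: the graph is close to graph_a.
state_a = dynamic_graph([graph_a, graph_b], [10.0, -10.0])
```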
Article 1133
Title@2025-05-27 (2): Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration
Title: Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration | Navigieren Sie das Unbekannte: Verbesserung der LLM-Vernunft mit intrinsischer Motivation geführte Exploration | 导航未知:利用内在动力性引导探索加强LLM 2505.17621v2 |
Authors: Jingtong Gao, Ling Pan, Yejing Wang, Rui Zhong, Chi Lu, Qingpeng Cai, Peng Jiang, Xiangyu Zhao
Reinforcement learning (RL) has emerged as a pivotal method for improving the reasoning capabilities of Large Language Models (LLMs). However, prevalent RL approaches such as Proximal Policy Optimization (PPO) and Group-Regularized Policy Optimization (GRPO) face critical limitations due to their reliance on sparse outcome-based rewards and inadequate mechanisms for incentivizing exploration. These limitations result in inefficient guidance for multi-step reasoning processes. Specifically, sparse reward signals fail to deliver effective or sufficient feedback, particularly for challenging problems. Furthermore, such reward structures induce systematic biases that prioritize exploitation of familiar trajectories over novel solution discovery. These shortcomings critically hinder performance in complex reasoning tasks, which inherently demand iterative refinement across intermediate steps. To address these challenges, we propose an Intrinsic Motivation guidEd exploratioN meThOd foR LLM Reasoning (i-MENTOR), a novel method designed to both deliver dense rewards and amplify explorations in the RL-based training paradigm. i-MENTOR introduces three key innovations: trajectory-aware exploration rewards that mitigate bias in token-level strategies while maintaining computational efficiency; dynamic reward scaling to stabilize exploration and exploitation in large action spaces; and advantage-preserving reward implementation that maintains advantage distribution integrity while incorporating exploratory guidance. Experiments across three public datasets demonstrate i-MENTOR’s effectiveness with a 22.39% improvement on the difficult dataset Countdown-4.
Article 1134
Title@2025-05-27 (2): Temporal Saliency-Guided Distillation: A Scalable Framework for Distilling Video Datasets
Title: Temporal Saliency-Guided Distillation: A Scalable Framework for Distilling Video Datasets | Temporale Saliency-geführte Destillation: Ein skalierbares Framework für die Destillierung von Videodatensätzen | 时间性盐度-指导蒸馏:用于蒸馏视频数据集的可缩放框架 2505.20694v1 |
Authors: Xulin Gu, Xinhao Zhong, Zhixing Wei, Yimin Zhou, Shuoyang Sun, Bin Chen, Hongpeng Wang, Yuan Luo
Dataset distillation (DD) has emerged as a powerful paradigm for dataset compression, enabling the synthesis of compact surrogate datasets that approximate the training utility of large-scale ones. While significant progress has been achieved in distilling image datasets, extending DD to the video domain remains challenging due to the high dimensionality and temporal complexity inherent in video data. Existing video distillation (VD) methods often suffer from excessive computational costs and struggle to preserve temporal dynamics, as naïve extensions of image-based approaches typically lead to degraded performance. In this paper, we propose a novel uni-level video dataset distillation framework that directly optimizes synthetic videos with respect to a pre-trained model. To address temporal redundancy and enhance motion preservation, we introduce a temporal saliency-guided filtering mechanism that leverages inter-frame differences to guide the distillation process, encouraging the retention of informative temporal cues while suppressing frame-level redundancy. Extensive experiments on standard video benchmarks demonstrate that our method achieves state-of-the-art performance, bridging the gap between real and distilled video data and offering a scalable solution for video dataset compression.
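To show what a saliency signal built from inter-frame differences might look like, the sketch below scores each frame by its mean absolute change from the previous frame and normalizes the scores into weights. The scoring rule, the zero score for the first frame, and the normalization are assumptions for illustration; the abstract does not give the paper's exact filter.

```python
def temporal_saliency(frames):
    """Mean absolute difference between each frame and its predecessor.

    Frames are flat lists of pixel values; the first frame gets score 0.0.
    """
    scores = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        scores.append(sum(abs(c - p) for p, c in zip(prev, cur)) / len(cur))
    return scores

def saliency_weights(scores, eps=1e-8):
    """Normalize scores so they sum to (almost) one; eps avoids division by zero."""
    total = sum(scores) + eps
    return [s / total for s in scores]

# Toy 4-frame "video" with 2 pixels per frame; motion happens only at frame 2.
video = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
scores = temporal_saliency(video)
weights = saliency_weights(scores)
```

Static frames receive near-zero weight, so a distillation loss weighted this way would concentrate on the frames where motion occurs.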
Article 1135
Title@2025-05-27 (2): Phir Hera Fairy: An English Fairytaler is a Strong Faker of Fluent Speech in Low-Resource Indian Languages
Title: Phir Hera Fairy: An English Fairytaler is a Strong Faker of Fluent Speech in Low-Resource Indian Languages | Phir Hera Fairy: Ein englisches Märchen ist ein starker Faker der fließenden Rede in Low-Resource indischen Sprachen | Phir Hera Fairy:英国仙女是印度低资源语言流利流利的有力名人 2505.20693v1 |
Authors: Praveen Srinivasa Varadhan, Srija Anand, Soma Siddhartha, Mitesh M. Khapra
What happens when an English Fairytaler is fine-tuned on Indian languages? We evaluate how the English F5-TTS model adapts to 11 Indian languages, measuring polyglot fluency, voice-cloning, style-cloning, and code-mixing. We compare: (i) training from scratch, (ii) fine-tuning English F5 on Indian data, and (iii) fine-tuning on both Indian and English data to prevent forgetting. Fine-tuning with only Indian data proves most effective, and the resultant IN-F5 is a near-human polyglot that enables speakers of one language (e.g., Odia) to fluently speak in another (e.g., Hindi). Our results show English pretraining aids low-resource TTS in reaching human parity. To aid progress in other low-resource languages, we study data-constrained setups and arrive at a compute-optimal strategy. Finally, we show IN-F5 can synthesize unseen languages like Bhojpuri and Tulu using a human-in-the-loop approach for zero-resource TTS via synthetic data generation.
Article 1136
Title@2025-05-27 (2): Evidential Deep Active Learning for Semi-Supervised Classification
Title: Evidential Deep Active Learning for Semi-Supervised Classification | Evidentielles tiefes aktives Lernen für semi-überwachte Klassifikation | 半监督分类的证明深层积极学习 2505.20691v1 |
Authors: Shenkai Zhao, Xinao Zhang, Lipeng Pan, Xiaobin Xu, Danilo Pelusi
Semi-supervised classification based on active learning has made significant progress, but the existing methods often ignore the uncertainty estimation (or reliability) of the prediction results during the learning process, which makes it questionable whether the selected samples can effectively update the model. Hence, this paper proposes an evidential deep active learning approach for semi-supervised classification (EDALSSC). EDALSSC builds a semi-supervised learning framework to simultaneously quantify the uncertainty estimation of labeled and unlabeled data during the learning process. The uncertainty estimation of the former is associated with evidential deep learning, while that of the latter is modeled by combining ignorance information and conflict information of the evidence from the perspective of the T-conorm operator. Furthermore, this article constructs a heuristic method to dynamically balance the influence of evidence and the number of classes on uncertainty estimation to ensure that it does not produce counter-intuitive results in EDALSSC. For the sample selection strategy, EDALSSC selects the sample with the greatest uncertainty estimation that is calculated in the form of a sum when the training loss increases in the latter half of the learning process. Experimental results demonstrate that EDALSSC outperforms existing semi-supervised and supervised active learning approaches on image classification datasets.
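For readers unfamiliar with evidential deep learning, the sketch below shows the standard subjective-logic quantities it is usually built on: non-negative per-class evidence becomes Dirichlet parameters, from which per-class belief and an overall uncertainty mass follow. This is the common textbook formulation, assumed here for illustration; it does not reproduce EDALSSC's T-conorm treatment of unlabeled data or its heuristic balancing.

```python
def edl_uncertainty(evidence):
    """Belief masses and uncertainty from non-negative per-class evidence.

    alpha_k = e_k + 1, S = sum(alpha), b_k = e_k / S, u = K / S,
    so that sum(b) + u == 1.
    """
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    belief = [e / strength for e in evidence]
    return belief, num_classes / strength

# No evidence at all: the model is maximally uncertain.
_, u_none = edl_uncertainty([0.0, 0.0, 0.0])
# Strong evidence for class 0: uncertainty shrinks.
belief_strong, u_strong = edl_uncertainty([9.0, 0.0, 0.0])
```

An active learner following this idea would query the samples with the largest uncertainty mass.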
Article 1137
Title@2025-05-27 (2): Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Title: Accelerating RL for LLM Reasoning with Optimal Advantage Regression | Beschleunigung der RL für LLM-Vernunft mit optimaler Regression | 以最优优势回归加速 LLL 来计算LLM 加速RL 原因 2505.20686v1 |
Authors: Kianté Brantley, Mingyu Chen, Zhaolin Gao, Jason D. Lee, Wen Sun, Wenhao Zhan, Xuezhou Zhang
Reinforcement learning (RL) has emerged as a powerful tool for fine-tuning large language models (LLMs) to improve complex reasoning abilities. However, state-of-the-art policy optimization methods often suffer from high computational overhead and memory consumption, primarily due to the need for multiple generations per prompt and the reliance on critic networks or advantage estimates of the current policy. In this paper, we propose $A$-PO, a novel two-stage policy optimization framework that directly approximates the optimal advantage function and enables efficient training of LLMs for reasoning tasks. In the first stage, we leverage offline sampling from a reference policy to estimate the optimal value function $V$, eliminating the need for costly online value estimation. In the second stage, we perform on-policy updates using a simple least-squares regression loss with only a single generation per prompt. Theoretically, we establish performance guarantees and prove that the KL-regularized RL objective can be optimized without requiring complex exploration strategies. Empirically, $A$-PO achieves competitive performance across a wide range of mathematical reasoning benchmarks, while reducing training time by up to 2$\times$ and peak memory usage by over 30% compared to PPO, GRPO, and REBEL. Implementation of $A$-PO can be found at https://github.com/ZhaolinGao/A-PO.
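The two stages can be sketched using the well-known closed form for KL-regularized RL, in which the optimal value of a prompt is beta times the log of the expected exponentiated reward under the reference policy, and the optimal advantage equals beta times the log-ratio between the optimal and reference policies. Whether the paper uses exactly this estimator and loss is an assumption; the function names, `beta`, and the toy numbers are hypothetical.

```python
import math

def estimate_optimal_value(ref_rewards, beta):
    """Stage 1 (offline): V(x) ~= beta * log(mean(exp(r / beta))).

    ref_rewards are rewards of responses sampled from the reference policy.
    Uses a max-shift so the exponentials cannot overflow.
    """
    scaled = [r / beta for r in ref_rewards]
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return beta * log_mean_exp

def advantage_regression_loss(logp_policy, logp_ref, reward, value, beta):
    """Stage 2 (on-policy): squared error between the policy's implied advantage
    beta * log(pi / pi_ref) and the target reward - V(x), for a single response."""
    predicted = beta * (logp_policy - logp_ref)
    target = reward - value
    return (predicted - target) ** 2

# If every reference sample earns reward 1.0, the estimated value is 1.0.
value = estimate_optimal_value([1.0, 1.0, 1.0, 1.0], beta=0.5)
# A policy identical to the reference then has zero loss on a reward-1.0 response.
loss = advantage_regression_loss(-2.0, -2.0, 1.0, value, beta=0.5)
# With mixed rewards the estimate lies between the mean and the max reward.
mixed_value = estimate_optimal_value([0.0, 1.0], beta=0.5)
```

Because the value estimate is computed once offline, the online stage in this sketch needs only one sampled response per prompt and no critic network.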
Article 1138
Title@2025-05-27 (2): A Survey of LLM $\times$ DATA
Title: A Survey of LLM $\times$ DATA | Eine Umfrage über LLM $\times$ DATEN | 对LLLM 美元-美元-美元-美元-数据数据的调查 2505.18458v2 |
Authors: Xuanhe Zhou, Junxuan He, Wei Zhou, Haodong Chen, Zirui Tang, Haoyu Zhao, Xin Tong, Guoliang Li, Youmin Chen, Jun Zhou, Zhaojun Sun, Binyuan Hui, Shuo Wang, Conghui He, Zhiyuan Liu, Jingren Zhou, Fan Wu
The integration of large language model (LLM) and data management (DATA) is rapidly redefining both domains. In this survey, we comprehensively review the bidirectional relationships. On the one hand, DATA4LLM, spanning large-scale data processing, storage, and serving, feeds LLMs with high quality, diversity, and timeliness of data required for stages like pre-training, post-training, retrieval-augmented generation, and agentic workflows: (i) Data processing for LLMs includes scalable acquisition, deduplication, filtering, selection, domain mixing, and synthetic augmentation; (ii) Data Storage for LLMs focuses on efficient data and model formats, distributed and heterogeneous storage hierarchies, KV-cache management, and fault-tolerant checkpointing; (iii) Data serving for LLMs tackles challenges in RAG (e.g., knowledge post-processing), LLM inference (e.g., prompt compression, data provenance), and training strategies (e.g., data packing and shuffling). On the other hand, in LLM4DATA, LLMs are emerging as general-purpose engines for data management. We review recent advances in (i) data manipulation, including automatic data cleaning, integration, discovery; (ii) data analysis, covering reasoning over structured, semi-structured, and unstructured data, and (iii) system optimization (e.g., configuration tuning, query rewriting, anomaly diagnosis), powered by LLM techniques like retrieval-augmented prompting, task-specialized fine-tuning, and multi-agent collaboration.
Article 1139
Title@2025-05-27 (2): MODULI: Unlocking Preference Generalization via Diffusion Models for Offline Multi-Objective Reinforcement Learning
Title: MODULI: Unlocking Preference Generalization via Diffusion Models for Offline Multi-Objective Reinforcement Learning | MODULI: Locking Preference Generalization via Diffusion Models for Offline Multi-Objective Reinforcement Learning | MODULI:通过离线多目标强化学习扩散模型解锁普及 2408.15501v2 |
Authors: Yifu Yuan, Zhenrui Zheng, Zibin Dong, Jianye Hao
Multi-objective Reinforcement Learning (MORL) seeks to develop policies that simultaneously optimize multiple conflicting objectives, but it requires extensive online interactions. Offline MORL provides a promising solution by training on pre-collected datasets to generalize to any preference upon deployment. However, real-world offline datasets are often conservatively and narrowly distributed, failing to comprehensively cover preferences, leading to the emergence of out-of-distribution (OOD) preference areas. Existing offline MORL algorithms exhibit poor generalization to OOD preferences, resulting in policies that do not align with preferences. Leveraging the excellent expressive and generalization capabilities of diffusion models, we propose MODULI (Multi-objective Diffusion Planner with Sliding Guidance), which employs a preference-conditioned diffusion model as a planner to generate trajectories that align with various preferences and derive action for decision-making. To achieve accurate generation, MODULI introduces two return normalization methods under diverse preferences for refining guidance. To further enhance generalization to OOD preferences, MODULI proposes a novel sliding guidance mechanism, which involves training an additional slider adapter to capture the direction of preference changes. Incorporating the slider, it transitions from in-distribution (ID) preferences to generating OOD preferences, patching, and extending the incomplete Pareto front. Extensive experiments on the D4MORL benchmark demonstrate that our algorithm outperforms state-of-the-art Offline MORL baselines, exhibiting excellent generalization to OOD preferences.
Article 1140
Title@2025-05-27 (2): SELF-PERCEPT: Introspection Improves Large Language Models’ Detection of Multi-Person Mental Manipulation in Conversations
Title: SELF-PERCEPT: Introspection Improves Large Language Models’ Detection of Multi-Person Mental Manipulation in Conversations | SELF-PERCEPT: Introspection verbessert die Erkennung von Multi-Person-Gedankenmanipulation in Gesprächen durch große Sprachmodelle | SELF-PERCEPT: 调查改进大语言模型在对话中探测多人心理操纵 2505.20679v1 |
Authors: Danush Khanna, Pratinav Seth, Sidhaarth Sredharan Murali, Aditya Kumar Guru, Siddharth Shukla, Tanuj Tyagi, Sandeep Chaurasia, Kripabandhu Ghosh
Mental manipulation is a subtle yet pervasive form of abuse in interpersonal communication, making its detection critical for safeguarding potential victims. However, due to manipulation’s nuanced and context-specific nature, identifying manipulative language in complex, multi-turn, and multi-person conversations remains a significant challenge for large language models (LLMs). To address this gap, we introduce the MultiManip dataset, comprising 220 multi-turn, multi-person dialogues balanced between manipulative and non-manipulative interactions, all drawn from reality shows that mimic real-world scenarios. For manipulative interactions, it includes 11 distinct manipulations depicting real-life scenarios. We conduct extensive evaluations of state-of-the-art LLMs, such as GPT-4o and Llama-3.1-8B, employing various prompting strategies. Despite their capabilities, these models often struggle to detect manipulation effectively. To overcome this limitation, we propose SELF-PERCEPT, a novel, two-stage prompting framework inspired by Self-Perception Theory, demonstrating strong performance in detecting multi-person, multi-turn mental manipulation. Our code and data are publicly available at https://github.com/danushkhanna/self-percept.
Article 1141
Title@2025-05-27 (2): Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System
Title: Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System | Viele Köpfe sind besser als eins: Verbesserte wissenschaftliche Idee-Generation durch ein LLM-basiertes Multi-Agent-System | 许多领导人比一个领导人好得多:由以LLM为基础的多种机构系统改进科学思想的一代 2410.09403v4 |
Authors: Haoyang Su, Renqi Chen, Shixiang Tang, Zhenfei Yin, Xinzhe Zheng, Jinzhe Li, Biqing Qi, Qi Wu, Hui Li, Wanli Ouyang, Philip Torr, Bowen Zhou, Nanqing Dong
The rapid advancement of scientific progress requires innovative tools that can accelerate knowledge discovery. Although recent AI methods, particularly large language models (LLMs), have shown promise in tasks such as hypothesis generation and experimental design, they fall short of replicating the collaborative nature of real-world scientific practices, where diverse experts work together in teams to tackle complex problems. To address the limitations, we propose an LLM-based multi-agent system, i.e., Virtual Scientists (VirSci), designed to mimic the teamwork inherent in scientific research. VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas. Through comprehensive experiments, we demonstrate that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas. We further investigate the collaboration mechanisms that contribute to its tendency to produce ideas with higher novelty, offering valuable insights to guide future research and illuminating pathways toward building a robust system for autonomous scientific discovery. The code is available at https://github.com/open-sciencelab/Virtual-Scientists.
Article 1142
Title@2025-05-27 (2): LLM-Guided Reinforcement Learning: Addressing Training Bottlenecks through Policy Modulation
Title: LLM-Guided Reinforcement Learning: Addressing Training Bottlenecks through Policy Modulation | LLM-geführtes Stärkungslernen: Bewältigung von Ausbildungsengpässen durch politische Modulation | LLM-LLM-指导强化学习:通过政策调整解决培训瓶颈问题 2505.20671v1 |
Authors: Heng Tan, Hua Yan, Yu Yang
While reinforcement learning (RL) has achieved notable success in various domains, training effective policies for complex tasks remains challenging. Agents often converge to local optima and fail to maximize long-term rewards. Existing approaches to mitigate training bottlenecks typically fall into two categories: (i) Automated policy refinement, which identifies critical states from past trajectories to guide policy updates, but suffers from costly and uncertain model training; and (ii) Human-in-the-loop refinement, where human feedback is used to correct agent behavior, but this does not scale well to environments with large or continuous action spaces. In this work, we design a large language model-guided policy modulation framework that leverages LLMs to improve RL training without additional model training or human intervention. We first prompt an LLM to identify critical states from a sub-optimal agent’s trajectories. Based on these states, the LLM then provides action suggestions and assigns implicit rewards to guide policy refinement. Experiments across standard RL benchmarks demonstrate that our method outperforms state-of-the-art baselines, highlighting the effectiveness of LLM-based explanations in addressing RL training bottlenecks.
Article 1143
Title@2025-05-27 (2): From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Title: From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Vom Sehen zum Tun: Überbrücken von Vernunft und Entscheidung für die Robotermanipulation | 从看到做:机器人操纵的搭桥理由和决定 2505.08548v2 |
Authors: Yifu Yuan, Haiqin Cui, Yibin Chen, Zibin Dong, Fei Ni, Longxin Kou, Jinyi Liu, Pengyi Li, Yan Zheng, Jianye Hao
Achieving generalization in robotic manipulation remains a critical challenge, particularly for unseen scenarios and novel tasks. Current Vision-Language-Action (VLA) models, while building on top of general Vision-Language Models (VLMs), still fall short of achieving robust zero-shot performance due to the scarcity and heterogeneity prevalent in embodied datasets. To address these limitations, we propose FSD (From Seeing to Doing), a novel vision-language model that generates intermediate representations through spatial relationship reasoning, providing fine-grained guidance for robotic manipulation. Our approach combines a hierarchical data pipeline for training with a self-consistency mechanism that aligns spatial coordinates with visual signals. Through extensive experiments, we comprehensively validated FSD’s capabilities in both “seeing” and “doing,” achieving outstanding performance across 8 benchmarks for general spatial reasoning and embodied reference abilities, as well as on our proposed more challenging benchmark VABench. We also verified zero-shot capabilities in robot manipulation, demonstrating significant performance improvements over baseline methods in both SimplerEnv and real robot settings. Experimental results show that FSD achieves 40.6% success rate in SimplerEnv and 72% success rate across 8 real-world tasks, outperforming the strongest baseline by 30%.
Article 1144
Title@2025-05-27 (2): RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
Title: RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts | RE-Bench: Bewertung der KI-FuE-Fähigkeiten von Sprachmodellagenten gegen menschliche Experten | RE-BENCH: 对照人类专家评估语言模范代理商的AI研究与开发的前沿能力 2411.15114v2 |
Authors: Hjalmar Wijk, Tao Lin, Joel Becker, Sami Jawhar, Neev Parikh, Thomas Broadley, Lawrence Chan, Michael Chen, Josh Clymer, Jai Dhyani, Elena Ericheva, Katharyn Garcia, Brian Goodrich, Nikola Jurkovic, Holden Karnofsky, Megan Kinniment, Aron Lajko, Seraphina Nix, Lucas Sato, William Saunders, Maksym Taran, Ben West, Elizabeth Barnes
Frontier AI safety policies highlight automation of AI research and development (R&D) by AI agents as an important capability to anticipate. However, there exist few evaluations for AI R&D capabilities, and none that are highly realistic and have a direct comparison to human performance. We introduce RE-Bench (Research Engineering Benchmark, v1), which consists of 7 challenging, open-ended ML research engineering environments and data from 71 8-hour attempts by 61 distinct human experts. We confirm that our experts make progress in the environments given 8 hours, with 82% of expert attempts achieving a non-zero score and 24% matching or exceeding our strong reference solutions. We compare humans to several public frontier models through best-of-k with varying time budgets and agent designs, and find that the best AI agents achieve a score 4x higher than human experts when both are given a total time budget of 2 hours per environment. However, humans currently display better returns to increasing time budgets, narrowly exceeding the top AI agent scores given an 8-hour budget, and achieving 2x the score of the top AI agent when both are given 32 total hours (across different attempts). Qualitatively, we find that modern AI agents possess significant expertise in many ML topics – e.g. an agent wrote a faster custom Triton kernel than any of our human experts’ – and can generate and test solutions over ten times faster than humans, at much lower cost. We open-source the evaluation environments, human expert data, analysis code and agent trajectories to facilitate future research.
Article 1145
Title@2025-05-27 (2): Predicting and Understanding College Student Mental Health with Interpretable Machine Learning
Title: Predicting and Understanding College Student Mental Health with Interpretable Machine Learning | Vorhersagen und Verständnis College Student Mental Health mit Interpretable Machine Learning | 预测和理解学院学生心理健康与可解释机器学习 2503.08002v2 |
Authors: Meghna Roy Chowdhury, Wei Xuan, Shreyas Sen, Yixue Zhao, Yi Ding
Mental health issues among college students have reached critical levels, significantly impacting academic performance and overall wellbeing. Predicting and understanding mental health status among college students is challenging due to three main factors: the necessity for large-scale longitudinal datasets, the prevalence of black-box machine learning models lacking transparency, and the tendency of existing approaches to provide aggregated insights at the population level rather than individualized understanding. To tackle these challenges, this paper presents I-HOPE, the first Interpretable Hierarchical mOdel for Personalized mEntal health prediction. I-HOPE is a two-stage hierarchical model that connects raw behavioral features to mental health status through five defined behavioral categories as interaction labels. We evaluate I-HOPE on the College Experience Study, the longest longitudinal mobile sensing dataset. This dataset spans five years and captures data from both pre-pandemic periods and the COVID-19 pandemic. I-HOPE achieves a prediction accuracy of 91%, significantly surpassing the 60-70% accuracy of baseline methods. In addition, I-HOPE distills complex patterns into interpretable and individualized insights, enabling the future development of tailored interventions and improving mental health support. The code is available at https://github.com/roycmeghna/I-HOPE.
Article 1146
Title@2025-05-27 (2): Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers
Title: Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers | Continuous-Time-Achtung: PDE-geführte Mechanismen für lange Sequenztransformatoren | 持续关注:长序列变换者PDE-指导机制 2505.20666v1 |
Authors: Yukun Zhang, Xueqing Zhou
We propose a novel framework, Continuous-Time Attention, which infuses partial differential equations (PDEs) into the Transformer’s attention mechanism to address the challenges of extremely long input sequences. Instead of relying solely on a static attention matrix, we allow attention weights to evolve over a pseudo-time dimension via diffusion, wave, or reaction-diffusion dynamics. This mechanism systematically smooths local noise, enhances long-range dependencies, and stabilizes gradient flow. Theoretically, our analysis shows that PDE-based attention leads to better optimization landscapes and polynomial rather than exponential decay of distant interactions. Empirically, we benchmark our method on diverse experiments, demonstrating consistent gains over both standard and specialized long sequence Transformer variants. Our findings highlight the potential of PDE-based formulations to enrich attention mechanisms with continuous-time dynamics and global coherence.
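A minimal sketch of the diffusion case: one row of attention weights is evolved for a few pseudo-time steps with an explicit finite-difference heat equation along the key axis. The discretization, the reflecting boundaries, and the step size `alpha` are assumptions chosen for this illustration (explicit diffusion is stable for `alpha` up to 0.5); the abstract does not state how the paper discretizes its PDEs.

```python
def diffuse_row(row, alpha=0.25, steps=1):
    """Evolve one attention row by discrete diffusion over pseudo-time.

    Each step applies a_j <- a_j + alpha * (a_{j-1} - 2 a_j + a_{j+1}) with
    reflecting boundaries, which keeps the row sum unchanged.
    """
    for _ in range(steps):
        padded = [row[0]] + row + [row[-1]]
        row = [
            padded[j + 1] + alpha * (padded[j] - 2 * padded[j + 1] + padded[j + 2])
            for j in range(len(row))
        ]
    return row

# A sharply peaked attention row over five keys.
attention_row = [0.0, 0.0, 1.0, 0.0, 0.0]
smoothed = diffuse_row(attention_row, alpha=0.25, steps=2)
```

The peak spreads to neighbouring keys while the row still sums to one, which is the smoothing behaviour the abstract describes for the diffusion dynamics.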
Article 1147
Title@2025-05-27 (2): Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond
Title: Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond | Auf dem Weg zu LLM Unlearning Resilient to Relearning Attacks: Eine scharfsinnige Minimierungsperspektive und darüber hinaus | 走向LLM 学会学会学会学会重新学习攻击的不学习能力:锐化-尽量减少知识的视角及展望 2502.05374v4 |
Authors: Chongyu Fan, Jinghan Jia, Yihua Zhang, Anil Ramakrishna, Mingyi Hong, Sijia Liu
The LLM unlearning technique has recently been introduced to comply with data regulations and address the safety and ethical concerns of LLMs by removing the undesired data-model influence. However, state-of-the-art unlearning methods face a critical vulnerability: they are susceptible to “relearning” the removed information from a small number of forget data points, known as relearning attacks. In this paper, we systematically investigate how to make unlearned models robust against such attacks. For the first time, we establish a connection between robust unlearning and sharpness-aware minimization (SAM) through a unified robust optimization framework, in an analogy to adversarial training designed to defend against adversarial attacks. Our analysis for SAM reveals that smoothness optimization plays a pivotal role in mitigating relearning attacks. Thus, we further explore diverse smoothing strategies to enhance unlearning robustness. Extensive experiments on benchmark datasets, including WMDP and MUSE, demonstrate that SAM and other smoothness optimization approaches consistently improve the resistance of LLM unlearning to relearning attacks. Notably, smoothness-enhanced unlearning also helps defend against (input-level) jailbreaking attacks, broadening our proposal’s impact in robustifying LLM unlearning. Codes are available at https://github.com/OPTML-Group/Unlearn-Smooth.
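For orientation, a generic SAM update works in two passes: first move the weights a small distance `rho` along the normalized gradient (the locally worst-case direction), then apply the gradient measured at that perturbed point to the original weights. The sketch below runs this on a toy quadratic loss; the loss, `lr`, and `rho` are illustrative assumptions, and nothing here reproduces the paper's unlearning objective.

```python
import math

def sam_step(weights, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware minimization step on a list of weights."""
    grad = grad_fn(weights)
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0  # guard against a zero gradient
    # Ascent pass: perturb toward higher loss within a ball of radius rho.
    perturbed = [w + rho * g / norm for w, g in zip(weights, grad)]
    # Descent pass: use the gradient at the perturbed point.
    sharp_grad = grad_fn(perturbed)
    return [w - lr * g for w, g in zip(weights, sharp_grad)]

def toy_grad(w):
    """Gradient of the toy loss w0**2 + 4 * w1**2 (hypothetical objective)."""
    return [2.0 * w[0], 8.0 * w[1]]

w = [1.0, 1.0]
for _ in range(50):
    w = sam_step(w, toy_grad)
```

Because of the perturbation, the iterates settle in a small neighbourhood of the minimum (on the order of `rho`) rather than converging to it exactly.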
Article 1148
Title@2025-05-27 (2): BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models
Title: BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models | BLAST: Ausgewogene Zeitreihen für universelle Vorhersagemodelle | BLAST: 通用预测模型平衡抽样时间序列 2505.17871v2 |
Authors: Zezhi Shao, Yujie Li, Fei Wang, Chengqing Yu, Yisong Fu, Tangwen Qian, Bin Xu, Boyu Diao, Yongjun Xu, Xueqi Cheng
The advent of universal time series forecasting models has revolutionized zero-shot forecasting across diverse domains, yet the critical role of data diversity in training these models remains underexplored. Existing large-scale time series datasets often suffer from inherent biases and imbalanced distributions, leading to suboptimal model performance and generalization. To address this gap, we introduce BLAST, a novel pre-training corpus designed to enhance data diversity through a balanced sampling strategy. First, BLAST incorporates 321 billion observations from publicly available datasets and employs a comprehensive suite of statistical metrics to characterize time series patterns. Then, to facilitate pattern-oriented sampling, the data is implicitly clustered using grid-based partitioning. Furthermore, by integrating grid sampling and grid mixup techniques, BLAST ensures a balanced and representative coverage of diverse patterns. Experimental results demonstrate that models pre-trained on BLAST achieve state-of-the-art performance with a fraction of the computational resources and training tokens required by existing methods. Our findings highlight the pivotal role of data diversity in improving both training efficiency and model performance for the universal forecasting task.
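The grid-based balancing idea can be sketched as follows: each series is summarized by a few statistics, the statistics are binned into grid cells, and sampling first picks a cell uniformly and then a series within it, so rare patterns are not drowned out by common ones. The two-feature table, the cell size, and the uniform-over-cells rule are assumptions for illustration, and grid mixup is omitted.

```python
import random
from collections import defaultdict

def grid_cell(features, cell_size):
    """Map a feature vector to the integer coordinates of its grid cell."""
    return tuple(int(f // cell_size) for f in features)

def build_grid(feature_table, cell_size):
    """Group series ids by the grid cell their features fall into."""
    grid = defaultdict(list)
    for series_id, features in feature_table.items():
        grid[grid_cell(features, cell_size)].append(series_id)
    return grid

def balanced_sample(grid, n, rng):
    """Draw n series ids: uniform over cells, then uniform within the chosen cell."""
    cells = sorted(grid)
    return [rng.choice(grid[rng.choice(cells)]) for _ in range(n)]

# Hypothetical corpus: 99 near-identical series and one with a rare pattern.
feature_table = {f"common_{i}": (0.1, 0.1) for i in range(99)}
feature_table["rare"] = (5.0, 5.0)

grid = build_grid(feature_table, cell_size=1.0)
samples = balanced_sample(grid, 1000, random.Random(0))
rare_share = samples.count("rare") / len(samples)
```

Under naive uniform sampling the rare series would appear about 1% of the time; sampling by cell raises its share to roughly one half in this toy setup.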
Article 1149
Title@2025-05-27 (2): Generalized and Personalized Federated Learning with Foundation Models via Orthogonal Transformations
Title: Generalized and Personalized Federated Learning with Foundation Models via Orthogonal Transformations | Generalisiertes und personalisiertes Federated Learning mit Gründungsmodellen über Orthogonale Transformationen | 通过矫形转变形成基础模型的通用和个性化联邦学习 2505.19888v2 |
Authors: Eun Gyung Kong, Je Won Yeom, Yonghoon Jeon, Taesup Kim
Federated Learning (FL) aims to train models across decentralized clients or devices holding local data without the need for centralized data collection, thus enhancing data privacy and security. However, achieving both generalization and personalization in heterogeneous settings remains a significant challenge. To address this, we introduce FedOT, a novel approach that leverages black-box foundation models. FedOT shares only a global task-dependent classifier across clients while locally adapting features through orthogonal transformations. By enforcing orthogonality, FedOT mitigates gradient conflicts across diverse clients, preserves semantic integrity, and achieves robust performance even in the presence of substantial data heterogeneity. The strategy of combining global and local parameters enables a more balanced approach for both generalization and personalization, outperforming baseline FL methods across multiple benchmarks. Furthermore, our extensive analysis confirms that joint optimization of global classifiers and local orthogonal transformations yields superior performance and suggests broader applicability.
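One reason orthogonal transformations are attractive for local adaptation is that they preserve inner products, so the geometry of the foundation model's features is left intact. The sketch below illustrates this with a Householder reflection, which is orthogonal for any non-zero vector; the abstract does not say how FedOT parameterizes its transformations, so the Householder form is only a stand-in.

```python
def householder(v):
    """Orthogonal matrix Q = I - 2 v v^T / (v^T v) for a non-zero vector v."""
    norm_sq = sum(x * x for x in v)
    n = len(v)
    return [
        [(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm_sq for j in range(n)]
        for i in range(n)
    ]

def transform(matrix, x):
    """Matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in matrix]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical client-specific transformation and two feature vectors.
q = householder([1.0, 2.0, 2.0])
f1 = [1.0, 0.0, 0.0]
f2 = [0.5, 0.5, 0.0]

before = dot(f1, f2)
after = dot(transform(q, f1), transform(q, f2))
```

The similarity between the two features is the same before and after the transformation, which is the property a shared global classifier can rely on.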
Article 1150
Title@2025-05-27 (2): ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Title: ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning | ReMA: Meta-Denken lernen für LLMs mit Multi-Agenten-Verstärkungs-Lernen | ReMA:学习多机构强化学习的LLMLM的元思维 2503.09501v3 |
Authors: Ziyu Wan, Yunxiang Li, Xiaoyu Wen, Yan Song, Hanjing Wang, Linyi Yang, Mark Schmidt, Jun Wang, Weinan Zhang, Shuyue Hu, Ying Wen
Recent research on Reasoning of Large Language Models (LLMs) has sought to further enhance their performance by integrating meta-thinking – enabling models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Through iterative reinforcement learning with aligned objectives, these agents explore and learn collaboration, leading to improved generalization and robustness. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks, including competitive-level mathematical benchmarks and LLM-as-a-Judge benchmarks. Additionally, we further extend ReMA to multi-turn interaction settings, leveraging turn-level ratio and parameter sharing to improve efficiency. Comprehensive ablation studies further illustrate the evolving dynamics of each distinct agent, providing valuable insights into how the meta-thinking reasoning process enhances the reasoning capabilities of LLMs. Our code can be found in https://github.com/ziyuwan/ReMA-public
Article 1151
Title@2025-05-27 (2): How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
Title: How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines | Wie können neurale Netzwerke mit Skalierungsgesetzen ausgebaut werden? Eine Umfrage und praktische Leitlinien | 如何提升具有扩展法的神经网络? 2502.12051v3 |
Authors: Ayan Sengupta, Yash Goel, Tanmoy Chakraborty
Neural scaling laws have revolutionized the design and optimization of large-scale AI models by revealing predictable relationships between model size, dataset volume, and computational resources. Early research established power-law relationships in model performance, leading to compute-optimal scaling strategies. However, recent studies highlighted their limitations across architectures, modalities, and deployment contexts. Sparse models, mixture-of-experts, retrieval-augmented learning, and multimodal models often deviate from traditional scaling patterns. Moreover, scaling behaviors vary across domains such as vision, reinforcement learning, and fine-tuning, underscoring the need for more nuanced approaches. In this survey, we synthesize insights from over 50 studies, examining the theoretical foundations, empirical findings, and practical implications of scaling laws. We also explore key challenges, including data efficiency, inference scaling, and architecture-specific constraints, advocating for adaptive scaling strategies tailored to real-world applications. We suggest that while scaling laws provide a useful guide, they do not always generalize across all architectures and training strategies.
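As an illustration of the power-law relationships mentioned above, the sketch below fits loss ≈ a · N^(-b) by least squares in log-log space. The functional form omits any irreducible-loss term, and the data points are synthetic values made up for the example; the formulations covered in the survey may differ.

```python
import math

def fit_power_law(sizes, losses):
    """Fit loss ~ a * size**(-b) by ordinary least squares on log-log data."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic, noise-free data generated from a = 5.0, b = 0.3 (made-up values).
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.3 for n in sizes]
a_hat, b_hat = fit_power_law(sizes, losses)
```

On real measurements the fit would be noisy, and a clear departure from a straight line in log-log space is one sign that a plain power law does not describe the architecture or training setup.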
Article 1152
Title@2025-05-27 (2): Enhancing Time Series Forecasting via a Parallel Hybridization of ARIMA and Polynomial Classifiers
Title: Enhancing Time Series Forecasting via a Parallel Hybridization of ARIMA and Polynomial Classifiers | Verbesserung der Zeitreihenprognose über eine parallele Hybridisierung von ARIMA und Polynom-Klassifikatoren | 通过ARIMA和多边分类的平行混合预测增强时间序列 2505.06874v2 |
Authors: Thanh Son Nguyen, Van Thanh Nguyen, Dang Minh Duc Nguyen
Time series forecasting has attracted significant attention, leading to the development of a wide range of approaches, from traditional statistical methods to advanced deep learning models. Among them, the Auto-Regressive Integrated Moving Average (ARIMA) model remains a widely adopted linear technique due to its effectiveness in modeling temporal dependencies in economic, industrial, and social data. On the other hand, polynomial classifiers offer a robust framework for capturing non-linear relationships and have demonstrated competitive performance in domains such as stock price prediction. In this study, we propose a hybrid forecasting approach that integrates the ARIMA model with a polynomial classifier to leverage the complementary strengths of both models. The hybrid method is evaluated on multiple real-world time series datasets spanning diverse domains. Performance is assessed based on forecasting accuracy and computational efficiency. Experimental results reveal that the proposed hybrid model consistently outperforms the individual models in terms of prediction accuracy, albeit with a modest increase in execution time.
Article 1153
Title@2025-05-27 (2): An Optimisation Framework for Unsupervised Environment Design
Title: An Optimisation Framework for Unsupervised Environment Design | Ein Rahmen für die Optimierung des unbeaufsichtigten Umweltdesigns | 无人监督环境设计优化框架 2505.20659v1 |
Authors: Nathan Monette, Alistair Letcher, Michael Beukman, Matthew T. Jackson, Alexander Rutherford, Alexander D. Goldie, Jakob N. Foerster
For reinforcement learning agents to be deployed in high-risk settings, they must achieve a high level of robustness to unfamiliar scenarios. One method for improving robustness is unsupervised environment design (UED), a suite of methods aiming to maximise an agent’s generalisability across configurations of an environment. In this work, we study UED from an optimisation perspective, providing stronger theoretical guarantees for practical settings than prior work. Whereas previous methods relied on guarantees if they reach convergence, our framework employs a nonconvex-strongly-concave objective for which we provide a provably convergent algorithm in the zero-sum setting. We empirically verify the efficacy of our method, outperforming prior methods in a number of environments with varying difficulties.
Article 1154
Title@2025-05-27 (2): When More is Less: Understanding Chain-of-Thought Length in LLMs
Title: When More is Less: Understanding Chain-of-Thought Length in LLMs | Wenn mehr weniger ist: Verstehst du die Kettenlänge in LLMs? | 越少越多: 了解LLM 中所寻求的链条长度 2502.07266v3 |
Authors: Yuyang Wu, Yifei Wang, Ziyu Ye, Tianqi Du, Stefanie Jegelka, Yisen Wang
Large Language Models (LLMs) employ Chain-of-Thought (CoT) reasoning to deconstruct complex problems. While longer CoTs are often presumed superior, this paper challenges that notion, arguing that longer is not always better. Drawing on combined evidence from real-world observations, controlled experiments, and theoretical analysis, we demonstrate that task accuracy typically follows an inverted U-shaped curve with CoT length, where performance initially improves but eventually decreases as the number of CoT steps increases. With controlled experiments, we further uncover the scaling behaviors of the optimal CoT length: it increases with task difficulty but decreases with model capability, exposing an inherent simplicity bias where more capable models favor shorter, more efficient CoT reasoning. This bias is also evident in Reinforcement Learning (RL) training, where models gravitate towards shorter CoTs as their accuracy improves. To have a deep understanding of these dynamics, we establish a simple theoretical model that formally proves these phenomena, including the optimal length’s scaling laws and the emergence of simplicity bias during RL. Guided by this framework, we demonstrate significant practical benefits from training with optimally-lengthed CoTs and employing length-aware filtering at inference. These findings offer both a principled understanding of the “overthinking” phenomenon and multiple practical guidelines for CoT calibration, enabling LLMs to achieve optimal reasoning performance with adaptive CoTs tailored to task complexity and model capability.
Article 1155
Title@2025-05-27 (2): Prompting Decision Transformers for Zero-Shot Reach-Avoid Policies
Title: Prompting Decision Transformers for Zero-Shot Reach-Avoid Policies | Prompting Decision Transformers für Zero-Shot-Reach-Avoid-Politiken | 推动零热切无损政策决策变革者 2505.19337v2 |
Authors: Kevin Li, Marinka Zitnik
Offline goal-conditioned reinforcement learning methods have shown promise for reach-avoid tasks, where an agent must reach a target state while avoiding undesirable regions of the state space. Existing approaches typically encode avoid-region information into an augmented state space and cost function, which prevents flexible, dynamic specification of novel avoid-region information at evaluation time. They also rely heavily on well-designed reward and cost functions, limiting scalability to complex or poorly structured environments. We introduce RADT, a decision transformer model for offline, reward-free, goal-conditioned, avoid region-conditioned RL. RADT encodes goals and avoid regions directly as prompt tokens, allowing any number of avoid regions of arbitrary size to be specified at evaluation time. Using only suboptimal offline trajectories from a random policy, RADT learns reach-avoid behavior through a novel combination of goal and avoid-region hindsight relabeling. We benchmark RADT against 3 existing offline goal-conditioned RL models across 11 tasks, environments, and experimental settings. RADT generalizes in a zero-shot manner to out-of-distribution avoid region sizes and counts, outperforming baselines that require retraining. In one such zero-shot setting, RADT achieves 35.7% improvement in normalized cost over the best retrained baseline while maintaining high goal-reaching success. We apply RADT to cell reprogramming in biology, where it reduces visits to undesirable intermediate gene expression states during trajectories to desired target states, despite stochastic transitions and discrete, structured state dynamics.
Article 1156
Title@2025-05-27 (2): New Paradigm of Adversarial Training: Releasing Accuracy-Robustness Trade-Off via Dummy Class
Title: New Paradigm of Adversarial Training: Releasing Accuracy-Robustness Trade-Off via Dummy Class | Neuer Paradigma der Adversarial Training: Freigabe von Genauigkeit-Robustheit-Trade-Off über Dummy-Klasse | 反向培训新范例:通过Dummi类实现释放准确性-交战交易 2410.12671v2 |
Authors: Yanyun Wang, Li Liu, Zi Liang, Yi R. Fung, Qingqing Ye, Haibo Hu
Adversarial Training (AT) is one of the most effective methods to enhance the robustness of Deep Neural Networks (DNNs). However, existing AT methods suffer from an inherent accuracy-robustness trade-off. Previous works have studied this issue under the current AT paradigm, but still face over 10% accuracy reduction without significant robustness improvement over simple baselines such as PGD-AT. This inherent trade-off raises a question: Whether the current AT paradigm, which assumes to learn corresponding benign and adversarial samples as the same class, inappropriately mixes clean and robust objectives that may be essentially inconsistent. In fact, our empirical results show that up to 40% of CIFAR-10 adversarial samples always fail to satisfy such an assumption across various AT methods and robust models, explicitly indicating the room for improvement of the current AT paradigm. To relax from this overstrict assumption and the tension between clean and robust learning, in this work, we propose a new AT paradigm by introducing an additional dummy class for each original class, aiming to accommodate hard adversarial samples with shifted distribution after perturbation. The robustness w.r.t. these adversarial samples can be achieved by runtime recovery from the predicted dummy classes to the corresponding original ones, without conflicting with the clean objective on accuracy of benign samples. Finally, based on our new paradigm, we propose a novel DUmmy Classes-based Adversarial Training (DUCAT) method that concurrently improves accuracy and robustness in a plug-and-play manner only relevant to logits, loss, and a proposed two-hot soft label-based supervised signal. Our method outperforms state-of-the-art (SOTA) benchmarks, effectively releasing the current trade-off. The code is available at https://github.com/FlaAI/DUCAT.
Article 1157
Title@2025-05-27 (2): FRABench and GenEval: Scaling Fine-Grained Aspect Evaluation across Tasks, Modalities
Title: FRABench and GenEval: Scaling Fine-Grained Aspect Evaluation across Tasks, Modalities | FRABench und GenEval: Skalierung feinkörniger Aspekte Bewertung über Aufgaben, Modalitäten hinweg | FRA Bench和GenEval:扩大对各任务、方式、方式和方式的精细评价 2505.12795v2 |
Authors: Shibo Hong, Jiahao Ying, Haiyuan Liang, Mengdi Zhang, Jun Kuang, Jiazheng Zhang, Yixin Cao
Evaluating the open-ended outputs of large language models (LLMs) has become a bottleneck as model capabilities, task diversity, and modality coverage rapidly expand. Existing “LLM-as-a-Judge” evaluators are typically narrow in a few tasks, aspects, or modalities, and easily suffer from low consistency. In this paper, we argue that explicit, fine-grained aspect specification is the key to both generalizability and objectivity in automated evaluation. To this end, we propose a hierarchical aspect taxonomy encompassing 112 distinct aspects that unifies evaluation across four representative settings – Natural Language Generation, Image Understanding, Image Generation, and Interleaved Text-and-Image Generation. Building upon this taxonomy, we create FRABench, a benchmark comprising 60.4k pairwise samples with 325k evaluation labels obtained from a combination of human and LLM annotations. FRABench provides the first large-scale, multi-modal resource for training and meta-evaluating fine-grained LMM judges. Leveraging FRABench, we develop GenEval, a fine-grained evaluator generalizable across tasks and modalities. Experiments show that GenEval (i) attains high agreement with GPT-4o and expert annotators, (ii) transfers robustly to unseen tasks and modalities, and (iii) reveals systematic weaknesses of current LMMs on evaluation.
Article 1158
Title@2025-05-27 (2): Voronoi-grid-based Pareto Front Learning and Its Application to Collaborative Federated Learning
Title: Voronoi-grid-based Pareto Front Learning and Its Application to Collaborative Federated Learning | Voronoi-Grid-basiertes Pareto-Front-Lernen und seine Anwendung auf kollaboratives Federated Learning | 以Voronoi-Grid为基础的Pareto阵线学习及其在联邦学习合作组织中的应用 2505.20648v1 |
Authors: Mengmeng Chen, Xiaohu Wu, Qiqi Liu, Tiantian He, Yew-Soon Ong, Yaochu Jin, Qicheng Lao, Han Yu
Multi-objective optimization (MOO) exists extensively in machine learning, and aims to find a set of Pareto-optimal solutions, called the Pareto front, e.g., it is fundamental for multiple avenues of research in federated learning (FL). Pareto-Front Learning (PFL) is a powerful method implemented using Hypernetworks (PHNs) to approximate the Pareto front. This method enables the acquisition of a mapping function from a given preference vector to the solutions on the Pareto front. However, most existing PFL approaches still face two challenges: (a) sampling rays in high-dimensional spaces; (b) failing to cover the entire Pareto Front which has a convex shape. Here, we introduce a novel PFL framework, called as PHN-HVVS, which decomposes the design space into Voronoi grids and deploys a genetic algorithm (GA) for Voronoi grid partitioning within high-dimensional space. We put forward a new loss function, which effectively contributes to more extensive coverage of the resultant Pareto front and maximizes the HV Indicator. Experimental results on multiple MOO machine learning tasks demonstrate that PHN-HVVS outperforms the baselines significantly in generating Pareto front. Also, we illustrate that PHN-HVVS advances the methodologies of several recent problems in the FL field. The code is available at https://github.com/buptcmm/phnhvvs.
Article 1159
Title@2025-05-27 (2): Moment Expansions of the Energy Distance
Title: Moment Expansions of the Energy Distance | Momenterweiterungen der Energieentfernung | 扩大能源距离时间 2505.20647v1 |
Authors: Ian Langmore
The energy distance is used to test distributional equality, and as a loss function in machine learning. While $D^2(X, Y)=0$ only when $X\sim Y$, the sensitivity to different moments is of practical importance. This work considers $D^2(X, Y)$ in the case where the distributions are close. In this regime, $D^2(X, Y)$ is more sensitive to differences in the means $\bar{X}-\bar{Y}$, than differences in the covariances $\Delta$. This is due to the structure of the energy distance and is independent of dimension. The sensitivity to on versus off diagonal components of $\Delta$ is examined when $X$ and $Y$ are close to isotropic. Here a dimension dependent averaging occurs and, in many cases, off diagonal correlations contribute significantly less. Numerical results verify these relationships hold even when distributional assumptions are not strictly met.
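For reference, the squared energy distance has the standard form D^2(X, Y) = 2 E|X - Y| - E|X - X'| - E|Y - Y'|, where X' and Y' are independent copies of X and Y. The sketch below is a plain sample estimate of that formula (the V-statistic version, which includes the zero self-distances); the sample points are made up, and the moment expansions from the paper are not reproduced here.

```python
import math

def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (p, q) with p in a, q in b."""
    total = sum(math.dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))

def energy_distance_sq(x, y):
    """Sample estimate of D^2(X, Y) = 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return (2.0 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))

# Made-up 2-D samples; `shifted` is `x` with its mean moved by 0.5 along one axis.
x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
shifted = [(px + 0.5, py) for px, py in x]

same = energy_distance_sq(x, x)
moved = energy_distance_sq(x, shifted)
```

Identical samples give zero, and the mean shift gives a strictly positive value.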
Article 1160
Title@2025-05-27 (2): Evaluating Training in Binarized Neural Networks Through the Lens of Algorithmic Information Theory
Title: Evaluating Training in Binarized Neural Networks Through the Lens of Algorithmic Information Theory | Bewertung der Ausbildung in Binarized Neural Networks durch die Linse der algorithmischen Informationstheorie | 通过分析信息理论的透镜评估神经网络的觉测培训 2505.20646v1 |
Authors: Eduardo Y. Sakabe, Felipe S. Abrahão, Alexandre Simões, Esther Colombini, Paula Costa, Ricardo Gudwin, Hector Zenil
Understanding and controlling the informational complexity of neural networks is a central challenge in machine learning, with implications for generalization, optimization, and model capacity. While most approaches rely on entropy-based loss functions and statistical metrics, these measures often fail to capture deeper, causally relevant algorithmic regularities embedded in network structure. We propose a shift toward algorithmic information theory, using Binarized Neural Networks (BNNs) as a first proxy. Grounded in algorithmic probability (AP) and the universal distribution it defines, our approach characterizes learning dynamics through a formal, causally grounded lens. We apply the Block Decomposition Method (BDM) – a scalable approximation of algorithmic complexity based on AP – and demonstrate that it more closely tracks structural changes during training than entropy, consistently exhibiting stronger correlations with training loss across varying model sizes and randomized training runs. These results support the view of training as a process of algorithmic compression, where learning corresponds to the progressive internalization of structured regularities. In doing so, our work offers a principled estimate of learning progression and suggests a framework for complexity-aware learning and regularization, grounded in first principles from information theory, complexity, and computability.
Article 1161
Title@2025-05-27 (2): Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain
Title: Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain | Aufgabenoptimierte konvolutionäre recurrente Netzwerke richten sich an taktile Verarbeitung im Nagetierhirn | 与鼠脑中触摸处理相适应的 任务优化的革命经常网络 2505.18361v2 |
Authors: Trinity Chung, Yuchen Shen, Nathan C. L. Kong, Aran Nayebi
Tactile sensing remains far less understood in neuroscience and less effective in artificial systems compared to more mature modalities such as vision and language. We bridge these gaps by introducing a novel Encoder-Attender-Decoder (EAD) framework to systematically explore the space of task-optimized temporal neural networks trained on realistic tactile input sequences from a customized rodent whisker-array simulator. We identify convolutional recurrent neural networks (ConvRNNs) as superior encoders to purely feedforward and state-space architectures for tactile categorization. Crucially, these ConvRNN-encoder-based EAD models achieve neural representations closely matching rodent somatosensory cortex, saturating the explainable neural variability and revealing a clear linear relationship between supervised categorization performance and neural alignment. Furthermore, contrastive self-supervised ConvRNN-encoder-based EADs, trained with tactile-specific augmentations, match supervised neural fits, serving as an ethologically-relevant, label-free proxy. For neuroscience, our findings highlight nonlinear recurrent processing as important for general-purpose tactile representations in somatosensory cortex, providing the first quantitative characterization of the underlying inductive biases in this system. For embodied AI, our results emphasize the importance of recurrent EAD architectures to handle realistic tactile inputs, along with tailored self-supervised learning methods for achieving robust tactile perception with the same type of sensors animals use to sense in unstructured environments.
Article 1162
Title@2025-05-27 (2): Can Past Experience Accelerate LLM Reasoning?
Title: Can Past Experience Accelerate LLM Reasoning? | Kann vergangene Erfahrung LLM Reasoning beschleunigen? | 以往经验能否加快LLM理由解释? 2505.20643v1 |
Authors: Bo Pan, Liang Zhao
Allocating more compute to large language models (LLMs) reasoning has generally been demonstrated to improve their effectiveness, but also results in increased inference time. In contrast, humans can perform tasks faster and better with increased experience and exposure. Hence, this paper aims to investigate the question: Can LLMs also become faster at reasoning through recurrent exposure on relevant tasks, and if so, how can it be achieved? To address these questions, we first formalize the problem setting of LLM reasoning speedup systematically in the dimensions of task relevancy and compute budget calculation. We then propose SpeedupLLM, a theoretically guaranteed framework to implement and benchmark such reasoning speedup behaviour based on adaptive compute allocation and memory mechanisms. We further conduct comprehensive experiments to benchmark such behaviour across different question similarity levels, memory methods, and reasoning methods. Results show that LLMs can generally reason faster with past experience, achieving up to a 56% reduction in compute cost when equipped with appropriate memory and reasoning methods.
Article 1163
Title@2025-05-27 (2): PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
Title: PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation | PosterO: Strukturierung von Layout-Strukturen zur Aktivierung von Sprachmodellen in der Generierung von generalisierten Content-Aware-Layouts | PosterO: 构建布局树以在通用内容软件布局生成中启用语言模型 2505.07843v2 |
Authors: HsiaoYuan Hsu, Yuxin Peng
In poster design, content-aware layout generation is crucial for automatically arranging visual-textual elements on the given image. With limited training data, existing work focused on image-centric enhancement. However, this neglects the diversity of layouts and fails to cope with shape-variant elements or diverse design intents in generalized settings. To this end, we proposed a layout-centric approach that leverages layout knowledge implicit in large language models (LLMs) to create posters for omnifarious purposes, hence the name PosterO. Specifically, it structures layouts from datasets as trees in SVG language by universal shape, design intent vectorization, and hierarchical node representation. Then, it applies LLMs during inference to predict new layout trees by in-context learning with intent-aligned example selection. After layout trees are generated, we can seamlessly realize them into poster designs by editing the chat with LLMs. Extensive experimental results have demonstrated that PosterO can generate visually appealing layouts for given images, achieving new state-of-the-art performance across various benchmarks. To further explore PosterO’s abilities under the generalized settings, we built PStylish7, the first dataset with multi-purpose posters and various-shaped elements, further offering a challenging test for advanced research.
Article 1164
Title@2025-05-27 (2): Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation
Title: Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation | Rethinking MUSHRA: Bewältigung moderner Herausforderungen in der Text-zu-Speech-Bewertung | 重新思考MUSHRA:应对文本到语音评价中的现代挑战 2411.12719v3 |
Authors: Praveen Srinivasa Varadhan, Amogh Gulati, Ashwin Sankar, Srija Anand, Anirudh Gupta, Anirudh Mukherjee, Shiva Kumar Marepally, Ankur Bhatia, Saloni Jaju, Suvrat Bhooshan, Mitesh M. Khapra
Despite rapid advancements in TTS models, a consistent and robust human evaluation framework is still lacking. For example, MOS tests fail to differentiate between similar models, and CMOS’s pairwise comparisons are time-intensive. The MUSHRA test is a promising alternative for evaluating multiple TTS systems simultaneously, but in this work we show that its reliance on matching human reference speech unduly penalises the scores of modern TTS systems that can exceed human speech quality. More specifically, we conduct a comprehensive assessment of the MUSHRA test, focusing on its sensitivity to factors such as rater variability, listener fatigue, and reference bias. Based on our extensive evaluation involving 492 human listeners across Hindi and Tamil we identify two primary shortcomings: (i) reference-matching bias, where raters are unduly influenced by the human reference, and (ii) judgement ambiguity, arising from a lack of clear fine-grained guidelines. To address these issues, we propose two refined variants of the MUSHRA test. The first variant enables fairer ratings for synthesized samples that surpass human reference quality. The second variant reduces ambiguity, as indicated by the relatively lower variance across raters. By combining these approaches, we achieve both more reliable and more fine-grained assessments. We also release MANGO, a massive dataset of 246,000 human ratings, the first-of-its-kind collection for Indian languages, aiding in analyzing human preferences and developing automatic metrics for evaluating TTS systems.
Article 1165
Title@2025-05-27 (2): Pointing the Way: Refining Radar-Lidar Localization Using Learned ICP Weights
Title: Pointing the Way: Refining Radar-Lidar Localization Using Learned ICP Weights | Den Weg weisen: Verfeinerung der Radar-Lidar-Lokalisierung mit erfahrenen ICP-Gewichten | 指向方向:利用比较方案所积累的重量改进雷达-里达尔的本地化 2309.08731v4 |
Authors: Daniil Lisus, Johann Laconte, Keenan Burnett, Ziyu Zhang, Timothy D. Barfoot
This paper presents a novel deep-learning-based approach to improve localizing radar measurements against lidar maps. This radar-lidar localization leverages the benefits of both sensors; radar is resilient against adverse weather, while lidar produces high-quality maps in clear conditions. However, owing in part to the unique artefacts present in radar measurements, radar-lidar localization has struggled to achieve comparable performance to lidar-lidar systems, preventing it from being viable for autonomous driving. This work builds on ICP-based radar-lidar localization by including a learned preprocessing step that weights radar points based on high-level scan information. To train the weight-generating network, we present a novel, stand-alone, open-source differentiable ICP library. The learned weights facilitate ICP by filtering out harmful radar points related to artefacts, noise, and even vehicles on the road. Combining an analytical approach with a learned weight reduces overall localization errors and improves convergence in radar-lidar ICP results run on real-world autonomous driving data. Our code base is publicly available to facilitate reproducibility and extensions.
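To show how per-point weights enter an ICP update, here is a minimal sketch of a single weighted rigid alignment step in 2-D with correspondences already known. It is not the authors' library: the weights are set by hand instead of being predicted by a network, and the points, rotation, and translation are made-up values. A full ICP loop would alternate this step with re-matching points.

```python
import math

def weighted_rigid_align_2d(source, target, weights):
    """Return (theta, (tx, ty)) minimizing sum_i w_i * |R(theta) s_i + t - t_i|^2."""
    total = sum(weights)
    cs = [sum(w * p[i] for w, p in zip(weights, source)) / total for i in range(2)]
    ct = [sum(w * q[i] for w, q in zip(weights, target)) / total for i in range(2)]
    dot = cross = 0.0
    for w, (sx, sy), (qx, qy) in zip(weights, source, target):
        sx, sy = sx - cs[0], sy - cs[1]   # center on weighted centroids
        qx, qy = qx - ct[0], qy - ct[1]
        dot += w * (sx * qx + sy * qy)
        cross += w * (sx * qy - sy * qx)
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    tx = ct[0] - (c * cs[0] - s * cs[1])
    ty = ct[1] - (s * cs[0] + c * cs[1])
    return theta, (tx, ty)

# Made-up scan: rotate by 0.3 rad, translate by (1, -2), then corrupt one point.
true_theta, true_t = 0.3, (1.0, -2.0)
source = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
c, s = math.cos(true_theta), math.sin(true_theta)
target = [(c * x - s * y + true_t[0], s * x + c * y + true_t[1]) for x, y in source]
target[3] = (50.0, 50.0)          # stands in for a radar artefact
weights = [1.0, 1.0, 1.0, 0.0]    # hand-set; a learned model would supply these

theta_hat, t_hat = weighted_rigid_align_2d(source, target, weights)
```

With the artefact weighted to zero, the step recovers the true transform; with uniform weights the same outlier would pull the estimate away from it.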
Article 1166
Title@2025-05-27 (2): GMoE: Empowering LLMs Fine-Tuning via MoE Graph Collaboration
Title: GMoE: Empowering LLMs Fine-Tuning via MoE Graph Collaboration | GMoE: Stärkung von LLMs Feinsteuerung über MoE Graph Collaboration | GMOE:通过教育部图表合作,赋予LLMs Fine-Turning女士权力 2412.16216v3 |
Authors: Ting Bai, Yue Yu, Le Huang, Zenan Xu, Zhe Zhao, Chuan Shi
The sparse Mixture-of-Experts (MoE) architecture of large language models (LLMs) confronts an inherent issue of load imbalance arising from the simplistic linear router strategy, which ultimately causes the instability and inefficient learning of LLMs. To address this challenge, we introduce a novel MoE graph-based framework $\textbf{GMoE}$, aimed at enhancing the collaboration among multiple experts. In GMoE, a graph router function is designed to capture the collaboration signals among experts. This enables all experts to dynamically allocate information derived from input data by sharing information with their neighboring experts. Moreover, we put forward two coordination strategies in GMoE: the $\textit{Poisson distribution-based distinction strategy}$ and the $\textit{Normal distribution-based balance strategy}$, to further release the capacity of each expert and increase the model stability in the fine-tuning of LLMs. Specifically, we leverage a parameter-efficient fine-tuning technique, i.e., Low-Rank Adaptation (LoRA), to implement the graph MoE architecture. Extensive experiments on four real-world benchmark datasets demonstrate the effectiveness of GMoE, showing the benefits of facilitating collaborations of multiple experts in LLM fine-tuning. The code of experimental implementation is available at https://github.com/BAI-LAB/GMoE
Article 1167
Title@2025-05-27 (2): Non-identifiability distinguishes Neural Networks among Parametric Models
Title: Non-identifiability distinguishes Neural Networks among Parametric Models | Nicht-Identifizierbarkeit unterscheidet neurale Netzwerke zwischen parametrischen Modellen | 不可识别性将神经网络区分为参数模型 2504.18017v2 |
Authors: Sourav Chatterjee, Timothy Sudijono
One of the enduring problems surrounding neural networks is to identify the factors that differentiate them from traditional statistical models. We prove a pair of results which distinguish feedforward neural networks among parametric models at the population level, for regression tasks. Firstly, we prove that for any pair of random variables $(X,Y)$, neural networks always learn a nontrivial relationship between $X$ and $Y$, if one exists. Secondly, we prove that for reasonable smooth parametric models, under local and global identifiability conditions, there exists a nontrivial $(X,Y)$ pair for which the parametric model learns the constant predictor $\mathbb{E}[Y]$. Together, our results suggest that a lack of identifiability distinguishes neural networks among the class of smooth parametric models.
Article 1168
Title@2025-05-27 (2): Scintillation pulse characterization with spectrum-inspired temporal neural networks: case studies on particle detector signals
Title: Scintillation pulse characterization with spectrum-inspired temporal neural networks: case studies on particle detector signals | Scintillation-Pulscharakterisierung mit spektruminspirierten zeitlichen neuronalen Netzwerken: Fallstudien zu Partikeldetektor-Signalen | 与受频谱启发的时时神经网络的闪烁脉冲定性:粒子探测器信号案例研究 2410.07267v3 |
Authors: Pengcheng Ai, Xiangming Sun, Zhi Deng, Xinchi Ran
Particle detectors based on scintillators are widely used in high-energy physics and astroparticle physics experiments, nuclear medicine imaging, industrial and environmental detection, etc. Precisely extracting scintillation signal characteristics at the event level is important for these applications, not only for understanding the scintillator itself, but also for determining the types and physical properties of incident particles. Recent research demonstrates that data-driven neural networks surpass traditional statistical methods, especially when the analytical form of signals is hard to obtain or noise is significant. However, most densely connected or convolution-based networks fail to fully exploit the spectral and temporal structure of scintillation signals, leaving considerable room for performance improvement. In this paper, we propose a network architecture specially tailored for scintillation pulse characterization based on previous work on time series analysis. The core insight is that, by directly applying the Fast Fourier Transform to the original signals and utilizing different frequency components, the proposed network architecture can serve as a lightweight and enhanced representation learning backbone. We validate our idea in two case studies: (a) simulation data generated with the setting of the LUX dark matter detector, and (b) experimental electrical signals with fast electronics to emulate scintillation variations for the NICA/MPD calorimeter. The proposed model achieves significantly better results than the reference model in the literature and densely connected models, and demonstrates higher cost-efficiency than conventional machine learning methods.
Article 1169
Title@2025-05-27 (2): Policy Design for Two-sided Platforms with Participation Dynamics
Title: Policy Design for Two-sided Platforms with Participation Dynamics | Politikgestaltung für zweiseitige Plattformen mit Partizipationsdynamik | 具有参与动态的双面平台政策设计 2502.01792v2 |
Authors: Haruka Kiyohara, Fan Yao, Sarah Dean
In two-sided platforms (e.g., video streaming or e-commerce), viewers and providers engage in interactive dynamics: viewers benefit from increases in provider populations, while providers benefit from increases in viewer population. Despite the importance of such “population effects” on long-term platform health, recommendation policies do not generally take the participation dynamics into account. This paper thus studies the dynamics and recommender policy design on two-sided platforms under the population effects for the first time. Our control- and game-theoretic findings warn against the use of the standard “myopic-greedy” policy and shed light on the importance of provider-side considerations (i.e., effectively distributing exposure among provider groups) to improve social welfare via population growth. We also present a simple algorithm to optimize long-term social welfare by taking the population effects into account, and demonstrate its effectiveness in synthetic and real-data experiments. Our experiment code is available at https://github.com/sdean-group/dynamics-two-sided-market.
Article 1170
Title@2025-05-27 (2): Explaining Concept Shift with Interpretable Feature Attribution
Title: Explaining Concept Shift with Interpretable Feature Attribution | Erklären von Konzeptverschiebungen mit interpretierbarer Eigenschaftszuweisung | 解释解释概念转变与可解释性地物归属 2505.20634v1 |
Authors: Ruiqi Lyu, Alistair Turcan, Bryan Wilder
Regardless of the amount of data a machine learning (ML) model is trained on, there will inevitably be data that differs from its training set, lowering model performance. Concept shift occurs when the distribution of labels conditioned on the features changes, so that even a well-tuned ML model has learned a fundamentally incorrect representation. Identifying these shifted features provides unique insight into how one dataset differs from another, since the difference may lie along a scientifically relevant dimension, such as time, disease status, population, etc. In this paper, we propose SGShift, a model for detecting concept shift in tabular data and attributing reduced model performance to a sparse set of shifted features. SGShift models concept shift with a Generalized Additive Model (GAM) and performs subsequent feature selection to identify shifted features. We propose further extensions of SGShift by incorporating knockoffs to control false discoveries and an absorption term to account for models with poor fit to the data. We conduct extensive experiments in synthetic and real data across various ML models and find SGShift can identify shifted features with AUC $>0.9$ and recall $>90\%$, often 2 or 3 times as high as baseline methods.
Article 1171
Title@2025-05-27 (2): Adaptive Backtracking Line Search
Title: Adaptive Backtracking Line Search | Adaptive Rückverfolgungszeilensuche | 适应性后回跟踪线搜索 2408.13150v2 |
Authors: Joao V. Cavalcanti, Laurent Lessard, Ashia C. Wilson
Backtracking line search is foundational in numerical optimization. The basic idea is to adjust the step-size of an algorithm by a constant factor until some chosen criterion (e.g. Armijo, Descent Lemma) is satisfied. We propose a novel way to adjust step-sizes, replacing the constant factor used in regular backtracking with one that takes into account the degree to which the chosen criterion is violated, with no additional computational burden. This lightweight adjustment leads to significantly faster optimization, which we confirm by performing a variety of experiments on over fifteen real-world datasets. For convex problems, we prove adaptive backtracking requires no more adjustments to produce a feasible step-size than regular backtracking does. For nonconvex smooth problems, we prove adaptive backtracking enjoys the same guarantees as regular backtracking. Furthermore, we prove adaptive backtracking preserves the convergence rates of gradient descent and its accelerated variant.
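To make the contrast between a constant and a violation-dependent shrink factor concrete, here is a minimal sketch for the Armijo criterion. The function names, the `eps` floor, and the specific adaptive factor `rho * (1 - c) / (1 - c * v)` are illustrative assumptions and may differ from the rule analyzed in the paper; the only property relied on is that the factor equals `rho` when the criterion is barely violated and becomes smaller as the violation grows.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def armijo_ratio(f: Callable[[Vector], float], x: Vector, d: Vector,
                 slope: float, alpha: float, c: float) -> float:
    """Return v such that the Armijo condition holds exactly when v >= 1.

    `slope` is the directional derivative grad f(x) . d and must be negative.
    """
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    return (f(x_new) - f(x)) / (c * alpha * slope)


def backtrack(f: Callable[[Vector], float], x: Vector, d: Vector, slope: float,
              alpha0: float = 1.0, c: float = 1e-4, rho: float = 0.5,
              adaptive: bool = False, eps: float = 1e-3,
              max_adjust: int = 100) -> Tuple[float, int]:
    """Return (step size, number of adjustments) satisfying Armijo."""
    alpha = alpha0
    for n_adjust in range(max_adjust + 1):
        v = armijo_ratio(f, x, d, slope, alpha, c)
        if v >= 1.0:
            return alpha, n_adjust
        if adaptive:
            # Assumed illustrative rule: shrink more when v is far below 1.
            factor = max(eps, rho * (1.0 - c) / (1.0 - c * v))
        else:
            # Regular backtracking: always the same constant factor.
            factor = rho
        alpha *= factor
    raise RuntimeError("no feasible step size found")


if __name__ == "__main__":
    quad = lambda x: x[0] ** 2          # f(x) = x^2, gradient 2x
    x0, d0, slope0 = [1.0], [-2.0], -4.0  # steepest descent direction at x = 1
    print(backtrack(quad, x0, d0, slope0, alpha0=10.0))
    print(backtrack(quad, x0, d0, slope0, alpha0=10.0, adaptive=True))
```

On this toy quadratic with a deliberately oversized initial step of 10, the constant-factor loop needs four halvings, while the violation-aware loop reaches a feasible step after a single adjustment.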
Article 1172
Title@2025-05-27 (2): Test-Time Learning for Large Language Models
Title: Test-Time Learning for Large Language Models | Test-Time Learning für große Sprachmodelle | 大语言模型试验时间学习 2505.20633v1 |
Authors: Jinwu Hu, Zhitian Zhang, Guohao Chen, Xutao Wen, Chao Shuai, Wei Luo, Bin Xiao, Yuanqing Li, Mingkui Tan
While Large Language Models (LLMs) have exhibited remarkable emergent capabilities through extensive pre-training, they still face critical limitations in generalizing to specialized domains and handling diverse linguistic variations, known as distribution shifts. In this paper, we propose a Test-Time Learning (TTL) paradigm for LLMs, namely TLM, which dynamically adapts LLMs to target domains using only unlabeled test data during testing. Specifically, we first provide empirical evidence and theoretical insights to reveal that more accurate predictions from LLMs can be achieved by minimizing the input perplexity of the unlabeled test data. Based on this insight, we formulate the Test-Time Learning process of LLMs as input perplexity minimization, enabling self-supervised enhancement of LLM performance. Furthermore, we observe that high-perplexity samples tend to be more informative for model optimization. Accordingly, we introduce a Sample Efficient Learning Strategy that actively selects and emphasizes these high-perplexity samples for test-time updates. Lastly, to mitigate catastrophic forgetting and ensure adaptation stability, we adopt Low-Rank Adaptation (LoRA) instead of full-parameter optimization, which allows lightweight model updates while preserving more original knowledge from the model. We introduce the AdaptEval benchmark for TTL and demonstrate through experiments that TLM improves performance by at least 20% compared to original LLMs on domain knowledge adaptation.
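As a reminder of the quantity being minimized, the sketch below computes perplexity from per-token log-probabilities and picks out high-perplexity samples. It assumes the log-probabilities have already been obtained from a language model; the function names and the threshold are hypothetical and do not reproduce the paper's Sample Efficient Learning Strategy.

```python
import math
from typing import List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def select_high_perplexity(batch: List[Sequence[float]],
                           threshold: float) -> List[int]:
    """Return indices of samples whose perplexity exceeds a (hypothetical) threshold."""
    return [i for i, logprobs in enumerate(batch)
            if perplexity(logprobs) > threshold]


if __name__ == "__main__":
    confident = [math.log(0.9)] * 5   # model assigns 0.9 to every token
    uncertain = [math.log(0.1)] * 5   # model assigns 0.1 to every token
    print(perplexity(confident), perplexity(uncertain))
    print(select_high_perplexity([confident, uncertain], threshold=3.0))
```

A sequence whose tokens each receive probability p has perplexity 1/p, so lower perplexity means the model finds the input less surprising.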
Article 1173
Title@2025-05-27 (2): Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training
Title: Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training | Einschließlich flexibler Bildkonditionierung in Text-zu-Video-Diffusionsmodelle ohne Training | 将灵活的图像条件纳入无培训的文本到视频传播模型 2505.20629v1 |
Authors: Bolin Lai, Sangmin Lee, Xu Cao, Xiang Li, James M. Rehg
Text-image-to-video (TI2V) generation is a critical problem for controllable video generation using both semantic and visual conditions. Most existing methods add visual conditions to text-to-video (T2V) foundation models by finetuning, which is costly in resources and limited to a few predefined conditioning settings. To tackle this issue, we introduce a unified formulation for TI2V generation with flexible visual conditioning. Furthermore, we propose an innovative training-free approach, dubbed FlexTI2V, that can condition T2V foundation models on an arbitrary number of images at arbitrary positions. Specifically, we first invert the condition images to noisy representations in a latent space. Then, in the denoising process of T2V models, our method uses a novel random patch swapping strategy to incorporate visual features into video representations through local image patches. To balance creativity and fidelity, we use a dynamic control mechanism to adjust the strength of visual conditioning for each video frame. Extensive experiments validate that our method surpasses previous training-free image conditioning methods by a notable margin. We also provide further insights into our method through a detailed ablation study and analysis.
Article 1174
Title@2025-05-27 (2): Position: Adopt Constraints Over Penalties in Deep Learning
Title: Position: Adopt Constraints Over Penalties in Deep Learning | Position: Überstrapazierte Strafen im Deep Learning adoptieren | 职位:在深深学习中采用约束措施以凌驾刑罚 2505.20628v1 |
Authors: Juan Ramirez, Meraj Hashemizadeh, Simon Lacoste-Julien
Recent efforts toward developing trustworthy AI systems with accountability guarantees have led to a growing reliance on machine learning formulations that incorporate external requirements, or constraints. These requirements are often enforced through penalization, i.e., by adding fixed-weight terms to the task loss. We argue that this approach is ill-suited, and that tailored constrained optimization methods should be adopted instead. In particular, there may be no penalty coefficient that yields a solution that both satisfies the constraints and achieves good performance, i.e., one solving the constrained problem. Moreover, tuning these coefficients is costly, incurring significant time and computational overhead. In contrast, tailored constrained methods, such as the Lagrangian approach, which optimizes the penalization “coefficients” (the Lagrange multipliers) alongside the model, (i) truly solve the constrained problem and add accountability, (ii) eliminate the need for extensive penalty tuning, and (iii) integrate seamlessly with modern deep learning pipelines.
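The difference between a fixed penalty weight and a learned multiplier can be seen on a one-dimensional toy problem: minimize x^2 subject to 1 - x <= 0, whose solution is x = 1 with multiplier 2. The sketch below is an illustration under that assumption, not the authors' code; the step sizes, iteration count, and function names are hypothetical.

```python
from typing import Tuple


def lagrangian_gda(lr_x: float = 0.1, lr_lam: float = 0.1,
                   steps: int = 2000) -> Tuple[float, float]:
    """Gradient descent on x and projected gradient ascent on the multiplier.

    Lagrangian: L(x, lam) = x^2 + lam * (1 - x), with lam >= 0.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * x - lam   # dL/dx
        grad_lam = 1.0 - x       # dL/dlam, i.e. the constraint value g(x)
        x -= lr_x * grad_x
        lam = max(0.0, lam + lr_lam * grad_lam)  # keep the multiplier non-negative
    return x, lam


def fixed_penalty_minimizer(c: float) -> float:
    """Closed-form minimizer of the fixed-weight penalized loss x^2 + c * (1 - x)."""
    return c / 2.0


if __name__ == "__main__":
    print(lagrangian_gda())               # approaches x = 1, lam = 2
    print(fixed_penalty_minimizer(1.0))   # 0.5: violates the constraint x >= 1
    print(fixed_penalty_minimizer(4.0))   # 2.0: feasible but worse objective than x = 1
```

In this toy case the fixed-weight penalty recovers the constrained solution only when the weight happens to equal the optimal multiplier, whereas the primal-dual updates find that value on their own.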
Article 1175
Title@2025-05-27 (2): JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes
Title: JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes | JaxRobotarium: Schulung und Einsatz von Multi-Roboter-Politik in 10 Minuten | JaxRobotior:10分钟内培训和部署多机器人政策 2505.06771v2 |
Authors: Shalin Anand Jain, Jiazhen Liu, Siva Kailas, Harish Ravichandar
Multi-agent reinforcement learning (MARL) has emerged as a promising solution for learning complex and scalable coordination behaviors in multi-robot systems. However, established MARL platforms (e.g., SMAC and MPE) lack robotics relevance and hardware deployment, leaving multi-robot learning researchers to develop bespoke environments and hardware testbeds dedicated to the development and evaluation of their individual contributions. The Multi-Agent RL Benchmark and Learning Environment for the Robotarium (MARBLER) is an exciting recent step in providing a standardized robotics-relevant platform for MARL, by bridging the Robotarium testbed with existing MARL software infrastructure. However, MARBLER lacks support for parallelization and GPU/TPU execution, making the platform prohibitively slow compared to modern MARL environments and hindering adoption. We contribute JaxRobotarium, a Jax-powered end-to-end simulation, learning, deployment, and benchmarking platform for the Robotarium. JaxRobotarium enables rapid training and deployment of multi-robot RL (MRRL) policies with realistic robot dynamics and safety constraints, supporting parallelization and hardware acceleration. Our generalizable learning interface integrates easily with SOTA MARL libraries (e.g., JaxMARL). In addition, JaxRobotarium includes eight standardized coordination scenarios, including four novel scenarios that bring established MARL benchmark tasks (e.g., RWARE and Level-Based Foraging) to a robotics setting. We demonstrate that JaxRobotarium retains high simulation fidelity while achieving dramatic speedups over baseline (20x in training and 150x in simulation), and provides an open-access sim-to-real evaluation pipeline through the Robotarium testbed, accelerating and democratizing access to multi-robot learning research and evaluation. Our code is available at https://github.com/GT-STAR-Lab/JaxRobotarium.
Article 1176
Title@2025-05-27 (2): Knowledge Distillation Approach for SOS Fusion Staging: Towards Fully Automated Skeletal Maturity Assessment
Title: Knowledge Distillation Approach for SOS Fusion Staging: Towards Fully Automated Skeletal Maturity Assessment | Wissensdestillationsansatz für SOS-Fusionsstaging: Auf dem Weg zu einer vollautomatischen Skeletalreifebewertung | 利用知识蒸馏方法解决求求求融合问题:全面自动化骨骼成熟期评估 2505.21561v1 |
Authors: Omid Halimi Milani, Amanda Nikho, Marouane Tliba, Lauren Mills, Ahmet Enis Cetin, Mohammed H Elnagar
We introduce a novel deep learning framework for the automated staging of spheno-occipital synchondrosis (SOS) fusion, a critical diagnostic marker in both orthodontics and forensic anthropology. Our approach leverages a dual-model architecture wherein a teacher model, trained on manually cropped images, transfers its precise spatial understanding to a student model that operates on full, uncropped images. This knowledge distillation is facilitated by a newly formulated loss function that aligns spatial logits as well as incorporates gradient-based attention spatial mapping, ensuring that the student model internalizes the anatomically relevant features without relying on external cropping or YOLO-based segmentation. By leveraging expert-curated data and feedback at each step, our framework attains robust diagnostic accuracy, culminating in a clinically viable end-to-end pipeline. This streamlined approach obviates the need for additional pre-processing tools and accelerates deployment, thereby enhancing both the efficiency and consistency of skeletal maturation assessment in diverse clinical settings.
Article 1177
Title@2025-05-27 (2): SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation
Title: SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation | SeqPO-SiMT: Sequentielle Politikoptimierung für die gleichzeitige maschinelle Übersetzung | SeqPO-SIMT:同步机器翻译的序列政策优化 2505.20622v1 |
Authors: Ting Xu, Zhichao Huang, Jiankai Sun, Shanbo Cheng, Wai Lam
We present Sequential Policy Optimization for Simultaneous Machine Translation (SeqPO-SiMT), a new policy optimization framework that defines the simultaneous machine translation (SiMT) task as a sequential decision making problem, incorporating a tailored reward to enhance translation quality while reducing latency. In contrast to popular Reinforcement Learning from Human Feedback (RLHF) methods, such as PPO and DPO, which are typically applied in single-step tasks, SeqPO-SiMT effectively tackles the multi-step SiMT task. This intuitive framework allows the SiMT LLMs to simulate and refine the SiMT process using a tailored reward. We conduct experiments on six datasets from diverse domains for En to Zh and Zh to En SiMT tasks, demonstrating that SeqPO-SiMT consistently achieves significantly higher translation quality with lower latency. In particular, SeqPO-SiMT outperforms the supervised fine-tuning (SFT) model by 1.13 points in COMET, while reducing the Average Lagging by 6.17 in the NEWSTEST2021 En to Zh dataset. While SiMT operates with far less context than offline translation, the SiMT results of SeqPO-SiMT on 7B LLM surprisingly rival the offline translation of high-performing LLMs, including Qwen-2.5-7B-Instruct and LLaMA-3-8B-Instruct.
Article 1178
Title@2025-05-27 (2): Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning
Title: Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning | Mehrstufige Zertifizierte Verteidigung gegen vergiftende Angriffe im Offline-Verstärkungslernen | 多级认证防卫,防止在离线强化学习中进行毒物攻击 2505.20621v1 |
Authors: Shijie Liu, Andrew C. Cullen, Paul Montague, Sarah Erfani, Benjamin I. P. Rubinstein
Similar to other machine learning frameworks, Offline Reinforcement Learning (RL) has been shown to be vulnerable to poisoning attacks, due to its reliance on externally sourced datasets, a vulnerability that is exacerbated by its sequential nature. To mitigate the risks posed by RL poisoning, we extend certified defenses to provide larger guarantees against adversarial manipulation, ensuring robustness for both per-state actions and the overall expected cumulative reward. Our approach leverages properties of Differential Privacy, in a manner that allows this work to span both continuous and discrete spaces, as well as stochastic and deterministic environments, significantly expanding the scope and applicability of achievable guarantees. Empirical evaluations demonstrate that our approach limits performance drops to no more than $50\%$ with up to $7\%$ of the training data poisoned, significantly improving over the $0.008\%$ in prior work \citep{wu_copa_2022}, while producing certified radii that are $5$ times larger as well. This highlights the potential of our framework to enhance safety and reliability in offline RL.
Article 1179
Title@2025-05-27 (2): An Inexact Halpern Iteration with Application to Distributionally Robust Optimization
Title: An Inexact Halpern Iteration with Application to Distributionally Robust Optimization | Eine ungenaue Halpern-Iteration mit Anwendung zur distributiv robusten Optimierung | 用于分布强力优化优化的不精确 Halpern 迭代 2402.06033v3 |
Authors: Ling Liang, Zusen Xu, Kim-Chuan Toh, Jia-Jie Zhu
The Halpern iteration for solving monotone inclusion problems has attracted increasing interest in recent years due to its simple form and appealing convergence properties. In this paper, we investigate the inexact variants of the scheme in both deterministic and stochastic settings. We conduct extensive convergence analysis and show that by choosing the inexactness tolerances appropriately, the inexact schemes admit an $O(k^{-1})$ convergence rate in terms of the (expected) residue norm. Our results relax the state-of-the-art inexactness conditions employed in the literature while retaining the same competitive convergence properties. We then demonstrate how the proposed methods can be applied to solve two classes of data-driven Wasserstein distributionally robust optimization problems that admit convex-concave min-max optimization reformulations. We highlight its capability of performing inexact computations for distributionally robust learning with stochastic first-order methods and for general nonlinear convex-concave loss functions, which are competitive in the literature.
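For readers unfamiliar with the scheme, the sketch below shows the exact (not inexact) Halpern iteration x_{k+1} = beta_k x_0 + (1 - beta_k) T(x_k) with the common anchoring weights beta_k = 1/(k+2), applied to a nonexpansive operator T. The operator chosen here, a 90-degree rotation of the plane, is an illustrative assumption; the inexact variants studied in the paper would replace T(x_k) by an approximation within a prescribed tolerance.

```python
import math
from typing import Callable, List

Vector = List[float]


def halpern(T: Callable[[Vector], Vector], x0: Vector, num_iters: int) -> Vector:
    """Run the exact Halpern iteration anchored at x0 with beta_k = 1/(k+2)."""
    x = list(x0)
    for k in range(num_iters):
        beta = 1.0 / (k + 2)
        tx = T(x)
        # Convex combination of the anchor x0 and the operator output T(x_k).
        x = [beta * a + (1.0 - beta) * b for a, b in zip(x0, tx)]
    return x


def residual(T: Callable[[Vector], Vector], x: Vector) -> float:
    """Fixed-point residual norm ||x - T(x)||."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, T(x))))


def rotate90(x: Vector) -> Vector:
    """Nonexpansive operator whose only fixed point is the origin."""
    return [-x[1], x[0]]


if __name__ == "__main__":
    x0 = [1.0, 0.0]
    print(residual(rotate90, x0))                          # plain iterates keep this residual
    print(residual(rotate90, halpern(rotate90, x0, 1000))) # shrinks like O(1/k)
```

Simply iterating the rotation cycles forever with a constant residual, while the anchored iterates drive the residual toward zero at the O(1/k) rate mentioned in the abstract.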
Article 1180
Title@2025-05-27 (2): SoftPQ: Robust Instance Segmentation Evaluation via Soft Matching and Tunable Thresholds
Title: SoftPQ: Robust Instance Segmentation Evaluation via Soft Matching and Tunable Thresholds | SoftPQ: Robuste Instance Segmentierungsbewertung über Soft Matching und Tunable Thresholds | 软PQ:通过软匹配和金枪鱼分量阈值进行强力实例分化评价 2505.12155v2 |
Authors: Ranit Karmakar, Simon F. Nørrelykke
Segmentation evaluation metrics traditionally rely on binary decision logic: predictions are either correct or incorrect, based on rigid IoU thresholds. Detection-based metrics such as F1 and mAP determine correctness at the object level using fixed overlap cutoffs, while overlap-based metrics like Intersection over Union (IoU) and Dice operate at the pixel level, often overlooking instance-level structure. Panoptic Quality (PQ) attempts to unify detection and segmentation assessment, but it remains dependent on hard-threshold matching, treating predictions below the threshold as entirely incorrect. This binary framing obscures important distinctions between qualitatively different errors and fails to reward gradual model improvements. We propose SoftPQ, a flexible and interpretable instance segmentation metric that redefines evaluation as a graded continuum rather than a binary classification. SoftPQ introduces tunable upper and lower IoU thresholds to define a partial matching region and applies a sublinear penalty function to ambiguous or fragmented predictions. These extensions allow SoftPQ to exhibit smoother score behavior, greater robustness to structural segmentation errors, and more informative feedback for model development and evaluation. Through controlled perturbation experiments, we show that SoftPQ captures meaningful differences in segmentation quality that existing metrics overlook, making it a practical and principled alternative for both benchmarking and iterative model refinement.
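The hard-threshold behavior criticized above can be illustrated with the standard Panoptic Quality formula, PQ = (sum of IoUs over matched pairs) / (TP + FP/2 + FN/2), where a pair counts as matched only if its IoU exceeds 0.5. This sketch covers the baseline PQ only, not SoftPQ, whose thresholds and penalty function are not specified in the abstract. The input layout (a ground-truth by prediction IoU matrix from non-overlapping segments) and the function name are assumptions.

```python
from typing import List


def panoptic_quality(iou: List[List[float]], threshold: float = 0.5) -> float:
    """Standard PQ from an IoU matrix (rows: ground truth, columns: predictions).

    Assumes non-overlapping segments, so with threshold >= 0.5 each row and
    each column contains at most one entry above the threshold.
    """
    n_gt = len(iou)
    n_pred = len(iou[0]) if iou else 0
    matched = [v for row in iou for v in row if v > threshold]
    tp = len(matched)
    fp = n_pred - tp   # predictions without a match
    fn = n_gt - tp     # ground-truth instances without a match
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(matched) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # The second instance overlaps at IoU 0.49 vs 0.51: a tiny change in the
    # mask flips it between "entirely wrong" and "fully matched".
    print(panoptic_quality([[0.8, 0.0], [0.0, 0.49]]))
    print(panoptic_quality([[0.8, 0.0], [0.0, 0.51]]))
```

The two calls differ by 0.02 in a single IoU entry, yet the score jumps from 0.4 to 0.655, which is the kind of discontinuity a graded metric aims to smooth out.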
Article 1181
Title@2025-05-27 (2): Real-Time Stress Monitoring, Detection, and Management in College Students: A Wearable Technology and Machine-Learning Approach
Title: Real-Time Stress Monitoring, Detection, and Management in College Students: A Wearable Technology and Machine-Learning Approach | Echtzeit-Stress-Monitoring, Detection und Management in College-Studenten: Ein Wearable-Technologie- und Machine-Learning-Ansatz | 大学生实时应力监测、检测和管理:穿戴技术和机械学习方法 2505.15974v2 |
Authors: Alan Ta, Nilsu Salgin, Mustafa Demir, Kala Phillips Reindel, Ranjana K. Mehta, Anthony McDonald, Carly McCord, Farzan Sasangohar
College students are increasingly affected by stress, anxiety, and depression, yet face barriers to traditional mental health care. This study evaluated the efficacy of a mobile health (mHealth) intervention, Mental Health Evaluation and Lookout Program (mHELP), which integrates a smartwatch sensor and machine learning (ML) algorithms for real-time stress detection and self-management. In a 12-week randomized controlled trial (n = 117), participants were assigned to a treatment group using mHELP’s full suite of interventions or a control group using the app solely for real-time stress logging and weekly psychological assessments. The primary outcome, “Moments of Stress” (MS), was assessed via physiological and self-reported indicators and analyzed using Generalized Linear Mixed Models (GLMM) approaches. Similarly, secondary outcomes of psychological assessments, including the Generalized Anxiety Disorder-7 (GAD-7) for anxiety, the Patient Health Questionnaire (PHQ-8) for depression, and the Perceived Stress Scale (PSS), were also analyzed via GLMM. The finding of the objective measure, MS, indicates a substantial decrease in MS among the treatment group compared to the control group, while no notable between-group differences were observed in subjective scores of anxiety (GAD-7), depression (PHQ-8), or stress (PSS). However, the treatment group exhibited a clinically meaningful decline in GAD-7 and PSS scores. These findings underscore the potential of wearable-enabled mHealth tools to reduce acute stress in college populations and highlight the need for extended interventions and tailored features to address chronic symptoms like depression.
Article 1182
Title@2025-05-27 (2): LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers
Title: LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers | LLM-FE: Automatisiertes Feature Engineering für Tabellendaten mit LLMs als Evolutionsoptimierer | LLM-FE: 制表数据的自动地貌工程,LLMM作为进化优化器 2503.14434v2 |
Authors: Nikhil Abhyankar, Parshin Shojaee, Chandan K. Reddy
Automated feature engineering plays a critical role in improving predictive model performance for tabular learning tasks. Traditional automated feature engineering methods are limited by their reliance on pre-defined transformations within fixed, manually designed search spaces, often neglecting domain knowledge. Recent advances using Large Language Models (LLMs) have enabled the integration of domain knowledge into the feature engineering process. However, existing LLM-based approaches use direct prompting or rely solely on validation scores for feature selection, failing to leverage insights from prior feature discovery experiments or establish meaningful reasoning between feature generation and data-driven performance. To address these challenges, we propose LLM-FE, a novel framework that combines evolutionary search with the domain knowledge and reasoning capabilities of LLMs to automatically discover effective features for tabular learning tasks. LLM-FE formulates feature engineering as a program search problem, where LLMs propose new feature transformation programs iteratively, and data-driven feedback guides the search process. Our results demonstrate that LLM-FE consistently outperforms state-of-the-art baselines, significantly enhancing the performance of tabular prediction models across diverse classification and regression benchmarks.
Article 1183
Title@2025-05-27 (2): PhySense: Sensor Placement Optimization for Accurate Physics Sensing
Title: PhySense: Sensor Placement Optimization for Accurate Physics Sensing | PhySense: Sensor-Platzierungs-Optimierung für präzise Physik Sensing | 感应:精确物理遥感传感器定位优化 2505.18190v2 |
Authors: Yuezhou Ma, Haixu Wu, Hang Zhou, Huikun Weng, Jianmin Wang, Mingsheng Long
Physics sensing plays a central role in many scientific and engineering domains, which inherently involves two coupled tasks: reconstructing dense physical fields from sparse observations and optimizing scattered sensor placements to observe maximum information. While deep learning has made rapid advances in sparse-data reconstruction, existing methods generally omit optimization of sensor placements, leaving the mutual enhancement between reconstruction and placement on the shelf. To change this suboptimal practice, we propose PhySense, a synergistic two-stage framework that learns to jointly reconstruct physical fields and to optimize sensor placements, both aiming for accurate physics sensing. The first stage involves a flow-based generative model enhanced by cross-attention to adaptively fuse sparse observations. Leveraging the reconstruction feedback, the second stage performs sensor placement via projected gradient descent to satisfy spatial constraints. We further prove that the learning objectives of the two stages are consistent with classical variance-minimization principles, providing theoretical guarantees. Extensive experiments across three challenging benchmarks, especially a 3D geometry dataset, indicate PhySense achieves state-of-the-art physics sensing accuracy and discovers informative sensor placements previously unconsidered.
Article 1184
Title@2025-05-27 (2): Intelligent Incident Hypertension Prediction in Obstructive Sleep Apnea
Title: Intelligent Incident Hypertension Prediction in Obstructive Sleep Apnea | Intelligente Hypertonie-Vorhersage bei obstruktiver Schlafapnoe | 阻碍睡眠的智能性事件超强度预测 2505.20615v1 |
Authors: Omid Halimi Milani, Ahmet Enis Cetin, Bharati Prasad
Obstructive sleep apnea (OSA) is a significant risk factor for hypertension, primarily due to intermittent hypoxia and sleep fragmentation. Predicting whether individuals with OSA will develop hypertension within five years remains a complex challenge. This study introduces a novel deep learning approach that integrates Discrete Cosine Transform (DCT)-based transfer learning to enhance prediction accuracy. We are the first to incorporate all polysomnography signals together for hypertension prediction, leveraging their collective information to improve model performance. Features were extracted from these signals and transformed into a 2D representation to utilize pre-trained 2D neural networks such as MobileNet, EfficientNet, and ResNet variants. To further improve feature learning, we introduced a DCT layer, which transforms input features into a frequency-based representation, preserving essential spectral information, decorrelating features, and enhancing robustness to noise. This frequency-domain approach, coupled with transfer learning, is especially beneficial for limited medical datasets, as it leverages rich representations from pre-trained networks to improve generalization. By strategically placing the DCT layer at deeper truncation depths within EfficientNet, our model achieved a best area under the curve (AUC) of 72.88%, demonstrating the effectiveness of frequency-domain feature extraction and transfer learning in predicting hypertension risk in OSA patients over a five-year period.
Article 1185
Title@2025-05-27 (2): A Concentration Bound for TD(0) with Function Approximation
Title: A Concentration Bound for TD(0) with Function Approximation | Ein Konzentrationsbund für TD(0) mit Funktionsannäherung | 具有函数接近度的 TD(0) 的浓度界值 2312.10424v3 |
Authors: Siddharth Chandak, Vivek S. Borkar
We derive a concentration bound of the type 'for all $n \geq n_0$ for some $n_0$' for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive stochastic approximation algorithm, with both martingale and Markov noise. Markov noise is handled using the Poisson equation, and the lack of almost sure guarantees on boundedness of iterates is handled using the concept of relaxed concentration inequalities.
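The algorithm under analysis can be written in a few lines. The sketch below runs online TD(0) with linear function approximation along a single trajectory, using the update theta <- theta + a_n * (r + gamma * phi(s')^T theta - phi(s)^T theta) * phi(s). The environment interface (`step` returning a reward and next state), the feature map, and the step-size schedule are hypothetical choices for illustration; the demo chain is deterministic purely so that the limit is easy to check by hand.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def td0_linear(phi: Callable[[int], Vector],
               step: Callable[[int], Tuple[float, int]],
               start_state: int, theta0: Vector, gamma: float,
               num_steps: int, step_size: Callable[[int], float]) -> Vector:
    """Online TD(0) along one sample path with value estimate phi(s)^T theta."""
    theta = list(theta0)
    s = start_state
    for n in range(num_steps):
        r, s_next = step(s)  # one transition of the (possibly random) chain
        f, f_next = phi(s), phi(s_next)
        td_error = r + gamma * dot(theta, f_next) - dot(theta, f)
        a = step_size(n)
        theta = [t + a * td_error * fi for t, fi in zip(theta, f)]
        s = s_next
    return theta


if __name__ == "__main__":
    # Two-state cycle 0 -> 1 -> 0 with reward 1 when leaving state 0.
    one_hot = lambda s: [1.0, 0.0] if s == 0 else [0.0, 1.0]
    cycle = lambda s: (1.0, 1) if s == 0 else (0.0, 0)
    theta = td0_linear(one_hot, cycle, 0, [0.0, 0.0], gamma=0.5,
                       num_steps=5000, step_size=lambda n: 0.1)
    print(theta)  # close to the true values V(0) = 4/3, V(1) = 2/3
```

With a stochastic `step` function the iterates fluctuate around the fixed point, and it is the size of those fluctuations for all sufficiently large n that a concentration bound of this type controls.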
Article 1186
Title@2025-05-27 (2): REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning
Title: REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning | REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning | 实际检索: 数学理由的回收增量精液预言 2505.20613v1 |
Authors: Ziju Shen, Naohao Huang, Fanyi Yang, Yutong Wang, Guoxiong Gao, Tianyi Xu, Jiedong Jiang, Wanyi He, Pu Yang, Mengzhou Sun, Haocheng Ju, Peihao Wu, Bryan Dai, Bin Dong
Nowadays, formal theorem provers have made monumental progress on high-school and competition-level mathematics, but few of them generalize to more advanced mathematics. In this paper, we present REAL-Prover, a new open-source stepwise theorem prover for Lean 4 to push this boundary. This prover, based on our fine-tuned large language model (REAL-Prover-v1) and integrated with a retrieval system (Leansearch-PS), notably boosts performance on solving college-level mathematics problems. To train REAL-Prover-v1, we developed HERALD-AF, a data extraction pipeline that converts natural language math problems into formal statements, and a new open-source Lean 4 interactive environment (Jixia-interactive) to facilitate synthesis data collection. In our experiments, our prover using only supervised fine-tuning achieves competitive results with a 23.7% success rate (Pass@64) on the ProofNet dataset, comparable to state-of-the-art (SOTA) models. To further evaluate our approach, we introduce FATE-M, a new benchmark focused on algebraic problems, where our prover achieves a SOTA success rate of 56.7% (Pass@64).
Article 1187
Title@2025-05-27 (2): Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
Title: Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models | Roboflow100-VL: Ein Multi-Domain-Objekterkennungs-Benchmark für Vision-Language-Modelle | 机器人流100-VL:愿景-语言模型多功能物体探测基准 2505.20612v1 |
Authors: Peter Robicheaux, Matvei Popov, Anish Madan, Isaac Robinson, Joseph Nelson, Deva Ramanan, Neehar Peri
Vision-language models (VLMs) trained on internet-scale data achieve remarkable zero-shot detection performance on common objects like car, truck, and pedestrian. However, state-of-the-art models still struggle to generalize to out-of-distribution classes, tasks and imaging modalities not typically found in their pre-training. Rather than simply re-training VLMs on more visual data, we argue that one should align VLMs to new concepts with annotation instructions containing a few visual examples and rich textual descriptions. To this end, we introduce Roboflow100-VL, a large-scale collection of 100 multi-modal object detection datasets with diverse concepts not commonly found in VLM pre-training. We evaluate state-of-the-art models on our benchmark in zero-shot, few-shot, semi-supervised, and fully-supervised settings, allowing for comparison across data regimes. Notably, we find that VLMs like GroundingDINO and Qwen2.5-VL achieve less than 2% zero-shot accuracy on challenging medical imaging datasets within Roboflow100-VL, demonstrating the need for few-shot concept alignment. Our code and dataset are available at https://github.com/roboflow/rf100-vl/ and https://universe.roboflow.com/rf100-vl/
Article 1188
Title@2025-05-27 (2): Hierarchical Mamba Meets Hyperbolic Geometry: A New Paradigm for Structured Language Embeddings
Title: Hierarchical Mamba Meets Hyperbolic Geometry: A New Paradigm for Structured Language Embeddings | Hierarchische Mamba trifft auf Hyperbolische Geometrie: Ein neues Paradigma für strukturierte Spracheinbettungen | 等级式 Mamba 相遇超双曲几何: 结构化语言嵌入的新范式 2505.18973v2 |
Authors: Sarang Patil, Ashish Parmanand Pandey, Ioannis Koutis, Mengjia Xu
Selective state-space models have achieved great success in long-sequence modeling. However, their capacity for language representation, especially in complex hierarchical reasoning tasks, remains underexplored. Most large language models rely on flat Euclidean embeddings, limiting their ability to capture latent hierarchies. To address this limitation, we propose Hierarchical Mamba (HiM), integrating efficient Mamba2 with the exponential growth and curved nature of hyperbolic geometry to learn hierarchy-aware language embeddings for deeper linguistic understanding. Mamba2-processed sequences are projected to the Poincaré ball (via tangent-based mapping) or Lorentzian manifold (via cosine and sine-based mapping) with "learnable" curvature, optimized with a combined hyperbolic loss. Our HiM model facilitates the capture of relational distances across varying hierarchical levels, enabling effective long-range reasoning. This makes it well-suited for tasks like mixed-hop prediction and multi-hop inference in hierarchical classification. We evaluated HiM on four linguistic and medical datasets for mixed-hop prediction and multi-hop inference tasks. Experimental results demonstrated that: 1) Both HiM models effectively capture hierarchical relationships on four ontological datasets, surpassing Euclidean baselines. 2) HiM-Poincaré captures fine-grained semantic distinctions with higher h-norms, while HiM-Lorentz provides more stable, compact, and hierarchy-preserving embeddings favoring robustness over detail.
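The two projections mentioned above can be sketched with the standard exponential maps at the origin. This assumes that the "tangent-based" mapping is the usual tanh-based Poincaré map and that the "cosine and sine-based" mapping refers to hyperbolic cosine and sine; the function names and the fixed curvature parameter `c` (learnable in the paper) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def expmap0_poincare(v, c=1.0, eps=1e-9):
    """Map Euclidean (tangent) vectors onto the Poincare ball of curvature -c."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    # tanh squashes the norm so every output lies strictly inside radius 1/sqrt(c).
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def expmap0_lorentz(v, c=1.0, eps=1e-9):
    """Map tangent vectors onto the hyperboloid -x0^2 + |x|^2 = -1/c."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    time = np.cosh(sqrt_c * norm) / sqrt_c                 # time-like coordinate
    space = np.sinh(sqrt_c * norm) * v / (sqrt_c * norm)   # space-like coordinates
    return np.concatenate([time, space], axis=-1)

# Example: two sequence embeddings (hypothetical values) mapped to each manifold.
emb = np.array([[3.0, 4.0], [0.1, -0.2]])
ball_points = expmap0_poincare(emb)
hyperboloid_points = expmap0_lorentz(emb)
```

Vectors with a larger Euclidean norm land closer to the boundary of the ball, which is the property hyperbolic embeddings typically use to place deeper hierarchy levels.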
Article 1189
Title@2025-05-27 (2): Integral Imprecise Probability Metrics
Title: Integral Imprecise Probability Metrics | Integral Ungenaue Wahrscheinlichkeits-Metriken | 综合综合不全性障碍 概率概率度量 2505.16156v2 |
Authors: Siu Lun Chau, Michele Caprio, Krikamol Muandet
Quantifying differences between probability distributions is fundamental to statistics and machine learning, primarily for comparing statistical uncertainty. In contrast, epistemic uncertainty (EU) – due to incomplete knowledge – requires richer representations than those offered by classical probability. Imprecise probability (IP) theory offers such models, capturing ambiguity and partial belief. This has driven growing interest in imprecise probabilistic machine learning (IPML), where inference and decision-making rely on broader uncertainty models – highlighting the need for metrics beyond classical probability. This work introduces the Integral Imprecise Probability Metric (IIPM) framework, a Choquet integral-based generalisation of classical Integral Probability Metric (IPM) to the setting of capacities – a broad class of IP models encompassing many existing ones, including lower probabilities, probability intervals, belief functions, and more. Theoretically, we establish conditions under which IIPM serves as a valid metric and metrises a form of weak convergence of capacities. Practically, IIPM not only enables comparison across different IP models but also supports the quantification of epistemic uncertainty within a single IP model. In particular, by comparing an IP model with its conjugate, IIPM gives rise to a new class of EU measures – Maximum Mean Imprecision – which satisfy key axiomatic properties proposed in the Uncertainty Quantification literature. We validate MMI through selective classification experiments, demonstrating strong empirical performance against established EU measures, and outperforming them when classical methods struggle to scale to a large number of classes. Our work advances both theory and practice in IPML, offering a principled framework for comparing and quantifying epistemic uncertainty under imprecision.
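To make the Choquet integral underlying the framework concrete, here is a minimal sketch on a finite outcome space. The capacity is passed as a plain callable on `frozenset`s, and the example lower probability (the minimum of two hypothetical distributions `p1` and `p2`) is an assumption chosen for illustration; the code does not implement IIPM or MMI themselves, only the integral and the conjugate capacity they are built from.

```python
def choquet_integral(f, capacity):
    """Choquet integral of f (dict: outcome -> value) w.r.t. a capacity.

    `capacity` maps a frozenset of outcomes to [0, 1], is monotone,
    and is assumed to return 1 on the full outcome set.
    """
    order = sorted(f, key=f.get, reverse=True)  # outcomes by decreasing value
    total = 0.0
    for i, x in enumerate(order):
        next_val = f[order[i + 1]] if i + 1 < len(order) else 0.0
        # Weight each drop in value by the capacity of the upper level set.
        total += (f[x] - next_val) * capacity(frozenset(order[: i + 1]))
    return total

def conjugate(capacity, outcomes):
    """Conjugate capacity: A -> 1 - capacity(complement of A)."""
    full = frozenset(outcomes)
    return lambda A: 1.0 - capacity(full - A)

# Hypothetical credal set with two distributions over three outcomes.
p1 = {"a": 0.2, "b": 0.3, "c": 0.5}
p2 = {"a": 0.4, "b": 0.4, "c": 0.2}
lower = lambda A: min(sum(p1[x] for x in A), sum(p2[x] for x in A))
upper = conjugate(lower, p1)

f = {"a": 1.0, "b": 4.0, "c": 2.0}
gap = choquet_integral(f, upper) - choquet_integral(f, lower)
```

For an additive capacity (an ordinary probability measure) the integral reduces to the usual expectation, and the gap between the integrals under a capacity and its conjugate is zero; a positive gap reflects imprecision for that particular test function.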
Article 1190
Title@2025-05-27 (2): Improving Generative Inverse Design of Rectangular Patch Antennas with Test Time Optimization
Title: Improving Generative Inverse Design of Rectangular Patch Antennas with Test Time Optimization | Verbesserung des generativen Inversen Designs von rechteckigen Patchantennen mit Testzeitoptimierung | 改进带测试时间优化的矩形补边天线的生成反向设计 2505.18188v2 |
Authors: Beck LaBash, Shahriar Khushrushahi, Fabian Ruehle
We propose a two-stage deep learning framework for the inverse design of rectangular patch antennas. Our approach leverages generative modeling to learn a latent representation of antenna frequency response curves and conditions a subsequent generative model on these responses to produce feasible antenna geometries. We further demonstrate that leveraging search and optimization techniques at test-time improves the accuracy of the generated designs and enables consideration of auxiliary objectives such as manufacturability. Our approach generalizes naturally to different design criteria, and can be easily adapted to more complex geometric design spaces.
Article 1191
Title@2025-05-27 (2): InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling
Title: InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling | InstGenIE: Generative Bildbearbeitung mit Mask-aware Caching und Scheduling effizient gemacht | InstGenie: 生成图像编辑, 高效使用防面具图像缓冲和排程 2505.20600v1 |
Authors: Xiaoxiao Jiang, Suyi Li, Lingyun Yang, Tianyu Feng, Zhipeng Di, Weiyi Lu, Guoxuan Zhu, Xiu Lin, Kan Liu, Yinghao Yu, Tao Lan, Guodong Yang, Lin Qu, Liping Zhang, Wei Wang
Generative image editing using diffusion models has become a prevalent application in today’s AI cloud services. In production environments, image editing typically involves a mask that specifies the regions of an image template to be edited. The use of masks provides direct control over the editing process and introduces sparsity in the model inference. In this paper, we present InstGenIE, a system that efficiently serves image editing requests. The key insight behind InstGenIE is that image editing only modifies the masked regions of image templates while preserving the original content in the unmasked areas. Driven by this insight, InstGenIE judiciously skips redundant computations associated with the unmasked areas by reusing cached intermediate activations from previous inferences. To mitigate the high cache loading overhead, InstGenIE employs a bubble-free pipeline scheme that overlaps computation with cache loading. Additionally, to reduce queuing latency in online serving while improving the GPU utilization, InstGenIE proposes a novel continuous batching strategy for diffusion model serving, allowing newly arrived requests to join the running batch in just one step of denoising computation, without waiting for the entire batch to complete. As heterogeneous masks induce imbalanced loads, InstGenIE also develops a load balancing strategy that takes into account the loads of both computation and cache loading. Collectively, InstGenIE outperforms state-of-the-art diffusion serving systems for image editing, achieving up to 3x higher throughput and reducing average request latency by up to 14.7x while ensuring image quality.
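The core idea of reusing cached activations for unmasked regions can be sketched for a layer that acts on each token independently. This is a simplified illustration under that assumption: `tokenwise_layer` and `masked_forward` are hypothetical names, and attention layers, which mix information across tokens, need more than this sketch covers.

```python
import numpy as np

def tokenwise_layer(x, W):
    """A layer applied independently to every token (row) of x."""
    return np.maximum(x @ W, 0.0)

def masked_forward(x_new, mask, cached_out, W):
    """Recompute only the masked tokens; reuse cached outputs elsewhere.

    x_new:      (T, d) token inputs for the edit request
    mask:       (T,)   boolean, True where the template is being edited
    cached_out: (T, d_out) layer outputs cached from a previous inference
    """
    out = cached_out.copy()
    out[mask] = tokenwise_layer(x_new[mask], W)  # work scales with mask size
    return out

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
template = rng.normal(size=(6, 4))
cache = tokenwise_layer(template, W)            # computed once per template

mask = np.array([True, False, False, True, False, False])
request = template.copy()
request[mask] = rng.normal(size=(2, 4))         # only masked tokens change
edited = masked_forward(request, mask, cache, W)
```

Because unmasked tokens are unchanged, the result matches a full recomputation while touching only the masked rows, which is where the sparsity savings come from.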
Article 1192
Title@2025-05-27 (2): Randomly Sampled Language Reasoning Problems Explain Limits of LLMs
Title: Randomly Sampled Language Reasoning Problems Explain Limits of LLMs | Zufällig gemusterte Sprachbegründungsprobleme erklären Grenzen von LLMs | 随机抽样 语言原因问题解释LLMM限制 2501.02825v5 |
Authors: Kavi Gupta, Kate Sanders, Armando Solar-Lezama
While LLMs have revolutionized the field of machine learning due to their high performance across a range of tasks, they are known to perform poorly in planning, hallucinate false answers, have degraded performance on less canonical versions of the same task, and answer incorrectly on a variety of specific prompts. There are several emerging theories of LLM performance with some predictive power, among them that LLMs lack world modeling ability, that they have an undesirable bias towards an autoregressive prior, and that they perform less well on more novel problems. The existing literature on novelty has focused on tasks of relatively high complexity, studying perturbations of canonical but complex problems. In this paper, we attempt to isolate novelty as a factor in LLM underperformance. To this end, we consider an extremely simple domain: next token prediction on simple language tasks. The twist is that these language tasks are unseen, as they are randomly drawn from a large, parsimoniously defined set of languages arising from simple grammar rules. This allows us to isolate the effect of task novelty and see if it is sufficient to explain low performance. We find that LLMs uniformly underperform n-gram models (which do not have the capacity for world modeling) on these tasks, both when used as next token predictors and as reasoners.
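For reference, the n-gram baseline mentioned above can be as simple as a count table over contexts. The sketch below is a generic maximum-count n-gram predictor; the function names and the toy sequence are illustrative and are not the authors' evaluation code.

```python
from collections import Counter, defaultdict

def train_ngram(tokens, n=3):
    """Count next-token frequencies for every (n-1)-token context (n >= 2)."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i : i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts

def predict_next(counts, context, n=3):
    """Return the most frequent continuation of the last n-1 tokens, or None."""
    key = tuple(context[-(n - 1):])
    if key not in counts:
        return None  # unseen context; a real baseline would back off to shorter n
    return counts[key].most_common(1)[0][0]

# Toy sequence from a hypothetical two-symbol language.
model = train_ngram(list("abababab"), n=3)
guess = predict_next(model, ["a", "b"], n=3)
```

Such a model has no parameters beyond the counts, which is what makes it a useful control when asking whether task novelty alone explains an LLM's errors.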
Article 1193
Title@2025-05-26 (1): GenMol: A Drug Discovery Generalist with Discrete Diffusion
Title: GenMol: A Drug Discovery Generalist with Discrete Diffusion | GenMol: Ein Drug Discovery Generalist mit diskreter Diffusion | GenMol: 具有分辨扩散作用的药物发现通俗主义者 2501.06158v2 |
Authors: Seul Lee, Karsten Kreis, Srimukh Prasad Veccham, Meng Liu, Danny Reidenbach, Yuxing Peng, Saee Paliwal, Weili Nie, Arash Vahdat
Drug discovery is a complex process that involves multiple stages and tasks. However, existing molecular generative models can only tackle some of these tasks. We present Generalist Molecular generative model (GenMol), a versatile framework that uses only a single discrete diffusion model to handle diverse drug discovery scenarios. GenMol generates Sequential Attachment-based Fragment Embedding (SAFE) sequences through non-autoregressive bidirectional parallel decoding, thereby allowing the utilization of a molecular context that does not rely on the specific token ordering while having better sampling efficiency. GenMol uses fragments as basic building blocks for molecules and introduces fragment remasking, a strategy that optimizes molecules by regenerating masked fragments, enabling effective exploration of chemical space. We further propose molecular context guidance (MCG), a guidance method tailored for masked discrete diffusion of GenMol. GenMol significantly outperforms the previous GPT-based model in de novo generation and fragment-constrained generation, and achieves state-of-the-art performance in goal-directed hit generation and lead optimization. These results demonstrate that GenMol can tackle a wide range of drug discovery tasks, providing a unified and versatile approach for molecular design.
Article 1194
Title@2025-05-26 (1): Prot2Token: A Unified Framework for Protein Modeling via Next-Token Prediction
Title: Prot2Token: A Unified Framework for Protein Modeling via Next-Token Prediction | Prot2Token: Ein einheitliches Framework für Proteinmodellierung über Next-Token-Vorhersage | Prot2Token:通过次声预测建立蛋白模型的统一框架 2505.20589v1 |
Authors: Mahdi Pourmirzaei, Farzaneh Esmaili, Salhuldin Alqarghuli, Mohammadreza Pourmirzaei, Ye Han, Kai Chen, Mohsen Rezaei, Duolin Wang, Dong Xu
The diverse nature of protein prediction tasks has traditionally necessitated specialized models, hindering the development of broadly applicable and computationally efficient Protein Language Models (PLMs). In this work, we introduce Prot2Token, a unified framework that overcomes these challenges by converting a wide spectrum of protein-related predictions, from sequence-level properties and residue-specific attributes to complex inter-protein interactions, into a standardized next-token prediction format. At its core, Prot2Token employs an autoregressive decoder, conditioned on embeddings from pre-trained protein encoders and guided by learnable task tokens, to perform diverse predictions. This architecture uniquely facilitates multi-task learning, enabling a single model to master numerous tasks with improved efficiency. We present extensive experimental validation across a variety of benchmarks, demonstrating Prot2Token's strong predictive power in different types of protein-prediction tasks. Key results include significant speedups (e.g., nearly 1000x over AlphaFold2 with MSA) and performance often matching or exceeding specialized approaches. Beyond that, we introduce an auxiliary self-supervised decoder pre-training approach to improve spatially sensitive task performance. Prot2Token thus offers a significant step towards a versatile, high-throughput paradigm for protein modeling, promising to accelerate biological discovery and the development of novel therapeutics. The code is available at https://github.com/mahdip72/prot2token .
Article 1195
Title@2025-05-26 (1): Bidirectional Variational Autoencoders
Title: Bidirectional Variational Autoencoders | Bidirektionale Variationale Autoencoder | 双向多向自动自动编码器 2505.16074v2 |
Authors: Bart Kosko, Olaoluwa Adigun
We present the new bidirectional variational autoencoder (BVAE) network architecture. The BVAE uses a single neural network both to encode and decode instead of an encoder-decoder network pair. The network encodes in the forward direction and decodes in the backward direction through the same synaptic web. Simulations compared BVAEs and ordinary VAEs on the four image tasks of image reconstruction, classification, interpolation, and generation. The image datasets included MNIST handwritten digits, Fashion-MNIST, CIFAR-10, and CelebA-64 face images. The bidirectional structure of BVAEs cut the parameter count by almost 50% and still slightly outperformed the unidirectional VAEs.
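One way to picture "encoding forward and decoding backward through the same synaptic web" is a network whose decoder reuses the encoder's weight matrices in transposed form. The sketch below assumes exactly that and omits the variational parts (latent mean and variance, sampling, the loss); the class name and layer sizes are hypothetical, not the authors' architecture.

```python
import numpy as np

class TiedBidirectionalNet:
    """Deterministic sketch: one set of weights used in both directions."""

    def __init__(self, d_in, d_hidden, d_latent, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (d_hidden, d_in))
        self.W2 = rng.normal(0.0, 0.1, (d_latent, d_hidden))

    def encode(self, x):
        # Forward pass: input -> hidden -> latent.
        return self.W2 @ np.tanh(self.W1 @ x)

    def decode(self, z):
        # Backward pass through the same weights, transposed.
        return self.W1.T @ np.tanh(self.W2.T @ z)

    def n_params(self):
        # A separate decoder of the same shape would double this count.
        return self.W1.size + self.W2.size

net = TiedBidirectionalNet(d_in=8, d_hidden=4, d_latent=2)
z = net.encode(np.ones(8))
x_hat = net.decode(z)
```

Sharing weights this way is consistent with the roughly 50% parameter reduction reported above, since no separate decoder matrices are stored.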
Article 1196
Title@2025-05-26 (1): Balancing Performance and Costs in Best Arm Identification
Title: Balancing Performance and Costs in Best Arm Identification | Ausgewogene Leistung und Kosten bei der Ermittlung der besten Waffen | 平衡最佳武器识别的性能和费用 2505.20583v1 |
Authors: Michael O. Harding, Kirthevasan Kandasamy
We consider the problem of identifying the best arm in a multi-armed bandit model. Despite a wealth of literature in the traditional fixed budget and fixed confidence regimes of the best arm identification problem, it still remains a mystery to most practitioners as to how to choose an approach and corresponding budget or confidence parameter. We propose a new formalism to avoid this dilemma altogether by minimizing a risk functional which explicitly balances the performance of the recommended arm and the cost incurred by learning this arm. In this framework, a cost is incurred for each observation during the sampling phase, and upon recommending an arm, a performance penalty is incurred for identifying a suboptimal arm. The learner’s goal is to minimize the sum of the penalty and cost. This new regime mirrors the priorities of many practitioners, e.g. maximizing profit in an A/B testing framework, better than classical fixed budget or confidence settings. We derive theoretical lower bounds for the risk of each of two choices for the performance penalty, the probability of misidentification and the simple regret, and propose an algorithm called DBCARE to match these lower bounds up to polylog factors on nearly all problem instances. We then demonstrate the performance of DBCARE on a number of simulated models, comparing to fixed budget and confidence algorithms to show the shortfalls of existing BAI paradigms on this problem.
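The risk functional described above, a performance penalty plus an accumulated sampling cost, can be written schematically as follows. The notation is assumed for illustration: $\tau$ is the (possibly random) number of observations, $c$ a per-observation cost, $\hat{a}$ the recommended arm, $a^\star$ the best arm, and $\mu_a$ the mean reward of arm $a$; the paper's exact definitions may differ.

```latex
\[
  \mathcal{R}
  \;=\;
  \underbrace{\mathbb{E}\big[\ell(\hat{a})\big]}_{\text{performance penalty}}
  \;+\;
  \underbrace{c\,\mathbb{E}[\tau]}_{\text{sampling cost}},
  \qquad
  \ell(\hat{a}) \in
  \Big\{
    \mathbf{1}\{\hat{a} \neq a^\star\},\;
    \mu_{a^\star} - \mu_{\hat{a}}
  \Big\}.
\]
```

The first choice of $\ell$ makes the penalty term the probability of misidentification, and the second makes it the simple regret, matching the two penalties named in the abstract.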
Article 1197
Title@2025-05-26 (1): Training a Generally Curious Agent
Title: Training a Generally Curious Agent | Ein allgemein neugieriger Agent ausbilden | a 训练一般好奇剂 2502.17543v3 |
Authors: Fahim Tajwar, Yiding Jiang, Abitha Thankaraj, Sumaita Sadia Rahman, J Zico Kolter, Jeff Schneider, Ruslan Salakhutdinov
Efficient exploration is essential for intelligent systems interacting with their environment, but existing language models often fall short in scenarios that require strategic information gathering. In this paper, we present Paprika, a fine-tuning approach that enables language models to develop general decision-making capabilities that are not confined to particular environments. By training on synthetic interaction data from different tasks that require diverse strategies, Paprika teaches models to explore and adapt their behavior on a new task based on environment feedback in-context without more gradient updates. Experimental results show that models fine-tuned with Paprika can effectively transfer their learned decision-making capabilities to entirely unseen tasks without additional training. Unlike traditional training, our approach’s primary bottleneck lies in sampling useful interaction data instead of model updates. To improve sample efficiency, we propose a curriculum learning strategy that prioritizes sampling trajectories from tasks with high learning potential. These results suggest a promising path towards AI systems that can autonomously solve novel sequential decision-making problems that require interactions with the external world.
Article 1198
Title@2025-05-26 (1): Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL
Title: Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL | Strg-DNA: Kontrollierbare Zell-Typ-spezifische Regulatorische DNA-Design über eingeschränkte RL | Ctrl-DNA:通过受控RL设计可控细胞-Type-具体监管DNA 2505.20578v1 |
Authors: Xingyu Chen, Shihao Ma, Runsheng Lin, Jiecong Lin, Bo Wang
Designing regulatory DNA sequences that achieve precise cell-type-specific gene expression is crucial for advancements in synthetic biology, gene therapy and precision medicine. Although transformer-based language models (LMs) can effectively capture patterns in regulatory DNA, their generative approaches often struggle to produce novel sequences with reliable cell-specific activity. Here, we introduce Ctrl-DNA, a novel constrained reinforcement learning (RL) framework tailored for designing regulatory DNA sequences with controllable cell-type specificity. By formulating regulatory sequence design as a biologically informed constrained optimization problem, we apply RL to autoregressive genomic LMs, enabling the models to iteratively refine sequences that maximize regulatory activity in targeted cell types while constraining off-target effects. Our evaluation on human promoters and enhancers demonstrates that Ctrl-DNA consistently outperforms existing generative and RL-based approaches, generating high-fitness regulatory sequences and achieving state-of-the-art cell-type specificity. Moreover, Ctrl-DNA-generated sequences capture key cell-type-specific transcription factor binding sites (TFBS), short DNA motifs recognized by regulatory proteins that control gene expression, demonstrating the biological plausibility of the generated sequences.
Article 1199
Title@2025-05-26 (1): Emotion Classification In-Context in Spanish
Title: Emotion Classification In-Context in Spanish | Emotion Classification In-Context auf Spanisch | 西班牙文《情感分类西班牙文内引文》 2505.20571v1 |
Authors: Bipul Thapa, Gabriel Cofre
Classifying customer feedback into distinct emotion categories is essential for understanding sentiment and improving customer experience. In this paper, we classify customer feedback in Spanish into three emotion categories (positive, neutral, and negative) using advanced NLP and ML techniques. Traditional methods translate feedback from widely spoken languages to less common ones, resulting in a loss of semantic integrity and contextual nuances inherent to the original language. To address this limitation, we propose a hybrid approach that combines TF-IDF with BERT embeddings, effectively transforming Spanish text into rich numerical representations that preserve the semantic depth of the original language by using a Custom Stacking Ensemble (CSE) approach. To evaluate emotion classification, we utilize a range of models, including Logistic Regression, KNN, Bagging classifier with LGBM, and AdaBoost. The CSE model combines these classifiers as base models and uses a one-vs-all Logistic Regression as the meta-model. Our experimental results demonstrate that CSE significantly outperforms the individual models and the BERT model, achieving a test accuracy of 93.3% on the native Spanish dataset, higher than the accuracy obtained from the translated version. These findings underscore the challenges of emotion classification in Spanish and highlight the advantages of combining vectorization techniques like TF-IDF with BERT for improved accuracy. Our results provide valuable insights for businesses seeking to leverage emotion classification to enhance customer feedback analysis and service improvements.
Article 1200
Title@2025-05-26 (1): Bi-Level Unsupervised Feature Selection
Title: Bi-Level Unsupervised Feature Selection | Bi-Level-Unüberwachte Feature-Auswahl | 双级不受监督的地物选择 2505.20563v1 |
Authors: Jingjing Liu, Xiansen Ju, Xianchao Xiu, Wanquan Liu
Unsupervised feature selection (UFS) is an important task in data engineering. However, most UFS methods construct models from a single perspective and often fail to simultaneously evaluate feature importance and preserve their inherent data structure, thus limiting their performance. To address this challenge, we propose a novel bi-level unsupervised feature selection (BLUFS) method, including a clustering level and a feature level. Specifically, at the clustering level, spectral clustering is used to generate pseudo-labels for representing the data structure, while a continuous linear regression model is developed to learn the projection matrix. At the feature level, the $\ell_{2,0}$-norm constraint is imposed on the projection matrix for more effectively selecting features. To the best of our knowledge, this is the first work to combine a bi-level framework with the $\ell_{2,0}$-norm. To solve the proposed bi-level model, we design an efficient proximal alternating minimization (PAM) algorithm, whose subproblems either have explicit solutions or can be computed by fast solvers. Furthermore, we establish the convergence result and computational complexity. Finally, extensive experiments on two synthetic datasets and eight real datasets demonstrate the superiority of BLUFS in clustering and classification tasks.
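The $\ell_{2,0}$-norm counts the nonzero rows of a matrix, so constraining it on the projection matrix selects at most $k$ features. A common building block for such constraints is the hard-thresholding projection sketched below, which keeps the $k$ rows with the largest $\ell_2$ norm. This is a generic illustration with assumed names; it is not the PAM algorithm proposed in the paper.

```python
import numpy as np

def project_l20(W, k):
    """Project W onto {matrices with at most k nonzero rows} (requires k >= 1).

    Rows correspond to features; the surviving row indices are the
    selected features.
    """
    row_norms = np.linalg.norm(W, axis=1)
    keep = np.argsort(row_norms)[-k:]   # indices of the k largest row norms
    out = np.zeros_like(W)
    out[keep] = W[keep]
    return out, np.sort(keep)

# Hypothetical projection matrix: 4 features, 2 projection directions.
W = np.array([[3.0, 4.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [6.0, 8.0]])
W_sparse, selected = project_l20(W, k=2)
```

Unlike an $\ell_{2,1}$ penalty, which only encourages row sparsity, this projection fixes the number of selected features exactly.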
Article 1201
Title@2025-05-26 (1): Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning
Title: Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning | Jenseits von Markovian: Reflektierende Exploration über Bayes-Adaptive RL für LLM-Reasoning | 马尔科维安之后:通过Bayes-Adapative RL进行反射勘探,用于LLM 理由分析 2505.20561v1 |
Authors: Shenao Zhang, Yaqing Wang, Yinxiao Liu, Tianqi Liu, Peter Grabowski, Eugene Ie, Zhaoran Wang, Yunxuan Li
Large Language Models (LLMs) trained via Reinforcement Learning (RL) have exhibited strong reasoning capabilities and emergent reflective behaviors, such as backtracking and error correction. However, conventional Markovian RL confines exploration to the training phase to learn an optimal deterministic policy and depends on the history context only through the current state. Therefore, it remains unclear whether reflective reasoning will emerge during Markovian RL training, or why it is beneficial at test time. To remedy this, we recast reflective exploration within the Bayes-Adaptive RL framework, which explicitly optimizes the expected return under a posterior distribution over Markov decision processes. This Bayesian formulation inherently incentivizes both reward-maximizing exploitation and information-gathering exploration via belief updates. Our resulting algorithm, BARL, instructs the LLM to stitch and switch strategies based on the observed outcomes, offering principled guidance on when and how the model should reflectively explore. Empirical results on both synthetic and mathematical reasoning tasks demonstrate that BARL outperforms standard Markovian RL approaches at test time, achieving superior token efficiency with improved exploration effectiveness. Our code is available at https://github.com/shenao-zhang/BARL.
Article 1202
Title@2025-05-26 (1): Advancing Molecular Machine Learning Representations with Stereoelectronics-Infused Molecular Graphs
Title: Advancing Molecular Machine Learning Representations with Stereoelectronics-Infused Molecular Graphs | Advancing Molecular Machine Learning Representations mit stereoelectronics-infused Molecular Graphs | 具有立体电子成份式分子图的分子机学习演示 2408.04520v2 |
Authors: Daniil A. Boiko, Thiago Reschützegger, Benjamin Sanchez-Lengeling, Samuel M. Blau, Gabe Gomes
Molecular representation is a critical element in our understanding of the physical world and the foundation for modern molecular machine learning. Previous molecular machine learning models have employed strings, fingerprints, global features, and simple molecular graphs that are inherently information-sparse representations. However, as the complexity of prediction tasks increases, the molecular representation needs to encode higher fidelity information. This work introduces a novel approach to infusing quantum-chemical-rich information into molecular graphs via stereoelectronic effects, enhancing expressivity and interpretability. Learning to predict the stereoelectronics-infused representation with a tailored double graph neural network workflow enables its application to any downstream molecular machine learning task without expensive quantum chemical calculations. We show that the explicit addition of stereoelectronic information significantly improves the performance of message-passing 2D machine learning models for molecular property prediction. We show that the learned representations trained on small molecules can accurately extrapolate to much larger molecular structures, yielding chemical insight into orbital interactions for previously intractable systems, such as entire proteins, opening new avenues of molecular design. Finally, we have developed a web application (simg.cheme.cmu.edu) where users can rapidly explore stereoelectronic information for their own molecular systems.
Article 1203
Title@2025-05-26 (1): Causal Composition Diffusion Model for Closed-loop Traffic Generation
Title: Causal Composition Diffusion Model for Closed-loop Traffic Generation | Causal Composition Diffusion Modell für die Closed-Loop-Verkehrserzeugung | 闭闭环交通流量生成原因构成传播模式 2412.17920v3 |
Authors: Haohong Lin, Xin Huang, Tung Phan-Minh, David S. Hayden, Huan Zhang, Ding Zhao, Siddhartha Srinivasa, Eric M. Wolff, Hongge Chen
Simulation is critical for safety evaluation in autonomous driving, particularly in capturing complex interactive behaviors. However, generating realistic and controllable traffic scenarios in long-tail situations remains a significant challenge. Existing generative models suffer from the conflicting objective between user-defined controllability and realism constraints, which is amplified in safety-critical contexts. In this work, we introduce the Causal Compositional Diffusion Model (CCDiff), a structure-guided diffusion framework to address these challenges. We first formulate the learning of controllable and realistic closed-loop simulation as a constrained optimization problem. Then, CCDiff maximizes controllability while adhering to realism by automatically identifying and injecting causal structures directly into the diffusion process, providing structured guidance to enhance both realism and controllability. Through rigorous evaluations on benchmark datasets and in a closed-loop simulator, CCDiff demonstrates substantial gains over state-of-the-art approaches in generating realistic and user-preferred trajectories. Our results show CCDiff’s effectiveness in extracting and leveraging causal structures, showing improved closed-loop performance based on key metrics such as collision rate, off-road rate, FDE, and comfort.
Article 1204
Title@2025-05-26 (1): Task-Informed Anti-Curriculum by Masking Improves Downstream Performance on Text
Title: Task-Informed Anti-Curriculum by Masking Improves Downstream Performance on Text | Task-informierte Anti-Kurriculum durch Masken verbessert Downstream-Performance auf Text | 通过遮罩改进文字下流业绩,以任务化的反文体 2502.12953v2 |
Authors: Andrei Jarca, Florinel Alin Croitoru, Radu Tudor Ionescu
Masked language modeling has become a widely adopted unsupervised technique to pre-train large language models (LLMs). However, the process of selecting tokens for masking is random, and the percentage of masked tokens is typically fixed for the entire training process. In this paper, we propose to adjust the masking ratio and to decide which tokens to mask based on a novel task-informed anti-curriculum learning scheme. First, we harness task-specific knowledge about useful and harmful tokens in order to determine which tokens to mask. Second, we propose a cyclic decaying masking ratio, which corresponds to an anti-curriculum schedule (from hard to easy). We exemplify our novel task-informed anti-curriculum by masking (TIACBM) approach across three diverse downstream tasks: sentiment analysis, text classification by topic, and authorship attribution. Our findings suggest that TIACBM enhances the ability of the model to focus on key task-relevant features, contributing to statistically significant performance gains across tasks. We release our code at https://github.com/JarcaAndrei/TIACBM.
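A cyclic decaying masking ratio can be pictured as a schedule that starts each cycle with heavy masking (hard) and relaxes toward light masking (easy) before resetting. The linear decay, the cycle length, and the ratio bounds below are all assumptions made for illustration; the abstract does not specify the paper's actual schedule.

```python
def cyclic_decaying_ratio(step, cycle_len=1000, r_max=0.30, r_min=0.05):
    """Masking ratio that decays linearly within each cycle, then resets.

    step:      current training step (0-based)
    cycle_len: number of steps per cycle (hypothetical value)
    r_max:     ratio at the start of a cycle (hardest)
    r_min:     ratio approached at the end of a cycle (easiest)
    """
    position = (step % cycle_len) / cycle_len   # progress within the cycle, in [0, 1)
    return r_max - (r_max - r_min) * position

# Ratios at the start, middle, and end of the first cycle, and after the reset.
schedule = [cyclic_decaying_ratio(s) for s in (0, 500, 999, 1000)]
```

The task-informed part of the method, choosing which tokens to mask from task-specific knowledge, would then sample tokens at the ratio this schedule returns; that step is not shown here.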
Article 1205
Title@2025-05-26 (1): Learning a Pessimistic Reward Model in RLHF
Title: Learning a Pessimistic Reward Model in RLHF | Ein pessimistisches Belohnungsmodell in RLHF lernen | 在RLHF学习悲观奖励模式 2505.20556v1 |
Authors: Yinglun Xu, Hangoo Kang, Tarun Suresh, Yuxuan Wan, Gagandeep Singh
This work proposes PET, a novel pessimistic reward fine-tuning method, to learn a pessimistic reward model robust against reward hacking in offline reinforcement learning from human feedback (RLHF). Traditional reward modeling techniques in RLHF train an imperfect reward model, on which a KL regularization plays a pivotal role in mitigating reward hacking when optimizing a policy. Such an intuition-based method still suffers from reward hacking, and the policies with large KL divergence from the dataset distribution are excluded during learning. In contrast, we show that when optimizing a policy on a pessimistic reward model fine-tuned through PET, reward hacking can be prevented without relying on any regularization. We test our methods on the standard TL;DR summarization dataset. We find that one can learn a high-quality policy on our pessimistic reward without using any regularization. Such a policy has a high KL divergence from the dataset distribution while having high performance in practice. In summary, our work shows the feasibility of learning a pessimistic reward model against reward hacking. The agent can greedily search for the policy with a high pessimistic reward without suffering from reward hacking.
Article 1206
Title@2025-05-26 (1): A ZeNN architecture to avoid the Gaussian trap
Title: A ZeNN architecture to avoid the Gaussian trap | Eine ZeNN-Architektur, um die Gaussische Falle zu vermeiden | 避免高斯陷阱的 ZeNN 建筑 2505.20553v1 |
Authors: Luís Carvalho, João L. Costa, José Mourão, Gonçalo Oliveira
We propose a new simple architecture, Zeta Neural Networks (ZeNNs), in order to overcome several shortcomings of standard multi-layer perceptrons (MLPs). Namely, in the large width limit, MLPs are non-parametric, they do not have a well-defined pointwise limit, they lose non-Gaussian attributes and become unable to perform feature learning; moreover, finite width MLPs perform poorly in learning high frequencies. The new ZeNN architecture is inspired by three simple principles from harmonic analysis: i) Enumerate the perceptrons and introduce a non-learnable weight to enforce convergence; ii) Introduce a scaling (or frequency) factor; iii) Choose activation functions that lead to near orthogonal systems. We will show that these ideas allow us to fix the aforementioned shortcomings of MLPs. In fact, in the infinite width limit, ZeNNs converge pointwise, they exhibit a rich asymptotic structure beyond Gaussianity, and perform feature learning. Moreover, when appropriate activation functions are chosen, (finite width) ZeNNs excel at learning high-frequency features of functions with low dimensional domains.
Article 1207
Title@2025-05-26 (1): Estimating Motor Symptom Presence and Severity in Parkinson’s Disease from Wrist Accelerometer Time Series using ROCKET and InceptionTime
Title: Estimating Motor Symptom Presence and Severity in Parkinson’s Disease from Wrist Accelerometer Time Series using ROCKET and InceptionTime | Abschätzung von Motorsymptome und Schweregrad bei Parkinson-Krankheit aus der Wrist Accelerometer Time Serie mit ROCKET und InceptionTime | 利用 ROCKET 和 受孕时间从风速计时间序列中估计帕金森氏病的机动症状存在和严重性 2304.11265v3 |
Authors: Cedric Donié, Neha Das, Satoshi Endo, Sandra Hirche
Parkinson’s disease (PD) is a neurodegenerative condition characterized by frequently changing motor symptoms, necessitating continuous symptom monitoring for more targeted treatment. Classical time series classification and deep learning techniques have demonstrated limited efficacy in monitoring PD symptoms using wearable accelerometer data due to complex PD movement patterns and the small size of available datasets. We investigate InceptionTime and RandOm Convolutional KErnel Transform (ROCKET) as they are promising for PD symptom monitoring. InceptionTime’s high learning capacity is well-suited to modeling complex movement patterns, while ROCKET is suited to small datasets. With random search methodology, we identify the highest-scoring InceptionTime architecture and compare its performance to ROCKET with a ridge classifier and a multi-layer perceptron (MLP) on wrist motion data from PD patients. Our findings indicate that all approaches can learn to estimate tremor severity and bradykinesia presence with moderate performance but encounter challenges in detecting dyskinesia. Among the presented approaches, ROCKET demonstrates higher scores in identifying dyskinesia, whereas InceptionTime exhibits slightly better performance in tremor and bradykinesia estimation. Notably, both methods outperform the multi-layer perceptron. In conclusion, InceptionTime can classify complex wrist motion time series and holds potential for continuous symptom monitoring in PD with further development.
Article 1208
Title@2025-05-26 (1): TAPIP3D: Tracking Any Point in Persistent 3D Geometry
Title: TAPIP3D: Tracking Any Point in Persistent 3D Geometry | TAPIP3D: Verfolgung eines beliebigen Punktes in persistenter 3D-Geometrie | TAPIP3D:跟踪持久性三维几何中的任何点 2504.14717v2 |
Authors: Bowei Zhang, Lei Ke, Adam W. Harley, Katerina Fragkiadaki
We introduce TAPIP3D, a novel approach for long-term 3D point tracking in monocular RGB and RGB-D videos. TAPIP3D represents videos as camera-stabilized spatio-temporal feature clouds, leveraging depth and camera motion information to lift 2D video features into a 3D world space where camera movement is effectively canceled out. Within this stabilized 3D representation, TAPIP3D iteratively refines multi-frame motion estimates, enabling robust point tracking over long time horizons. To handle the irregular structure of 3D point distributions, we propose a 3D Neighborhood-to-Neighborhood (N2N) attention mechanism - a 3D-aware contextualization strategy that builds informative, spatially coherent feature neighborhoods to support precise trajectory estimation. Our 3D-centric formulation significantly improves performance over existing 3D point tracking methods and even surpasses state-of-the-art 2D pixel trackers in accuracy when reliable depth is available. The model supports inference in both camera-centric (unstabilized) and world-centric (stabilized) coordinates, with experiments showing that compensating for camera motion leads to substantial gains in tracking robustness. By replacing the conventional 2D square correlation windows used in prior 2D and 3D trackers with a spatially grounded 3D attention mechanism, TAPIP3D achieves strong and consistent results across multiple 3D point tracking benchmarks. Project Page: https://tapip3d.github.io
Article 1209
Title@2025-05-26 (1): Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling
Title: Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling | Erreichen von Anpassungsfähigkeit und Optimalität für mehrarmige Banditen mit Exponential-Kullback Leibler Maillard Sampling | 利用Exponential-Kullback Leibler Maillard抽样,实现多武装强盗的适应性和最佳性 2502.14379v2 |
Authors: Hao Qin, Kwang-Sung Jun, Chicheng Zhang
We study the problem of $K$-armed bandits with reward distributions belonging to a one-parameter exponential distribution family. In the literature, several criteria have been proposed to evaluate the performance of algorithms for this problem, including Asymptotic Optimality, Minimax Optimality, Sub-UCB, and a variance-adaptive worst-case regret bound. Thompson Sampling-based and Upper Confidence Bound-based algorithms have been employed to achieve some of these criteria. However, none of these algorithms simultaneously satisfies all the aforementioned criteria. In this paper, we design an algorithm, Exponential Kullback-Leibler Maillard Sampling (abbrev. Exp-KL-MS), that can achieve multiple optimality criteria simultaneously, including Asymptotic Optimality, Minimax Optimality with a $\sqrt{\ln (K)}$ factor, Sub-UCB, and a variance-adaptive worst-case regret bound.
Article 1210
Title@2025-05-26 (1): Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes
Title: Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes | Quantum Speedups bei der Bedauernsanalyse von Unendlichen Horizon durchschnittlichen Markov-Entscheidungsprozessen | 对无限地平地平平线平均回报Markov决定程序进行遗憾分析时的量量加速 2310.11684v4 |
Authors: Bhargav Ganguly, Yang Xu, Vaneet Aggarwal
This paper investigates the potential of quantum acceleration in addressing infinite horizon Markov Decision Processes (MDPs) to enhance average reward outcomes. We introduce an innovative quantum framework for the agent’s engagement with an unknown MDP, extending the conventional interaction paradigm. Our approach involves the design of an optimism-driven tabular Reinforcement Learning algorithm that harnesses quantum signals acquired by the agent through efficient quantum mean estimation techniques. Through thorough theoretical analysis, we demonstrate that the quantum advantage in mean estimation leads to exponential advancements in regret guarantees for infinite horizon Reinforcement Learning. Specifically, the proposed Quantum algorithm achieves a regret bound of $\tilde{\mathcal{O}}(1)$, a significant improvement over the $\tilde{\mathcal{O}}(\sqrt{T})$ bound exhibited by classical counterparts.
Article 1211
Title@2025-05-26 (1): RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
Title: RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs | RL nur im Namen? Analyse der strukturellen Annahmen im RL-Post-Training für LLMs | 仅限名称的RL?分析在RL为LLMs提供的培训后培训中的结构假设 2505.13697v2 |
Authors: Soumya Rani Samineni, Durgesh Kalwar, Karthik Valmeekam, Kaya Stechly, Subbarao Kambhampati
Reinforcement learning-based post-training of large language models (LLMs) has recently gained attention, particularly following the release of DeepSeek R1, which applied GRPO for fine-tuning. Amid the growing hype around improved reasoning abilities attributed to RL post-training, we critically examine the formulation and assumptions underlying these methods. We start by highlighting the popular structural assumptions made in modeling LLM training as a Markov Decision Process (MDP), and show how they lead to a degenerate MDP that doesn’t quite need the RL/GRPO apparatus. The two critical structural assumptions include (1) making the MDP states just a concatenation of the actions, with states becoming the context window and the actions becoming the tokens in LLMs, and (2) splitting the reward of a state-action trajectory uniformly across the trajectory. Through a comprehensive analysis, we demonstrate that these simplifying assumptions make the approach effectively equivalent to outcome-driven supervised learning. Our experiments on benchmarks including GSM8K and Countdown using Qwen-2.5 base models show that iterative supervised fine-tuning, incorporating both positive and negative samples, achieves performance comparable to GRPO-based training. We also argue that the structural assumptions indirectly incentivize the RL to generate longer sequences of intermediate tokens, which in turn feeds into the narrative of “RL generating longer thinking traces.” While RL may well be a very useful technique for improving the reasoning abilities of LLMs, our analysis shows that the simplistic structural assumptions made in modeling the underlying MDP render the popular LLM RL frameworks and their interpretations questionable.
Article 1212
Title@2025-05-26 (1): Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models
Title: Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models | Kovariate-adjusted Deep Causal Learning für heterogene Panel-Datenmodelle | 异质小组数据模型的共变调整深因学习 2505.20536v1 |
Authors: Guanhao Zhou, Yuefeng Han, Xiufan Yu
This paper studies the task of estimating heterogeneous treatment effects in causal panel data models in the presence of covariate effects. We propose a novel Covariate-Adjusted Deep Causal Learning (CoDEAL) method for panel data models that employs flexible model structures and powerful neural network architectures to cohesively deal with the underlying heterogeneity and nonlinearity of both panel units and covariate effects. The proposed CoDEAL integrates nonlinear covariate effect components (parameterized by a feed-forward neural network) with nonlinear factor structures (modeled by a multi-output autoencoder) to form a heterogeneous causal panel model. The nonlinear covariate component offers a flexible framework for capturing the complex influences of covariates on outcomes. The nonlinear factor analysis enables CoDEAL to effectively capture both cross-sectional and temporal dependencies inherent in the data panel. This latent structural information is subsequently integrated into a customized matrix completion algorithm, thereby facilitating more accurate imputation of missing counterfactual outcomes. Moreover, the use of a multi-output autoencoder explicitly accounts for heterogeneity across units and enhances the interpretability of the latent factors. We establish theoretical guarantees on the convergence of the estimated counterfactuals, and demonstrate the compelling performance of the proposed method using extensive simulation studies and a real data application.
Article 1213
Title@2025-05-26 (1): Rotary Masked Autoencoders are Versatile Learners
Title: Rotary Masked Autoencoders are Versatile Learners | Rotary Masked Autoencoder sind vielseitige Lerner | 扶轮式遮罩自动算术员是多功能学习者 2505.20535v1 |
Authors: Uros Zivanovic, Serafina Di Gioia, Andre Scaffidi, Martín de los Rios, Gabriella Contardo, Roberto Trotta
Applying Transformers to irregular time-series typically requires specializations to their baseline architecture, which can result in additional computational overhead and increased method complexity. We present the Rotary Masked Autoencoder (RoMAE), which utilizes the popular Rotary Positional Embedding (RoPE) method for continuous positions. RoMAE is an extension to the Masked Autoencoder (MAE) that enables representation learning with multidimensional continuous positional information while avoiding any time-series-specific architectural specializations. We showcase RoMAE’s performance on a variety of modalities including irregular and multivariate time-series, images, and audio, demonstrating that RoMAE surpasses specialized time-series architectures on difficult datasets such as the DESC ELAsTiCC Challenge while maintaining MAE’s usual performance across other modalities. In addition, we investigate RoMAE’s ability to reconstruct the embedded continuous positions, demonstrating that including learned embeddings in the input sequence breaks RoPE’s relative position property.
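The relative position property mentioned in this abstract can be illustrated with a minimal sketch of rotary embeddings evaluated at real-valued positions. This is not the RoMAE code: the function names, the frequency base of 10000, and the toy vectors are assumptions chosen for illustration. It only shows that the rotated dot product depends on the difference of the two positions, even when the positions are not integers.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive (even, odd) pairs of `vec` by position * frequency.

    `position` may be any real number (e.g. an irregular timestamp).
    The frequency schedule base ** (-i / d) is the usual RoPE choice and is
    an assumption here, not something stated in the abstract.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        freq = base ** (-i / d)
        angle = position * freq
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def attention_score(q, k, pos_q, pos_k):
    """Dot product between a rotated query and a rotated key."""
    rq = rope_rotate(q, pos_q)
    rk = rope_rotate(k, pos_k)
    return sum(a * b for a, b in zip(rq, rk))

# Hypothetical query/key vectors and continuous positions.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same gap (1.68) at two different absolute offsets.
s1 = attention_score(q, k, 0.37, 2.05)
s2 = attention_score(q, k, 100.37, 102.05)
```

Both scores agree up to floating-point error because each 2D rotation composes to a rotation by the position difference; the abstract's observation is that adding learned embeddings to the input sequence breaks this invariance.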
Article 1214
Title@2025-05-26 (1): HiPoNet: A Multi-View Simplicial Complex Network for High Dimensional Point-Cloud and Single-Cell Data
Title: HiPoNet: A Multi-View Simplicial Complex Network for High Dimensional Point-Cloud and Single-Cell Data | HiPoNet: Ein Multi-View-Komplexnetzwerk für hochdimensionale Point-Cloud- und Single-Cell-Daten | HipoNet:高多面点和单细胞数据多视图简易复杂的网络 2502.07746v2 |
Authors: Siddharth Viswanath, Hiren Madhu, Dhananjay Bhaskar, Jake Kovalic, David R Johnson, Christopher Tape, Ian Adelstein, Rex Ying, Michael Perlmutter, Smita Krishnaswamy
In this paper, we propose HiPoNet, an end-to-end differentiable neural network for regression, classification, and representation learning on high-dimensional point clouds. Our work is motivated by single-cell data, which can have very high dimensionality, exceeding the capabilities of existing methods for point clouds, which are mostly tailored to 3D data. Moreover, modern single-cell and spatial experiments now yield entire cohorts of datasets (i.e., one data set for every patient), necessitating models that can process large, high-dimensional point-clouds at scale. Most current approaches build a single nearest-neighbor graph, discarding important geometric and topological information. In contrast, HiPoNet models the point-cloud as a set of higher-order simplicial complexes, with each particular complex being created using a reweighting of features. This method thus generates multiple constructs corresponding to different views of high-dimensional data, which in biology offers the possibility of disentangling distinct cellular processes. It then employs simplicial wavelet transforms to extract multiscale features, capturing both local and global topology from each view. We show that geometric and topological information is preserved in this framework both theoretically and empirically. We showcase the utility of HiPoNet on point-cloud level tasks, involving classification and regression of entire point-clouds in data cohorts. Experimentally, we find that HiPoNet outperforms other point-cloud and graph-based models on single-cell data. We also apply HiPoNet to spatial transcriptomics datasets using spatial coordinates as one of the views. Overall, HiPoNet offers a robust and scalable solution for high-dimensional data analysis.
Article 1215
Title@2025-05-26 (1): One-shot Robust Federated Learning of Independent Component Analysis
Title: One-shot Robust Federated Learning of Independent Component Analysis | One-shot Robust Federated Learning of Independent Component Analysis | 强力学习独立构成部分分析 2505.20532v1 |
Authors: Dian Jin, Xin Bing, Yuqian Zhang
This paper investigates a general robust one-shot aggregation framework for the distributed and federated Independent Component Analysis (ICA) problem. We propose a geometric median-based aggregation algorithm that leverages $k$-means clustering to resolve the permutation ambiguity in local client estimations. Our method first performs $k$-means to partition client-provided estimators into clusters and then aggregates estimators within each cluster using the geometric median. This approach provably remains effective even in highly heterogeneous scenarios where at most half of the clients can observe only a minimal number of samples. The key theoretical contribution lies in the combined analysis of the geometric median’s error bound, aided by sample quantiles, and the maximum misclustering rates of the aforementioned $k$-means solution. The effectiveness of the proposed approach is further supported by simulation studies conducted under various heterogeneous settings.
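The two-stage procedure described in this abstract (cluster the pooled client estimates, then take a geometric median per cluster) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the helper names, the Weiszfeld iteration used for the geometric median, the initialization of $k$-means from the first client's estimates, and the synthetic data are all hypothetical, and the sign ambiguity of ICA is ignored.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def geometric_median(points, iters=200, eps=1e-9):
    """Weiszfeld iteration: repeatedly re-weight points by inverse distance."""
    y = mean(points)
    for _ in range(iters):
        weights = [1.0 / max(dist(p, y), eps) for p in points]
        total = sum(weights)
        new_y = [sum(w * p[i] for w, p in zip(weights, points)) / total
                 for i in range(len(y))]
        if dist(new_y, y) < eps:
            return new_y
        y = new_y
    return y

def kmeans(points, centers, iters=50):
    """Plain Lloyd iterations from the given initial centers."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            clusters[j].append(p)
        new_centers = [mean(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters

# Each client reports two component estimates in arbitrary order;
# the last client is corrupted (hypothetical data).
client_estimates = [
    [(1.02, 0.01), (0.00, 0.98)],
    [(-0.01, 1.03), (0.97, 0.02)],
    [(0.99, -0.02), (0.02, 1.00)],
    [(1.01, 0.00), (0.01, 0.99)],
    [(5.00, 5.00), (-4.00, 6.00)],
]

pooled = [v for client in client_estimates for v in client]
initial_centers = [list(v) for v in client_estimates[0]]
clusters = kmeans(pooled, initial_centers)

medians = [geometric_median(c) for c in clusters]
naive_means = [mean(c) for c in clusters]
```

In this toy run the geometric medians stay close to the true components (1, 0) and (0, 1), while the per-cluster means are pulled far away by the corrupted client, which is the robustness the aggregation step is meant to provide.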
Article 1216
Title@2025-05-26 (1): Prediction-Enhanced Monte Carlo: A Machine Learning View on Control Variate
Title: Prediction-Enhanced Monte Carlo: A Machine Learning View on Control Variate | Vorhersage-erweitert Monte Carlo: Eine Machine-Learning-Ansicht auf Steuerungsvariate | 预测增强的蒙特卡洛:关于控制Variatte的机械学习观点 2412.11257v2 |
Authors: Fengpei Li, Haoxian Chen, Jiahe Lin, Arkin Gupta, Xiaowei Tan, Honglei Zhao, Gang Xu, Yuriy Nevmyvaka, Agostino Capponi, Henry Lam
For many complex simulation tasks spanning areas such as healthcare, engineering, and finance, Monte Carlo (MC) methods are invaluable due to their unbiased estimates and precise error quantification. Nevertheless, Monte Carlo simulations often become computationally prohibitive, especially for nested, multi-level, or path-dependent evaluations lacking effective variance reduction techniques. While machine learning (ML) surrogates appear as natural alternatives, naive replacements typically introduce unquantifiable biases. We address this challenge by introducing Prediction-Enhanced Monte Carlo (PEMC), a framework that leverages modern ML models as learned predictors, using cheap and parallelizable simulation as features, to output unbiased evaluation with reduced variance and runtime. PEMC can also be viewed as a “modernized” view of control variates, where we consider the overall computation-cost-aware variance reduction instead of per-replication reduction, while bypassing the closed-form mean function requirement and maintaining the advantageous unbiasedness and uncertainty quantifiability of Monte Carlo. We illustrate PEMC’s broader efficacy and versatility through three examples: first, equity derivatives such as variance swaps under stochastic local volatility models; second, interest rate derivatives such as swaption pricing under the Heath-Jarrow-Morton (HJM) interest-rate model. Finally, we showcase PEMC in a socially significant context - ambulance dispatch and hospital load balancing - where accurate mortality rate estimates are key for ethically sensitive decision-making. Across these diverse scenarios, PEMC consistently reduces variance while preserving unbiasedness, highlighting its potential as a powerful enhancement to standard Monte Carlo baselines.
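One way to read the control-variate idea in this abstract is sketched below: subtract a learned predictor from the expensive output on a small paired sample, then add back the predictor's mean estimated from a large cheap sample instead of a closed-form value. The estimator form, the function names, and the toy problem (an exponential payoff with a quadratic stand-in for the ML predictor) are assumptions for illustration and are not taken from the paper.

```python
import math
import random

def expensive_payoff(z):
    """Stand-in for the output of a costly simulation."""
    return math.exp(z)

def cheap_predictor(z):
    """Stand-in for a trained ML predictor evaluated on cheap features."""
    return 1.0 + z + 0.5 * z * z

def prediction_enhanced_estimate(n_paired, n_cheap, rng):
    """Mean of (payoff - predictor) on paired draws plus the predictor mean
    on independent cheap draws. Unbiased for E[payoff] as long as both sets
    of draws come from the same distribution."""
    residual_sum = 0.0
    for _ in range(n_paired):
        z = rng.gauss(0.0, 1.0)
        residual_sum += expensive_payoff(z) - cheap_predictor(z)
    predictor_sum = 0.0
    for _ in range(n_cheap):
        predictor_sum += cheap_predictor(rng.gauss(0.0, 1.0))
    return residual_sum / n_paired + predictor_sum / n_cheap

rng = random.Random(0)
estimate = prediction_enhanced_estimate(2000, 200000, rng)
true_value = math.exp(0.5)  # E[exp(Z)] for Z ~ N(0, 1)
```

The predictor does not need to be accurate for the estimate to stay unbiased; a better predictor only shrinks the variance of the residual term, which is where the expensive samples are spent.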
Article 1217
Title@2025-05-26 (1): Fast Calculation of Feature Contributions in Boosting Trees
Title: Fast Calculation of Feature Contributions in Boosting Trees | Schnelle Berechnung von Feature-Beiträgen bei der Förderung von Bäumen | 快速计算推动树的特性贡献 2407.03515v2 |
Authors: Zhongli Jiang, Min Zhang, Dabao Zhang
Recently, several fast algorithms have been proposed to decompose predicted value into Shapley values, enabling individualized feature contribution analysis in tree models. While such local decomposition offers valuable insights, it underscores the need for a global evaluation of feature contributions. Although coefficients of determination ($R^2$) allow for comparative assessment of individual features, individualizing $R^2$ is challenged by the underlying quadratic losses. To address this, we propose Q-SHAP, an efficient algorithm that reduces the computational complexity of calculating Shapley values for quadratic losses to polynomial time. Our simulations show that Q-SHAP not only improves computational efficiency but also enhances the accuracy of feature-specific $R^2$ estimates.
Article 1218
Title@2025-05-26 (1): Training Articulatory Inversion Models for Inter-Speaker Consistency
Title: Training Articulatory Inversion Models for Inter-Speaker Consistency | Training Artikulatorische Inversionsmodelle für die Konsistenz zwischen den Lautsprechern | 供发言者间和谐使用的培训用人工转换模型 2505.20529v1 |
Authors: Charles McGhee, Mark J. F. Gales, Kate M. Knill
Acoustic-to-Articulatory Inversion (AAI) attempts to model the inverse mapping from speech to articulation. Exact articulatory prediction from speech alone may be impossible, as speakers can choose different forms of articulation seemingly without reference to their vocal tract structure. However, once a speaker has selected an articulatory form, their productions vary minimally. Recent works in AAI have proposed adapting Self-Supervised Learning (SSL) models to single-speaker datasets, claiming that these single-speaker models provide a universal articulatory template. In this paper, we investigate whether SSL-adapted models trained on single and multi-speaker data produce articulatory targets which are consistent across speaker identities for English and Russian. We do this through the use of a novel evaluation method which extracts articulatory targets using minimal pair sets. We also present a training method which can improve inter-speaker consistency using only speech data.
Article 1219
Title@2025-05-26 (1): DYMAG: Rethinking Message Passing Using Dynamical-systems-based Waveforms
Title: DYMAG: Rethinking Message Passing Using Dynamical-systems-based Waveforms | DYMAG: Nachricht neu denken Passieren mit Dynamisch-Systeme-basierten Wellenformen | DYMAG: 利用动态系统波形重新思考信息传递方式 2309.09924v5 |
Authors: Dhananjay Bhaskar, Xingzhi Sun, Yanlei Zhang, Charles Xu, Arman Afrasiyabi, Siddharth Viswanath, Oluwadamilola Fasina, Maximilian Nickel, Guy Wolf, Michael Perlmutter, Smita Krishnaswamy
We present DYMAG, a graph neural network based on a novel form of message aggregation. Standard message-passing neural networks, which often aggregate local neighbors via mean-aggregation, can be regarded as convolving with a simple rectangular waveform which is non-zero only on 1-hop neighbors of every vertex. Here, we go beyond such local averaging. We will convolve the node features with more sophisticated waveforms generated using dynamics such as the heat equation, wave equation, and the Sprott model (an example of chaotic dynamics). Furthermore, we use snapshots of these dynamics at different time points to create waveforms at many effective scales. Theoretically, we show that these dynamic waveforms can capture salient information about the graph including connected components, connectivity, and cycle structures even with no features. Empirically, we test DYMAG on both real and synthetic benchmarks to establish that DYMAG outperforms baseline models on recovery of graph persistence, generating parameters of random graphs, as well as property prediction for proteins, molecules and materials. Our code is available at https://github.com/KrishnaswamyLab/DYMAG.
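As a rough illustration of the heat-equation waveforms mentioned in this abstract, the sketch below diffuses a unit impulse over a small graph and records snapshots at several times. It is not the DYMAG implementation: the explicit Euler integrator, the step size, the function name, and the toy graph are assumptions. The graph has two connected components, so heat placed in one component never reaches the other, which hints at how such dynamics can expose connectivity without node features.

```python
def heat_snapshots(edges, n_nodes, x0, times, dt=0.01):
    """Integrate dx/dt = -L x with explicit Euler, where L is the
    combinatorial graph Laplacian, and return the state at each time."""
    neighbors = [[] for _ in range(n_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    x = list(x0)
    t = 0.0
    snapshots = {}
    for target in sorted(times):
        while t < target - 1e-12:
            laplacian_x = [
                len(neighbors[i]) * x[i] - sum(x[j] for j in neighbors[i])
                for i in range(n_nodes)
            ]
            x = [xi - dt * li for xi, li in zip(x, laplacian_x)]
            t += dt
        snapshots[target] = list(x)
    return snapshots

# Path 0-1-2 and a separate edge 3-4 (hypothetical toy graph).
edges = [(0, 1), (1, 2), (3, 4)]
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
snaps = heat_snapshots(edges, 5, impulse, times=[0.5, 5.0, 20.0])
```

Early snapshots are still concentrated near node 0, while late snapshots approach the uniform value 1/3 on the component {0, 1, 2}; reading the signal at several times is what gives features at several effective scales.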
Article 1220
Title@2025-05-26 (1): Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks
Title: Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks | Lernpolitische Ausschüsse für effektive Personalisierung in MDPs mit unterschiedlichen Aufgaben | 在有不同任务的多边发展方案中促进有效个性化的学习政策委员会 2503.01885v2 |
Authors: Luise Ge, Michael Lanier, Anindya Sarkar, Bengisu Guresti, Chongjie Zhang, Yevgeniy Vorobeychik
Many dynamic decision problems, such as robotic control, involve a series of tasks, many of which are unknown at training time. Typical approaches for these problems, such as multi-task and meta reinforcement learning, do not generalize well when the tasks are diverse. On the other hand, approaches that aim to tackle task diversity, such as using task embedding as policy context and task clustering, typically lack performance guarantees and require a large number of training tasks. To address these challenges, we propose a novel approach for learning a policy committee that includes at least one near-optimal policy with high probability for tasks encountered during execution. While we show that this problem is in general inapproximable, we present two practical algorithmic solutions. The first yields provable approximation and task sample complexity guarantees when tasks are low-dimensional (the best we can do due to inapproximability), whereas the second is a general and practical gradient-based approach. In addition, we provide a provable sample complexity bound for few-shot learning. Our experiments on MuJoCo and Meta-World show that the proposed approach outperforms state-of-the-art multi-task, meta-, and task clustering baselines in training, generalization, and few-shot learning, often by a large margin. Our code is available at https://github.com/CERL-WUSTL/PACMAN.
Article 1221
Title@2025-05-26 (1): Towards Fully FP8 GEMM LLM Training at Scale
Title: Towards Fully FP8 GEMM LLM Training at Scale | Auf dem Weg zum vollständigen FP8 GEMM LLM Training auf Scale | GEMM GEMM LLM 大规模培训 2505.20524v1 |
Authors: Alejandro Hernández-Cano, Dhia Garbaya, Imanol Schlag, Martin Jaggi
Despite the significant potential of FP8 data formats for large language model (LLM) pre-training, their adoption has been limited due to challenges in maintaining stability at scale. Existing approaches often rely on suboptimal fine-grained FP8 kernels or fall back to higher-precision matrix multiplications (GEMMs) in sensitive components, such as attention projections, compromising potential throughput gains. We introduce a new class of LLM architectures that, for the first time, support FP8 computation for all GEMMs within transformer blocks during both forward and backward passes. This enables unprecedented throughput gains, particularly at scale, while matching the downstream performance of standard BF16 training. Our architecture design reduces large outlier activations, promoting stable long-term FP8 training. In addition, we identify key metrics to monitor low-precision training and predict potential future divergences.
Article 1222
Title@2025-05-26 (1): Scaling over Scaling: Exploring Test-Time Scaling Pareto in Large Reasoning Models
Title: Scaling over Scaling: Exploring Test-Time Scaling Pareto in Large Reasoning Models | Skalierung über Skalierung: Untersuchung von Test-Zeit-Skalierung Pareto in großen vernünftigen Modellen | 缩放过缩放: 探索大型理由模型中的测试时间缩放派 2505.20522v1 |
Authors: Jian Wang, Boyan Zhu, Chak Tou Leong, Yongqi Li, Wenjie Li
Large reasoning models (LRMs) have exhibited the capacity of enhancing reasoning performance via internal test-time scaling. Building upon this, a promising direction is to further scale test-time compute to unlock even greater reasoning capabilities. However, as we push these scaling boundaries, systematically understanding the practical limits and achieving optimal resource allocation becomes a critical challenge. In this paper, we investigate the scaling Pareto of test-time scaling and introduce the Test-Time Scaling Performance Model (TTSPM). We theoretically analyze two fundamental paradigms for such extended scaling, parallel scaling and sequential scaling, from a probabilistic modeling perspective. Our primary contribution is the derivation of the saturation point on the scaling budget for both strategies, identifying thresholds beyond which additional computation yields diminishing returns. Remarkably, despite their distinct mechanisms, both paradigms converge to a unified mathematical structure in their upper bounds. We empirically validate our theoretical findings on challenging reasoning benchmarks, including AIME, MATH-500, and GPQA, demonstrating the practical utility of these bounds for test-time resource allocation. We hope that this work provides insights into the cost-benefit trade-offs of test-time scaling, guiding the development of more resource-efficient inference strategies for large reasoning models.
Article 1223
Title@2025-05-26 (1): Semi-Explicit Neural DAEs: Learning Long-Horizon Dynamical Systems with Algebraic Constraints
Title: Semi-Explicit Neural DAEs: Learning Long-Horizon Dynamical Systems with Algebraic Constraints | Halbexplizite neurale DAEs: Lernen von langhorizontigen dynamischen Systemen mit algebraischen Einschränkungen | 半显性神经DAEs:学习具有代数限制的长毛利区动态系统 2505.20515v1 |
Authors: Avik Pal, Alan Edelman, Christopher Rackauckas
Despite the promise of scientific machine learning (SciML) in combining data-driven techniques with mechanistic modeling, existing approaches for incorporating hard constraints in neural differential equations (NDEs) face significant limitations. Scalability issues and poor numerical properties prevent these neural models from being used for modeling physical systems with complicated conservation laws. We propose Manifold-Projected Neural ODEs (PNODEs), a method that explicitly enforces algebraic constraints by projecting each ODE step onto the constraint manifold. This framework arises naturally from semi-explicit differential-algebraic equations (DAEs), and includes both a robust iterative variant and a fast approximation requiring a single Jacobian factorization. We further demonstrate that prior works on relaxation methods are special cases of our approach. PNODEs consistently outperform baselines across six benchmark problems achieving a mean constraint violation error below $10^{-10}$. Additionally, PNODEs consistently achieve lower runtime compared to other methods for a given level of error tolerance. These results show that constraint projection offers a simple strategy for learning physically consistent long-horizon dynamics.
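The idea of projecting each ODE step onto the constraint manifold can be sketched on a toy problem. The example below is an assumption-laden illustration, not the PNODE method itself: a known rotation field stands in for the learned neural vector field, explicit Euler stands in for the ODE solver, and the projection is a few Newton-type corrections along the constraint gradient for the single constraint $x^2 + y^2 = 1$.

```python
def vector_field(state):
    """Rotation field; its exact flow keeps x^2 + y^2 constant."""
    x, y = state
    return (-y, x)

def constraint(state):
    """g(state) = 0 defines the manifold (here the unit circle)."""
    x, y = state
    return x * x + y * y - 1.0

def constraint_grad(state):
    x, y = state
    return (2.0 * x, 2.0 * y)

def project(state, iters=5, tol=1e-14):
    """Newton-type correction along the constraint gradient:
    state <- state - grad * g / |grad|^2, repeated until |g| < tol."""
    for _ in range(iters):
        g = constraint(state)
        if abs(g) < tol:
            break
        grad = constraint_grad(state)
        scale = g / sum(c * c for c in grad)
        state = tuple(s - scale * c for s, c in zip(state, grad))
    return state

def integrate(state, h, steps, use_projection):
    """Explicit Euler, optionally followed by a projection after each step."""
    for _ in range(steps):
        dx = vector_field(state)
        state = tuple(s + h * d for s, d in zip(state, dx))
        if use_projection:
            state = project(state)
    return state

plain = integrate((1.0, 0.0), 0.01, 500, use_projection=False)
projected = integrate((1.0, 0.0), 0.01, 500, use_projection=True)
```

Without projection the Euler trajectory drifts off the circle by a few percent over 500 steps, whereas the projected trajectory keeps the constraint residual near machine precision.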
Article 1224
Title@2025-05-26 (1): On a Neural Implementation of Brenier’s Polar Factorization
Title: On a Neural Implementation of Brenier’s Polar Factorization | Über eine neurale Umsetzung von Breniers Polarfaktorisierung | 布赖尼尔极地化的神经实施 2403.03071v4 |
Authors: Nina Vesseron, Marco Cuturi
In 1991, Brenier proved a theorem that generalizes the polar decomposition for square matrices – factored as PSD $\times$ unitary – to any vector field $F:\mathbb{R}^d\rightarrow \mathbb{R}^d$. The theorem, known as the polar factorization theorem, states that any field $F$ can be recovered as the composition of the gradient of a convex function $u$ with a measure-preserving map $M$, namely $F=\nabla u \circ M$. We propose a practical implementation of this far-reaching theoretical result, and explore possible uses within machine learning. The theorem is closely related to optimal transport (OT) theory, and we borrow from recent advances in the field of neural optimal transport to parameterize the potential $u$ as an input convex neural network. The map $M$ can be either evaluated pointwise using $u^*$, the convex conjugate of $u$, through the identity $M=\nabla u^* \circ F$, or learned as an auxiliary network. Because $M$ is, in general, not injective, we consider the additional task of estimating the ill-posed inverse map that can approximate the pre-image measure $M^{-1}$ using a stochastic generator. We illustrate possible applications of Brenier’s polar factorization to non-convex optimization problems, as well as sampling of densities that are not log-concave.
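The link between the matrix case and the identity for $M$ can be written out in a short derivation. This is a sketch under the assumption that $u$ is differentiable and strictly convex, so that $\nabla u$ is invertible with inverse $\nabla u^*$, where $u^*$ denotes the convex conjugate of $u$; the symbols $A$, $P$, and $U$ are introduced here for illustration.

```latex
\begin{align*}
% Matrix case: a linear field F(x) = Ax with polar decomposition A = PU
A &= P\,U, \qquad P \succeq 0, \quad U^\top U = I, \\
u(x) &= \tfrac{1}{2}\, x^\top P x, \qquad M(x) = U x
  \;\Longrightarrow\; \nabla u\bigl(M(x)\bigr) = P\,U x = A x = F(x). \\[4pt]
% Recovering M from F through the convex conjugate
u^*(y) &= \sup_{x}\,\bigl\{ \langle x, y \rangle - u(x) \bigr\},
  \qquad \nabla u^* = (\nabla u)^{-1}, \\
\nabla u^* \circ F &= \nabla u^* \circ \nabla u \circ M = M .
\end{align*}
```

In the linear case $M$ is an orthogonal map, which preserves any rotation-invariant reference measure; the theorem replaces it with a general measure-preserving map.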
Article 1225
Title@2025-05-26 (1): A Novel Convolutional Neural Network-Based Framework for Complex Multiclass Brassica Seed Classification
Title: A Novel Convolutional Neural Network-Based Framework for Complex Multiclass Brassica Seed Classification | Ein neuartiges konvolutionäres neurales Netzwerk-basiertes Framework für die komplexe Klassifizierung von mehrstufigen Brassica-Samen | 复杂多级巴西种子种子分类新革命神经网络框架 2505.21558v1 |
Authors: Elhoucine Elfatimi, Recep Eryigit, Lahcen Elfatimi
Agricultural research has accelerated in recent years, yet farmers often lack the time and resources for on-farm research due to the demands of crop production and farm operations. Seed classification offers valuable insights into quality control, production efficiency, and impurity detection. Early identification of seed types is critical to reducing the cost and risk associated with field emergence, which can lead to yield losses or disruptions in downstream processes like harvesting. Seed sampling supports growers in monitoring and managing seed quality, improving precision in determining seed purity levels, guiding management adjustments, and enhancing yield estimations. This study proposes a novel convolutional neural network (CNN)-based framework for the efficient classification of ten common Brassica seed types. The approach addresses the inherent challenge of texture similarity in seed images using a custom-designed CNN architecture. The model’s performance was evaluated against several pre-trained state-of-the-art architectures, with adjustments to layer configurations for optimized classification. Experimental results using our collected Brassica seed dataset demonstrate that the proposed model achieved a high accuracy rate of 93 percent.
nan
Article 1226
Title@2025-05-26 (1): Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures
Title: Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures | Beispiel und Karte aus einem einzigen Convex-Potential: Erzeugung mit konjugierenden Momenten | 单一汇合潜能的样本和地图:使用协同时间措施生成 2503.10576v2 |
Authors: Nina Vesseron, Louis Béthune, Marco Cuturi
The canonical approach in generative modeling is to split model fitting into two blocks: define first how to sample noise (e.g. Gaussian) and choose next what to do with it (e.g. using a single map or flows). We explore in this work an alternative route that ties sampling and mapping. We find inspiration in moment measures, a result that states that for any measure $\rho$, there exists a unique convex potential $u$ such that $\rho=\nabla u \sharp e^{-u}$. While this does seem to tie together sampling (from the log-concave distribution $e^{-u}$) and action (pushing particles through $\nabla u$), we observe on simple examples (e.g., Gaussians or 1D distributions) that this choice is ill-suited for practical tasks. We study an alternative factorization, where $\rho$ is factorized as $\nabla w^* \sharp e^{-w}$, where $w^*$ is the convex conjugate of a convex potential $w$. We call this approach conjugate moment measures, and show far more intuitive results on these examples. Because $\nabla w^*$ is the Monge map between the log-concave distribution $e^{-w}$ and $\rho$, we rely on optimal transport solvers to propose an algorithm to recover $w$ from samples of $\rho$, and parameterize $w$ as an input-convex neural network. We also address the common sampling scenario in which the density of $\rho$ is known only up to a normalizing constant, and propose an algorithm to learn $w$ in this setting.
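A one-dimensional Gaussian makes the contrast between the two factorizations concrete. The derivation below is an illustrative calculation rather than a result quoted from the abstract: it assumes a centered target $\rho=\mathcal{N}(0,\sigma^2)$ and quadratic potentials with a free scale $s$.

```latex
\begin{align*}
\text{Moment measure: } & u(x)=\tfrac{x^2}{2s^2},\quad e^{-u}\propto\mathcal{N}(0,s^2),\quad \nabla u(x)=\tfrac{x}{s^2}
  \;\Rightarrow\; \nabla u\,\sharp\,e^{-u}=\mathcal{N}(0,s^{-2}),\ \text{so } s^2=\sigma^{-2}. \\
\text{Conjugate: } & w(x)=\tfrac{x^2}{2s^2},\quad w^*(y)=\tfrac{s^2y^2}{2},\quad \nabla w^*(y)=s^2y
  \;\Rightarrow\; \nabla w^*\,\sharp\,e^{-w}=\mathcal{N}(0,s^{6}),\ \text{so } s^2=\sigma^{2/3}.
\end{align*}
```

Under these assumptions the moment-measure noise has variance $\sigma^{-2}$, which shrinks as the target spreads out, whereas the conjugate construction samples noise with variance $\sigma^{2/3}$, which grows with the target.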
nan
Article 1227
Title@2025-05-26 (1): Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review
Title: Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review | Verkörperte KI mit Basismodellen für mobile Serviceroboter: Ein Systematischer Test | 与 “ 移动服务机器人:系统审查 “ 基金会模型 2505.20503v1 |
Authors: Matthew Lisondra, Beno Benhabib, Goldie Nejat
Rapid advancements in foundation models, including Large Language Models, Vision-Language Models, Multimodal Large Language Models, and Vision-Language-Action Models, have opened new avenues for embodied AI in mobile service robotics. By combining foundation models with the principles of embodied AI, where intelligent systems perceive, reason, and act through physical interactions, robots can better understand, adapt to, and execute complex tasks in dynamic real-world environments. However, embodied AI in mobile service robots continues to face key challenges, including multimodal sensor fusion, real-time decision-making under uncertainty, task generalization, and effective human-robot interactions (HRI). In this paper, we present the first systematic review of the integration of foundation models in mobile service robotics, identifying key open challenges in embodied AI and examining how foundation models can address them. Namely, we explore the role of such models in enabling real-time sensor fusion, language-conditioned control, and adaptive task execution. Furthermore, we discuss real-world applications in the domestic assistance, healthcare, and service automation sectors, demonstrating the transformative impact of foundation models on service robotics. We also include potential future research directions, emphasizing the need for predictive scaling laws, autonomous long-term adaptation, and cross-embodiment generalization to enable scalable, efficient, and robust deployment of foundation models in human-centric robotic systems.
nan
Article 1228
Title@2025-05-26 (1): Retrieve to Explain: Evidence-driven Predictions for Explainable Drug Target Identification
Title: Retrieve to Explain: Evidence-driven Predictions for Explainable Drug Target Identification | Erklären Sie: Evidenz-getriebene Vorhersagen für erklärbare Drogenziel-Identifikation | 寻求解释:对可解释药物目标识别的由证据驱动的预测 2402.04068v4 |
Authors: Ravi Patel, Angus Brayne, Rogier Hintzen, Daniel Jaroslawicz, Georgiana Neculae, Dane Corneil
Language models hold incredible promise for enabling scientific discovery by synthesizing massive research corpora. Many complex scientific research questions have multiple plausible answers, each supported by evidence of varying strength. However, existing language models lack the capability to quantitatively and faithfully compare answer plausibility in terms of supporting evidence. To address this, we introduce Retrieve to Explain (R2E), a retrieval-based model that scores and ranks all possible answers to a research question based on evidence retrieved from a document corpus. The architecture represents each answer only in terms of its supporting evidence, with the answer itself masked. This allows us to extend feature attribution methods, such as Shapley values, to transparently attribute answer scores to supporting evidence at inference time. The architecture also allows incorporation of new evidence without retraining, including non-textual data modalities templated into natural language. We developed R2E for the challenging scientific discovery task of drug target identification, a human-in-the-loop process where failures are extremely costly and explainability paramount. When predicting whether drug targets will subsequently be confirmed as efficacious in clinical trials, R2E not only matches non-explainable literature-based models but also surpasses a genetics-based target identification approach used throughout the pharmaceutical industry.
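Attributing an answer score to individual pieces of evidence with Shapley values can be sketched for a small evidence set. The example below is a generic exact Shapley computation, not R2E's scoring model: the evidence names, their strengths, and `toy_score` are hypothetical stand-ins for a model that scores an answer from its retrieved passages.

```python
from itertools import combinations
from math import factorial

def shapley_values(items, score):
    """Exact Shapley attribution of score(frozenset_of_items) to each item."""
    n = len(items)
    values = {}
    for item in items:
        others = [x for x in items if x != item]
        total = 0.0
        for k in range(n):
            # Weight of coalitions of size k that do not contain `item`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (score(s | {item}) - score(s))
        values[item] = total
    return values

# Hypothetical evidence strengths for one candidate answer.
strength = {"passage_A": 0.6, "passage_B": 0.3, "passage_C": 0.1}

def toy_score(evidence):
    # Toy answer score: probability that at least one passage supports the answer.
    p_none = 1.0
    for e in evidence:
        p_none *= 1.0 - strength[e]
    return 1.0 - p_none

phi = shapley_values(list(strength), toy_score)
print(phi)
```

The attributions sum to the score of the full evidence set minus the score of the empty set, which is the property that makes per-passage contributions easy to read. Exact enumeration is exponential in the number of passages, so larger evidence sets would need sampling-based approximations.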
nan
Article 1229
Title@2025-05-26 (1): CLEVRER-Humans: Describing Physical and Causal Events the Human Way
Title: CLEVRER-Humans: Describing Physical and Causal Events the Human Way | CLEVRER-Mensch: Physikalische und kausale Ereignisse auf menschliche Weise beschreiben | CLEVRER-人类:将自然和因果事件描述为人类道路 2310.03635v2 |
Authors: Jiayuan Mao, Xuelin Yang, Xikun Zhang, Noah D. Goodman, Jiajun Wu
Building machines that can reason about physical events and their causal relationships is crucial for flexible interaction with the physical world. However, most existing physical and causal reasoning benchmarks are exclusively based on synthetically generated events and synthetic natural language descriptions of causal relationships. This design brings up two issues. First, there is a lack of diversity in both event types and natural language descriptions; second, causal relationships based on manually-defined heuristics are different from human judgments. To address both shortcomings, we present the CLEVRER-Humans benchmark, a video reasoning dataset for causal judgment of physical events with human labels. We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models. We convert the collected CEGs into questions and answers to be consistent with prior work. Finally, we study a collection of baseline approaches for CLEVRER-Humans question-answering, highlighting the great challenges set forth by our benchmark.
nan
Article 1230
Title@2025-05-26 (1): Distributionally Robust Optimization
Title: Distributionally Robust Optimization | Verteilungsstarke Optimierung | 分布强力优化 2411.02549v3 |
Authors: Daniel Kuhn, Soroosh Shafiee, Wolfram Wiesemann
Distributionally robust optimization (DRO) studies decision problems under uncertainty where the probability distribution governing the uncertain problem parameters is itself uncertain. A key component of any DRO model is its ambiguity set, that is, a family of probability distributions consistent with any available structural or statistical information. DRO seeks decisions that perform best under the worst distribution in the ambiguity set. This worst-case criterion is supported by findings in psychology and neuroscience, which indicate that many decision-makers have a low tolerance for distributional ambiguity. DRO is rooted in statistics, operations research and control theory, and recent research has uncovered its deep connections to regularization techniques and adversarial training in machine learning. This survey presents the key findings of the field in a unified and self-contained manner.
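The min-max structure described here can be shown with a tiny discrete example. The sketch below assumes a finite ambiguity set and a finite set of decisions; the loss table and the candidate distributions are made-up numbers chosen only for illustration.

```python
import numpy as np

# loss[i, j]: loss of decision i under scenario j (hypothetical numbers).
loss = np.array([
    [1.0, 9.0],   # decision 0: very good in scenario 0, bad in scenario 1
    [4.0, 5.0],   # decision 1: moderate in both scenarios
])

# Ambiguity set: candidate probability distributions over the two scenarios.
ambiguity_set = np.array([
    [0.9, 0.1],
    [0.5, 0.5],
    [0.3, 0.7],
])
nominal = ambiguity_set[0]   # the single distribution a non-robust model would trust

expected = loss @ ambiguity_set.T        # expected[i, k]: decision i under distribution k
worst_case = expected.max(axis=1)        # worst distribution for each decision
dro_decision = int(worst_case.argmin())  # decision with the best worst-case expected loss
nominal_decision = int((loss @ nominal).argmin())
print(dro_decision, nominal_decision, worst_case)
```

Here the nominal model picks decision 0, while the robust criterion picks decision 1 because its expected loss degrades less across the ambiguity set.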
nan
Article 1231
Title@2025-05-26 (1): Avoid Forgetting by Preserving Global Knowledge Gradients in Federated Learning with Non-IID Data
Title: Avoid Forgetting by Preserving Global Knowledge Gradients in Federated Learning with Non-IID Data | Vermeiden Sie das Vergessen, indem Sie globale Wissensgradienten im Föderierten Lernen mit nicht-ID-Daten bewahren | 避免在使用非二二二维数据进行联邦学习时因保留全球知识进步而被遗忘 2505.20485v1 |
Authors: Abhijit Chunduru, Majid Morafah, Mahdi Morafah, Vishnu Pandi Chellapandi, Ang Li
The inevitable presence of data heterogeneity has made federated learning very challenging. There are numerous methods to deal with this issue, such as local regularization, better model fusion techniques, and data sharing. Though effective, they lack a deep understanding of how data heterogeneity can affect the global decision boundary. In this paper, we bridge this gap by performing an experimental analysis of the learned decision boundary using a toy example. Our observations are surprising: (1) existing methods suffer from forgetting, in that clients forget the global decision boundary and learn only the perfect local one, and (2) this happens regardless of the initial weights: clients forget the global decision boundary even when starting from pre-trained optimal weights. We then present FedProj, a federated learning framework that robustly learns the global decision boundary and avoids forgetting it during local training. To achieve better ensemble knowledge fusion, we design a novel server-side ensemble knowledge transfer loss to further calibrate the learned global decision boundary. To alleviate forgetting of the learned global decision boundary, we further propose leveraging an episodic memory of average ensemble logits on a public unlabeled dataset to regulate the gradient updates at each step of local training. Experimental results demonstrate that FedProj outperforms state-of-the-art methods by a large margin.
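One common way to regulate local updates with an episodic memory is an A-GEM-style gradient projection. The abstract does not state that FedProj uses this exact rule, so the sketch below is an assumption: `g_memory` stands for a gradient computed on the memory (for example, from a loss against stored ensemble logits), and `project_gradient` is a hypothetical helper.

```python
import numpy as np

def project_gradient(g_local, g_memory):
    """Remove the component of g_local that conflicts with g_memory.

    If the two gradients agree (non-negative inner product), g_local is kept.
    Otherwise g_local is projected onto the half-space where the inner product is zero.
    """
    dot = float(g_local @ g_memory)
    if dot >= 0.0:
        return g_local
    return g_local - (dot / float(g_memory @ g_memory)) * g_memory

g_memory = np.array([1.0, 0.0])
conflicting = np.array([-1.0, 1.0])
agreeing = np.array([0.5, 1.0])
print(project_gradient(conflicting, g_memory), project_gradient(agreeing, g_memory))
```

After projection, a descent step along the local gradient no longer increases the memory loss to first order, which is the mechanism such methods rely on to limit forgetting.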
nan
Article 1232
Title@2025-05-26 (1): Towards Efficient Training of Graph Neural Networks: A Multiscale Approach
Title: Towards Efficient Training of Graph Neural Networks: A Multiscale Approach | Auf dem Weg zu einer effizienten Ausbildung von Graphen-Neuralen Netzwerken: Ein multiskaliger Ansatz | 争取对图形神经网络进行有效培训:一种多部门办法 2503.19666v3 |
Authors: Eshed Gal, Moshe Eliasof, Carola-Bibiane Schönlieb, Ivan I. Kyrchei, Eldad Haber, Eran Treister
Graph Neural Networks (GNNs) have become powerful tools for learning from graph-structured data, finding applications across diverse domains. However, as graph sizes and connectivity increase, standard GNN training methods face significant computational and memory challenges, limiting their scalability and efficiency. In this paper, we present a novel framework for efficient multiscale training of GNNs. Our approach leverages hierarchical graph representations and subgraphs, enabling the integration of information across multiple scales and resolutions. By utilizing coarser graph abstractions and subgraphs, each with fewer nodes and edges, we significantly reduce computational overhead during training. Building on this framework, we propose a suite of scalable training strategies, including coarse-to-fine learning, subgraph-to-full-graph transfer, and multiscale gradient computation. We also provide some theoretical analysis of our methods and demonstrate their effectiveness across various datasets and learning tasks. Our results show that multiscale training can substantially accelerate GNN training for large scale problems while maintaining, or even improving, predictive performance.
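A coarser graph abstraction of the kind mentioned above can be built from a node-to-cluster assignment. The sketch below is a generic coarsening step under the assumption that cluster labels are already given and every cluster is non-empty; it is not the paper's specific multiscale scheme, and `coarsen` is a hypothetical name.

```python
import numpy as np

def coarsen(adj, feats, clusters):
    """Build a coarse graph from cluster labels (one integer label per node)."""
    n = adj.shape[0]
    k = int(clusters.max()) + 1
    P = np.zeros((n, k))
    P[np.arange(n), clusters] = 1.0                 # one-hot node-to-cluster assignment
    coarse_adj = P.T @ adj @ P                      # summed edge weights between clusters
    np.fill_diagonal(coarse_adj, 0.0)               # drop intra-cluster self-loops
    sizes = P.sum(axis=0)
    coarse_feats = (P.T @ feats) / sizes[:, None]   # mean feature vector per cluster
    return coarse_adj, coarse_feats

# Path graph 0-1-2-3 with one scalar feature per node.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.array([[0.0], [2.0], [4.0], [6.0]])
coarse_adj, coarse_feats = coarsen(adj, feats, np.array([0, 0, 1, 1]))
print(coarse_adj, coarse_feats)
```

A coarse-to-fine schedule would train on `coarse_adj` first and then continue on the full graph with the same network weights, since message-passing layers do not depend on the number of nodes.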
nan
Article 1233
Title@2025-05-26 (1): CardioPatternFormer: Pattern-Guided Attention for Interpretable ECG Classification with Transformer Architecture
Title: CardioPatternFormer: Pattern-Guided Attention for Interpretable ECG Classification with Transformer Architecture | CardioPatternFormer: Mustergeführte Aufmerksamkeit für die Interpretierbare EKG-Klassifikation mit Transformer-Architektur | 卡尔迪·皮德·皮德罗·弗德:对具有变形结构的可解释的ECG分类的典型引导关注 2505.20481v1 |
Authors: Berat Kutay Uğraş, Ömer Nezih Gerek, İbrahim Talha Saygı
Accurate ECG interpretation is vital, yet complex cardiac data and “black-box” AI models limit clinical utility. Inspired by Transformer architectures’ success in NLP for understanding sequential data, we frame ECG as the heart’s unique “language” of temporal patterns. We present CardioPatternFormer, a novel Transformer-based model for interpretable ECG classification. It employs a sophisticated attention mechanism to precisely identify and classify diverse cardiac patterns, excelling at discerning subtle anomalies and distinguishing multiple co-occurring conditions. This pattern-guided attention provides clear insights by highlighting influential signal regions, effectively allowing the “heart to talk” through transparent interpretations. CardioPatternFormer demonstrates robust performance on challenging ECGs, including complex multi-pathology cases. Its interpretability via attention maps enables clinicians to understand the model’s rationale, fostering trust and aiding informed diagnostic decisions. This work offers a powerful, transparent solution for advanced ECG analysis, paving the way for more reliable and clinically actionable AI in cardiology.
nan
Article 1234
Title@2025-05-26 (1): Leveraging Sparsity for Sample-Efficient Preference Learning: A Theoretical Perspective
Title: Leveraging Sparsity for Sample-Efficient Preference Learning: A Theoretical Perspective | Sparsamkeit für stichprobeneffizientes Preference-Lernen: Eine theoretische Perspektive | 利用差距促进抽样有效优先学习:理论视角 2501.18282v3 |
Authors: Yunzhen Yao, Lie He, Michael Gastpar
This paper considers the sample-efficiency of preference learning, which models and predicts human choices based on comparative judgments. The minimax optimal estimation error rate $\Theta(d/n)$ in classical estimation theory requires that the number of samples $n$ scale linearly with the dimensionality of the feature space $d$. However, the high dimensionality of the feature space and the high cost of collecting human-annotated data challenge the efficiency of traditional estimation methods. To remedy this, we leverage sparsity in the preference model and establish sharp error rates. We show that under the sparse random utility model, where the parameter of the reward function is $k$-sparse, the minimax optimal rate can be reduced to $\Theta((k/n) \log(d/k))$. Furthermore, we analyze the $\ell_{1}$-regularized estimator and show that it achieves a near-optimal rate under mild assumptions on the Gram matrix. Experiments on synthetic data and LLM alignment data validate our theoretical findings, showing that sparsity-aware methods significantly reduce sample complexity and improve prediction accuracy.
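An $\ell_1$-regularized estimator for pairwise preferences can be sketched with proximal gradient descent (ISTA). The example below assumes a logistic (Bradley-Terry-style) link, which the abstract does not specify, and synthetic data with a 3-sparse reward parameter. The function names, the step size, and the regularization weight are illustrative choices.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||v||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_preference_fit(X, y, lam=0.05, lr=0.5, iters=1000):
    """ISTA for L1-regularized logistic loss.

    X[i] is the feature difference (option a minus option b) of comparison i,
    y[i] is 1.0 if option a was preferred and 0.0 otherwise.
    """
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ theta))   # predicted P(a preferred)
        grad = X.T @ (p - y) / n                # gradient of the mean logistic loss
        theta = soft_threshold(theta - lr * grad, lr * lam)
    return theta

# Synthetic comparisons with a 3-sparse reward parameter in d = 50 dimensions.
rng = np.random.default_rng(0)
n, d = 400, 50
theta_true = np.zeros(d)
theta_true[:3] = [2.0, -2.0, 1.5]
X = rng.standard_normal((n, d))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float)

theta_hat = l1_preference_fit(X, y)
print(np.count_nonzero(theta_hat), np.argsort(-np.abs(theta_hat))[:3])
```

The soft-thresholding step sets small coordinates exactly to zero, which is how the estimator exploits sparsity when $n$ is much smaller than what a dense model would need.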
nan
Article 1235
Title@2025-05-26 (1): From learnable objects to learnable random objects
Title: From learnable objects to learnable random objects | Von lernbaren Objekten zu lernbaren zufälligen Objekten | 从可学习对象到可学习随机对象 2504.00847v2 |
Authors: Aaron Anderson, Michael Benedikt
We consider the relationship between learnability of a “base class” of functions on a set $X$, and learnability of a class of statistical functions derived from the base class. For example, we refine results showing that learnability of a family $h_p: p \in Y$ of functions implies learnability of the family of functions $h_\mu=\lambda p: Y. E_\mu(h_p)$, where $E_\mu$ is the expectation with respect to $\mu$, and $\mu$ ranges over probability distributions on $X$. We will look at both Probably Approximately Correct (PAC) learning, where example inputs and outputs are chosen at random, and online learning, where the examples are chosen adversarially. For agnostic learning, we establish improved bounds on the sample complexity of learning for statistical classes, stated in terms of combinatorial dimensions of the base class. We connect these problems to techniques introduced in model theory for “randomizing a structure”. We also provide counterexamples for realizable learning, in both the PAC and online settings.
nan
Article 1236
Title@2025-05-26 (1): Stochastic Preconditioning for Neural Field Optimization
Title: Stochastic Preconditioning for Neural Field Optimization | Stochastische Vorkonditionierung für die Neuralfeldoptimierung | 神经场优化的斯托克预设设备 2505.20473v1 |
Authors: Selena Ling, Merlin Nimier-David, Alec Jacobson, Nicholas Sharp
Neural fields are a highly effective representation across visual computing. This work observes that fitting these fields is greatly improved by incorporating spatial stochasticity during training, and that this simple technique can replace or even outperform custom-designed hierarchies and frequency space constructions. The approach is formalized as implicitly operating on a blurred version of the field, evaluated in-expectation by sampling with Gaussian-distributed offsets. Querying the blurred field during optimization greatly improves convergence and robustness, akin to the role of preconditioners in numerical linear algebra. This implicit, sampling-based perspective fits naturally into the neural field paradigm, comes at no additional cost, and is extremely simple to implement. We describe the basic theory of this technique, including details such as handling boundary conditions, and extending to a spatially-varying blur. Experiments demonstrate this approach on representations including coordinate MLPs, neural hashgrids, triplanes, and more, across tasks including surface reconstruction and radiance fields. In settings where custom-designed hierarchies have already been developed, stochastic preconditioning nearly matches or improves their performance with a simple and unified approach; in settings without existing hierarchies it provides an immediate boost to quality and robustness.
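Querying a blurred field by sampling Gaussian offsets can be sketched in a few lines. The example below is a minimal illustration, not the authors' code: `field` is any callable mapping coordinates to values (a stand-in for a neural field), and `blurred_query` and its parameters are hypothetical names. During training one would typically use a single sample per query, while a large `n_samples` is used here only to check the estimate.

```python
import numpy as np

def blurred_query(field, x, sigma, rng, n_samples=1):
    """Monte Carlo estimate of the field convolved with a Gaussian of std sigma."""
    x = np.asarray(x, dtype=float)
    # One Gaussian-distributed offset per sample, same shape as the query point.
    offsets = sigma * rng.standard_normal((n_samples,) + x.shape)
    return np.mean([field(x + o) for o in offsets], axis=0)

rng = np.random.default_rng(0)
x = np.array([0.5, 1.0])
estimate = blurred_query(np.sin, x, sigma=0.5, rng=rng, n_samples=20000)
# For sin, the Gaussian blur has the closed form sin(x) * exp(-sigma^2 / 2).
closed_form = np.sin(x) * np.exp(-0.5 * 0.5 ** 2)
print(estimate, closed_form)
```

Because the blur is applied only through the sampled inputs, the same wrapper works for any field representation without changing its architecture.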
nan
Article 1237
Title@2025-05-26 (1): WeatherEdit: Controllable Weather Editing with 4D Gaussian Field
Title: WeatherEdit: Controllable Weather Editing with 4D Gaussian Field | WeatherEdit: Kontrollierbare Wetterbearbeitung mit 4D Gaussian Field | 气象编辑: 4D Gaussian 字段的可控天气编辑 2505.20471v1 |
Authors: Chenghao Qian, Wenjing Li, Yuhu Guo, Gustav Markkula
In this work, we present WeatherEdit, a novel weather editing pipeline for generating realistic weather effects with controllable types and severity in 3D scenes. Our approach is structured into two key components: weather background editing and weather particle construction. For weather background editing, we introduce an all-in-one adapter that integrates multiple weather styles into a single pretrained diffusion model, enabling the generation of diverse weather effects in 2D image backgrounds. During inference, we design a Temporal-View (TV-) attention mechanism that follows a specific order to aggregate temporal and spatial information, ensuring consistent editing across multi-frame and multi-view images. To construct the weather particles, we first reconstruct a 3D scene using the edited images and then introduce a dynamic 4D Gaussian field to generate snowflakes, raindrops and fog in the scene. The attributes and dynamics of these particles are precisely controlled through physical-based modelling and simulation, ensuring realistic weather representation and flexible severity adjustments. Finally, we integrate the 4D Gaussian field with the 3D scene to render consistent and highly realistic weather effects. Experiments on multiple driving datasets demonstrate that WeatherEdit can generate diverse weather effects with controllable condition severity, highlighting its potential for autonomous driving simulation in adverse weather. See project page: https://jumponthemoon.github.io/w-edit
nan
Article 1238
Title@2025-05-26 (1): Recursive Deep Inverse Reinforcement Learning
Title: Recursive Deep Inverse Reinforcement Learning | Rekursives tiefes Inverse-Verstärkung-Lernen | 递归深反向强化学习 2504.13241v4 |
Authors: Paul Ghanem, Owen Howell, Michael Potter, Pau Closas, Alireza Ramezani, Deniz Erdogmus, Tales Imbiriba
Inferring an adversary’s goals from exhibited behavior is crucial for counterplanning and non-cooperative multi-agent systems in domains like cybersecurity, military, and strategy games. Deep Inverse Reinforcement Learning (IRL) methods based on maximum entropy principles show promise in recovering adversaries’ goals but are typically offline, require large batch sizes with gradient descent, and rely on first-order updates, limiting their applicability in real-time scenarios. We propose an online Recursive Deep Inverse Reinforcement Learning (RDIRL) approach to recover the cost function governing the adversary actions and goals. Specifically, we minimize an upper bound on the standard Guided Cost Learning (GCL) objective using sequential second-order Newton updates, akin to the Extended Kalman Filter (EKF), leading to a fast (in terms of convergence) learning algorithm. We demonstrate that RDIRL is able to recover cost and reward functions of expert agents in standard and adversarial benchmark tasks. Experiments on benchmark tasks show that our proposed approach outperforms several leading IRL algorithms.
nan
Article 1239
Title@2025-05-26 (1): Learning with Expected Signatures: Theory and Applications
Title: Learning with Expected Signatures: Theory and Applications | Lernen mit erwarteten Signaturen: Theorie und Anwendungen | 学习与预期签名:理论和应用 2505.20465v1 |
Authors: Lorenzo Lucchese, Mikko S. Pakkanen, Almut E. D. Veraart
The expected signature maps a collection of data streams to a lower dimensional representation, with a remarkable property: the resulting feature tensor can fully characterize the data generating distribution. This “model-free” embedding has been successfully leveraged to build multiple domain-agnostic machine learning (ML) algorithms for time series and sequential data. The convergence results proved in this paper bridge the gap between the expected signature’s empirical discrete-time estimator and its theoretical continuous-time value, allowing for a more complete probabilistic interpretation of expected signature-based ML methods. Moreover, when the data generating process is a martingale, we suggest a simple modification of the expected signature estimator with significantly lower mean squared error and empirically demonstrate how it can be effectively applied to improve predictive performance.
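The empirical discrete-time estimator mentioned above averages path signatures over observed streams. The sketch below assumes piecewise-linear interpolation of the samples and truncates the signature at level 2; it does not include the martingale-specific modification proposed in the paper. The function names are hypothetical.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path of shape (T, d)."""
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment delta.
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return s1, s2

def expected_signature(paths):
    """Empirical estimator: average the truncated signatures over sample paths."""
    sigs = [signature_level2(p) for p in paths]
    return (np.mean([s[0] for s in sigs], axis=0),
            np.mean([s[1] for s in sigs], axis=0))

line = np.array([[0.0, 0.0], [0.5, 1.0], [1.0, 2.0]])     # straight line to (1, 2)
corner = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])   # right, then up
e1, e2 = expected_signature([line, corner])
print(e1, e2)
```

The antisymmetric part of the level-2 term is the Lévy area, which distinguishes the two example paths even when their endpoints are compared only through level 1.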
nan
Article 1240
Title@2025-05-26 (1): Federated Learning-Distillation Alternation for Resource-Constrained IoT
Title: Federated Learning-Distillation Alternation for Resource-Constrained IoT | Federated Learning-Destillation Alternative für ressourcengebundenes IoT | 资源培训型IOT 资源培训型IOT替代物 2505.20456v1 |
Authors: Rafael Valente da Silva, Onel L. Alcaraz López, Richard Demo Souza
Federated learning (FL) faces significant challenges in Internet of Things (IoT) networks due to device limitations in energy and communication resources, especially when considering the large size of FL models. From an energy perspective, the challenge is aggravated if devices rely on energy harvesting (EH), as energy availability can vary significantly over time, influencing the average number of participating users in each iteration. Additionally, the transmission of large model updates is more susceptible to interference from uncorrelated background traffic in shared wireless environments. As an alternative, federated distillation (FD) reduces communication overhead and energy consumption by transmitting local model outputs, which are typically much smaller than the entire model used in FL. However, this comes at the cost of reduced model accuracy. Therefore, in this paper, we propose FL-distillation alternation (FLDA). In FLDA, devices alternate between FD and FL phases, balancing model information with lower communication overhead and energy consumption per iteration. We consider a multichannel slotted-ALOHA EH-IoT network subject to background traffic/interference. In such a scenario, FLDA demonstrates higher model accuracy than both FL and FD, and achieves faster convergence than FL. Moreover, FLDA reaches target accuracies while saving up to 98% in energy consumption and being less sensitive to interference, both relative to FL.
nan
Article 1241
Title@2025-05-26 (1): Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection
Title: Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection | Skalierungsgesetze für das Vergessen beim Finetuning mit Vorschulungs-Dateninjektion | 调整前数据输入时遗忘法律的扩大范围 2502.06042v2 |
Authors: Louis Bethune, David Grangier, Dan Busbridge, Eleonora Gualdoni, Marco Cuturi, Pierre Ablin
A widespread strategy to obtain a language model that performs well on a target domain is to finetune a pretrained model to perform unsupervised next-token prediction on data from that target domain. Finetuning presents two challenges: (i) if the amount of target data is limited, as in most practical applications, the model will quickly overfit, and (ii) the model will drift away from the original model, forgetting the pretraining data and the generic knowledge that comes with it. We aim to derive scaling laws that quantify these two phenomena for various target domains, amounts of available target data, and model scales. We measure the efficiency of injecting pretraining data into the finetuning data mixture to avoid forgetting and mitigate overfitting. A key practical takeaway from our study is that injecting as little as 1% of pretraining data in the finetuning data mixture prevents the model from forgetting the pretraining set.
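Injecting a small fraction of pretraining data into the finetuning mixture amounts to a weighted sampler over two sources. The sketch below is a minimal illustration with a hypothetical `mixed_stream` generator and toy string examples; the 1% default mirrors the fraction quoted in the abstract, and everything else is an assumption.

```python
import random
from itertools import islice

def mixed_stream(finetune_data, pretrain_data, pretrain_fraction=0.01, seed=0):
    """Yield training examples, drawing from pretrain_data with the given probability."""
    rng = random.Random(seed)
    while True:
        use_pretrain = rng.random() < pretrain_fraction
        source = pretrain_data if use_pretrain else finetune_data
        yield rng.choice(source)

finetune_data = [f"target_{i}" for i in range(100)]
pretrain_data = [f"generic_{i}" for i in range(1000)]
batch = list(islice(mixed_stream(finetune_data, pretrain_data), 20000))
share = sum(example.startswith("generic_") for example in batch) / len(batch)
print(share)
```

Sampling per example keeps the expected mixture ratio fixed regardless of batch size. A per-batch quota would be an equally reasonable design if exact proportions are required.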
nan
Article 1242
Title@2025-05-26 (1): BlastOFormer: Attention and Neural Operator Deep Learning Methods for Explosive Blast Prediction
Title: BlastOFormer: Attention and Neural Operator Deep Learning Methods for Explosive Blast Prediction | BlastOFormer: Aufmerksamkeit und neuraler Operator Deep Learning Methoden zur explosiven Blast-Vorhersage | BLastO Former: 爆炸性爆炸预测的注意和神经操作员深学习方法 2505.20454v1 |
Authors: Reid Graves, Anthony Zhou, Amir Barati Farimani
Accurate prediction of blast pressure fields is essential for applications in structural safety, defense planning, and hazard mitigation. Traditional methods such as empirical models and computational fluid dynamics (CFD) simulations offer limited trade-offs between speed and accuracy; empirical models fail to capture complex interactions in cluttered environments, while CFD simulations are computationally expensive and time-consuming. In this work, we introduce BlastOFormer, a novel Transformer-based surrogate model for full-field maximum pressure prediction from arbitrary obstacle and charge configurations. BlastOFormer leverages a signed distance function (SDF) encoding and a grid-to-grid attention-based architecture inspired by OFormer and Vision Transformer (ViT) frameworks. Trained on a dataset generated using the open-source blastFoam CFD solver, our model outperforms convolutional neural networks (CNNs) and Fourier Neural Operators (FNOs) across both log-transformed and unscaled domains. Quantitatively, BlastOFormer achieves the highest $R^2$ score (0.9516) and lowest error metrics, while requiring only 6.4 milliseconds for inference, more than 600,000 times faster than CFD simulations. Qualitative visualizations and error analyses further confirm BlastOFormer’s superior spatial coherence and generalization capabilities. These results highlight its potential as a real-time alternative to conventional CFD approaches for blast pressure estimation in complex environments.
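A signed distance function encoding turns an obstacle layout into a dense grid that a network can consume. The sketch below is an assumption about what such an input could look like, not BlastOFormer's actual preprocessing: obstacles are hypothetical axis-aligned boxes given as `(cx, cy, half_width, half_height)`, and the grid resolution and extent are arbitrary.

```python
import numpy as np

def box_sdf(px, py, cx, cy, hx, hy):
    """Signed distance to an axis-aligned box: negative inside, positive outside."""
    dx = np.abs(px - cx) - hx
    dy = np.abs(py - cy) - hy
    outside = np.hypot(np.maximum(dx, 0.0), np.maximum(dy, 0.0))
    inside = np.minimum(np.maximum(dx, dy), 0.0)
    return outside + inside

def sdf_grid(obstacles, n=64, extent=1.0):
    """Evaluate the SDF of a set of boxes on an n x n grid over [-extent, extent]^2."""
    xs = np.linspace(-extent, extent, n)
    px, py = np.meshgrid(xs, xs, indexing="xy")
    sdf = np.full((n, n), np.inf)
    for cx, cy, hx, hy in obstacles:
        # Pointwise minimum gives the union of obstacles (exact outside them).
        sdf = np.minimum(sdf, box_sdf(px, py, cx, cy, hx, hy))
    return sdf

grid = sdf_grid([(0.0, 0.0, 0.25, 0.25), (0.6, -0.5, 0.1, 0.3)])
print(grid.shape, grid.min(), grid.max())
```

Compared with a binary occupancy mask, the SDF gives every grid cell a smooth measure of how far it is from the nearest obstacle surface, which is one plausible reason to prefer it as a model input.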
nan
Article 1243
Title@2025-05-26 (1): Active Learning for Multiple Change Point Detection in Non-stationary Time Series with Deep Gaussian Processes
Title: Active Learning for Multiple Change Point Detection in Non-stationary Time Series with Deep Gaussian Processes | Aktives Lernen für Multiple Change Point Detection in nicht-stationären Zeitreihen mit tiefen Gauß-Prozessen | 与深高斯进程一起在非静止时间序列中进行多变点探测活动学习 2505.20452v1 |
Authors: Hao Zhao, Rong Pan
Multiple change point (MCP) detection in non-stationary time series is challenging due to the variety of underlying patterns. To address these challenges, we propose a novel algorithm that integrates Active Learning (AL) with Deep Gaussian Processes (DGPs) for robust MCP detection. Our method leverages spectral analysis to identify potential changes and employs AL to strategically select new sampling points for improved efficiency. By incorporating the modeling flexibility of DGPs with the change-identification capabilities of spectral methods, our approach adapts to diverse spectral change behaviors and effectively localizes multiple change points. Experiments on both simulated and real-world data demonstrate that our method outperforms existing techniques in terms of detection accuracy and sampling efficiency for non-stationary time series.
nan
Article 1244
Title@2025-05-26 (1): Symmetry constrained neural networks for detection and localization of damage in metal plates
Title: Symmetry constrained neural networks for detection and localization of damage in metal plates | Symmetrie eingeschränkte neuronale Netze zur Erkennung und Lokalisierung von Schäden in Metallplatten | 用于金属板块损害探测和定位的对称约束神经网络 2409.06084v3 |
Authors: James Amarel, Christopher Rudolf, Athanasios Iliopoulos, John Michopoulos, Leslie N. Smith
The present paper is concerned with deep learning techniques applied to detection and localization of damage in a thin aluminum plate. We used data collected on a tabletop apparatus by mounting four piezoelectric transducers to the plate, each of which took a turn generating a Lamb wave that then traversed the region of interest before being received by the remaining three sensors. On training a neural network to analyze time-series data of the material response, which displayed damage-reflective features whenever the plate-guided waves interacted with a contact load, we achieved a model that detected damage with greater than $99\%$ accuracy, in addition to a model that localized it with $2.58 \pm 0.12$ mm mean distance error. For each task, the best-performing model was designed according to the inductive bias that our transducers were both similar and arranged in a square pattern on a nearly uniform plate.
nan
Article 1245
Title@2025-05-26 (1): Time Series Generation Under Data Scarcity: A Unified Generative Modeling Approach
Title: Time Series Generation Under Data Scarcity: A Unified Generative Modeling Approach | Zeitreihenerstellung unter Datenknappheit: Ein einheitlicher generativer Modellierungsansatz | 数据缺乏情况下的时间序列生成:统一生成模式方法 2505.20446v1 |
Authors: Tal Gonen, Itai Pemper, Ilan Naiman, Nimrod Berman, Omri Azencot
Generative modeling of time series is a central challenge in time series analysis, particularly under data-scarce conditions. Despite recent advances in generative modeling, a comprehensive understanding of how state-of-the-art generative models perform under limited supervision remains lacking. In this work, we conduct the first large-scale study evaluating leading generative models in data-scarce settings, revealing a substantial performance gap between full-data and data-scarce regimes. To close this gap, we propose a unified diffusion-based generative framework that can synthesize high-fidelity time series across diverse domains using just a few examples. Our model is pre-trained on a large, heterogeneous collection of time series datasets, enabling it to learn generalizable temporal representations. It further incorporates architectural innovations such as dynamic convolutional layers for flexible channel adaptation and dataset token conditioning for domain-aware generation. Without requiring abundant supervision, our unified model achieves state-of-the-art performance in few-shot settings, outperforming domain-specific baselines across a wide range of subset sizes. Remarkably, it also surpasses all baselines even when tested on full-dataset benchmarks, highlighting the strength of pre-training and cross-domain generalization. We hope this work encourages the community to revisit few-shot generative modeling as a key problem in time series research and pursue unified solutions that scale efficiently across domains. Code is available at https://github.com/azencot-group/ImagenFew.
nan
Article 1246
Title@2025-05-26 (1): HoPE: Hybrid of Position Embedding for Length Generalization in Vision-Language Models
Title: HoPE: Hybrid of Position Embedding for Length Generalization in Vision-Language Models | HoPE: Hybrid der Positionseinbettung für die Längenverallgemeinerung in Vision-Language-Modelle | HoPE:愿景-语言模型中长期通用化所嵌入的立场组合 2505.20444v1 |
Authors: Haoran Li, Yingjie Qin, Baoyuan Ou, Lai Xu, Ruiwen Xu
Vision-Language Models (VLMs) have made significant progress in multimodal tasks. However, their performance often deteriorates in long-context scenarios, particularly long videos. While Rotary Position Embedding (RoPE) has been widely adopted for length generalization in Large Language Models (LLMs), extending vanilla RoPE to capture the intricate spatial-temporal dependencies in videos remains an unsolved challenge. Existing methods typically allocate different frequencies within RoPE to encode 3D positional information. However, these allocation strategies mainly rely on heuristics, lacking in-depth theoretical analysis. In this paper, we first study how different allocation strategies impact the long-context capabilities of VLMs. Our analysis reveals that current multimodal RoPEs fail to reliably capture semantic similarities over extended contexts. To address this issue, we propose HoPE, a Hybrid of Position Embedding designed to improve the long-context capabilities of VLMs. HoPE introduces a hybrid frequency allocation strategy for reliable semantic modeling over arbitrarily long context, and a dynamic temporal scaling mechanism to facilitate robust learning and flexible inference across diverse context lengths. Extensive experiments across four video benchmarks on long video understanding and retrieval tasks demonstrate that HoPE consistently outperforms existing methods, confirming its effectiveness. Code is available at https://github.com/hrlics/HoPE.
nan
Article 1247
Title@2025-05-26 (1): AI Learning Algorithms: Deep Learning, Hybrid Models, and Large-Scale Model Integration
Title: AI Learning Algorithms: Deep Learning, Hybrid Models, and Large-Scale Model Integration | KI-Learning-Algorithmen: Deep Learning, hybride Modelle und großformatige Modellintegration | AI 学习等级:深学习、混合模型和大型模型整合 2410.09186v3 |
Authors: Noorbakhsh Amiri Golilarz, Elias Hossain, Abdoljalil Addeh, Keyan Alexander Rahimi
In this paper, we discuss learning algorithms and their importance in different types of applications, including how they are trained to identify important patterns and features, in a straightforward, easy-to-understand manner. We review the main concepts of artificial intelligence (AI), machine learning (ML), deep learning (DL), and hybrid models. Some important subsets of machine learning algorithms, such as supervised, unsupervised, and reinforcement learning, are also discussed. These techniques can be used for important tasks like prediction, classification, and segmentation. Convolutional Neural Networks (CNNs) are used for image and video processing and many other applications. We dive into the architecture of CNNs and how to integrate CNNs with ML algorithms to build hybrid models. This paper also explores the vulnerability of learning algorithms to noise, which leads to misclassification. We further discuss the integration of learning algorithms with Large Language Models (LLMs) to generate coherent responses applicable to many domains, such as healthcare, marketing, and finance, by learning important patterns from large volumes of data. Furthermore, we discuss the next generation of learning algorithms and how we may have a unified Adaptive and Dynamic Network to perform important tasks. Overall, this article provides a brief overview of learning algorithms, exploring their current state, applications, and future direction.
nan
Article 1248
Title@2025-05-26 (1): Holes in Latent Space: Topological Signatures Under Adversarial Influence
Title: Holes in Latent Space: Topological Signatures Under Adversarial Influence | Löcher im latenten Raum: Topologische Signaturen unter dem Einfluss von Adversarien | 低空空洞:在对立影响下的地形签名 2505.20435v1 |
Authors: Aideen Fay, Inés García-Redondo, Qiquan Wang, Haim Dubossarsky, Anthea Monod
Understanding how adversarial conditions affect language models requires techniques that capture both global structure and local detail within high-dimensional activation spaces. We propose persistent homology (PH), a tool from topological data analysis, to systematically characterize multiscale latent space dynamics in LLMs under two distinct attack modes: backdoor fine-tuning and indirect prompt injection. By analyzing six state-of-the-art LLMs, we show that adversarial conditions consistently compress latent topologies, reducing structural diversity at smaller scales while amplifying dominant features at coarser ones. These topological signatures are statistically robust across layers, architectures, and model sizes, and they align with the emergence of adversarial effects deeper in the network. To capture finer-grained mechanisms underlying these shifts, we introduce a neuron-level PH framework that quantifies how information flows and transforms within and across layers. Together, our findings demonstrate that PH offers a principled and unifying approach to interpreting representational dynamics in LLMs, particularly under distributional shift.
nan
Article 1249
Title@2025-05-26 (1): Kernel Quantile Embeddings and Associated Probability Metrics
Title: Kernel Quantile Embeddings and Associated Probability Metrics | Kernel-Quantile-Embeddings und zugehörige Wahrscheinlichkeits-Metriken | 内核量量嵌入器及相关概率 2505.20433v1 |
Authors: Masha Naslidnyk, Siu Lun Chau, François-Xavier Briol, Krikamol Muandet
Embedding probability distributions into reproducing kernel Hilbert spaces (RKHS) has enabled powerful nonparametric methods such as the maximum mean discrepancy (MMD), a statistical distance with strong theoretical and computational properties. At its core, the MMD relies on kernel mean embeddings to represent distributions as mean functions in RKHS. However, it remains unclear if the mean function is the only meaningful RKHS representation. Inspired by generalised quantiles, we introduce the notion of kernel quantile embeddings (KQEs). We then use KQEs to construct a family of distances that: (i) are probability metrics under weaker kernel conditions than MMD; (ii) recover a kernelised form of the sliced Wasserstein distance; and (iii) can be efficiently estimated with near-linear cost. Through hypothesis testing, we show that these distances offer a competitive alternative to MMD and its fast approximations.
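For readers unfamiliar with the MMD baseline mentioned in this abstract, the following is a minimal sketch of the standard biased (V-statistic) estimate of squared MMD with a Gaussian kernel. It illustrates only the baseline quantity, not the kernel quantile embeddings proposed in the paper; the function names, the bandwidth value, and the toy data are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between samples x and y.

    Equals the squared RKHS distance between the two empirical
    kernel mean embeddings.
    """
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Toy data (assumed): two samples from the same Gaussian, and one shifted sample.
rng = np.random.default_rng(0)
sample_a = rng.normal(size=(200, 2))
sample_b = rng.normal(size=(200, 2))
sample_shifted = rng.normal(loc=2.0, size=(200, 2))

mmd_same = mmd_squared(sample_a, sample_b)
mmd_shifted = mmd_squared(sample_a, sample_shifted)
```

On this toy data the estimate for the shifted sample should be clearly larger than for the two samples drawn from the same distribution. The estimator costs quadratic time in the sample size, which is the cost that fast approximations aim to reduce.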
nan
Article 1250
Title@2025-05-26 (1): Differentiable Quadratic Optimization For The Maximum Independent Set Problem
Title: Differentiable Quadratic Optimization For The Maximum Independent Set Problem | Unterschiedliche quadratische Optimierung für das maximale unabhängige Set-Problem | 最大独立集集问题可区别的二次二次曲线优化 2406.19532v6 |
Authors: Ismail Alkhouri, Cedric Le Denmat, Yingjie Li, Cunxi Yu, Jia Liu, Rongrong Wang, Alvaro Velasquez
Combinatorial Optimization (CO) addresses many important problems, including the challenging Maximum Independent Set (MIS) problem. Alongside exact and heuristic solvers, differentiable approaches have emerged, often using continuous relaxations of ReLU-based or quadratic objectives. Noting that an MIS in a graph is a Maximum Clique (MC) in its complement, we propose a new quadratic formulation for MIS by incorporating an MC term, improving convergence and exploration. We show that every maximal independent set corresponds to a local minimizer, derive conditions with respect to the MIS size, and characterize stationary points. To tackle the non-convexity of the objective, we propose optimizing several initializations in parallel using momentum-based gradient descent, complemented by an efficient MIS checking criterion derived from our theory. We call our method parallelized Clique-Informed Quadratic Optimization for MIS (pCQO-MIS). Our experimental results demonstrate the effectiveness of the proposed method compared to exact, heuristic, sampling, and data-centric approaches. Notably, our method avoids the out-of-distribution tuning and reliance on (un)labeled data required by data-centric methods, while achieving superior MIS sizes and competitive runtime relative to their inference time. Additionally, a key advantage of pCQO-MIS is that, unlike exact and heuristic solvers, the runtime scales only with the number of nodes in the graph, not the number of edges. Our code is available at the GitHub repository: https://github.com/ledenmat/pCQO-mis-benchmark/tree/refactor.
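The observation that an independent set in a graph is a clique in its complement can be checked on a small example. The sketch below is a brute-force illustration of that relationship only; it is not the pCQO-MIS method, and all function names and the 5-cycle example are assumptions made for illustration.

```python
from itertools import combinations


def complement_edges(num_nodes, edges):
    """Edge set of the complement graph on nodes 0..num_nodes-1."""
    present = {frozenset(e) for e in edges}
    return {
        frozenset(pair)
        for pair in combinations(range(num_nodes), 2)
        if frozenset(pair) not in present
    }


def is_independent_set(nodes, edges):
    """True if no two of the given nodes are joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(p) not in edge_set for p in combinations(nodes, 2))


def is_clique(nodes, edges):
    """True if every pair of the given nodes is joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(p) in edge_set for p in combinations(nodes, 2))


def max_independent_set_bruteforce(num_nodes, edges):
    """Exhaustive search, largest size first; only feasible for tiny graphs."""
    for size in range(num_nodes, 0, -1):
        for candidate in combinations(range(num_nodes), size):
            if is_independent_set(candidate, edges):
                return set(candidate)
    return set()


# Example graph (assumed): a cycle on five nodes.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
mis = max_independent_set_bruteforce(5, cycle_edges)
mis_is_clique_in_complement = is_clique(mis, complement_edges(5, cycle_edges))
```

The exhaustive search is exponential in the number of nodes, which is why continuous relaxations such as the one described in the abstract are of interest for larger graphs.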
nan
Article 1251
Title@2025-05-26 (1): Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?
Title: Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution? | Selbstreflektierende Unsicherheiten: Kennen LLMs ihre interne Antwortverteilung? | 自我反感的不确定性:LLMs知道他们的内部答案分布吗? 2505.20295v1 |
Authors: Michael Kirchhof, Luca Füger, Adam Goliński, Eeshan Gunesh Dhekane, Arno Blaas, Sinead Williamson
To reveal when a large language model (LLM) is uncertain about a response, uncertainty quantification commonly produces percentage numbers along with the output. But is this all we can do? We argue that in the output space of LLMs, the space of strings, there exist strings expressive enough to summarize the distribution over output strings the LLM deems possible. We lay a foundation for this new avenue of uncertainty explication and present SelfReflect, a theoretically-motivated metric to assess how faithfully a string summarizes an LLM’s internal answer distribution. We show that SelfReflect is able to discriminate even subtle differences between candidate summary strings and that it aligns with human judgement, outperforming alternative metrics such as LLM judges and embedding comparisons. With SelfReflect, we investigate a number of self-summarization methods and find that even state-of-the-art reasoning models struggle to explicate their internal uncertainty. But we find that faithful summarizations can be generated by sampling and summarizing. Our metric enables future work towards this universal form of LLM uncertainties.
nan
Article 1252
Title@2025-05-26 (1): Reasoning LLMs are Wandering Solution Explorers
Title: Reasoning LLMs are Wandering Solution Explorers | Grundlegende LLMs sind wandernde Lösungs-Explorer | 理据LLMs是游荡的解决方案探索者 2505.20296v1 |
Authors: Jiahao Lu, Ziwei Xu, Mohan Kankanhalli
Large Language Models (LLMs) have demonstrated impressive reasoning abilities through test-time computation (TTC) techniques such as chain-of-thought prompting and tree-based reasoning. However, we argue that current reasoning LLMs (RLLMs) lack the ability to systematically explore the solution space. This paper formalizes what constitutes systematic problem solving and identifies common failure modes that reveal reasoning LLMs to be wanderers rather than systematic explorers. Through qualitative and quantitative analysis across multiple state-of-the-art LLMs, we uncover persistent issues: invalid reasoning steps, redundant explorations, hallucinated or unfaithful conclusions, and so on. Our findings suggest that current models’ performance can appear to be competent on simple tasks yet degrade sharply as complexity increases. Based on the findings, we advocate for new metrics and tools that evaluate not just final outputs but the structure of the reasoning process itself.
nan
Article 1253
Title@2025-05-26 (1): Lorentz Local Canonicalization: How to Make Any Network Lorentz-Equivariant
Title: Lorentz Local Canonicalization: How to Make Any Network Lorentz-Equivariant | Lorentz lokale Canonicalization: Wie man jedes Netzwerk Lorentz-Equivariant | Lorentz 本地 Canonicalization : 如何制造任何网络 Lorentz- Equivalication 2505.20280v1 |
Authors: Jonas Spinner, Luigi Favaro, Peter Lippmann, Sebastian Pitz, Gerrit Gerhartz, Tilman Plehn, Fred A. Hamprecht
Lorentz-equivariant neural networks are becoming the leading architectures for high-energy physics. Current implementations rely on specialized layers, limiting architectural choices. We introduce Lorentz Local Canonicalization (LLoCa), a general framework that renders any backbone network exactly Lorentz-equivariant. Using equivariantly predicted local reference frames, we construct LLoCa-transformers and graph networks. We adapt a recent approach to geometric message passing to the non-compact Lorentz group, allowing propagation of space-time tensorial features. Data augmentation emerges from LLoCa as a special choice of reference frame. Our models surpass state-of-the-art accuracy on relevant particle physics tasks, while being $4\times$ faster and using $5$-$100\times$ fewer FLOPs.
nan
Article 1254
Title@2025-05-26 (1): Solving Hidden Monotone Variational Inequalities with Surrogate Losses
Title: Solving Hidden Monotone Variational Inequalities with Surrogate Losses | Lösen versteckter monotoner Variationsungleichheiten mit Surrogatverlusten | 解决与代谢损失的隐藏单式单体差异性不平等 2411.05228v3 |
Authors: Ryan D’Orazio, Danilo Vucetic, Zichu Liu, Junhyung Lyle Kim, Ioannis Mitliagkas, Gauthier Gidel
Deep learning has proven to be effective in a wide variety of loss minimization problems. However, many applications of interest, like minimizing projected Bellman error and min-max optimization, cannot be modelled as minimizing a scalar loss function but instead correspond to solving a variational inequality (VI) problem. This difference in setting has caused many practical challenges, as naive gradient-based approaches from supervised learning tend to diverge and cycle in the VI case. In this work, we propose a principled surrogate-based approach compatible with deep learning to solve VIs. We show that our surrogate-based approach has three main benefits: (1) under assumptions that are realistic in practice (hidden monotone structure, interpolation, and sufficient optimization of the surrogates), it guarantees convergence, (2) it provides a unifying perspective of existing methods, and (3) it is amenable to existing deep learning optimizers like ADAM. Experimentally, we demonstrate that our surrogate-based approach is effective in min-max optimization and minimizing projected Bellman error. Furthermore, in the deep reinforcement learning case, we propose a novel variant of TD(0) which is more compute- and sample-efficient.
nan
Article 1255
Title@2025-05-26 (1): The Coverage Principle: A Framework for Understanding Compositional Generalization
Title: The Coverage Principle: A Framework for Understanding Compositional Generalization | Das Coverage-Prinzip: Ein Rahmen für das Verständnis der kompositorischen Verallgemeinerung | 覆盖范围原则:理解普遍组成框架 2505.20278v1 |
Authors: Hoyeon Chang, Jinho Park, Hanseul Cho, Sohee Yang, Miyoung Ko, Hyeonbin Hwang, Seungpil Won, Dohaeng Lee, Youbin Ahn, Minjoon Seo
Large language models excel at pattern matching, yet often fall short in systematic compositional generalization. We propose the coverage principle: a data-centric framework showing that models relying primarily on pattern matching for compositional tasks cannot reliably generalize beyond substituting fragments that yield identical results when used in the same contexts. We demonstrate that this framework has strong predictive power for the generalization capabilities of Transformers. First, we derive and empirically confirm that the training data required for two-hop generalization grows at least quadratically with the token set size, and the training data efficiency does not improve with 20x parameter scaling. Second, for compositional tasks with path ambiguity where one variable affects the output through multiple computational paths, we show that Transformers learn context-dependent state representations that undermine both performance and interoperability. Third, Chain-of-Thought supervision improves training data efficiency for multi-hop tasks but still struggles with path ambiguity. Finally, we outline a mechanism-based taxonomy that distinguishes three ways neural networks can generalize: structure-based (bounded by coverage), property-based (leveraging algebraic invariances), and shared-operator (through function reuse). This conceptual lens contextualizes our results and highlights where new architectural ideas are needed to achieve systematic compositionality. Overall, the coverage principle provides a unified lens for understanding compositional reasoning, and underscores the need for fundamental architectural or training innovations to achieve truly systematic compositionality.
nan
Article 1256
Title@2025-05-26 (1): Probabilistic Kernel Function for Fast Angle Testing
Title: Probabilistic Kernel Function for Fast Angle Testing | Probabilistische Kernel-Funktion für schnelle Winkelprüfung | 用于快速角测试的概率内核函数 2505.20274v1 |
Authors: Kejing Lu, Chuan Xiao, Yoshiharu Ishikawa
In this paper, we study the angle testing problem in high-dimensional Euclidean spaces and propose two projection-based probabilistic kernel functions, one designed for angle comparison and the other for angle thresholding. Unlike existing approaches that rely on random projection vectors drawn from Gaussian distributions, our approach leverages reference angles and employs a deterministic structure for the projection vectors. Notably, our kernel functions do not require asymptotic assumptions, such as the number of projection vectors tending to infinity, and can be both theoretically and experimentally shown to outperform Gaussian-distribution-based kernel functions. We further apply the proposed kernel function to Approximate Nearest Neighbor Search (ANNS) and demonstrate that our approach achieves 2.5x to 3x higher queries-per-second (QPS) throughput compared to the state-of-the-art graph-based search algorithm HNSW.
nan
Article 1257
Title@2025-05-26 (1): Comparing Neural Network Encodings for Logic-based Explainability
Title: Comparing Neural Network Encodings for Logic-based Explainability | Vergleich von Neural Network Encodings für Logic-basierte Erklärbarkeit | 比较基于逻辑的解释性神经网络编码 2505.20269v1 |
Authors: Levi Cordeiro Carvalho, Saulo A. F. Oliveira, Thiago Alves Rocha
Providing explanations for the outputs of artificial neural networks (ANNs) is crucial in many contexts, such as critical systems, data protection laws, and handling adversarial examples. Logic-based methods can offer explanations with correctness guarantees, but face scalability challenges. Due to these issues, it is necessary to compare different encodings of ANNs into logical constraints, which are used in logic-based explainability. This work compares two encodings of ANNs: one has been used in the literature to provide explanations, while the other is adapted here for our context of explainability. Additionally, the second encoding uses fewer variables and constraints, thus potentially enhancing efficiency. Experiments showed similar running times for computing explanations, but the adapted encoding performed up to 18% better in building logical constraints and up to 16% better in overall time.
nan
Article 1258
Title@2025-05-26 (1): Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits
Title: Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits | Ergebnisbasiertes Online-Verstärkungslernen: Algorithmen und grundlegende Grenzen | 基于成果的在线强化学习:等级和基本限制 2505.20268v1 |
Authors: Fan Chen, Zeyu Jia, Alexander Rakhlin, Tengyang Xie
Reinforcement learning with outcome-based feedback faces a fundamental challenge: when rewards are only observed at trajectory endpoints, how do we assign credit to the right actions? This paper provides the first comprehensive analysis of this problem in online RL with general function approximation. We develop a provably sample-efficient algorithm achieving $\widetilde{O}({C_{\rm cov} H^3}/{\epsilon^2})$ sample complexity, where $C_{\rm cov}$ is the coverability coefficient of the underlying MDP. By leveraging general function approximation, our approach works effectively in large or infinite state spaces where tabular methods fail, requiring only that value functions and reward functions can be represented by appropriate function classes. Our results also characterize when outcome-based feedback is statistically separated from per-step rewards, revealing an unavoidable exponential separation for certain MDPs. For deterministic MDPs, we show how to eliminate the completeness assumption, dramatically simplifying the algorithm. We further extend our approach to preference-based feedback settings, proving that equivalent statistical efficiency can be achieved even under more limited information. Together, these results constitute a theoretical foundation for understanding the statistical properties of outcome-based reinforcement learning.
nan
Article 1259
Title@2025-05-26 (1): syftr: Pareto-Optimal Generative AI
Title: syftr: Pareto-Optimal Generative AI | syftr: Pareto-Optimal Generative KI | Syftr: Pareto- Opmatimal 生成 AI 2505.20266v1 |
Authors: Alexander Conway, Debadeepta Dey, Stefan Hackmann, Matthew Hausknecht, Michael Schmidt, Mark Steadman, Nick Volynets
Retrieval-Augmented Generation (RAG) pipelines are central to applying large language models (LLMs) to proprietary or dynamic data. However, building effective RAG flows is complex, requiring careful selection among vector databases, embedding models, text splitters, retrievers, and synthesizing LLMs. The challenge deepens with the rise of agentic paradigms. Modules like verifiers, rewriters, and rerankers, each with intricate hyperparameter dependencies, have to be carefully tuned. Balancing tradeoffs between latency, accuracy, and cost becomes increasingly difficult in performance-sensitive applications. We introduce syftr, a framework that performs efficient multi-objective search over a broad space of agentic and non-agentic RAG configurations. Using Bayesian Optimization, syftr discovers Pareto-optimal flows that jointly optimize task accuracy and cost. A novel early-stopping mechanism further improves efficiency by pruning clearly suboptimal candidates. Across multiple RAG benchmarks, syftr finds flows which are on average approximately 9 times cheaper while preserving most of the accuracy of the most accurate flows on the Pareto frontier. Furthermore, syftr’s ability to design and optimize allows integrating new modules, making it even easier and faster to realize high-performing generative AI pipelines.
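To make the notion of Pareto-optimal flows concrete, the sketch below filters a list of candidate configurations down to those that no other candidate beats on both accuracy and cost. It shows only the dominance check, not syftr's Bayesian Optimization search or its early-stopping mechanism; the function name, flow names, and numbers are hypothetical.

```python
def pareto_front(candidates):
    """Return the (name, accuracy, cost) tuples that are not dominated.

    A candidate is dominated if some other candidate has accuracy at least
    as high and cost at least as low, with a strict improvement in one of them.
    """
    front = []
    for name, accuracy, cost in candidates:
        dominated = any(
            (other_acc >= accuracy and other_cost <= cost)
            and (other_acc > accuracy or other_cost < cost)
            for _, other_acc, other_cost in candidates
        )
        if not dominated:
            front.append((name, accuracy, cost))
    # Sort by cost so the accuracy/cost trade-off reads left to right.
    return sorted(front, key=lambda item: item[2])


# Hypothetical flows: (name, task accuracy, cost per query in arbitrary units).
flows = [
    ("flow_a", 0.90, 10.0),
    ("flow_b", 0.88, 1.2),
    ("flow_c", 0.70, 1.5),
    ("flow_d", 0.60, 0.3),
    ("flow_e", 0.85, 4.0),
]
front = pareto_front(flows)
front_names = [name for name, _, _ in front]
```

In this made-up example, `flow_b` gives up two points of accuracy relative to `flow_a` at roughly an eighth of the cost, which is the kind of trade-off a Pareto frontier exposes. The pairwise check is quadratic in the number of candidates, which is adequate for small candidate sets.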
nan
Article 1260
Title@2025-05-26 (1): Lifelong Safety Alignment for Language Models
Title: Lifelong Safety Alignment for Language Models | Lebenslange Sicherheitsausrichtung für Sprachmodelle | 语言模型终身安全比对 2505.20259v1 |
Authors: Haoyu Wang, Zeyu Qin, Yifei Zhao, Chao Du, Min Lin, Xueqian Wang, Tianyu Pang
LLMs have made impressive progress, but their growing capabilities also expose them to highly flexible jailbreaking attacks designed to bypass safety alignment. While many existing defenses focus on known types of attacks, it is more critical to prepare LLMs for unseen attacks that may arise during deployment. To address this, we propose a lifelong safety alignment framework that enables LLMs to continuously adapt to new and evolving jailbreaking strategies. Our framework introduces a competitive setup between two components: a Meta-Attacker, trained to actively discover novel jailbreaking strategies, and a Defender, trained to resist them. To effectively warm up the Meta-Attacker, we first leverage the GPT-4o API to extract key insights from a large collection of jailbreak-related research papers. Through iterative training, the first iteration Meta-Attacker achieves a 73% attack success rate (ASR) on RR and a 57% transfer ASR on LAT using only single-turn attacks. Meanwhile, the Defender progressively improves its robustness and ultimately reduces the Meta-Attacker’s success rate to just 7%, enabling safer and more reliable deployment of LLMs in open-ended environments. The code is available at https://github.com/sail-sg/LifelongSafetyAlignment.
nan
Article 1261
Title@2025-05-26 (1): GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining
Title: GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining | GRAPE: Optimierung der Datenmischung für ein robustes Multi-Target-Adaptives Vortraining | GRAPE: 优化集体强力多目标适应性预备培训的数据混合 2505.20380v1 |
Authors: Simin Fan, Maria Ios Glarou, Martin Jaggi
The performance of large language models (LLMs) across diverse downstream applications is fundamentally governed by the quality and composition of their pretraining corpora. Existing domain reweighting algorithms primarily optimize data mixtures for a single target task, thereby resulting in models that overfit to specialized objectives while exhibiting substantial performance degradation on other benchmarks. This paper introduces Group Robust Multi-target Adaptive PrEtraining (GRAPE), a novel multi-source-multi-target domain reweighting framework designed to calibrate pretraining data mixtures for robust performance across multiple target tasks simultaneously. GRAPE dynamically adjusts sampling weights across source domains (domain weights) while concurrently modulating task weights that quantify the relative importance of each individual target task. This adaptive process prioritizes tasks based on their learning difficulty throughout training. We formulate this interleaved reweighting mechanism as a minimax optimization problem: the inner maximization adjusts task weights using group distributionally robust optimization (DRO), where the tasks showing the least improvement under the current data mixture are prioritized with higher weights; the outer minimization then optimizes domain weights to maximize loss reduction on the prioritized tasks. Experiments on ClimbLab and SlimPajama datasets demonstrate that GRAPE consistently outperforms baseline methods in terms of reasoning performance across 6 benchmarks. Furthermore, when applied to multilingual targets, GRAPE effectively identifies optimal training mixtures from mainstream languages, achieving superior language modeling capabilities across 8 low-resource target languages.
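The inner step described above, which raises the weight of tasks that improve least, resembles the multiplicative-weights update commonly used in group DRO. The abstract does not give GRAPE's exact update rule, so the following is a generic sketch under that assumption; the function name, the `lack_of_progress` scores, and the step size are hypothetical.

```python
import math


def update_task_weights(weights, lack_of_progress, step_size=1.0):
    """One multiplicative-weights ascent step on the probability simplex.

    Tasks with a larger lack-of-progress score (assumed here to measure how
    little a task improved under the current data mixture) receive
    exponentially more weight, and the result is renormalised to sum to one.
    """
    scaled = [
        w * math.exp(step_size * score)
        for w, score in zip(weights, lack_of_progress)
    ]
    total = sum(scaled)
    return [s / total for s in scaled]


# Three hypothetical target tasks, starting from uniform weights.
task_weights = [1 / 3, 1 / 3, 1 / 3]
lack_of_progress = [0.1, 0.5, 0.9]  # the third task improved the least
new_weights = update_task_weights(task_weights, lack_of_progress)
```

In a full minimax loop, an outer step would then adjust the domain weights to reduce loss on the tasks that now carry the most weight; that part is omitted here because it depends on details not stated in the abstract.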
nan
Article 1262
Title@2025-05-26 (1): Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs
Title: Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs | Position: Mechanische Dolmetschbarkeit sollte Feature-Konsistenz in SAEs priorisieren | 位置: 机械可解释性:应优先考虑高级专业环境评估中的地物一致性 2505.20254v1 |
Authors: Xiangchen Song, Aashiq Muhamed, Yujia Zheng, Lingjing Kong, Zeyu Tang, Mona T. Diab, Virginia Smith, Kun Zhang
Sparse Autoencoders (SAEs) are a prominent tool in mechanistic interpretability (MI) for decomposing neural network activations into interpretable features. However, the aspiration to identify a canonical set of features is challenged by the observed inconsistency of learned SAE features across different training runs, undermining the reliability and efficiency of MI research. This position paper argues that mechanistic interpretability should prioritize feature consistency in SAEs – the reliable convergence to equivalent feature sets across independent runs. We propose using the Pairwise Dictionary Mean Correlation Coefficient (PW-MCC) as a practical metric to operationalize consistency and demonstrate that high levels are achievable (0.80 for TopK SAEs on LLM activations) with appropriate architectural choices. Our contributions include detailing the benefits of prioritizing consistency; providing theoretical grounding and synthetic validation using a model organism, which verifies PW-MCC as a reliable proxy for ground-truth recovery; and extending these findings to real-world LLM data, where high feature consistency strongly correlates with the semantic similarity of learned feature explanations. We call for a community-wide shift towards systematically measuring feature consistency to foster robust cumulative progress in MI.
nan
Article 1263
Title@2025-05-26 (1): Unveiling AI’s Blind Spots: An Oracle for In-Domain, Out-of-Domain, and Adversarial Errors
Title: Unveiling AI’s Blind Spots: An Oracle for In-Domain, Out-of-Domain, and Adversarial Errors | Enthüllen der Blind-Spots von KI: Ein Oracle für In-Domain-, Out-of-Domain- und Adversarial-Fehler | 大赦国际不懈的《盲人点:内地、外地和反向错误的甲骨文》 2410.02384v3 |
Authors: Shuangpeng Han, Mengmi Zhang
AI models make mistakes when recognizing images, whether in-domain, out-of-domain, or adversarial. Predicting these errors is critical for improving system reliability, reducing costly mistakes, and enabling proactive corrections in real-world applications such as healthcare, finance, and autonomous systems. However, understanding what mistakes AI models make, why they occur, and how to predict them remains an open challenge. Here, we conduct comprehensive empirical evaluations using a “mentor” model, a deep neural network designed to predict another “mentee” model’s errors. Our findings show that the mentor excels at learning from a mentee’s mistakes on adversarial images with small perturbations and generalizes effectively to predict in-domain and out-of-domain errors of the mentee. Additionally, transformer-based mentor models excel at predicting errors across various mentee architectures. Subsequently, we draw insights from these observations and develop an “oracle” mentor model, dubbed SuperMentor, that can outperform baseline mentors in predicting errors across different error types from the ImageNet-1K dataset. Our framework paves the way for future research on anticipating and correcting AI model behaviors, ultimately increasing trust in AI systems.
nan
Article 1264
Title@2025-05-26 (1): Learning Extrapolative Sequence Transformations from Markov Chains
Title: Learning Extrapolative Sequence Transformations from Markov Chains | Extrapolative Sequenztransformationen von Markov-Ketten lernen | 来自Markov 链条的学习外推序列变换 2505.20251v1 |
Authors: Sophia Hager, Aleem Khan, Andrew Wang, Nicholas Andrews
Most successful applications of deep learning involve similar training and test conditions. However, tasks such as biological sequence design involve searching for sequences that improve desirable properties beyond previously known values, which requires novel hypotheses that extrapolate beyond training data. In these settings, extrapolation may be achieved by using random search methods such as Markov chain Monte Carlo (MCMC), which, given an initial state, sample local transformations to approximate a target density that rewards states with the desired properties. However, even with a well-designed proposal, MCMC may struggle to explore large structured state spaces efficiently. Rather than relying on stochastic search, it would be desirable to have a model that greedily optimizes the properties of interest, successfully extrapolating in as few steps as possible. We propose to learn such a model from the Markov chains resulting from MCMC search. Specifically, our approach uses selected states from Markov chains as a source of training data for an autoregressive model, which is then able to efficiently generate novel sequences that extrapolate along the sequence-level properties of interest. The proposed approach is validated on three problems: protein sequence design, text sentiment control, and text anonymization. We find that the autoregressive model can extrapolate as well as or better than MCMC, but with the additional benefits of scalability and significantly higher sample efficiency.
nan
Article 1265
Title@2025-05-26 (1): On the Guidance of Flow Matching
Title: On the Guidance of Flow Matching | Über die Anleitung von Flow Matching | 流动配对指南 2502.02150v3 |
Authors: Ruiqi Feng, Chenglei Yu, Wenhao Deng, Peiyan Hu, Tailin Wu
Flow matching has shown state-of-the-art performance in various generative tasks, ranging from image generation to decision-making, where generation under energy guidance (abbreviated as guidance in the following) is pivotal. However, the guidance of flow matching is more general than and thus substantially different from that of its predecessor, diffusion models. Therefore, the challenge in guidance for general flow matching remains largely underexplored. In this paper, we propose the first framework of general guidance for flow matching. From this framework, we derive a family of guidance techniques that can be applied to general flow matching. These include a new training-free asymptotically exact guidance, novel training losses for training-based guidance, and two classes of approximate guidance that cover classical gradient guidance methods as special cases. We theoretically investigate these different methods to give a practical guideline for choosing suitable methods in different scenarios. Experiments on synthetic datasets, image inverse problems, and offline reinforcement learning demonstrate the effectiveness of our proposed guidance methods and verify the correctness of our flow matching guidance framework. Code to reproduce the experiments can be found at https://github.com/AI4Science-WestlakeU/flow_guidance.
nan
Article 1266
Title@2025-05-26 (1): TACO: Training-free Sound Prompted Segmentation via Semantically Constrained Audio-visual CO-factorization
Title: TACO: Training-free Sound Prompted Segmentation via Semantically Constrained Audio-visual CO-factorization | TACO: Schulungsfreie Klang-Prompt-Segmentierung über semantisch eingeschränkte Audio-visuelle CO-Fabrizierung | TACO:通过模拟压缩培训的视听共同推动因素,进行无培训、无培训的音频快速分割 2412.01488v3 |
Authors: Hugo Malard, Michel Olvera, Stephane Lathuiliere, Slim Essid
Large-scale pre-trained audio and image models demonstrate an unprecedented degree of generalization, making them suitable for a wide range of applications. Here, we tackle the specific task of sound-prompted segmentation, aiming to segment image regions corresponding to objects heard in an audio signal. Most existing approaches tackle this problem by fine-tuning pre-trained models or by training additional modules specifically for the task. We adopt a different strategy: we introduce a training-free approach that leverages Non-negative Matrix Factorization (NMF) to co-factorize audio and visual features from pre-trained models so as to reveal shared interpretable concepts. These concepts are passed on to an open-vocabulary segmentation model for precise segmentation maps. By using frozen pre-trained models, our method achieves high generalization and establishes state-of-the-art performance in unsupervised sound-prompted segmentation, significantly surpassing previous unsupervised methods.
nan
Article 1267
Title@2025-05-26 (1): Efficient Optimization Accelerator Framework for Multistate Ising Problems
Title: Efficient Optimization Accelerator Framework for Multistate Ising Problems | Effizientes Optimierungs-Beschleuniger-Framework für Multistate Ising-Probleme | 高效高效优化多州化问题加速加速框架 2505.20250v1 |
Authors: Chirag Garg, Sayeef Salahuddin
Ising Machines are a prominent class of hardware architectures that aim to solve NP-hard combinatorial optimization problems. These machines consist of a network of interacting binary spins/neurons that evolve to represent the optimum ground state energy solution. Generally, combinatorial problems are transformed into quadratic unconstrained binary optimization (QUBO) form to harness the computational efficiency of these Ising machines. However, this transformation, especially for multi-state problems, often leads to a more complex exploration landscape than the original problem, thus severely impacting the solution quality. To address this challenge, we model the spin interactions as a generalized boolean logic function to significantly reduce the exploration space. We benchmark the graph coloring problem from the class of multi-state NP-hard optimization using probabilistic Ising solvers to illustrate the effectiveness of our framework. The proposed methodology achieves similar accuracy compared to state-of-the-art heuristics and machine learning algorithms, and demonstrates significant improvement over the existing Ising methods. Additionally, we demonstrate that combining parallel tempering with our existing framework further reduces the coloring error by up to 50% compared to the conventionally used Gibbs sampling algorithm. We also design a 1024-neuron all-to-all connected probabilistic Ising accelerator that shows up to 10000x performance acceleration compared to heuristics while reducing the number of required physical neurons by 1.5-4x compared to conventional Ising machines. Indeed, this accelerator solution demonstrates improvement across all metrics over the current methods, i.e., energy, performance, area, and solution quality. Thus, this work expands the potential of existing Ising hardware to solve a broad class of these multistate optimization problems.
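To make the QUBO transformation mentioned above concrete, the sketch below writes graph coloring in the textbook one-hot binary encoding: one penalty forces each node to take exactly one color, another penalizes adjacent nodes sharing a color. This is the conventional encoding the abstract contrasts against, not the authors' generalized Boolean-logic formulation, and all names are illustrative assumptions.

```python
def coloring_qubo_energy(x, edges, n_colors):
    """Energy of a one-hot binary assignment; x[v][c] = 1 if node v has colour c.

    The energy is 0 exactly when every node has one colour and no edge
    joins two nodes of the same colour.
    """
    one_hot_penalty = sum((1 - sum(row)) ** 2 for row in x)
    conflict_penalty = sum(
        x[u][c] * x[v][c] for u, v in edges for c in range(n_colors)
    )
    return one_hot_penalty + conflict_penalty


triangle = [(0, 1), (1, 2), (0, 2)]
proper = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # valid 3-colouring
clash = [[1, 0, 0], [1, 0, 0], [0, 0, 1]]    # nodes 0 and 1 share a colour
invalid = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]  # node 0 holds two colours at once

for name, state in [("proper", proper), ("clash", clash), ("invalid", invalid)]:
    print(name, coloring_qubo_energy(state, triangle, 3))
```

For a triangle with three colors, the binary encoding has 2^9 = 512 states, most of which (like `invalid`) correspond to no coloring at all, whereas a native multistate representation has only 3^3 = 27. That gap is one way to read the "more complex exploration landscape" the abstract refers to.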
nan
Article 1268
Title@2025-05-26 (1): RedAHD: Reduction-Based End-to-End Automatic Heuristic Design with Large Language Models
Title: RedAHD: Reduction-Based End-to-End Automatic Heuristic Design with Large Language Models | RedAHD: Reduktionsbasiertes, End-to-End-Automatisches Heuristisches Design mit großen Sprachmodellen | REDAHD: 具有大语言模型的后端至后端自动超量设计 2505.20242v1 |
Authors: Nguyen Thach, Aida Riahifar, Nathan Huynh, Hau Chan
Solving NP-hard combinatorial optimization problems (COPs) (e.g., traveling salesman problems (TSPs) and capacitated vehicle routing problems (CVRPs)) in practice traditionally involves handcrafting heuristics or specifying a search space for finding effective heuristics. The main challenges with these approaches, however, are the sheer amount of domain knowledge and implementation effort required from human experts. Recently, significant progress has been made to address these challenges, particularly by using large language models (LLMs) to design heuristics within some predetermined generalized algorithmic framework (GAF, e.g., ant colony optimization and guided local search) for building key functions/components (e.g., a priori information on how promising it is to include each edge in a solution for TSP and CVRP). Although existing methods leveraging this idea have been shown to yield impressive optimization performance, they are not fully end-to-end and still require considerable manual intervention. In this paper, we propose a novel end-to-end framework, named RedAHD, that enables these LLM-based heuristic design methods to operate without the need for GAFs. More specifically, RedAHD employs LLMs to automate the process of reduction, i.e., transforming the COP at hand into similar COPs that are better understood, from which LLM-based heuristic design methods can design effective heuristics for directly solving the transformed COPs and, in turn, indirectly solving the original COP. Our experimental results, evaluated on six COPs, show that RedAHD is capable of designing heuristics with competitive or improved results over the state-of-the-art methods with minimal human involvement.
nan
Article 1269
Title@2025-05-26 (1): DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning
Title: DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning | DreamPRM: Domain-regewichtetes Prozess-Reward-Modell für multimodale Vernunft | DreamPRM: 多边理由解释的负重评分进程奖励模式 2505.20241v1 |
Authors: Qi Cao, Ruiyi Wang, Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie
Reasoning has substantially improved the performance of large language models (LLMs) on complicated tasks. Central to the current reasoning studies, Process Reward Models (PRMs) offer a fine-grained evaluation of intermediate reasoning steps and guide the reasoning process. However, extending PRMs to multimodal large language models (MLLMs) introduces challenges. Since multimodal reasoning covers a wider range of tasks compared to text-only scenarios, the resulting distribution shift from the training to testing sets is more severe, leading to greater generalization difficulty. Training a reliable multimodal PRM, therefore, demands large and diverse datasets to ensure sufficient coverage. However, current multimodal reasoning datasets suffer from a marked quality imbalance, which degrades PRM performance and highlights the need for an effective data selection strategy. To address these issues, we introduce DreamPRM, a domain-reweighted training framework for multimodal PRMs which employs bi-level optimization. In the lower-level optimization, DreamPRM performs fine-tuning on multiple datasets with domain weights, allowing the PRM to prioritize high-quality reasoning signals and alleviating the impact of dataset quality imbalance. In the upper-level optimization, the PRM is evaluated on a separate meta-learning dataset; this feedback updates the domain weights through an aggregation loss function, thereby improving the generalization capability of the trained PRM. Extensive experiments on multiple multimodal reasoning benchmarks covering both mathematical and general reasoning show that test-time scaling with DreamPRM consistently improves the performance of state-of-the-art MLLMs. Further comparisons reveal that DreamPRM’s domain-reweighting strategy surpasses other data selection methods and yields higher accuracy gains than existing test-time scaling approaches.
nan
Article 1270
Title@2025-05-26 (1): SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems
Title: SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems | SITCOM: Triple-Consistent Diffusions-Probenahme für inverse Probleme | SITCOM: 反问题递进三联扩散抽样 2410.04479v2 |
Authors: Ismail Alkhouri, Shijun Liang, Cheng-Han Huang, Jimmy Dai, Qing Qu, Saiprasad Ravishankar, Rongrong Wang
Diffusion models (DMs) are a class of generative models that allow sampling from a distribution learned over a training set. When applied to solving inverse problems, the reverse sampling steps are modified to approximately sample from a measurement-conditioned distribution. However, these modifications may be unsuitable for certain settings (e.g., presence of measurement noise) and non-linear tasks, as they often struggle to correct errors from earlier steps and generally require a large number of optimization and/or sampling steps. To address these challenges, we state three conditions for achieving measurement-consistent diffusion trajectories. Building on these conditions, we propose a new optimization-based sampling method that not only enforces standard data manifold measurement consistency and forward diffusion consistency, as seen in previous studies, but also incorporates our proposed step-wise and network-regularized backward diffusion consistency that maintains a diffusion trajectory by optimizing over the input of the pre-trained model at every sampling step. By enforcing these conditions (implicitly or explicitly), our sampler requires significantly fewer reverse steps. Therefore, we refer to our method as Step-wise Triple-Consistent Sampling (SITCOM). Compared to SOTA baselines, our experiments across several linear and non-linear tasks (with natural and medical images) demonstrate that SITCOM achieves competitive or superior results in terms of standard similarity metrics and run-time.
nan
Article 1271
Title@2025-05-26 (1): A Temporal Difference Method for Stochastic Continuous Dynamics
Title: A Temporal Difference Method for Stochastic Continuous Dynamics | Eine zeitliche Differenzmethode für stochastische kontinuierliche Dynamik | 存储连续动态的时差方法 2505.15544v3 |
Authors: Haruki Settai, Naoya Takeishi, Takehisa Yairi
For continuous systems modeled by dynamical equations such as ODEs and SDEs, Bellman’s principle of optimality takes the form of the Hamilton-Jacobi-Bellman (HJB) equation, which provides the theoretical target of reinforcement learning (RL). Although recent advances in RL successfully leverage this formulation, the existing methods typically assume the underlying dynamics are known a priori because they need explicit access to the coefficient functions of dynamical equations to update the value function following the HJB equation. We address this inherent limitation of HJB-based RL; we propose a model-free approach still targeting the HJB equation and propose the corresponding temporal difference method. We demonstrate its potential advantages over transition kernel-based formulations, both qualitatively and empirically. The proposed formulation paves the way toward bridging stochastic optimal control and model-free reinforcement learning.
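For reference, one standard form of the HJB equation mentioned above is sketched here for an infinite-horizon discounted problem. The notation is an assumption made for illustration and is not taken from the paper: a controlled SDE with drift $f$ and diffusion $\sigma$, reward $r$, discount rate $\rho$, and value function $V$.

```latex
% Controlled dynamics (assumed notation):
%   dX_t = f(X_t, a_t)\,dt + \sigma(X_t, a_t)\,dW_t
\rho\, V(x) \;=\; \max_{a}\Big[\, r(x,a)
    \;+\; f(x,a)^{\top} \nabla V(x)
    \;+\; \tfrac{1}{2}\,\operatorname{tr}\!\big(\sigma(x,a)\,\sigma(x,a)^{\top}\, \nabla^{2} V(x)\big) \Big]
```

The coefficient functions $f$ and $\sigma$ appear explicitly on the right-hand side, which is why methods that update $V$ directly from this equation need the dynamics to be known in advance.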
nan
Article 1272
Title@2025-05-26 (1): RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Title: RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning | RAGEN: Selbst-Evolution in LLM-Agenten durch Multi-Turn-Verstärkungs-Lernen verstehen | 通过多阶段强化学习了解LLM代理商的自我演变 2504.20073v2 |
Authors: Zihan Wang, Kangrui Wang, Qineng Wang, Pingyue Zhang, Linjie Li, Zhengyuan Yang, Xing Jin, Kefan Yu, Minh Nhat Nguyen, Licheng Liu, Eli Gottlieb, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li
Training large language models (LLMs) as interactive agents presents unique challenges including long-horizon decision making and interacting with stochastic environment feedback. While reinforcement learning (RL) has enabled progress in static tasks, multi-turn agent RL training remains underexplored. We propose StarPO (State-Thinking-Actions-Reward Policy Optimization), a general framework for trajectory-level agent RL, and introduce RAGEN, a modular system for training and evaluating LLM agents. Our study on four stylized environments reveals three core findings. First, our agent RL training shows a recurring mode, Echo Trap, marked by reward variance cliffs and gradient spikes; we address this with StarPO-S, a stabilized variant with trajectory filtering, critic incorporation, and gradient stabilization. Second, we find the shaping of RL rollouts would benefit from diverse initial states, medium interaction granularity and more frequent sampling. Third, we show that without fine-grained, reasoning-aware reward signals, agent reasoning hardly emerges through multi-turn RL, and agents may show shallow strategies or hallucinated thoughts. Code and environments are available at https://github.com/RAGEN-AI/RAGEN.
nan
Article 1273
Title@2025-05-26 (1): SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Title: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | SFT-Erinnerungen, RL Generalisiert: Eine vergleichende Studie des Stiftungsmodells nach der Ausbildung | SFT Memorizes,RL一般化:基金会培训模式模型比较研究 2501.17161v2 |
Authors: Tianzhe Chu, Yuexiang Zhai, Jihan Yang, Shengbang Tong, Saining Xie, Dale Schuurmans, Quoc V. Le, Sergey Levine, Yi Ma
Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models. However, their roles in enhancing model generalization capabilities remain unclear. This paper studies the difference between SFT and RL on generalization and memorization, focusing on text-based rule variants and visual variants. We introduce GeneralPoints, an arithmetic reasoning card game, and adopt V-IRL, a real-world navigation environment, to assess how models trained with SFT and RL generalize to unseen variants in both textual and visual domains. We show that RL, especially when trained with an outcome-based reward, generalizes across both rule-based textual and visual variants. SFT, in contrast, tends to memorize training data and struggles to generalize to out-of-distribution scenarios. Further analysis reveals that RL improves the model’s underlying visual recognition capabilities, contributing to its enhanced generalization in the visual domain. Despite RL’s superior generalization, we show that SFT remains essential for effective RL training; SFT stabilizes the model’s output format, enabling subsequent RL to achieve its performance gains. These findings demonstrate the capability of RL for acquiring generalizable knowledge in complex, multi-modal tasks.
nan
Article 1274
Title@2025-05-26 (1): Variational Deep Learning via Implicit Regularization
Title: Variational Deep Learning via Implicit Regularization | Variationales Deep Learning durch Implizite Regularisierung | 通过隐性规范化进行不同的深层学习 2505.20235v1 |
Authors: Jonathan Wenger, Beau Coker, Juraj Marusic, John P. Cunningham
Modern deep learning models generalize remarkably well in-distribution, despite being overparametrized and trained with little to no explicit regularization. Instead, current theory credits implicit regularization imposed by the choice of architecture, hyperparameters and optimization procedure. However, deploying deep learning models out-of-distribution, in sequential decision-making tasks, or in safety-critical domains, necessitates reliable uncertainty quantification, not just a point estimate. The machinery of modern approximate inference – Bayesian deep learning – should answer the need for uncertainty quantification, but its effectiveness has been challenged by our inability to define useful explicit inductive biases through priors, as well as the associated computational burden. Instead, in this work we demonstrate, both theoretically and empirically, how to regularize a variational deep network implicitly via the optimization procedure, just as for standard deep learning. We fully characterize the inductive bias of (stochastic) gradient descent in the case of an overparametrized linear model as generalized variational inference and demonstrate the importance of the choice of parametrization. Finally, we show empirically that our approach achieves strong in- and out-of-distribution performance without tuning of additional hyperparameters and with minimal time and memory overhead over standard deep learning.
nan
Article 1275
Title@2025-05-26 (1): Multimodal Federated Learning With Missing Modalities through Feature Imputation Network
Title: Multimodal Federated Learning With Missing Modalities through Feature Imputation Network | Multimodales Federated Learning mit fehlenden Modalitäten durch Feature Imputation Network | 通过特征截肢网络以失踪模式进行多模式联邦学习 2505.20232v1 |
Authors: Pranav Poudel, Aavash Chhetri, Prashnna Gyawali, Georgios Leontidis, Binod Bhattarai
Multimodal federated learning holds immense potential for collaboratively training models from multiple sources without sharing raw data, addressing both data scarcity and privacy concerns, two key challenges in healthcare. A major challenge in training multimodal federated models in healthcare is the presence of missing modalities due to multiple reasons, including variations in clinical practice, cost and accessibility constraints, retrospective data collection, privacy concerns, and occasional technical or human errors. Previous methods typically rely on publicly available real datasets or synthetic data to compensate for missing modalities. However, obtaining real datasets for every disease is impractical, and training generative models to synthesize missing modalities is computationally expensive and prone to errors due to the high dimensionality of medical data. In this paper, we propose a novel, lightweight, low-dimensional feature translator to reconstruct bottleneck features of the missing modalities. Experiments on three datasets (MIMIC-CXR, NIH Open-I, and CheXpert), in both homogeneous and heterogeneous settings, show that our method consistently improves the performance of competitive baselines. The code and implementation details are available at: https://github.com/bhattarailab/FedFeatGen
nan
Article 1276
Title@2025-05-26 (1): From What to How: Attributing CLIP’s Latent Components Reveals Unexpected Semantic Reliance
Title: From What to How: Attributing CLIP’s Latent Components Reveals Unexpected Semantic Reliance | Von was zu wie: Zuweisen von CLIPs latenten Komponenten zeigt ungeahnte semantische Zuverlässigkeit | 从何到如何: 将 CLIP 的内部部件流出异常的语义依赖性归结为 CLIP 的内部批量 。 2505.20229v1 |
Authors: Maximilian Dreyer, Lorenz Hufe, Jim Berend, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek
Transformer-based CLIP models are widely used for text-image probing and feature extraction, making it relevant to understand the internal mechanisms behind their predictions. While recent works show that Sparse Autoencoders (SAEs) yield interpretable latent components, they focus on what these encode and miss how they drive predictions. We introduce a scalable framework that reveals what latent components activate for, how they align with expected semantics, and how important they are to predictions. To achieve this, we adapt attribution patching for instance-wise component attributions in CLIP and highlight key faithfulness limitations of the widely used Logit Lens technique. By combining attributions with semantic alignment scores, we can automatically uncover reliance on components that encode semantically unexpected or spurious concepts. Applied across multiple CLIP variants, our method uncovers hundreds of surprising components linked to polysemous words, compound nouns, visual typography and dataset artifacts. While text embeddings remain prone to semantic ambiguity, they are more robust to spurious correlations compared to linear classifiers trained on image embeddings. A case study on skin lesion detection highlights how such classifiers can amplify hidden shortcuts, underscoring the need for holistic, mechanistic interpretability. We provide code at https://github.com/maxdreyer/attributing-clip.
nan
Article 1277
Title@2025-05-26 (1): FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models
Title: FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models | FLAME-MoE: Eine transparente End-to-End-Forschungsplattform für Mixture-of-Experts-Sprachmodelle | FLAME-MOE:混合专家语言模型透明端对端研究平台 2505.20225v1 |
Authors: Hao Kang, Zichun Yu, Chenyan Xiong
Recent large language models such as Gemini-1.5, DeepSeek-V3, and Llama-4 increasingly adopt Mixture-of-Experts (MoE) architectures, which offer strong efficiency-performance trade-offs by activating only a fraction of the model per token. Yet academic researchers still lack a fully open, end-to-end MoE platform for investigating scaling, routing, and expert behavior. We release FLAME-MoE, a completely open-source research suite composed of seven decoder-only models, ranging from 38M to 1.7B active parameters, whose architecture (64 experts with top-8 gating and 2 shared experts) closely reflects modern production LLMs. All training data pipelines, scripts, logs, and checkpoints are publicly available to enable reproducible experimentation. Across six evaluation tasks, FLAME-MoE improves average accuracy by up to 3.4 points over dense baselines trained with identical FLOPs. Leveraging full training trace transparency, we present initial analyses showing that (i) experts increasingly specialize on distinct token subsets, (ii) co-activation matrices remain sparse, reflecting diverse expert usage, and (iii) routing behavior stabilizes early in training. All code, training logs, and model checkpoints are available at https://github.com/cmu-flame/FLAME-MoE.
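The routing scheme named above (64 experts, top-8 gating) can be sketched as follows. This is a generic top-k softmax router written under assumptions: whether FLAME-MoE renormalizes the selected weights or applies the softmax before or after selection is not stated in the abstract, and `route_token` is a hypothetical name. The 2 shared experts would simply be applied to every token in addition to the routed ones.

```python
import math


def route_token(router_logits, top_k):
    """Pick the top_k experts for one token and return {expert_index: weight}.

    Weights are softmax probabilities renormalised over the chosen experts,
    so they sum to 1. Only these experts run for the token.
    """
    peak = max(router_logits)
    exps = [math.exp(z - peak) for z in router_logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


# Toy router output for a single token over 64 experts.
logits = [math.sin(i) for i in range(64)]
weights = route_token(logits, top_k=8)
print(sorted(weights))  # indices of the 8 active routed experts
```

With 8 of 64 routed experts active, only one eighth of the routed expert parameters are used per token, which is the source of the efficiency trade-off the abstract describes.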
nan
Article 1278
Title@2025-05-26 (1): Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects
Title: Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects | Chain-of-Thought für autonomes Fahren: Eine umfassende Umfrage und Zukunftsaussichten | 寻求自主驾驶:全面调查和未来前景 2505.20223v1 |
Authors: Yixin Cui, Haotian Lin, Shuo Yang, Yixiao Wang, Yanjun Huang, Hong Chen
The rapid evolution of large language models in natural language processing has substantially elevated their semantic understanding and logical reasoning capabilities. Such proficiencies have been leveraged in autonomous driving systems, contributing to significant improvements in system performance. Models such as OpenAI o1 and DeepSeek-R1 leverage Chain-of-Thought (CoT) reasoning, an advanced cognitive method that simulates human thinking processes, demonstrating remarkable reasoning capabilities in complex tasks. By structuring complex driving scenarios within a systematic reasoning framework, this approach has emerged as a prominent research focus in autonomous driving, substantially improving the system’s ability to handle challenging cases. This paper investigates how CoT methods improve the reasoning abilities of autonomous driving models. Based on a comprehensive literature review, we present a systematic analysis of the motivations, methodologies, challenges, and future research directions of CoT in autonomous driving. Furthermore, we propose the insight of combining CoT with self-learning to facilitate self-evolution in driving systems. To ensure the relevance and timeliness of this study, we have compiled a dynamic repository of literature and open-source projects, diligently updated to incorporate forefront developments. The repository is publicly available at https://github.com/cuiyx1720/Awesome-CoT4AD.
nan
Article 1279
Title@2025-05-26 (1): Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Title: Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction | Rollen Sie die Würfel & Blick, bevor Sie springen: Gehen über die kreativen Grenzen der Next-Token-Vorhersage | 跳跃前的骰子滚动和看一看:超越了次声预测的创造性极限 2504.15266v2 |
Authors: Vaishnavh Nagarajan, Chen Henry Wu, Charles Ding, Aditi Raghunathan
We design a suite of minimal algorithmic tasks that are a loose abstraction of open-ended real-world tasks. This allows us to cleanly and controllably quantify the creative limits of the present-day language model. Much like real-world tasks that require a creative, far-sighted leap of thought, our tasks require an implicit, open-ended stochastic planning step that either (a) discovers new connections in an abstract knowledge graph (as in wordplay, drawing analogies, or research) or (b) constructs new patterns (as in designing math problems or new proteins). In these tasks, we empirically and conceptually argue how next-token learning is myopic and memorizes excessively; multi-token approaches, namely teacherless training and diffusion models, comparatively excel in producing diverse and original output. Secondly, to elicit randomness without hurting coherence, we find that injecting noise at the input layer (dubbed seed-conditioning) works surprisingly as well as (and in some conditions, better than) temperature sampling from the output layer. Thus, our work offers a principled, minimal test-bed for analyzing open-ended creative skills, and offers new arguments for going beyond next-token learning and temperature sampling. We make part of the code available under https://github.com/chenwu98/algorithmic-creativity
nan
Article 1280
Title@2025-05-26 (1): Gradient Flow Matching for Learning Update Dynamics in Neural Network Training
Title: Gradient Flow Matching for Learning Update Dynamics in Neural Network Training | Gradient Flow Passend zum Lernen von Update-Dynamik im neuralen Netzwerktraining | 神经网络培训中学习更新动态动态的渐进流程匹配 2505.20221v1 |
Authors: Xiao Shou, Yanna Ding, Jianxi Gao
Training deep neural networks remains computationally intensive due to the iterative nature of gradient-based optimization. We propose Gradient Flow Matching (GFM), a continuous-time modeling framework that treats neural network training as a dynamical system governed by learned optimizer-aware vector fields. By leveraging conditional flow matching, GFM captures the underlying update rules of optimizers such as SGD, Adam, and RMSprop, enabling smooth extrapolation of weight trajectories toward convergence. Unlike black-box sequence models, GFM incorporates structural knowledge of gradient-based updates into the learning objective, facilitating accurate forecasting of final weights from partial training sequences. Empirically, GFM achieves forecasting accuracy that is competitive with Transformer-based models and significantly outperforms LSTM and other classical baselines. Furthermore, GFM generalizes across neural architectures and initializations, providing a unified framework for studying optimization dynamics and accelerating convergence prediction.
nan
Article 1281
Title@2025-05-26 (1): Open the Eyes of MPNN: Vision Enhances MPNN in Link Prediction
Title: Open the Eyes of MPNN: Vision Enhances MPNN in Link Prediction | Öffnen Sie die Augen von MPNN: Vision verbessert MPNN in Link Prediction | MPNNN的 “ 睁开眼 “ :愿景在 “ 连结预测 “ 中加强MPNN 2505.08266v2 |
Authors: Yanbin Wei, Xuehao Wang, Zhan Zhuang, Yang Chen, Shuhao Chen, Yulong Zhang, Yu Zhang, James Kwok
Message-passing graph neural networks (MPNNs) and structural features (SFs) are cornerstones for the link prediction task. However, as a common and intuitive mode of understanding, the potential of visual perception has been overlooked in the MPNN community. For the first time, we equip MPNNs with vision structural awareness by proposing an effective framework called Graph Vision Network (GVN), along with a more efficient variant (E-GVN). Extensive empirical results demonstrate that with the proposed frameworks, GVN consistently benefits from the vision enhancement across seven link prediction datasets, including challenging large-scale graphs. Such improvements are compatible with existing state-of-the-art (SOTA) methods and GVNs achieve new SOTA results, thereby underscoring a promising novel direction for link prediction.
nan
Article 1282
Title@2025-05-26 (1): New Perspectives on the Polyak Stepsize: Surrogate Functions and Negative Results
Title: New Perspectives on the Polyak Stepsize: Surrogate Functions and Negative Results | Neue Perspektiven auf die Polyak Stepsize: Surrogate-Funktionen und negative Ergebnisse | 关于 “ 多边步骤的新观点:代理功能和消极结果 “ 2505.20219v1 |
Authors: Francesco Orabona, Ryan D’Orazio
The Polyak stepsize has been proven to be a fundamental stepsize in convex optimization, giving near-optimal gradient descent rates across a wide range of assumptions. The universality of the Polyak stepsize has also inspired many stochastic variants, with theoretical guarantees and strong empirical performance. Despite the many theoretical results, our understanding of the convergence properties and shortcomings of the Polyak stepsize or its variants is both incomplete and fractured across different analyses. We propose a new, unified, and simple perspective for the Polyak stepsize and its variants as gradient descent on a surrogate loss. We show that each variant is equivalent to minimizing a surrogate function with stepsizes that adapt to a guaranteed local curvature. Our general surrogate loss perspective is then used to provide a unified analysis of existing variants across different assumptions. Moreover, we show a number of negative results proving that the non-convergence results in some of the upper bounds are indeed real.
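For orientation, the classical (deterministic) Polyak stepsize sets the step to (f(x) - f*) / ||g||^2, where g is the gradient at x and f* is the optimal value. The sketch below applies it to a toy convex quadratic; the function names and the test problem are illustrative assumptions, and the paper's surrogate-loss view and stochastic variants are not reproduced here.

```python
def polyak_gradient_descent(f, grad, x0, f_star, steps):
    """Gradient descent with the classical Polyak stepsize (f(x) - f*) / ||g||^2."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        g_norm_sq = sum(gi * gi for gi in g)
        if g_norm_sq == 0.0:  # already at a stationary point
            break
        eta = (f(x) - f_star) / g_norm_sq
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


# Toy problem (an assumption for illustration): f(x) = 0.5 * (x1^2 + 2 * x2^2),
# minimised at the origin with optimal value f* = 0.
def quadratic(x):
    return 0.5 * (x[0] ** 2 + 2.0 * x[1] ** 2)


def quadratic_grad(x):
    return [x[0], 2.0 * x[1]]


x_final = polyak_gradient_descent(
    quadratic, quadratic_grad, [3.0, -2.0], f_star=0.0, steps=100
)
print(quadratic(x_final))
```

The stepsize needs the optimal value f*, which is known here only because the toy problem was built that way.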
nan
Article 1283
Title@2025-05-26 (1): Fine-grained List-wise Alignment for Generative Medication Recommendation
Title: Fine-grained List-wise Alignment for Generative Medication Recommendation | Feinkörnige List-Wise-Ausrichtung für Generative Medikamente Empfehlung | 生产用药建议精制清单调整 2505.20218v1 |
Authors: Chenxiao Fan, Chongming Gao, Wentao Shi, Yaxin Gong, Zihao Zhao, Fuli Feng
Accurate and safe medication recommendations are critical for effective clinical decision-making, especially in multimorbidity cases. However, existing systems rely on point-wise prediction paradigms that overlook synergistic drug effects and potential adverse drug-drug interactions (DDIs). We propose FLAME, a fine-grained list-wise alignment framework for large language models (LLMs), enabling drug-by-drug generation of drug lists. FLAME formulates recommendation as a sequential decision process, where each step adds or removes a single drug. To provide fine-grained learning signals, we devise step-wise Group Relative Policy Optimization (GRPO) with potential-based reward shaping, which explicitly models DDIs and optimizes the contribution of each drug to the overall prescription. Furthermore, FLAME enhances patient modeling by integrating structured clinical knowledge and collaborative information into the representation space of LLMs. Experiments on benchmark datasets demonstrate that FLAME achieves state-of-the-art performance, delivering superior accuracy, controllable safety-accuracy trade-offs, and strong generalization across diverse clinical scenarios. Our code is available at https://github.com/cxfann/Flame.
nan
Article 1284
Title@2025-05-26 (1): Parameter-Efficient Fine-Tuning with Column Space Projection
Title: Parameter-Efficient Fine-Tuning with Column Space Projection | Parameter-Effizient Feintuning mit Säulenraumprojektion | 带有列空间投射的高效参数精密设计 2505.20211v1 |
Authors: Junseo Hwang, Wonguk Cho, Taesup Kim
Fine-tuning large language models (LLMs) with minimal computational overhead is essential for efficiently adapting them to downstream tasks under resource constraints. Parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), facilitate this by updating only a small subset of parameters. However, recent studies show that LoRA diverges from full fine-tuning (Full FT) in its learning behavior, particularly in terms of spectral properties. Motivated by these findings, we propose PiCa, the first theoretically grounded PEFT method based on the spectral properties of fine-tuned weights. PiCa projects gradients onto the low-rank column subspace of pre-trained weights and exhibits learning patterns more closely aligned with Full FT. Furthermore, we show that combining PiCa with weight sharing drastically reduces the number of trainable parameters without compromising performance, achieving better performance than LoRA with 13x fewer trainable parameters. Extensive experiments demonstrate that PiCa achieves state-of-the-art performance compared to existing PEFT methods.
nan
Article 1285
Title@2025-05-26 (1): FedECA: A Federated External Control Arm Method for Causal Inference with Time-To-Event Data in Distributed Settings
Title: FedECA: A Federated External Control Arm Method for Causal Inference with Time-To-Event Data in Distributed Settings | FedECA: Eine Federated External Control Arm Methode für ursächliche Schlussfolgerungen mit Zeit-bis-Event-Daten in verteilten Einstellungen | FedECA:在分布环境中利用时间到时间的数据进行因果关系推断的联邦外部控制武器法 2311.16984v9 |
Authors: Jean Ogier du Terrail, Quentin Klopfenstein, Honghao Li, Imke Mayer, Nicolas Loiseau, Mohammad Hallal, Michael Debouver, Thibault Camalon, Thibault Fouqueray, Jorge Arellano Castro, Zahia Yanes, Laëtitia Dahan, Julien Taïeb, Pierre Laurent-Puig, Jean-Baptiste Bachet, Shulin Zhao, Remy Nicolle, Jérome Cros, Daniel Gonzalez, Robert Carreras-Torres, Adelaida Garcia Velasco, Kawther Abdilleh, Sudheer Doss, Félix Balazard, Mathieu Andreux
External control arms (ECA) can inform the early clinical development of experimental drugs and provide efficacy evidence for regulatory approval. However, the main challenge in implementing ECA lies in accessing real-world or historical clinical trials data. Indeed, regulations protecting patients’ rights by strictly controlling data processing make pooling data from multiple sources in a central server often difficult. To address these limitations, we develop a new method, ‘FedECA’ that leverages federated learning (FL) to enable inverse probability of treatment weighting (IPTW) for time-to-event outcomes on separate cohorts without needing to pool data. To showcase the potential of FedECA, we apply it in different settings of increasing complexity culminating with a real-world use-case in which FedECA is used to compare the treatment effect of two approved chemotherapy regimens using data from three separate cohorts of patients with metastatic pancreatic cancer. By sharing our code, we hope FedECA will foster the creation of federated research networks and thus accelerate drug development.
nan
Article 1286
Title@2025-05-26 (1): Temporal Sampling for Forgotten Reasoning in LLMs
Title: Temporal Sampling for Forgotten Reasoning in LLMs | Zeitliche Probenahme für vergessene Vernunft in LLMs | LLM 被遗忘原因的时间抽样 2505.20196v1 |
Authors: Yuetai Li, Zhangchen Xu, Fengqing Jiang, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Xiang Yue, Radha Poovendran
Fine-tuning large language models (LLMs) is intended to improve their reasoning capabilities, yet we uncover a counterintuitive effect: models often forget how to solve problems they previously answered correctly during training. We term this phenomenon temporal forgetting and show that it is widespread across model sizes, fine-tuning methods (both Reinforcement Learning and Supervised Fine-Tuning), and multiple reasoning benchmarks. To address this gap, we introduce Temporal Sampling, a simple decoding strategy that draws outputs from multiple checkpoints along the training trajectory. This approach recovers forgotten solutions without retraining or ensembling, and leads to substantial improvements in reasoning performance, with gains of 4 to 19 points in Pass@k and consistent gains in Majority@k across several benchmarks. We further extend our method to LoRA-adapted models, demonstrating that storing only adapter weights across checkpoints achieves similar benefits with minimal storage cost. By leveraging the temporal diversity inherent in training, Temporal Sampling offers a practical, compute-efficient way to surface hidden reasoning ability and rethink how we evaluate LLMs.
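As a rough illustration of drawing outputs from several checkpoints, the sketch below spreads a budget of k samples over a list of checkpoints in round-robin order and then checks a Pass@k-style criterion. The round-robin allocation, the function names, and the two toy "checkpoints" are assumptions made for this example; the abstract does not specify how samples are distributed.

```python
def temporal_sample(checkpoints, prompt, k):
    """Draw k outputs, cycling round-robin over the given checkpoints."""
    return [checkpoints[i % len(checkpoints)](prompt) for i in range(k)]


def pass_at_k(samples, reference):
    """Return True if any sampled answer matches the reference."""
    return any(s == reference for s in samples)


# Hypothetical stand-ins for saved checkpoints: each maps a prompt to an answer.
# The earlier checkpoint still solves the problem; the final one has "forgotten" it.
def early_checkpoint(prompt):
    return "4"


def final_checkpoint(prompt):
    return "5"


final_only = temporal_sample([final_checkpoint], "2 + 2 = ?", k=4)
temporal = temporal_sample([final_checkpoint, early_checkpoint], "2 + 2 = ?", k=4)
print(pass_at_k(final_only, "4"), pass_at_k(temporal, "4"))
```

With the same budget of four samples, only the run that also consults the earlier checkpoint recovers the forgotten answer.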
nan
Article 1287
Title@2025-05-26 (1): FunReason: Enhancing Large Language Models’ Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement
Title: FunReason: Enhancing Large Language Models’ Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement | FunReason: Erweiterung der Funktion großer Sprachmodelle durch Multiscale-Verluste und automatisierte Datenverfeinerung durch Selbst-Refinement | FunReason:通过自我改进、多尺度损失和数据自动化改进加强大语言模型功能 2505.20192v1 |
Authors: Bingguang Hao, Maolin Wang, Zengzhuang Xu, Cunyin Peng, Yicheng Chen, Xiangyu Zhao, Jinjie Gu, Chenyi Zhuang
The integration of large language models (LLMs) with function calling has emerged as a crucial capability for enhancing their practical utility in real-world applications. However, effectively combining reasoning processes with accurate function execution remains a significant challenge. Traditional training approaches often struggle to balance the detailed reasoning steps with the precision of function calls, leading to suboptimal performance. To address these limitations, we introduce FunReason, a novel framework that enhances LLMs’ function calling capabilities through an automated data refinement strategy and a Self-Refinement Multiscale Loss (SRML) approach. FunReason leverages LLMs’ natural reasoning abilities to generate high-quality training examples, focusing on query parseability, reasoning coherence, and function call precision. The SRML approach dynamically balances the contribution of reasoning processes and function call accuracy during training, addressing the inherent trade-off between these two critical aspects. FunReason achieves performance comparable to GPT-4o while effectively mitigating catastrophic forgetting during fine-tuning. FunReason provides a comprehensive solution for enhancing LLMs’ function calling capabilities by introducing a balanced training methodology and a data refinement pipeline. For code and dataset, please refer to our repository at GitHub https://github.com/BingguangHao/FunReason
nan
Article 1288
Title@2025-05-26 (1): Private Geometric Median in Nearly-Linear Time
Title: Private Geometric Median in Nearly-Linear Time | Private Geometrische Medien in fast linearer Zeit | 近利时私人几何中位数 2505.20189v1 |
Authors: Syamantak Kumar, Daogao Liu, Kevin Tian, Chutong Yang
Estimating the geometric median of a dataset is a robust counterpart to mean estimation, and is a fundamental problem in computational geometry. Recently, [HSU24] gave an $(\varepsilon, \delta)$-differentially private algorithm obtaining an $\alpha$-multiplicative approximation to the geometric median objective, $\frac 1 n \sum_{i \in [n]} \|\cdot - \mathbf{x}_i\|$, given a dataset $\mathcal{D} := \{\mathbf{x}_i\}_{i \in [n]} \subset \mathbb{R}^d$. Their algorithm requires $n \gtrsim \sqrt d \cdot \frac 1 {\alpha\varepsilon}$ samples, which they prove is information-theoretically optimal. This result is surprising because its error scales with the \emph{effective radius} of $\mathcal{D}$ (i.e., of a ball capturing most points), rather than the worst-case radius. We give an improved algorithm that obtains the same approximation quality, also using $n \gtrsim \sqrt d \cdot \frac 1 {\alpha\varepsilon}$ samples, but in time $\widetilde{O}(nd + \frac d {\alpha^2})$. Our runtime is nearly-linear, plus the cost of the cheapest non-private first-order method due to [CLM+16]. To achieve our results, we use subsampling and geometric aggregation tools inspired by FriendlyCore [TCK+22] to speed up the “warm start” component of the [HSU24] algorithm, combined with a careful custom analysis of DP-SGD’s sensitivity for the geometric median objective.
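To make the objective concrete, the sketch below evaluates the average-distance objective and approximates its minimiser with Weiszfeld's classical fixed-point iteration. This is a non-private baseline chosen for illustration only; it is not the differentially private algorithm of the paper or of [HSU24], and the data set and names are assumptions.

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def objective(center, points):
    """Average Euclidean distance from center to the points."""
    return sum(distance(center, p) for p in points) / len(points)


def weiszfeld(points, iterations=200, eps=1e-12):
    """Approximate the geometric median with Weiszfeld's fixed-point iteration."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    center = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    for _ in range(iterations):
        # Inverse-distance weights; eps guards against landing on a data point.
        weights = [1.0 / max(distance(center, p), eps) for p in points]
        total = sum(weights)
        center = [
            sum(w * p[j] for w, p in zip(weights, points)) / total
            for j in range(dim)
        ]
    return center


# Four points on the unit square plus one far outlier (toy data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
mean = [sum(p[j] for p in points) / len(points) for j in range(2)]
median = weiszfeld(points)
print(mean, median)
```

The mean is dragged toward the outlier, while the geometric median stays inside the unit square, which is the robustness property the abstract refers to.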
nan
Article 1289
Title@2025-05-26 (1): Research on feature fusion and multimodal patent text based on graph attention network
Title: Research on feature fusion and multimodal patent text based on graph attention network | Forschungsarbeiten über Feature Fusion und multimodalen Patenttext auf der Grundlage von Graphen Aufmerksamkeit Netzwerk | 根据图示关注网络研究地物聚合和多式专利法 2505.20188v1 |
Authors: Zhenzhen Song, Ziwei Liu, Hongji Li
Aiming at the problems of cross-modal feature fusion, low efficiency of long text modeling and lack of hierarchical semantic coherence in patent text semantic mining, this study proposes HGM-Net, a deep learning framework that integrates Hierarchical Comparative Learning (HCL), Multi-modal Graph Attention Network (M-GAT) and Multi-Granularity Sparse Attention (MSA). HCL builds dynamic mask, contrast and cross-structural similarity constraints on the word, sentence and paragraph hierarchies to strengthen the local semantic and global thematic consistency of patent text; M-GAT models patent classification codes, citation relations and text semantics as heterogeneous graph structures, and achieves dynamic fusion of multi-source features by cross-modal gated attention; MSA adopts a hierarchical sparsity strategy to optimize the computational efficiency of long text modeling at word, phrase, sentence and paragraph granularity. Experiments show that the framework demonstrates significant advantages over existing deep learning methods in tasks such as patent classification and similarity matching, and provides a solution with both theoretical innovation and practical value for improving patent examination efficiency and mining technology relevance.
nan
Article 1290
Title@2025-05-26 (1): UniMoMo: Unified Generative Modeling of 3D Molecules for De Novo Binder Design
Title: UniMoMo: Unified Generative Modeling of 3D Molecules for De Novo Binder Design | UniMoMo: Unified Generative Modellierung von 3D-Molekülen für De Novo Binder Design | UniMomo:De Novo Binder 设计3D Molecules的统一生成模型 2503.19300v3 |
Authors: Xiangzhe Kong, Zishen Zhang, Ziting Zhang, Rui Jiao, Jianzhu Ma, Wenbing Huang, Kai Liu, Yang Liu
The design of target-specific molecules such as small molecules, peptides, and antibodies is vital for biological research and drug discovery. Existing generative methods are restricted to single-domain molecules, failing to address versatile therapeutic needs or utilize cross-domain transferability to enhance model performance. In this paper, we introduce Unified generative Modeling of 3D Molecules (UniMoMo), the first framework capable of designing binders of multiple molecular domains using a single model. In particular, UniMoMo unifies the representations of different molecules as graphs of blocks, where each block corresponds to either a standard amino acid or a molecular fragment. Subsequently, UniMoMo utilizes a geometric latent diffusion model for 3D molecular generation, featuring an iterative full-atom autoencoder to compress blocks into latent space points, followed by an E(3)-equivariant diffusion process. Extensive benchmarks across peptides, antibodies, and small molecules demonstrate the superiority of our unified framework over existing domain-specific models, highlighting the benefits of multi-domain training.
nan
Article 1291
Title@2025-05-26 (1): Linearization of ReLU Activation Function for Neural Network-Embedded Optimization: Optimal Day-Ahead Energy Scheduling
Title: Linearization of ReLU Activation Function for Neural Network-Embedded Optimization: Optimal Day-Ahead Energy Scheduling | Linearisierung der ReLU-Aktivierungsfunktion für neurale Netzwerk-Embedded-Optimierung: Optimale Day-Ahead-Energieplanung | ReLU神经网络激活功能的线性化 2310.01758v2 |
Authors: Cunzhi Zhao, Fan Jiang, Xingpeng Li
Recently, neural networks have been widely applied in the power system area. They can be used for better predicting input information and modeling system performance with increased accuracy. In some applications such as battery degradation neural network-based microgrid day-ahead energy scheduling, the input features of the trained learning model are variables to be solved in optimization models that enforce limits on the output of the same learning model. This creates a neural network-embedded optimization problem; the use of nonlinear activation functions in the neural network makes such problems extremely hard to solve, if not unsolvable. To address this emerging challenge, this paper investigates different methods for linearizing the nonlinear activation functions, with a particular focus on the widely used rectified linear unit (ReLU) function. Four linearization methods tailored for the ReLU activation function are developed, analyzed and compared in this paper. Each method employs a set of linear constraints to replace the ReLU function, effectively linearizing the optimization problem, which can overcome the computational challenges associated with the nonlinearity of the neural network model. These proposed linearization methods provide valuable tools for effectively solving optimization problems that integrate neural network models with ReLU activation functions.
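One widely used way to replace y = max(0, x) with linear constraints is the big-M formulation with a binary variable z: y >= x, y >= 0, y <= x + M(1 - z), y <= M z. The abstract does not say which four methods the paper develops, so the sketch below should be read only as an example of the general idea; the function name and the bound M = 10 are assumptions, and M must satisfy |x| <= M.

```python
def relu_big_m_feasible(x, big_m):
    """List (z, y_low, y_high) for which the big-M constraints admit a feasible y.

    Constraints: y >= x, y >= 0, y <= x + big_m * (1 - z), y <= big_m * z,
    with z in {0, 1}.
    """
    feasible = []
    for z in (0, 1):
        y_low = max(x, 0.0)                            # from y >= x and y >= 0
        y_high = min(x + big_m * (1 - z), big_m * z)   # from the two upper bounds
        if y_low <= y_high:
            feasible.append((z, y_low, y_high))
    return feasible


for x in (-3.0, 0.0, 2.0):
    print(x, relu_big_m_feasible(x, big_m=10.0))
```

For every x within the bound, each feasible choice of z pins y to a single value equal to max(0, x), so a mixed-integer linear solver can carry the ReLU exactly.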
nan
Article 1292
Title@2025-05-26 (1): Bayesian Optimisation Against Climate Change: Applications and Benchmarks
Title: Bayesian Optimisation Against Climate Change: Applications and Benchmarks | Bayesische Optimierung gegen den Klimawandel: Anwendungen und Benchmarks | Bayesian最佳应对气候变化:应用和基准 2306.04343v2 |
Authors: Sigrid Passano Hellan, Christopher G. Lucas, Nigel H. Goddard
Bayesian optimisation is a powerful method for optimising black-box functions, popular in settings where the true function is expensive to evaluate and no gradient information is available. Bayesian optimisation can improve responses to many optimisation problems within climate change for which simulator models are unavailable or expensive to sample from. While there have been several demonstrations of climate-related applications, there has been no unifying review of applications and benchmarks. We provide such a review here, to encourage the use of Bayesian optimisation for important and well-suited applications. We identify four main application domains: material discovery, wind farm layout, optimal renewable control and environmental monitoring. For each domain we identify a public benchmark or data set that is easy to use and evaluate systems against, while being representative of real-world problems. Due to the lack of a suitable benchmark for environmental monitoring, we propose LAQN-BO, based on air pollution data. Our contributions are: a) summarising Bayesian optimisation applications related to climate change; b) identifying a representative range of benchmarks, providing example code where necessary; and c) introducing a new benchmark, LAQN-BO.
nan
Article 1293
Title@2025-05-26 (1): On the Volatility of Shapley-Based Contribution Metrics in Federated Learning
Title: On the Volatility of Shapley-Based Contribution Metrics in Federated Learning | Über die Volatilität von Shapley-Based Contribution Metrics im Federated Learning | 联邦学习中基于毛质的贡献度量变化无常 2405.08044v4 |
Authors: Arno Geimer, Beltran Fiz, Radu State
Federated learning (FL) is a collaborative and privacy-preserving Machine Learning paradigm, allowing the development of robust models without the need to centralize sensitive data. A critical challenge in FL lies in fairly and accurately allocating contributions from diverse participants. Inaccurate allocation can undermine trust, lead to unfair compensation, and thus participants may lack the incentive to join or actively contribute to the federation. Various remuneration strategies have been proposed to date, including auction-based approaches and Shapley-value-based methods, the latter offering a means to quantify the contribution of each participant. However, little to no work has studied the stability of these contribution evaluation methods. In this paper, we evaluate participant contributions in federated learning using gradient-based model reconstruction techniques with Shapley values and compare the round-based contributions to a classic data contribution measurement scheme. We provide an extensive analysis of the discrepancies of Shapley values across a set of aggregation strategies and examine them on an overall and a per-client level. We show that, between different aggregation techniques, Shapley values lead to unstable reward allocations among participants. Our analysis spans various data heterogeneity distributions, including independent and identically distributed (IID) and non-IID scenarios.
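For readers unfamiliar with the quantity being studied, the sketch below computes exact Shapley values by averaging each participant's marginal contribution over all join orders. The three clients and the accuracy table are made-up illustrative values, not results from the paper, and the paper's gradient-based model reconstruction is not reproduced.

```python
from itertools import permutations


def shapley_values(players, utility):
    """Exact Shapley values: average marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += utility(with_p) - utility(coalition)
            coalition = with_p
    return {p: totals[p] / len(orders) for p in players}


# Hypothetical utility: accuracy of a model aggregated from a coalition of clients.
ACCURACY = {
    frozenset(): 0.0,
    frozenset("A"): 0.6,
    frozenset("B"): 0.6,
    frozenset("C"): 0.2,
    frozenset("AB"): 0.7,
    frozenset("AC"): 0.8,
    frozenset("BC"): 0.8,
    frozenset("ABC"): 0.9,
}

values = shapley_values(["A", "B", "C"], ACCURACY.__getitem__)
print(values)
```

The values sum to the utility of the full federation, so any change in how coalition utilities are measured (for example, a different aggregation strategy) shifts the resulting reward split.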
nan
Article 1294
Title@2025-05-26 (1): No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference
Title: No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference | Kein kostenloses Mittagessen: Nicht-asymptotische Analyse von Vorhersage-Powered Inferenz | 无免费午餐:预测力推论的非心理分析 2505.20178v1 |
Authors: Pranav Mani, Peng Xu, Zachary C. Lipton, Michael Oberst
Prediction-Powered Inference (PPI) is a popular strategy for combining gold-standard and possibly noisy pseudo-labels to perform statistical estimation. Prior work has shown an asymptotic “free lunch” for PPI++, an adaptive form of PPI, showing that the asymptotic variance of PPI++ is always less than or equal to the variance obtained from using gold-standard labels alone. Notably, this result holds regardless of the quality of the pseudo-labels. In this work, we demystify this result by conducting an exact finite-sample analysis of the estimation error of PPI++ on the mean estimation problem. We give a “no free lunch” result, characterizing the settings (and sample sizes) where PPI++ has provably worse estimation error than using gold-standard labels alone. Specifically, PPI++ will outperform if and only if the correlation between pseudo- and gold-standard labels is above a certain level that depends on the number of labeled samples ($n$). In some cases our results simplify considerably: For Gaussian data, the correlation must be at least $1/\sqrt{n - 2}$ in order to see improvement, and a similar result holds for binary labels. In experiments, we illustrate that our theoretical findings hold on real-world datasets, and give insights into trade-offs between single-sample and sample-splitting variants of PPI++.
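A minimal sketch of a PPI-style mean estimator may help fix ideas: the gold-standard mean is corrected by a weight lam times the gap between the pseudo-label mean on unlabeled data and on labeled data. The plug-in weight Cov(Y, f) / ((1 + n/N) Var(f)) used below is the tuning rule as commonly stated for PPI++ mean estimation and is an assumption here, since the abstract does not give it; the data and function names are likewise illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def ppi_mean(y_labeled, f_labeled, f_unlabeled, lam):
    """Mean estimate: gold-standard mean plus lam times a pseudo-label correction."""
    return mean(y_labeled) + lam * (mean(f_unlabeled) - mean(f_labeled))


def tuned_lambda(y_labeled, f_labeled, n_unlabeled):
    """Plug-in weight Cov(Y, f) / ((1 + n/N) * Var(f)), estimated on labeled data.

    Assumed tuning rule; n is the labeled sample size, N the unlabeled one.
    """
    n = len(y_labeled)
    y_bar, f_bar = mean(y_labeled), mean(f_labeled)
    cov = sum((y - y_bar) * (f - f_bar) for y, f in zip(y_labeled, f_labeled)) / (n - 1)
    var = sum((f - f_bar) ** 2 for f in f_labeled) / (n - 1)
    return cov / ((1.0 + n / n_unlabeled) * var)


# Toy data: n = 4 labeled points with perfect pseudo-labels, N = 8 unlabeled points.
y_labeled = [1.0, 2.0, 3.0, 4.0]
f_labeled = [1.0, 2.0, 3.0, 4.0]
f_unlabeled = [2.0, 4.0, 3.0, 3.0, 3.0, 3.0, 2.0, 4.0]

lam = tuned_lambda(y_labeled, f_labeled, len(f_unlabeled))
print(lam, ppi_mean(y_labeled, f_labeled, f_unlabeled, lam))
```

Setting lam = 0 recovers the gold-standard sample mean and lam = 1 the untuned PPI estimator; because lam is itself estimated from n labeled points, small n is where the finite-sample cost discussed in the abstract can appear.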
nan
Article 1295
Title@2025-05-26 (1): The Power of Iterative Filtering for Supervised Learning with (Heavy) Contamination
Title: The Power of Iterative Filtering for Supervised Learning with (Heavy) Contamination | Die Macht des iterativen Filterns für überwachtes Lernen mit (schwerer) Kontaminierung | 受监督学习(重)污染的迭代过滤功能 2505.20177v1 |
Authors: Adam R. Klivans, Konstantinos Stavropoulos, Kevin Tian, Arsen Vasilyan
Inspired by recent work on learning with distribution shift, we give a general outlier removal algorithm called iterative polynomial filtering and show a number of striking applications for supervised learning with contamination: (1) We show that any function class that can be approximated by low-degree polynomials with respect to a hypercontractive distribution can be efficiently learned under bounded contamination (also known as nasty noise). This is a surprising resolution to a longstanding gap between the complexity of agnostic learning and learning with contamination, as it was widely believed that low-degree approximators only implied tolerance to label noise. (2) For any function class that admits the (stronger) notion of sandwiching approximators, we obtain near-optimal learning guarantees even with respect to heavy additive contamination, where far more than $1/2$ of the training set may be added adversarially. Prior related work held only for regression and in a list-decodable setting. (3) We obtain the first efficient algorithms for tolerant testable learning of functions of halfspaces with respect to any fixed log-concave distribution. Even the non-tolerant case for a single halfspace in this setting had remained open. These results significantly advance our understanding of efficient supervised learning under contamination, a setting that has been much less studied than its unsupervised counterpart.
nan
Article 1296
Title@2025-05-26 (1): “KAN you hear me?” Exploring Kolmogorov-Arnold Networks for Spoken Language Understanding
Title: “KAN you hear me?” Exploring Kolmogorov-Arnold Networks for Spoken Language Understanding | “KAN hörst du mich?” Kolmogorov-Arnold-Netzwerke für gesprochenes Sprachverständnis erkunden | 探索科尔莫戈洛夫-阿诺尔德语言理解网络 2505.20176v1 |
Authors: Alkis Koudounas, Moreno La Quatra, Eliana Pastor, Sabato Marco Siniscalchi, Elena Baralis
Kolmogorov-Arnold Networks (KANs) have recently emerged as a promising alternative to traditional neural architectures, yet their application to speech processing remains underexplored. This work presents the first investigation of KANs for Spoken Language Understanding (SLU) tasks. We experiment with 2D-CNN models on two datasets, integrating KAN layers in five different configurations within the dense block. The best-performing setup, which places a KAN layer between two linear layers, is directly applied to transformer-based models and evaluated on five SLU datasets with increasing complexity. Our results show that KAN layers can effectively replace the linear layers, achieving comparable or superior performance in most cases. Finally, we provide insights into how KAN and linear layers on top of transformers differently attend to input regions of the raw waveforms.
nan
Article 1297
Title@2025-05-26 (1): mPOLICE: Provable Enforcement of Multi-Region Affine Constraints in Deep Neural Networks
Title: mPOLICE: Provable Enforcement of Multi-Region Affine Constraints in Deep Neural Networks | mPOLICE: Wahrscheinliche Durchsetzung von Multi-Region Affine-Konstraints in tiefen neuralen Netzwerken | MPOLICE: 在深神经网络中以可行方式执行多种区域同系限制 2502.02434v2 |
Authors: Mohammadmehdi Ataei, Hyunmin Cheong, Adrian Butscher
Deep neural networks are increasingly used in safety-critical domains such as robotics and scientific modeling, where strict adherence to output constraints is essential. Methods like POLICE, which are tailored for single convex regions, face challenges when extended to multiple disjoint regions, often leading to constraint violations or unwanted affine behavior across regions. This paper proposes mPOLICE, a new approach that generalizes POLICE to provably enforce affine constraints over multiple disjoint convex regions. At its core, mPOLICE assigns distinct neuron activation patterns to each constrained region, enabling localized affine behavior and avoiding unintended generalization. This is implemented through a layer-wise optimization of the network parameters. Additionally, we introduce a training algorithm that incorporates mPOLICE into conventional deep learning pipelines, balancing task-specific performance with constraint enforcement using periodic sign pattern enforcement. We validate the flexibility and effectiveness of mPOLICE through experiments across various applications, including safety-critical reinforcement learning, implicit 3D shape representation with geometric constraints, and fluid dynamics simulations with boundary condition enforcement. Importantly, mPOLICE incurs no runtime overhead during inference, making it a practical and reliable solution for constraint handling in deep neural networks.
nan
Article 1298
Title@2025-05-26 (1): Virtual Cells: Predict, Explain, Discover
Title: Virtual Cells: Predict, Explain, Discover | Virtuelle Zellen: Vorhersagen, Erklären, Entdecken | 虚拟细胞: 预测、解释、发现 2505.14613v2 |
Authors: Emmanuel Noutahi, Jason Hartford, Prudencio Tossou, Shawn Whitfield, Alisandra K. Denton, Cas Wognum, Kristina Ulicna, Michael Craig, Jonathan Hsu, Michael Cuccarese, Emmanuel Bengio, Dominique Beaini, Christopher Gibson, Daniel Cohen, Berton Earnshaw
Drug discovery is fundamentally a process of inferring the effects of treatments on patients, and would therefore benefit immensely from computational models that can reliably simulate patient responses, enabling researchers to generate and test large numbers of therapeutic hypotheses safely and economically before initiating costly clinical trials. Even a more specific model that predicts the functional response of cells to a wide range of perturbations would be tremendously valuable for discovering safe and effective treatments that successfully translate to the clinic. Creating such virtual cells has long been a goal of the computational research community that unfortunately remains unachieved given the daunting complexity and scale of cellular biology. Nevertheless, recent advances in AI, computing power, lab automation, and high-throughput cellular profiling provide new opportunities for reaching this goal. In this perspective, we present a vision for developing and evaluating virtual cells that builds on our experience at Recursion. We argue that in order to be a useful tool to discover novel biology, virtual cells must accurately predict the functional response of a cell to perturbations and explain how the predicted response is a consequence of modifications to key biomolecular interactions. We then introduce key principles for designing therapeutically-relevant virtual cells, describe a lab-in-the-loop approach for generating novel insights with them, and advocate for biologically-grounded benchmarks to guide virtual cell development. Finally, we make the case that our approach to virtual cells provides a useful framework for building other models at higher levels of organization, including virtual patients. We hope that these directions prove useful to the research community in developing virtual models optimized for positive impact on drug discovery outcomes.
nan
Article 1299
Title@2025-05-26 (1): A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation
Title: A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation | Ein theoretischer Rahmen für Grokking: Interpolation gefolgt von Riemannsche Norm Minimierung | Grokking理论框架:内插,然后是Riemannian Norm 最小化 2505.20172v1 |
Authors: Etienne Boursier, Scott Pesme, Radu-Alexandru Dragomir
We study the dynamics of gradient flow with small weight decay on general training losses $F: \mathbb{R}^d \to \mathbb{R}$. Under mild regularity assumptions and assuming convergence of the unregularised gradient flow, we show that the trajectory with weight decay $\lambda$ exhibits a two-phase behaviour as $\lambda \to 0$. During the initial fast phase, the trajectory follows the unregularised gradient flow and converges to a manifold of critical points of $F$. Then, at time of order $1/\lambda$, the trajectory enters a slow drift phase and follows a Riemannian gradient flow minimising the $\ell_2$-norm of the parameters. This purely optimisation-based phenomenon offers a natural explanation for the \textit{grokking} effect observed in deep learning, where the training loss rapidly reaches zero while the test loss plateaus for an extended period before suddenly improving. We argue that this generalisation jump can be attributed to the slow norm reduction induced by weight decay, as explained by our analysis. We validate this mechanism empirically on several synthetic regression tasks.
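The two-phase picture can be seen on a toy problem. The sketch below runs discrete gradient descent (standing in for gradient flow) on F(w) = 0.5 (w1 + w2 - 1)^2 with a small weight decay. The loss, initial point, stepsize, and weight decay are assumptions chosen for illustration; because the set of minimisers is a straight line here, the slow phase reduces to sliding along that line toward its minimum-norm point, which is a much simpler setting than the general Riemannian flow analysed in the paper.

```python
import math


def loss(w):
    """F(w) = 0.5 * (w1 + w2 - 1)^2; its minimisers form the line w1 + w2 = 1."""
    return 0.5 * (w[0] + w[1] - 1.0) ** 2


def run(w0, weight_decay, lr, steps):
    """Gradient descent on F(w) + (weight_decay / 2) * ||w||^2."""
    w = list(w0)
    for _ in range(steps):
        residual = w[0] + w[1] - 1.0  # both partial derivatives of F equal this
        w = [wi - lr * (residual + weight_decay * wi) for wi in w]
    return w


def norm(w):
    return math.sqrt(sum(wi * wi for wi in w))


# Elapsed time is steps * lr: 10 for the early run, 10 / weight_decay for the late run.
early = run([2.0, 0.0], weight_decay=1e-3, lr=0.1, steps=100)
late = run([2.0, 0.0], weight_decay=1e-3, lr=0.1, steps=100_000)
print(loss(early), norm(early), norm(late))
```

After the short run the training loss is already near zero while the parameter norm is still large; only on the 1/lambda time scale does the norm drift down to roughly 1/sqrt(2), the smallest norm attainable on the line of minimisers.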
nan
Article 1300
Title@2025-05-26 (1): From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data
Title: From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data | Von der Ausrichtung zur Weiterentwicklung: Bootstrapping Audio-Language Alignment mit synthetischen Daten | 从对齐到推进: 用合成数据推动音频语言对齐 2505.20166v1 |
Authors: Chun-Yi Kuan, Hung-yi Lee
Audio-aware large language models (ALLMs) have recently made great strides in understanding and processing audio inputs. These models are typically adapted from text-based large language models (LLMs) through additional training on audio-related tasks. However, this adaptation process presents two major limitations. First, ALLMs often suffer from catastrophic forgetting, where important textual capabilities such as instruction-following are lost after training on audio data. In some cases, models may even hallucinate sounds that are not present in the input audio, raising concerns about their reliability. Second, achieving cross-modal alignment between audio and language typically relies on large collections of task-specific question-answer pairs for instruction tuning, making the process resource-intensive. To address these issues, we leverage the backbone LLMs from ALLMs to synthesize general-purpose caption-style alignment data. We refer to this process as bootstrapping audio-language alignment via synthetic data generation from backbone LLMs (BALSa). Building on BALSa, we introduce LISTEN (Learning to Identify Sounds Through Extended Negative Samples), a contrastive-like training method designed to improve ALLMs’ ability to distinguish between present and absent sounds. We further extend BALSa to multi-audio scenarios, where the model either explains the differences between audio inputs or produces a unified caption that describes them all, thereby enhancing audio-language alignment. Experimental results indicate that our method effectively mitigates audio hallucinations while reliably maintaining strong performance in audio understanding, reasoning, and instruction-following skills. Moreover, incorporating multi-audio training further enhances the model’s comprehension and reasoning capabilities. Overall, BALSa offers an efficient and scalable approach to the development of ALLMs.
Article 1301
Title@2025-05-26 (1): Capability-Based Scaling Laws for LLM Red-Teaming
Title: Capability-Based Scaling Laws for LLM Red-Teaming | Capability-Based Scaling-Gesetze für LLM Red-Teaming | LLM 红色团队合作以能力为基础的增强法律 2505.20162v1 |
Authors: Alexander Panfilov, Paul Kassianik, Maksym Andriushchenko, Jonas Geiping
As large language models grow in capability and agency, identifying vulnerabilities through red-teaming becomes vital for safe deployment. However, traditional prompt-engineering approaches may prove ineffective once red-teaming turns into a weak-to-strong problem, where target models surpass red-teamers in capabilities. To study this shift, we frame red-teaming through the lens of the capability gap between attacker and target. We evaluate more than 500 attacker-target pairs using LLM-based jailbreak attacks that mimic human red-teamers across diverse families, sizes, and capability levels. Three strong trends emerge: (i) more capable models are better attackers, (ii) attack success drops sharply once the target’s capability exceeds the attacker’s, and (iii) attack success rates correlate with high performance on social science splits of the MMLU-Pro benchmark. From these trends, we derive a jailbreaking scaling law that predicts attack success for a fixed target based on attacker-target capability gap. These findings suggest that fixed-capability attackers (e.g., humans) may become ineffective against future models, increasingly capable open-source models amplify risks for existing systems, and model providers must accurately measure and control models’ persuasive and manipulative abilities to limit their effectiveness as attackers.
Article 1302
Title@2025-05-26 (1): Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning
Title: Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning | Prismatische Synthese: Gradientenbasierte Datendiversifizierung steigert Generalisierung in LLM-Reasoning | 理论综合:基于逐步的数据多样化促进LLM理由说明的概括化 2505.20161v1 |
Authors: Jaehun Jung, Seungju Han, Ximing Lu, Skyler Hallinan, David Acuna, Shrimai Prabhumoye, Mostafa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Yejin Choi
Effective generalization in language models depends critically on the diversity of their training data. Yet existing diversity metrics often fall short of this goal, relying on surface-level heuristics that are decoupled from model behavior. This motivates us to ask: What kind of diversity in training data actually drives generalization in language models – and how can we measure and amplify it? Through large-scale empirical analyses spanning over 300 training runs, carefully controlled for data scale and quality, we show that data diversity can be a strong predictor of generalization in LLM reasoning – as measured by average model performance on unseen out-of-distribution benchmarks. We introduce G-Vendi, a metric that quantifies diversity via the entropy of model-induced gradients. Despite using a small off-the-shelf proxy model for gradients, G-Vendi consistently outperforms alternative measures, achieving strong correlation (Spearman’s $\rho \approx 0.9$) with out-of-distribution (OOD) performance on both natural language inference (NLI) and math reasoning tasks. Building on this insight, we present Prismatic Synthesis, a framework for generating diverse synthetic data by targeting underrepresented regions in gradient space. Experimental results show that Prismatic Synthesis consistently improves model performance as we scale synthetic data – not just on in-distribution tests but across unseen, out-of-distribution benchmarks – significantly outperforming state-of-the-art models that rely on a data generator 20 times larger than ours. For example, PrismMath-7B, our model distilled from a 32B LLM, outperforms R1-Distill-Qwen-7B – the same base model trained on proprietary data generated by 671B R1 – on 6 out of 7 challenging benchmarks.
Article 1303
Title@2025-05-26 (1): Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities
Title: Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities | Gedachte politische Optimierung: Überwindung externer Leitlinien und interner Fähigkeiten | 优化政策:将外部指导和内部能力结合起来 2505.15692v2 |
Authors: Jinyang Wu, Chonghua Liao, Mingkuan Feng, Shuai Zhang, Zhengqi Wen, Pengpeng Shao, Huazhe Xu, Jianhua Tao
Reinforcement learning (RL) has emerged as an effective method for training reasoning models. However, existing RL approaches typically bias the model’s output distribution toward reward-maximizing paths without introducing external knowledge. This limits their exploration capacity and results in a narrower reasoning capability boundary compared to base models. To address this limitation, we propose TAPO (Thought-Augmented Policy Optimization), a novel framework that augments RL by incorporating external high-level guidance (“thought patterns”). By adaptively integrating structured thoughts during training, TAPO effectively balances model-internal exploration and external guidance exploitation. Extensive experiments show that our approach significantly outperforms GRPO by 99% on AIME, 41% on AMC, and 17% on Minerva Math. Notably, these high-level thought patterns, abstracted from only 500 prior samples, generalize effectively across various tasks and models. This highlights TAPO’s potential for broader applications across multiple tasks and domains. Our further analysis reveals that introducing external guidance produces powerful reasoning models with superior explainability of inference behavior and enhanced output readability.
Article 1304
Title@2025-05-26 (1): Polynomial, trigonometric, and tropical activations
Title: Polynomial, trigonometric, and tropical activations | Polynomische, trigonometrische und tropische Aktivierungen | 多边、三角和热带活性 2502.01247v2 |
Authors: Ismail Khalfaoui-Hassani, Stefan Kesselheim
Which functions can be used as activations in deep neural networks? This article explores families of functions based on orthonormal bases, including the Hermite polynomial basis and the Fourier trigonometric basis, as well as a basis resulting from the tropicalization of a polynomial basis. Our study shows that, through simple variance-preserving initialization and without additional clamping mechanisms, these activations can successfully be used to train deep models, such as GPT-2 for next-token prediction on OpenWebText and ConvNeXt for image classification on ImageNet. Our work addresses the issue of exploding and vanishing activations and gradients, particularly prevalent with polynomial activations, and opens the door for improving the efficiency of large-scale learning tasks. Furthermore, our approach provides insight into the structure of neural networks, revealing that networks with polynomial activations can be interpreted as multivariate polynomial mappings. Finally, using Hermite interpolation, we show that our activations can closely approximate classical ones in pre-trained models by matching both the function and its derivative, making them especially useful for fine-tuning tasks. These activations are available in the torchortho library, which can be accessed via: https://github.com/K-H-Ismail/torchortho.
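As a rough illustration of an activation built on an orthonormal polynomial basis, the sketch below evaluates normalized probabilists' Hermite polynomials with the standard three-term recurrence and combines them linearly. It is not the torchortho API: the function names and the coefficient values are hypothetical, and the paper's initialization scheme is not reproduced here.

```python
import math


def hermite_orthonormal(x, degree):
    """Return [h_0(x), ..., h_degree(x)] with h_n = He_n / sqrt(n!).

    He_n are the probabilists' Hermite polynomials, computed with the
    recurrence He_{n+1}(x) = x * He_n(x) - n * He_{n-1}(x).
    """
    he = [1.0, x]  # He_0 and He_1
    for n in range(1, degree):
        he.append(x * he[n] - n * he[n - 1])
    return [he[n] / math.sqrt(math.factorial(n)) for n in range(degree + 1)]


def hermite_activation(x, coeffs):
    """Hypothetical activation: a weighted sum of orthonormal Hermite terms."""
    basis = hermite_orthonormal(x, len(coeffs) - 1)
    return sum(c * h for c, h in zip(coeffs, basis))


# Illustrative coefficients only (assumed, not taken from the paper).
coeffs = [0.0, 0.6, 0.0, 0.8]
print(hermite_activation(0.5, coeffs))
```

Because the h_n are orthonormal under the standard normal density, a standard normal input gives this activation an output variance equal to the sum of the squared coefficients for n ≥ 1 (here 0.36 + 0.64 = 1), which is one way a basis of this kind can keep activation scale under control.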
Article 1305
Title@2025-05-26 (1): On the (Non) Injectivity of Piecewise Linear Janossy Pooling
Title: On the (Non) Injectivity of Piecewise Linear Janossy Pooling | Auf der (Nicht-)Injektivität der stückweise linearen Janossy-Pooling | 在Peaxy Linear Janosy 集合的喷射上, 2505.20150v1 |
Authors: Ilai Reshef, Nadav Dym
Multiset functions, which are functions that map multisets to vectors, are a fundamental tool in the construction of neural networks for multisets and graphs. To guarantee that the vector representation of the multiset is faithful, it is often desirable to have multiset mappings that are both injective and bi-Lipschitz. Currently, there are several constructions of multiset functions achieving both these guarantees, leading to improved performance in some tasks but often also to higher compute time than standard constructions. Accordingly, it is natural to inquire whether simpler multiset functions achieving the same guarantees are available. In this paper, we make a large step towards giving a negative answer to this question. We consider the family of k-ary Janossy pooling, which includes many of the most popular multiset models, and prove that no piecewise linear Janossy pooling function can be injective. On the positive side, we show that when restricted to multisets without multiplicities, even simple deep-sets models suffice for injectivity and bi-Lipschitzness.
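To make the non-injectivity claim concrete, here is a minimal sketch of k-ary Janossy pooling (a sum of a function over ordered k-tuples of distinct positions) together with one hand-picked collision for k = 1. The feature map `toy_features` is a hypothetical piecewise linear example chosen for illustration; a single collision like this is not the paper's proof, which covers every piecewise linear choice.

```python
from itertools import permutations


def relu(t):
    return max(t, 0.0)


def janossy_pool(multiset, f, k):
    """Sum f over all ordered k-tuples of distinct positions in the multiset."""
    total = None
    for tup in permutations(multiset, k):
        out = f(*tup)
        total = out if total is None else tuple(a + b for a, b in zip(total, out))
    return total


def toy_features(x):
    """Hypothetical piecewise linear feature map with kinks at 0 and 1."""
    return (relu(x), relu(x - 1.0))


# Two different multisets whose elements all lie in one linear region of
# toy_features, so the pooled value only sees their (equal) sums.
a = [0.25, 0.75]
b = [0.5, 0.5]
print(janossy_pool(a, toy_features, k=1))
print(janossy_pool(b, toy_features, k=1))
```

With k = 1 this reduces to a deep-sets-style sum of per-element features. Inside a single linear piece the sum depends only on the element count and the element sum, which is why the two multisets above are mapped to the same vector.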
Article 1306
Title@2025-05-26 (1): SeMe: Training-Free Language Model Merging via Semantic Alignment
Title: SeMe: Training-Free Language Model Merging via Semantic Alignment | SeMe: Training-freies Sprachmodell Zusammenführen über semantische Ausrichtung | SeME:通过语义一致合并的无培训语言模式 2505.20144v1 |
Authors: Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang
Despite the remarkable capabilities of Language Models (LMs) across diverse tasks, no single model consistently outperforms others, necessitating efficient methods to combine their strengths without expensive retraining. Existing model merging techniques, such as parameter averaging and task-guided fusion, often rely on data-dependent computations or fail to preserve internal knowledge, limiting their robustness and scalability. We introduce SeMe (Semantic-based Merging), a novel, data-free, and training-free approach that leverages latent semantic alignment to merge LMs at a fine-grained, layer-wise level. Unlike prior work, SeMe not only preserves model behaviors but also explicitly stabilizes internal knowledge, addressing a critical gap in LM fusion. Through extensive experiments across diverse architectures and tasks, we demonstrate that SeMe outperforms existing methods in both performance and efficiency while eliminating reliance on external data. Our work establishes a new paradigm for knowledge-aware model merging and provides insights into the semantic structure of LMs, paving the way for more scalable and interpretable model composition.
Article 1307
Title@2025-05-26 (1): Model Stitching by Functional Latent Alignment
Title: Model Stitching by Functional Latent Alignment | Modellstitching durch funktionale Latent Alignment | 通过功能性前端对齐进行模型切换 2505.20142v1 |
Authors: Ioannis Athanasiadis, Anmar Karmush, Michael Felsberg
Evaluating functional similarity involves quantifying the degree to which independently trained neural networks learn functionally similar representations. Reliably inferring the functional similarity of these networks remains an open problem with far-reaching implications for AI. Model stitching has emerged as a promising paradigm, where an optimal affine transformation aligns two models to solve a task, with the stitched model serving as a proxy for functional similarity. In this work, we draw inspiration from the knowledge distillation literature and propose Functional Latent Alignment (FuLA) as a novel optimality condition for model stitching. We revisit previously explored functional similarity testbeds and introduce a new one, based on which FuLA emerges as an overall more reliable method of functional similarity. Specifically, our experiments in (a) adversarial training, (b) shortcut training, and (c) cross-layer stitching reveal that FuLA is less prone to artifacts tied to training on task cues while achieving non-trivial alignments that are missed by stitch-level matching.
Article 1308
Title@2025-05-26 (1): GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
Title: GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models | GUARD: Rollenspiel zur Generierung von Jailbreakings in natürlicher Sprache zur Prüfung der Einhaltung der Leitlinie für große Sprachmodelle | GUARD: 利用《大语言模式遵守试验准则准则》创造以自然语言破门破门 2402.03299v5 |
Authors: Haibo Jin, Ruoxi Chen, Peiyan Zhang, Andy Zhou, Yang Zhang, Haohan Wang
The discovery of “jailbreaks” to bypass safety filters of Large Language Models (LLMs) and harmful responses have encouraged the community to implement safety measures. One major safety measure is to proactively test the LLMs with jailbreaks prior to the release. Therefore, such testing will require a method that can generate jailbreaks massively and efficiently. In this paper, we follow a novel yet intuitive strategy to generate jailbreaks in the style of the human generation. We propose a role-playing system that assigns four different roles to the user LLMs to collaborate on new jailbreaks. Furthermore, we collect existing jailbreaks and split them into different independent characteristics using clustering frequency and semantic patterns sentence by sentence. We organize these characteristics into a knowledge graph, making them more accessible and easier to retrieve. Our system of different roles will leverage this knowledge graph to generate new jailbreaks, which have proved effective in inducing LLMs to generate unethical or guideline-violating responses. In addition, we also pioneer a setting in our system that will automatically follow the government-issued guidelines to generate jailbreaks to test whether LLMs follow the guidelines accordingly. We refer to our system as GUARD (Guideline Upholding through Adaptive Role-play Diagnostics). We have empirically validated the effectiveness of GUARD on three cutting-edge open-sourced LLMs (Vicuna-13B, LongChat-7B, and Llama-2-7B), as well as a widely-utilized commercial LLM (ChatGPT). Moreover, our work extends to the realm of vision language models (MiniGPT-v2 and Gemini Vision Pro), showcasing GUARD’s versatility and contributing valuable insights for the development of safer, more reliable LLM-based applications across diverse modalities.
Article 1309
Title@2025-05-26 (1): Error Optimization: Overcoming Exponential Signal Decay in Deep Predictive Coding Networks
Title: Error Optimization: Overcoming Exponential Signal Decay in Deep Predictive Coding Networks | Fehler-Optimierung: Überwindung exponentieller Signaldekay in tiefen vorausschauenden Codierungsnetzwerken | 错误 优化 : 克服深预报编码网络中的指数信号衰减 2505.20137v1 |
Authors: Cédric Goemaere, Gaspard Oliviers, Rafal Bogacz, Thomas Demeester
Predictive Coding (PC) offers a biologically plausible alternative to backpropagation for neural network training, yet struggles with deeper architectures. This paper identifies the root cause: an inherent signal decay problem where gradients attenuate exponentially with depth, becoming computationally negligible due to numerical precision constraints. To address this fundamental limitation, we introduce Error Optimization (EO), a novel reparameterization that preserves PC’s theoretical properties while eliminating signal decay. By optimizing over prediction errors rather than states, EO enables signals to reach all layers simultaneously and without attenuation, converging orders of magnitude faster than standard PC. Experiments across multiple architectures and datasets demonstrate that EO matches backpropagation’s performance even for deeper models where conventional PC struggles. Besides practical improvements, our work provides theoretical insight into PC dynamics and establishes a foundation for scaling biologically-inspired learning to deeper architectures on digital hardware and beyond.
Article 1310
Title@2025-05-26 (1): P$^2$ Law: Scaling Law for Post-Training After Model Pruning
Title: P$^2$ Law: Scaling Law for Post-Training After Model Pruning | P$^2$ Gesetz: Skalierungsgesetz für Post-Training nach Modellprüfung | P$2美元 法律:示范 “ 谨慎 “ 后培训后培训后扩大法 2411.10272v3 |
Authors: Xiaodong Chen, Yuxuan Hu, Xiaokang Zhang, Yanling Wang, Cuiping Li, Hong Chen, Jing Zhang
Pruning has become a widely adopted technique for reducing the hardware requirements of large language models (LLMs). To recover model performance after pruning, post-training is commonly employed to mitigate the resulting performance degradation. While post-training benefits from larger datasets, once the dataset size is already substantial, increasing the training data provides only limited performance gains. To balance post-training cost and model performance, it is necessary to explore the optimal amount of post-training data. Through extensive experiments on the Llama-3 and Qwen-2.5 series models, pruned using various common pruning methods, we uncover the scaling Law for Post-training after model Pruning, referred to as the P$^2$ Law. This law identifies four key factors for predicting the pruned model’s post-training loss: the model size before pruning, the number of post-training tokens, the pruning rate, and the model’s loss before pruning. Moreover, P$^2$ Law can generalize to larger dataset sizes, larger model sizes, and higher pruning rates, offering valuable insights for the post-training of pruned LLMs.
Article 1311
Title@2025-05-26 (1): AweDist: Attention-aware Embedding Distillation for New Input Token Embeddings
Title: AweDist: Attention-aware Embedding Distillation for New Input Token Embeddings | AweDist: Aufmerksamkeitsbewusste Einbettung Destillation für neue Eingabe-Token-Einbettungen | AweDist: 新的输入式嵌入式嵌入器的注意嵌入蒸馏 2505.20133v1 |
Authors: Konstantin Dobler, Desmond Elliott, Gerard de Melo
Current language models rely on static vocabularies determined at pretraining time, which can lead to decreased performance and increased computational cost for domains underrepresented in the original vocabulary. New tokens can be added to solve this problem, when coupled with a good initialization for their new embeddings. However, existing embedding initialization methods either require expensive further training or pretraining of additional modules. In this paper, we propose AweDist and show that by distilling representations obtained using the original tokenization, we can quickly learn high-quality input embeddings for new tokens. Experimental results with a wide range of open-weight models show that AweDist is able to outperform even strong baselines.
Article 1312
Title@2025-05-26 (1): InfoBridge: Mutual Information estimation via Bridge Matching
Title: InfoBridge: Mutual Information estimation via Bridge Matching | InfoBridge: Gegenseitige Informationsschätzung über Bridge Matching | InfoBridge:通过桥梁匹配进行相互信息估计 2502.01383v2 |
Authors: Sergei Kholkin, Ivan Butakov, Evgeny Burnaev, Nikita Gushchin, Alexander Korotin
Diffusion bridge models have recently become a powerful tool in the field of generative modeling. In this work, we leverage their power to address another important problem in machine learning and information theory, the estimation of the mutual information (MI) between two random variables. We show that by using the theory of diffusion bridges, one can construct an unbiased estimator for data posing difficulties for conventional MI estimators. We showcase the performance of our estimator on two standard MI estimation benchmarks, i.e., low-dimensional and image-based, and on real-world data, i.e., protein language model embeddings.
Article 1313
Title@2025-05-26 (1): Outcome-based Reinforcement Learning to Predict the Future
Title: Outcome-based Reinforcement Learning to Predict the Future | Ergebnisbasiertes Bewehrungslernen zur Vorhersage der Zukunft | 基于成果的强化学习,以预测未来 2505.17989v2 |
Authors: Benjamin Turtel, Danny Franklin, Kris Skotheim, Luke Hewitt, Philipp Schoenegger
Reinforcement learning with verifiable rewards (RLVR) has boosted math and coding in large language models, yet there has been little effort to extend RLVR into messier, real-world domains like forecasting. One sticking point is that outcome-based reinforcement learning for forecasting must learn from binary, delayed, and noisy rewards, a regime where standard fine-tuning is brittle. We show that outcome-only online RL on a 14B model can match frontier-scale accuracy and surpass it in calibration and hypothetical prediction market betting by adapting two leading algorithms, Group-Relative Policy Optimisation (GRPO) and ReMax, to the forecasting setting. Our adaptations remove per-question variance scaling in GRPO, apply baseline-subtracted advantages in ReMax, hydrate training with 100k temporally consistent synthetic questions, and introduce lightweight guard-rails that penalise gibberish, non-English responses and missing rationales, enabling a single stable pass over 110k events. Scaling ReMax to 110k questions and ensembling seven predictions yields a 14B model that matches frontier baseline o1 on accuracy on our holdout set (Brier = 0.193, p = 0.23) while beating it in calibration (ECE = 0.042, p < 0.001). A simple trading rule turns this calibration edge into $127 of hypothetical profit versus $92 for o1 (p = 0.037). This demonstrates that refined RLVR methods can convert small-scale LLMs into potentially economically valuable forecasting tools, with implications for scaling this to larger models.
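The two headline metrics in this abstract, the Brier score and the expected calibration error (ECE), can be sketched in a few lines. This is a minimal version assuming binary outcomes and equal-width probability bins; the paper's exact binning and evaluation protocol are not stated in the abstract, and the forecasts below are made up.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Equal-width-bin ECE for binary forecasts.

    Each bin contributes |mean forecast - observed frequency|, weighted by
    the fraction of forecasts that fall in the bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_p = sum(p for p, _ in items) / len(items)
        freq = sum(y for _, y in items) / len(items)
        ece += len(items) / len(probs) * abs(mean_p - freq)
    return ece


# Made-up forecasts and outcomes, for illustration only.
probs = [0.9, 0.8, 0.3, 0.1]
outcomes = [1, 1, 0, 0]
print(brier_score(probs, outcomes))
print(expected_calibration_error(probs, outcomes))
```

Lower is better for both metrics. The Brier score rewards forecasts that are both accurate and confident, while ECE only checks whether stated probabilities match observed frequencies, so a model can tie on one and win on the other, as the abstract reports.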
Article 1314
Title@2025-05-26 (1): Tensorization is a powerful but underexplored tool for compression and interpretability of neural networks
Title: Tensorization is a powerful but underexplored tool for compression and interpretability of neural networks | Tensorisierung ist ein leistungsfähiges, aber unerforschtes Werkzeug zur Kompression und Interpretationsfähigkeit neuronaler Netzwerke | 电温是压缩和解释神经网络的强大但探索不足的工具 2505.20132v1 |
Authors: Safa Hamreras, Sukhbinder Singh, Román Orús
Tensorizing a neural network involves reshaping some or all of its dense weight matrices into higher-order tensors and approximating them using low-rank tensor network decompositions. This technique has shown promise as a model compression strategy for large-scale neural networks. However, despite encouraging empirical results, tensorized neural networks (TNNs) remain underutilized in mainstream deep learning. In this position paper, we offer a perspective on both the potential and current limitations of TNNs. We argue that TNNs represent a powerful yet underexplored framework for deep learning, one that deserves greater attention from both engineering and theoretical communities. Beyond compression, we highlight the value of TNNs as a flexible class of architectures with distinctive scaling properties and increased interpretability. A central feature of TNNs is the presence of bond indices, which introduce new latent spaces not found in conventional networks. These internal representations may provide deeper insight into the evolution of features across layers, potentially advancing the goals of mechanistic interpretability. We conclude by outlining several key research directions aimed at overcoming the practical barriers to scaling and adopting TNNs in modern deep learning workflows.
Article 1315
Title@2025-05-26 (1): MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning
Title: MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning | MolEditRL: Strukturschonende molekulare Bearbeitung durch diskretes Diffusions- und Verstärkungslernen | MoldEditRL:通过分解分解和扩散及强化学习保持结构的分子编辑 2505.20131v1 |
Authors: Yuanxin Zhuang, Dazhong Shen, Ying Sun
Molecular editing aims to modify a given molecule to optimize desired chemical properties while preserving structural similarity. However, current approaches typically rely on string-based or continuous representations, which fail to adequately capture the discrete, graph-structured nature of molecules, resulting in limited structural fidelity and poor controllability. In this paper, we propose MolEditRL, a molecular editing framework that explicitly integrates structural constraints with precise property optimization. Specifically, MolEditRL consists of two stages: (1) a discrete graph diffusion model pretrained to reconstruct target molecules conditioned on source structures and natural language instructions; (2) an editing-aware reinforcement learning fine-tuning stage that further enhances property alignment and structural preservation by explicitly optimizing editing decisions under graph constraints. For comprehensive evaluation, we construct MolEdit-Instruct, the largest and most property-rich molecular editing dataset, comprising 3 million diverse examples spanning single- and multi-property tasks across 10 chemical attributes. Experimental results demonstrate that MolEditRL significantly outperforms state-of-the-art methods in both property optimization accuracy and structural fidelity, achieving a 74% improvement in editing success rate while using 98% fewer parameters.
Article 1316
Title@2025-05-26 (1): Balancing Interference and Correlation in Spatial Experimental Designs: A Causal Graph Cut Approach
Title: Balancing Interference and Correlation in Spatial Experimental Designs: A Causal Graph Cut Approach | Balance zwischen Interferenz und Korrelation in räumlichen Experimentaldesigns: Ein ursächlicher Graphenschnitt-Ansatz | 空间实验设计中平衡干扰和关联:因果图表切割法 2505.20130v1 |
Authors: Zhu Jin, Li Jingyi, Zhou Hongyi, Lin Yinan, Lin Zhenhua, Shi Chengchun
This paper focuses on the design of spatial experiments to optimize the amount of information derived from the experimental data and enhance the accuracy of the resulting causal effect estimator. We propose a surrogate function for the mean squared error (MSE) of the estimator, which facilitates the use of classical graph cut algorithms to learn the optimal design. Our proposal offers three key advances: (1) it accommodates moderate to large spatial interference effects; (2) it adapts to different spatial covariance functions; (3) it is computationally efficient. Theoretical results and numerical experiments, based on synthetic environments and a dispatch simulator that models a city-scale ridesharing market, further validate the effectiveness of our design. A Python implementation of our method is available at https://github.com/Mamba413/CausalGraphCut.
Article 1317
Title@2025-05-26 (1): Uncertainty Quantification for LLM-Based Survey Simulations
Title: Uncertainty Quantification for LLM-Based Survey Simulations | Ungewissheitsquantifizierung für LLM-basierte Umfragesimulationen | 以LLM为基础的LLM调查模拟器的不确定性定量 2502.17773v3 |
Authors: Chengpiao Huang, Yuhang Wu, Kaizheng Wang
We investigate the use of large language models (LLMs) to simulate human responses to survey questions, and perform uncertainty quantification to gain reliable insights. Our approach converts imperfect LLM-simulated responses into confidence sets for population parameters of human responses, addressing the distribution shift between the simulated and real populations. A key innovation lies in determining the optimal number of simulated responses: too many produce overly narrow confidence sets with poor coverage, while too few yield excessively loose estimates. To resolve this, our method adaptively selects the simulation sample size, ensuring valid average-case coverage guarantees. It is broadly applicable to any LLM, irrespective of its fidelity, and any procedure for constructing confidence sets. Additionally, the selected sample size quantifies the degree of misalignment between the LLM and the target human population. We illustrate our method on real datasets and LLMs.
Article 1318
Title@2025-05-26 (1): From Tables to Time: How TabPFN-v2 Outperforms Specialized Time Series Forecasting Models
Title: From Tables to Time: How TabPFN-v2 Outperforms Specialized Time Series Forecasting Models | Von Tabellen zur Zeit: Wie TabPFN-v2 Modelle der speziellen Zeitreihenvorhersage übertrifft | 从表格到时间: TabPFN-v2 如何表现超过专门时间序列预测模型 2501.02945v3 |
Authors: Shi Bin Hoo, Samuel Müller, David Salinas, Frank Hutter
Foundation models have become increasingly popular for forecasting due to their ability to provide predictions without requiring a lot of training data. In this work, we demonstrate how TabPFN-v2, a general tabular foundation model, can be effectively applied to time series forecasting. We introduce TabPFN-TS, a simple method that combines TabPFN-v2 with lightweight feature engineering to enable both point and probabilistic forecasting. Despite its simplicity and compact size (11M parameters), TabPFN-TS achieves top rank on the public GIFT-Eval leaderboard in both forecasting tasks. Through ablation studies, we investigate factors contributing to this surprising effectiveness, especially considering TabPFN-v2 was pretrained solely on synthetic tabular data with no exposure to time series. Our results highlight the potential of tabular foundation models like TabPFN-v2 as a valuable new approach for time series forecasting. Our implementation is available at https://github.com/PriorLabs/tabpfn-time-series.
Article 1319
Title@2025-05-26 (1): Understanding Generalization in Diffusion Models via Probability Flow Distance
Title: Understanding Generalization in Diffusion Models via Probability Flow Distance | Verallgemeinerung in Diffusionsmodellen über Wahrscheinlichkeitsflussentfernung verstehen | 通过概率流动远距离理解扩散模型的通用化 2505.20123v1 |
Authors: Huijie Zhang, Zijian Huang, Siyi Chen, Jinfan Zhou, Zekai Zhang, Peng Wang, Qing Qu
Diffusion models have emerged as a powerful class of generative models, capable of producing high-quality samples that generalize beyond the training data. However, evaluating this generalization remains challenging: theoretical metrics are often impractical for high-dimensional data, while no practical metrics rigorously measure generalization. In this work, we bridge this gap by introducing probability flow distance ($\texttt{PFD}$), a theoretically grounded and computationally efficient metric to measure distributional generalization. Specifically, $\texttt{PFD}$ quantifies the distance between distributions by comparing their noise-to-data mappings induced by the probability flow ODE. Moreover, by using $\texttt{PFD}$ under a teacher-student evaluation protocol, we empirically uncover several key generalization behaviors in diffusion models, including: (1) scaling behavior from memorization to generalization, (2) early learning and double descent training dynamics, and (3) bias-variance decomposition. Beyond these insights, our work lays a foundation for future empirical and theoretical studies on generalization in diffusion models.
Article 1320
Title@2025-05-26 (1): Likelihood-Ratio Regularized Quantile Regression: Adapting Conformal Prediction to High-Dimensional Covariate Shifts
Title: Likelihood-Ratio Regularized Quantile Regression: Adapting Conformal Prediction to High-Dimensional Covariate Shifts | Likelihood-Ratio Regularized Quantile Regression: Anpassung der konformen Vorhersage an hochdimensionale Kovariate Verschiebungen | 常规量化递减:调整对高多元共变变化的正规预测 2502.13030v2 |
Authors: Sunay Joshi, Shayan Kiyani, George Pappas, Edgar Dobriban, Hamed Hassani
We consider the problem of conformal prediction under covariate shift. Given labeled data from a source domain and unlabeled data from a covariate shifted target domain, we seek to construct prediction sets with valid marginal coverage in the target domain. Most existing methods require estimating the unknown likelihood ratio function, which can be prohibitive for high-dimensional data such as images. To address this challenge, we introduce the likelihood ratio regularized quantile regression (LR-QR) algorithm, which combines the pinball loss with a novel choice of regularization in order to construct a threshold function without directly estimating the unknown likelihood ratio. We show that the LR-QR method has coverage at the desired level in the target domain, up to a small error term that we can control. Our proofs draw on a novel analysis of coverage via stability bounds from learning theory. Our experiments demonstrate that the LR-QR algorithm outperforms existing methods on high-dimensional prediction tasks, including a regression task for the Communities and Crime dataset, an image classification task from the WILDS repository, and an LLM question-answering task on the MMLU benchmark.
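The pinball loss that LR-QR builds on can be sketched as follows. This is only the plain quantile-regression ingredient with a constant threshold: the likelihood-ratio regularizer and the covariate-dependent threshold function described in the abstract are not shown, and the function names and scores are hypothetical.

```python
def pinball_loss(y, q, tau):
    """Pinball (quantile) loss of threshold q for target y at level tau."""
    diff = y - q
    return max(tau * diff, (tau - 1.0) * diff)


def best_constant_threshold(scores, tau):
    """Pick the candidate in `scores` with the lowest total pinball loss."""
    return min(scores, key=lambda q: sum(pinball_loss(y, q, tau) for y in scores))


# Made-up conformity scores, for illustration only.
scores = [float(s) for s in range(1, 11)]
print(best_constant_threshold(scores, tau=0.75))
```

Minimizing the total pinball loss at level tau recovers an empirical tau-quantile of the scores, which is why it is a natural objective for learning a coverage threshold. Under-estimates are penalized with weight tau and over-estimates with weight 1 - tau.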
Article 1321
Title@2025-05-26 (1): Algorithmic Control Improves Residential Building Energy and EV Management when PV Capacity is High but Battery Capacity is Low
Title: Algorithmic Control Improves Residential Building Energy and EV Management when PV Capacity is High but Battery Capacity is Low | Algorithmische Steuerung verbessert Wohngebäude Energie-und EV-Management, wenn PV-Kapazität ist hoch, aber Batterie-Kapazität ist gering | 当光电池容量高但电池容量低时,控制电量控制改进住宅建筑的能源和EV管理,改善住宅建筑的能源和EV管理 2505.20377v1 |
Authors: Lennart Ullner, Alona Zharova, Felix Creutzig
Efficient energy management in prosumer households is key to alleviating grid stress in an energy transition marked by electric vehicles (EV), renewable energies and battery storage. However, it is unclear how households optimize prosumer EV charging. Here we study real-world data from 90 households on fixed-rate electricity tariffs in German-speaking countries to investigate the potential of Deep Reinforcement Learning (DRL) and other control approaches (Rule-Based, Model Predictive Control) to manage the dynamic and uncertain environment of Home Energy Management (HEM) and optimize household charging patterns. The DRL agent efficiently aligns charging of EV and battery storage with photovoltaic (PV) surplus. We find that frequent EV charging transactions, early EV connections and PV surplus increase optimization potential. A detailed analysis of nine households (1 hour resolution, 1 year) demonstrates that high battery capacity facilitates self optimization; in this case further algorithmic control shows little value. In cases with relatively low battery capacity, algorithmic control with DRL improves energy management and cost savings by a relevant margin. This result is further corroborated by our simulation of a synthetic household. We conclude that prosumer households with optimization potential would profit from DRL, thus benefiting also the full electricity system and its decarbonization.
Article 1322
Title@2025-05-26 (1): Generative diffusion for perceptron problems: statistical physics analysis and efficient algorithms
Title: Generative diffusion for perceptron problems: statistical physics analysis and efficient algorithms | Generative Diffusion für Perceptronprobleme: statistische Physikanalyse und effiziente Algorithmen | 生成感官问题扩散:统计物理分析和有效算法 2502.16292v2 |
Authors: Elizaveta Demyanenko, Davide Straziota, Carlo Baldassi, Carlo Lucibello
We consider random instances of non-convex perceptron problems in the high-dimensional limit of a large number of examples $M$ and weights $N$, with finite load $\alpha = M/N$. We develop a formalism based on replica theory to predict the fundamental limits of efficiently sampling the solution space using generative diffusion algorithms, conjectured to be saturated when the score function is provided by Approximate Message Passing. For the spherical perceptron with negative margin $\kappa$, we find that the uniform distribution over solutions can be efficiently sampled in most of the Replica Symmetric region of the $\alpha-\kappa$ plane. In contrast, for binary weights, sampling from the uniform distribution remains intractable. A theoretical analysis of this obstruction leads us to identify a potential $U(s) = -\log(s)$, under which the corresponding tilted distribution becomes efficiently samplable via diffusion. Moreover, we show numerically that an annealing procedure over the shape of this potential yields a fast and robust Markov Chain Monte Carlo algorithm for sampling the solution space of the binary perceptron.
Article 1323
Title@2025-05-26 (1): Proxy-Free GFlowNet
Title: Proxy-Free GFlowNet | Proxy-freies GFlowNet | 无代理的GFlowNet 2505.20110v1 |
Authors: Ruishuo Chen, Xun Wang, Rui Hu, Zhuoran Li, Longbo Huang
Generative Flow Networks (GFlowNets) are a promising class of generative models designed to sample diverse, high-reward structures by modeling distributions over compositional objects. In many real-world applications, obtaining the reward function for such objects is expensive, time-consuming, or requires human input, making it necessary to train GFlowNets from historical datasets. Most existing methods adopt a model-based approach, learning a proxy model from the dataset to approximate the reward function. However, this strategy inherently ties the quality of the learned policy to the accuracy of the proxy, introducing additional complexity and uncertainty into the training process. To overcome these limitations, we propose Trajectory-Distilled GFlowNet (TD-GFN), a proxy-free training framework that eliminates the need for out-of-dataset reward queries. Our method is motivated by the key observation that different edges in the associated directed acyclic graph (DAG) contribute unequally to effective policy learning. TD-GFN leverages inverse reinforcement learning to estimate edge-level rewards from the offline dataset, which are then used to ingeniously prune the DAG and guide backward trajectory sampling during training. This approach directs the policy toward high-reward regions while reducing the complexity of model fitting. Empirical results across multiple tasks show that TD-GFN trains both efficiently and reliably, significantly outperforming existing baselines in convergence speed and sample quality.
Article 1324
Title@2025-05-26 (1): Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning
Title: Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning | Verfeinerung von Text-zu-Multiview-Diffusion durch Verstärkungslernen | 通过强化学习改进微小的中文本到多视图传播 2505.20107v1 |
Authors: Ziyi Zhang, Li Shen, Deheng Ye, Yong Luo, Huangxuan Zhao, Lefei Zhang
Text-to-multiview (T2MV) generation, which produces coherent multiview images from a single text prompt, remains computationally intensive, while accelerated T2MV methods using few-step diffusion models often sacrifice image fidelity and view consistency. To address this, we propose a novel reinforcement learning (RL) finetuning framework tailored for few-step T2MV diffusion models to jointly optimize per-view fidelity and cross-view consistency. Specifically, we first reformulate T2MV denoising across all views as a single unified Markov decision process, enabling multiview-aware policy optimization driven by a joint-view reward objective. Next, we introduce ZMV-Sampling, a test-time T2MV sampling technique that adds an inversion-denoising pass to reinforce both viewpoint and text conditioning, resulting in improved T2MV generation at the cost of inference time. To internalize its performance gains into the base sampling policy, we develop MV-ZigAL, a novel policy optimization strategy that uses reward advantages of ZMV-Sampling over standard sampling as learning signals for policy updates. Finally, noting that the joint-view reward objective under-optimizes per-view fidelity but naively optimizing single-view metrics neglects cross-view alignment, we reframe RL finetuning for T2MV diffusion models as a constrained optimization problem that maximizes per-view fidelity subject to an explicit joint-view constraint, thereby enabling more efficient and balanced policy updates. By integrating this constrained optimization paradigm with MV-ZigAL, we establish our complete RL finetuning framework, referred to as MVC-ZigAL, which effectively refines the few-step T2MV diffusion baseline in both fidelity and consistency while preserving its few-step efficiency.
Article 1325
Title@2025-05-26 (1): Preference-Based Gradient Estimation for ML-Guided Approximate Combinatorial Optimization
Title: Preference-Based Gradient Estimation for ML-Guided Approximate Combinatorial Optimization | Präferenzbasierte Gradientenschätzung für ML-geführte annähernde Kombinator-Optimierung | ML- Guided 近似组合优化的基于优惠的渐进式测算 2502.19377v2 |
Authors: Arman Mielke, Uwe Bauknecht, Thilo Strauss, Mathias Niepert
Combinatorial optimization (CO) problems arise across a broad spectrum of domains, including medicine, logistics, and manufacturing. While exact solutions are often computationally infeasible, many practical applications require high-quality solutions within a given time budget. To address this, we propose a learning-based approach that enhances existing non-learned approximation algorithms for CO. Specifically, we parameterize these approximation algorithms and train graph neural networks (GNNs) to predict parameter values that yield near-optimal solutions. Our method is trained end-to-end in a self-supervised fashion, using a novel gradient estimation scheme that treats the approximation algorithm as a black box. This approach combines the strengths of learning and traditional algorithms: the GNN learns from data to guide the algorithm toward better solutions, while the approximation algorithm ensures feasibility. We validate our method on two well-known combinatorial optimization problems: the travelling salesman problem (TSP) and the minimum k-cut problem. Our results demonstrate that the proposed approach is competitive with state-of-the-art learned CO solvers.
Article 1326
Title@2025-05-26 (1): Spurious Privacy Leakage in Neural Networks
Title: Spurious Privacy Leakage in Neural Networks | Spurious Privacy Leakage in neuralen Netzwerken | 神经网络中的净隐私渗漏 2505.20095v1 |
Authors: Chenxiang Zhang, Jun Pang, Sjouke Mauw
Neural networks are vulnerable to privacy attacks aimed at stealing sensitive data. The risks can be amplified in a real-world scenario, particularly when models are trained on limited and biased data. In this work, we investigate the impact of spurious correlation bias on privacy vulnerability. We introduce spurious privacy leakage, a phenomenon where spurious groups are significantly more vulnerable to privacy attacks than non-spurious groups. We further show that group privacy disparity increases in tasks with simpler objectives (e.g. fewer classes) due to the persistence of spurious features. Surprisingly, we find that reducing spurious correlation using spurious-robust methods does not mitigate spurious privacy leakage. This leads us to introduce a perspective on privacy disparity based on memorization, where mitigating spurious correlation does not mitigate the memorization of spurious data and therefore does not improve the privacy level either. Lastly, we compare the privacy of different model architectures trained with spurious data, demonstrating that, contrary to prior works, architectural choice can affect privacy outcomes.
Article 1327
Title@2025-05-26 (1): A fast sound power prediction tool for genset noise using machine learning
Title: A fast sound power prediction tool for genset noise using machine learning | Ein schnelles Sound-Power-Prognose-Tool für Genset-Rausch mit maschinellem Lernen | 利用机器学习来快速可靠电源预测工具,用于使用机器学习的genseet噪音 2505.20079v1 |
Authors: Saurabh Pargal, Abhijit A. Sane
This paper investigates the application of three machine learning regression algorithms, Kernel Ridge Regression (KRR), Huber Regressor (HR), and Gaussian Process Regression (GPR), for predicting sound power levels of gensets, offering significant value for marketing and sales teams during the early bidding process. When engine sizes and genset enclosure dimensions are tentative, and measured noise data is unavailable, these algorithms enable reliable noise level estimation for unbuilt gensets. The study utilizes high fidelity datasets from over 100 experiments conducted at Cummins Acoustics Technology Center (ATC) in a hemi-anechoic chamber, adhering to ISO 3744 standards. By using readily available information from the bidding and initial design stages, KRR predicts sound power to within 5 dBA on average. While HR and GPR show slightly higher prediction errors, all models effectively capture the overall noise trends across various genset configurations. These findings present a promising method for early-stage noise estimation in genset design.
Article 1328
Title@2025-05-26 (1): Grokking ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior
Title: Grokking ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior | Grokking ExPLAIND: Vereinheitlichung von Modell, Daten und Trainingszuweisung zum Studieren von Modellverhalten | Grokking ExPLAIND: 用于研究模型行为的统一模型、数据和培训归属 2505.20076v1 |
Authors: Florian Eichin, Yupei Du, Philipp Mondorf, Barbara Plank, Michael A. Hedderich
Post-hoc interpretability methods typically attribute a model’s behavior to its components, data, or training trajectory in isolation. This leads to explanations that lack a unified view and may miss key interactions. While combining existing methods or applying them at different training stages offers broader insights, these approaches usually lack theoretical support. In this work, we present ExPLAIND, a unified framework that integrates all three perspectives. First, we generalize recent work on gradient path kernels, which reformulate models trained by gradient descent as a kernel machine, to more realistic training settings. Empirically, we find that both a CNN and a Transformer model are replicated accurately by this reformulation. Second, we derive novel parameter- and step-wise influence scores from the kernel feature maps. We show their effectiveness in parameter pruning that is comparable to existing methods, reinforcing their value for model component attribution. Finally, jointly interpreting model components and data over the training process, we leverage ExPLAIND to analyze a Transformer that exhibits Grokking. Among other things, our findings support previously proposed stages of Grokking, while refining the final phase as one of alignment of input embeddings and final layers around a representation pipeline learned after the memorization phase. Overall, ExPLAIND provides a theoretically grounded, unified framework to interpret model behavior and training dynamics.
Article 1329
Title@2025-05-26 (1): An Out-Of-Distribution Membership Inference Attack Approach for Cross-Domain Graph Attacks
Title: An Out-Of-Distribution Membership Inference Attack Approach for Cross-Domain Graph Attacks | Ein Out-Of-Distribution-Mitgliedschaft Inferenz Angriff Ansatz für Cross-Domain Graph Attacks | 跨领域石块袭击的批外分配成员推推攻击方法 2505.20074v1 |
Authors: Jinyan Wang, Liu Yang, Yuecen Wei, Jiaxuan Si, Chenhao Guo, Qingyun Sun, Xianxian Li, Xingcheng Fu
Graph Neural Network-based methods face privacy leakage risks due to the introduction of topological structures about the targets, which allows attackers to bypass the target’s prior knowledge of the sensitive attributes and realize membership inference attacks (MIA) by observing and analyzing the topology distribution. As privacy concerns grow, the assumption of MIA, which presumes that attackers can obtain an auxiliary dataset with the same distribution, is increasingly deviating from reality. In this paper, we categorize the distribution diversity issue in real-world MIA scenarios as an Out-Of-Distribution (OOD) problem, and propose a novel Graph OOD Membership Inference Attack (GOOD-MIA) to achieve cross-domain graph attacks. Specifically, we construct shadow subgraphs with distributions from different domains to model the diversity of real-world data. We then explore the stable node representations that remain unchanged under external influences and consider eliminating redundant information from confounding environments and extracting task-relevant key information to more clearly distinguish between the characteristics of training data and unseen data. This OOD-based design makes cross-domain graph attacks possible. Finally, we perform risk extrapolation to optimize the attack’s domain adaptability during attack inference to generalize the attack to other domains. Experimental results demonstrate that GOOD-MIA achieves superior attack performance in datasets designed for multiple domains.
Article 1330
Title@2025-05-26 (1): SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
Title: SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety | SafeDPO: Ein einfacher Ansatz zur direkten Preference-Optimierung mit erhöhter Sicherheit | SafeDPO: 以强化安全方式直接优化优惠的简单办法 2505.20065v1 |
Authors: Geon-Hyeong Kim, Youngsoo Jang, Yu Jin Kim, Byoungjip Kim, Honglak Lee, Kyunghoon Bae, Moontae Lee
As Large Language Models (LLMs) continue to advance and find applications across a growing number of fields, ensuring the safety of LLMs has become increasingly critical. To address safety concerns, recent studies have proposed integrating safety constraints into Reinforcement Learning from Human Feedback (RLHF). However, these approaches tend to be complex, as they encompass complicated procedures in RLHF along with additional steps required by the safety constraints. Inspired by Direct Preference Optimization (DPO), we introduce a new algorithm called SafeDPO, which is designed to directly optimize the safety alignment objective in a single stage of policy learning, without requiring relaxation. SafeDPO introduces only one additional hyperparameter to further enhance safety and requires only minor modifications to standard DPO. As a result, it eliminates the need to fit separate reward and cost models or to sample from the language model during fine-tuning, while still enhancing the safety of LLMs. Finally, we demonstrate that SafeDPO achieves competitive performance compared to state-of-the-art safety alignment algorithms, both in terms of aligning with human preferences and improving safety.
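For orientation, the standard DPO objective that SafeDPO builds on can be sketched for a single preference pair. This is the plain DPO loss, not SafeDPO itself: the abstract does not specify the safety modification or its extra hyperparameter, so neither appears here. The function name, argument names, and the value of `beta` are illustrative assumptions.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are sequence log-probabilities under the trained policy and under
    the frozen reference model. The loss is -log(sigmoid(margin)).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities: the policy equals the reference model here,
# so the margin is zero and the loss is log(2).
loss_at_init = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Raising the chosen response's log-probability lowers the loss.
loss_after_update = dpo_loss(-10.0, -15.0, -12.0, -15.0)
```

Because the loss depends only on log-probabilities of responses already in the dataset, no separate reward or cost model is fitted and no sampling from the language model is needed, which is the property the abstract highlights.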
Article 1331
Title@2025-05-26 (1): SAEs Are Good for Steering – If You Select the Right Features
Title: SAEs Are Good for Steering – If You Select the Right Features | SAEs sind gut für das Lenken – wenn Sie die richtigen Funktionen auswählen | SAEs 有利于指导 – – 如果您选择了正确的特性 2505.20063v1 |
Authors: Dana Arad, Aaron Mueller, Yonatan Belinkov
Sparse Autoencoders (SAEs) have been proposed as an unsupervised approach to learn a decomposition of a model’s latent space. This enables useful applications such as steering (influencing the output of a model towards a desired concept) without requiring labeled data. Current methods identify SAE features to steer by analyzing the input tokens that activate them. However, recent work has highlighted that activations alone do not fully describe the effect of a feature on the model’s output. In this work, we draw a distinction between two types of features: input features, which mainly capture patterns in the model’s input, and output features, which have a human-understandable effect on the model’s output. We propose input and output scores to characterize and locate these types of features, and show that high values for both scores rarely co-occur in the same features. These findings have practical implications: after filtering out features with low output scores, we obtain 2-3x improvements when steering with SAEs, making them competitive with supervised methods.
Article 1332
Title@2025-05-26 (1): Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting
Title: Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting | Time-VLM: Erforschung multimodaler Vision-Sprachenmodelle für Augmented Time Series Forecasting | 时间-VLM:探索扩大时间序列预测的多模式愿景-语言模型 2502.04395v2 |
Authors: Siru Zhong, Weilin Ruan, Ming Jin, Huan Li, Qingsong Wen, Yuxuan Liang
Recent advancements in time series forecasting have explored augmenting models with text or vision modalities to improve accuracy. While text provides contextual understanding, it often lacks fine-grained temporal details. Conversely, vision captures intricate temporal patterns but lacks semantic context, limiting the complementary potential of these modalities. To address this, we propose Time-VLM, a novel multimodal framework that leverages pre-trained Vision-Language Models (VLMs) to bridge temporal, visual, and textual modalities for enhanced forecasting. Our framework comprises three key components: (1) a Retrieval-Augmented Learner, which extracts enriched temporal features through memory bank interactions; (2) a Vision-Augmented Learner, which encodes time series as informative images; and (3) a Text-Augmented Learner, which generates contextual textual descriptions. These components collaborate with frozen pre-trained VLMs to produce multimodal embeddings, which are then fused with temporal features for final prediction. Extensive experiments demonstrate that Time-VLM achieves superior performance, particularly in few-shot and zero-shot scenarios, thereby establishing a new direction for multimodal time series forecasting. Code is available at https://github.com/CityMind-Lab/ICML25-TimeVLM.
Article 1333
Title@2025-05-26 (1): Sable: a Performant, Efficient and Scalable Sequence Model for MARL
Title: Sable: a Performant, Efficient and Scalable Sequence Model for MARL | Sable: ein leistungsfähiges, effizientes und skalierbares Sequenzmodell für MARL | 电缆:MARL的性能、高效和可缩放序列模型 2410.01706v5 |
Authors: Omayma Mahjoub, Sasha Abramowitz, Ruan de Kock, Wiem Khlifi, Simon du Toit, Jemma Daniel, Louay Ben Nessir, Louise Beyers, Claude Formanek, Liam Clark, Arnu Pretorius
As multi-agent reinforcement learning (MARL) progresses towards solving larger and more complex problems, it becomes increasingly important that algorithms exhibit the key properties of (1) strong performance, (2) memory efficiency, and (3) scalability. In this work, we introduce Sable, a performant, memory-efficient, and scalable sequence modeling approach to MARL. Sable works by adapting the retention mechanism in Retentive Networks (Sun et al., 2023) to achieve computationally efficient processing of multi-agent observations with long context memory for temporal reasoning. Through extensive evaluations across six diverse environments, we demonstrate how Sable is able to significantly outperform existing state-of-the-art methods in a large number of diverse tasks (34 out of 45 tested). Furthermore, Sable maintains performance as we scale the number of agents, handling environments with more than a thousand agents while exhibiting a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable’s performance gains and confirm its efficient computational memory usage.
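The retention mechanism that Sable adapts can be sketched in its simplest single-head form. This is an illustration of plain retention with a scalar decay `gamma`, not Sable's multi-agent variant; it omits the positional rotation, normalization, and gating used in Retentive Networks, and all names and toy values are hypothetical. The point is that the recurrent form, which keeps a fixed-size state per step, reproduces the parallel form.

```python
def retention_parallel(Q, K, V, gamma):
    """Parallel form: out[i] = sum over j <= i of gamma**(i-j) * (Q[i] . K[j]) * V[j]."""
    out = []
    for i in range(len(Q)):
        row = [0.0] * len(V[0])
        for j in range(i + 1):
            weight = (gamma ** (i - j)) * sum(a * b for a, b in zip(Q[i], K[j]))
            for c in range(len(row)):
                row[c] += weight * V[j][c]
        out.append(row)
    return out


def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: S = gamma * S + outer(k, v); out = q @ S, with a fixed-size state S."""
    d_k, d_v = len(K[0]), len(V[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q, k, v in zip(Q, K, V):
        state = [[gamma * state[a][c] + k[a] * v[c] for c in range(d_v)] for a in range(d_k)]
        out.append([sum(q[a] * state[a][c] for a in range(d_k)) for c in range(d_v)])
    return out


# Toy sequence of three steps with 2-dimensional queries/keys and 1-dimensional values.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]
V = [[1.0], [2.0], [3.0]]
parallel_out = retention_parallel(Q, K, V, gamma=0.5)
recurrent_out = retention_recurrent(Q, K, V, gamma=0.5)
```

The recurrent form stores only a `d_k` by `d_v` state regardless of sequence length, which is consistent with the memory efficiency claimed in the abstract.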
Article 1334
Title@2025-05-26 (1): Ankh3: Multi-Task Pretraining with Sequence Denoising and Completion Enhances Protein Representations
Title: Ankh3: Multi-Task Pretraining with Sequence Denoising and Completion Enhances Protein Representations | Ankh3: Multi-Task Pretraining mit Sequenz Denoisieren und Vollendung verbessert Proteindarstellungen | Ankh3: 具有序列取消和完成的多任务预先培训,加强蛋白质代表制 2505.20052v1 |
Authors: Hazem Alsamkary, Mohamed Elshaffei, Mohamed Elkerdawy, Ahmed Elnaggar
Protein language models (PLMs) have emerged as powerful tools to detect complex patterns of protein sequences. However, the capability of PLMs to fully capture information on protein sequences might be limited by focusing on single pre-training tasks. Although adding data modalities or supervised objectives can improve the performance of PLMs, pre-training often remains focused on denoising corrupted sequences. To push the boundaries of PLMs, our research investigated a multi-task pre-training strategy. We developed Ankh3, a model jointly optimized on two objectives: masked language modeling with multiple masking probabilities and protein sequence completion relying only on protein sequences as input. This multi-task pre-training demonstrated that PLMs can learn richer and more generalizable representations solely from protein sequences. The results demonstrated improved performance in downstream tasks, such as secondary structure prediction, fluorescence, GB1 fitness, and contact prediction. The integration of multiple tasks gave the model a more comprehensive understanding of protein properties, leading to more robust and accurate predictions.
Article 1335
Title@2025-05-26 (1): Catoni-Style Change Point Detection for Regret Minimization in Non-Stationary Heavy-Tailed Bandits
Title: Catoni-Style Change Point Detection for Regret Minimization in Non-Stationary Heavy-Tailed Bandits | Catoni-Style Change Point Detection für Reue Minimierung in nicht-stationären schwer-gefährdeten Banditen | 用于在非连续重型重航匪徒中最遗憾最小化的 卡特托尼- 轮式变速点探测 2505.20051v1 |
Authors: Gianmarco Genalti, Sujay Bhatt, Nicola Gatti, Alberto Maria Metelli
Regret minimization in stochastic non-stationary bandits gained popularity over the last decade, as it can model a broad class of real-world problems, from advertising to recommendation systems. Existing literature relies on various assumptions about the reward-generating process, such as Bernoulli or subgaussian rewards. However, in settings such as finance and telecommunications, heavy-tailed distributions naturally arise. In this work, we tackle the heavy-tailed piecewise-stationary bandit problem. Heavy-tailed bandits, introduced by Bubeck et al., 2013, operate on the minimal assumption that the finite absolute centered moments of maximum order $1+\epsilon$ are uniformly bounded by a constant $v<+\infty$, for some $\epsilon \in (0,1]$. We focus on the most popular non-stationary bandit setting, i.e., the piecewise-stationary setting, in which the mean of reward-generating distributions may change at unknown time steps. We provide a novel Catoni-style change-point detection strategy tailored for heavy-tailed distributions that relies on recent advancements in the theory of sequential estimation, which is of independent interest. We introduce Robust-CPD-UCB, which combines this change-point detection strategy with optimistic algorithms for bandits, providing its regret upper bound and an impossibility result on the minimum attainable regret for any policy. Finally, we validate our approach through numerical experiments on synthetic and real-world datasets.
Article 1336
Title@2025-05-26 (1): Synthetic Time Series Forecasting with Transformer Architectures: Extensive Simulation Benchmarks
Title: Synthetic Time Series Forecasting with Transformer Architectures: Extensive Simulation Benchmarks | Synthetische Zeitreihenprognosen mit Transformer-Architekturen: Umfangreiche Simulations-Benchmarks | 利用变形建筑结构预测合成时间序列:广泛模拟基准 2505.20048v1 |
Authors: Ali Forootani, Mohammad Khosravi
Time series forecasting plays a critical role in domains such as energy, finance, and healthcare, where accurate predictions inform decision-making under uncertainty. Although Transformer-based models have demonstrated success in sequential modeling, their adoption for time series remains limited by challenges such as noise sensitivity, long-range dependencies, and a lack of inductive bias for temporal structure. In this work, we present a unified and principled framework for benchmarking three prominent Transformer forecasting architectures (Autoformer, Informer, and PatchTST), each evaluated through three architectural variants: Minimal, Standard, and Full, representing increasing levels of complexity and modeling capacity. We conduct over 1500 controlled experiments on a suite of ten synthetic signals, spanning five patch lengths and five forecast horizons under both clean and noisy conditions. Our analysis reveals consistent patterns across model families. To advance this landscape further, we introduce the Koopman-enhanced Transformer framework, Deep Koopformer, which integrates operator-theoretic latent state modeling to improve stability and interpretability. We demonstrate its efficacy on nonlinear and chaotic dynamical systems. Our results highlight the Koopman-based Transformer as a promising hybrid approach for robust, interpretable, and theoretically grounded time series forecasting in noisy and complex real-world conditions.
Article 1337
Title@2025-05-26 (1): Convex Approximation of Two-Layer ReLU Networks for Hidden State Differential Privacy
Title: Convex Approximation of Two-Layer ReLU Networks for Hidden State Differential Privacy | Convex-Annäherung von Zwei-Layer-ReLU-Netzwerken für versteckte staatliche differentielle Privatsphäre | 隐藏式国家差异隐私双线雷路网络的连接近似 2407.04884v3 |
Authors: Rob Romijnders, Antti Koskela
The hidden state threat model of differential privacy (DP) assumes that the adversary has access only to the final trained machine learning (ML) model, without seeing intermediate states during training. However, the current privacy analyses under this model are restricted to convex optimization problems, reducing their applicability to multi-layer neural networks, which are essential in modern deep learning applications. Notably, the most successful applications of the hidden state privacy analyses in classification tasks have only been for logistic regression models. We demonstrate that it is possible to privately train convex problems with privacy-utility trade-offs comparable to those of 2-layer ReLU networks trained with DP stochastic gradient descent (DP-SGD). This is achieved through a stochastic approximation of a dual formulation of the ReLU minimization problem, resulting in a strongly convex problem. This enables the use of existing hidden state privacy analyses and provides accurate privacy bounds also for the noisy cyclic mini-batch gradient descent (NoisyCGD) method with fixed disjoint mini-batches. Empirical results on benchmark classification tasks demonstrate that NoisyCGD can achieve privacy-utility trade-offs on par with DP-SGD applied to 2-layer ReLU networks.
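The general shape of noisy cyclic mini-batch gradient descent with fixed disjoint mini-batches can be sketched on a one-dimensional least-squares problem. This is a sketch under stated assumptions, not the paper's NoisyCGD or its convex dual formulation: per-example clipping, the noise scale `sigma * clip`, the L2 term, and all names are assumptions, and no privacy accounting is performed.

```python
import random


def noisy_cyclic_gd(batches, epochs, lr, clip, sigma, l2, seed=0):
    """Cycle through fixed disjoint mini-batches in the same order every epoch.

    Each per-example gradient is clipped to [-clip, clip], Gaussian noise with
    standard deviation sigma * clip is added to the batch sum, and an L2 term
    keeps the objective strongly convex when l2 > 0.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        for batch in batches:
            total = 0.0
            for x, y in batch:
                grad = (w * x - y) * x  # gradient of 0.5 * (w * x - y) ** 2
                total += max(-clip, min(clip, grad))
            noisy_sum = total + rng.gauss(0.0, sigma * clip)
            w -= lr * (noisy_sum / len(batch) + l2 * w)
    return w


# Hypothetical data generated by y = 2 * x, split once into two disjoint batches.
batches = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w_noiseless = noisy_cyclic_gd(batches, epochs=50, lr=0.1, clip=100.0, sigma=0.0, l2=0.0)
w_noisy = noisy_cyclic_gd(batches, epochs=50, lr=0.1, clip=100.0, sigma=0.01, l2=0.0, seed=1)
```

With `sigma=0.0` the iterate converges to the true slope. With noise it lands near that value, and a fixed seed keeps the run reproducible.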
Article 1338
Title@2025-05-26 (1): Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning
Title: Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning | Kontrolle des neuralen Zusammenbruchs verbessert Out-of-Distribution Detection und Transfer Learning | 控制神经崩溃增强传播外探测和转让学习 2502.10691v2 |
Authors: Md Yousuf Harun, Jhair Gallardo, Christopher Kanan
Out-of-distribution (OOD) detection and OOD generalization are widely studied in Deep Neural Networks (DNNs), yet their relationship remains poorly understood. We empirically show that the degree of Neural Collapse (NC) in a network layer is inversely related with these objectives: stronger NC improves OOD detection but degrades generalization, while weaker NC enhances generalization at the cost of detection. This trade-off suggests that a single feature space cannot simultaneously achieve both tasks. To address this, we develop a theoretical framework linking NC to OOD detection and generalization. We show that entropy regularization mitigates NC to improve generalization, while a fixed Simplex Equiangular Tight Frame (ETF) projector enforces NC for better detection. Based on these insights, we propose a method to control NC at different DNN layers. In experiments, our method excels at both tasks across OOD datasets and DNN architectures.
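The Simplex Equiangular Tight Frame named in the abstract has a standard closed-form construction, sketched below. This shows only the textbook ETF geometry for `k` classes in `k` dimensions; how the paper's projector maps it to the network's feature dimension is not described in the abstract, and the function names are hypothetical.

```python
import math


def simplex_etf(num_classes):
    """Rows of sqrt(k / (k - 1)) * (I - (1/k) * ones): k unit vectors with equal pairwise angles."""
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


etf = simplex_etf(4)
norm_sq = dot(etf[0], etf[0])      # squared norm of one class vector
pair_cosine = dot(etf[0], etf[1])  # inner product between two distinct class vectors
```

Every vector has unit norm and every distinct pair has inner product `-1 / (k - 1)`, the maximally separated configuration that class means approach under Neural Collapse.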
Article 1339
Title@2025-05-26 (1): Beyond Simple Concatenation: Fairly Assessing PLM Architectures for Multi-Chain Protein-Protein Interactions Prediction
Title: Beyond Simple Concatenation: Fairly Assessing PLM Architectures for Multi-Chain Protein-Protein Interactions Prediction | Beyond Simple Concatenation: Fairly Assessing PLM Architectures for Multi-Chain Protein-Protein Interaktionen Prediction | 超越简单星系:公平评估多沙因蛋白因-蛋白因相互作用预测的PLM结构 2505.20036v1 |
Authors: Hazem Alsamkary, Mohamed Elshaffei, Mohamed Soudy, Sara Ossman, Abdallah Amr, Nehal Adel Abdelsalam, Mohamed Elkerdawy, Ahmed Elnaggar
Protein-protein interactions (PPIs) are fundamental to numerous cellular processes, and their characterization is vital for understanding disease mechanisms and guiding drug discovery. While protein language models (PLMs) have demonstrated remarkable success in predicting protein structure and function, their application to sequence-based PPI binding affinity prediction remains relatively underexplored. This gap is often attributed to the scarcity of high-quality, rigorously refined datasets and the reliance on simple strategies for concatenating protein representations. In this work, we address these limitations. First, we introduce a meticulously curated version of the PPB-Affinity dataset comprising 8,207 unique protein-protein interaction entries, obtained by resolving annotation inconsistencies and duplicate entries for multi-chain protein interactions. This dataset incorporates a stringent sequence identity threshold of at most 30% to ensure robust splitting into training, validation, and test sets, minimizing data leakage. Second, we propose and systematically evaluate four architectures for adapting PLMs to PPI binding affinity prediction: embeddings concatenation (EC), sequences concatenation (SC), hierarchical pooling (HP), and pooled attention addition (PAD). These architectures were assessed using two training methods: full fine-tuning and a lightweight approach employing ConvBERT heads over frozen PLM features. Our comprehensive experiments across multiple leading PLMs (ProtT5, ESM2, Ankh, Ankh2, and ESM3) demonstrated that the HP and PAD architectures consistently outperform conventional concatenation methods, achieving up to a 12% increase in Spearman correlation. These results highlight the necessity of sophisticated architectural designs to fully exploit the capabilities of PLMs for nuanced PPI binding affinity prediction.
Article 1340
Title@2025-05-26 (1): TeleSparse: Practical Privacy-Preserving Verification of Deep Neural Networks
Title: TeleSparse: Practical Privacy-Preserving Verification of Deep Neural Networks | TeleSparse: Praktische Datenschutz-Bewahrung von Tiefen-Neural-Netzwerken | 远程分离:深海神经网络的实际隐私保护核查 2504.19274v2 |
Authors: Mohammad M Maheri, Hamed Haddadi, Alex Davidson
Verification of the integrity of deep learning inference is crucial for understanding whether a model is being applied correctly. However, such verification typically requires access to model weights and (potentially sensitive or private) training data. So-called Zero-knowledge Succinct Non-Interactive Arguments of Knowledge (ZK-SNARKs) would appear to provide the capability to verify model inference without access to such sensitive data. However, applying ZK-SNARKs to modern neural networks, such as transformers and large vision models, introduces significant computational overhead. We present TeleSparse, a set of ZK-friendly post-processing mechanisms that produce practical solutions to this problem. TeleSparse tackles two fundamental challenges inherent in applying ZK-SNARKs to modern neural networks: (1) Reducing circuit constraints: Over-parameterized models result in numerous constraints for ZK-SNARK verification, driving up memory and proof generation costs. We address this by applying sparsification to neural network models, enhancing proof efficiency without compromising accuracy or security. (2) Minimizing the size of lookup tables required for non-linear functions: We optimize activation ranges through neural teleportation, a novel adaptation for narrowing the range of activation functions. TeleSparse reduces prover memory usage by 67% and proof generation time by 46% on the same model, with an accuracy trade-off of approximately 1%. We implement our framework using the Halo2 proving system and demonstrate its effectiveness across multiple architectures (Vision-transformer, ResNet, MobileNet) and datasets (ImageNet, CIFAR-10, CIFAR-100). This work opens new directions for ZK-friendly model design, moving toward scalable, resource-efficient verifiable deep learning.
Article 1341
Title@2025-05-26 (1): ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers
Title: ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | ViTaPEs: Visuotaktile Positionskodierungen für die modulübergreifende Ausrichtung in multimodalen Transformatoren | ViTAPEs:多式变换器中跨模式对齐的变量定位位置编码 2505.20032v1 |
Authors: Fotios Lygerakis, Ozan Özdenizci, Elmar Rückert
Tactile sensing provides local essential information that is complementary to visual perception, such as texture, compliance, and force. Despite recent advances in visuotactile representation learning, challenges remain in fusing these modalities and generalizing across tasks and environments without heavy reliance on pre-trained vision-language models. Moreover, existing methods do not study positional encodings, thereby overlooking the multi-scale spatial reasoning needed to capture fine-grained visuotactile correlations. We introduce ViTaPEs, a transformer-based framework that robustly integrates visual and tactile input data to learn task-agnostic representations for visuotactile perception. Our approach exploits a novel multi-scale positional encoding scheme to capture intra-modal structures, while simultaneously modeling cross-modal cues. Unlike prior work, we provide provable guarantees in visuotactile fusion, showing that our encodings are injective, rigid-motion-equivariant, and information-preserving, validating these properties empirically. Experiments on multiple large-scale real-world datasets show that ViTaPEs not only surpasses state-of-the-art baselines across various recognition tasks but also demonstrates zero-shot generalization to unseen, out-of-domain scenarios. We further demonstrate the transfer-learning strength of ViTaPEs in a robotic grasping task, where it outperforms state-of-the-art baselines in predicting grasp success. Project page: https://sites.google.com/view/vitapes
Article 1342
Title@2025-05-26 (1): Multiple Descents in Deep Learning as a Sequence of Order-Chaos Transitions
Title: Multiple Descents in Deep Learning as a Sequence of Order-Chaos Transitions | Mehrere Abstiege im Deep Learning als Folge von Order-Chaos-Übergängen | 作为有秩序的赵国过渡的一个序列的深层学习中的多种族后裔 2505.20030v1 |
Authors: Wenbo Wei, Nicholas Chong Jia Le, Choy Heng Lai, Ling Feng
We observe a novel ‘multiple-descent’ phenomenon during the training of LSTMs, in which the test loss goes through multiple long cycles of rising and falling after the model is overtrained. By carrying out asymptotic stability analysis of the models, we find that the cycles in test loss are closely associated with the phase transition process between order and chaos, and that the locally optimal epochs consistently lie at the critical transition point between the two phases. More importantly, the globally optimal epoch occurs at the first transition from order to chaos, where the ‘width’ of the ‘edge of chaos’ is the widest, allowing the best exploration of better weight configurations for learning.
Article 1343
Title@2025-05-26 (1): Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain)
Title: Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain) | Korrelation von Instruktions-Tuning (in multimodalen Modellen) mit visionssprachlicher Verarbeitung (im Gehirn) | 与视觉语言处理(大脑中)相交校正(多式联运模式) 2505.20029v1 |
Authors: Subba Reddy Oota, Akshett Jindal, Ishani Mondal, Khushbu Pahwa, Satya Sai Srinath Namburi, Manish Shrivastava, Maneesh Singh, Bapi S. Raju, Manish Gupta
Transformer-based language models, though not explicitly trained to mimic brain recordings, have demonstrated surprising alignment with brain activity. Progress in these models, through increased size, instruction-tuning, and multimodality, has led to better representational alignment with neural data. Recently, a new class of instruction-tuned multimodal LLMs (MLLMs) has emerged, showing remarkable zero-shot capabilities in open-ended multimodal vision tasks. However, it is unknown whether MLLMs, when prompted with natural instructions, lead to better brain alignment and effectively capture instruction-specific representations. To address this, we first investigate brain alignment, i.e., we measure how well neural visual activity can be predicted from the text output response embeddings of MLLMs as participants watch natural scenes. Experiments with 10 different instructions show that MLLMs exhibit significantly better brain alignment than vision-only models and perform comparably to non-instruction-tuned multimodal models like CLIP. We also find that while these MLLMs are effective at generating high-quality responses suitable to the task-specific instructions, not all instructions are relevant for brain alignment. Further, by varying instructions, we make the MLLMs encode instruction-specific visual concepts related to the input image. This analysis shows that MLLMs effectively capture count-related and recognition-related concepts, demonstrating strong alignment with brain activity. Notably, the majority of the explained variance of the brain encoding models is shared between MLLM embeddings of image captioning and other instructions. These results suggest that enhancing MLLMs’ ability to capture task-specific information could lead to better differentiation between various types of instructions, thereby improving their precision in predicting brain responses.
Article 1344
Title@2025-05-26 (1): Multi-modal brain encoding models for multi-modal stimuli
Title: Multi-modal brain encoding models for multi-modal stimuli | Multimodale Gehirnkodierungsmodelle für multimodale Reize | 多模式刺激多模式大脑编码模型 2505.20027v1 |
Authors: Subba Reddy Oota, Khushbu Pahwa, Mounika Marreddy, Maneesh Singh, Manish Gupta, Bapi S. Raju
Despite participants engaging in unimodal stimuli, such as watching images or silent videos, recent work has demonstrated that multi-modal Transformer models can predict visual brain activity impressively well, even with incongruent modality representations. This raises the question of how accurately these multi-modal models can predict brain activity when participants are engaged in multi-modal stimuli. As these models grow increasingly popular, their use in studying neural activity provides insights into how our brains respond to such multi-modal naturalistic stimuli, i.e., where they separate and integrate information across modalities through a hierarchy from early sensory regions to higher cognition. We investigate this question by using multiple unimodal models and two types of multi-modal models (cross-modal and jointly pretrained) to determine which type of model is more relevant to fMRI brain activity when participants are engaged in watching movies. We observe that both types of multi-modal models show improved alignment in several language and visual regions. This study also helps in identifying which brain regions process unimodal versus multi-modal information. We further investigate the contribution of each modality to multi-modal alignment by carefully removing unimodal features one by one from multi-modal representations, and find that there is additional information beyond the unimodal embeddings that is processed in the visual and language regions. Based on this investigation, we find that for cross-modal models, brain alignment is partially attributed to the video modality, while for jointly pretrained models, it is partially attributed to both the video and audio modalities. This serves as a strong motivation for the neuroscience community to investigate the interpretability of these models for deepening our understanding of multi-modal information processing in the brain.
Article 1345
Title@2025-05-26 (1): Gradient Inversion Transcript: Leveraging Robust Generative Priors to Reconstruct Training Data from Gradient Leakage
Title: Gradient Inversion Transcript: Leveraging Robust Generative Priors to Reconstruct Training Data from Gradient Leakage | Gradient Inversion Transcript: Leveraging Robust Generative Priors to Reconstruct Trainingsdaten von Gradient Leakage | 梯度反转轨迹:从梯度渗漏中重新构建培训数据的杠杆化强力生成前程 2505.20026v1 |
Authors: Xinping Chen, Chen Liu
We propose Gradient Inversion Transcript (GIT), a novel generative approach for reconstructing training data from leaked gradients. GIT employs a generative attack model, whose architecture is tailored to align with the structure of the leaked model based on theoretical analysis. Once trained offline, GIT can be deployed efficiently and only relies on the leaked gradients to reconstruct the input data, rendering it applicable under various distributed learning environments. When used as a prior for other iterative optimization-based methods, GIT not only accelerates convergence but also enhances the overall reconstruction quality. GIT consistently outperforms existing methods across multiple datasets and demonstrates strong robustness under challenging conditions, including inaccurate gradients, data distribution shifts and discrepancies in model parameters.
Article 1346
Title@2025-05-26 (1): Human-Aligned Image Models Improve Visual Decoding from the Brain
Title: Human-Aligned Image Models Improve Visual Decoding from the Brain | Menschlich ausgerichtete Imagemodelle verbessern die visuelle Dekodierung aus dem Gehirn | 人与人之间的图像模型改进大脑的视觉解码 2502.03081v2 |
Authors: Nona Rajabi, Antônio H. Ribeiro, Miguel Vasco, Farzaneh Taleb, Mårten Björkman, Danica Kragic
Decoding visual images from brain activity has significant potential for advancing brain-computer interaction and enhancing the understanding of human perception. Recent approaches align the representation spaces of images and brain activity to enable visual decoding. In this paper, we introduce the use of human-aligned image encoders to map brain signals to images. We hypothesize that these models more effectively capture perceptual attributes associated with the rapid visual stimuli presentations commonly used in visual brain data recording experiments. Our empirical results support this hypothesis, demonstrating that this simple modification improves image retrieval accuracy by up to 21% compared to state-of-the-art methods. Comprehensive experiments confirm consistent performance improvements across diverse EEG architectures, image encoders, alignment methods, participants, and brain imaging modalities.
Article 1347
Title@2025-05-26 (1): Ontology- and LLM-based Data Harmonization for Federated Learning in Healthcare
Title: Ontology- and LLM-based Data Harmonization for Federated Learning in Healthcare | Ontologie- und LLM-basierte Datenharmonisierung für das Federated Learning in Healthcare | 以本体学和LLM为基础的保健方面联邦学习数据统一 2505.20020v1 |
Authors: Natallia Kokash, Lei Wang, Thomas H. Gillespie, Adam Belloum, Paola Grosso, Sara Quinney, Lang Li, Bernard de Bono
The rise of electronic health records (EHRs) has unlocked new opportunities for medical research, but privacy regulations and data heterogeneity remain key barriers to large-scale machine learning. Federated learning (FL) enables collaborative modeling without sharing raw data, yet faces challenges in harmonizing diverse clinical datasets. This paper presents a two-step data alignment strategy integrating ontologies and large language models (LLMs) to support secure, privacy-preserving FL in healthcare, demonstrating its effectiveness in a real-world project involving semantic mapping of EHR data.
Article 1348
Title@2025-05-26 (1): ProcessBench: Identifying Process Errors in Mathematical Reasoning
Title: ProcessBench: Identifying Process Errors in Mathematical Reasoning | ProcessBench: Identifizierung von Prozessfehlern in mathematischer Reasoning | 进程快节: 识别数学原因中的进程错误 2412.06559v4 |
Authors: Chujie Zheng, Zhenru Zhang, Beichen Zhang, Runji Lin, Keming Lu, Bowen Yu, Dayiheng Liu, Jingren Zhou, Junyang Lin
As language models regularly make mistakes when solving math problems, automated identification of errors in the reasoning process becomes increasingly significant for their scalable oversight. In this paper, we introduce ProcessBench for measuring the ability to identify erroneous steps in mathematical reasoning. It consists of 3,400 test cases, primarily focused on competition- and Olympiad-level math problems. Each test case contains a step-by-step solution with the error location annotated by human experts. Models are required to identify the earliest step that contains an error, or conclude that all steps are correct. We conduct extensive evaluation on ProcessBench, involving two types of models: process reward models (PRMs) and critic models, where for the latter we prompt general language models to critique each solution step by step. We draw two main observations: (1) Existing PRMs typically fail to generalize to more challenging math problems beyond GSM8K and MATH. They underperform both critic models (i.e., prompted general language models) and our own trained PRM that is straightforwardly fine-tuned on the PRM800K dataset. (2) The best open-source model, QwQ-32B-Preview, demonstrates critique capability competitive with the proprietary model GPT-4o, although it still lags behind the reasoning-specialized o1-mini. We hope ProcessBench can foster future research in reasoning process assessment, paving the way toward scalable oversight of language models.
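The task format described in this abstract (report the earliest erroneous step, or report that every step is correct) can be illustrated with a minimal Python sketch. The helper names, the use of -1 to encode "all steps correct", and the plain exact-match accuracy are assumptions made for illustration; the abstract does not specify the benchmark's data format or its official metric.

```python
from typing import Sequence


def earliest_error(step_is_correct: Sequence[bool]) -> int:
    """Return the 0-based index of the first step judged incorrect.

    Returns -1 when every step is judged correct (an assumed encoding).
    """
    for index, ok in enumerate(step_is_correct):
        if not ok:
            return index
    return -1


def exact_match_accuracy(predicted: Sequence[int], annotated: Sequence[int]) -> float:
    """Fraction of test cases whose predicted label equals the annotated label."""
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    if not annotated:
        return 0.0
    hits = sum(1 for p, a in zip(predicted, annotated) if p == a)
    return hits / len(annotated)


# Hypothetical step-by-step judgments from a critic model on three solutions.
judgments = [[True, True, False, False], [True, True], [False, True]]
predictions = [earliest_error(j) for j in judgments]
```

Here a critic's per-step verdicts are collapsed into a single label per solution, so a prediction counts as correct only when it pinpoints the same first error as the annotator.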
Article 1349
Title@2025-05-26 (1): Kernel-based estimators for functional causal effects
Title: Kernel-based estimators for functional causal effects | kernbasierte Schätzwerte für funktionelle kausale Effekte | 功能因果效应的内核核心估计值 2503.05024v3 |
Authors: Yordan P. Raykov, Hengrui Luo, Justin D. Strait, Wasiur R. KhudaBukhsh
We propose causal effect estimators based on empirical Fréchet means and operator-valued kernels, tailored to functional data spaces. These methods address the challenges of high-dimensionality, sequential ordering, and model complexity while preserving robustness to treatment misspecification. Using structural assumptions, we obtain compact representations of potential outcomes, enabling scalable estimation of causal effects over time and across covariates. We provide both theoretical results, regarding the consistency of functional causal effects, and an empirical comparison of a range of proposed causal effect estimators. Applications to binary treatment settings with functional outcomes illustrate the framework’s utility in biomedical monitoring, where outcomes exhibit complex temporal dynamics. Our estimators accommodate scenarios with registered covariates and outcomes, aligning them to the Fréchet means, as well as cases requiring higher-order representations to capture intricate covariate-outcome interactions. These advancements extend causal inference to dynamic and non-linear domains, offering new tools for understanding complex treatment effects in functional data settings.
Article 1350
Title@2025-05-26 (1): Data-Dependent Regret Bounds for Constrained MABs
Title: Data-Dependent Regret Bounds for Constrained MABs | Datendependent Regret Bounds for Constrained MABs | 受约束 MAB 的受控数据依赖的 Regret Bounds 2505.20010v1 |
Authors: Gianmarco Genalti, Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti
This paper initiates the study of data-dependent regret bounds in constrained MAB settings. These bounds depend on the sequence of losses that characterize the problem instance. Thus, they can be much smaller than classical $\widetilde{\mathcal{O}}(\sqrt{T})$ regret bounds, while being equivalent to them in the worst case. Despite this, data-dependent regret bounds have been completely overlooked in constrained MAB settings. The goal of this paper is to answer the following question: Can data-dependent regret bounds be derived in the presence of constraints? We answer this question affirmatively in constrained MABs with adversarial losses and stochastic constraints. Specifically, our main focus is on the most challenging and natural settings with hard constraints, where the learner must ensure that the constraints are always satisfied with high probability. We design an algorithm with a regret bound consisting of two data-dependent terms. The first term captures the difficulty of satisfying the constraints, while the second one encodes the complexity of learning independently of the presence of constraints. We also prove a lower bound showing that these two terms are not artifacts of our specific approach and analysis, but rather the fundamental components that inherently characterize the complexities of the problem. Finally, in designing our algorithm, we also derive some novel results in the related (and easier) soft constraints settings, which may be of independent interest.
Article 1351
Title@2025-05-26 (1): Prediction-Powered E-Values
Title: Prediction-Powered E-Values | Voraussichtliche E-Werte | 预测力电子价值 2502.04294v2 |
Authors: Daniel Csillag, Claudio José Struchiner, Guilherme Tegoni Goedert
Quality statistical inference requires a sufficient amount of data, which can be missing or hard to obtain. To this end, prediction-powered inference has risen as a promising methodology, but existing approaches are largely limited to Z-estimation problems such as inference of means and quantiles. In this paper, we apply ideas of prediction-powered inference to e-values. By doing so, we inherit all the usual benefits of e-values – such as anytime-validity, post-hoc validity and versatile sequential inference – as well as greatly expand the set of inferences achievable in a prediction-powered manner. In particular, we show that every inference procedure that can be framed in terms of e-values has a prediction-powered counterpart, given by our method. We showcase the effectiveness of our framework across a wide range of inference tasks, from simple hypothesis testing and confidence intervals to more involved procedures for change-point detection and causal discovery, which were out of reach of previous techniques. Our approach is modular and easily integrable into existing algorithms, making it a compelling choice for practical applications.
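For readers unfamiliar with e-values, the anytime-validity property mentioned in this abstract can be illustrated with a standard textbook construction. The sketch below is not the prediction-powered method of the paper: it is a plain likelihood-ratio e-process for a Bernoulli null hypothesis, and the function names and parameters are hypothetical. Under the null, the running product is a nonnegative martingale starting at 1, so by Ville's inequality the probability that it ever reaches 1/alpha is at most alpha.

```python
from typing import List, Optional, Sequence


def likelihood_ratio_e_process(
    observations: Sequence[int], p_null: float, p_alt: float
) -> List[float]:
    """Running product of Bernoulli likelihood ratios (alternative over null).

    Each factor has expectation 1 under H0: p = p_null, so the running
    product is a test martingale under the null.
    """
    e_value = 1.0
    path = []
    for x in observations:
        numerator = p_alt if x else 1.0 - p_alt
        denominator = p_null if x else 1.0 - p_null
        e_value *= numerator / denominator
        path.append(e_value)
    return path


def first_rejection(path: Sequence[float], alpha: float) -> Optional[int]:
    """0-based index of the first step where the e-process reaches 1/alpha."""
    threshold = 1.0 / alpha
    for t, e_value in enumerate(path):
        if e_value >= threshold:
            return t
    return None


# Ten successes in a row, testing H0: p = 0.5 against the alternative p = 0.75.
path = likelihood_ratio_e_process([1] * 10, p_null=0.5, p_alt=0.75)
stop = first_rejection(path, alpha=0.05)
```

Because the guarantee holds at every stopping time, the analyst may stop as soon as the threshold is crossed without inflating the error rate, which is the sense in which such procedures are anytime-valid.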
Article 1352
Title@2025-05-26 (1): TabPFN: One Model to Rule Them All?
Title: TabPFN: One Model to Rule Them All? | TabPFN: Ein Modell, um sie alle zu beherrschen? | TabPFN: 一种模式来统治他们吗? 2505.20003v1 |
Authors: Qiong Zhang, Yan Shuo Tan, Qinglong Tian, Pengfei Li
Hollmann et al. (Nature 637 (2025) 319-326) recently introduced TabPFN, a transformer-based deep learning model for regression and classification on tabular data, which they claim “outperforms all previous methods on datasets with up to 10,000 samples by a wide margin, using substantially less training time.” Furthermore, they have called TabPFN a “foundation model” for tabular data, as it can support “data generation, density estimation, learning reusable embeddings and fine-tuning”. If these statements are well-supported, TabPFN may have the potential to supersede existing modeling approaches on a wide range of statistical tasks, mirroring a similar revolution in other areas of artificial intelligence that began with the advent of large language models. In this paper, we provide a tailored explanation of how TabPFN works for a statistics audience, by emphasizing its interpretation as approximate Bayesian inference. We also provide more evidence of TabPFN’s “foundation model” capabilities: We show that an out-of-the-box application of TabPFN vastly outperforms specialized state-of-the-art methods for semi-supervised parameter estimation, prediction under covariate shift, and heterogeneous treatment effect estimation. We further show that TabPFN can outperform LASSO at sparse regression and can break a robustness-efficiency trade-off in classification. All experiments can be reproduced using the code provided at https://github.com/qinglong-tian/tabpfn_study.
Article 1353
Title@2025-05-26 (1): Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based Agents
Title: Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based Agents | Unvollkommenheit: Simulieren von Studenten mit unterschiedlichen kognitiven Ebenen mit LLM-basierten Agenten | 普及缺陷:利用基于LLM的代理物模拟具有不同认知水平的学生 2505.19997v1 |
Authors: Tao Wu, Jingyuan Chen, Wang Lin, Mengze Li, Yumeng Zhu, Ang Li, Kun Kuang, Fei Wu
Large language models (LLMs) are revolutionizing education, with LLM-based agents playing a key role in simulating student behavior. A major challenge in student simulation is modeling the diverse learning patterns of students at various cognitive levels. However, current LLMs, typically trained as “helpful assistants”, aim to generate perfect responses. As a result, they struggle to simulate students with diverse cognitive abilities, as they often produce overly advanced answers, missing the natural imperfections that characterize student learning and resulting in unrealistic simulations. To address this issue, we propose a training-free framework for student simulation. We begin by constructing a cognitive prototype for each student using a knowledge graph, which captures their understanding of concepts from past learning records. This prototype is then mapped to new tasks to predict student performance. Next, we simulate student solutions based on these predictions and iteratively refine them using a beam search method to better replicate realistic mistakes. To validate our approach, we construct the Student_100 dataset, consisting of 100 students working on Python programming and 5,000 learning records. Experimental results show that our method consistently outperforms baseline models, achieving a 100% improvement in simulation accuracy.
Article 1354
Title@2025-05-26 (1): Learning Optimal Multimodal Information Bottleneck Representations
Title: Learning Optimal Multimodal Information Bottleneck Representations | Optimales Lernen multimodaler Informationen Engpässe Vertretungen | 学习最佳最佳多模式信息 2505.19996v1 |
Authors: Qilong Wu, Yiyang Shao, Jun Wang, Xiaobo Sun
Leveraging high-quality joint representations from multimodal data can greatly enhance model performance in various machine-learning-based applications. Recent multimodal learning methods, based on the multimodal information bottleneck (MIB) principle, aim to generate optimal MIB with maximal task-relevant information and minimal superfluous information via regularization. However, these methods often set ad hoc regularization weights and overlook imbalanced task-relevant information across modalities, limiting their ability to achieve optimal MIB. To address this gap, we propose a novel multimodal learning framework, Optimal Multimodal Information Bottleneck (OMIB), whose optimization objective guarantees the achievability of optimal MIB by setting the regularization weight within a theoretically derived bound. OMIB further addresses imbalanced task-relevant information by dynamically adjusting regularization weights per modality, promoting the inclusion of all task-relevant information. Moreover, we establish a solid information-theoretical foundation for OMIB’s optimization and implement it under the variational approximation framework for computational efficiency. Finally, we empirically validate OMIB’s theoretical properties on synthetic data and demonstrate its superiority over state-of-the-art benchmark methods in various downstream tasks.
Article 1355
Title@2025-05-26 (1): Distortion Resilience for Goal-Oriented Semantic Communication
Title: Distortion Resilience for Goal-Oriented Semantic Communication | Distortion Resilienz für zielorientierte semantische Kommunikation | 目标导向语义交流的扭曲复原力 2309.14587v2 |
Authors: Minh-Duong Nguyen, Quang-Vinh Do, Zhaohui Yang, Quoc-Viet Pham, Won-Joo Hwang
Recent research efforts on Semantic Communication (SemCom) have mostly treated accuracy as the main objective when optimizing goal-oriented communication systems. However, these approaches introduce a paradox: the accuracy of Artificial Intelligence (AI) tasks should naturally emerge through training rather than being dictated by network constraints. Acknowledging this dilemma, this work introduces an innovative approach that leverages rate-distortion theory to analyze distortions induced by communication and compression, and thereby their effect on the learning process. Specifically, we examine the distribution shift between the original data and the distorted data, thus assessing its impact on the AI model’s performance. Building on this analysis, we can preemptively estimate the empirical accuracy of AI tasks, making the goal-oriented SemCom problem feasible. To achieve this objective, we present the theoretical foundation of our approach, accompanied by simulations and experiments that demonstrate its effectiveness. The experimental results indicate that our proposed method enables accurate AI task performance while adhering to network constraints, establishing it as a valuable contribution to the field of signal processing. Furthermore, this work advances research in goal-oriented SemCom and highlights the significance of data-driven approaches in optimizing the performance of intelligent systems.
Article 1356
Title@2025-05-26 (1): Federated Domain Generalization with Data-free On-server Matching Gradient
Title: Federated Domain Generalization with Data-free On-server Matching Gradient | Föderierte Domain-Verallgemeinerung mit datenfreiem On-Server-Zustimmungs-Gradient | 具有无数据观测站上与渐变匹配的无数据观测器的联邦通用域 2501.14653v2 |
Authors: Trong-Binh Nguyen, Minh-Duong Nguyen, Jinsun Park, Quoc-Viet Pham, Won Joo Hwang
Domain Generalization (DG) aims to learn from multiple known source domains a model that can generalize well to unknown target domains. One of the key approaches in DG is training an encoder which generates domain-invariant representations. However, this approach is not applicable in Federated Domain Generalization (FDG), where data from various domains are distributed across different clients. In this paper, we introduce a novel approach, dubbed Federated Learning via On-server Matching Gradient (FedOMG), which can efficiently leverage domain information from distributed domains. Specifically, we utilize the local gradients as information about the distributed models to find an invariant gradient direction across all domains through gradient inner product maximization. The advantages are two-fold: 1) FedOMG can aggregate the characteristics of distributed models on the centralized server without incurring any additional communication cost, and 2) FedOMG is orthogonal to many existing FL/FDG methods, allowing for additional performance improvements by being seamlessly integrated with them. Extensive experimental evaluations in various settings demonstrate the robustness of FedOMG compared to other FL/FDG baselines. Our method outperforms recent SOTA baselines on four FL benchmark datasets (MNIST, EMNIST, CIFAR-10, and CIFAR-100), and three FDG benchmark datasets (PACS, VLCS, and OfficeHome).
Article 1357
Title@2025-05-26 (1): Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach
Title: Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach | Bedauerliche Analyse von durchschnittlichen Unichain-MDPs über einen actor-Critic-Ansatz | 通过“行动者-批评办法”对平均回报单链式微DP的遗憾分析 2505.19986v1 |
Authors: Swetha Ganesh, Vaneet Aggarwal
Actor-Critic methods are widely used for their scalability, yet existing theoretical guarantees for infinite-horizon average-reward Markov Decision Processes (MDPs) often rely on restrictive ergodicity assumptions. We propose NAC-B, a Natural Actor-Critic with Batching, that achieves order-optimal regret of $\tilde{O}(\sqrt{T})$ in infinite-horizon average-reward MDPs under the unichain assumption, which permits both transient states and periodicity. This assumption is among the weakest under which the classic policy gradient theorem remains valid for average-reward settings. NAC-B employs function approximation for both the actor and the critic, enabling scalability to problems with large state and action spaces. The use of batching in our algorithm helps mitigate potential periodicity in the MDP and reduces stochasticity in gradient estimates, and our analysis formalizes these benefits through the introduction of the constants $C_{\text{hit}}$ and $C_{\text{tar}}$, which characterize the rate at which empirical averages over Markovian samples converge to the stationary distribution.
Article 1358
Title@2025-05-26 (1): Bridging The Multi-Modality Gaps of Audio, Visual and Linguistic for Speech Enhancement
Title: Bridging The Multi-Modality Gaps of Audio, Visual and Linguistic for Speech Enhancement | Überbrückung der Multi-Modalitätslücken von Audio, Visual und Linguistik zur Sprachverbesserung | 弥合视听和语言的多模式差距,加强语言、视听能力 2501.13375v2 |
Authors: Meng-Ping Lin, Jen-Cheng Hou, Chia-Wei Chen, Shao-Yi Chien, Jun-Cheng Chen, Xugang Lu, Yu Tsao
Speech enhancement (SE) aims to improve the quality and intelligibility of speech in noisy environments. Recent studies have shown that incorporating visual cues in audio signal processing can enhance SE performance. Given that human speech communication naturally involves audio, visual, and linguistic modalities, it is reasonable to expect additional improvements by integrating linguistic information. However, effectively bridging these modality gaps, particularly during knowledge transfer, remains a significant challenge. In this paper, we propose a novel multi-modal learning framework, termed DLAV-SE, which leverages a diffusion-based model integrating audio, visual, and linguistic information for audio-visual speech enhancement (AVSE). Within this framework, the linguistic modality is modeled using a pretrained language model (PLM), which transfers linguistic knowledge to the audio-visual domain through a cross-modal knowledge transfer (CMKT) mechanism during training. After training, the PLM is no longer required at inference, as its knowledge is embedded into the AVSE model through the CMKT process. We conduct a series of SE experiments to evaluate the effectiveness of our approach. Results show that the proposed DLAV-SE system significantly improves speech quality and reduces generative artifacts, such as phonetic confusion, compared to state-of-the-art (SOTA) methods. Furthermore, visualization analyses confirm that the CMKT method enhances the generation quality of the AVSE outputs. These findings highlight both the promise of diffusion-based methods for advancing AVSE and the value of incorporating linguistic information to further improve system performance.
Article 1359
Title@2025-05-26 (1): Rethinking Probabilistic Circuit Parameter Learning
Title: Rethinking Probabilistic Circuit Parameter Learning | Probabilistisches Parameter-Lernen neu denken | 重新思考概率电路参数学习 2505.19982v1 |
Authors: Anji Liu, Guy Van den Broeck
Probabilistic Circuits (PCs) offer a computationally scalable framework for generative modeling, supporting exact and efficient inference of a wide range of probabilistic queries. While recent advances have significantly improved the expressiveness and scalability of PCs, effectively training their parameters remains a challenge. In particular, a widely used optimization method, full-batch Expectation-Maximization (EM), requires processing the entire dataset before performing a single update, making it ineffective for large datasets. While empirical extensions to the mini-batch setting have been proposed, it remains unclear what objective these algorithms are optimizing, making it difficult to assess their theoretical soundness. This paper bridges the gap by establishing a novel connection between the general EM objective and the standard full-batch EM algorithm. Building on this, we derive a theoretically grounded generalization to the mini-batch setting and demonstrate its effectiveness through preliminary empirical results.
Article 1360
Title@2025-05-26 (1): Differential Privacy Analysis of Decentralized Gossip Averaging under Varying Threat Models
Title: Differential Privacy Analysis of Decentralized Gossip Averaging under Varying Threat Models | Differential Privacy Analyse dezentralisierter Gossip Average unter unterschiedlichen Bedrohungsmodellen | 对不同威胁模式下分散的流民的隐私差异分析 2505.19969v1 |
Authors: Antti Koskela, Tejas Kulkarni
Fully decentralized training of machine learning models offers significant advantages in scalability, robustness, and fault tolerance. However, achieving differential privacy (DP) in such settings is challenging due to the absence of a central aggregator and varying trust assumptions among nodes. In this work, we present a novel privacy analysis of decentralized gossip-based averaging algorithms with additive node-level noise, both with and without secure summation over each node’s direct neighbors. Our main contribution is a new analytical framework based on a linear systems formulation that accurately characterizes privacy leakage across these scenarios. This framework significantly improves upon prior analyses, for example, reducing the Rényi DP parameter growth from $O(T^2)$ to $O(T)$, where $T$ is the number of training rounds. We validate our analysis with numerical results demonstrating superior DP bounds compared to existing approaches. We further illustrate our analysis with a logistic regression experiment on MNIST image classification in a fully decentralized setting, demonstrating utility comparable to central aggregation methods.
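As a rough illustration of the setting described in this abstract, the sketch below runs synchronous gossip averaging in which every node adds Gaussian noise to its value before sharing it with its direct neighbors. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `gossip_average`, the uniform averaging weights, and the synchronous update schedule are all hypothetical, and no privacy accounting or secure summation is performed.

```python
import random


def gossip_average(values, neighbors, rounds, noise_std, seed=0):
    """Synchronous gossip averaging with additive node-level Gaussian noise.

    values:    list of floats, one per node (hypothetical scalar model state).
    neighbors: dict mapping node index -> list of direct neighbor indices.
    noise_std: standard deviation of the noise each node adds before sharing.
    """
    rng = random.Random(seed)
    x = list(values)
    for _ in range(rounds):
        # Each node perturbs its current value before it is shared.
        shared = [v + rng.gauss(0.0, noise_std) for v in x]
        new_x = []
        for i in range(len(x)):
            # Uniform average over the node itself and its direct neighbors
            # (an assumed weighting, chosen only for simplicity).
            group = [i] + list(neighbors[i])
            new_x.append(sum(shared[j] for j in group) / len(group))
        x = new_x
    return x


# Usage: a ring of four nodes; without noise the values approach the mean 6.0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(gossip_average([0.0, 4.0, 8.0, 12.0], ring, rounds=50, noise_std=0.0))
```

With `noise_std` set above zero, each round injects fresh noise, which is the quantity a privacy analysis of this kind would have to track across the $T$ rounds.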
Article 1361
Title@2025-05-26 (1): Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Title: Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking) | Position: Löse schichtweise lineare Modelle, um neurale dynamische Phänomene zu verstehen (Neuraler Kollaps, Emergence, Lazy/Rich Regime und Grokking) | 位置:首先理解神经动态现象的解层图层线性模型(神经崩溃、新出现、Lazy/Rich制度和Grokking) 2502.21009v2 |
Authors: Yoonsoo Nam, Seok Hyeong Lee, Clementine C J Domine, Yeachan Park, Charles London, Wonyl Choi, Niclas Goring, Seungjai Lee
In physics, complex systems are often simplified into minimal, solvable models that retain only the core principles. In machine learning, layerwise linear models (e.g., linear neural networks) act as simplified representations of neural network dynamics. These models follow the dynamical feedback principle, which describes how layers mutually govern and amplify each other’s evolution. This principle extends beyond the simplified models, successfully explaining a wide range of dynamical phenomena in deep neural networks, including neural collapse, emergence, lazy and rich regimes, and grokking. In this position paper, we call for the use of layerwise linear models retaining the core principles of neural dynamical phenomena to accelerate the science of deep learning.
Article 1362
Title@2025-05-26 (1): Learning to Select In-Context Demonstration Preferred by Large Language Model
Title: Learning to Select In-Context Demonstration Preferred by Large Language Model | Lernen, In-Kontext-Demonstration zu wählen Bevorzugt nach großen Sprachmodellen | 学习选择大语言模式首选的文本内演示 2505.19966v1 |
Authors: Zheng Zhang, Shaocheng Lan, Lei Song, Jiang Bian, Yexin Li, Kan Ren
In-context learning (ICL) enables large language models (LLMs) to adapt to new tasks during inference using only a few demonstrations. However, ICL performance is highly dependent on the selection of these demonstrations. Recent work explores retrieval-based methods for selecting query-specific demonstrations, but these approaches often rely on surrogate objectives such as metric learning, failing to directly optimize ICL performance. Consequently, they struggle to identify truly beneficial demonstrations. Moreover, their discriminative retrieval paradigm is ineffective when the candidate pool lacks sufficient high-quality demonstrations. To address these challenges, we propose GenICL, a novel generative preference learning framework that leverages LLM feedback to directly optimize demonstration selection for ICL. Experiments on 19 datasets across 11 task categories demonstrate that GenICL outperforms existing methods in selecting the most effective demonstrations, leading to better ICL performance.
Article 1363
Title@2025-05-26 (1): The Limits of Preference Data for Post-Training
Title: The Limits of Preference Data for Post-Training | Die Grenzen der Präferenzdaten für das Post-Training | 培训后优先数据限值 2505.19964v1 |
Authors: Eric Zhao, Jessica Dai, Pranjal Awasthi
Recent progress in strengthening the capabilities of large language models has stemmed from applying reinforcement learning to domains with automatically verifiable outcomes. A key question is whether we can similarly use RL to optimize for outcomes in domains where evaluating outcomes inherently requires human feedback; for example, in tasks like deep research and trip planning, outcome evaluation is qualitative and there are many possible degrees of success. One attractive and scalable modality for collecting human feedback is preference data: ordinal rankings (pairwise or $k$-wise) that indicate, for $k$ given outcomes, which one is preferred. In this work, we study a critical roadblock: preference data fundamentally and significantly limits outcome-based optimization. Even with idealized preference data (infinite, noiseless, and online), the use of ordinal feedback can prevent obtaining even approximately optimal solutions. We formalize this impossibility using voting theory, drawing an analogy between how a model chooses to answer a query and how voters choose a candidate to elect. This indicates that grounded human scoring and algorithmic innovations are necessary for extending the success of RL post-training to domains demanding human feedback. We also explore why these limitations have disproportionately impacted RLHF when it comes to eliciting reasoning behaviors (e.g., backtracking) versus situations where RLHF has been historically successful (e.g., instruction-tuning and safety training), finding that the limitations of preference data primarily suppress RLHF’s ability to elicit robust strategies – a class that encompasses most reasoning behaviors.
Article 1364
Title@2025-05-26 (1): Robustly optimal dynamics for active matter reservoir computing
Title: Robustly optimal dynamics for active matter reservoir computing | Robust optimale Dynamik für das Recreservoir Computing mit aktiven Materien | 活性物质储油层计算强有力的最佳动态 2505.05420v2 |
Authors: Mario U. Gaimann, Miriam Klopotek
Information processing abilities of active matter are studied in the reservoir computing (RC) paradigm to infer the future state of a chaotic signal. We uncover an exceptional regime of agent dynamics that has been overlooked previously. It appears robustly optimal for performance under many conditions, thus providing valuable insights into computation with physical systems more generally. The key to forming effective mechanisms for information processing appears in the system’s intrinsic relaxation abilities. These are probed without actually enforcing a specific inference goal. The dynamical regime that achieves optimal computation is located just below a critical damping threshold, involving a relaxation with multiple stages, and is readable at the single-particle level. At the many-body level, it yields substrates robustly optimal for RC across varying physical parameters and inference tasks. A system in this regime exhibits a strong diversity of dynamic mechanisms under highly fluctuating driving forces. Correlations of agent dynamics can express a tight relationship between the responding system and the fluctuating forces driving it. As this model is interpretable in physical terms, it facilitates re-framing inquiries regarding learning and unconventional computing with a fresh rationale for many-body physics out of equilibrium.
Article 1365
Title@2025-05-26 (1): Explanatory Summarization with Discourse-Driven Planning
Title: Explanatory Summarization with Discourse-Driven Planning | Erklärende Zusammenfassung mit diskursgetriebener Planung | 与 “ 分流规划 “ 结合的解释性总结 2504.19339v3 |
Authors: Dongqi Liu, Xi Yu, Vera Demberg, Mirella Lapata
Lay summaries for scientific documents typically include explanations to help readers grasp sophisticated concepts or arguments. However, current automatic summarization methods do not explicitly model explanations, which makes it difficult to align the proportion of explanatory content with human-written summaries. In this paper, we present a plan-based approach that leverages discourse frameworks to organize summary generation and guide explanatory sentences by prompting responses to the plan. Specifically, we propose two discourse-driven planning strategies, where the plan is conditioned as part of the input or part of the output prefix, respectively. Empirical experiments on three lay summarization datasets show that our approach outperforms existing state-of-the-art methods in terms of summary quality, enhances model robustness and controllability, and mitigates hallucination.
Article 1366
Title@2025-05-26 (1): RAP: Runtime-Adaptive Pruning for LLM Inference
Title: RAP: Runtime-Adaptive Pruning for LLM Inference | RAP: Runtime-Adaptive Pruning für LLM-Inferenz | RAP:LLM 推断的运行时间-适应性节制 2505.17138v2 |
Authors: Huanrong Liu, Chunlin Tian, Xuyang Wei, Jiaheng Dai, Qin Liu, Tianqi Wei, Qingbiao Li, Li Li
Large language models (LLMs) excel at language understanding and generation, but their enormous computational and memory requirements hinder deployment. Compression offers a potential solution to mitigate these constraints. However, most existing methods rely on fixed heuristics and thus fail to adapt to runtime memory variations or heterogeneous KV-cache demands arising from diverse user requests. To address these limitations, we propose RAP, an elastic pruning framework driven by reinforcement learning (RL) that dynamically adjusts compression strategies in a runtime-aware manner. Specifically, RAP dynamically tracks the evolving ratio between model parameters and KV-cache across practical execution. Recognizing that FFNs house most parameters, whereas parameter-light attention layers dominate KV-cache formation, the RL agent retains only those components that maximize utility within the current memory budget, conditioned on instantaneous workload and device state. Extensive experimental results demonstrate that RAP outperforms state-of-the-art baselines, marking the first work to jointly consider model weights and KV-cache on the fly.
Article 1367
Title@2025-05-26 (1): Multi-Type Point Cloud Autoencoder: A Complete Equivariant Embedding for Molecule Conformation and Pose
Title: Multi-Type Point Cloud Autoencoder: A Complete Equivariant Embedding for Molecule Conformation and Pose | Multi-Type-Punkt-Cloud-Autoencoder: Ein komplettes Equivariant-Embedding für Molekülkonformation und Pose | 多类型点云云自动编码器:分子构造和脉冲的完全等同嵌入 2405.13791v3 |
Authors: Michael Kilgour, Mark Tuckerman, Jutta Rogal
Representations are a foundational component of any modelling protocol, including on molecules and molecular solids. For tasks that depend on knowledge of both molecular conformation and 3D orientation, such as the modelling of molecular dimers, clusters, or condensed phases, we desire a rotatable representation that is provably complete in the types and positions of atomic nuclei and roto-inversion equivariant with respect to the input point cloud. In this paper, we develop, train, and evaluate a new type of autoencoder, molecular O(3) encoding net (Mo3ENet), for multi-type point clouds, for which we propose a new reconstruction loss, capitalizing on a Gaussian mixture representation of the input and output point clouds. Mo3ENet is end-to-end equivariant, meaning the learned representation can be manipulated on O(3), a practical bonus. An appropriately trained Mo3ENet latent space comprises a universal embedding for scalar and vector molecule property prediction tasks, as well as other downstream tasks incorporating the 3D molecular pose, and we demonstrate its fitness on several such tasks.
Article 1368
Title@2025-05-26 (1): MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research
Title: MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research | MLR-Bench: Bewertung von KI-Agenten auf Open-Ended Machine Learning Research | MLR-Bench:评估AI公司在开放式机械学习研究方面的代理机构 2505.19955v1 |
Authors: Hui Chen, Miao Xiong, Yujie Lu, Wei Han, Ailin Deng, Yufei He, Jiaying Wu, Yibo Li, Yue Liu, Bryan Hooi
Recent advancements in AI agents have demonstrated their growing potential to drive and support scientific discovery. In this work, we introduce MLR-Bench, a comprehensive benchmark for evaluating AI agents on open-ended machine learning research. MLR-Bench includes three key components: (1) 201 research tasks sourced from NeurIPS, ICLR, and ICML workshops covering diverse ML topics; (2) MLR-Judge, an automated evaluation framework combining LLM-based reviewers with carefully designed review rubrics to assess research quality; and (3) MLR-Agent, a modular agent scaffold capable of completing research tasks through four stages: idea generation, proposal formulation, experimentation, and paper writing. Our framework supports both stepwise assessment across these distinct research stages, and end-to-end evaluation of the final research paper. We then use MLR-Bench to evaluate six frontier LLMs and an advanced coding agent, finding that while LLMs are effective at generating coherent ideas and well-structured papers, current coding agents frequently (e.g., in 80% of the cases) produce fabricated or invalidated experimental results, posing a major barrier to scientific reliability. We validate MLR-Judge through human evaluation, showing high agreement with expert reviewers, supporting its potential as a scalable tool for research evaluation. We open-source MLR-Bench to help the community benchmark, diagnose, and improve AI research agents toward trustworthy and transparent scientific discovery.
Article 1369
Title@2025-05-26 (1): An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning
Title: An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning | Ein erklärbares Diagnose-Framework für neurodegenerative Dementias durch Verstärkungsoptimierte LLM-Reasoning | 通过强化-优化LLM解释性理疗理由的神经医学性痴呆症可解释的诊断框架 2505.19954v1 |
Authors: Andrew Zamai, Nathanael Fijalkow, Boris Mansencal, Laurent Simon, Eloi Navet, Pierrick Coupe
The differential diagnosis of neurodegenerative dementias is a challenging clinical task, mainly because of the overlap in symptom presentation and the similarity of patterns observed in structural neuroimaging. To improve diagnostic efficiency and accuracy, deep learning-based methods such as Convolutional Neural Networks and Vision Transformers have been proposed for the automatic classification of brain MRIs. However, despite their strong predictive performance, these models find limited clinical utility due to their opaque decision making. In this work, we propose a framework that integrates two core components to enhance diagnostic transparency. First, we introduce a modular pipeline for converting 3D T1-weighted brain MRIs into textual radiology reports. Second, we explore the potential of modern Large Language Models (LLMs) to assist clinicians in the differential diagnosis between Frontotemporal dementia subtypes, Alzheimer’s disease, and normal aging based on the generated reports. To bridge the gap between predictive accuracy and explainability, we employ reinforcement learning to incentivize diagnostic reasoning in LLMs. Without requiring supervised reasoning traces or distillation from larger models, our approach enables the emergence of structured diagnostic rationales grounded in neuroimaging findings. Unlike post-hoc explainability methods that retrospectively justify model decisions, our framework generates diagnostic rationales as part of the inference process, producing causally grounded explanations that inform and guide the model’s decision-making process. In doing so, our framework matches the diagnostic performance of existing deep learning methods while offering rationales that support its diagnostic conclusions.
Article 1370
Title@2025-05-26 (1): Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions
Title: Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions | Welche Datenattribute stimulieren die Mathe- und Code-Reasoning? Eine Untersuchung über Einflussfunktionen | 哪些数据属性刺激数学和代码理由? 通过影响函数进行调查 2505.19949v1 |
Authors: Siqi Kou, Qingyuan Tian, Hanwen Xu, Zihao Zeng, Zhijie Deng
Large language models (LLMs) have demonstrated remarkable reasoning capabilities in math and coding, often bolstered by post-training on the chain-of-thoughts (CoTs) generated by stronger models. However, existing strategies for curating such training data predominantly rely on heuristics, limiting generalizability and failing to capture subtleties underlying the data. To address these limitations, we leverage influence functions to systematically attribute LLMs’ reasoning ability on math and coding to individual training examples, sequences, and tokens, enabling deeper insights into effective data characteristics. Our Influence-based Reasoning Attribution (Infra) uncovers nontrivial cross-domain effects across math and coding tasks: high-difficulty math examples improve both math and code reasoning, while low-difficulty code tasks most effectively benefit code reasoning. Based on these findings, we introduce a simple yet effective dataset reweighting strategy by flipping task difficulty, which doubles AIME24 accuracy from 10% to 20% and boosts LiveCodeBench accuracy from 33.8% to 35.3% for Qwen2.5-7B-Instruct. Moreover, our fine-grained attribution reveals that the sequence-level exploratory behaviors enhance reasoning performance in both math and code, and the token-level influence patterns are distinct for math and code reasoning: the former prefers natural language logic connectors and the latter emphasizes structural syntax.
Article 1371
Title@2025-05-26 (1): SaSi: A Self-augmented and Self-interpreted Deep Learning Approach for Few-shot Cryo-ET Particle Detection
Title: SaSi: A Self-augmented and Self-interpreted Deep Learning Approach for Few-shot Cryo-ET Particle Detection | SaSi: Ein selbst-augmentierter und selbst-interpretierter Deep-Learning-Ansatz für die wenige Schuss Cryo-ET Partikelerkennung | SaSi:对几近的Cryo-ET粒子探测自增强和自我解释的深层学习方法 2505.19948v1 |
Authors: Gokul Adethya, Bhanu Pratyush Mantha, Tianyang Wang, Xingjian Li, Min Xu
Cryo-electron tomography (cryo-ET) has emerged as a powerful technique for imaging macromolecular complexes in their near-native states. However, the localization of 3D particles in cellular environments still presents a significant challenge due to low signal-to-noise ratios and missing wedge artifacts. Deep learning approaches have shown great potential, but they need huge amounts of data, which can be a challenge in cryo-ET scenarios where labeled data is often scarce. In this paper, we propose a novel Self-augmented and Self-interpreted (SaSi) deep learning approach towards few-shot particle detection in 3D cryo-ET images. Our method builds upon self-augmentation techniques to further boost data utilization and introduces a self-interpreted segmentation strategy for alleviating dependency on labeled data, hence improving generalization and robustness. As demonstrated by experiments conducted on both simulated and real-world cryo-ET datasets, the SaSi approach significantly outperforms existing state-of-the-art methods for particle localization. This research increases understanding of how to detect particles with very few labels in cryo-ET and thus sets a new benchmark for few-shot learning in structural biology.
Article 1372
Title@2025-05-26 (1): Dynamically Learned Test-Time Model Routing in Language Model Zoos with Service Level Guarantees
Title: Dynamically Learned Test-Time Model Routing in Language Model Zoos with Service Level Guarantees | Dynamisch gelerntes Test-Time-Modell-Routing in Sprachmodell Zoos mit Service-Level-Garantien | 具有服务级保障的语文示范动物园动态学习测试时间模型运行 2505.19947v1 |
Authors: Herbert Woisetschläger, Ryan Zhang, Shiqiang Wang, Hans-Arno Jacobsen
Open-weight LLM zoos provide access to numerous high-quality models, but selecting the appropriate model for specific tasks remains challenging and requires technical expertise. Most users simply want factually correct, safe, and satisfying responses without concerning themselves with model technicalities, while inference service providers prioritize minimizing operating costs. These competing interests are typically mediated through service level agreements (SLAs) that guarantee minimum service quality. We introduce MESS+, a stochastic optimization algorithm for cost-optimal LLM request routing while providing rigorous SLA compliance guarantees. MESS+ learns request satisfaction probabilities of LLMs in real-time as users interact with the system, based on which model selection decisions are made by solving a per-request optimization problem. Our algorithm includes a novel combination of virtual queues and request satisfaction prediction, along with a theoretical analysis of cost optimality and constraint satisfaction. Across a wide range of state-of-the-art LLM benchmarks, MESS+ achieves an average of 2x cost savings compared to existing LLM routing techniques.
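To make the virtual-queue idea mentioned in this abstract more concrete, the sketch below shows a generic drift-plus-penalty style router: a virtual queue grows whenever observed satisfaction falls short of the SLA target, and the per-request decision trades cost against predicted satisfaction weighted by the queue length. This is an assumed, textbook-style formulation and not MESS+ itself; the names `route` and `update_queue`, the trade-off weight `v`, and the example model tuples are all hypothetical.

```python
def route(models, queue, v):
    """Pick the model minimizing v * cost - queue * predicted_satisfaction.

    models: list of (name, cost, predicted_satisfaction) tuples (hypothetical).
    queue:  current virtual queue length (accumulated SLA shortfall).
    v:      assumed trade-off weight between cost and constraint pressure.
    """
    return min(models, key=lambda m: v * m[1] - queue * m[2])


def update_queue(queue, target, satisfied):
    """Virtual queue update: grows by the SLA target, shrinks when a request is satisfied."""
    return max(0.0, queue + target - (1.0 if satisfied else 0.0))


# Usage: with an empty queue the cheap model wins; a long queue favors quality.
zoo = [("small", 1.0, 0.5), ("large", 4.0, 0.9)]
print(route(zoo, queue=0.0, v=1.0)[0])
print(route(zoo, queue=10.0, v=1.0)[0])
```

The design choice illustrated here is that the queue acts as a running memory of constraint violations, so the router only pays for a more expensive model when recent requests have fallen below the target.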
Article 1373
Title@2025-05-26 (1): Inverse Q-Learning Done Right: Offline Imitation Learning in $Q^π$-Realizable MDPs
Title: Inverse Q-Learning Done Right: Offline Imitation Learning in $Q^π$-Realizable MDPs | Inverse Q-Learning Done Right: Offline-Imitation Lernen in $Q^π$-realisierbaren MDPs | 逆向Q- 学习完成右: 以可变元DP为单位的离线模拟学习($$- $- 可变 MDP) 2505.19946v1 |
Authors: Antoine Moulin, Gergely Neu, Luca Viano
We study the problem of offline imitation learning in Markov decision processes (MDPs), where the goal is to learn a well-performing policy given a dataset of state-action pairs generated by an expert policy. Complementing a recent line of work on this topic that assumes the expert belongs to a tractable class of known policies, we approach this problem from a new angle and leverage a different type of structural assumption about the environment. Specifically, for the class of linear $Q^\pi$-realizable MDPs, we introduce a new algorithm called saddle-point offline imitation learning (SPOIL), which is guaranteed to match the performance of any expert up to an additive error $\varepsilon$ with access to $\mathcal{O}(\varepsilon^{-2})$ samples. Moreover, we extend this result to possibly non-linear $Q^\pi$-realizable MDPs at the cost of a worse sample complexity of order $\mathcal{O}(\varepsilon^{-4})$. Finally, our analysis suggests a new loss function for training critic networks from expert data in deep imitation learning. Empirical evaluations on standard benchmarks demonstrate that the neural net implementation of SPOIL is superior to behavior cloning and competitive with state-of-the-art algorithms.
Article 1374
Title@2025-05-26 (1): RefinedFields: Radiance Fields Refinement for Planar Scene Representations
Title: RefinedFields: Radiance Fields Refinement for Planar Scene Representations | Verfeinerte Felder: Strahlungsfelder Verfeinerung für planare Szenendarstellungen | 精炼田地: 辐射田地 2312.00639v4 |
Authors: Karim Kassab, Antoine Schnepf, Jean-Yves Franceschi, Laurent Caraffa, Jeremie Mary, Valérie Gouet-Brunet
Planar scene representations have recently attracted increased interest for modeling scenes from images, as their lightweight planar structure enables compatibility with image-based models. Notably, K-Planes have gained particular attention as they extend planar scene representations to support in-the-wild scenes, in addition to object-level scenes. However, their visual quality has recently lagged behind that of state-of-the-art techniques. To reduce this gap, we propose RefinedFields, a method that leverages pre-trained networks to refine K-Planes scene representations via optimization guidance using an alternating training procedure. We carry out extensive experiments and verify the merit of our method on synthetic data and real tourism photo collections. RefinedFields enhances rendered scenes with richer details and improves upon its base representation on the task of novel view synthesis. Our project page can be found at https://refinedfields.github.io.
Article 1375
Title@2025-05-26 (1): Can Visual Encoder Learn to See Arrows?
Title: Can Visual Encoder Learn to See Arrows? | Kann Visual Encoder lernen, Pfeile zu sehen? | 视觉编码器能学会看到箭头吗 ? 2505.19944v1 |
Authors: Naoyuki Terashita, Yusuke Tozaki, Hideaki Omote, Congkha Nguyen, Ryosuke Nakamoto, Yuta Koreeda, Hiroaki Ozaki
The diagram is a visual representation of a relationship illustrated with edges (lines or arrows), which is widely used in industrial and scientific communication. Although recognizing diagrams is essential for vision language models (VLMs) to comprehend domain-specific knowledge, recent studies reveal that many VLMs fail to identify edges in images. We hypothesize that these failures stem from an over-reliance on textual and positional biases, preventing VLMs from learning explicit edge features. Based on this idea, we empirically investigate whether the image encoder in VLMs can learn edge representation through training on a diagram dataset in which edges are biased neither by textual nor positional information. To this end, we conduct contrastive learning on an artificially generated diagram–caption dataset to train an image encoder and evaluate its diagram-related features on three tasks: probing, image retrieval, and captioning. Our results show that the finetuned model outperforms pretrained CLIP in all tasks and surpasses zero-shot GPT-4o and LLaVA-Mistral in the captioning task. These findings confirm that eliminating textual and positional biases fosters accurate edge recognition in VLMs, offering a promising path for advancing diagram understanding.
Article 1376
Title@2025-05-26 (1): Beyond Freezing: Sparse Tuning Enhances Plasticity in Continual Learning with Pre-Trained Models
Title: Beyond Freezing: Sparse Tuning Enhances Plasticity in Continual Learning with Pre-Trained Models | Beyond Freezing: Sparse Tuning verbessert Plastizität im kontinuierlichen Lernen mit vortrainierten Modellen | 超出冻结范围:在继续学习过程中,采用培训前模式,粗略的加注可增强可塑性 2505.19943v1 |
Authors: Huan Zhang, Fan Lyu, Shuyu Dong, Shenghua Fan, Yujin Zheng, Dingwen Wang
Continual Learning with Pre-trained Models holds great promise for efficient adaptation across sequential tasks. However, most existing approaches freeze PTMs and rely on auxiliary modules like prompts or adapters, limiting model plasticity and leading to suboptimal generalization when facing significant distribution shifts. While full fine-tuning can improve adaptability, it risks disrupting crucial pre-trained knowledge. In this paper, we propose Mutual Information-guided Sparse Tuning (MIST), a plug-and-play method that selectively updates a small subset of PTM parameters, less than 5%, based on sensitivity to mutual information objectives. MIST enables effective task-specific adaptation while preserving generalization. To further reduce interference, we introduce strong sparsity regularization by randomly dropping gradients during tuning, resulting in fewer than 0.5% of parameters being updated per step. Applied before standard freeze-based methods, MIST consistently boosts performance across diverse continual learning benchmarks. Experiments show that integrating our method into multiple baselines yields significant performance gains. Our code is available at https://github.com/zhwhu/MIST.
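The random gradient dropping described in this abstract can be pictured with the minimal sketch below, which zeroes each gradient entry independently with some probability before a plain SGD step. It only illustrates the sparsity regularization idea; it does not implement the mutual-information-guided parameter selection, and the names `drop_gradients` and `sparse_sgd_step`, the keep probability, and the use of plain SGD are assumptions made for illustration.

```python
import random


def drop_gradients(grads, keep_prob, rng):
    """Keep each gradient entry with probability keep_prob, otherwise zero it."""
    return [g if rng.random() < keep_prob else 0.0 for g in grads]


def sparse_sgd_step(params, grads, lr, keep_prob, rng):
    """One SGD step in which only the randomly kept gradient entries update parameters."""
    masked = drop_gradients(grads, keep_prob, rng)
    return [p - lr * g for p, g in zip(params, masked)]


# Usage: with keep_prob=0.1, roughly one in ten selected parameters moves per step.
rng = random.Random(0)
print(sparse_sgd_step([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], lr=0.1, keep_prob=0.1, rng=rng))
```

In a setup like the one the abstract describes, such a mask would be applied on top of an already small selected subset of parameters, which is how the per-step update fraction could fall well below the size of that subset.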
Article 1377
Title@2025-05-26 (1): Task-Oriented Low-Label Semantic Communication With Self-Supervised Learning
Title: Task-Oriented Low-Label Semantic Communication With Self-Supervised Learning | Aufgabenorientierte kabelarme semantische Kommunikation mit selbstüberwachtem Lernen | 以任务为导向的低标签低标签语义交流与自控学习 2505.19940v1 |
Authors: Run Gu, Wei Xu, Zhaohui Yang, Dusit Niyato, Aylin Yener
Task-oriented semantic communication enhances transmission efficiency by conveying semantic information rather than exact messages. Deep learning (DL)-based semantic communication can effectively cultivate the essential semantic knowledge for semantic extraction, transmission, and interpretation by leveraging massive labeled samples for downstream task training. In this paper, we propose a self-supervised learning-based semantic communication framework (SLSCom) to enhance task inference performance, particularly in scenarios with limited access to labeled samples. Specifically, we develop a task-relevant semantic encoder using unlabeled samples, which can be collected by devices in real-world edge networks. To facilitate task-relevant semantic extraction, we introduce self-supervision for learning contrastive features and formulate the information bottleneck (IB) problem to balance the tradeoff between the informativeness of the extracted features and task inference performance. Given the computational challenges of the IB problem, we devise a practical and effective solution by employing self-supervised classification and reconstruction pretext tasks. We further propose efficient joint training methods to enhance end-to-end inference accuracy over wireless channels, even with few labeled samples. We evaluate the proposed framework on image classification tasks over multipath wireless channels. Extensive simulation results demonstrate that SLSCom significantly outperforms conventional digital coding methods and existing DL-based approaches across varying labeled data set sizes and SNR conditions, even when the unlabeled samples are irrelevant to the downstream tasks.
Article 1378
Title@2025-05-26 (1): Efficient Time Series Processing for Transformers and State-Space Models through Token Merging
Title: Efficient Time Series Processing for Transformers and State-Space Models through Token Merging | Effiziente Zeitreihenverarbeitung für Transformatoren und State-Space-Modelle durch Token Merging | 通过 Token 合并对变形器和国家空间模型的有效时间序列处理 2405.17951v2 |
Authors: Leon Götz, Marcel Kollovieh, Stephan Günnemann, Leo Schwinn
Despite recent advances in subquadratic attention mechanisms or state-space models, processing long token sequences still imposes significant computational requirements. Token merging has emerged as a solution to increase computational efficiency in computer vision architectures. In this work, we perform the first investigations of token merging in time series analysis on both transformers and state-space models. We further introduce local merging, a domain-specific token merging algorithm that selectively combines tokens within a local neighborhood, achieving two major benefits: a) Local merging can adjust its computational complexity from quadratic to linear based on the neighborhood size to effectively scale to long sequences; b) Local merging is the first causal merging scheme enabling token merging in transformer decoders. Further, we identify spectral properties of the input data that reliably predict the potential benefits of local merging without requiring evaluation on downstream tasks. Our comprehensive empirical evaluation demonstrates that local merging offers substantial efficiency gains with minimal impact on accuracy, achieving up to 5400% acceleration on the recently proposed Chronos foundation model.
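The idea of merging tokens only within a local neighborhood can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a neighborhood of size one (only adjacent tokens are compared), cosine similarity as the merge criterion, and averaging as the merge rule. The names `cosine` and `local_merge` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def local_merge(tokens, r):
    """Merge up to r adjacent token pairs, most similar pairs first.

    Assumption: candidate pairs are (0, 1), (2, 3), ... so that no token
    takes part in two merges. Only neighbors are compared, so the cost of
    scoring is linear in the sequence length.
    """
    pairs = [(cosine(tokens[i], tokens[i + 1]), i)
             for i in range(0, len(tokens) - 1, 2)]
    chosen = {i for _, i in sorted(pairs, reverse=True)[:r]}

    merged = []
    i = 0
    while i < len(tokens):
        if i in chosen:
            # Replace the pair by its element-wise average.
            merged.append([(x + y) / 2 for x, y in zip(tokens[i], tokens[i + 1])])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


tokens = [[1, 0], [1, 0], [0, 1], [1, 1]]
shorter = local_merge(tokens, r=1)  # the two identical leading tokens are merged
```

Because each token is compared only with its immediate neighbor, the temporal order of the sequence is preserved, which is the property that makes a local scheme compatible with causal decoding.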
Article 1379
Title@2025-05-26 (1): Constructing a BPE Tokenization DFA
Title: Constructing a BPE Tokenization DFA | Aufbau einer BPE Tokenization DFA | 正在构建 BPE 磁盘化 DFA 2405.07671v2 |
Authors: Martin Berglund, Willeke Martens, Brink van der Merwe
Many natural language processing systems operate over tokenizations of text to address the open-vocabulary problem. In this paper, we give and analyze an algorithm for the efficient construction of deterministic finite automata (DFA) designed to operate directly on tokenizations produced by the popular byte pair encoding (BPE) technique. This makes it possible to apply many existing techniques and algorithms to the tokenized case, such as pattern matching, equivalence checking of tokenization dictionaries, and composing tokenized languages in various ways. The construction preserves some key properties of the automaton, and we use this to establish asymptotic bounds on the state complexity of the automata that result. Finally, we demonstrate how to construct an input-deterministic (subsequential) string-to-string transducer which precisely describes the relationship between strings and their correct tokenizations.
Article 1380
Title@2025-05-26 (1): Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent
Title: Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent | Modellierung von Multi-Task-Modellen, die als adaptives projektives Gradientenabsinken zusammenwachsen | 模拟多任务模式模型合并为适应性预测梯度下层 2501.01230v3 |
Authors: Yongxian Wei, Anke Tang, Li Shen, Zixuan Hu, Chun Yuan, Xiaochun Cao
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. Existing methods attempt to alleviate task conflicts by sparsifying task vectors or promoting orthogonality among them. However, they overlook the fundamental target of model merging: the merged model performs as closely as possible to task-specific models on respective tasks. We find these methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance. Based on our findings, we frame model merging as a constrained optimization problem (i.e., minimizing the gap between the merged model and individual models, subject to the constraint of retaining shared knowledge) and solve it via adaptive projective gradient descent. Specifically, we align the merged model with individual models by decomposing and reconstituting the loss function, alleviating conflicts through data-free optimization of task vectors. To retain shared knowledge, we optimize this objective by projecting gradients within a shared subspace spanning all tasks. Moreover, we view merging coefficients as adaptive learning rates and propose a task-aware, training-free strategy. Experiments show that our plug-and-play approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
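For readers unfamiliar with the terms, the sketch below shows the plain task-vector baseline that the abstract builds on, not the adaptive projective gradient descent method itself. A task vector is assumed to be the difference between fine-tuned and base weights, and merging adds a weighted sum of task vectors to the base. Weights are flat lists of floats here, and the names `task_vector` and `merge` are hypothetical.

```python
def task_vector(finetuned, base):
    """Difference between fine-tuned and base weights (flat lists of floats)."""
    return [f - b for f, b in zip(finetuned, base)]


def merge(base, task_vectors, coeffs):
    """Add each task vector to the base, scaled by its merging coefficient."""
    merged = list(base)
    for tau, lam in zip(task_vectors, coeffs):
        merged = [m + lam * t for m, t in zip(merged, tau)]
    return merged


base = [1.0, 2.0]
expert_a = [2.0, 2.0]  # toy weights of a model fine-tuned on task A
expert_b = [1.0, 4.0]  # toy weights of a model fine-tuned on task B
taus = [task_vector(expert_a, base), task_vector(expert_b, base)]
merged = merge(base, taus, coeffs=[0.5, 0.5])
```

In this baseline the coefficients are fixed by hand; the abstract's proposal is to treat them as adaptive learning rates instead.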
Article 1381
Title@2025-05-26 (1): Logic Gate Neural Networks are Good for Verification
Title: Logic Gate Neural Networks are Good for Verification | Logic Gate Neural Networks sind gut für die Verifikation | 逻辑门神经网络有利于核查 2505.19932v1 |
Authors: Fabian Kresse, Emily Yu, Christoph H. Lampert, Thomas A. Henzinger
Learning-based systems are increasingly deployed across various domains, yet the complexity of traditional neural networks poses significant challenges for formal verification. Unlike conventional neural networks, learned Logic Gate Networks (LGNs) replace multiplications with Boolean logic gates, yielding a sparse, netlist-like architecture that is inherently more amenable to symbolic verification, while still delivering promising performance. In this paper, we introduce a SAT encoding for verifying global robustness and fairness in LGNs. We evaluate our method on five benchmark datasets, including a newly constructed 5-class variant, and find that LGNs are both verification-friendly and maintain strong predictive performance.
Article 1382
Title@2025-05-26 (1): JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
Title: JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs | JailbreakRadar: Umfassende Bewertung von Jailbreak Attacken gegen LLMs | Jailbreb Radar:全面评估对LLMs的越狱袭击 2402.05668v3 |
Authors: Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang
Jailbreak attacks aim to bypass the safeguards of LLMs. While researchers have proposed and examined different jailbreak attacks in depth, they have done so in isolation, either with unaligned settings or by comparing a limited range of methods. To fill this gap, we present a large-scale evaluation of various jailbreak attacks. We collect 17 representative jailbreak attacks, summarize their features, and establish a novel jailbreak attack taxonomy. Then we conduct comprehensive measurement and ablation studies across nine aligned LLMs on 160 forbidden questions from 16 violation categories. We also test jailbreak attacks under eight advanced defenses. Based on our taxonomy and experiments, we identify some important patterns; for example, heuristic-based attacks can achieve high attack success rates but are easily mitigated by defenses, which limits their practicality. Our study offers valuable insights for future research on jailbreak attacks and defenses. We hope our work can help the community avoid incremental work and serve as an effective benchmark tool for practitioners.
Article 1383
Title@2025-05-26 (1): Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement Learning
Title: Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement Learning | Semantic-Aware Ressourcenmanagement für C-V2X Platooning über Multi-Agent Verstärkungslernen | 通过多机构强化学习进行 C-V2X 等离子处理的语义软件资源管理 2411.04672v2 |
Authors: Wenjun Zhang, Qiong Wu, Pingyi Fan, Kezhi Wang, Nan Cheng, Wen Chen, Khaled B. Letaief
Semantic communication transmits the extracted features of information rather than raw data, significantly reducing redundancy, which is crucial for addressing spectrum and energy challenges in 6G networks. In this paper, we introduce semantic communication into a cellular vehicle-to-everything (C-V2X)-based autonomous vehicle platoon system for the first time, aiming to achieve efficient management of communication resources in a dynamic environment. Firstly, we construct a mathematical model for semantic communication in platoon systems, in which the DeepSC model and MU-DeepSC model are used to semantically encode and decode unimodal and multi-modal data, respectively. Then, we propose the quality of experience (QoE) metric based on semantic similarity and semantic rate. Meanwhile, we consider the success rate of semantic information transmission (SRS) metric to ensure the fairness of channel resource allocation. Next, the optimization problem is posed with the aim of maximizing the QoE in vehicle-to-vehicle (V2V) links while improving SRS. To solve this mixed integer nonlinear programming problem (MINLP) and adapt to time-varying channel conditions, the paper proposes a distributed semantic-aware multi-modal resource allocation (SAMRA) algorithm based on multi-agent reinforcement learning (MARL), referred to as SAMRAMARL. The algorithm can dynamically allocate channels and power and determine semantic symbol length based on the contextual importance of the transmitted information, ensuring efficient resource utilization. Finally, extensive simulations have demonstrated that SAMRAMARL outperforms existing methods, achieving significant gains in QoE, SRS, and communication delay in C-V2X platooning scenarios.
Article 1384
Title@2025-05-26 (1): Cellwise and Casewise Robust Covariance in High Dimensions
Title: Cellwise and Casewise Robust Covariance in High Dimensions | Cellwise und Casewise Robuste Kovarianz in hohen Abmessungen | 高维度的单元格和大小写常量 2505.19925v1 |
Authors: Fabio Centofanti, Mia Hubert, Peter J. Rousseeuw
The sample covariance matrix is a cornerstone of multivariate statistics, but it is highly sensitive to outliers. These can be casewise outliers, such as cases belonging to a different population, or cellwise outliers, which are deviating cells (entries) of the data matrix. Recently some robust covariance estimators have been developed that can handle both types of outliers, but their computation is only feasible up to at most 20 dimensions. To remedy this we propose the cellRCov method, a robust covariance estimator that simultaneously handles casewise outliers, cellwise outliers, and missing data. It relies on a decomposition of the covariance on principal and orthogonal subspaces, leveraging recent work on robust PCA. It also employs a ridge-type regularization to stabilize the estimated covariance matrix. We establish some theoretical properties of cellRCov, including its casewise and cellwise influence functions as well as consistency and asymptotic normality. A simulation study demonstrates the superior performance of cellRCov in contaminated and missing data scenarios. Furthermore, its practical utility is illustrated in a real-world application to anomaly detection. We also construct and illustrate the cellRCCA method for robust and regularized canonical correlation analysis.
Article 1385
Title@2025-05-26 (1): Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL
Title: Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL | Bellman-Updates vertrauen lernen: Selektive State-Adaptive Regularisierung für Offline RL | 学习信任 Bellman 更新信息: 选择性国家适应性离线转线常规化 2505.19923v1 |
Authors: Qin-Wen Luo, Ming-Kun Xie, Ye-Wen Wang, Sheng-Jun Huang
Offline reinforcement learning (RL) aims to learn an effective policy from a static dataset. To alleviate extrapolation errors, existing studies often uniformly regularize the value function or policy updates across all states. However, due to substantial variations in data quality, the fixed regularization strength often leads to a dilemma: Weak regularization strength fails to address extrapolation errors and value overestimation, while strong regularization strength shifts policy learning toward behavior cloning, impeding potential performance enabled by Bellman updates. To address this issue, we propose the selective state-adaptive regularization method for offline RL. Specifically, we introduce state-adaptive regularization coefficients to trust state-level Bellman-driven results, while selectively applying regularization on high-quality actions, aiming to avoid performance degradation caused by tight constraints on low-quality actions. By establishing a connection between the representative value regularization method, CQL, and explicit policy constraint methods, we effectively extend selective state-adaptive regularization to these two mainstream offline RL approaches. Extensive experiments demonstrate that the proposed method significantly outperforms the state-of-the-art approaches in both offline and offline-to-online settings on the D4RL benchmark.
Article 1386
Title@2025-05-26 (1): (Un)supervised Learning of Maximal Lyapunov Functions
Title: (Un)supervised Learning of Maximal Lyapunov Functions | (Un)überwachtes Lernen von maximalen Lyapunov-Funktionen | (无受监督的学习 Maximal Lyapunov 函数的学习 2408.17246v2 |
Authors: Matthieu Barreau, Nicola Bastianello
In this paper, we address the problem of discovering maximal Lyapunov functions, as a means of determining the region of attraction of a dynamical system. To this end, we design a novel neural network architecture, which we prove to be a universal approximator of (maximal) Lyapunov functions. The architecture combines a local quadratic approximation with the output of a neural network, which models global higher-order terms in the Taylor expansion. We formulate the problem of training the Lyapunov function as an unsupervised optimization problem with dynamical constraints, which can be solved leveraging techniques from physics-informed learning. We propose and analyze a tailored training algorithm, based on the primal-dual algorithm, that can efficiently solve the problem. Additionally, we show how the learning problem formulation can be adapted to integrate data, when available. We apply the proposed approach to different classes of systems, showing that it matches or outperforms state-of-the-art alternatives in the accuracy of the approximated regions of attraction.
Article 1387
Title@2025-05-26 (1): A Probabilistic Model for Non-Contrastive Learning
Title: A Probabilistic Model for Non-Contrastive Learning | Ein probabilistisches Modell für nicht kontrastives Lernen | 非交流性学习概率模型 2501.13031v2 |
Authors: Maximilian Fleissner, Pascal Esser, Debarghya Ghoshdastidar
Self-supervised learning (SSL) aims to find meaningful representations from unlabeled data by encoding semantic similarities through data augmentations. Despite its current popularity, theoretical insights about SSL are still scarce. For example, it is not yet known whether commonly used SSL loss functions can be related to a statistical model, much in the same way as OLS, generalized linear models or PCA naturally emerge as maximum likelihood estimates of an underlying generative process. In this short paper, we consider a latent variable statistical model for SSL that exhibits an interesting property: depending on the informativeness of the data augmentations, the MLE of the model either reduces to PCA, or approaches a simple non-contrastive loss. We analyze the model and also empirically illustrate our findings.
Article 1388
Title@2025-05-26 (1): APE: A Data-Centric Benchmark for Efficient LLM Adaptation in Text Summarization
Title: APE: A Data-Centric Benchmark for Efficient LLM Adaptation in Text Summarization | APE: Ein datenzentrischer Benchmark für effiziente LLM-Anpassung in der Textzusammenfassung | APE: 文本摘要中高效LLM适应数据中心基准 2505.19912v1 |
Authors: Javier Marín
We present Adjacent Possible Exploration (APE), a simple yet effective method for adapting large language models to specific tasks using minimal computational resources. Unlike traditional fine-tuning that requires extensive compute, APE iteratively fine-tunes models on small, carefully selected data batches (200 examples), retaining only improvements. On news summarization, APE achieves 40 percent BLEU improvement using just a T4 GPU in 60 minutes, matching or exceeding more complex methods like LoRA while remaining conceptually simple. Our approach is particularly valuable for researchers and practitioners with limited computational resources. We provide open-source code and demonstrate APE’s effectiveness through both automatic metrics and human evaluation. While inspired by evolutionary theory’s “adjacent possible”, APE’s core insight has a very practical application: small, iterative data perturbations can efficiently guide LLMs toward task-specific performance without expensive retraining.
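The "fine-tune on a small batch, keep only improvements" loop described above can be sketched as a simple accept/reject search. This is a toy illustration, not the released APE code: `fine_tune` and `evaluate` are hypothetical callables supplied by the caller, and the "model" in the usage example is a single float rather than an LLM.

```python
def adjacent_possible_search(model, batches, fine_tune, evaluate):
    """Try one small batch at a time and keep the result only if the score improves.

    fine_tune(model, batch) -> candidate model  (hypothetical callable)
    evaluate(model) -> float, higher is better  (hypothetical callable)
    """
    best, best_score = model, evaluate(model)
    history = [best_score]
    for batch in batches:
        candidate = fine_tune(best, batch)
        score = evaluate(candidate)
        if score > best_score:
            # Accept the perturbation; otherwise the previous model is retained.
            best, best_score = candidate, score
        history.append(best_score)
    return best, history


# Toy stand-ins: the "model" is a number and the target behaviour is 3.0.
best, history = adjacent_possible_search(
    model=0.0,
    batches=[1.0, -2.0, 1.5, 5.0, 0.5],
    fine_tune=lambda m, b: m + b,
    evaluate=lambda m: -abs(m - 3.0),
)
```

Because rejected candidates are discarded, the recorded score never decreases, which is the property the abstract refers to as retaining only improvements.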
Article 1389
Title@2025-05-26 (1): Inverse Problem Sampling in Latent Space Using Sequential Monte Carlo
Title: Inverse Problem Sampling in Latent Space Using Sequential Monte Carlo | Inverse Problem-Sampling im Latent Space mit Sequential Monte Carlo | 利用定序蒙特卡洛在低层空间进行逆向问题抽样 2502.05908v2 |
Authors: Idan Achituve, Hai Victor Habi, Amir Rosenfeld, Arnon Netzer, Idit Diamant, Ethan Fetaya
In image processing, solving inverse problems is the task of finding plausible reconstructions of an image that was corrupted by some (usually known) degradation operator. Commonly, this process is done using a generative image model that can guide the reconstruction towards solutions that appear natural. The success of diffusion models over the last few years has made them a leading candidate for this task. However, the sequential nature of diffusion models makes this conditional sampling process challenging. Furthermore, since diffusion models are often defined in the latent space of an autoencoder, the encoder-decoder transformations introduce additional difficulties. To address these challenges, we suggest a novel sampling method based on sequential Monte Carlo (SMC) in the latent space of diffusion models. We name our method LD-SMC. We define a generative model for the data using additional auxiliary observations and perform posterior inference with SMC sampling based on a backward diffusion process. Empirical evaluations on ImageNet and FFHQ show the benefits of LD-SMC over competing methods in various inverse problem tasks and especially in challenging inpainting tasks.
Article 1390
Title@2025-05-26 (1): ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining
Title: ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining | ESLM: Risiko-Averse Selective Language Modeling für effizientes Vortraining | ESLM: 有效培训前风险-反风险选择语言建模 2505.19893v1 |
Authors: Melis Ilayda Bal, Volkan Cevher, Michael Muehlebach
Large language model pretraining is compute-intensive, yet many tokens contribute marginally to learning, resulting in inefficiency. We introduce Efficient Selective Language Modeling (ESLM), a risk-aware algorithm that improves training efficiency and distributional robustness by performing online token-level batch selection. ESLM leverages per-token statistics (e.g., entropy or loss) and applies value-at-risk thresholding to retain only the most informative tokens per batch. This data-centric mechanism reshapes the training loss, prioritizing high-risk tokens and eliminating redundant gradient computation. We frame ESLM as a bilevel game: the model competes with a masking adversary that selects worst-case token subsets under a constrained thresholding rule. In the loss-based setting, ESLM recovers conditional value-at-risk loss minimization, providing a principled connection to distributionally robust optimization. We extend our approach to Ada-ESLM, which adaptively tunes the selection confidence during training. Experiments on GPT-2 pretraining show that ESLM significantly reduces training FLOPs while maintaining or improving both perplexity and downstream performance compared to baselines. Our approach also scales across model sizes and pretraining corpora, and integrates naturally with knowledge distillation.
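Value-at-risk thresholding on per-token losses can be sketched in a few lines. The sketch assumes the loss-based setting and a simple empirical quantile convention; the function name `var_select` and the handling of ties are assumptions, not details taken from the paper.

```python
import math


def var_select(token_losses, alpha):
    """Keep tokens whose loss is at or above the empirical alpha-quantile.

    Returns (threshold, mask, mean loss over kept tokens). The mean over the
    kept tokens is an empirical estimate of the conditional value-at-risk.
    """
    ordered = sorted(token_losses)
    # Assumed quantile convention: index floor(alpha * n), clipped to the last element.
    k = min(len(ordered) - 1, math.floor(alpha * len(ordered)))
    threshold = ordered[k]
    mask = [loss >= threshold for loss in token_losses]
    kept = [loss for loss, keep in zip(token_losses, mask) if keep]
    return threshold, mask, sum(kept) / len(kept)


threshold, mask, cvar = var_select([0.2, 1.0, 0.1, 3.0], alpha=0.5)
# The two highest-loss tokens are retained; the other two would be masked out.
```

In a training loop, the mask would be applied to the per-token loss before averaging, so that masked tokens contribute no gradient.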
Article 1391
Title@2025-05-26 (1): APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
Title: APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs | APB: Beschleunigen des verteilten Long-Context-Schlussfolgerungens durch Übergeben von komprimierten Kontextblöcken über GPUs | APP: 通过通过横跨 GPU 传递压缩的上下文区块加速分布式长文字推文 2502.12085v2 |
Authors: Yuxiang Huang, Mingye Li, Xu Han, Chaojun Xiao, Weilin Zhao, Sun Ao, Hao Zhou, Jie Zhou, Zhiyuan Liu, Maosong Sun
While long-context inference is crucial for advancing large language model (LLM) applications, its prefill speed remains a significant bottleneck. Current approaches, including sequence parallelism strategies and compute reduction through approximate attention mechanisms, still fall short of delivering optimal inference efficiency. This hinders scaling the inputs to longer sequences and processing long-context queries in a timely manner. To address this, we introduce APB, an efficient long-context inference framework that leverages multi-host approximate attention to enhance prefill speed by reducing compute and enhancing parallelism simultaneously. APB introduces a communication mechanism for essential key-value pairs within a sequence parallelism framework, enabling a faster inference speed while maintaining task performance. We implement APB by incorporating a tailored FlashAttn kernel alongside optimized distribution strategies, supporting diverse models and parallelism configurations. APB achieves speedups of up to 9.2x, 4.2x, and 1.6x compared with FlashAttn, RingAttn, and StarAttn, respectively, without any observable task performance degradation. We provide the implementation and experiment code of APB at https://github.com/thunlp/APB.
Article 1392
Title@2025-05-26 (1): A Langevin sampling algorithm inspired by the Adam optimizer
Title: A Langevin sampling algorithm inspired by the Adam optimizer | Ein Langevin-Sampling-Algorithmus, inspiriert vom Adam-Optimierer | 由亚当优化器启发的Langevin取样算法 2504.18911v2 |
Authors: Benedict Leimkuhler, René Lohmann, Peter Whalley
We present a framework for adaptive-stepsize MCMC sampling based on time-rescaled Langevin dynamics, in which the stepsize variation is dynamically driven by an additional degree of freedom. Our approach augments the phase space by an additional variable, which in turn defines a time reparameterization. The use of an auxiliary relaxation equation allows accumulation of a moving average of a local monitor function and provides for precise control of the timestep while circumventing the need to modify the drift term in the physical system. Our algorithm is straightforward to implement and can be readily combined with any off-the-peg fixed-stepsize Langevin integrator. As a particular example, we consider control of the stepsize by monitoring the norm of the log-posterior gradient, which takes inspiration from the Adam optimizer, the stepsize being automatically reduced in regions of steep change of the log posterior and increased on plateaus, improving numerical stability and convergence speed. As in Adam, the stepsize variation depends on the recent history of the gradient norm, which enhances stability and improves accuracy compared to more immediate control approaches. We demonstrate the potential benefit of this method, in both accuracy and stability, in numerical experiments including Neal’s funnel and a Bayesian neural network for classification of MNIST data.
Article 1393
Title@2025-05-26 (1): Learning mechanical systems from real-world data using discrete forced Lagrangian dynamics
Title: Learning mechanical systems from real-world data using discrete forced Lagrangian dynamics | Mechanische Systeme aus realen Daten mit diskreter, erzwungener Lagrange-Dynamik lernen | 使用离散强制拉格朗江动力从真实世界数据中学习机械系统 2505.20370v1 |
Authors: Martine Dyring Hansen, Elena Celledoni, Benjamin Kwanen Tapley
We introduce a data-driven method for learning the equations of motion of mechanical systems directly from position measurements, without requiring access to velocity data. This is particularly relevant in system identification tasks where only positional information is available, such as motion capture, pixel data or low-resolution tracking. Our approach takes advantage of the discrete Lagrange-d’Alembert principle and the forced discrete Euler-Lagrange equations to construct a physically grounded model of the system’s dynamics. We decompose the dynamics into conservative and non-conservative components, which are learned separately using feed-forward neural networks. In the absence of external forces, our method reduces to a variational discretization of the action principle naturally preserving the symplectic structure of the underlying Hamiltonian system. We validate our approach on a variety of synthetic and real-world datasets, demonstrating its effectiveness compared to baseline methods. In particular, we apply our model to (1) measured human motion data and (2) latent embeddings obtained via an autoencoder trained on image sequences. We demonstrate that we can faithfully reconstruct and separate both the conservative and forced dynamics, yielding interpretable and physically consistent predictions.
Article 1394
Title@2025-05-26 (1): Single-Agent vs. Multi-Agent LLM Strategies for Automated Student Reflection Assessment
Title: Single-Agent vs. Multi-Agent LLM Strategies for Automated Student Reflection Assessment | Single-Agent vs. Multi-Agent LLM-Strategien für die automatisierte Bewertung von Studentenreflexionen | 学生自动反省评估战略 2504.05716v2 |
Authors: Gen Li, Li Chen, Cheng Tang, Valdemar Švábenský, Daisuke Deguchi, Takayoshi Yamashita, Atsushi Shimada
We explore the use of Large Language Models (LLMs) for automated assessment of open-text student reflections and prediction of academic performance. Traditional methods for evaluating reflections are time-consuming and may not scale effectively in educational settings. In this work, we employ LLMs to transform student reflections into quantitative scores using two assessment strategies (single-agent and multi-agent) and two prompting techniques (zero-shot and few-shot). Our experiments, conducted on a dataset of 5,278 reflections from 377 students over three academic terms, demonstrate that the single-agent with few-shot strategy achieves the highest match rate with human evaluations. Furthermore, models utilizing LLM-assessed reflection scores outperform baselines in both at-risk student identification and grade prediction tasks. These findings suggest that LLMs can effectively automate reflection assessment, reduce educators’ workload, and enable timely support for students who may need additional assistance. Our work emphasizes the potential of integrating advanced generative AI technologies into educational practices to enhance student engagement and academic success.
Article 1395
Title@2025-05-26 (1): Target Specific De Novo Design of Drug Candidate Molecules with Graph Transformer-based Generative Adversarial Networks
Title: Target Specific De Novo Design of Drug Candidate Molecules with Graph Transformer-based Generative Adversarial Networks | Zielspezifisches De Novo-Design von Wirkstoff-Kandidatenmolekülen mit Graph Transformer-basierten Generativen Adversarial-Netzwerken | 配有基于图形变形器的成形反转基因网络的药物候选分子具体新设计 2302.07868v7 |
Authors: Atabey Ünlü, Elif Çevrim, Melih Gökay Yiğit, Ahmet Sarıgün, Hayriye Çelikbilek, Osman Bayram, Deniz Cansen Kahraman, Abdurrahman Olğaç, Ahmet Sureyya Rifaioğlu, Erden Banoğlu, Tunca Doğan
Discovering novel drug candidate molecules is one of the most fundamental and critical steps in drug development. Generative deep learning models, which create synthetic data given a probability distribution, offer a high potential for designing de novo molecules. However, to be utilisable in real-life drug development pipelines, these models should be able to design drug-like and target-centric molecules. In this study, we propose an end-to-end generative system, DrugGEN, for the de novo design of drug candidate molecules that interact with intended target proteins. The proposed method represents molecules as graphs and processes them via a generative adversarial network comprising graph transformer layers. The system is trained using a large dataset of drug-like compounds and target-specific bioactive molecules to design effective inhibitory molecules against the AKT1 protein, which is critically important in developing treatments for various types of cancer. We conducted molecular docking and dynamics to assess the target-centric generation performance of the model, as well as attention score visualisation to examine model interpretability. In parallel, selected compounds were chemically synthesised and evaluated in the context of in vitro enzymatic assays, which identified two bioactive molecules that inhibited AKT1 at low micromolar concentrations. These results indicate that DrugGEN’s de novo molecules have a high potential for interacting with the AKT1 protein at the level of its native ligands. Using the open-access DrugGEN codebase, it is possible to easily train models for other druggable proteins, given a dataset of experimentally known bioactive molecules.
Article 1396
Title@2025-05-26 (1): Risk-Averse Reinforcement Learning with Itakura-Saito Loss
Title: Risk-Averse Reinforcement Learning with Itakura-Saito Loss | Risiko-Averse Verstärkungs-Lernen mit Itakura-Saito-Verlust | 以Itakuura-Saito损失进行反风险强化学习 2505.16925v2 |
Authors: Igor Udovichenko, Olivier Croissant, Anita Toleutaeva, Evgeny Burnaev, Alexander Korotin
Risk-averse reinforcement learning finds application in various high-stakes fields. Unlike classical reinforcement learning, which aims to maximize expected returns, risk-averse agents choose policies that minimize risk, occasionally sacrificing expected value. These preferences can be framed through utility theory. We focus on the specific case of the exponential utility function, where one can derive the Bellman equations and employ various reinforcement learning algorithms with few modifications. To address this, we introduce to the broad machine learning community a numerically stable and mathematically sound loss function based on the Itakura-Saito divergence for learning state-value and action-value functions. We evaluate the Itakura-Saito loss function against established alternatives, both theoretically and empirically. In the experimental section, we explore multiple scenarios, some with known analytical solutions, and show that the considered loss function outperforms the alternatives.
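The two standard quantities named in the abstract can be written down directly. The sketch below gives the textbook Itakura-Saito divergence between positive scalars and the certainty equivalent under exponential utility with risk-aversion parameter `beta`. How the paper turns the divergence into a loss for value functions is not stated in the abstract and is not reproduced here; the function names are hypothetical.

```python
import math


def itakura_saito(p, q):
    """Itakura-Saito divergence between two positive numbers: p/q - log(p/q) - 1."""
    ratio = p / q
    return ratio - math.log(ratio) - 1.0


def entropic_value(returns, beta):
    """Certainty equivalent under exponential utility: -(1/beta) * log E[exp(-beta * R)].

    Uses a log-sum-exp shift so that large |beta * R| does not overflow.
    Assumes beta > 0 (risk-averse) and equally weighted sample returns.
    """
    shift = max(-beta * r for r in returns)
    mean_exp = sum(math.exp(-beta * r - shift) for r in returns) / len(returns)
    return -(shift + math.log(mean_exp)) / beta


same = itakura_saito(2.0, 2.0)             # zero when both arguments agree
risky = entropic_value([0.0, 2.0], 1.0)    # below the mean return of 1.0
```

The divergence is non-negative, zero only when its arguments coincide, and asymmetric in `p` and `q`. A risk-averse certainty equivalent falls below the plain expected return whenever the returns vary.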
Article 1397
Title@2025-05-26 (1): Explaining the role of Intrinsic Dimensionality in Adversarial Training
Title: Explaining the role of Intrinsic Dimensionality in Adversarial Training | Erklärung der Rolle der Intrinsischen Dimensionalität im Adversarial Training | 解释内在多面性在相互培训中的作用 2405.17130v2 |
Authors: Enes Altinisik, Safa Messaoud, Husrev Taha Sencar, Hassan Sajjad, Sanjay Chawla
Adversarial Training (AT) impacts different architectures in distinct ways: vision models gain robustness but face reduced generalization, encoder-based models exhibit limited robustness improvements with minimal generalization loss, and recent work in latent-space adversarial training (LAT) demonstrates that decoder-based models achieve improved robustness by applying AT across multiple layers. We provide the first explanation for these trends by leveraging the manifold conjecture: off-manifold adversarial examples (AEs) enhance robustness, while on-manifold AEs improve generalization. We show that vision and decoder-based models exhibit low intrinsic dimensionality in earlier layers (favoring off-manifold AEs), whereas encoder-based models do so in later layers (favoring on-manifold AEs). Exploiting this property, we introduce SMAAT, which improves the scalability of AT for encoder-based models by perturbing the layer with the lowest intrinsic dimensionality. This reduces the projected gradient descent (PGD) chain length required for AE generation, cutting GPU time by 25-33% while significantly boosting robustness. We validate SMAAT across multiple tasks, including text generation, sentiment classification, safety filtering, and retrieval augmented generation setups, demonstrating superior robustness with comparable generalization to standard training.
Article 1398
Title@2025-05-26 (1): Multi-Graph Inductive Representation Learning for Large-Scale Urban Rail Demand Prediction under Disruptions
Title: Multi-Graph Inductive Representation Learning for Large-Scale Urban Rail Demand Prediction under Disruptions | Multi-Graph Induktives Representationslernen für großflächige Nachfragevorhersage für die Stadtbahn unter Störungen | 大型城市铁路需求预测中断下的大型城市铁路需求预测 2408.15619v2 |
Authors: Dang Viet Anh Nguyen, J. Victor Flensburg, Fabrizio Cerreto, Bianca Pascariu, Paola Pellegrini, Carlos Lima Azevedo, Filipe Rodrigues
With the expansion of cities over time, Urban Rail Transit (URT) networks have also grown significantly. Demand prediction plays an important role in supporting planning, scheduling, fleet management, and other operational decisions. In this study, we propose an Origin-Destination (OD) demand prediction model called Multi-Graph Inductive Representation Learning (mGraphSAGE) for large-scale URT networks under operational uncertainties. Our main contributions are twofold. First, we enhance prediction results while ensuring scalability for large networks by relying simultaneously on multiple graphs, where each OD pair is a node and each graph encodes a distinct OD relationship, such as temporal or spatial correlation. Second, we show the importance of including operational uncertainties such as train delays and cancellations as inputs in demand prediction for daily operations. The model is validated on three different scales of the URT network in Copenhagen, Denmark. Experimental results show that by leveraging information from neighboring ODs and learning node representations via sampling and aggregation, mGraphSAGE is particularly suitable for OD demand prediction in large-scale URT networks, outperforming reference machine learning methods. Furthermore, during periods with train cancellations and delays, the performance gap between mGraphSAGE and other methods widens compared to normal operating conditions, demonstrating its ability to leverage system reliability information for predicting OD demand under uncertainty.
Article 1399
Title@2025-05-26 (1): Deep Active Inference Agents for Delayed and Long-Horizon Environments
Title: Deep Active Inference Agents for Delayed and Long-Horizon Environments | Tiefe aktive Inferenz-Agenten für verzögerte und lang-Horizonte Umgebungen | 延迟和长-Horizon环境的深海活性推断剂 2505.19867v1 |
Authors: Yavar Taheri Yeganeh, Mohsen Jafari, Andrea Matta
With the recent success of world-model agents, which extend the core idea of model-based reinforcement learning by learning a differentiable model for sample-efficient control across diverse tasks, active inference (AIF) offers a complementary, neuroscience-grounded paradigm that unifies perception, learning, and action within a single probabilistic framework powered by a generative model. Despite this promise, practical AIF agents still rely on accurate immediate predictions and exhaustive planning, a limitation that is exacerbated in delayed environments requiring plans over long horizons of tens to hundreds of steps. Moreover, most existing agents are evaluated on robotic or vision benchmarks which, while natural for biological agents, fall short of real-world industrial complexity. We address these limitations with a generative-policy architecture featuring (i) a multi-step latent transition that lets the generative model predict an entire horizon in a single look-ahead, (ii) an integrated policy network that enables the transition and receives gradients of the expected free energy, (iii) an alternating optimization scheme that updates model and policy from a replay buffer, and (iv) a single gradient step that plans over long horizons, eliminating exhaustive planning from the control loop. We evaluate our agent in an environment that mimics a realistic industrial scenario with delayed and long-horizon settings. The empirical results confirm the effectiveness of the proposed approach, demonstrating that coupling a world model with the AIF formalism yields an end-to-end probabilistic controller capable of effective decision making in delayed, long-horizon settings without handcrafted rewards or expensive planning.
Article 1400
Title@2025-05-26 (1): HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation
Title: HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation | HS-STaR: Hierarchische Probenahme für selbstlernende Vernunfter über Schwierigkeitsschätzung und Budget-Umverteilung | HS-STaR:通过难以估计和预算重新定位为自学理性者进行等级抽样 2505.19866v1 |
Authors: Feng Xiong, Hongling Xu, Yifei Wang, Runxi Cheng, Yong Wang, Xiangxiang Chu
Self-taught reasoners (STaRs) enhance the mathematical reasoning abilities of large language models (LLMs) by leveraging self-generated responses for self-training. Recent studies have incorporated reward models to guide response selection or decoding, aiming to obtain higher-quality data. However, they typically allocate a uniform sampling budget across all problems, overlooking the varying utility of problems at different difficulty levels. In this work, we conduct an empirical study and find that problems near the boundary of the LLM’s reasoning capability offer significantly greater learning utility than both easy and overly difficult ones. To identify and exploit such problems, we propose HS-STaR, a Hierarchical Sampling framework for Self-Taught Reasoners. Given a fixed sampling budget, HS-STaR first performs lightweight pre-sampling with a reward-guided difficulty estimation strategy to efficiently identify boundary-level problems. Subsequently, it dynamically reallocates the remaining budget toward these high-utility problems during a re-sampling phase, maximizing the generation of valuable training data. Extensive experiments across multiple reasoning benchmarks and backbone LLMs demonstrate that HS-STaR significantly outperforms other baselines without requiring additional sampling budget.
Article 1401
Title@2025-05-26 (1): Information-theoretic Generalization Analysis for Expected Calibration Error
Title: Information-theoretic Generalization Analysis for Expected Calibration Error | Informationstheoretische Generalisierungsanalyse für erwarteten Kalibrierungsfehler | 预期校准错误信息理论概括分析 2405.15709v2 |
Authors: Futoshi Futami, Masahiro Fujisawa
While the expected calibration error (ECE), which employs binning, is widely adopted to evaluate the calibration performance of machine learning models, theoretical understanding of its estimation bias is limited. In this paper, we present the first comprehensive analysis of the estimation bias in the two common binning strategies, uniform mass and uniform width binning. Our analysis establishes upper bounds on the bias, achieving an improved convergence rate. Moreover, our bounds reveal, for the first time, the optimal number of bins to minimize the estimation bias. We further extend our bias analysis to generalization error analysis based on the information-theoretic approach, deriving upper bounds that enable the numerical evaluation of how small the ECE is for unknown data. Experiments using deep learning models show that our bounds are nonvacuous thanks to this information-theoretic generalization analysis approach.
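The two binning strategies named in this abstract can be made concrete with a small sketch. The function below computes the usual binned ECE estimate, the bin-weighted average of |accuracy - mean confidence|, under either uniform-width or uniform-mass binning. The function name `ece`, its parameters, and the tie-handling details are illustrative assumptions; the paper's bias bounds are not reproduced here.

```python
def ece(confidences, correct, n_bins=10, strategy="width"):
    """Binned ECE: sum over bins of (bin size / n) * |accuracy - mean confidence|."""
    n = len(confidences)
    order = sorted(range(n), key=lambda i: confidences[i])
    if strategy == "width":
        # Uniform-width: split [0, 1] into n_bins equal intervals.
        bins = [[] for _ in range(n_bins)]
        for i in order:
            b = min(int(confidences[i] * n_bins), n_bins - 1)
            bins[b].append(i)
    elif strategy == "mass":
        # Uniform-mass: each bin receives (roughly) the same number of samples.
        bins = [order[k * n // n_bins:(k + 1) * n // n_bins] for k in range(n_bins)]
    else:
        raise ValueError("strategy must be 'width' or 'mass'")
    total = 0.0
    for members in bins:
        if not members:
            continue
        acc = sum(correct[i] for i in members) / len(members)
        conf = sum(confidences[i] for i in members) / len(members)
        total += len(members) / n * abs(acc - conf)
    return total
```

On the same predictions the two strategies can give different estimates, which is one reason the choice of binning scheme and bin count matters for the estimation bias.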
Article 1402
Title@2025-05-26 (1): FruitNeRF++: A Generalized Multi-Fruit Counting Method Utilizing Contrastive Learning and Neural Radiance Fields
Title: FruitNeRF++: A Generalized Multi-Fruit Counting Method Utilizing Contrastive Learning and Neural Radiance Fields | FruitNeRF++: Eine generalisierte Multi-Fruit-Counting-Methode, die kontrastives Lernen und neurale Strahlungsfelder nutzt | 水果NeRF++:通用的多功能计数方法,利用矛盾学习和神经辐射场 2505.19863v1 |
Authors: Lukas Meyer, Andrei-Timotei Ardelean, Tim Weyrich, Marc Stamminger
We introduce FruitNeRF++, a novel fruit-counting approach that combines contrastive learning with neural radiance fields to count fruits from unstructured input photographs of orchards. Our work is based on FruitNeRF, which employs a neural semantic field combined with a fruit-specific clustering approach. The requirement for adaptation for each fruit type limits the applicability of the method and makes it difficult to use in practice. To lift this limitation, we design a shape-agnostic multi-fruit counting framework that complements the RGB and semantic data with instance masks predicted by a vision foundation model. The masks are used to encode the identity of each fruit as instance embeddings into a neural instance field. By volumetrically sampling the neural fields, we extract a point cloud embedded with the instance features, which can be clustered in a fruit-agnostic manner to obtain the fruit count. We evaluate our approach using a synthetic dataset containing apples, plums, lemons, pears, peaches, and mangoes, as well as a real-world benchmark apple dataset. Our results demonstrate that FruitNeRF++ is easier to control and compares favorably to other state-of-the-art methods.
Article 1403
Title@2025-05-26 (1): KAN we improve on HEP classification tasks? Kolmogorov-Arnold Networks applied to an LHC physics example
Title: KAN we improve on HEP classification tasks? Kolmogorov-Arnold Networks applied to an LHC physics example | KAN verbessern wir die HEP-Klassifizierungsaufgaben? Kolmogorov-Arnold Networks für ein LHC-Physikbeispiel | KAN我们改进了HEP分类任务? Kolmogorov-Arnold网络应用到一个LHC物理范例 2408.02743v2 |
Authors: Johannes Erdmann, Florian Mausolf, Jan Lukas Späh
Recently, Kolmogorov-Arnold Networks (KANs) have been proposed as an alternative to multilayer perceptrons, suggesting advantages in performance and interpretability. We study a typical binary event classification task in high-energy physics including high-level features and comment on the performance and interpretability of KANs in this context. Consistent with expectations, we find that the learned activation functions of a one-layer KAN resemble the univariate log-likelihood ratios of the respective input features. In deeper KANs, the activations in the first layer differ from those in the one-layer KAN, which indicates that the deeper KANs learn more complex representations of the data, a pattern commonly observed in other deep-learning architectures. We study KANs with different depths and widths and we compare them to multilayer perceptrons in terms of performance and number of trainable parameters. For the chosen classification task, we do not find that KANs are more parameter efficient. However, small KANs may offer advantages in terms of interpretability that come at the cost of only a moderate loss in performance.
Article 1404
Title@2025-05-26 (1): Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity
Title: Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity | Varianzreduziertes Kaskade Q-Lernen: Algorithmen und Probenkomplexität | 差异减少的连级学习:等级和抽样复杂性 2408.06544v2 |
Authors: Mohammad Boveiri, Peyman Mohajerin Esfahani
We study the problem of estimating the optimal Q-function of $\gamma$-discounted Markov decision processes (MDPs) under the synchronous setting, where independent samples for all state-action pairs are drawn from a generative model at each iteration. We introduce and analyze a novel model-free algorithm called Variance-Reduced Cascade Q-learning (VRCQ). VRCQ comprises two key building blocks: (i) the established direct variance reduction technique and (ii) our proposed variance reduction scheme, Cascade Q-learning. By leveraging these techniques, VRCQ provides superior guarantees in the $\ell_\infty$-norm compared with the existing model-free stochastic approximation-type algorithms. Specifically, we demonstrate that VRCQ is minimax optimal. Additionally, when the action set is a singleton (so that the Q-learning problem reduces to policy evaluation), it achieves non-asymptotic instance optimality while requiring the minimum number of samples theoretically possible. Our theoretical results and their practical implications are supported by numerical experiments.
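To make the synchronous setting concrete, here is a minimal sketch of plain synchronous Q-learning, the baseline that variance-reduced methods build on. It is not VRCQ. At every iteration it draws one sample from a generative model for each state-action pair and updates all entries at once. The toy model `toy_sample`, the step-size schedule, and all parameter names are assumptions made for illustration.

```python
import random


def toy_sample(state, action, rng):
    """Hypothetical generative model: two states, reward 1 while in state 1."""
    reward = 1.0 if state == 1 else 0.0
    if action == 1 and rng.random() < 0.8:
        return reward, 1 - state  # action 1 usually switches state
    return reward, state


def synchronous_q_learning(n_states, n_actions, sample, gamma=0.9, iters=200, seed=0):
    """Update every (state, action) entry once per iteration from fresh samples."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for k in range(1, iters + 1):
        step = 1.0 / (1.0 + (1.0 - gamma) * k)  # decaying step size (an assumption)
        new_q = [row[:] for row in q]
        for s in range(n_states):
            for a in range(n_actions):
                reward, next_state = sample(s, a, rng)
                target = reward + gamma * max(q[next_state])
                new_q[s][a] = (1.0 - step) * q[s][a] + step * target
        q = new_q
    return q
```

Calling `synchronous_q_learning(2, 2, toy_sample)` returns a 2x2 table of estimates. Because rewards lie in [0, 1], every entry stays within [0, 1/(1 - gamma)].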
Article 1405
Title@2025-05-26 (1): REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Large Reasoning Models
Title: REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Large Reasoning Models | REA-RL: Reflection-Aware Online-Verstärkungs-Lernen für effiziente große Vernunftmodelle | REA-RL:为高效大型理由模型进行反思-软件在线强化学习 2505.19862v1 |
Authors: Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Jun Rao, Min Zhang
Large Reasoning Models (LRMs) demonstrate strong performance in complex tasks but often face the challenge of overthinking, leading to high inference costs. Existing approaches synthesize shorter reasoning responses for LRMs to learn, but are inefficient for online usage due to the time-consuming data generation and filtering processes. Meanwhile, online reinforcement learning mainly adopts a length reward to encourage short reasoning responses, but tends to lose the reflection ability and harm performance. To address these issues, we propose REA-RL, which introduces a small reflection model for efficient scaling in online training, offering both parallel sampling and sequential revision. In addition, a reflection reward is designed to further prevent LRMs from favoring short yet non-reflective responses. Experiments show that both methods maintain or enhance performance while significantly improving inference efficiency. Their combination achieves a good balance between performance and efficiency, reducing inference costs by 35% without compromising performance. Further analysis demonstrates that our methods are effective by maintaining reflection frequency for hard problems while appropriately reducing it for simpler ones without losing reflection ability. Code is available at https://github.com/hexuandeng/REA-RL.
Article 1406
Title@2025-05-26 (1): Editing as Unlearning: Are Knowledge Editing Methods Strong Baselines for Large Language Model Unlearning?
Title: Editing as Unlearning: Are Knowledge Editing Methods Strong Baselines for Large Language Model Unlearning? | Editing as Unlearning: Sind Methoden der Wissensbearbeitung starke Grundlagen für großes Sprachmodell Unlearning? | 编辑为 “ 重新学习:知识编辑方法是否为大语言模式的 “ 退出学习 “ 的 “ 大语言模式 “ 的 “ 坚实基线 “ ? 2505.19855v1 |
Authors: Zexi Li, Xiangzhu Wang, William F. Shen, Meghdad Kurmanji, Xinchi Qiu, Dongqi Cai, Chao Wu, Nicholas D. Lane
Large Language Model (LLM) unlearning, i.e., selectively removing information from LLMs, is vital for responsible model deployment. In contrast, LLM knowledge editing aims to modify LLM knowledge instead of removing it. Though editing and unlearning seem to be two distinct tasks, we find there is a tight connection between them. In this paper, we conceptualize unlearning as a special case of editing where information is modified to a refusal or “empty set” $\emptyset$ response, signifying its removal. This paper thus investigates whether knowledge editing techniques are strong baselines for LLM unlearning. We evaluate state-of-the-art (SOTA) editing methods (e.g., ROME, MEMIT, GRACE, WISE, and AlphaEdit) against existing unlearning approaches on pretrained and finetuned knowledge. Results show that certain editing methods, notably WISE and AlphaEdit, are effective unlearning baselines, especially for pretrained knowledge, and excel in generating human-aligned refusal answers. To better adapt editing methods for unlearning applications, we propose practical recipes including self-improvement and query merging. The former leverages the LLM’s own in-context learning ability to craft a more human-aligned unlearning target, and the latter enables ROME and MEMIT to perform well in unlearning longer sample sequences. We advocate for the unlearning community to adopt SOTA editing methods as baselines and explore unlearning from an editing perspective for more holistic LLM memory control.
Article 1407
Title@2025-05-26 (1): DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Title: DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning | DISCOVER: Automatisiertes Curricula für Sparse-Reward-Verstärkungs-Lernen | DISCOVER: 失学-退职强化学习自动化课程 2505.19850v1 |
Authors: Leander Diaz-Bone, Marco Bagatella, Jonas Hübotter, Andreas Krause
Sparse-reward reinforcement learning (RL) can model a wide range of highly complex tasks. Solving sparse-reward tasks is RL’s core premise, requiring efficient exploration coupled with long-horizon credit assignment, and overcoming these challenges is key for building self-improving agents with superhuman ability. We argue that solving complex and high-dimensional tasks requires solving simpler tasks that are relevant to the target task. In contrast, most prior work designs strategies for selecting exploratory tasks with the objective of solving any task, making exploration of challenging high-dimensional, long-horizon tasks intractable. We find that the sense of direction, necessary for effective exploration, can be extracted from existing RL algorithms, without needing any prior information. Based on this finding, we propose a method for directed sparse-reward goal-conditioned very long-horizon RL (DISCOVER), which selects exploratory goals in the direction of the target task. We connect DISCOVER to principled exploration in bandits, formally bounding the time until the target task becomes achievable in terms of the agent’s initial distance to the target, but independent of the volume of the space of all tasks. Empirically, we perform a thorough evaluation in high-dimensional environments. We find that the directed goal selection of DISCOVER solves exploration problems that are beyond the reach of prior state-of-the-art exploration methods in RL.
Article 1408
Title@2025-05-26 (1): Efficient Deconvolution in Populational Inverse Problems
Title: Efficient Deconvolution in Populational Inverse Problems | Effiziente Dekonvolution in inversen Bevölkerungsproblemen | 人口逆向问题的有效演变 2505.19841v1 |
Authors: Arnaud Vadeboncoeur, Mark Girolami, Andrew M. Stuart
This work is focussed on the inversion task of inferring the distribution over parameters of interest leading to multiple sets of observations. The potential to solve such distributional inversion problems is driven by increasing availability of data, but a major roadblock is blind deconvolution, arising when the observational noise distribution is unknown. However, when data originates from collections of physical systems, a population, it is possible to leverage this information to perform deconvolution. To this end, we propose a methodology leveraging large data sets of observations, collected from different instantiations of the same physical processes, to simultaneously deconvolve the data-corrupting noise distribution and identify the distribution over model parameters defining the physical processes. A parameter-dependent mathematical model of the physical process is employed. A loss function characterizing the match between the observed data and the output of the mathematical model is defined; it is minimized as a function of both the parameter inputs to the model of the physics and the parameterized observational noise. This coupled problem is addressed with a modified gradient descent algorithm that leverages specific structure in the noise model. Furthermore, a new active learning scheme is proposed, based on adaptive empirical measures, to train a surrogate model to be accurate in parameter regions of interest; this approach accelerates computation and enables automatic differentiation of black-box, potentially nondifferentiable, code computing parameter-to-solution maps. The proposed methodology is demonstrated on porous medium flow, damped elastodynamics, and simplified models of atmospheric dynamics.
Article 1409
Title@2025-05-26 (1): One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP
Title: One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP | Ein Surrogate an Narren: All: Universelle, übertragbare und gezielte Widersacherangriffe mit CLIP | 以CLIP取代 “ 愚人Them all “ :通用、可转移和有针对性的对立攻击 2505.19840v1 |
Authors: Binyan Xu, Xilin Dai, Di Tang, Kehuan Zhang
Deep Neural Networks (DNNs) have achieved widespread success yet remain prone to adversarial attacks. Typically, such attacks either involve frequent queries to the target model or rely on surrogate models closely mirroring the target model – often trained with subsets of the target model’s training data – to achieve high attack success rates through transferability. However, in realistic scenarios where training data is inaccessible and excessive queries can raise alarms, crafting adversarial examples becomes more challenging. In this paper, we present UnivIntruder, a novel attack framework that relies solely on a single, publicly available CLIP model and publicly available datasets. By using textual concepts, UnivIntruder generates universal, transferable, and targeted adversarial perturbations that mislead DNNs into misclassifying inputs into adversary-specified classes defined by textual concepts. Our extensive experiments show that our approach achieves an Attack Success Rate (ASR) of up to 85% on ImageNet and over 99% on CIFAR-10, significantly outperforming existing transfer-based methods. Additionally, we reveal real-world vulnerabilities, showing that even without querying target models, UnivIntruder compromises image search engines like Google and Baidu with ASRs of up to 84%, and vision language models like GPT-4 and Claude-3.5 with ASRs of up to 80%. These findings underscore the practicality of our attack in scenarios where traditional avenues are blocked, highlighting the need to reevaluate security paradigms in AI applications.
Article 1410
Title@2025-05-26 (1): Multi-Agent Reinforcement Learning in Cybersecurity: From Fundamentals to Applications
Title: Multi-Agent Reinforcement Learning in Cybersecurity: From Fundamentals to Applications | Multi-Agenten-Verstärkung Lernen in Cybersicherheit: Von Grundlagen zu Anwendungen | 网络安全多机构强化多机构网络安全学习:从基础到应用 2505.19837v1 |
Authors: Christoph R. Landolt, Christoph Würsch, Roland Meier, Alain Mermoud, Julian Jang-Jaccard
Multi-Agent Reinforcement Learning (MARL) has shown great potential as an adaptive solution for addressing modern cybersecurity challenges. MARL enables decentralized, adaptive, and collaborative defense strategies and provides an automated mechanism to combat dynamic, coordinated, and sophisticated threats. This survey investigates the current state of research in MARL applications for automated cyber defense (ACD), focusing on intruder detection and lateral movement containment. Additionally, it examines the role of Autonomous Intelligent Cyber-defense Agents (AICA) and Cyber Gyms in training and validating MARL agents. Finally, the paper outlines existing challenges, such as scalability and adversarial robustness, and proposes future research directions. The survey also discusses how MARL integrates into AICA to provide adaptive, scalable, and dynamic solutions to counter the increasingly sophisticated landscape of cyber threats. It highlights the transformative potential of MARL in areas like intrusion detection and lateral movement containment, and underscores the value of Cyber Gyms for training and validation of AICA.
Article 1411
Title@2025-05-26 (1): DiffNMR: Advancing Inpainting of Randomly Sampled Nuclear Magnetic Resonance Signals
Title: DiffNMR: Advancing Inpainting of Randomly Sampled Nuclear Magnetic Resonance Signals | DiffNMR: Advancing Inpainting von zufällig gemusterten Kernmagnetresonanzsignalen | DiffNMR:推进随机抽样核磁共振信号的油漆 2505.20367v1 |
Authors: Sen Yan, Fabrizio Gabellieri, Etienne Goffinet, Filippo Castiglione, Thomas Launey
Nuclear Magnetic Resonance (NMR) spectroscopy leverages nuclear magnetization to probe molecules’ chemical environment, structure, and dynamics, with applications ranging from pharmaceuticals to the petroleum industry. Despite its utility, the high cost of NMR instrumentation and operation, together with the lengthy duration of experiments, necessitates the development of computational techniques to optimize acquisition times. Non-uniform sampling (NUS) is widely employed as a sub-sampling method to address these challenges, but it often introduces artifacts and degrades spectral quality, offsetting the benefits of reduced acquisition times. In this work, we propose the use of deep learning techniques to enhance the reconstruction quality of NUS spectra. Specifically, we explore the application of diffusion models, a relatively untapped approach in this domain. Our methodology involves applying diffusion models to both time-time and time-frequency NUS data, yielding satisfactory reconstructions of challenging spectra from the benchmark Artina dataset. This approach demonstrates the potential of diffusion models to improve the efficiency and accuracy of NMR spectroscopy, as well as the superiority of time-frequency domain data over time-time data, opening new avenues for future studies.
Article 1412
Title@2025-05-26 (1): Revisiting Glorot Initialization for Long-Range Linear Recurrences
Title: Revisiting Glorot Initialization for Long-Range Linear Recurrences | Wiederbesuch der Glorot-Initialisierung für langanhaltende lineare Wiederholungen | 重新审查长频线性线性重现的地球初始化 2505.19827v1 |
Authors: Noga Bar, Mariia Seleznova, Yotam Alexander, Gitta Kutyniok, Raja Giryes
Proper initialization is critical for Recurrent Neural Networks (RNNs), particularly in long-range reasoning tasks, where repeated application of the same weight matrix can cause vanishing or exploding signals. A common baseline for linear recurrences is Glorot initialization, which is designed to ensure stable signal propagation but is derived under the infinite-width, fixed-length regime, an unrealistic setting for RNNs processing long sequences. In this work, we show that Glorot initialization is in fact unstable: small positive deviations in the spectral radius are amplified through time and cause the hidden state to explode. Our theoretical analysis demonstrates that sequences of length $t = O(\sqrt{n})$, where $n$ is the hidden width, are sufficient to induce instability. To address this, we propose a simple, dimension-aware rescaling of Glorot that shifts the spectral radius slightly below one, preventing rapid signal explosion or decay. These results suggest that standard initialization schemes may break down in the long-sequence regime, motivating a separate line of theory for stable recurrent initialization.
Article 1413
Title@2025-05-26 (1): Foundation Models for Tabular Data within Systemic Contexts Need Grounding
Title: Foundation Models for Tabular Data within Systemic Contexts Need Grounding | Basismodelle für tabellarische Daten in systemischen Kontexten benötigen Erdung | 系统环境中需要依据的表格数据基础模型 2505.19825v1 |
Authors: Tassilo Klein, Johannes Hoffart
Current research on tabular foundation models often overlooks the complexities of large-scale, real-world data by treating tables as isolated entities and assuming information completeness, thereby neglecting the vital operational context. To address this, we introduce the concept of Semantically Linked Tables (SLT), recognizing that tables are inherently connected to both declarative and procedural operational knowledge. We propose Foundation Models for Semantically Linked Tables (FMSLT), which integrate these components to ground tabular data within its true operational context. This comprehensive representation unlocks the full potential of machine learning for complex, interconnected tabular data across diverse domains. Realizing FMSLTs requires access to operational knowledge that is often unavailable in public datasets, highlighting the need for close collaboration between domain experts and researchers. Our work exposes the limitations of current tabular foundation models and proposes a new direction centered on FMSLTs, aiming to advance robust, context-aware models for structured data.
Article 1414
Title@2025-05-26 (1): An Introductory Survey to Autoencoder-based Deep Clustering – Sandboxes for Combining Clustering with Deep Learning
Title: An Introductory Survey to Autoencoder-based Deep Clustering – Sandboxes for Combining Clustering with Deep Learning | Eine Einführungsstudie zum Autoencoder-basierten Deep Clustering – Sandboxen für die Kombination von Clustering mit Deep Learning | 以自动编码器为基础的深层集束 – – 将集束与深层学习相结合的沙箱的介绍性调查 2504.02087v2 |
Authors: Collin Leiber, Lukas Miklautz, Claudia Plant, Christian Böhm
Autoencoders offer a general way of learning low-dimensional, non-linear representations from data without labels. This is achieved without making any particular assumptions about the data type or other domain knowledge. The generality and domain agnosticism in combination with their simplicity make autoencoders a perfect sandbox for researching and developing novel (deep) clustering algorithms. Clustering methods group data based on similarity, a task that benefits from the lower-dimensional representation learned by an autoencoder, mitigating the curse of dimensionality. Specifically, the combination of deep learning with clustering, called Deep Clustering, makes it possible to learn a representation tailored to specific clustering tasks, leading to high-quality results. This survey provides an introduction to fundamental autoencoder-based deep clustering algorithms that serve as building blocks for many modern approaches.
Article 1415
Title@2025-05-26 (1): LAPA-based Dynamic Privacy Optimization for Wireless Federated Learning in Heterogeneous Environments
Title: LAPA-based Dynamic Privacy Optimization for Wireless Federated Learning in Heterogeneous Environments | LAPA-basierte Dynamic Privacy Optimization for Wireless Federated Learning in heterogenen Umgebungen | 以LAPA为基础的在多种不同环境无线联邦学习的动态隐私优化 2505.19823v1 |
Authors: Pengcheng Sun, Erwu Liu, Wei Ni, Rui Wang, Yuanzhe Geng, Lijuan Lai, Abbas Jamalipour
Federated Learning (FL) is a distributed machine learning paradigm designed to protect the data privacy of devices; however, this protection can still be broken by gradient leakage attacks that use parameter inversion techniques. Differential privacy (DP) reduces the risk of private data leakage by adding artificial noise to the gradients, but this is detrimental to FL utility, especially when the data is Non-Independent and Identically Distributed (Non-IID). Based on the impact of heterogeneous data on aggregation performance, this paper proposes a Lightweight Adaptive Privacy Allocation (LAPA) strategy, which assigns personalized privacy budgets to devices in each aggregation round without transmitting any additional information beyond gradients, ensuring both privacy protection and aggregation efficiency. Furthermore, the Deep Deterministic Policy Gradient (DDPG) algorithm is employed to optimize the transmission power, in order to determine the optimal timing at which the adaptively attenuated artificial noise aligns with the communication noise, enabling an effective balance between DP and system utility. Finally, a reliable aggregation strategy is designed by integrating communication quality and data distribution characteristics, which improves aggregation performance while preserving privacy. Experimental results demonstrate that the LAPA-based personalized noise allocation and dynamic optimization strategy proposed in this paper enhances convergence performance while satisfying the privacy requirements of FL.
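The basic mechanism of adding artificial noise to gradients can be sketched as follows. This is the generic clip-then-add-Gaussian-noise recipe commonly used for differentially private training, not the LAPA allocation strategy itself. The function name and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions, and the mapping from a privacy budget to a noise level is deliberately left out.

```python
import math
import random


def clip_and_add_noise(grad, clip_norm, noise_multiplier, rng):
    """Clip a gradient to L2 norm clip_norm, then add Gaussian noise per coordinate."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [g * scale for g in grad]
    sigma = noise_multiplier * clip_norm  # noise std scales with the clipping bound
    return [g + rng.gauss(0.0, sigma) for g in clipped]
```

In a personalized scheme, each device would use its own `noise_multiplier` derived from its privacy budget. A larger multiplier gives stronger privacy but a noisier update, which is the utility trade-off the abstract refers to.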
Article 1416
Title@2025-05-26 (1): Poison in the Well: Feature Embedding Disruption in Backdoor Attacks
Title: Poison in the Well: Feature Embedding Disruption in Backdoor Attacks | Gift im Brunnen: Feature Einbetten von Disruption in Backdoor-Angriffe | 井中毒:幕后袭击中的特异性嵌入干扰 2505.19821v1 |
Authors: Zhou Feng, Jiahao Chen, Chunyi Zhou, Yuwen Pu, Qingming Li, Shouling Ji
Backdoor attacks embed malicious triggers into training data, enabling attackers to manipulate neural network behavior during inference while maintaining high accuracy on benign inputs. However, existing backdoor attacks face limitations manifesting in excessive reliance on training data, poor stealth, and instability, which hinder their effectiveness in real-world applications. Therefore, this paper introduces ShadowPrint, a versatile backdoor attack that targets feature embeddings within neural networks to achieve high ASRs and stealthiness. Unlike traditional approaches, ShadowPrint reduces reliance on training data access and operates effectively with exceedingly low poison rates (as low as 0.01%). It leverages a clustering-based optimization strategy to align feature embeddings, ensuring robust performance across diverse scenarios while maintaining stability and stealth. Extensive evaluations demonstrate that ShadowPrint achieves superior ASR (up to 100%), steady CA (with decay no more than 1% in most cases), and low DDR (averaging below 5%) across both clean-label and dirty-label settings, and with poison rates ranging from as low as 0.01% to 0.05%, setting a new standard for backdoor attack capabilities and emphasizing the need for advanced defense strategies focused on feature space manipulations.
Article 1417
Title@2025-05-26 (1): InfoCons: Identifying Interpretable Critical Concepts in Point Clouds via Information Theory
Title: InfoCons: Identifying Interpretable Critical Concepts in Point Clouds via Information Theory | InfoCons: Identifizieren von interpretierbaren kritischen Konzepten in Punktwolken über Informationstheorie | 信息库:通过信息理论确定点云中可解释的关键概念 2505.19820v1 |
Authors: Feifei Li, Mi Zhang, Zhaoxiang Wang, Min Yang
Interpretability of point cloud (PC) models becomes imperative given their deployment in safety-critical scenarios such as autonomous vehicles. We focus on attributing PC model outputs to interpretable critical concepts, defined as meaningful subsets of the input point cloud. To enable human-understandable diagnostics of model failures, an ideal critical subset should be faithful (preserving points that causally influence predictions) and conceptually coherent (forming semantically meaningful structures that align with human perception). We propose InfoCons, an explanation framework that applies information-theoretic principles to decompose the point cloud into 3D concepts, enabling the examination of their causal effect on model predictions with learnable priors. We evaluate InfoCons on synthetic datasets for classification, comparing it qualitatively and quantitatively with four baselines. We further demonstrate its scalability and flexibility on two real-world datasets and in two applications that utilize critical scores of PC.
Article 1418
Title@2025-05-26 (1): Fast Differentiable Modal Simulation of Non-linear Strings, Membranes, and Plates
Title: Fast Differentiable Modal Simulation of Non-linear Strings, Membranes, and Plates | Schnelle differenzierbare Modale Simulation von nichtlinearen Strings, Membranen und Platten | 非线性字符串、膜和平板等非线性字符串的快速可区分模式模拟 2505.05940v2 |
Authors: Rodrigo Diaz, Mark Sandler
Modal methods for simulating vibrations of strings, membranes, and plates are widely used in acoustics and physically informed audio synthesis. However, traditional implementations, particularly for non-linear models like the von Kármán plate, are computationally demanding and lack differentiability, limiting inverse modelling and real-time applications. We introduce a fast, differentiable, GPU-accelerated modal framework built with the JAX library, providing efficient simulations and enabling gradient-based inverse modelling. Benchmarks show that our approach significantly outperforms CPU and GPU-based implementations, particularly for simulations with many modes. Inverse modelling experiments demonstrate that our approach can recover physical parameters, including tension, stiffness, and geometry, from both synthetic and experimental data. Although fitting physical parameters is more sensitive to initialisation compared to other methods, it provides greater interpretability and more compact parameterisation. The code is released as open source to support future research and applications in differentiable physical modelling and sound synthesis.
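As a rough illustration of the modal idea only (linear, plain Python, and not the authors' JAX framework), the output of a vibrating string can be approximated as a sum of exponentially damped sinusoids, one per mode. The function name, the harmonic mode frequencies, and the decay rates below are illustrative assumptions.

```python
import math

def modal_synthesis(freqs_hz, decays, amps, sample_rate, n_samples):
    """Sum of damped sinusoids: one (frequency, decay rate, amplitude) triple per mode."""
    out = []
    for i in range(n_samples):
        t = i / sample_rate
        out.append(sum(a * math.exp(-d * t) * math.sin(2.0 * math.pi * f * t)
                       for f, d, a in zip(freqs_hz, decays, amps)))
    return out

# Idealised string: mode n sits at n times the fundamental (assumed 110 Hz here).
freqs = [110.0 * n for n in range(1, 5)]
decays = [2.0 * n for n in range(1, 5)]   # higher modes decay faster (assumption)
amps = [1.0 / n for n in range(1, 5)]
signal = modal_synthesis(freqs, decays, amps, sample_rate=8000, n_samples=64)
print(signal[:4])
```

Non-linear models such as the von Kármán plate couple the modes to each other, which this independent-mode sketch deliberately ignores.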
Article 1419
Title@2025-05-26 (1): Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models
Title: Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models | Jailbreak-AudioBench: In-Depth-Bewertung und Analyse von Jailbreak-Bedrohungen für große Audio-Sprachenmodelle | 监狱破碎-AudioBennch:对大型音频语言模型的监狱破碎威胁进行内部评价和分析 2501.13772v2 |
Authors: Hao Cheng, Erjia Xiao, Jing Shao, Yichi Wang, Le Yang, Chao Sheng, Philip Torr, Jindong Gu, Renjing Xu
Large Language Models (LLMs) demonstrate impressive zero-shot performance across a wide range of natural language processing tasks. Integrating various modality encoders further expands their capabilities, giving rise to Multimodal Large Language Models (MLLMs) that process not only text but also visual and auditory modality inputs. However, these advanced capabilities may also pose significant security risks, as models can be exploited to generate harmful or inappropriate content through jailbreak attacks. While prior work has extensively explored how manipulating textual or visual modality inputs can circumvent safeguards in LLMs and MLLMs, the vulnerability of Large Audio-Language Models (LALMs) to audio-specific jailbreaks remains largely underexplored. To address this gap, we introduce Jailbreak-AudioBench, which consists of a Toolbox, a curated Dataset, and a comprehensive Benchmark. The Toolbox supports not only text-to-audio conversion but also a range of audio editing techniques. The curated Dataset provides diverse explicit and implicit jailbreak audio examples in both original and edited forms. Utilizing this dataset, we evaluate multiple state-of-the-art LALMs, establishing the most comprehensive audio jailbreak benchmark to date. Finally, Jailbreak-AudioBench establishes a foundation for advancing future research on LALM safety alignment by enabling the in-depth exposure of more powerful jailbreak threats, such as query-based audio editing, and by facilitating the development of effective defense mechanisms.
Article 1420
Title@2025-05-26 (1): Density Ratio-Free Doubly Robust Proxy Causal Learning
Title: Density Ratio-Free Doubly Robust Proxy Causal Learning | Dichte Verhältnis-frei doppelt robust Proxy Kausal Lernen | 低密度比率-无杜布利强力代理原因学习 2505.19807v1 |
Authors: Bariscan Bozkurt, Houssam Zenati, Dimitri Meunier, Liyuan Xu, Arthur Gretton
We study the problem of causal function estimation in the Proxy Causal Learning (PCL) framework, where confounders are not observed but proxies for the confounders are available. Two main approaches have been proposed: outcome bridge-based and treatment bridge-based methods. In this work, we propose two kernel-based doubly robust estimators that combine the strengths of both approaches, and naturally handle continuous and high-dimensional variables. Our identification strategy builds on a recent density ratio-free method for treatment bridge-based PCL; furthermore, in contrast to previous approaches, it does not require indicator functions or kernel smoothing over the treatment variable. These properties make it especially well-suited for continuous or high-dimensional treatments. By using kernel mean embeddings, we have closed-form solutions and strong consistency guarantees. Our estimators outperform existing methods on PCL benchmarks, including a prior doubly robust method that requires both kernel smoothing and density ratio estimation.
Article 1421
Title@2025-05-26 (1): Continuous Simplicial Neural Networks
Title: Continuous Simplicial Neural Networks | Kontinuierliche simplizielle Neuralnetze | 简单连续神经网络 2503.12919v2 |
Authors: Aref Einizade, Dorina Thanou, Fragkiskos D. Malliaros, Jhony H. Giraldo
Simplicial complexes provide a powerful framework for modeling high-order interactions in structured data, making them particularly suitable for applications such as trajectory prediction and mesh processing. However, existing simplicial neural networks (SNNs), whether convolutional or attention-based, rely primarily on discrete filtering techniques, which can be restrictive. In contrast, partial differential equations (PDEs) on simplicial complexes offer a principled approach to capture continuous dynamics in such structures. In this work, we introduce continuous simplicial neural network (COSIMO), a novel SNN architecture derived from PDEs on simplicial complexes. We provide theoretical and experimental justifications of COSIMO’s stability under simplicial perturbations. Furthermore, we investigate the over-smoothing phenomenon, a common issue in geometric deep learning, demonstrating that COSIMO offers better control over this effect than discrete SNNs. Our experiments on real-world datasets demonstrate that COSIMO achieves competitive performance compared to state-of-the-art SNNs in complex and noisy environments.
Article 1422
Title@2025-05-26 (1): Modulated differentiable STFT and balanced spectrum metric for freight train wheelset bearing cross-machine transfer monitoring under speed fluctuations
Title: Modulated differentiable STFT and balanced spectrum metric for freight train wheelset bearing cross-machine transfer monitoring under speed fluctuations | Modulierte differenzierbare STFT und symmetrische Spektralmetrik für Güterzug-Radsatzlager-Übertragungsüberwachung unter Geschwindigkeitsschwankungen | 根据速度波动情况对具有跨机械转移监测的货运火车轮轮车采用机动机动的可机动机动式STFT和平衡频谱度指标 2406.11917v3 |
Authors: Chao He, Hongmei Shi, Ruixin Li, Jianbo Li, ZuJun Yu
The service condition of wheelset bearings, as key components, has a direct impact on the safe operation of heavy-haul railway freight trains. However, speed fluctuation of the trains and the scarcity of fault samples are the two main problems that restrict the accuracy of bearing fault diagnosis. Therefore, a cross-machine transfer diagnosis network (pyDSN), coupled with an interpretable modulated differentiable short-time Fourier transform (STFT) and a physics-informed balanced spectrum quality metric, is proposed to learn domain-invariant and discriminative features under time-varying speeds. First, because fixed windows are insufficient for extracting the frequency components of time-varying speed signals, a modulated differentiable STFT (MDSTFT), which is interpretable and has STFT-informed theoretical support, is proposed to extract a robust time-frequency spectrum (TFS). During training, multiple windows with different lengths change dynamically. In addition to the classification metric and the domain discrepancy metric, we introduce a third kind of metric, referred to as the physics-informed metric, to enhance the transferable TFS. A physics-informed balanced spectrum quality (BSQ) regularization loss is devised to guide the optimization direction for the MDSTFT and the model. With it, the model not only acquires a high-quality TFS but also becomes a physics-restricted domain adaptation network that learns real-world physics knowledge, ultimately diminishing the domain discrepancy across different datasets. The experiment is conducted in the scenario of migrating from laboratory datasets to a freight train dataset, indicating that the hybrid-driven pyDSN outperforms existing methods and has practical value.
Article 1423
Title@2025-05-26 (1): Exploring Consciousness in LLMs: A Systematic Survey of Theories, Implementations, and Frontier Risks
Title: Exploring Consciousness in LLMs: A Systematic Survey of Theories, Implementations, and Frontier Risks | Erforschung des Bewusstseins in LLMs: Eine systematische Untersuchung von Theorien, Implementierungen und Grenzrisiken | 探索LLMM中的觉悟:对理论、实施和前沿风险的系统调查 2505.19806v1 |
Authors: Sirui Chen, Shuqin Ma, Shu Yu, Hanwang Zhang, Shengjie Zhao, Chaochao Lu
Consciousness stands as one of the most profound and distinguishing features of the human mind, fundamentally shaping our understanding of existence and agency. As large language models (LLMs) develop at an unprecedented pace, questions concerning intelligence and consciousness have become increasingly significant. However, discourse on LLM consciousness remains largely unexplored territory. In this paper, we first clarify frequently conflated terminologies (e.g., LLM consciousness and LLM awareness). Then, we systematically organize and synthesize existing research on LLM consciousness from both theoretical and empirical perspectives. Furthermore, we highlight potential frontier risks that conscious LLMs might introduce. Finally, we discuss current challenges and outline future directions in this emerging field. The references discussed in this paper are organized at https://github.com/OpenCausaLab/Awesome-LLM-Consciousness.
Article 1424
Title@2025-05-26 (1): GraphAU-Pain: Graph-based Action Unit Representation for Pain Intensity Estimation
Title: GraphAU-Pain: Graph-based Action Unit Representation for Pain Intensity Estimation | GraphAU-Pain: Darstellung der Graph-basierten Aktionseinheit für Schmerzintensitätsabschätzung | 图AAU-Pain: 以图表为基础的行动股 疼痛强度估计代表 2505.19802v1 |
Authors: Zhiyu Wang, Yang Liu, Hatice Gunes
Understanding pain-related facial behaviors is essential for digital healthcare in terms of effective monitoring, assisted diagnostics, and treatment planning, particularly for patients unable to communicate verbally. Existing data-driven methods of detecting pain from facial expressions are limited in interpretability and in their ability to quantify severity. To this end, we propose GraphAU-Pain, which leverages a graph-based framework to model facial Action Units (AUs) and their interrelationships for pain intensity estimation. AUs are represented as graph nodes, with co-occurrence relationships as edges, enabling a more expressive depiction of pain-related facial behaviors. By utilizing a relational graph neural network, our framework offers improved interpretability and significant performance gains. Experiments conducted on the publicly available UNBC dataset demonstrate the effectiveness of GraphAU-Pain, which achieves an F1-score of 66.21% and an accuracy of 87.61% in pain intensity estimation.
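The graph construction described here (AUs as nodes, co-occurrence as edges) can be sketched as a simple pair count over frames. This is a minimal illustration under assumptions, not the GraphAU-Pain pipeline: the function name, the `min_count` threshold, and the toy frames are hypothetical.

```python
from itertools import combinations

def cooccurrence_edges(frames, min_count=1):
    """Count how often each pair of AUs is active in the same frame.

    frames: list of sets of active AU labels, one set per video frame.
    Returns a dict mapping a sorted (au_i, au_j) pair to its co-occurrence count,
    keeping only pairs seen at least min_count times.
    """
    counts = {}
    for active in frames:
        for pair in combinations(sorted(active), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: c for pair, c in counts.items() if c >= min_count}

# Toy annotation: each set lists the AUs active in one frame.
frames = [{"AU4", "AU6", "AU7"}, {"AU6", "AU7"}, {"AU4"}, {"AU6", "AU7", "AU43"}]
edges = cooccurrence_edges(frames, min_count=2)
print(edges)  # {('AU6', 'AU7'): 3}
```

The resulting weighted edge list is the kind of input a relational graph neural network could consume; how edges are typed or weighted in the paper is not stated in the abstract.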
Article 1425
Title@2025-05-26 (1): Non-asymptotic convergence analysis of the stochastic gradient Hamiltonian Monte Carlo algorithm with discontinuous stochastic gradient with applications to training of ReLU neural networks
Title: Non-asymptotic convergence analysis of the stochastic gradient Hamiltonian Monte Carlo algorithm with discontinuous stochastic gradient with applications to training of ReLU neural networks | Nicht-asymptotische Konvergenzanalyse des stochastischen Gradienten Hamiltonian Monte Carlo Algorithmus mit diskontinuierlichem stochastischem Gradienten mit Anwendungen zum Training von ReLU-Neuralnetzwerken | 对随机梯度汉密尔顿·汉密尔顿·蒙特-蒙特卡洛算法进行非症状趋同分析,使用不连续的随机梯度,并用于RELU神经网络培训 2409.17107v2 |
Authors: Luxu Liang, Ariel Neufeld, Ying Zhang
In this paper, we provide a non-asymptotic analysis of the convergence of the stochastic gradient Hamiltonian Monte Carlo (SGHMC) algorithm to a target measure in Wasserstein-1 and Wasserstein-2 distance. Crucially, compared to the existing literature on SGHMC, we allow its stochastic gradient to be discontinuous. This allows us to provide explicit upper bounds, which can be controlled to be arbitrarily small, for the expected excess risk of non-convex stochastic optimization problems with discontinuous stochastic gradients, including, among others, the training of neural networks with ReLU activation function. To illustrate the applicability of our main results, we consider numerical experiments on quantile estimation and on several optimization problems involving ReLU neural networks relevant in finance and artificial intelligence.
Article 1426
Title@2025-05-26 (1): The Missing Point in Vision Transformers for Universal Image Segmentation
Title: The Missing Point in Vision Transformers for Universal Image Segmentation | Der fehlende Punkt in Vision Transformers für die universelle Bildsegmentierung | 通用图像分割的愿景变异器中的缺失点 2505.19795v1 |
Authors: Sajjad Shahabodini, Mobina Mansoori, Farnoush Bayatmakou, Jamshid Abouei, Konstantinos N. Plataniotis, Arash Mohammadi
Image segmentation remains a challenging task in computer vision, demanding robust mask generation and precise classification. Recent mask-based approaches yield high-quality masks by capturing global context. However, accurately classifying these masks, especially in the presence of ambiguous boundaries and imbalanced class distributions, remains an open challenge. In this work, we introduce ViT-P, a novel two-stage segmentation framework that decouples mask generation from classification. The first stage employs a proposal generator to produce class-agnostic mask proposals, while the second stage utilizes a point-based classification model built on the Vision Transformer (ViT) to refine predictions by focusing on mask central points. ViT-P serves as a pre-training-free adapter, allowing the integration of various pre-trained vision transformers without modifying their architecture, ensuring adaptability to dense prediction tasks. Furthermore, we demonstrate that coarse and bounding box annotations can effectively enhance classification without requiring additional training on fine annotation datasets, reducing annotation costs while maintaining strong performance. Extensive experiments across COCO, ADE20K, and Cityscapes datasets validate the effectiveness of ViT-P, achieving state-of-the-art results with 54.0 PQ on ADE20K panoptic segmentation, 87.4 mIoU on Cityscapes semantic segmentation, and 63.6 mIoU on ADE20K semantic segmentation. The code and pretrained models are available at: https://github.com/sajjad-sh33/ViT-P.
Article 1427
Title@2025-05-26 (1): What Can RL Bring to VLA Generalization? An Empirical Study
Title: What Can RL Bring to VLA Generalization? An Empirical Study | Was kann RL zur VLA-Verallgemeinerung bringen? Eine empirische Studie | RL能带给VLA的概括化带来什么?经验研究。 2505.19789v1 |
Authors: Jijia Liu, Feng Gao, Bingwen Wei, Xinlei Chen, Qingmin Liao, Yi Wu, Chao Yu, Yu Wang
Large Vision-Language Action (VLA) models have shown significant potential for embodied AI. However, their predominant training via supervised fine-tuning (SFT) limits generalization due to susceptibility to compounding errors under distribution shifts. Reinforcement learning (RL) offers a path to overcome these limitations by optimizing for task objectives via trial-and-error, yet a systematic understanding of its specific generalization benefits for VLAs compared to SFT is lacking. To address this, our study introduces a comprehensive benchmark for evaluating VLA generalization and systematically investigates the impact of RL fine-tuning across diverse visual, semantic, and execution dimensions. Our extensive experiments reveal that RL fine-tuning, particularly with PPO, significantly enhances generalization in semantic understanding and execution robustness over SFT, while maintaining comparable visual robustness. We identify PPO as a more effective RL algorithm for VLAs than LLM-derived methods like DPO and GRPO. We also develop a simple recipe for efficient PPO training on VLAs, and demonstrate its practical utility for improving VLA generalization. The project page is at https://rlvla.github.io
Article 1428
Title@2025-05-26 (1): MedDreamer: Model-Based Reinforcement Learning with Latent Imagination on Complex EHRs for Clinical Decision Support
Title: MedDreamer: Model-Based Reinforcement Learning with Latent Imagination on Complex EHRs for Clinical Decision Support | MedDreamer: Modellbasiertes Verstärkungslernen mit latenter Imagination auf komplexen EHRs für die klinische Entscheidungsunterstützung | Medreamer:以模型为基础的强化学习,对临床决定支助的复杂电子人力资源进行中层想象 2505.19785v1 |
Authors: Qianyi Xu, Gousia Habib, Dilruk Perera, Mengling Feng
Timely and personalized treatment decisions are essential across a wide range of healthcare settings where patient responses vary significantly and evolve over time. Clinical data used to support these decisions are often irregularly sampled, sparse, and noisy. Existing decision support systems commonly rely on discretization and imputation, which can distort critical temporal dynamics and degrade decision quality. Moreover, they often overlook the clinical significance of irregular recording frequencies, filtering out patterns in how and when data is collected. Reinforcement Learning (RL) is a natural fit for clinical decision-making, enabling sequential, long-term optimization in dynamic, uncertain environments. However, most existing treatment recommendation systems are model-free and trained solely on offline data, making them sample-inefficient, sensitive to data quality, and poorly generalizable across tasks or cohorts. To address these limitations, we propose MedDreamer, a two-phase model-based RL framework for personalized treatment recommendation. MedDreamer uses a world model with an Adaptive Feature Integration (AFI) module to effectively model irregular, sparse clinical data. Through latent imagination, it simulates plausible patient trajectories to enhance learning, refining its policy using a mix of real and imagined experiences. This enables learning policies that go beyond suboptimal historical decisions while remaining grounded in clinical data. To our knowledge, this is the first application of latent imagination to irregular healthcare data. Evaluations on sepsis and mechanical ventilation (MV) treatment using two large-scale EHR datasets show that MedDreamer outperforms both model-free and model-based baselines in clinical outcomes and off-policy metrics.
Article 1429
Title@2025-05-26 (1): Out-of-distribution Reject Option Method for Dataset Shift Problem in Early Disease Onset Prediction
Title: Out-of-distribution Reject Option Method for Dataset Shift Problem in Early Disease Onset Prediction | Out-of-Distribution Ablehnung der Option Methode für Datensatz Verschiebung Problem bei Früherkrankungen Beginn Vorhersage | 用于早期疾病上移预测中数据集移位问题的不分发拒绝选项方法 2405.19864v2 |
Authors: Taisei Tosaki, Eiichiro Uchino, Ryosuke Kojima, Yohei Mineharu, Yuji Okamoto, Mikio Arita, Nobuyuki Miyai, Yoshinori Tamada, Tatsuya Mikami, Koichi Murashita, Shigeyuki Nakaji, Yasushi Okuno
Machine learning is increasingly used to predict lifestyle-related disease onset using health and medical data. However, its predictive accuracy in practice is often hindered by dataset shift, which refers to discrepancies in data distribution between the training and testing datasets. This issue leads to the misclassification of out-of-distribution (OOD) data. To diminish the effect of dataset shift in real-world settings, this paper proposes the out-of-distribution reject option for prediction (ODROP). This method integrates an OOD detection model to exclude OOD data from the prediction phase. We used two real-world health checkup datasets (Hirosaki and Wakayama) with dataset shift, across three disease onset prediction tasks: diabetes, dyslipidemia, and hypertension. Both components of the ODROP method, the OOD detection model and the prediction model, were trained on the Hirosaki dataset. We assessed the effectiveness of ODROP on the Wakayama dataset using AUROC-rejection rate curves. Among the five OOD detection approaches (the variational autoencoder, neural network ensemble std, neural network ensemble epistemic, neural network energy, and neural network Gaussian mixture-based energy measurement), the variational autoencoder method demonstrated notably higher stability and a greater improvement in AUROC. For example, in the Wakayama dataset, the AUROC for diabetes onset increased from 0.80 without ODROP to 0.90 at a 31.1% rejection rate, and for dyslipidemia, it improved from 0.70 without ODROP to 0.76 at a 34% rejection rate. In addition, we categorized dataset shifts into two types using SHAP clustering: those that considerably affect predictions and those that do not. This study is the first to apply OOD detection to actual health and medical data, demonstrating its potential to substantially improve the accuracy and reliability of disease prediction models amidst dataset shift.
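The reject-option idea can be sketched as follows: rank test samples by an OOD score, drop the highest-scoring fraction, and recompute AUROC on what remains. This is a toy illustration under assumptions, not the ODROP implementation; the function names, the dictionary fields, and the six samples are hypothetical, and the OOD scores are given by hand rather than produced by a variational autoencoder.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def reject_ood(samples, rejection_rate):
    """Drop the given fraction of samples with the highest OOD score."""
    n_reject = int(len(samples) * rejection_rate)
    ranked = sorted(samples, key=lambda s: s["ood_score"])
    return ranked[: len(ranked) - n_reject]

def auroc_of(samples):
    return auroc([s["label"] for s in samples], [s["pred"] for s in samples])

# Toy test set: the two most OOD samples are also the ones predicted worst.
samples = [
    {"label": 1, "pred": 0.9, "ood_score": 0.1},
    {"label": 0, "pred": 0.2, "ood_score": 0.2},
    {"label": 1, "pred": 0.7, "ood_score": 0.3},
    {"label": 0, "pred": 0.4, "ood_score": 0.4},
    {"label": 1, "pred": 0.3, "ood_score": 0.8},
    {"label": 0, "pred": 0.8, "ood_score": 0.9},
]
kept = reject_ood(samples, rejection_rate=0.5)
print(auroc_of(samples), auroc_of(kept))
```

Sweeping `rejection_rate` over a grid and plotting the resulting AUROC values would give a curve of the kind the abstract refers to.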
Article 1430
Title@2025-05-26 (1): Mol-LLM: Multimodal Generalist Molecular LLM with Improved Graph Utilization
Title: Mol-LLM: Multimodal Generalist Molecular LLM with Improved Graph Utilization | Mol-LLM: Multimodaler Generalist Molecular LLM mit verbesserter Graphenverwendung | Mol-LLM:利用改进图表的多式通用主义分子有限力M 2502.02810v2 |
Authors: Chanhui Lee, Hanbum Ko, Yuheon Song, YongJun Jeong, Rodrigo Hormazabal, Sehui Han, Kyunghoon Bae, Sungbin Lim, Sungwoong Kim
Recent advances in large language models (LLMs) have led to models that tackle diverse molecular tasks, such as chemical reaction prediction and molecular property prediction. Large-scale molecular instruction-tuning datasets have enabled sequence-only (e.g., SMILES or SELFIES) generalist molecular LLMs, and researchers are now exploring multimodal approaches that incorporate molecular structural information for further gains. However, a genuinely multimodal, generalist LLM that covers a broad spectrum of molecular tasks has yet to be fully investigated. We observe that naive next-token prediction training ignores graph-structural information, limiting an LLM’s ability to exploit molecular graphs. To address this, we propose (i) Molecular structure Preference Optimization (MolPO), which facilitates graph usage by optimizing preferences between pairs of correct and perturbed molecular structures, and (ii) an advanced graph encoder with a tailored pre-training strategy to improve the effect of graph utilization by MolPO. Building on these contributions, we introduce Mol-LLM, the first multimodal generalist model that (a) handles a broad spectrum of molecular tasks among molecular LLMs, (b) explicitly leverages molecular-structure information, and (c) takes advantage of extensive instruction tuning. Mol-LLM attains state-of-the-art or comparable results across the most comprehensive molecular-LLM benchmark, even on out-of-distribution datasets for reaction and property prediction, where it surpasses prior generalist molecular LLMs by a large margin.
Article 1431
Title@2025-05-26 (1): Advancements in Medical Image Classification through Fine-Tuning Natural Domain Foundation Models
Title: Advancements in Medical Image Classification through Fine-Tuning Natural Domain Foundation Models | Fortschritte bei der Klassifikation medizinischer Bilder durch Modelle der Fine-Tuning Natural Domain Foundation | 通过精美开发自然域基金会模型提高医学图像分类 2505.19779v1 |
Authors: Mobina Mansoori, Sajjad Shahabodini, Farnoush Bayatmakou, Jamshid Abouei, Konstantinos N. Plataniotis, Arash Mohammadi
Foundation models are large-scale models, pre-trained on massive datasets, that perform a wide range of tasks. These models have shown consistently improved results with the introduction of new methods. It is crucial to analyze how these trends impact the medical field and determine whether these advancements can drive meaningful change. This study investigates the application of recent state-of-the-art foundation models, DINOv2, MAE, VMamba, CoCa, SAM2, and AIMv2, for medical image classification. We explore their effectiveness on datasets including CBIS-DDSM for mammography, ISIC2019 for skin lesions, APTOS2019 for diabetic retinopathy, and CHEXPERT for chest radiographs. By fine-tuning these models and evaluating their configurations, we aim to understand the potential of these advancements in medical image classification. The results indicate that these advanced models significantly enhance classification outcomes, demonstrating robust performance despite limited labeled data. Based on our results, AIMv2, DINOv2, and SAM2 models outperformed others, demonstrating that progress in natural domain training has positively impacted the medical domain and improved classification outcomes. Our code is publicly available at: https://github.com/sajjad-sh33/Medical-Transfer-Learning.
Article 1432
Title@2025-05-26 (1): Query Performance Prediction using Relevance Judgments Generated by Large Language Models
Title: Query Performance Prediction using Relevance Judgments Generated by Large Language Models | Abfrage der Leistungsvorhersage anhand von Relevanzurteilen, die von großen Sprachmodellen erzeugt werden | 使用大语言模型产生的相关性判断的查询性绩效预测 2404.01012v3 |
Authors: Chuan Meng, Negar Arabzadeh, Arian Askari, Mohammad Aliannejadi, Maarten de Rijke
Query performance prediction (QPP) aims to estimate the retrieval quality of a search system for a query without human relevance judgments. Previous QPP methods typically return a single scalar value and do not require the predicted values to approximate a specific information retrieval (IR) evaluation measure, leading to certain drawbacks: (i) a single scalar is insufficient to accurately represent different IR evaluation measures, especially when metrics do not highly correlate, and (ii) a single scalar limits the interpretability of QPP methods because solely using a scalar is insufficient to explain QPP results. To address these issues, we propose a QPP framework using automatically generated relevance judgments (QPP-GenRE), which decomposes QPP into independent subtasks of predicting the relevance of each item in a ranked list to a given query. This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels. This also allows us to interpret predicted IR evaluation measures, and identify, track and rectify errors in generated relevance judgments to improve QPP quality. We predict an item’s relevance by using open-source large language models (LLMs) to ensure scientific reproducibility. We face two main challenges: (i) excessive computational costs of judging an entire corpus for predicting a metric considering recall, and (ii) limited performance in prompting open-source LLMs in a zero-/few-shot manner. To solve the challenges, we devise an approximation strategy to predict an IR measure considering recall and propose to fine-tune open-source LLMs using human-labeled relevance judgments. Experiments on the TREC 2019 to 2022 deep learning tracks and CAsT-19 and 20 datasets show that QPP-GenRE achieves state-of-the-art QPP quality for both lexical and neural rankers.
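The core step, predicting an IR evaluation measure from generated relevance judgments used as pseudo-labels, can be illustrated with two precision-oriented metrics. This is a minimal sketch under assumptions, not the QPP-GenRE code: the function names are hypothetical, and the binary judgments are written by hand where an LLM would normally produce them.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k items judged relevant (relevance is a list of 0/1)."""
    if k <= 0:
        return 0.0
    return sum(relevance[:k]) / k

def reciprocal_rank_at_k(relevance, k):
    """1 / rank of the first relevant item within the top k, or 0.0 if there is none."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0

# Hand-written stand-in for per-item judgments of one ranked list, in rank order.
predicted_judgments = [0, 1, 1, 0, 1]
print(precision_at_k(predicted_judgments, 5))        # 0.6
print(reciprocal_rank_at_k(predicted_judgments, 5))  # 0.5
```

Because every item carries its own judgment, a wrong metric prediction can be traced back to the individual items that were misjudged. Recall-aware measures need judgments beyond the ranked list, which is where the abstract's approximation strategy comes in.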
Article 1433
Title@2025-05-26 (1): Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
Title: Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO | Verständnis der Leistungslücke im Preference Learning: Eine Dichotomie von RLHF und DPO | 了解优先学习方面的绩效差距:RLHF和DPO的二分切开术 2505.19770v1 |
Authors: Ruizhe Shi, Minhak Song, Runlong Zhou, Zihan Zhang, Maryam Fazel, Simon S. Du
We present a fine-grained theoretical analysis of the performance gap between reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) under a representation gap. Our study decomposes this gap into two sources: an explicit representation gap under exact optimization and an implicit representation gap under finite samples. In the exact optimization setting, we characterize how the relative capacities of the reward and policy model classes influence the final policy qualities. We show that RLHF, DPO, or online DPO can outperform one another depending on the type of model mis-specifications. Notably, online DPO can outperform both RLHF and standard DPO when the reward and policy model classes are isomorphic and both mis-specified. In the approximate optimization setting, we provide a concrete construction where the ground-truth reward is implicitly sparse and show that RLHF requires significantly fewer samples than DPO to recover an effective reward model – highlighting a statistical advantage of two-stage learning. Together, these results provide a comprehensive understanding of the performance gap between RLHF and DPO under various settings, and offer practical insights into when each method is preferred.
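For readers unfamiliar with the DPO side of the comparison, the standard per-pair DPO objective is the negative log-sigmoid of a scaled log-ratio margin between the chosen and rejected responses. The sketch below shows only that commonly stated loss, not the paper's analysis; the function name, argument names, and the default `beta` are illustrative assumptions.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# When the policy equals the reference, the margin is zero and the loss is log(2).
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
# Raising the chosen response's log-probability lowers the loss.
print(dpo_loss(-4.0, -7.0, -5.0, -7.0))
```

RLHF instead fits an explicit reward model first and then optimizes the policy against it, which is the two-stage structure the abstract contrasts with DPO.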
Article 1434
Title@2025-05-26 (1): Diff-Def: Diffusion-Generated Deformation Fields for Conditional Atlases
Title: Diff-Def: Diffusion-Generated Deformation Fields for Conditional Atlases | Diff-Def: Diffusionsgenerierte Deformationsfelder für Bedingte Atlase | Diff-Def: 用于条件图集的扩散生成形变场 2403.16776v2 |
Authors: Sophie Starck, Vasiliki Sideri-Lampretsa, Bernhard Kainz, Martin J. Menten, Tamara T. Mueller, Daniel Rueckert
Anatomical atlases are widely used for population studies and analysis. Conditional atlases target a specific sub-population defined via certain conditions, such as demographics or pathologies, and allow for the investigation of fine-grained anatomical differences like morphological changes associated with ageing or disease. Existing approaches use either registration-based methods that are often unable to handle large anatomical variations or generative adversarial models, which are challenging to train since they can suffer from training instabilities. Instead of generating atlases directly as intensities, we propose using latent diffusion models to generate deformation fields, which transform a general population atlas into one representing a specific sub-population. Our approach ensures structural integrity, enhances interpretability and avoids hallucinations that may arise during direct image synthesis by generating this deformation field and regularising it using a neighbourhood of images. We compare our method to several state-of-the-art atlas generation methods using brain MR images from the UK Biobank. Our method generates highly realistic atlases with smooth transformations and high anatomical fidelity, outperforming existing baselines. We demonstrate the quality of these atlases through comprehensive evaluations, including quantitative metrics for anatomical accuracy, perceptual similarity, and qualitative analyses displaying the consistency and realism of the generated atlases.
Article 1435
Title@2025-05-26 (1): Agentic Predictor: Performance Prediction for Agentic Workflows via Multi-View Encoding
Title: Agentic Predictor: Performance Prediction for Agentic Workflows via Multi-View Encoding | Agentic Predictor: Leistungsvorhersage für Agentic Workflows über Multi-View-Encoding | AG 预测员:通过多查看编码对AG-工作流程的性能预测 2505.19764v1 |
Authors: Patara Trirat, Wonyong Jeong, Sung Ju Hwang
Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, but optimizing LLM-based agentic systems remains challenging due to the vast search space of agent configurations, prompting strategies, and communication patterns. Existing approaches often rely on heuristic-based tuning or exhaustive evaluation, which can be computationally expensive and suboptimal. This paper proposes Agentic Predictor, a lightweight predictor for efficient agentic workflow evaluation. Agentic Predictor is equipped with a multi-view workflow encoding technique that leverages multi-view representation learning of agentic systems by incorporating code architecture, textual prompts, and interaction graph features. To achieve high predictive accuracy while significantly reducing the number of required workflow evaluations for training a predictor, Agentic Predictor employs cross-domain unsupervised pretraining. By learning to approximate task success rates, Agentic Predictor enables fast and accurate selection of optimal agentic workflow configurations for a given task, significantly reducing the need for expensive trial-and-error evaluations. Experiments on a carefully curated benchmark spanning three domains show that our predictor outperforms state-of-the-art methods in both predictive accuracy and workflow utility, highlighting the potential of performance predictors in streamlining the design of LLM-based agentic workflows.
Article 1436
Title@2025-05-26 (1): Unfolding AlphaFold’s Bayesian Roots in Probability Kinematics
Title: Unfolding AlphaFold’s Bayesian Roots in Probability Kinematics | AlphaFolds Bayesische Wurzeln in der Wahrscheinlichkeitskinematik entfalten | 将 AlphaFold 的贝叶根在概率 Kinematics 中卸载 2505.19763v1 |
Authors: Thomas Hamelryck, Kanti V. Mardia
We present a novel theoretical interpretation of AlphaFold1. The seminal breakthrough of AlphaFold1 in protein structure prediction by deep learning relied on a learned potential energy function, in contrast to the later end-to-end architectures of AlphaFold2 and AlphaFold3. While this potential was originally justified by referring to physical potentials of mean force (PMFs), we reinterpret AlphaFold1’s potential as an instance of probability kinematics - also known as Jeffrey conditioning - a principled but underrecognised generalization of conventional Bayesian updating. Probability kinematics accommodates uncertain or soft evidence in the form of updated probabilities over a partition. This perspective reveals AlphaFold1’s potential as a form of generalized Bayesian updating, rather than a thermodynamic potential. To confirm our probabilistic framework’s scope and precision, we analyze a synthetic 2D model in which an angular random walk prior is updated with evidence on distances via probability kinematics, mirroring AlphaFold1’s approach. This theoretical contribution connects AlphaFold1 to a broader class of well-justified Bayesian methods, allowing precise quantification, surpassing merely qualitative heuristics based on PMFs. More broadly, given the achievements of AlphaFold1, probability kinematics holds considerable promise for probabilistic deep learning, as it allows for the formulation of complex models from a few simpler components.
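To make the update rule concrete, here is a minimal sketch of Jeffrey conditioning on a discrete joint distribution: the new joint is q(e) · P(x | e), where q is the updated probability over the partition cells. The function name and the toy numbers are hypothetical and unrelated to AlphaFold1's actual potential.

```python
def jeffrey_update(joint, new_marginal):
    """Probability kinematics on a discrete joint distribution.

    joint[e][x]     = prior P(E = e, X = x), with P(E = e) > 0 for every cell e
    new_marginal[e] = updated (soft-evidence) probability q(e) of cell e
    Returns the updated joint q(e) * P(x | e).
    """
    updated = {}
    for e, row in joint.items():
        p_e = sum(row.values())  # prior probability of the partition cell
        updated[e] = {x: new_marginal[e] * p / p_e for x, p in row.items()}
    return updated


# Toy prior over a partition {"near", "far"} and a variable X in {"a", "b"}.
prior = {"near": {"a": 0.3, "b": 0.1}, "far": {"a": 0.2, "b": 0.4}}
posterior = jeffrey_update(prior, {"near": 0.8, "far": 0.2})
```

Setting q(e) = 1 for a single cell recovers ordinary Bayesian conditioning on that cell, which is the sense in which the rule generalizes conventional updating.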
Article 1437
Title@2025-05-26 (1): In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement
Title: In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement | In-Context-Demonstrationsfragen: Zur Prompt-Optimierung für Pseudo-Supervision-Verfeinerung | 内文示范事项:关于Pseudo-监督改进的迅速优化 2410.03124v2 |
Authors: Zhen-Yu Zhang, Jiandong Zhang, Huaxiu Yao, Gang Niu, Masashi Sugiyama
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality. Most existing methods rely on human supervision or parameter retraining, both of which are costly in terms of data collection and computational resources. To handle these challenges, a direct solution is to generate “high-confidence” data from unsupervised downstream tasks and use them for in-context prompting or prompt optimization to refine the pseudo-supervision. However, relying solely on such data may lead to overfitting. In this paper, we leverage the in-context learning (ICL) abilities of LLMs and propose a novel approach, pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision. The proposed learning objective ensures that the optimized prompt guides the LLM to generate consistent responses for a given input when pseudo-supervised data from the downstream task are used as demonstrations, enabling refinement over the entire pseudo-supervision. The prompt is optimized by translating gradient signals into textual critiques, which serve as feedback to iteratively refine the prompt and model responses. Theoretical analysis in a simplified classification setting shows that the refined pseudo-supervision exhibits a geometric clustering structure, helping to mitigate overfitting. Experiments on question answering, natural language inference benchmarks, and a real-world molecule optimization task, show the effectiveness of the proposed algorithm.
Article 1438
Title@2025-05-26 (1): Semantic-Aware Interpretable Multimodal Music Auto-Tagging
Title: Semantic-Aware Interpretable Multimodal Music Auto-Tagging | Semantic-Aware Interpretierbare multimodale Musik Auto-Tagging | 解析多式音乐 自动调制 2505.17233v2 |
Authors: Andreas Patakis, Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou
Music auto-tagging is essential for organizing and discovering music in extensive digital libraries. While foundation models achieve exceptional performance in this domain, their outputs often lack interpretability, limiting trust and usability for researchers and end-users alike. In this work, we present an interpretable framework for music auto-tagging that leverages groups of musically meaningful multimodal features, derived from signal processing, deep learning, ontology engineering, and natural language processing. To enhance interpretability, we cluster features semantically and employ an expectation maximization algorithm, assigning distinct weights to each group based on its contribution to the tagging process. Our method achieves competitive tagging performance while offering a deeper understanding of the decision-making process, paving the way for more transparent and user-centric music tagging systems.
Article 1439
Title@2025-05-26 (1): CIDRe: A Reference-Free Multi-Aspect Criterion for Code Comment Quality Measurement
Title: CIDRe: A Reference-Free Multi-Aspect Criterion for Code Comment Quality Measurement | CIDRe: Ein referenzfreies Multi-Aspekt-Kriterium für die Qualitätsmessung von Code Comment | CIDRe: 守则评论质量衡量的无参考性、无参考性、多特征的多标准标准 2505.19757v1 |
Authors: Maria Dziuba, Valentin Malykh
Effective generation of structured code comments requires robust quality metrics for dataset curation, yet existing approaches (SIDE, MIDQ, STASIS) suffer from limited code-comment analysis. We propose CIDRe, a language-agnostic reference-free quality criterion combining four synergistic aspects: (1) relevance (code-comment semantic alignment), (2) informativeness (functional coverage), (3) completeness (presence of all structure sections), and (4) description length (detail sufficiency). We validate our criterion on a manually annotated dataset. Experiments demonstrate CIDRe’s superiority over existing metrics, achieving improvement in cross-entropy evaluation. When applied to filter comments, the models finetuned on CIDRe-filtered data show statistically significant quality gains in GPT-4o-mini assessments.
Article 1440
Title@2025-05-26 (1): Discrete Markov Bridge
Title: Discrete Markov Bridge | Diskretierte Markov-Brücke | 分立马尔科夫桥 2505.19752v1 |
Authors: Hengli Li, Yuxuan Wang, Song-Chun Zhu, Ying Nian Wu, Zilong Zheng
Discrete diffusion has recently emerged as a promising paradigm in discrete data modeling. However, existing methods typically rely on a fixed rate transition matrix during training, which not only limits the expressiveness of latent representations, a fundamental strength of variational methods, but also constrains the overall design space. To address these limitations, we propose Discrete Markov Bridge, a novel framework specifically designed for discrete representation learning. Our approach is built upon two key components: Matrix Learning and Score Learning. We conduct a rigorous theoretical analysis, establishing formal performance guarantees for Matrix Learning and proving the convergence of the overall framework. Furthermore, we analyze the space complexity of our method, addressing practical constraints identified in prior studies. Extensive empirical evaluations validate the effectiveness of the proposed Discrete Markov Bridge, which achieves an Evidence Lower Bound (ELBO) of 1.38 on the Text8 dataset, outperforming established baselines. Moreover, the proposed model demonstrates competitive performance on the CIFAR-10 dataset, achieving results comparable to those obtained by image-specific generation approaches.
Article 1441
Title@2025-05-26 (1): Machine Learning Algorithm for Noise Reduction and Disease-Causing Gene Feature Extraction in Gene Sequencing Data
Title: Machine Learning Algorithm for Noise Reduction and Disease-Causing Gene Feature Extraction in Gene Sequencing Data | Maschinelles Lernen Algorithmen zur Lärmreduzierung und krankheitsverursachende Gen-Feature-Extraktion in Gensequenzierungsdaten | 用于减少噪音和在基因测序数据中进行疾病传播的基因特征采掘的机器学习算法 2505.19740v1 |
Authors: Weichen Si, Yihao Ou, Zhen Tian
In this study, we propose a machine learning-based method for noise reduction and disease-causing gene feature extraction in gene sequencing data. The DeepSeqDenoise algorithm combines CNN and RNN to effectively remove sequencing noise and improves the signal-to-noise ratio by 9.4 dB. We screened 17 key features by feature engineering, and constructed an integrated learning model to predict disease-causing genes with 94.3% accuracy. We successfully identified 57 new candidate disease-causing genes in a cardiovascular disease cohort validation, and detected 3 missed variants in clinical applications. The method significantly outperforms existing tools and provides strong support for accurate diagnosis of genetic diseases.
Article 1442
Title@2025-05-26 (1): Weighted Leave-One-Out Cross Validation
Title: Weighted Leave-One-Out Cross Validation | Gewichtete Leave-One-Out Cross-Validierung | 加权请假一次性离职后交叉验证 2505.19737v1 |
Authors: Luc Pronzato, Maria-João Rendas
We present a weighted version of Leave-One-Out (LOO) cross-validation for estimating the Integrated Squared Error (ISE) when approximating an unknown function by a predictor that depends linearly on evaluations of the function over a finite collection of sites. The method relies on the construction of the best linear estimator of the squared prediction error at an arbitrary unsampled site based on squared LOO residuals, assuming that the function is a realization of a Gaussian Process (GP). A theoretical analysis of performance of the ISE estimator is presented, and robustness with respect to the choice of the GP kernel is investigated first analytically, then through numerical examples. Overall, the estimation of ISE is significantly more precise than with classical, unweighted, LOO cross validation. Application to model selection is briefly considered through examples.
Article 1443
Title@2025-05-26 (1): Using Time Structure to Estimate Causal Effects
Title: Using Time Structure to Estimate Causal Effects | Zeitstruktur zur Schätzung von Kausalitätseffekten verwenden | 利用时间结构估计因果关系 2504.11076v2 |
Authors: Tom Hochsprung, Jakob Runge, Andreas Gerhardus
There exist several approaches for estimating causal effects in time series when latent confounding is present. Many of these approaches rely on additional auxiliary observed variables or time series such as instruments, negative controls or time series that satisfy the front- or backdoor criterion in certain graphs. In this paper, we present a novel approach for estimating direct (and via Wright’s path rule total) causal effects in a time series setup which does not rely on additional auxiliary observed variables or time series. This approach assumes that the underlying time series is a Structural Vector Autoregressive (SVAR) process and estimates direct causal effects by solving certain linear equation systems made up of different covariances and model parameters. We state sufficient graphical criteria in terms of the so-called full time graph under which these linear equations systems are uniquely solvable and under which their solutions contain the to-be-identified direct causal effects as components. We also state sufficient lag-based criteria under which the previously mentioned graphical conditions are satisfied and, thus, under which direct causal effects are identifiable. Several numerical experiments underline the correctness and applicability of our results.
Article 1444
Title@2025-05-26 (1): Accelerating Nash Learning from Human Feedback via Mirror Prox
Title: Accelerating Nash Learning from Human Feedback via Mirror Prox | Beschleunigendes Nash-Lernen aus menschlichem Feedback über Spiegelprox | 通过镜像Prox从人类反馈中加快学习 2505.19731v1 |
Authors: Daniil Tiapkin, Daniele Calandriello, Denis Belomestny, Eric Moulines, Alexey Naumov, Kashif Rasul, Michal Valko, Pierre Menard
Traditional Reinforcement Learning from Human Feedback (RLHF) often relies on reward models, frequently assuming preference structures like the Bradley-Terry model, which may not accurately capture the complexities of real human preferences (e.g., intransitivity). Nash Learning from Human Feedback (NLHF) offers a more direct alternative by framing the problem as finding a Nash equilibrium of a game defined by these preferences. In this work, we introduce Nash Mirror Prox ($\mathtt{Nash-MP}$), an online NLHF algorithm that leverages the Mirror Prox optimization scheme to achieve fast and stable convergence to the Nash equilibrium. Our theoretical analysis establishes that Nash-MP exhibits last-iterate linear convergence towards the $\beta$-regularized Nash equilibrium. Specifically, we prove that the KL-divergence to the optimal policy decreases at a rate of order $(1+2\beta)^{-N/2}$, where $N$ is the number of preference queries. We further demonstrate last-iterate linear convergence for the exploitability gap and uniformly for the span semi-norm of log-probabilities, with all these rates being independent of the size of the action space. Furthermore, we propose and analyze an approximate version of Nash-MP where proximal steps are estimated using stochastic policy gradients, making the algorithm closer to applications. Finally, we detail a practical implementation strategy for fine-tuning large language models and present experiments that demonstrate its competitive performance and compatibility with existing methods.
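To get a feel for the quoted rate, the following sketch evaluates the factor $(1+2\beta)^{-N/2}$ and inverts it to find how many preference queries are needed to reach a given factor. It ignores any constants hidden in "of order", and both function names are hypothetical.

```python
import math


def kl_contraction_factor(beta, n_queries):
    """(1 + 2*beta) ** (-N / 2): the order of the rate quoted in the abstract."""
    return (1.0 + 2.0 * beta) ** (-n_queries / 2.0)


def queries_for_factor(beta, target):
    """Smallest integer N with (1 + 2*beta) ** (-N / 2) <= target.

    Assumes beta > 0 and 0 < target < 1.
    """
    return math.ceil(2.0 * math.log(1.0 / target) / math.log(1.0 + 2.0 * beta))


# With beta = 0.5 the factor is 2 ** (-N / 2), i.e. it halves every two queries.
n_needed = queries_for_factor(beta=0.5, target=0.3)
```

A smaller regularization strength $\beta$ makes the base $1+2\beta$ closer to 1, so the same reduction takes more queries.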
Article 1445
Title@2025-05-26 (1): Stuffed Mamba: Oversized States Lead to the Inability to Forget
Title: Stuffed Mamba: Oversized States Lead to the Inability to Forget | Gefüllte Mamba: Übergroße Staaten führen zu der Unfähigkeit zu vergessen | 马姆巴:国家规模过大,导致无法忘却 2410.07145v2 |
Authors: Yingfa Chen, Xinrong Zhang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun
Recent advancements in recurrent architectures, such as Mamba and RWKV, have showcased strong language capabilities. Unlike transformer-based models, these architectures encode all contextual information into a fixed-size state, leading to great inference efficiency. However, this approach can cause information interference, where different token data conflicts, resulting in performance degradation and incoherent outputs beyond a certain context length. To prevent this, most RNNs incorporate mechanisms designed to “forget” earlier tokens. In this paper, we reveal that Mamba-based models struggle to effectively forget earlier tokens even with built-in forgetting mechanisms. We demonstrate that this issue stems from training on contexts that are too short for the state size, enabling the model to perform well without needing to learn how to forget. Then, we show that the minimum training length required for the model to learn forgetting scales linearly with the state size, and the maximum context length for accurate retrieval of a 5-digit passkey scales exponentially with the state size, indicating that the model retains some information beyond the point where forgetting begins. These findings highlight a critical limitation in current RNN architectures and provide valuable insights for improving long-context modeling. Our work suggests that future RNN designs must account for the interplay between state size, training length, and forgetting mechanisms to achieve robust performance in long-context tasks.
Article 1446
Title@2025-05-26 (1): A Structured Tour of Optimization with Finite Differences
Title: A Structured Tour of Optimization with Finite Differences | Eine strukturierte Tour der Optimierung mit endlichen Unterschieden | 结构化优化与有限差异旅游 2505.19720v1 |
Authors: Marco Rando, Cesare Molinari, Lorenzo Rosasco, Silvia Villa
Finite-difference methods are widely used for zeroth-order optimization in settings where gradient information is unavailable or expensive to compute. These procedures mimic first-order strategies by approximating gradients through function evaluations along a set of random directions. From a theoretical perspective, recent studies indicate that imposing structure (such as orthogonality) on the chosen directions allows for the derivation of convergence rates comparable to those achieved with unstructured random directions (i.e., directions sampled independently from a distribution). Empirically, although structured directions are expected to enhance performance, they often introduce additional computational costs, which can limit their applicability in high-dimensional settings. In this work, we examine the impact of structured direction selection in finite-difference methods. We review and extend several strategies for constructing structured direction matrices and compare them with unstructured approaches in terms of computational cost, gradient approximation quality, and convergence behavior. Our evaluation spans both synthetic tasks and real-world applications such as adversarial perturbation. The results demonstrate that structured directions can be generated with computational costs comparable to unstructured ones while significantly improving gradient estimation accuracy and optimization performance.
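The idea of approximating a gradient from function evaluations along a set of directions can be sketched as follows. This is a generic forward-difference estimator, not the specific constructions reviewed in the paper; the d/l scaling, the step size `h`, and the helper `rotated_basis_2d` are illustrative assumptions.

```python
import math


def fd_gradient(f, x, directions, h=1e-6):
    """Forward finite-difference surrogate gradient along the given directions.

    Each direction is assumed to have unit norm. The d / l scaling is one
    common convention when only l < d directions are used; with a full
    orthonormal basis (l == d) the scale is 1.
    """
    d, l = len(x), len(directions)
    fx = f(x)
    grad = [0.0] * d
    for p in directions:
        shifted = [xi + h * pi for xi, pi in zip(x, p)]
        slope = (f(shifted) - fx) / h  # estimate of the directional derivative
        for i in range(d):
            grad[i] += (d / l) * slope * p[i]
    return grad


def rotated_basis_2d(theta):
    """A structured (orthonormal) pair of directions in the plane."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


# f(v) = v0^2 + 3*v1 has gradient (2, 3) at the point (1, 2).
estimate = fd_gradient(lambda v: v[0] ** 2 + 3.0 * v[1], [1.0, 2.0], rotated_basis_2d(0.7))
```

With a full orthonormal set the directional slopes recombine into the gradient up to the O(h) forward-difference error, which is one reason orthogonality is a natural structure to impose.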
Article 1447
Title@2025-05-26 (1): OCN: Effectively Utilizing Higher-Order Common Neighbors for Better Link Prediction
Title: OCN: Effectively Utilizing Higher-Order Common Neighbors for Better Link Prediction | OCN: Höhere Ordnung effektiv nutzen gemeinsame Nachbarn für bessere Link-Vorhersage | OCN:有效利用高端共同邻居改善联系预测 2505.19719v1 |
Authors: Juntong Wang, Xiyuan Wang, Muhan Zhang
Common Neighbors (CNs) and their higher-order variants are important pairwise features widely used in state-of-the-art link prediction methods. However, existing methods often struggle with the repetition across different orders of CNs and fail to fully leverage their potential. We identify that these limitations stem from two key issues: redundancy and over-smoothing in high-order common neighbors. To address these challenges, we design orthogonalization to eliminate redundancy between different-order CNs and normalization to mitigate over-smoothing. By combining these two techniques, we propose Orthogonal Common Neighbor (OCN), a novel approach that significantly outperforms the strongest baselines by an average of 7.7% on popular link prediction benchmarks. A thorough theoretical analysis is provided to support our method. Ablation studies also verify the effectiveness of our orthogonalization and normalization techniques.
Article 1448
Title@2025-05-26 (1): Graceful Forgetting in Generative Language Models
Title: Graceful Forgetting in Generative Language Models | Anmutiges Vergessen in generativen Sprachmodellen | 在创用语言模型中优雅地忘却 2505.19715v1 |
Authors: Chunyang Jiang, Chi-min Chan, Yiyang Cai, Yulong Liu, Wei Xue, Yike Guo
Recently, the pretrain-finetune paradigm has become a cornerstone in various deep learning areas. While the pre-trained model generally promotes both the effectiveness and efficiency of fine-tuning on downstream tasks, studies have shown that not all knowledge acquired during pre-training is beneficial. Some of the knowledge may actually bring detrimental effects to the fine-tuning tasks, which is also known as negative transfer. To address this problem, graceful forgetting has emerged as a promising approach. The core principle of graceful forgetting is to enhance the learning plasticity of the target task by selectively discarding irrelevant knowledge. However, this approach remains underexplored in the context of generative language models, and it is often challenging to migrate existing forgetting algorithms to these models due to architecture incompatibility. To bridge this gap, in this paper we propose a novel framework, Learning With Forgetting (LWF), to achieve graceful forgetting in generative language models. With Fisher Information Matrix weighting the intended parameter updates, LWF computes forgetting confidence to evaluate self-generated knowledge regarding the forgetting task, and consequently, knowledge with high confidence is periodically unlearned during fine-tuning. Our experiments demonstrate that, although thoroughly uncovering the mechanisms of knowledge interaction remains challenging in pre-trained language models, applying graceful forgetting can contribute to enhanced fine-tuning performance.
Article 1449
Title@2025-05-26 (1): MT$^{3}$: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning
Title: MT$^{3}$: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning | MT$^{3}$: Skalierung von MLLM-basierten Textbildmaschinenübersetzungen über Multi-Task-Verstärkungslernen | MT$^{3}$:通过多任务强化学习,扩大基于MLLM的文本图像机翻译 2505.19714v1 |
Authors: Zhaopeng Feng, Yupu Liang, Shaosheng Cao, Jiayuan Su, Jiahan Ren, Zhe Xu, Yao Hu, Wenxuan Huang, Jian Wu, Zuozhu Liu
Text Image Machine Translation (TIMT), the task of translating textual content embedded in images, is critical for applications in accessibility, cross-lingual information access, and real-world document understanding. However, TIMT remains a complex challenge due to the need for accurate optical character recognition (OCR), robust visual-text reasoning, and high-quality translation, often requiring cascading multi-stage pipelines. Recent advances in large-scale Reinforcement Learning (RL) have improved reasoning in Large Language Models (LLMs) and Multimodal LLMs (MLLMs), but their application to end-to-end TIMT is still underexplored. To bridge this gap, we introduce MT$^{3}$, the first framework to apply Multi-Task RL to MLLMs for end-to-end TIMT. MT$^{3}$ adopts a multi-task optimization paradigm targeting three key sub-skills: text recognition, context-aware reasoning, and translation. It is trained using a novel multi-mixed reward mechanism that adapts rule-based RL strategies to TIMT’s intricacies, offering fine-grained, non-binary feedback across tasks. Furthermore, to facilitate the evaluation of TIMT in authentic cross-cultural and real-world social media contexts, we introduce XHSPost, the first social media TIMT benchmark. Our MT$^{3}$-7B-Zero achieves state-of-the-art results on the latest in-domain MIT-10M benchmark, outperforming strong baselines such as Qwen2.5-VL-72B and InternVL2.5-78B by notable margins across multiple metrics. Additionally, the model shows strong generalization to out-of-distribution language pairs and datasets. In-depth analyses reveal how multi-task synergy, reinforcement learning initialization, curriculum design, and reward formulation contribute to advancing MLLM-driven TIMT.
Article 1450
Title@2025-05-26 (1): On the Relation between Rectified Flows and Optimal Transport
Title: On the Relation between Rectified Flows and Optimal Transport | Über die Beziehung zwischen rektifizierten Strömungen und optimalem Verkehr | 纠正性流动与最佳运输之间的关系 2505.19712v1 |
Authors: Johannes Hertrich, Antonin Chambolle, Julie Delon
This paper investigates the connections between rectified flows, flow matching, and optimal transport. Flow matching is a recent approach to learning generative models by estimating velocity fields that guide transformations from a source to a target distribution. Rectified flow matching aims to straighten the learned transport paths, yielding more direct flows between distributions. Our first contribution is a set of invariance properties of rectified flows and explicit velocity fields. In addition, we also provide explicit constructions and analysis in the Gaussian (not necessarily independent) and Gaussian mixture settings and study the relation to optimal transport. Our second contribution addresses recent claims suggesting that rectified flows, when constrained such that the learned velocity field is a gradient, can yield (asymptotically) solutions to optimal transport problems. We study the existence of solutions for this problem and demonstrate that they only relate to optimal transport under assumptions that are significantly stronger than those previously acknowledged. In particular, we present several counter-examples that invalidate earlier equivalence results in the literature, and we argue that enforcing a gradient constraint on rectified flows is, in general, not a reliable method for computing optimal transport maps.
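For intuition about the Gaussian setting, the sketch below treats the simplest case: two one-dimensional Gaussians with an independent coupling. The closed-form velocity is the standard conditional-Gaussian expression for E[X1 − X0 | X_t = x], worked out here for illustration rather than taken from the paper, and all function names are hypothetical. In this 1-D case, integrating the velocity field reproduces the monotone optimal transport map; this says nothing about the multi-dimensional counter-examples the abstract refers to.

```python
def rectified_velocity(x, t, m0, s0, m1, s1):
    """E[X1 - X0 | X_t = x] for independent X0 ~ N(m0, s0^2), X1 ~ N(m1, s1^2),
    where X_t = (1 - t) * X0 + t * X1."""
    mean_t = (1.0 - t) * m0 + t * m1
    var_t = (1.0 - t) ** 2 * s0 ** 2 + t ** 2 * s1 ** 2
    cov = t * s1 ** 2 - (1.0 - t) * s0 ** 2  # Cov(X1 - X0, X_t)
    return (m1 - m0) + cov / var_t * (x - mean_t)


def flow_map(x0, m0, s0, m1, s1, steps=2000):
    """Integrate dx/dt = v_t(x) from t = 0 to t = 1 with explicit Euler."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        x += dt * rectified_velocity(x, k * dt, m0, s0, m1, s1)
    return x


def ot_map(x0, m0, s0, m1, s1):
    """Monotone (optimal) transport map between two 1-D Gaussians."""
    return m1 + (s1 / s0) * (x0 - m0)


# Transport the point x0 = 1 from N(0, 1) towards N(2, 9) with both maps.
flowed = flow_map(1.0, 0.0, 1.0, 2.0, 3.0)
exact = ot_map(1.0, 0.0, 1.0, 2.0, 3.0)
```

The two values agree up to the Euler discretization error, which shrinks as `steps` grows.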
Article 1451
Title@2025-05-26 (1): Automated Scientific Discovery: From Equation Discovery to Autonomous Discovery Systems
Title: Automated Scientific Discovery: From Equation Discovery to Autonomous Discovery Systems | Automatisierte wissenschaftliche Entdeckung: Von der Gleichungserkundung zu autonomen Entdeckungssystemen | 自动科学发现:从赤道发现到自主发现系统 2305.02251v2 |
Authors: Stefan Kramer, Mattia Cerrato, Jannis Brugger, Sašo Džeroski, Ross King
The paper surveys automated scientific discovery, from equation discovery and symbolic regression to autonomous discovery systems and agents. It discusses the individual approaches from a “big picture” perspective and in context, but also discusses open issues and recent topics like the various roles of deep neural networks in this area, aiding in the discovery of human-interpretable knowledge. Further, we will present closed-loop scientific discovery systems, starting with the pioneering work on the Adam system up to current efforts in fields from material science to astronomy. Finally, we will elaborate on autonomy from a machine learning perspective, but also in analogy to the autonomy levels in autonomous driving. The maximal level, level five, is defined to require no human intervention at all in the production of scientific knowledge. Achieving this is one step towards solving the Nobel Turing Grand Challenge to develop AI Scientists: AI systems capable of making Nobel-quality scientific discoveries highly autonomously at a level comparable, and possibly superior, to the best human scientists by 2050.
Article 1452
Title@2025-05-26 (1): Solving Euler equations with Multiple Discontinuities via Separation-Transfer Physics-Informed Neural Networks
Title: Solving Euler equations with Multiple Discontinuities via Separation-Transfer Physics-Informed Neural Networks | Lösen von Euler-Gleichungen mit mehreren Diskontinuitäten über Separation-Transfer-Physik-informierte Neuronale Netzwerke | 通过分离-传输、物理内建神经网络解决多断裂的电动方程式 2505.20361v1 |
Authors: Chuanxing Wang, Hui Luo, Kai Wang, Guohuai Zhu, Mingxing Luo
Despite the remarkable progress of physics-informed neural networks (PINNs) in scientific computing, they continue to face challenges when solving hydrodynamic problems with multiple discontinuities. In this work, we propose Separation-Transfer Physics Informed Neural Networks (ST-PINNs) to address such problems. By sequentially resolving discontinuities from strong to weak and leveraging transfer learning during training, ST-PINNs significantly reduce the problem complexity and enhance solution accuracy. To the best of our knowledge, this is the first study to apply a PINNs-based approach to the two-dimensional unsteady planar shock refraction problem, offering new insights into the application of PINNs to complex shock-interface interactions. Numerical experiments demonstrate that ST-PINNs more accurately capture sharp discontinuities and substantially reduce solution errors in hydrodynamic problems involving multiple discontinuities.
Article 1453
Title@2025-05-26 (1): Future-Oriented Navigation: Dynamic Obstacle Avoidance with One-Shot Energy-Based Multimodal Motion Prediction
Title: Future-Oriented Navigation: Dynamic Obstacle Avoidance with One-Shot Energy-Based Multimodal Motion Prediction | Zukunftsorientierte Navigation: Dynamische Hindernisvermeidung mit einer heißen energiebasierten Multimodal-Bewegungsvorhersage | 面向未来的导航:以单热能源为基础的多模式动力预测,动态障碍避免动态障碍 2505.00237v2 |
Authors: Ze Zhang, Georg Hess, Junjie Hu, Emmanuel Dean, Lennart Svensson, Knut Åkesson
This paper proposes an integrated approach for the safe and efficient control of mobile robots in dynamic and uncertain environments. The approach consists of two key steps: one-shot multimodal motion prediction to anticipate motions of dynamic obstacles and model predictive control to incorporate these predictions into the motion planning process. Motion prediction is driven by an energy-based neural network that generates high-resolution, multi-step predictions in a single operation. The prediction outcomes are further utilized to create geometric shapes formulated as mathematical constraints. Instead of treating each dynamic obstacle individually, predicted obstacles are grouped by proximity in an unsupervised way to improve performance and efficiency. The overall collision-free navigation is handled by model predictive control with a specific design for proactive dynamic obstacle avoidance. The proposed approach allows mobile robots to navigate effectively in dynamic environments. Its performance is assessed across various scenarios that represent typical warehouse settings. The results demonstrate that the proposed approach outperforms other existing dynamic obstacle avoidance methods.
nan
Article 1454
Title@2025-05-26 (1): HRP: High-Rank Preheating for Superior LoRA Initialization
Title: HRP: High-Rank Preheating for Superior LoRA Initialization | HRP: Hochanker Vorwärmung für die Superior LoRA Initialisierung | HRP: 高级LORA初始化的高热预热 2502.07739v3 |
Authors: Yuzhu Chen, Yingjie Wang, Shi Fu, Li Shen, Yongcheng Jing, Xinmei Tian, Dacheng Tao
This paper studies the crucial impact of initialization in Low-Rank Adaptation (LoRA). Through theoretical analysis, we demonstrate that the fine-tuned result of LoRA is highly sensitive to initialization, which is likely to lead to suboptimal low-rank results. This issue can be mitigated by adjusting the initial direction towards the main singular vectors of the target $\Delta W$, but the target is typically unknown in real-world scenarios. To approximate this initial direction, we propose High-Rank Preheating (HRP), which first trains LoRA with a higher preheating rank for a few steps, then uses the main singular vectors of the derived $BA^\top$ as initialization for the main fine-tuning process. With only a modification in the initial direction, we prove that HRP makes LoRA achieve better fine-tuned results than random initialization in expectation, and the enhancement grows with the preheating rank. We validate our theoretical findings through extensive experiments in various models and tasks, where HRP significantly enhances LoRA’s effectiveness and outperforms other initialization strategies and other LoRA variants.
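The initialization step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `hrp_style_init`, the random stand-ins for the preheated factors, and the choice to start the low-rank adapter at a zero update (as in standard LoRA) are all assumptions; the abstract only states that the main singular vectors of the preheated $BA^\top$ are used as the initialization.

```python
import numpy as np

def hrp_style_init(B_pre, A_pre, rank):
    """Derive a rank-`rank` LoRA initialization from preheated factors.

    B_pre has shape (d_out, R) and A_pre has shape (d_in, R), so the
    preheated update is B_pre @ A_pre.T with preheating rank R >= rank.
    """
    update = B_pre @ A_pre.T                       # (d_out, d_in)
    # Singular values come back sorted in descending order.
    U, S, Vt = np.linalg.svd(update, full_matrices=False)
    A_init = Vt[:rank].T                           # top right singular vectors, (d_in, rank)
    # Assumption: begin fine-tuning from a zero update, as standard LoRA does.
    B_init = np.zeros((update.shape[0], rank))
    return B_init, A_init, S

# Hypothetical stand-ins for factors obtained after a few preheating steps.
rng = np.random.default_rng(0)
d_out, d_in, R, r = 12, 10, 6, 2
B_pre = rng.normal(size=(d_out, R))
A_pre = rng.normal(size=(d_in, R))
B_init, A_init, S = hrp_style_init(B_pre, A_pre, r)
```

Here `A_init` spans the dominant input directions of the preheated update, so the subsequent low-rank fine-tuning starts in that subspace rather than in a random one.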
nan
Article 1455
Title@2025-05-26 (1): Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments
Title: Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments | Mosaic: Datenfreies Wissen Destillieren über Mixture-of-Experts für Heterogene verteilte Umgebungen | Mosaic:通过混合专家进行无数据知识蒸馏,促进异基因分布式环境 2505.19699v1 |
Authors: Junming Liu, Yanting Gao, Siyuan Meng, Yifei Sun, Aoqi Wu, Yufei Jin, Yirong Chen, Ding Wang, Guosun Zeng
Federated Learning (FL) is a decentralized machine learning paradigm that enables clients to collaboratively train models while preserving data privacy. However, the coexistence of model and data heterogeneity gives rise to inconsistent representations and divergent optimization dynamics across clients, ultimately hindering robust global performance. To transcend these challenges, we propose Mosaic, a novel data-free knowledge distillation framework tailored for heterogeneous distributed environments. Mosaic first trains local generative models to approximate each client’s personalized distribution, enabling synthetic data generation that safeguards privacy through strict separation from real data. Subsequently, Mosaic forms a Mixture-of-Experts (MoE) from client models based on their specialized knowledge, and distills it into a global model using the generated data. To further enhance the MoE architecture, Mosaic integrates expert predictions via a lightweight meta model trained on a few representative prototypes. Extensive experiments on standard image classification benchmarks demonstrate that Mosaic consistently outperforms state-of-the-art approaches under both model and data heterogeneity. The source code has been published at https://github.com/Wings-Of-Disaster/Mosaic.
nan
Article 1456
Title@2025-05-26 (1): Graph Guided Diffusion: Unified Guidance for Conditional Graph Generation
Title: Graph Guided Diffusion: Unified Guidance for Conditional Graph Generation | Graph Guided Diffusion: Unified Guidance for Conditional Graph Generation | 向导扩散:有条件图形生成统一指南 2505.19685v1 |
Authors: Victor M. Tenorio, Nicolas Zilberstein, Santiago Segarra, Antonio G. Marques
Diffusion models have emerged as powerful generative models for graph generation, yet their use for conditional graph generation remains a fundamental challenge. In particular, guiding diffusion models on graphs under arbitrary reward signals is difficult: gradient-based methods, while powerful, are often unsuitable due to the discrete and combinatorial nature of graphs, and non-differentiable rewards further complicate gradient-based guidance. We propose Graph Guided Diffusion (GGDiff), a novel guidance framework that interprets conditional diffusion on graphs as a stochastic control problem to address this challenge. GGDiff unifies multiple guidance strategies, including gradient-based guidance (for differentiable rewards), control-based guidance (using control signals from forward reward evaluations), and zero-order approximations (bridging gradient-based and gradient-free optimization). This comprehensive, plug-and-play framework enables zero-shot guidance of pre-trained diffusion models under both differentiable and non-differentiable reward functions, adapting well-established guidance techniques to graph generation–a direction largely unexplored. Our formulation balances computational efficiency, reward alignment, and sample quality, enabling practical conditional generation across diverse reward types. We demonstrate the efficacy of GGDiff in various tasks, including constraints on graph motifs, fairness, and link prediction, achieving superior alignment with target rewards while maintaining diversity and fidelity.
nan
Article 1457
Title@2025-05-26 (1): CauSkelNet: Causal Representation Learning for Human Behaviour Analysis
Title: CauSkelNet: Causal Representation Learning for Human Behaviour Analysis | CauSkelNet: Kausales Repräsentationslernen für die menschliche Verhaltensanalyse | CauSkelNet: 人类行为分析的因果关系学习 2409.15564v3 |
Authors: Xingrui Gu, Chuyi Jiang, Erte Wang, Zekun Wu, Qiang Cui, Leimin Tian, Lianlong Wu, Siyang Song, Chuang Yu
Traditional machine learning methods for movement recognition often struggle with limited model interpretability and a lack of insight into human movement dynamics. This study introduces a novel representation learning framework based on causal inference to address these challenges. Our two-stage approach combines the Peter-Clark (PC) algorithm and Kullback-Leibler (KL) divergence to identify and quantify causal relationships between human joints. By capturing joint interactions, the proposed causal Graph Convolutional Network (GCN) produces interpretable and robust representations. Experimental results on the EmoPain dataset demonstrate that the causal GCN outperforms traditional GCNs in accuracy, F1 score, and recall, particularly in detecting protective behaviors. This work contributes to advancing human motion analysis and lays a foundation for adaptive and intelligent healthcare solutions.
nan
Article 1458
Title@2025-05-26 (1): Deep Actor-Critics with Tight Risk Certificates
Title: Deep Actor-Critics with Tight Risk Certificates | Deep Actor-Critics mit engen Risikozertifikaten | 具有严格风险证书的深行为者-批评者 2505.19682v1 |
Authors: Bahareh Tasdighi, Manuel Haussmann, Yi-Shan Wu, Andres R. Masegosa, Melih Kandemir
After a period of research, deep actor-critic algorithms have reached a level where they influence our everyday lives. They serve as the driving force behind the continual improvement of large language models through user-collected feedback. However, their deployment in physical systems is not yet widely adopted, mainly because no validation scheme exists that quantifies their risk of malfunction. We demonstrate that it is possible to develop tight risk certificates for deep actor-critic algorithms that predict generalization performance from validation-time observations. Our key insight centers on the effectiveness of minimal evaluation data. Surprisingly, a small, feasible number of evaluation roll-outs collected from a pretrained policy suffices to produce accurate risk certificates when combined with a simple adaptation of PAC-Bayes theory. Specifically, we adopt a recently introduced recursive PAC-Bayes approach, which splits validation data into portions and recursively builds PAC-Bayes bounds on the excess loss of each portion’s predictor, using the predictor from the previous portion as a data-informed prior. Our empirical results across multiple locomotion tasks and policy expertise levels demonstrate risk certificates that are tight enough to be considered for practical use.
nan
Article 1459
Title@2025-05-26 (1): Cut out and Replay: A Simple yet Versatile Strategy for Multi-Label Online Continual Learning
Title: Cut out and Replay: A Simple yet Versatile Strategy for Multi-Label Online Continual Learning | Cut out und Replay: Eine einfache, aber vielseitige Strategie für Multi-Label Online Continual Learning | 剪切和重放:一个简单但通俗易懂的多标签在线持续学习战略 2505.19680v1 |
Authors: Xinrui Wang, Shao-yuan Li, Jiaqiang Zhang, Songcan Chen
Multi-Label Online Continual Learning (MOCL) requires models to learn continuously from endless multi-label data streams, facing complex challenges including persistent catastrophic forgetting, potential missing labels, and uncontrollable imbalanced class distributions. While existing MOCL methods attempt to address these challenges through various techniques, they all overlook label-specific region identifying and feature learning - a fundamental solution rooted in multi-label learning but challenging to achieve in the online setting with incremental and partial supervision. To this end, we first leverage the inherent structural information of input data to evaluate and verify the innate localization capability of different pre-trained models. Then, we propose CUTER (CUT-out-and-Experience-Replay), a simple yet versatile strategy that provides fine-grained supervision signals by further identifying, strengthening and cutting out label-specific regions for efficient experience replay. It not only enables models to simultaneously address catastrophic forgetting, missing labels, and class imbalance challenges, but also serves as an orthogonal solution that seamlessly integrates with existing approaches. Extensive experiments on multiple multi-label image benchmarks demonstrate the superiority of our proposed method. The code is available at https://github.com/wxr99/Cut-Replay
nan
Article 1460
Title@2025-05-26 (1): Optimal Multi-Fidelity Best-Arm Identification
Title: Optimal Multi-Fidelity Best-Arm Identification | Optimale Multi-Fidelity Best-Arm-Identifikation | 最佳最佳多纤维最佳武器标识 2406.03033v2 |
Authors: Riccardo Poiani, Rémy Degenne, Emilie Kaufmann, Alberto Maria Metelli, Marcello Restelli
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible. We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost. Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm. Our first contribution is a tight, instance-dependent lower bound on the cost complexity. The study of the optimization problem featured in the lower bound provides new insights to devise computationally efficient algorithms, and leads us to propose a gradient-based approach with asymptotically optimal cost complexity. We demonstrate the benefits of the new algorithm compared to existing methods in experiments. Our theoretical and empirical findings also shed light on an intriguing concept of optimal fidelity for each arm.
nan
Article 1461
Title@2025-05-26 (1): Bridging Privacy and Robustness for Trustworthy Machine Learning
Title: Bridging Privacy and Robustness for Trustworthy Machine Learning | Überbrückung von Privatsphäre und Robustheit für vertrauenswürdiges maschinelles Lernen | 连接隐私和强力,促进可信赖的机器学习 2403.16591v4 |
Authors: Xiaojin Zhang, Wei Chen
The advent of machine learning has led to transformative changes across various domains, but the sensitive nature of data raises concerns about privacy and security. While Local Differential Privacy (LDP) has been a cornerstone in addressing these concerns, recent research has proposed privacy concepts aligned with the Bayesian inference perspective of an adversary, such as Average Bayesian Privacy (ABP) and Maximum Bayesian Privacy (MBP). This paper explores the intricate relationships between LDP, ABP, and MBP, and their implications for algorithmic robustness. We establish theoretical connections between these privacy notions, proving that LDP implies MBP and vice versa under certain conditions, and deriving bounds connecting MBP and ABP. We also investigate the relationship between PAC robust learning and privacy preservation, demonstrating how to derive PAC robustness from privacy-preserving algorithms and construct privacy-preserving algorithms from PAC robust ones. Our findings provide valuable insights for constructing privacy-preserving and robust machine learning algorithms.
nan
Article 1462
Title@2025-05-26 (1): Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling
Title: Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling | Zero-Shot-Streaming-Text zur Sprachsynthese mit Transducer und Auto-Regressive Modellierung | 零热流文本,用于带有传感器和自动递减建模的语音合成 2505.19669v1 |
Authors: Haiyang Sun, Shujie Hu, Shujie Liu, Lingwei Meng, Hui Wang, Bing Han, Yifan Yang, Yanqing Liu, Sheng Zhao, Yan Lu, Yanmin Qian
Zero-shot streaming text-to-speech is an important research topic in human-computer interaction. Existing methods primarily use a lookahead mechanism, relying on future text to achieve natural streaming speech synthesis, which introduces high processing latency. To address this issue, we propose SMLLE, a streaming framework for generating high-quality speech frame-by-frame. SMLLE employs a Transducer to convert text into semantic tokens in real time while simultaneously obtaining duration alignment information. The combined outputs are then fed into a fully autoregressive (AR) streaming model to reconstruct mel-spectrograms. To further stabilize the generation process, we design a Delete <Bos> Mechanism that allows the AR model to access future text while introducing as little delay as possible. Experimental results suggest that SMLLE outperforms current streaming TTS methods and achieves performance comparable to sentence-level TTS systems. Samples are available on https://anonymous.4open.science/w/demo_page-48B7/.
nan
Article 1463
Title@2025-05-26 (1): GTR: Graph-Table-RAG for Cross-Table Question Answering
Title: GTR: Graph-Table-RAG for Cross-Table Question Answering | GTR: Graph-Table-RAG für Cross-Table-Frageantworten | GTR:用于跨表问题解答的图表表-RAG 2504.01346v3 |
Authors: Jiaru Zou, Dongqi Fu, Sirui Chen, Xinrui He, Zihao Li, Yada Zhu, Jiawei Han, Jingrui He
Beyond pure text, a substantial amount of knowledge is stored in tables. In real-world scenarios, user questions often require retrieving answers that are distributed across multiple tables. GraphRAG has recently attracted much attention for enhancing LLMs’ reasoning capabilities by organizing external knowledge to address ad-hoc and complex questions, exemplifying a promising direction for cross-table question answering. In this paper, to address the current gap in available data, we first introduce a multi-table benchmark, MultiTableQA, comprising 60k tables and 25k user queries collected from real-world sources. Then, we propose the first Graph-Table-RAG framework, namely GTR, which reorganizes table corpora into a heterogeneous graph, employs a hierarchical coarse-to-fine retrieval process to extract the most relevant tables, and integrates graph-aware prompting for downstream LLMs’ tabular reasoning. Extensive experiments show that GTR exhibits superior cross-table question-answering performance while maintaining high deployment efficiency, demonstrating its real-world practical applicability.
nan
Article 1464
Title@2025-05-26 (1): Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning
Title: Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning | Mehrbildbeschreibungen für mehrsprachige, leichte Kognitive Impairment-Erkennung durch kontrastives Lernen enthüllen | 通过差异学习发现多语种轻视认知缺陷的单形多语种描述 2505.17067v2 |
Authors: Kristin Qi, Jiali Cheng, Youxiang Zhu, Hadi Amiri, Xiaohui Liang
Detecting Mild Cognitive Impairment from picture descriptions is critical yet challenging, especially in multilingual and multiple picture settings. Prior work has primarily focused on English speakers describing a single picture (e.g., the ‘Cookie Theft’). The TAUKDIAL-2024 challenge expands this scope by introducing multilingual speakers and multiple pictures, which presents new challenges in analyzing picture-dependent content. To address these challenges, we propose a framework with three components: (1) enhancing discriminative representation learning via supervised contrastive learning, (2) involving image modality rather than relying solely on speech and text modalities, and (3) applying a Product of Experts (PoE) strategy to mitigate spurious correlations and overfitting. Our framework improves MCI detection performance, achieving a +7.1% increase in Unweighted Average Recall (UAR) (from 68.1% to 75.2%) and a +2.9% increase in F1 score (from 80.6% to 83.5%) compared to the text unimodal baseline. Notably, the contrastive learning component yields greater gains for the text modality compared to speech. These results highlight our framework’s effectiveness in multilingual and multi-picture MCI detection.
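The Product of Experts (PoE) strategy named above can be illustrated in its generic form: per-expert class distributions are multiplied elementwise and renormalized, so a class scores highly only if no expert strongly rejects it. The sketch below is an assumption-laden illustration, not the paper's model: the three "experts" (text, speech, image) and their probabilities are hypothetical numbers, and the abstract does not specify how the experts are defined or trained.

```python
def product_of_experts(expert_probs, eps=1e-12):
    """Fuse per-expert class distributions by multiplying and renormalizing."""
    n_classes = len(expert_probs[0])
    combined = [1.0] * n_classes
    for probs in expert_probs:
        for k in range(n_classes):
            # eps keeps a single hard zero from wiping out a class entirely.
            combined[k] *= max(probs[k], eps)
    total = sum(combined)
    return [c / total for c in combined]

# Hypothetical two-class (MCI vs. control) outputs from three modality experts.
text = [0.6, 0.4]
speech = [0.7, 0.3]
image = [0.4, 0.6]
fused = product_of_experts([text, speech, image])
```

With these numbers the unnormalized products are 0.168 and 0.072, so the fused distribution is approximately [0.7, 0.3].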
nan
Article 1465
Title@2025-05-26 (1): Best-Arm Identification in Unimodal Bandits
Title: Best-Arm Identification in Unimodal Bandits | Best-Arm-Identifikation in unimodalen Banditen | 统一强盗中的最佳武器识别 2411.01898v2 |
Authors: Riccardo Poiani, Marc Jourdan, Emilie Kaufmann, Rémy Degenne
We study the fixed-confidence best-arm identification problem in unimodal bandits, in which the means of the arms increase with the index of the arm up to their maximum, then decrease. We derive two lower bounds on the stopping time of any algorithm. The instance-dependent lower bound suggests that due to the unimodal structure, only three arms contribute to the leading confidence-dependent cost. However, a worst-case lower bound shows that a linear dependence on the number of arms is unavoidable in the confidence-independent cost. We propose modifications of Track-and-Stop and a Top Two algorithm that leverage the unimodal structure. Both versions of Track-and-Stop are asymptotically optimal for one-parameter exponential families. The Top Two algorithm is asymptotically near-optimal for Gaussian distributions and we prove a non-asymptotic guarantee matching the worst-case lower bound. The algorithms can be implemented efficiently and we demonstrate their competitive empirical performance.
nan
Article 1466
Title@2025-05-26 (1): MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE
Title: MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE | MoESD: Spekulatives Decoding-Potential zur Beschleunigung von Sparse MoE enthüllen | MOESD: Unveil 投机性代谢潜力加速偏散的中导体 2505.19645v1 |
Authors: Zongle Huang, Lei Zhu, Zongyuan Zhan, Ting Hu, Weikai Mao, Xianzhi Yu, Yongpan Liu, Tianyu Zhang
Large Language Models (LLMs) have achieved remarkable success across many applications, with Mixture of Experts (MoE) models demonstrating great potential. Compared to traditional dense models, MoEs achieve better performance with less computation. Speculative decoding (SD) is a widely used technique to accelerate LLM inference without accuracy loss, but it has been considered efficient only for dense models. In this work, we first demonstrate that, under medium batch sizes, MoE surprisingly benefits more from SD than dense models. Furthermore, as MoE becomes sparser – the prevailing trend in MoE designs – the batch size range where SD acceleration is expected to be effective becomes broader. To quantitatively understand the tradeoffs involved in SD, we develop a reliable model based on theoretical analyses. While current SD research primarily focuses on improving acceptance rates of algorithms, changes in workload and model architecture can still lead to degraded SD acceleration even with high acceptance rates. To address this limitation, we introduce a new metric, ‘target efficiency’, that characterizes these effects, thus helping researchers identify system bottlenecks and understand SD acceleration more comprehensively. For scenarios like private serving, this work unveils a new perspective to speed up MoE inference, where existing solutions struggle. Experiments on different GPUs show up to 2.29x speedup for Qwen2-57B-A14B at medium batch sizes and validate our theoretical predictions.
nan
Article 1467
Title@2025-05-26 (1): Navigating Conflicting Views: Harnessing Trust for Learning
Title: Navigating Conflicting Views: Harnessing Trust for Learning | Navigieren gegensätzlicher Ansichten: Vertrauen fürs Lernen gewinnen | 引导冲突观点:利用信任学习 2406.00958v3 |
Authors: Jueqing Lu, Wray Buntine, Yuanyuan Qi, Joanna Dipnall, Belinda Gabbe, Lan Du
Resolving conflicts is critical for improving the reliability of multi-view classification. While prior work focuses on learning consistent and informative representations across views, it often assumes perfect alignment and equal importance of all views, an assumption rarely met in real-world scenarios, as some views may express distinct information. To address this, we develop a computational trust-based discounting method that enhances the Evidential Multi-view framework by accounting for the instance-wise reliability of each view through a probability-sensitive trust mechanism. We evaluate our method on six real-world datasets using Top-1 Accuracy, Fleiss’ Kappa, and a new metric, Multi-View Agreement with Ground Truth, to assess prediction reliability. We also assess the effectiveness of uncertainty in indicating prediction correctness via AUROC. Additionally, we test the scalability of our method through end-to-end training on a large-scale dataset. The experimental results show that computational trust can effectively resolve conflicts, paving the way for more reliable multi-view classification models in real-world applications.
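Fleiss' Kappa, one of the evaluation metrics listed above, measures agreement among a fixed number of raters beyond what chance would produce. The sketch below implements the standard formula in plain Python; treating each view's predicted label as one "rater" is an assumption made for illustration, and the toy count tables are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table counts[i][j] = number of raters that
    assigned item i to category j. Every row must sum to the same rater count.
    Undefined (division by zero) if all ratings fall in a single category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_cat = len(counts[0])
    # Overall share of ratings that went to each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cat)]
    # Per-item agreement: fraction of rater pairs that agree.
    P_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    P_bar = sum(P_i) / n_items          # observed agreement
    P_e = sum(p * p for p in p_j)       # chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Three views "rating" four instances over two classes (hypothetical data).
unanimous = fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]])
conflicted = fleiss_kappa([[2, 1], [1, 2], [2, 1], [1, 2]])
```

In the first table all views agree on every instance and kappa is 1; in the second, every instance has one dissenting view and kappa falls to -1/3, below chance level.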
nan
Article 1468
Title@2025-05-26 (1): When fractional quasi p-norms concentrate
Title: When fractional quasi p-norms concentrate | Wenn fraktioniertes Quasi-P-Normen-Konzentrat | 当分微分准微调集中时 2505.19635v1 |
Authors: Ivan Y. Tyukin, Bogdan Grechuk, Evgeny M. Mirkes, Alexander N. Gorban
Concentration of distances in high dimension is an important factor for the development and design of stable and reliable data analysis algorithms. In this paper, we address the fundamental long-standing question about the concentration of distances in high dimension for fractional quasi $p$-norms, $p\in(0,1)$. The topic has been at the centre of various theoretical and empirical controversies. Here we, for the first time, identify conditions when fractional quasi $p$-norms concentrate and when they don’t. We show that, contrary to some earlier suggestions, for broad classes of distributions, fractional quasi $p$-norms admit concentration bounds that are exponential and uniform in $p$. For these distributions, the results effectively rule out previously proposed approaches to alleviate concentration by “optimally” setting the values of $p$ in $(0,1)$. At the same time, we specify conditions and the corresponding families of distributions for which one can still control concentration rates by appropriate choices of $p$. We also show that in an arbitrarily small vicinity of a distribution from a large class of distributions for which uniform concentration occurs, there are uncountably many other distributions featuring anti-concentration properties. Importantly, this behavior enables devising relevant data encoding or representation schemes favouring or discouraging distance concentration. The results shed new light on this long-standing problem and resolve the tension around the topic in both theory and empirical evidence reported in the literature.
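A small numerical experiment can make the notion of distance concentration concrete. The sketch below computes the fractional quasi $p$-norm $\|x\|_p = (\sum_i |x_i|^p)^{1/p}$ for random points and reports the relative contrast (max minus min, divided by min) of those norms. The i.i.d. uniform coordinates, the sample size, and the relative-contrast statistic are choices made here purely for illustration; they are not taken from the paper.

```python
import random

def quasi_norm(x, p):
    """Fractional quasi p-norm: (sum_i |x_i|^p)^(1/p), with 0 < p < 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def relative_contrast(dim, p, n_points=200, seed=0):
    """(max - min) / min of the quasi-norms of random points in [0, 1)^dim."""
    rng = random.Random(seed)
    norms = [quasi_norm([rng.random() for _ in range(dim)], p)
             for _ in range(n_points)]
    return (max(norms) - min(norms)) / min(norms)

low_dim = relative_contrast(dim=2, p=0.5)
high_dim = relative_contrast(dim=500, p=0.5)
```

For this distribution the contrast shrinks markedly as the dimension grows even with $p = 0.5$, which is the kind of concentration behaviour the abstract discusses.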
nan
Article 1469
Title@2025-05-26 (1): Decoupling Spatio-Temporal Prediction: When Lightweight Large Models Meet Adaptive Hypergraphs
Title: Decoupling Spatio-Temporal Prediction: When Lightweight Large Models Meet Adaptive Hypergraphs | Entkoppelung Spatio-Temporale Vorhersage: Wenn leichte große Modelle adaptive Hypergraphen treffen | 脱钩的SPadio-TT时间预测:当轻量大模型与适应性高光谱相匹配时 2505.19620v1 |
Authors: Jiawen Chen, Qi Shao, Duxin Chen, Wenwu Yu
Spatio-temporal prediction is a pivotal task with broad applications in traffic management, climate monitoring, energy scheduling, etc. However, existing methodologies often struggle to balance model expressiveness and computational efficiency, especially when scaling to large real-world datasets. To tackle these challenges, we propose STH-SepNet (Spatio-Temporal Hypergraph Separation Networks), a novel framework that decouples temporal and spatial modeling to enhance both efficiency and precision. Therein, the temporal dimension is modeled using lightweight large language models, which effectively capture low-rank temporal dynamics. Concurrently, the spatial dimension is addressed through an adaptive hypergraph neural network, which dynamically constructs hyperedges to model intricate, higher-order interactions. A carefully designed gating mechanism is integrated to seamlessly fuse temporal and spatial representations. By leveraging the fundamental principles of low-rank temporal dynamics and spatial interactions, STH-SepNet offers a pragmatic and scalable solution for spatio-temporal prediction in real-world applications. Extensive experiments on large-scale real-world datasets across multiple benchmarks demonstrate the effectiveness of STH-SepNet in boosting predictive performance while maintaining computational efficiency. This work may provide a promising lightweight framework for spatio-temporal prediction, aiming to reduce computational demands while enhancing predictive performance. Our code is available at https://github.com/SEU-WENJIA/ST-SepNet-Lightweight-LLMs-Meet-Adaptive-Hypergraphs.
nan
Article 1470
Title@2025-05-26 (1): SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows
Title: SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows | SESaMo: Symmetrie-verstärkende stochastische Modulation für normalisierende Strömungen | SESaMo: 正常流动的对称性-强化斯托调动 2505.19619v1 |
Authors: Janik Kreit, Dominic Schuh, Kim A. Nicoli, Lena Funcke
Deep generative models have recently garnered significant attention across various fields, from physics to chemistry, where sampling from unnormalized Boltzmann-like distributions represents a fundamental challenge. In particular, autoregressive models and normalizing flows have become prominent due to their appealing ability to yield closed-form probability densities. Moreover, it is well-established that incorporating prior knowledge - such as symmetries - into deep neural networks can substantially improve training performance. In this context, recent advances have focused on developing symmetry-equivariant generative models, achieving remarkable results. Building upon these foundations, this paper introduces Symmetry-Enforcing Stochastic Modulation (SESaMo). Similar to equivariant normalizing flows, SESaMo enables the incorporation of inductive biases (e.g., symmetries) into normalizing flows through a novel technique called stochastic modulation. This approach enhances the flexibility of the generative model, allowing it to effectively learn a variety of exact and broken symmetries. Our numerical experiments benchmark SESaMo in different scenarios, including an 8-Gaussian mixture model and physically relevant field theories, such as the $\phi^4$ theory and the Hubbard model.
nan
Article 1471
Title@2025-05-26 (1): When the Left Foot Leads to the Right Path: Bridging Initial Prejudice and Trainability
Title: When the Left Foot Leads to the Right Path: Bridging Initial Prejudice and Trainability | Wenn der linke Fuß auf den rechten Weg führt: Überbrückung von anfänglichen Vorurteilen und Trainingsfähigkeit | 当左脚引向右路时:弥合最初的偏见和可训练性 2505.12096v2 |
Authors: Alberto Bassi, Carlo Albert, Aurelien Lucchi, Marco Baity-Jesi, Emanuele Francazi
Understanding the statistical properties of deep neural networks (DNNs) at initialization is crucial for elucidating both their trainability and the intrinsic architectural biases they encode prior to data exposure. Mean-field (MF) analyses have demonstrated that the parameter distribution in randomly initialized networks dictates whether gradients vanish or explode. Concurrently, untrained DNNs were found to exhibit an initial-guessing bias (IGB), in which large regions of the input space are assigned to a single class. In this work, we derive a theoretical proof establishing the correspondence between IGB and previous MF theories, thereby connecting a network’s prejudice toward specific classes with the conditions for fast and accurate learning. This connection yields a counter-intuitive conclusion: the initialization that optimizes trainability is necessarily biased, rather than neutral. Furthermore, we extend the MF/IGB framework to multi-node activation functions, offering practical guidelines for designing initialization schemes that ensure stable optimization in architectures employing max- and average-pooling layers.
nan
Article 1472
Title@2025-05-26 (1): Learning and Interpreting Gravitational-Wave Features from CNNs with a Random Forest Approach
Title: Learning and Interpreting Gravitational-Wave Features from CNNs with a Random Forest Approach | Erlernen und Dolmetschen von Gravitational-Wave-Features von CNNs mit einem zufälligen Waldansatz | 使用随机森林方法从有线电视新闻网读取和解释引力维学特征 2505.20357v1 |
Authors: Jun Tian, He Wang, Jibo He, Yu Pan, Shuo Cao, Qingquan Jiang
Convolutional neural networks (CNNs) have become widely adopted in gravitational wave (GW) detection pipelines due to their ability to automatically learn hierarchical features from raw strain data. However, the physical meaning of these learned features remains underexplored, limiting the interpretability of such models. In this work, we propose a hybrid architecture that combines a CNN-based feature extractor with a random forest (RF) classifier to improve both detection performance and interpretability. Unlike prior approaches that directly connect classifiers to CNN outputs, our method introduces four physically interpretable metrics - variance, signal-to-noise ratio (SNR), waveform overlap, and peak amplitude - computed from the final convolutional layer. These are jointly used with the CNN output in the RF classifier to enable more informed decision boundaries. Tested on long-duration strain datasets, our hybrid model outperforms a baseline CNN model, achieving a relative improvement of 21% in sensitivity at a fixed false alarm rate of 10 events per month. Notably, it also shows improved detection of low-SNR signals (SNR $\le$ 10), which are especially vulnerable to misclassification in noisy environments. Feature attribution via the RF model reveals that both CNN-extracted and handcrafted features contribute significantly to classification decisions, with learned variance and CNN outputs ranked among the most informative. These findings suggest that physically motivated post-processing of CNN feature maps can serve as a valuable tool for interpretable and efficient GW detection, bridging the gap between deep learning and domain knowledge.
nan
Article 1473
Title@2025-05-26 (1): Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models
Title: Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models | Diagnostizieren und Abmildern von Modalitätsstörungen in multimodalen großen Sprachmodellen | 多式联运大语言模型中的诊断和减缓模式干预 2505.19616v1 |
Authors: Rui Cai, Bangzheng Li, Xiaofei Wen, Muhao Chen, Zhe Zhao
Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across tasks, yet they often exhibit difficulty in distinguishing task-relevant from irrelevant signals, particularly in tasks like Visual Question Answering (VQA), which can lead to susceptibility to misleading or spurious inputs. We refer to this broader limitation as the Cross-Modality Competency Problem: the model’s inability to fairly evaluate all modalities. This vulnerability becomes more evident in modality-specific tasks such as image classification or pure text question answering, where models are expected to rely solely on one modality. In such tasks, spurious information from irrelevant modalities often leads to significant performance degradation. We refer to this failure as Modality Interference, which serves as a concrete and measurable instance of the cross-modality competency problem. We further design a perturbation-based causal diagnostic experiment to verify and quantify this problem. To mitigate modality interference, we propose a novel framework to fine-tune MLLMs, including perturbation-based data augmentations with both heuristic perturbations and adversarial perturbations via Projected Gradient Descent (PGD), and a consistency regularization strategy applied to model outputs with original and perturbed inputs. Experiments on multiple benchmark datasets (image-heavy, text-heavy, and VQA tasks) and multiple model families with different scales demonstrate significant improvements in robustness and cross-modality competency, indicating our method’s effectiveness in boosting unimodal reasoning ability while enhancing performance on multimodal tasks.
Article 1474
Title@2025-05-26 (1): Multiplicity is an Inevitable and Inherent Challenge in Multimodal Learning
Title: Multiplicity is an Inevitable and Inherent Challenge in Multimodal Learning | Vielfältigkeit ist eine unvermeidliche und inhärente Herausforderung im multimodalen Lernen | 多重性是多模式学习中不可避免和内在的挑战。 2505.19614v1 |
Authors: Sanghyuk Chun
Multimodal learning has seen remarkable progress, particularly with the emergence of large-scale pre-training across various modalities. However, most current approaches are built on the assumption of a deterministic, one-to-one alignment between modalities. This oversimplifies real-world multimodal relationships, which are inherently many-to-many. This phenomenon, named multiplicity, is not a side-effect of noise or annotation error, but an inevitable outcome of semantic abstraction, representational asymmetry, and task-dependent ambiguity in multimodal tasks. This position paper argues that multiplicity is a fundamental bottleneck that manifests across all stages of the multimodal learning pipeline: from data construction to training and evaluation. It examines the causes and consequences of multiplicity, and highlights how multiplicity introduces training uncertainty, unreliable evaluation, and low dataset quality. The paper calls for new research directions in multimodal learning: novel multiplicity-aware learning frameworks and dataset construction protocols that account for multiplicity.
Article 1475
Title@2025-05-26 (1): Skrull: Towards Efficient Long Context Fine-tuning through Dynamic Data Scheduling
Title: Skrull: Towards Efficient Long Context Fine-tuning through Dynamic Data Scheduling | Skrull: Auf dem Weg zu einem effizienten langen Kontext Feinabstimmung durch Dynamic Data Scheduling | Skrull:通过动态数据安排,实现高效长处微调 2505.19609v1 |
Authors: Hongtao Xu, Wenting Shen, Yuanxin Wei, Ang Wang, Guo Runfan, Tianxing Wang, Yong Li, Mingzhen Li, Weile Jia
Long-context supervised fine-tuning (Long-SFT) plays a vital role in enhancing the performance of large language models (LLMs) on long-context tasks. To smoothly adapt LLMs to long-context scenarios, this process typically entails training on mixed datasets containing both long and short sequences. However, this heterogeneous sequence length distribution poses significant challenges for existing training systems, as they fail to simultaneously achieve high training efficiency for both long and short sequences, resulting in sub-optimal end-to-end system performance in Long-SFT. In this paper, we present a novel perspective on data scheduling to address the challenges posed by the heterogeneous data distributions in Long-SFT. We propose Skrull, a dynamic data scheduler specifically designed for efficient Long-SFT. Through dynamic data scheduling, Skrull balances the computation requirements of long and short sequences, improving overall training efficiency. Furthermore, we formulate the scheduling process as a joint optimization problem and thoroughly analyze the trade-offs involved. Based on this analysis, Skrull employs a lightweight scheduling algorithm to achieve near-zero cost online scheduling in Long-SFT. Finally, we implement Skrull upon DeepSpeed, a state-of-the-art distributed training system for LLMs. Experimental results demonstrate that Skrull outperforms DeepSpeed by 3.76x on average (up to 7.54x) in real-world Long-SFT scenarios.
Article 1476
Title@2025-05-26 (1): Energy-based Preference Optimization for Test-time Adaptation
Title: Energy-based Preference Optimization for Test-time Adaptation | Energiebasierte Preference-Optimierung für die Testzeitanpassung | 以能源为基础的试验时间适应最佳应用 2505.19607v1 |
Authors: Yewon Han, Seoyun Yang, Taesup Kim
Test-Time Adaptation (TTA) enhances model robustness by enabling adaptation to target distributions that differ from training distributions, improving real-world generalizability. Existing TTA approaches focus on adjusting the conditional distribution; however, these methods often depend on uncertain predictions in the absence of label information, leading to unreliable performance. Energy-based frameworks suggest a promising alternative to address distribution shifts without relying on uncertain predictions, instead computing the marginal distribution of target data. However, they involve the critical challenge of requiring extensive SGLD sampling, which is impractical for test-time scenarios requiring immediate adaptation. In this work, we propose Energy-based Preference Optimization for Test-time Adaptation (EPOTTA), which is based on a sampling-free strategy. We first parameterize the target model using a pretrained model and a residual energy function, enabling marginal likelihood maximization of target data without sampling. Building on the observation that the parameterization is mathematically equivalent to the DPO objective, we then directly adapt the model to a target distribution without explicitly training the residual. Our experiments verify that EPOTTA is well-calibrated and performant while achieving computational efficiency.
Article 1477
Title@2025-05-26 (1): Kuramoto-FedAvg: Using Synchronization Dynamics to Improve Federated Learning Optimization under Statistical Heterogeneity
Title: Kuramoto-FedAvg: Using Synchronization Dynamics to Improve Federated Learning Optimization under Statistical Heterogeneity | Kuramoto-FedAvg: Synchronisationsdynamik zur Verbesserung der Federated Learning Optimization unter statistischer Heterogenität | Kuramoto-FedAvg:利用同步动态改善统计多样性下的联邦学习优化 2505.19605v1 |
Authors: Aggrey Muhebwa, Khotso Selialia, Fatima Anwar, Khalid K. Osman
Federated learning on heterogeneous (non-IID) client data experiences slow convergence due to client drift. To address this challenge, we propose Kuramoto-FedAvg, a federated optimization algorithm that reframes the weight aggregation step as a synchronization problem inspired by the Kuramoto model of coupled oscillators. The server dynamically weighs each client’s update based on its phase alignment with the global update, amplifying contributions that align with the global gradient direction while minimizing the impact of updates that are out of phase. We theoretically prove that this synchronization mechanism reduces client drift, providing a tighter convergence bound compared to the standard FedAvg under heterogeneous data distributions. Empirical validation supports our theoretical findings, showing that Kuramoto-FedAvg significantly accelerates convergence and improves accuracy across multiple benchmark datasets. Our work highlights the potential of coordination and synchronization-based strategies for managing gradient diversity and accelerating federated optimization in realistic non-IID settings.
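The alignment-weighted aggregation described above can be illustrated with a minimal sketch. It assumes that "phase alignment" is measured as the cosine between each client update and the plain mean update, and that weights are proportional to (1 + cos) / 2. Both choices, and the names `cosine` and `aligned_aggregate`, are illustrative assumptions and not the authors' exact rule.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def aligned_aggregate(updates):
    """Weight client updates by their alignment with the mean update.

    Assumed rule (not taken from the paper): weight_i is proportional to
    (1 + cos(update_i, mean_update)) / 2, so out-of-phase updates are damped.
    """
    n, dim = len(updates), len(updates[0])
    mean = [sum(u[j] for u in updates) / n for j in range(dim)]
    # Map the cosine from [-1, 1] to a non-negative raw weight in [0, 1].
    raw = [(1.0 + cosine(u, mean)) / 2.0 for u in updates]
    total = sum(raw)
    # Fall back to uniform weights if every raw weight is zero.
    weights = [w / total for w in raw] if total > 0 else [1.0 / n] * n
    agg = [sum(w * u[j] for w, u in zip(weights, updates)) for j in range(dim)]
    return agg, weights

# Two roughly aligned clients and one client pointing the opposite way.
updates = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]]
agg, weights = aligned_aggregate(updates)
```

In this toy run the third client receives the smallest weight, so the aggregate stays closer to the direction shared by the first two clients than a plain average would.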
Article 1478
Title@2025-05-26 (1): Evaluating Machine Translation Models for English-Hindi Language Pairs: A Comparative Analysis
Title: Evaluating Machine Translation Models for English-Hindi Language Pairs: A Comparative Analysis | Machine Translation Models für Englisch-Hindi Sprachpaare bewerten: Eine vergleichende Analyse | 英文-中文语文配对评价机器翻译模型:比较分析 2505.19604v1 |
Authors: Ahan Prasannakumar Shetty
Machine translation has become a critical tool in bridging linguistic gaps, especially between languages as diverse as English and Hindi. This paper comprehensively evaluates various machine translation models for translating between English and Hindi. We assess the performance of these models using a diverse set of automatic evaluation metrics, including both lexical and machine learning-based metrics. Our evaluation leverages an English-Hindi parallel corpus of 18,000+ entries and a custom FAQ dataset comprising questions from government websites. The study aims to provide insights into the effectiveness of different machine translation approaches in handling both general and specialized language domains. Results indicate varying performance levels across different metrics, highlighting strengths and areas for improvement in current translation systems.
Article 1479
Title@2025-05-26 (1): Distributional Reinforcement Learning with Dual Expectile-Quantile Regression
Title: Distributional Reinforcement Learning with Dual Expectile-Quantile Regression | Verstärktes Lernen mit Dual Expectile-Quantile Regression | 双预期量递减分布强化学习 2305.16877v4 |
Authors: Sami Jullien, Romain Deffayet, Jean-Michel Renders, Paul Groth, Maarten de Rijke
Distributional reinforcement learning (RL) has proven useful in multiple benchmarks as it enables approximating the full distribution of returns and extracts rich feedback from environment samples. The commonly used quantile regression approach to distributional RL – based on asymmetric $L_1$ losses – provides a flexible and effective way of learning arbitrary return distributions. In practice, it is often improved by using a more efficient, asymmetric hybrid $L_1$-$L_2$ Huber loss for quantile regression. However, by doing so, distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Indeed, asymmetric $L_2$ losses, corresponding to expectile regression, cannot be readily used for distributional temporal difference. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns. We prove that our proposed operator converges to the distributional Bellman operator in the limit of infinite estimated quantile and expectile fractions, and we benchmark a practical implementation on a toy example and at scale. On the Atari benchmark, our approach matches the performance of the Huber-based IQN-1 baseline after $200$M training frames but avoids distributional collapse and keeps estimates of the full distribution of returns.
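The two loss families contrasted above can be written out directly. The sketch below uses the standard asymmetric $L_1$ (pinball) and asymmetric $L_2$ (expectile) losses on a residual `u = target - estimate`, and recovers a sample quantile and expectile from a small set of returns. The function names and the toy `returns` list are illustrative assumptions; this is not the paper's dual training procedure.

```python
def pinball_loss(u, tau):
    """Asymmetric L1 loss for residual u = target - estimate, 0 < tau < 1."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def expectile_loss(u, tau):
    """Asymmetric L2 loss for residual u = target - estimate, 0 < tau < 1."""
    return abs(tau - (1.0 if u < 0 else 0.0)) * u * u

def sample_quantile(xs, tau):
    """Sample point that minimizes the summed pinball loss."""
    return min(xs, key=lambda q: sum(pinball_loss(x - q, tau) for x in xs))

def sample_expectile(xs, tau, iters=100):
    """Minimizer of the summed expectile loss, via fixed-point iteration."""
    e = sum(xs) / len(xs)
    for _ in range(iters):
        # Points above the current estimate get weight tau, the rest 1 - tau.
        w = [tau if x > e else 1.0 - tau for x in xs]
        e = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
    return e

returns = [0.0, 1.0, 2.0, 3.0, 10.0]
median = sample_quantile(returns, 0.5)     # the 0.5-quantile is the median
mean = sample_expectile(returns, 0.5)      # the 0.5-expectile is the mean
upper = sample_expectile(returns, 0.9)     # pulled toward the large return
```

At `tau = 0.5` the quantile reduces to the median and the expectile to the mean, which is why a set of expectiles alone summarizes the distribution differently from a set of quantiles.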
Article 1480
Title@2025-05-26 (1): Rep3D: Re-parameterize Large 3D Kernels with Low-Rank Receptive Modeling for Medical Imaging
Title: Rep3D: Re-parameterize Large 3D Kernels with Low-Rank Receptive Modeling for Medical Imaging | Rep3D: Große 3D-Kernel mit Low-Rank-Empfangsmodellierung für die medizinische Bildgebung neu parametrieren | Rep3D: 医疗成像低射感应模型的大型 3D 内核再修复 2505.19603v1 |
Authors: Ho Hin Lee, Quan Liu, Shunxing Bao, Yuankai Huo, Bennett A. Landman
In contrast to vision transformers, which model long-range dependencies through global self-attention, large kernel convolutions provide a more efficient and scalable alternative, particularly in high-resolution 3D volumetric settings. However, naively increasing kernel size often leads to optimization instability and degradation in performance. Motivated by the spatial bias observed in effective receptive fields (ERFs), we hypothesize that different kernel elements converge at variable rates during training. To support this, we derive a theoretical connection between element-wise gradients and first-order optimization, showing that structurally re-parameterized convolution blocks inherently induce spatially varying learning rates. Building on this insight, we introduce Rep3D, a 3D convolutional framework that incorporates a learnable spatial prior into large kernel training. A lightweight two-stage modulation network generates a receptive-biased scaling mask, adaptively re-weighting kernel updates and enabling local-to-global convergence behavior. Rep3D adopts a plain encoder design with large depthwise convolutions, avoiding the architectural complexity of multi-branch compositions. We evaluate Rep3D on five challenging 3D segmentation benchmarks and demonstrate consistent improvements over state-of-the-art baselines, including transformer-based and fixed-prior re-parameterization methods. By unifying spatial inductive bias with optimization-aware learning, Rep3D offers an interpretable and scalable solution for 3D medical image analysis. The source code is publicly available at https://github.com/leeh43/Rep3D.
Article 1481
Title@2025-05-26 (1): Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression
Title: Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression | Speichereffiziente visuelle Autoregressive Modellierung mit Scale-Aware-KV-Cache-Kompression | KV缓存压缩的内存有效视觉自动递减模型 2505.19602v1 |
Authors: Kunjun Li, Zigeng Chen, Cheng-Yen Yang, Jenq-Neng Hwang
Visual Autoregressive (VAR) modeling has garnered significant attention for its innovative next-scale prediction approach, which yields substantial improvements in efficiency, scalability, and zero-shot generalization. Nevertheless, the coarse-to-fine methodology inherent in VAR results in exponential growth of the KV cache during inference, causing considerable memory consumption and computational redundancy. To address these bottlenecks, we introduce ScaleKV, a novel KV cache compression framework tailored for VAR architectures. ScaleKV leverages two critical observations: varying cache demands across transformer layers and distinct attention patterns at different scales. Based on these insights, ScaleKV categorizes transformer layers into two functional groups: drafters and refiners. Drafters exhibit dispersed attention across multiple scales, thereby requiring greater cache capacity. Conversely, refiners focus attention on the current token map to process local details, consequently necessitating substantially reduced cache capacity. ScaleKV optimizes the multi-scale inference pipeline by identifying scale-specific drafters and refiners, facilitating differentiated cache management tailored to each scale. Evaluation on the state-of-the-art text-to-image VAR model family, Infinity, demonstrates that our approach effectively reduces the required KV cache memory to 10% while preserving pixel-level fidelity.
Article 1482
Title@2025-05-26 (1): Preference Optimization by Estimating the Ratio of the Data Distribution
Title: Preference Optimization by Estimating the Ratio of the Data Distribution | Präferenzoptimierung durch Schätzung des Verhältnisses der Datenverteilung | 通过估计数据分配比率实现最佳优化 2505.19601v1 |
Authors: Yeongmin Kim, Heesun Bae, Byeonghu Na, Il-Chul Moon
Direct preference optimization (DPO) is widely used as a simple and stable method for aligning large language models (LLMs) with human preferences. This paper investigates a generalized DPO loss that enables a policy model to match the target policy from a likelihood ratio estimation perspective. The ratio of the target policy provides a unique identification of the policy distribution without relying on reward models or partition functions. This allows the generalized loss to retain both simplicity and theoretical guarantees, which prior work such as $f$-PO fails to achieve simultaneously. We propose Bregman preference optimization (BPO), a generalized framework for ratio matching that provides a family of objective functions achieving target policy optimality. BPO subsumes DPO as a special case and offers tractable forms for all instances, allowing implementation with a few lines of code. We further develop scaled Basu’s power divergence (SBA), a gradient scaling method that can be used for BPO instances. The BPO framework complements other DPO variants and is applicable to target policies defined by these variants. In experiments, unlike other probabilistic loss extensions such as $f$-DPO or $f$-PO, which exhibit a trade-off between generation fidelity and diversity, instances of BPO improve both win rate and entropy compared with DPO. When applied to Llama-3-Instruct-8B, BPO achieves state-of-the-art performance among Llama-3-8B backbones, with a 55.9% length-controlled win rate on AlpacaEval2.
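For orientation, the DPO special case mentioned above can be written in a few lines. The sketch below implements only the standard per-pair DPO loss, $-\log\sigma\big(\beta[(\log\pi_\theta(y_w) - \log\pi_{\mathrm{ref}}(y_w)) - (\log\pi_\theta(y_l) - \log\pi_{\mathrm{ref}}(y_l))]\big)$, from sequence log-probabilities. It does not implement BPO or SBA, whose exact forms are not given in this abstract; the function names and the default `beta` are assumptions.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    logp_w / logp_l: policy log-probabilities of the chosen / rejected response.
    ref_logp_w / ref_logp_l: the same quantities under the frozen reference model.
    """
    # Difference of policy-to-reference log-ratios for the two responses.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -log_sigmoid(beta * margin)

# A policy identical to the reference gives a zero margin and loss log(2).
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# A policy that raises the chosen response and lowers the rejected one scores lower.
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

The loss depends on the policy only through log-ratios against the reference, which is the quantity the ratio-matching view above treats as the object being estimated.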
Article 1483
Title@2025-05-26 (1): Inconsistent Tokenizations Cause Language Models to be Perplexed by Japanese Grammar
Title: Inconsistent Tokenizations Cause Language Models to be Perplexed by Japanese Grammar | Inkonsistente Tokenisierungen führen dazu, dass Sprachmodelle von japanischer Grammatik verblüfft werden. | 前后不一致的招数导致语言模式被日语语法所混淆 2505.19599v1 |
Authors: Andrew Gambardella, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo
Typical methods for evaluating the performance of language models evaluate their ability to answer questions accurately. These evaluation metrics are acceptable for determining the extent to which language models can understand and reason about text in a general sense, but fail to capture nuanced capabilities, such as the ability of language models to recognize and obey rare grammar points, particularly in languages other than English. We measure the perplexity of language models when confronted with the “first person psych predicate restriction” grammar point in Japanese. Weblab is the only tested open source model in the 7-10B parameter range which consistently assigns higher perplexity to ungrammatical psych predicate sentences than grammatical ones. We give evidence that Weblab’s uniformly bad tokenization is a possible root cause for its good performance, and show that Llama 3’s perplexity on grammatical psych predicate sentences can be reduced by orders of magnitude (28x difference) by restricting test sentences to those with uniformly well-behaved tokenizations. We show in further experiments on machine translation tasks that language models will use alternative grammar patterns in order to produce grammatical sentences when tokenization issues prevent the most natural sentence from being output.
Article 1484
Title@2025-05-26 (1): Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs
Title: Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs | Residual Connections und Normalisierung können eine Übersäuerung in GNNs wahrscheinlich verhindern | 残留连接和正常化可可可避免防止全球NN的过度移动 2406.02997v3 |
Authors: Michael Scholkemper, Xinyi Wu, Ali Jadbabaie, Michael T. Schaub
Residual connections and normalization layers have become standard design choices for graph neural networks (GNNs), and were proposed as solutions to mitigate the oversmoothing problem in GNNs. However, how exactly these methods help alleviate the oversmoothing problem from a theoretical perspective is not well understood. In this work, we provide a formal and precise characterization of (linearized) GNNs with residual connections and normalization layers. We establish that (a) for residual connections, the incorporation of the initial features at each layer can prevent the signal from becoming too smooth, and determines the subspace of possible node representations; (b) batch normalization prevents a complete collapse of the output embedding space to a one-dimensional subspace through the individual rescaling of each column of the feature matrix. This results in the convergence of node representations to the top-$k$ eigenspace of the message-passing operator; (c) moreover, we show that the centering step of a normalization layer – which can be understood as a projection – alters the graph signal in message-passing in such a way that relevant information can become harder to extract. We therefore introduce a novel, principled normalization layer called GraphNormv2 in which the centering step is learned such that it does not distort the original graph signal in an undesirable way. Experimental results confirm the effectiveness of our method.
Article 1485
Title@2025-05-26 (1): How Well Can Differential Privacy Be Audited in One Run?
Title: How Well Can Differential Privacy Be Audited in One Run? | Wie gut kann die Privatsphäre in einem einzigen Lauf überprüft werden? | 如何在单一运行中对差异隐私进行审计? 2503.07199v2 |
Authors: Amit Keinan, Moshe Shenfeld, Katrina Ligett
Recent methods for auditing the privacy of machine learning algorithms have improved computational efficiency by simultaneously intervening on multiple training examples in a single training run. Steinke et al. (2024) prove that one-run auditing indeed lower bounds the true privacy parameter of the audited algorithm, and give impressive empirical results. Their work leaves open the question of how precisely one-run auditing can uncover the true privacy parameter of an algorithm, and how that precision depends on the audited algorithm. In this work, we characterize the maximum achievable efficacy of one-run auditing and show that the key barrier to its efficacy is interference between the observable effects of different data elements. We present new conceptual approaches to minimize this barrier, towards improving the performance of one-run auditing of real machine learning algorithms.
Article 1486
Title@2025-05-26 (1): Learning to Reason without External Rewards
Title: Learning to Reason without External Rewards | Vernunft lernen ohne externe Belohnungen | 学习没有外部奖励的理性 2505.19590v1 |
Authors: Xuandong Zhao, Zhewei Kang, Aosong Feng, Sergey Levine, Dawn Song
Training large language models (LLMs) for complex reasoning via Reinforcement Learning with Verifiable Rewards (RLVR) is effective but limited by reliance on costly, domain-specific supervision. We explore Reinforcement Learning from Internal Feedback (RLIF), a framework that enables LLMs to learn from intrinsic signals without external rewards or labeled data. We propose Intuitor, an RLIF method that uses a model’s own confidence, termed self-certainty, as its sole reward signal. Intuitor replaces external rewards in Group Relative Policy Optimization (GRPO) with self-certainty scores, enabling fully unsupervised learning. Experiments demonstrate that Intuitor matches GRPO’s performance on mathematical benchmarks while achieving superior generalization to out-of-domain tasks like code generation, without requiring gold solutions or test cases. Our findings show that intrinsic model signals can drive effective learning across domains, offering a scalable alternative to RLVR for autonomous AI systems where verifiable rewards are unavailable. Code is available at https://github.com/sunblaze-ucb/Intuitor
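A rough sketch of how a confidence score could replace an external reward in a group-relative update is shown below. The abstract does not define self-certainty, so the code assumes one plausible form: the mean over token positions of KL(U || p), the divergence from the uniform distribution U to the next-token distribution p. The group normalization follows the usual mean and standard deviation scheme used with GRPO. All function names and the toy distributions are hypothetical.

```python
import math

def self_certainty(token_dists):
    """Mean KL(U || p) over token positions (assumed definition).

    token_dists: one probability vector per generated token, all entries > 0.
    The score is 0 for uniform distributions and grows as they become peaked.
    """
    total = 0.0
    for p in token_dists:
        v = len(p)
        total += -sum(math.log(v * pj) for pj in p) / v
    return total / len(token_dists)

def group_relative_advantages(scores, eps=1e-8):
    """Normalize scores within one group of sampled responses."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    std = math.sqrt(var)
    return [(s - mean) / (std + eps) for s in scores]

# Two sampled responses to the same prompt: one confident, one maximally unsure.
confident = [[0.97, 0.01, 0.01, 0.01], [0.90, 0.05, 0.03, 0.02]]
unsure = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
scores = [self_certainty(confident), self_certainty(unsure)]
advantages = group_relative_advantages(scores)
```

Under these assumptions the confident response gets a positive advantage and the unsure one a negative advantage, without any gold answer or test case entering the computation.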
Article 1487
Title@2025-05-26 (1): WQLCP: Weighted Adaptive Conformal Prediction for Robust Uncertainty Quantification Under Distribution Shifts
Title: WQLCP: Weighted Adaptive Conformal Prediction for Robust Uncertainty Quantification Under Distribution Shifts | WQLCP: Gewichtete adaptive konforme Vorhersage für robuste Unsicherheit Quantifizierung unter Verteilungsverschiebungen | WQLCP: 分配变化下强势不确定性量化的加权适应性统一预测 2505.19587v1 |
Authors: Shadi Alijani, Homayoun Najjaran
Conformal prediction (CP) provides a framework for constructing prediction sets with guaranteed coverage, assuming exchangeable data. However, real-world scenarios often involve distribution shifts that violate exchangeability, leading to unreliable coverage and inflated prediction sets. To address this challenge, we first introduce Reconstruction Loss-Scaled Conformal Prediction (RLSCP), which utilizes reconstruction losses derived from a Variational Autoencoder (VAE) as an uncertainty metric to scale score functions. While RLSCP demonstrates performance improvements, mainly resulting in better coverage, it quantifies quantiles based on a fixed calibration dataset without considering the discrepancies between test and train datasets in an unexchangeable setting. In the next step, we propose Weighted Quantile Loss-scaled Conformal Prediction (WQLCP), which refines RLSCP by incorporating a weighted notion of exchangeability, adjusting the calibration quantile threshold based on weights with respect to the ratio of calibration and test loss values. This approach improves the CP-generated prediction set outputs in the presence of distribution shifts. Experiments on large-scale datasets, including ImageNet variants, demonstrate that WQLCP outperforms existing baselines by consistently maintaining coverage while reducing prediction set sizes, providing a robust solution for CP under distribution shifts.
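The role of the calibration quantile, and of weights on calibration points, can be sketched as follows. The code computes a weighted split-conformal threshold in which the test point carries its own weight at +infinity; with all weights equal it reduces to the usual ceil((n + 1)(1 - alpha))-th smallest calibration score. How WQLCP derives its weights from calibration and test reconstruction losses is not specified in this abstract, so the weights here are placeholders, and the function names are assumptions.

```python
import math

def weighted_conformal_threshold(scores, weights, test_weight, alpha):
    """Smallest score whose cumulative normalized weight reaches 1 - alpha.

    The test point holds `test_weight` at +infinity, so the threshold is
    infinite when the calibration mass alone cannot reach 1 - alpha.
    """
    total = sum(weights) + test_weight
    cum = 0.0
    for s, w in sorted(zip(scores, weights)):
        cum += w / total
        # Small tolerance guards against floating-point accumulation error.
        if cum >= 1.0 - alpha - 1e-12:
            return s
    return math.inf

def prediction_set(class_scores, threshold):
    """Keep every label whose nonconformity score is within the threshold."""
    return [label for label, s in class_scores.items() if s <= threshold]

cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Equal weights: the standard split-conformal threshold.
plain = weighted_conformal_threshold(cal_scores, [1.0] * 9, 1.0, alpha=0.2)

# Placeholder weights that emphasize the hardest calibration point.
shifted = weighted_conformal_threshold(cal_scores, [1.0] * 8 + [5.0], 1.0, alpha=0.2)

labels = prediction_set({"cat": 0.3, "dog": 0.85, "fox": 0.95}, plain)
```

Upweighting high-score calibration points raises the threshold and enlarges prediction sets, which is the mechanism by which a weighted quantile can restore coverage when test data are harder than the calibration data.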
Article 1488
Title@2025-05-26 (1): Accelerating Prefilling for Long-Context LLMs via Sparse Pattern Sharing
Title: Accelerating Prefilling for Long-Context LLMs via Sparse Pattern Sharing | Beschleunigung der Vorfüllung für Langkontext-LLMs über Sparse Pattern Sharing | 通过 Sparse 模式共享加速预填长文本 LLMs 2505.19578v1 |
Authors: Dan Peng, Zhihui Fu, Zewen Ye, Zhuoran Song, Jun Wang
Sparse attention methods exploit the inherent sparsity in attention to speed up the prefilling phase of long-context inference, mitigating the quadratic complexity of full attention computation. While existing sparse attention methods rely on predefined patterns or inaccurate estimations to approximate attention behavior, they often fail to fully capture the true dynamics of attention, resulting in reduced efficiency and compromised accuracy. Instead, we propose a highly accurate sparse attention mechanism that shares similar yet precise attention patterns across heads, enabling a more realistic capture of the dynamic behavior of attention. Our approach is grounded in two key observations: (1) attention patterns demonstrate strong inter-head similarity, and (2) this similarity remains remarkably consistent across diverse inputs. By strategically sharing computed accurate patterns across attention heads, our method effectively captures actual patterns while requiring full attention computation for only a small subset of heads. Comprehensive evaluations demonstrate that our approach achieves superior or comparable speedup relative to state-of-the-art methods while delivering the best overall accuracy.
Article 1489
Title@2025-05-26 (1): GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
Title: GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning | GraLoRA: Granulare Low-Rank-Anpassung für den Parameter-Effizient Feintuning | GRALORA: 用于参数有效精密调整的颗粒式低兰克适应 2505.20355v1 |
Authors: Yeonjoon Jung, Daehyun Ahn, Hyungjun Kim, Taesu Kim, Eunhyeok Park
Low-Rank Adaptation (LoRA) is a popular method for parameter-efficient fine-tuning (PEFT) of generative models, valued for its simplicity and effectiveness. Despite recent enhancements, LoRA still suffers from a fundamental limitation: overfitting when the bottleneck is widened. It performs best at ranks 32-64, yet its accuracy stagnates or declines at higher ranks, still falling short of full fine-tuning (FFT) performance. We identify the root cause as LoRA’s structural bottleneck, which introduces gradient entanglement to the unrelated input channels and distorts gradient propagation. To address this, we introduce a novel structure, Granular Low-Rank Adaptation (GraLoRA) that partitions weight matrices into sub-blocks, each with its own low-rank adapter. With negligible computational or storage cost, GraLoRA overcomes LoRA’s limitations, effectively increases the representational capacity, and more closely approximates FFT behavior. Experiments on code generation and commonsense reasoning benchmarks show that GraLoRA consistently outperforms LoRA and other baselines, achieving up to +8.5% absolute gain in Pass@1 on HumanEval+. These improvements hold across model sizes and rank settings, making GraLoRA a scalable and robust solution for PEFT. Code, data, and scripts are available at https://github.com/SqueezeBits/GraLoRA.git
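The claim of higher capacity at negligible extra cost can be checked with simple arithmetic. The sketch below assumes the weight matrix is split into a k x k grid of sub-blocks and that each block's adapter has rank r / k, so that the total parameter budget matches a rank-r LoRA adapter. That per-block rank is an assumption made for this illustration and is not stated in the abstract, and the function names are hypothetical.

```python
def lora_params(m, n, r):
    """Parameters of a rank-r LoRA adapter for an m x n weight: B (m x r) and A (r x n)."""
    return r * (m + n)

def gralora_params(m, n, r, k):
    """Parameters with a k x k grid of blocks, each holding a rank r // k adapter (assumed)."""
    assert m % k == 0 and n % k == 0 and r % k == 0
    per_block = (r // k) * (m // k + n // k)
    return k * k * per_block

def max_update_rank(m, n, r, k):
    """Upper bound on the rank of the assembled block-wise update.

    Each of the k block-rows has rank at most min(m // k, r), so the whole
    update has rank at most min(m, n, k * r).
    """
    return min(m, n, k * r)

m = n = 4096
r, k = 64, 4
same_budget = lora_params(m, n, r) == gralora_params(m, n, r, k)
rank_bound_lora = max_update_rank(m, n, r, 1)
rank_bound_blocks = max_update_rank(m, n, r, k)
```

With these numbers both variants use 524,288 adapter parameters, while the rank bound of the combined update grows from 64 to 256, which is consistent with the abstract's statement that representational capacity increases without a meaningful storage cost.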
Article 1490
Title@2025-05-26 (1): Situationally-Aware Dynamics Learning
Title: Situationally-Aware Dynamics Learning | Situational-Aware Dynamics Learning | 情况认知动态学习 2505.19574v1 |
Authors: Alejandro Murillo-Gonzalez, Lantao Liu
Autonomous robots operating in complex, unstructured environments face significant challenges due to latent, unobserved factors that obscure their understanding of both their internal state and the external world. Addressing this challenge would enable robots to develop a more profound grasp of their operational context. To tackle this, we propose a novel framework for online learning of hidden state representations, with which the robots can adapt in real-time to uncertain and dynamic conditions that would otherwise be ambiguous and result in suboptimal or erroneous behaviors. Our approach is formalized as a Generalized Hidden Parameter Markov Decision Process, which explicitly models the influence of unobserved parameters on both transition dynamics and reward structures. Our core innovation lies in learning online the joint distribution of state transitions, which serves as an expressive representation of latent ego- and environmental factors. This probabilistic approach supports the identification of, and adaptation to, different operational situations, improving robustness and safety. Through a multivariate extension of Bayesian Online Changepoint Detection, our method segments changes in the underlying data generating process governing the robot’s dynamics. The robot’s transition model is then informed with a symbolic representation of the current situation derived from the joint distribution of the latest state transitions, enabling adaptive and context-aware decision-making. To showcase the real-world effectiveness, we validate our approach in the challenging task of unstructured terrain navigation, where unmodeled and unmeasured terrain characteristics can significantly impact the robot’s motion. Extensive experiments in both simulation and the real world reveal significant improvements in data efficiency, policy performance, and the emergence of safer, adaptive navigation strategies.
Article 1491
Title@2025-05-26 (1): Truncated Kernel Stochastic Gradient Descent on Spheres
Title: Truncated Kernel Stochastic Gradient Descent on Spheres | Beschnittener Kern Stochastischer Gradient Abstieg auf Sphären | 球体上被排出核心内核岩层渐变源 2410.01570v5 |
Authors: Jinhui Bai, Lei Shi
Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-kernel SGD) algorithm with a least-squares loss function for spherical data fitting. T-kernel SGD introduces a novel regularization strategy by implementing stochastic gradient descent through a closed-form solution of the projection of the stochastic gradient in a low-dimensional subspace. In contrast to traditional kernel SGD, the regularization strategy implemented by T-kernel SGD is more effective in balancing bias and variance by dynamically adjusting the hypothesis space during iterations. The most significant advantage of the proposed algorithm is that it can achieve theoretically optimal convergence rates using a constant step size (independent of the sample size) while overcoming the inherent saturation problem of kernel SGD. Additionally, we leverage the structure of spherical polynomials to derive an equivalent T-kernel SGD, significantly reducing storage and computational costs compared to kernel SGD. Typically, T-kernel SGD requires only $\mathcal{O}(n^{1+\frac{d}{d-1}\epsilon})$ computational complexity and $\mathcal{O}(n^{\frac{d}{d-1}\epsilon})$ storage to achieve optimal rates for the $d$-dimensional sphere, where $0<\epsilon<\frac{1}{2}$ can be arbitrarily small if the optimal fitting or the underlying space possesses sufficient regularity. This regularity is determined by the smoothness parameter of the objective function and the decay rate of the eigenvalues of the integral operator associated with the kernel function, both of which reflect the difficulty of the estimation problem. Our main results quantitatively characterize how this prior information influences the convergence of T-kernel SGD. The numerical experiments further validate the theoretical findings presented in this paper.
nan
Article 1492
Title@2025-05-26 (1): MSD-LLM: Predicting Ship Detention in Port State Control Inspections with Large Language Model
Title: MSD-LLM: Predicting Ship Detention in Port State Control Inspections with Large Language Model | MSD-LLM: Schiffshaft in Hafenstaatkontrolle mit großem Sprachmodell vorhersagen | MSD-LLM:用大语言模型预测港口国控制检查中船舶扣留情况 2505.19568v1 |
Authors: Jiongchao Jin, Xiuju Fu, Xiaowei Gao, Tao Cheng, Ran Yan
Maritime transportation is the backbone of global trade, making ship inspection essential for ensuring maritime safety and environmental protection. Port State Control (PSC), conducted by national ports, enforces compliance with safety regulations, with ship detention being the most severe consequence, impacting both ship schedules and company reputations. Traditional machine learning methods for ship detention prediction are limited by their representation-learning capacity and thus suffer from low accuracy. Meanwhile, autoencoder-based deep learning approaches face challenges due to the severe data imbalance in historical PSC detention records. To address these limitations, we propose Maritime Ship Detention with Large Language Models (MSD-LLM), integrating a dual robust subspace recovery (DSR) layer-based autoencoder with a progressive learning pipeline to handle imbalanced data and extract meaningful PSC representations. Then, a large language model groups and ranks features to identify likely detention cases, enabling dynamic thresholding for flexible detention predictions. Extensive evaluations on 31,707 PSC inspection records from the Asia-Pacific region show that MSD-LLM outperforms state-of-the-art methods by more than 12% in Area Under the Curve (AUC) for Singapore ports. Additionally, it demonstrates robustness to real-world challenges, making it adaptable to diverse maritime risk assessment scenarios.
nan
Article 1493
Title@2025-05-26 (1): BackSlash: Rate Constrained Optimized Training of Large Language Models
Title: BackSlash: Rate Constrained Optimized Training of Large Language Models | BackSlash: Rate Constrained Optimized Training of Large Language Models | 对大语言模式优化培训 2504.16968v3 |
Authors: Jun Wu, Jiangtao Wen, Yuxing Han
The rapid advancement of large language models (LLMs) has driven extensive research into parameter compression after training has been completed, yet compression during the training phase remains largely unexplored. In this work, we introduce Rate-Constrained Training (BackSlash), a novel training-time compression approach based on rate-distortion optimization (RDO). BackSlash enables a flexible trade-off between model accuracy and complexity, significantly reducing parameter redundancy while preserving performance. Experiments across various architectures and tasks demonstrate that BackSlash can reduce memory usage by 60% to 90% without accuracy loss and provides significant compression gain compared to compression after training. Moreover, BackSlash proves to be highly versatile: it enhances generalization with small Lagrange multipliers, improves model robustness to pruning (maintaining accuracy even at 80% pruning rates), and enables network simplification for accelerated inference on edge devices.
nan
Article 1494
Title@2025-05-26 (1): Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams
Title: Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams | Lego Sketch: Ein skalierbares neurales Netzwerk für das Sketching von Datenströmen | Lego Sketch: 一个可缩放的内存放大神经网络,用于切割数据流 2505.19561v1 |
Authors: Yuan Feng, Yukun Cao, Hairu Wang, Xike Xie, S Kevin Zhou
Sketches, probabilistic structures for estimating item frequencies in infinite data streams with limited space, are widely used across various domains. Recent studies have shifted the focus from handcrafted sketches to neural sketches, leveraging memory-augmented neural networks (MANNs) to enhance the streaming compression capabilities and achieve better space-accuracy trade-offs. However, existing neural sketches struggle to scale across different data domains and space budgets due to inflexible MANN configurations. In this paper, we introduce a scalable MANN architecture that brings to life the Lego sketch, a novel sketch with superior scalability and accuracy. Much like assembling creations with modular Lego bricks, the Lego sketch dynamically coordinates multiple memory bricks to adapt to various space budgets and diverse data domains. Our theoretical analysis guarantees its high scalability and provides the first error bound for neural sketches. Furthermore, extensive experimental evaluations demonstrate that the Lego sketch exhibits superior space-accuracy trade-offs, outperforming existing handcrafted and neural sketches. Our code is available at https://github.com/FFY0/LegoSketch_ICML.
nan
Article 1495
Title@2025-05-26 (1): EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding
Title: EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding | EuroCon: Benchmarking Parlament Beratung für politische Konsensfindung | EuroCon:确定议会审议政治共识结果的基准 2505.19558v1 |
Authors: Zhaowei Zhang, Minghua Yi, Mengmeng Wang, Fengshuo Bai, Zilong Zheng, Yipeng Kang, Yaodong Yang
Achieving political consensus is crucial yet challenging for the effective functioning of social governance. However, although frontier AI systems represented by large language models (LLMs) have developed rapidly in recent years, their capabilities in this area are still understudied. In this paper, we introduce EuroCon, a novel benchmark constructed from 2,225 high-quality deliberation records of the European Parliament over 13 years, ranging from 2009 to 2022, to evaluate the ability of LLMs to reach political consensus among divergent party positions across diverse parliament settings. Specifically, EuroCon incorporates four factors to build each simulated parliament setting: specific political issues, political goals, participating parties, and power structures based on seat distribution. We also develop an evaluation framework for EuroCon to simulate real voting outcomes in different parliament settings, assessing whether LLM-generated resolutions meet predefined political goals. Our experimental results demonstrate that even state-of-the-art models still perform unsatisfactorily on complex tasks such as passing resolutions by a two-thirds majority and addressing security issues. The results also reveal some common strategies LLMs use to find consensus under different power structures, such as prioritizing the stance of the dominant party, highlighting EuroCon’s promise as an effective platform for studying LLMs’ ability to find political consensus.
nan
Article 1496
Title@2025-05-26 (1): Aligning Multiclass Neural Network Classifier Criterion with Task Performance Metrics
Title: Aligning Multiclass Neural Network Classifier Criterion with Task Performance Metrics | Ausrichten von Multiclass Neural Network Klassifikator Kriterium mit Task Performance Metrics | 将多等神经网络分类标准与任务性性能计量对齐 2405.20954v2 |
Authors: Deyuan Li, Taesoo Daniel Lee, Marynel Vázquez, Nathan Tsoi
Multiclass neural network classifiers are typically trained using cross-entropy loss but evaluated using metrics derived from the confusion matrix, such as Accuracy, $F_\beta$-Score, and Matthews Correlation Coefficient. This mismatch between the training objective and evaluation metric can lead to suboptimal performance, particularly when the user’s priorities differ from what cross-entropy implicitly optimizes. For example, in the presence of class imbalance, $F_1$-Score may be preferred over Accuracy. Similarly, given a preference towards precision, the $F_{\beta=0.25}$-Score will better reflect this preference than $F_1$-Score. However, standard cross-entropy loss does not accommodate such a preference. Building on prior work leveraging soft-set confusion matrices and a continuous piecewise-linear Heaviside approximation, we propose Evaluation Aligned Surrogate Training (EAST), a novel approach to train multiclass classifiers using close surrogates of confusion-matrix based metrics, thereby aligning a neural network classifier’s predictions more closely to a target evaluation metric than typical cross-entropy loss. EAST introduces three key innovations: First, we propose a novel dynamic thresholding approach during training. Second, we propose using a multiclass soft-set confusion matrix. Third, we introduce an annealing process that gradually aligns the surrogate loss with the target evaluation metric. Our theoretical analysis shows that EAST results in consistent estimators of the target evaluation metric. Furthermore, we show that the learned network parameters converge asymptotically to values that optimize for the target evaluation metric. Extensive experiments validate the effectiveness of our approach, demonstrating improved alignment between training objectives and evaluation metrics, while outperforming existing methods across many datasets.
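The precision preference described above can be made concrete with the standard $F_\beta$ definition computed from confusion-matrix counts. The snippet below is only an illustrative sketch of that textbook formula, not the EAST surrogate itself; the function name `f_beta` and the example counts are assumptions made for illustration.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """Standard F-beta score from true-positive, false-positive, false-negative counts."""
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    # Return 0.0 when there are no positives at all, to avoid dividing by zero.
    return (1.0 + b2) * tp / denom if denom else 0.0


# Illustrative counts (hypothetical): precision = 8/10 = 0.8, recall = 8/16 = 0.5.
tp, fp, fn = 8, 2, 8
f1 = f_beta(tp, fp, fn, beta=1.0)         # weights precision and recall equally
f_quarter = f_beta(tp, fp, fn, beta=0.25)  # beta < 1 weights precision more heavily
print(f1, f_quarter)
```

With these counts the $\beta=0.25$ score lies closer to the precision (0.8) than $F_1$ does, which is the sense in which a small $\beta$ encodes a preference for precision.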
nan
Article 1497
Title@2025-05-26 (1): On scalable and efficient training of diffusion samplers
Title: On scalable and efficient training of diffusion samplers | Zur skalierbaren und effizienten Schulung von Diffusionssammlern | 对推广采样员进行可推广和高效率的培训 2505.19552v1 |
Authors: Minkyu Kim, Kiyoung Seong, Dongyeop Woo, Sungsoo Ahn, Minsu Kim
We address the challenge of training diffusion models to sample from unnormalized energy distributions in the absence of data, the so-called diffusion samplers. Although these approaches have shown promise, they struggle to scale in more demanding scenarios where energy evaluations are expensive and the sampling space is high-dimensional. To address this limitation, we propose a scalable and sample-efficient framework that combines a powerful classical sampling method with the diffusion sampler. Specifically, we use Markov chain Monte Carlo (MCMC) samplers with a novelty-based auxiliary energy as a Searcher to collect off-policy samples; the auxiliary energy function encourages exploration of modes the diffusion sampler rarely visits. These off-policy samples are then combined with on-policy data to train the diffusion sampler, thereby expanding its coverage of the energy landscape. Furthermore, we identify primacy bias, i.e., the preference of samplers for early experience during training, as the main cause of mode collapse during training, and introduce a periodic re-initialization trick to resolve this issue. Our method significantly improves sample efficiency on standard benchmarks for diffusion samplers and also excels at higher-dimensional problems and real-world molecular conformer generation.
nan
Article 1498
Title@2025-05-26 (1): Unlocking the Power of Diffusion Models in Sequential Recommendation: A Simple and Effective Approach
Title: Unlocking the Power of Diffusion Models in Sequential Recommendation: A Simple and Effective Approach | Entsperren der Macht von Diffusionsmodellen in der sequentiellen Empfehlung: Ein einfacher und effektiver Ansatz | 在 “ 序列建议:简单而有效办法 “ 中解锁扩散模型扩散能力 2505.19544v1 |
Authors: Jialei Chen, Yuanbo Xu, Yiheng Jiang
In this paper, we focus on the often-overlooked issue of embedding collapse in existing diffusion-based sequential recommendation models and propose ADRec, an innovative framework designed to mitigate this problem. Diverging from previous diffusion-based methods, ADRec applies an independent noise process to each token and performs diffusion across the entire target sequence during training. ADRec captures token interdependency through auto-regression while modeling per-token distributions through token-level diffusion. This dual approach enables the model to effectively capture both sequence dynamics and item representations, overcoming the limitations of existing methods. To further mitigate embedding collapse, we propose a three-stage training strategy: (1) pre-training the embedding weights, (2) aligning these weights with the ADRec backbone, and (3) fine-tuning the model. During inference, ADRec applies the denoising process only to the last token, ensuring that the meaningful patterns in historical interactions are preserved. Our comprehensive empirical evaluation across six datasets underscores the effectiveness of ADRec in enhancing both the accuracy and efficiency of diffusion-based sequential recommendation systems.
nan
Article 1499
Title@2025-05-26 (1): Cuff-KT: Tackling Learners’ Real-time Learning Pattern Adjustment via Tuning-Free Knowledge State Guided Model Updating
Title: Cuff-KT: Tackling Learners’ Real-time Learning Pattern Adjustment via Tuning-Free Knowledge State Guided Model Updating | Cuff-KT: Anpassung von Lernmustern in Echtzeit durch Tuning-Free Knowledge State Guided Model Aktualisieren | Cuff-KT:通过更新无资-无知识国家指导模式,解决学生实时学习模式调整问题 2505.19543v1 |
Authors: Yiyun Zhou, Zheqi Lv, Shengyu Zhang, Jingyuan Chen
Knowledge Tracing (KT) is a core component of Intelligent Tutoring Systems, modeling learners’ knowledge state to predict future performance and provide personalized learning support. Traditional KT models assume that learners’ learning abilities remain relatively stable over short periods or change in predictable ways based on prior performance. However, in reality, learners’ abilities change irregularly due to factors like cognitive fatigue, motivation, and external stress. We introduce the task of adapting to such changes, which we refer to as Real-time Learning Pattern Adjustment (RLPA). Existing KT models, when faced with RLPA, lack sufficient adaptability because they fail to account in a timely manner for the dynamic nature of different learners’ evolving learning patterns. Current strategies for enhancing adaptability rely on retraining, which leads to significant overfitting and high time overhead. To address this, we propose Cuff-KT, comprising a controller and a generator. The controller assigns value scores to learners, while the generator generates personalized parameters for selected learners. Cuff-KT adapts to data changes quickly, flexibly, and controllably without fine-tuning. Experiments on five datasets from different subjects demonstrate that Cuff-KT significantly improves the performance of five KT models with different structures under intra- and inter-learner shifts, with an average relative increase in AUC of 10% and 4%, respectively, at a negligible time cost, effectively tackling the RLPA task. Our code and datasets are fully available at https://github.com/zyy-2001/Cuff-KT.
nan
Article 1500
Title@2025-05-26 (1): FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation
Title: FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation | FastCache: Schnelles Caching für Difffusionstransformator durch erlernbare lineare Annäherung | 快速缓存: 通过可学习的线性近似化快速缓存扩散变异器 2505.20353v1 |
Authors: Dong Liu, Jiayi Zhang, Yifan Li, Yanxuan Yu, Ben Lengerich, Ying Nian Wu
Diffusion Transformers (DiT) are powerful generative models but remain computationally intensive due to their iterative structure and deep transformer stacks. To alleviate this inefficiency, we propose FastCache, a hidden-state-level caching and compression framework that accelerates DiT inference by exploiting redundancy within the model’s internal representations. FastCache introduces a dual strategy: (1) a spatial-aware token selection mechanism that adaptively filters redundant tokens based on hidden state saliency, and (2) a transformer-level cache that reuses latent activations across timesteps when changes are statistically insignificant. These modules work jointly to reduce unnecessary computation while preserving generation fidelity through learnable linear approximation. Theoretical analysis shows that FastCache maintains bounded approximation error under a hypothesis-testing-based decision rule. Empirical evaluations across multiple DiT variants demonstrate substantial reductions in latency and memory usage, with the best generation quality among the compared cache methods, as measured by FID and t-FID. Code implementation of FastCache is available on GitHub at https://github.com/NoakLiu/FastCache-xDiT.
nan
Article 1501
Title@2025-05-26 (1): R3: Robust Rubric-Agnostic Reward Models
Title: R3: Robust Rubric-Agnostic Reward Models | R3: Robuste Rubric-Agnostische Belohnungsmodelle | R3:坚固的Rubric-不可知奖赏模型 2505.13388v2 |
Authors: David Anugraha, Zilu Tang, Lester James V. Miranda, Hanyang Zhao, Mohammad Rifqi Farhansyah, Garry Kuwanto, Derry Wijaya, Genta Indra Winata
Reward models are essential for aligning language model outputs with human preferences, yet existing approaches often lack both controllability and interpretability. These models are typically optimized for narrow objectives, limiting their generalizability to broader downstream tasks. Moreover, their scalar outputs are difficult to interpret without contextual reasoning. To address these limitations, we introduce R3, a novel reward modeling framework that is rubric-agnostic, generalizable across evaluation dimensions, and provides interpretable, reasoned score assignments. R3 enables more transparent and flexible evaluation of language models, supporting robust alignment with diverse human values and use cases. Our models, data, and code are available as open source at https://github.com/rubricreward/r3
nan
Article 1502
Title@2025-05-26 (1): Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
Title: Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs | Amulett: Neuausrichtung während der Testzeit für Personalisierte Präferenzanpassung von LLMs | 缩略图:在试验期间重新对准,以适应LLMM的个性化偏好 2502.19148v2 |
Authors: Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang
How to align large language models (LLMs) with user preferences from a static general dataset has been frequently studied. However, user preferences are usually personalized, changing, and diverse regarding culture, values, or time. This leads to the problem that the actual user preferences often do not coincide with those trained by the model developers in the practical use of LLMs. Since we cannot collect enough data and retrain for every demand, researching efficient real-time preference adaptation methods based on the backbone LLMs during test time is important. To this end, we introduce Amulet, a novel, training-free framework that formulates the decoding process of every token as a separate online learning problem with the guidance of simple user-provided prompts, thus enabling real-time optimization to satisfy users’ personalized preferences. To reduce the computational cost brought by this optimization process for each token, we additionally provide a closed-form solution for each iteration step of the optimization process, thereby reducing the computational time cost to a negligible level. The detailed experimental results demonstrate that Amulet can achieve significant performance improvements in rich settings with combinations of different LLMs, datasets, and user preferences, while maintaining acceptable computational efficiency.
nan
Article 1503
Title@2025-05-26 (1): CITRAS: Covariate-Informed Transformer for Time Series Forecasting
Title: CITRAS: Covariate-Informed Transformer for Time Series Forecasting | CITRAS: Kovariat-informierter Transformer für die Zeitreihenprognose | CITRAS: 用于时间序列预测的共变-内建变换器 2503.24007v2 |
Authors: Yosuke Yamaguchi, Issei Suemitsu, Wenpeng Wei
In practical time series forecasting, covariates provide rich contextual information that can potentially enhance the forecast of target variables. Although some covariates extend into the future forecasting horizon (e.g., calendar events, discount schedules), most multivariate models fail to leverage this pivotal insight due to the length discrepancy with target variables. Additionally, capturing the dependency between target variables and covariates is non-trivial, as models must precisely reflect the local impact of covariates while also capturing global cross-variate dependencies. To overcome these challenges, we propose CITRAS, a decoder-only Transformer that flexibly leverages multiple targets, past covariates, and future covariates. While preserving strong autoregressive capabilities, CITRAS introduces two novel mechanisms in patch-wise cross-variate attention: Key-Value (KV) Shift and Attention Score Smoothing. KV Shift seamlessly incorporates future covariates into the forecasting of target variables based on their concurrent dependencies. Additionally, Attention Score Smoothing refines locally accurate patch-wise cross-variate dependencies into global variate-level dependencies by smoothing the past series of attention scores. Experimentally, CITRAS outperforms state-of-the-art models on thirteen real-world benchmarks from both covariate-informed and multivariate settings, demonstrating its versatile ability to leverage cross-variate and cross-time dependencies for improved forecasting accuracy.
nan
Article 1504
Title@2025-05-26 (1): Continuous-Time Analysis of Heavy Ball Momentum in Min-Max Games
Title: Continuous-Time Analysis of Heavy Ball Momentum in Min-Max Games | Kontinuierliche Zeitanalyse von schweren Ball Momentum in Min-Max-Spiele | Min-Min-Max运动会重球势连续分析 2505.19537v1 |
Authors: Yi Feng, Kaito Fujii, Stratis Skoulakis, Xiao Wang, Volkan Cevher
Since Polyak’s pioneering work, heavy ball (HB) momentum has been widely studied in minimization. However, its role in min-max games remains largely unexplored. Because HB momentum is a key component of practical min-max algorithms such as Adam, this gap limits their effectiveness. In this paper, we present a continuous-time analysis for HB with simultaneous and alternating update schemes in min-max games. Locally, we prove that smaller momentum enhances algorithmic stability by enabling local convergence across a wider range of step sizes, with alternating updates generally converging faster. Globally, we study the implicit regularization of HB and find that smaller momentum guides algorithm trajectories towards shallower-slope regions of the loss landscape, with alternating updates amplifying this effect. Surprisingly, all these phenomena differ from those observed in minimization, where larger momentum yields similar effects. Our results reveal fundamental differences between HB in min-max games and minimization, and numerical experiments further validate our theoretical results.
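The two update schemes named in the abstract can be illustrated with a minimal discrete-time sketch. This is not the paper's continuous-time analysis. It assumes the toy bilinear game f(x, y) = x * y (x minimizes, y maximizes), and the function name, step size, and momentum values are hypothetical choices for illustration.

```python
def heavy_ball_minmax(steps=200, lr=0.1, beta=0.0, alternating=False, x=1.0, y=1.0):
    """Heavy-ball descent-ascent on the toy game f(x, y) = x * y.

    Simultaneous: both players use gradients at the current iterate.
    Alternating:  the max player sees the min player's freshly updated x.
    """
    x_prev, y_prev = x, y
    for _ in range(steps):
        grad_x = y  # df/dx for f = x * y
        x_new = x - lr * grad_x + beta * (x - x_prev)   # descent step with momentum
        grad_y = x_new if alternating else x            # df/dy, evaluated per scheme
        y_new = y + lr * grad_y + beta * (y - y_prev)   # ascent step with momentum
        x_prev, y_prev, x, y = x, y, x_new, y_new
    return x, y


# With beta = 0 this reduces to plain gradient descent-ascent.
xs, ys = heavy_ball_minmax(alternating=False)
xa, ya = heavy_ball_minmax(alternating=True)
print("simultaneous squared norm:", xs * xs + ys * ys)
print("alternating  squared norm:", xa * xa + ya * ya)
```

On this bilinear toy problem with zero momentum, the simultaneous iterates spiral outward while the alternating iterates stay on a bounded orbit, a well-known contrast that makes the difference between the two schemes visible. Varying `beta` lets one probe the momentum dependence empirically, though the sketch makes no claim to reproduce the paper's results.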
nan
Article 1505
Title@2025-05-26 (1): Training-Free Multi-Step Audio Source Separation
Title: Training-Free Multi-Step Audio Source Separation | Schulungsfreie Mehrstufen-Audio-Quellentrennung | 无培训的多步骤多步骤音频来源分离 2505.19534v1 |
Authors: Yongyi Zang, Jingyi Li, Qiuqiang Kong
Audio source separation aims to separate a mixture into target sources. Previous audio source separation systems usually conduct one-step inference, which does not fully explore the separation ability of models. In this work, we reveal that pretrained one-step audio source separation models can be leveraged for multi-step separation without additional training. We propose a simple yet effective inference method that iteratively applies separation by optimally blending the input mixture with the previous step’s separation result. At each step, we determine the optimal blending ratio by maximizing a metric. We prove that our method always yields improvement over one-step inference, provide error bounds based on model smoothness and metric robustness, and provide theoretical analysis connecting our method to denoising along linear interpolation paths between noise and clean distributions, a property we link to denoising diffusion bridge models. Our approach effectively delivers improved separation performance as a “free lunch” from existing models. Our empirical results demonstrate that our multi-step separation approach consistently outperforms one-step inference across both speech enhancement and music source separation tasks, and can achieve scaling performance similar to training a larger model, using more data, or in some cases employing a multi-step training objective. These improvements appear not only on the optimization metric during multi-step inference, but also extend to nearly all non-optimized metrics (with one exception). We also discuss limitations of our approach and directions for future research.
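The iterative blending procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `separate` stands in for any pretrained one-step separator, `metric` for whatever quality score is maximized, and the grid search over `ratios` is a hypothetical way of choosing the blending ratio. The toy separator and reference-based metric exist only to make the example runnable.

```python
def multi_step_separate(mixture, separate, metric, steps=3,
                        ratios=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Re-apply a one-step separator to blends of the mixture and the last estimate."""
    estimate = separate(mixture)  # ordinary one-step inference
    for _ in range(steps - 1):
        best_score, best_candidate = None, None
        for r in ratios:
            # Blend the original mixture with the previous separation result.
            blended = [r * m + (1.0 - r) * e for m, e in zip(mixture, estimate)]
            candidate = separate(blended)
            score = metric(candidate)
            if best_score is None or score > best_score:
                best_score, best_candidate = score, candidate
        estimate = best_candidate
    return estimate


# Toy setup (all values hypothetical): a crude "separator" that shrinks its input,
# and a metric that measures negative squared error against a known target.
target = [1.0, -1.0, 0.5]
mixture = [1.3, -0.8, 0.1]
toy_separate = lambda signal: [0.9 * v for v in signal]
toy_metric = lambda est: -sum((e - t) ** 2 for e, t in zip(est, target))

one_step = toy_separate(mixture)
multi_step = multi_step_separate(mixture, toy_separate, toy_metric)
print(toy_metric(one_step), toy_metric(multi_step))
```

Because the ratio grid includes 1.0, which feeds the unblended mixture back in and so reproduces the one-step output, the selected candidate can never score below one-step inference on the chosen metric in this sketch.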
nan
Article 1506
Title@2025-05-26 (1): ExAnte: A Benchmark for Ex-Ante Inference in Large Language Models
Title: ExAnte: A Benchmark for Ex-Ante Inference in Large Language Models | ExAnte: Ein Benchmark für Ex-Ante-Schlussfolgerungen in großen Sprachmodellen | ExAnte:大语言模型前推定基准 2505.19533v1 |
Authors: Yachuan Liu, Xiaochun Wei, Lin Shi, Xinnuo Li, Bohan Zhang, Paramveer Dhillon, Qiaozhu Mei
Large language models (LLMs) face significant challenges in ex-ante reasoning, where analysis, inference, or predictions must be made without access to information from future events. Even with explicit prompts enforcing temporal cutoffs, LLMs often generate outputs influenced by internalized knowledge of events beyond the specified cutoff. This paper introduces a novel task and benchmark designed to evaluate the ability of LLMs to reason while adhering to such temporal constraints. The benchmark includes a variety of tasks: stock prediction, Wikipedia event prediction, scientific publication prediction, and Question Answering (QA), designed to assess factual knowledge under temporal cutoff constraints. We use leakage rate to quantify models’ reliance on future information beyond cutoff timestamps. Experimental results reveal that LLMs struggle to consistently adhere to temporal cutoffs across common prompting strategies and tasks, demonstrating persistent challenges in ex-ante reasoning. This benchmark provides a potential evaluation framework to advance the development of LLMs’ temporal reasoning ability for time-sensitive applications.
nan
Article 1507
Title@2025-05-26 (1): Fox in the Henhouse: Supply-Chain Backdoor Attacks Against Reinforcement Learning
Title: Fox in the Henhouse: Supply-Chain Backdoor Attacks Against Reinforcement Learning | Fox im Henhouse: Supply-Chain-Hintertür greift gegen Verstärkungslernen an | Henhouse的狐狸:供应-Chain对加强学习的后门攻击 2505.19532v1 |
Authors: Shijie Liu, Andrew C. Cullen, Paul Montague, Sarah Erfani, Benjamin I. P. Rubinstein
The current state-of-the-art backdoor attacks against Reinforcement Learning (RL) rely upon unrealistically permissive access models that assume the attacker can read (or even write) the victim’s policy parameters, observations, or rewards. In this work, we question whether such a strong assumption is required to launch backdoor attacks against RL. To answer this question, we propose the Supply-Chain Backdoor (SCAB) attack, which targets a common RL workflow: training agents using external agents that are provided separately or embedded within the environment. In contrast to prior works, our attack only relies on legitimate interactions of the RL agent with the supplied agents. Despite this limited access model, by poisoning a mere 3% of training experiences, our attack can successfully activate over 90% of triggered actions, reducing the average episodic return by 80% for the victim. Our novel attack demonstrates that RL attacks are likely to become a reality under untrusted RL training supply chains.
nan
Article 1508
Title@2025-05-26 (1): Minimalist Softmax Attention Provably Learns Constrained Boolean Functions
Title: Minimalist Softmax Attention Provably Learns Constrained Boolean Functions | Minimalistische Softmax-Achtung lernt nachweislich eingeschränkte Boolean-Funktionen | 最小软性软性关注 2505.19531v1 |
Authors: Jerry Yao-Chieh Hu, Xiwen Zhang, Maojiang Su, Zhao Song, Han Liu
We study the computational limits of learning $k$-bit Boolean functions (specifically, $\mathrm{AND}$, $\mathrm{OR}$, and their noisy variants), using a minimalist single-head softmax-attention mechanism, where $k=\Theta(d)$ relevant bits are selected from $d$ inputs. We show that these simple $\mathrm{AND}$ and $\mathrm{OR}$ functions are unsolvable with a single-head softmax-attention mechanism alone. However, with teacher forcing, the same minimalist attention is capable of solving them. These findings offer two key insights: Architecturally, solving these Boolean tasks requires only minimalist attention, without deep Transformer blocks or FFNs. Methodologically, one gradient descent update with supervision suffices and replaces the multi-step Chain-of-Thought (CoT) reasoning scheme of [Kim and Suzuki, ICLR 2025] for solving Boolean problems. Together, the bounds expose a fundamental gap between what this minimal architecture achieves under ideal supervision and what is provably impossible under standard training.
nan
Article 1509
Title@2025-05-26 (1): SLOT: Sample-specific Language Model Optimization at Test-time
Title: SLOT: Sample-specific Language Model Optimization at Test-time | Steckplatz: Beispielspezifische Sprachmodelloptimierung zur Testzeit | SLOT: 测试时特定抽样语文示范模式优化 2505.12392v2 |
Authors: Yang Hu, Xingyu Zhang, Xueji Fang, Zhiyang Chen, Xiao Wang, Huatian Zhang, Guojun Qi
We propose SLOT (Sample-specific Language Model Optimization at Test-time), a novel and parameter-efficient test-time inference approach that enhances a language model’s ability to respond more accurately to individual prompts. Existing Large Language Models (LLMs) often struggle with complex instructions, leading to poor performance on those not well represented among general samples. To address this, SLOT conducts a few optimization steps at test time to update a lightweight sample-specific parameter vector. The vector is added to the final hidden layer before the output head, and enables efficient adaptation by caching the last-layer features during per-sample optimization. By minimizing the cross-entropy loss on the input prompt only, SLOT helps the model better align with and follow each given instruction. In experiments, we demonstrate that our method outperforms the compared models across multiple benchmarks and LLMs. For example, Qwen2.5-7B with SLOT achieves an accuracy gain of 8.6% on GSM8K from 57.54% to 66.19%, while DeepSeek-R1-Distill-Llama-70B with SLOT achieves a SOTA accuracy of 68.69% on GPQA among 70B-level models. Our code is available at https://github.com/maple-research-lab/SLOT.
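The mechanism described above (a per-sample vector added to cached final hidden features, tuned for a few steps on the prompt's own cross-entropy) can be sketched in plain Python. This is a toy illustration under assumptions, not the released SLOT code: the features `H`, output head `W`, and targets are made-up numbers, the names are hypothetical, and plain gradient descent stands in for whatever optimizer the authors use.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def prompt_loss_and_grad(H, W, targets, delta):
    """Mean next-token cross-entropy on the prompt and its gradient w.r.t. delta."""
    d, vocab = len(delta), len(W[0])
    loss, grad = 0.0, [0.0] * d
    for h, t in zip(H, targets):
        shifted = [h[i] + delta[i] for i in range(d)]  # cached feature + delta
        logits = [sum(shifted[i] * W[i][v] for i in range(d)) for v in range(vocab)]
        p = softmax(logits)
        loss -= math.log(p[t])
        for i in range(d):
            # dLoss/dlogit_v = p_v - 1[v == t]; chain through the frozen head W.
            grad[i] += sum((p[v] - (1.0 if v == t else 0.0)) * W[i][v]
                           for v in range(vocab))
    n = len(H)
    return loss / n, [g / n for g in grad]


def adapt_delta(H, W, targets, steps=3, lr=0.1):
    """Run a few gradient steps on delta only; H and W stay fixed (cached/frozen)."""
    delta = [0.0] * len(H[0])
    for _ in range(steps):
        _, grad = prompt_loss_and_grad(H, W, targets, delta)
        delta = [dv - lr * gv for dv, gv in zip(delta, grad)]
    return delta


# Toy data (hypothetical): 3 prompt positions, hidden size 2, vocabulary size 3.
H = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
W = [[1.0, -1.0, 0.5], [0.2, 0.7, -0.3]]
targets = [0, 2, 0]

loss_before, _ = prompt_loss_and_grad(H, W, targets, [0.0, 0.0])
delta = adapt_delta(H, W, targets)
loss_after, _ = prompt_loss_and_grad(H, W, targets, delta)
print(loss_before, loss_after)
```

Since only `delta` is updated and the features `H` are computed once, each optimization step in this sketch touches just the output head, which is the efficiency argument the abstract makes for caching the last-layer features.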
nan
Article 1510
Title@2025-05-26 (1): Navigating loss manifolds via rigid body dynamics: A promising avenue for robustness and generalisation
Title: Navigating loss manifolds via rigid body dynamics: A promising avenue for robustness and generalisation | Navigieren von Verlustkrümmern über starre Körperdynamik: Ein vielversprechender Weg für Robustheit und Verallgemeinerung | 通过僵硬体体体动态来控制损失方块:加强和普及的有希望的途径 2505.19527v1 |
Authors: Mohammed D. Belgoumri, Mohamed Reda Bouadjenek, Hakim Hacid, Imran Razzak, Sunil Aryal
Training large neural networks through gradient-based optimization requires navigating high-dimensional loss landscapes, which often exhibit pathological geometry, leading to undesirable training dynamics. In particular, poor generalization frequently results from convergence to sharp minima that are highly sensitive to input perturbations, causing the model to overfit the training data while failing to generalize to unseen examples. Furthermore, these optimization procedures typically display strong dependence on the fine structure of the loss landscape, leading to unstable training dynamics, due to the fractal-like nature of the loss surface. In this work, we propose an alternative optimizer that simultaneously reduces this dependence, and avoids sharp minima, thereby improving generalization. This is achieved by simulating the motion of the center of a ball rolling on the loss landscape. The degree to which our optimizer departs from the standard gradient descent is controlled by a hyperparameter, representing the radius of the ball. Changing this hyperparameter allows for probing the loss landscape at different scales, making it a valuable tool for understanding its geometry.
nan
Article 1511
Title@2025-05-26 (1): Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate
Title: Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate | Rethinking Gating Mechanism in Sparse MoE: Arbiträre Modalitätsinputs mit vertrauensgeführtem Tor bearbeiten | 微粒MOE中的重新思考定位机制:用信任引导门处理任意模式投入 2505.19525v1 |
Authors: Liangwei Nathan Zheng, Wei Emma Zhang, Mingyu Guo, Miao Xu, Olaf Maennel, Weitong Chen
Effectively managing missing modalities is a fundamental challenge in real-world multimodal learning scenarios, where data incompleteness often results from systematic collection errors or sensor failures. Sparse Mixture-of-Experts (SMoE) architectures have the potential to naturally handle multimodal data, with individual experts specializing in different modalities. However, existing SMoE approaches often lack the ability to handle missing modalities, leading to performance degradation and poor generalization in real-world applications. We propose Conf-SMoE, which introduces a two-stage imputation module to handle the missing-modality problem for the SMoE architecture, and we reveal insights into expert collapse through theoretical analysis backed by strong empirical evidence. Inspired by our theoretical analysis, Conf-SMoE proposes a novel expert gating mechanism by detaching the softmax routing score to a task confidence score w.r.t. the ground truth. This naturally relieves expert collapse without introducing an additional load-balancing loss function. We show that the insights into expert collapse align with other gating mechanisms such as Gaussian and Laplacian gates. We also evaluate the proposed method on four different real-world datasets under three different experimental settings to conduct a comprehensive analysis of Conf-SMoE on modality fusion and resistance to missing modalities.
nan
Article 1512
Title@2025-05-26 (1): Semi-Supervised Model-Free Bayesian State Estimation from Compressed Measurements
Title: Semi-Supervised Model-Free Bayesian State Estimation from Compressed Measurements | Halbüberwachte modellfreie bayesische Staatsschätzung aus komprimierten Messungen | 根据压缩计量法对贝耶斯州无模式模型的半有效估算 2407.07368v5 |
Authors: Anubhab Ghosh, Yonina C. Eldar, Saikat Chatterjee
We consider data-driven Bayesian state estimation from compressed measurements (BSCM) of a model-free process. The dimension of the temporal measurement vector is lower than that of the temporal state vector to be estimated, leading to an under-determined inverse problem. The underlying dynamical model of the state’s evolution is unknown for a ‘model-free process.’ Hence, it is difficult to use traditional model-driven methods, for example, Kalman and particle filters. Instead, we consider data-driven methods. We experimentally show that two existing unsupervised learning-based data-driven methods fail to address the BSCM problem in a model-free process: data-driven nonlinear state estimation (DANSE) and the deep Markov model (DMM). While DANSE provides good predictive/forecasting performance to model the temporal measurement data as a time series, its unsupervised learning lacks suitable regularization for tackling the BSCM task. We then propose a semi-supervised learning approach and develop a semi-supervised learning-based DANSE method, referred to as SemiDANSE. In SemiDANSE, we use a large amount of unlabelled data along with a limited amount of labelled data, i.e., pairwise measurement-and-state data, which provides the desired regularization. Using three benchmark dynamical systems, we empirically show that the data-driven SemiDANSE provides competitive state estimation performance for BSCM using a handful of different measurement systems, against a hybrid method called KalmanNet and two model-driven methods (extended Kalman filter and unscented Kalman filter) that know the dynamical models exactly.
nan
Article 1513
Title@2025-05-26 (1): Applications and Effect Evaluation of Generative Adversarial Networks in Semi-Supervised Learning
Title: Applications and Effect Evaluation of Generative Adversarial Networks in Semi-Supervised Learning | Anwendungen und Wirkungsbewertung generativer adversarialer Netzwerke im semi-überwachten Lernen | 半监测学习中产生反效果网络的应用和效果评价 2505.19522v1 |
Authors: Jiyu Hu, Haijiang Zeng, Zhen Tian
Image classification, a core task in computer vision, relies on high-quality labelled data, which restricts the wide application of deep learning models in practical scenarios. To alleviate the problem of insufficient labelled samples, semi-supervised learning has gradually become a research hotspot. In this paper, we construct a semi-supervised image classification model based on Generative Adversarial Networks (GANs). By introducing a collaborative training mechanism for generators, discriminators, and classifiers, we make effective use of limited labelled data and a large amount of unlabelled data, improve the quality of image generation and classification accuracy, and provide an effective solution for image recognition in complex environments.
nan
Article 1514
Title@2025-05-26 (1): Learning Dynamics under Environmental Constraints via Measurement-Induced Bundle Structures
Title: Learning Dynamics under Environmental Constraints via Measurement-Induced Bundle Structures | Dynamisches Lernen unter Umweltauflagen durch messinduzierte Bundle-Strukturen | 通过衡量产生的捆绑结构,在环境制约因素下学习动力 2505.19521v1 |
Authors: Dongzhe Zheng, Wenjie Mei
Learning unknown dynamics under environmental (or external) constraints is fundamental to many fields (e.g., modern robotics), and is particularly challenging when constraint information is only locally available and uncertain. Existing approaches that require global constraints or use probabilistic filtering fail to fully exploit the geometric structure inherent in local measurements (obtained, e.g., from sensors) and constraints. This paper presents a geometric framework unifying measurements, constraints, and dynamics learning through a fiber bundle structure over the state space. This naturally induced geometric structure enables measurement-aware Control Barrier Functions that adapt to local sensing (or measurement) conditions. By integrating Neural ODEs, our framework learns continuous-time dynamics while preserving geometric constraints, with theoretical guarantees of learning convergence and constraint satisfaction dependent on sensing quality. The geometric framework not only enables efficient dynamics learning but also suggests promising directions for integration with reinforcement learning approaches. Extensive simulations demonstrate significant improvements in both learning efficiency and constraint satisfaction over traditional methods, especially under limited and uncertain sensing conditions.
nan
Article 1515
Title@2025-05-26 (1): SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback
Title: SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback | SIPDO: Closed-Loop Prompt Optimierung über Synthetic Data Feedback | SIPDO:通过合成数据反馈,通过闭闭电话快速优化 2505.19514v1 |
Authors: Yaoning Yu, Ye Yu, Kai Wei, Haojing Luo, Haohan Wang
Prompt quality plays a critical role in the performance of large language models (LLMs), motivating a growing body of work on prompt optimization. Most existing methods optimize prompts over a fixed dataset, assuming static input distributions and offering limited support for iterative improvement. We introduce SIPDO (Self-Improving Prompts through Data-Augmented Optimization), a closed-loop framework for prompt learning that integrates synthetic data generation into the optimization process. SIPDO couples a synthetic data generator with a prompt optimizer, where the generator produces new examples that reveal current prompt weaknesses and the optimizer incrementally refines the prompt in response. This feedback-driven loop enables systematic improvement of prompt performance without assuming access to external supervision or new tasks. Experiments across question answering and reasoning benchmarks show that SIPDO outperforms standard prompt tuning methods, highlighting the value of integrating data synthesis into prompt learning workflows.
nan
Article 1516
Title@2025-05-26 (1): Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models
Title: Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models | Benchmarking multimodaler Wissenskonflikt für große multimodale Modelle | 确定大型多式联运模式多模式知识冲突基准 2505.19509v1 |
Authors: Yifan Jia, Kailin Jiang, Yuyang Liang, Qihan Ren, Yi Xin, Rui Yang, Fenze Feng, Mingcai Chen, Hengyang Lu, Haozhe Wang, Xiaoye Qu, Dongrui Liu, Lizhen Cui, Yuntao Du
Large Multimodal Models (LMMs) face notable challenges when encountering multimodal knowledge conflicts, particularly under retrieval-augmented generation (RAG) frameworks where the contextual information from external sources may contradict the model’s internal parametric knowledge, leading to unreliable outputs. However, existing benchmarks fail to reflect such realistic conflict scenarios. Most focus solely on intra-memory conflicts, while context-memory and inter-context conflicts remain largely uninvestigated. Furthermore, commonly used factual knowledge-based evaluations are often overlooked, and existing datasets lack a thorough investigation into conflict detection capabilities. To bridge this gap, we propose MMKC-Bench, a benchmark designed to evaluate factual knowledge conflicts in both context-memory and inter-context scenarios. MMKC-Bench encompasses three types of multimodal knowledge conflicts and includes 1,573 knowledge instances and 3,381 images across 23 broad types, collected through automated pipelines with human verification. We evaluate three representative series of LMMs on both model behavior analysis and conflict detection tasks. Our findings show that while current LMMs are capable of recognizing knowledge conflicts, they tend to favor internal parametric knowledge over external evidence. We hope MMKC-Bench will foster further research in multimodal knowledge conflict and enhance the development of multimodal RAG systems. The source code is available at https://github.com/MLLMKCBENCH/MLLMKC.
nan
Article 1517
Title@2025-05-26 (1): Multimodal Machine Translation with Visual Scene Graph Pruning
Title: Multimodal Machine Translation with Visual Scene Graph Pruning | Multimodale maschinelle Übersetzung mit visuellen Szenendiagrammen | 带有视觉场景图的多式机器翻译 2505.19507v1 |
Authors: Chenyu Lu, Shiliang Sun, Jing Zhao, Nan Zhang, Tengfei Song, Hao Yang
Multimodal machine translation (MMT) seeks to address the challenges posed by linguistic polysemy and ambiguity in translation tasks by incorporating visual information. A key bottleneck in current MMT research is the effective utilization of visual data. Previous approaches have focused on extracting global or region-level image features and using attention or gating mechanisms for multimodal information fusion. However, these methods have not adequately tackled the issue of visual information redundancy in MMT, nor have they proposed effective solutions. In this paper, we introduce a novel approach, multimodal machine translation with visual Scene Graph Pruning (PSG), which leverages language scene graph information to guide the pruning of redundant nodes in visual scene graphs, thereby reducing noise in downstream translation tasks. Through extensive comparative experiments with state-of-the-art methods and ablation studies, we demonstrate the effectiveness of the PSG model. Our results also highlight the promising potential of visual information pruning in advancing the field of MMT.
nan
Article 1518
Title@2025-05-26 (1): Understanding Why Large Language Models Can Be Ineffective in Time Series Analysis: The Impact of Modality Alignment
Title: Understanding Why Large Language Models Can Be Ineffective in Time Series Analysis: The Impact of Modality Alignment | Verständnis, warum große Sprachmodelle in der Zeitreihenanalyse unwirksam sein können: Die Auswirkungen der Modalitätsausrichtung | 理解为何大语言模型在时间序列分析中无效:方式调整的影响 2410.12326v2 |
Authors: Liangwei Nathan Zheng, Chang George Dong, Wei Emma Zhang, Lin Yue, Miao Xu, Olaf Maennel, Weitong Chen
Large Language Models (LLMs) have demonstrated impressive performance in time series analysis and seem to understand temporal relationships better than traditional transformer-based approaches. However, since LLMs are not designed for time series tasks, simpler models like linear regressions can often achieve comparable performance with far less complexity. In this study, we perform extensive experiments to assess the effectiveness of applying LLMs to key time series tasks, including forecasting, classification, imputation, and anomaly detection. We compare the performance of LLMs against simpler baseline models, such as single-layer linear models and randomly initialized LLMs. Our results reveal that LLMs offer minimal advantages for these core time series tasks and may even distort the temporal structure of the data. In contrast, simpler models consistently outperform LLMs while requiring far fewer parameters. Furthermore, we analyze existing reprogramming techniques and show, through data manifold analysis, that these methods fail to effectively align time series data with language and display “pseudo-alignment” behavior in embedding space. Our findings suggest that the performance of LLM-based methods in time series tasks arises from the intrinsic characteristics and structure of time series data, rather than any meaningful alignment with the language model architecture.
nan
Article 1519
Title@2025-05-26 (1): DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation
Title: DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation | DOGe: Defensive Output Generation für LLM-Schutz vor Wissensdestillation | DOGe: 防知识蒸馏保护LLM的防御性产出产生 2505.19504v1 |
Authors: Pingzhi Li, Zhen Tan, Huaizhi Qu, Huan Liu, Tianlong Chen
Large Language Models (LLMs) represent substantial intellectual and economic investments, yet their effectiveness can inadvertently facilitate model imitation via knowledge distillation (KD). In practical scenarios, competitors can distill proprietary LLM capabilities by simply observing publicly accessible outputs, akin to reverse-engineering a complex performance by observation alone. Existing protective methods like watermarking only identify imitation post-hoc, while other defenses assume the student model mimics the teacher’s internal logits, rendering them ineffective against distillation purely from observed output text. This paper confronts the challenge of actively protecting LLMs within the realistic constraints of API-based access. We introduce an effective and efficient Defensive Output Generation (DOGe) strategy that subtly modifies the output behavior of an LLM. Its outputs remain accurate and useful for legitimate users, yet are designed to be misleading for distillation, significantly undermining imitation attempts. We achieve this by fine-tuning only the final linear layer of the teacher LLM with an adversarial loss. This targeted training approach anticipates and disrupts distillation attempts during inference time. Our experiments show that, while the original performance of the teacher model is preserved or even improved, student models distilled from the defensively generated teacher outputs suffer catastrophically reduced performance, demonstrating our method’s effectiveness as a practical safeguard against KD-based model imitation.
nan
Article 1520
Title@2025-05-26 (1): Differentially private ratio statistics
Title: Differentially private ratio statistics | Statistiken über unterschiedliche private Verhältnisse | 差异性私人比率统计 2505.20351v1 |
Authors: Tomer Shoham, Katrina Ligett
Ratio statistics, such as relative risk and odds ratios, play a central role in hypothesis testing, model evaluation, and decision-making across many areas of machine learning, including causal inference and fairness analysis. However, despite privacy concerns surrounding many datasets and despite increasing adoption of differential privacy, differentially private ratio statistics have largely been neglected by the literature and have only recently received an initial treatment by Lin et al. [1]. This paper attempts to fill this lacuna, giving results that can guide practice in evaluating ratios when the results must be protected by differential privacy. In particular, we show that even a simple algorithm can provide excellent properties concerning privacy, sample accuracy, and bias, not just asymptotically but also at quite small sample sizes. Additionally, we analyze a differentially private estimator for relative risk, prove its consistency, and develop a method for constructing valid confidence intervals. Our approach bridges a gap in the differential privacy literature and provides a practical solution for ratio estimation in private machine learning pipelines.
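To make the idea of a private ratio statistic concrete, here is a minimal sketch of one standard construction: perturb the four cells of a 2x2 contingency table with the Laplace mechanism and compute relative risk from the noisy counts. The abstract does not specify the paper's algorithm, so this is an assumed illustration only. The function name `dp_relative_risk`, the add/remove-one-record neighbouring relation (L1 sensitivity 1 for the table), and the clamping constant are all hypothetical choices.

```python
import numpy as np

def dp_relative_risk(a, b, c, d, epsilon, rng):
    """Relative risk from Laplace-noised 2x2 counts (illustrative sketch).

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    Assumes neighbouring datasets differ by adding or removing one record,
    so one cell changes by 1 and Laplace noise of scale 1/epsilon on every
    cell gives epsilon-differential privacy for the whole table.
    """
    counts = np.array([a, b, c, d], dtype=float)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=4)
    # Post-processing (privacy is unaffected): keep counts strictly positive
    # so the ratio below is always defined. The floor 0.5 is arbitrary.
    na, nb, nc, nd = np.maximum(noisy, 0.5)
    return (na / (na + nb)) / (nc / (nc + nd))
```

Because the ratio is a nonlinear function of noisy counts, the estimate is biased at small sample sizes or small epsilon, which is the kind of behaviour the abstract says the paper analyzes.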
nan
Article 1521
Title@2025-05-26 (1): Learning for Dynamic Combinatorial Optimization without Training Data
Title: Learning for Dynamic Combinatorial Optimization without Training Data | Lernen für dynamische kombinatorische Optimierung ohne Trainingsdaten | 没有培训数据的动态组合优化学习 2505.19497v1 |
Authors: Yiqiao Liao, Farinaz Koushanfar, Parinaz Naghizadeh
We introduce DyCO-GNN, a novel unsupervised learning framework for Dynamic Combinatorial Optimization that requires no training data beyond the problem instance itself. DyCO-GNN leverages structural similarities across time-evolving graph snapshots to accelerate optimization while maintaining solution quality. We evaluate DyCO-GNN on dynamic maximum cut, maximum independent set, and the traveling salesman problem across diverse datasets of varying sizes, demonstrating its superior performance under tight and moderate time budgets. DyCO-GNN consistently outperforms the baseline methods, achieving high-quality solutions up to 3-60x faster, highlighting its practical effectiveness in rapidly evolving resource-constrained settings.
nan
Article 1522
Title@2025-05-26 (1): MetaSTNet: Multimodal Meta-learning for Cellular Traffic Conformal Prediction
Title: MetaSTNet: Multimodal Meta-learning for Cellular Traffic Conformal Prediction | MetaSTNet: Multimodales Meta-Learning für zellulären Verkehr Konforme Vorhersage | MetaSTNet: 细胞交通预测的多模式元学习 2505.21553v1 |
Authors: Hui Ma, Kai Yang
Network traffic prediction techniques have attracted much attention since they are valuable for network congestion control and user experience improvement. While existing prediction techniques can achieve favorable performance when there is sufficient training data, it remains a great challenge to make accurate predictions when only a small amount of training data is available. To tackle this problem, we propose a deep learning model, entitled MetaSTNet, based on a multimodal meta-learning framework. It is an end-to-end network architecture that trains the model in a simulator and transfers the meta-knowledge to a real-world environment, which can quickly adapt and obtain accurate predictions on a new task with only a small amount of real-world training data. In addition, we further employ cross conformal prediction to assess the calibrated prediction intervals. Extensive experiments have been conducted on real-world datasets to illustrate the efficiency and effectiveness of MetaSTNet.
nan
Article 1523
Title@2025-05-26 (1): Discounted Online Convex Optimization: Uniform Regret Across a Continuous Interval
Title: Discounted Online Convex Optimization: Uniform Regret Across a Continuous Interval | Discounted Online Convex-Optimierung: Einheitlicher Bedauern über einen kontinuierlichen Intervall | 贴现的在线 Convex 优化: 连续间隔的统一遗憾 2505.19491v1 |
Authors: Wenhao Yang, Sifan Yang, Lijun Zhang
Reflecting the greater significance of recent history over the distant past in non-stationary environments, $\lambda$-discounted regret has been introduced in online convex optimization (OCO) to gracefully forget past data as new information arrives. When the discount factor $\lambda$ is given, online gradient descent with an appropriate step size achieves an $O(1/\sqrt{1-\lambda})$ discounted regret. However, the value of $\lambda$ is often not predetermined in real-world scenarios. This gives rise to a significant open question: is it possible to develop a discounted algorithm that adapts to an unknown discount factor? In this paper, we affirmatively answer this question by providing a novel analysis to demonstrate that smoothed OGD (SOGD) achieves a uniform $O(\sqrt{\log T/(1-\lambda)})$ discounted regret, holding for all values of $\lambda$ across a continuous interval simultaneously. The basic idea is to maintain multiple OGD instances to handle different discount factors, and aggregate their outputs sequentially by an online prediction algorithm named Discounted-Normal-Predictor (DNP) (Kapralov and Panigrahy, 2010). Our analysis reveals that DNP can combine the decisions of two experts, even when they operate on discounted regret with different discount factors.
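For readers unfamiliar with the quantity being bounded, the sketch below computes a $\lambda$-discounted regret from per-round loss values. It assumes the common definition in which round $t$ of $T$ is weighted by $\lambda^{T-t}$, so the most recent round has weight 1. The abstract does not restate the definition, and the function name `discounted_regret` is hypothetical.

```python
def discounted_regret(learner_losses, comparator_losses, lam):
    """Sum over t = 1..T of lam**(T - t) * (f_t(x_t) - f_t(u)).

    learner_losses[t-1]    is f_t(x_t), the loss the learner suffered;
    comparator_losses[t-1] is f_t(u), the loss of a fixed comparator u.
    """
    T = len(learner_losses)
    return sum(
        lam ** (T - t) * (learner_losses[t - 1] - comparator_losses[t - 1])
        for t in range(1, T + 1)
    )

# Older rounds are forgotten geometrically: weights are 0.25, 0.5, 1.0 here.
print(discounted_regret([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], lam=0.5))  # 4.25
```

With `lam=1.0` this reduces to ordinary (undiscounted) regret against the comparator.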
nan
Article 1524
Title@2025-05-26 (1): Understanding Transformer from the Perspective of Associative Memory
Title: Understanding Transformer from the Perspective of Associative Memory | Transformer aus der Perspektive des assoziativen Gedächtnisses verstehen | 从共同记忆的角度理解变异器 2505.19488v1 |
Authors: Shu Zhong, Mingyu Xu, Tenglong Ao, Guang Shi
In this paper, we share our reflections and insights on understanding Transformer architectures through the lens of associative memory, a classic psychological concept inspired by human cognition. We start with the basics of associative memory (think simple linear attention) and then dive into two dimensions: Memory Capacity: How much can a Transformer really remember, and how well? We introduce retrieval SNR to measure this and use a kernel perspective to mathematically reveal why Softmax Attention is so effective. We also show how FFNs can be seen as a type of associative memory, leading to insights on their design and potential improvements. Memory Update: How do these memories learn and evolve? We present a unified framework for understanding how different Transformer variants (like DeltaNet and Softmax Attention) update their “knowledge base”. This leads us to tackle two provocative questions: 1. Are Transformers fundamentally limited in what they can express, and can we break these barriers? 2. If a Transformer had infinite context, would it become infinitely intelligent? We want to demystify Transformer architecture, offering a clearer understanding of existing designs. This exploration aims to provide fresh insights and spark new avenues for Transformer innovation.
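The "simple linear attention" view of associative memory mentioned above can be sketched in a few lines: key-value pairs are written into a matrix as a sum of outer products, and a query reads from it with one matrix-vector product. This is a generic illustration of that well-known formulation, not code from the paper. The function names are hypothetical.

```python
import numpy as np

def write_memory(keys, values):
    """Store pairs as S = sum_i v_i k_i^T.

    keys: (n, d_k), values: (n, d_v); returns S with shape (d_v, d_k).
    """
    return values.T @ keys

def read_memory(S, query):
    """Retrieve S q = sum_i v_i (k_i . q), a key-similarity-weighted sum of values."""
    return S @ query
```

When the keys are orthonormal, querying with a stored key returns its value exactly. When keys overlap, the other stored values leak into the result as crosstalk, which is the kind of interference a retrieval signal-to-noise measure is meant to quantify.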
nan
Article 1525
Title@2025-05-26 (1): VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning
Title: VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning | VLMLight: Verkehrssignalsteuerung über Vision-Language Meta-Control und Dual-Branch-Reasoning | VLMLight:通过视觉语言、超控制和双层理由解释控制交通信号控制 2505.19486v1 |
Authors: Maonan Wang, Yirong Chen, Aoyu Pang, Yuxin Cai, Chung Shue Chen, Yuheng Kan, Man-On Pun
Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods, ranging from rule-based heuristics to reinforcement learning (RL), often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.
nan
Article 1526
Title@2025-05-26 (1): Understanding the learned look-ahead behavior of chess neural networks
Title: Understanding the learned look-ahead behavior of chess neural networks | Das gelernte Look-Ahead-Verhalten von neuronalen Schachnetzwerken verstehen | 了解国际象棋神经网络所学的直视行为 2505.21552v1 |
Authors: Diogo Cruz
We investigate the look-ahead capabilities of chess-playing neural networks, specifically focusing on the Leela Chess Zero policy network. We build on the work of Jenner et al. (2024) by analyzing the model’s ability to consider future moves and alternative sequences beyond the immediate next move. Our findings reveal that the network’s look-ahead behavior is highly context-dependent, varying significantly based on the specific chess position. We demonstrate that the model can process information about board states up to seven moves ahead, utilizing similar internal mechanisms across different future time steps. Additionally, we provide evidence that the network considers multiple possible move sequences rather than focusing on a single line of play. These results offer new insights into the emergence of sophisticated look-ahead capabilities in neural networks trained on strategic tasks, contributing to our understanding of AI reasoning in complex domains. Our work also showcases the effectiveness of interpretability techniques in uncovering cognitive-like processes in artificial intelligence systems.
nan
Article 1527
Title@2025-05-26 (1): Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs
Title: Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs | Gewinnen Sie schnell oder verlieren Sie langsam: Ausgleichende Geschwindigkeit und Genauigkeit in Latenz-Sensitive Entscheidungen von LLMs | 慢赢或慢输:LLMs的延缓敏感决定中平衡速度和准确性 2505.19481v1 |
Authors: Hao Kang, Qingru Zhang, Han Cai, Weiyuan Xu, Tushar Krishna, Yilun Du, Tsachy Weissman
Large language models (LLMs) have shown remarkable performance across diverse reasoning and generation tasks, and are increasingly deployed as agents in dynamic environments such as code generation and recommendation systems. However, many real-world applications, such as high-frequency trading and real-time competitive gaming, require decisions under strict latency constraints, where faster responses directly translate into higher rewards. Despite the importance of this latency-quality trade-off, it remains underexplored in the context of LLM-based agents. In this work, we present the first systematic study of this trade-off in real-time decision-making tasks. To support our investigation, we introduce two new benchmarks: HFTBench, a high-frequency trading simulation, and StreetFighter, a competitive gaming platform. Our analysis reveals that the optimal latency-quality balance varies by task, and that sacrificing quality for lower latency can significantly enhance downstream performance. To address this, we propose FPX, an adaptive framework that dynamically selects model size and quantization level based on real-time demands. Our method achieves the best performance on both benchmarks, improving win rate by up to 80% in Street Fighter and boosting daily yield by up to 26.52% in trading. These results demonstrate the critical importance of latency-aware evaluation and deployment strategies for real-world LLM-based agents. Our benchmarks are available at Latency Sensitive Benchmarks.
nan
Article 1528
Title@2025-05-26 (1): Revolutionizing Wildfire Detection with Convolutional Neural Networks: A VGG16 Model Approach
Title: Revolutionizing Wildfire Detection with Convolutional Neural Networks: A VGG16 Model Approach | Revolutionierung der Wildfire-Detektion mit konvolutionären neuralen Netzwerken: Ein VGG16-Modellansatz | 与革命神经神经网络一起革命性野火探测革命:VGG16示范方法 2505.19479v1 |
Authors: Lakshmi Aishwarya Malladi, Navarun Gupta, Ahmed El-Sayed, Xingguo Xiong
Over 8,024 wildfire incidents have been documented in 2024 alone, resulting in thousands of fatalities and significant damage to infrastructure and ecosystems. Wildfires in the United States have inflicted devastating losses. Wildfires are becoming more frequent and intense, which highlights how urgently efficient warning systems are needed to avoid disastrous outcomes. The goal of this study is to enhance the accuracy of wildfire detection by using a Convolutional Neural Network (CNN) built on the VGG16 architecture. The D-FIRE dataset, which includes several kinds of wildfire and non-wildfire images, was employed in the study. Low-resolution images, dataset imbalance, and the necessity for real-time applicability are some of the main challenges. These problems were addressed by enriching the dataset using data augmentation techniques and optimizing the VGG16 model for binary classification. The model produced a low false negative rate, which is essential for reducing undetected fires, despite dataset limitations. This work shows that deep learning models such as VGG16 can offer a reliable, automated approach to early wildfire recognition, helping authorities respond quickly. To reduce the impact of wildfires, our future work will concentrate on integrating the system with real-time surveillance networks and enlarging the dataset to cover more varied fire situations.
nan
Article 1529
Title@2025-05-26 (1): Weighted quantization using MMD: From mean field to mean shift via gradient flows
Title: Weighted quantization using MMD: From mean field to mean shift via gradient flows | Gewichtete Quantisierung mit MMD: Vom mittleren Feld zur mittleren Verschiebung über Gradientenströme | 使用 MMD 加权量化: 从平均字段到通过梯度流转移 2502.10600v2 |
Authors: Ayoub Belhadji, Daniel Sharp, Youssef Marzouk
Approximating a probability distribution using a set of particles is a fundamental problem in machine learning and statistics, with applications including clustering and quantization. Formally, we seek a weighted mixture of Dirac measures that best approximates the target distribution. While much existing work relies on the Wasserstein distance to quantify approximation errors, maximum mean discrepancy (MMD) has received comparatively less attention, especially when allowing for variable particle weights. We argue that a Wasserstein-Fisher-Rao gradient flow is well-suited for designing quantizations optimal under MMD. We show that a system of interacting particles satisfying a set of ODEs discretizes this flow. We further derive a new fixed-point algorithm called mean shift interacting particles (MSIP). We show that MSIP extends the classical mean shift algorithm, widely used for identifying modes in kernel density estimators. Moreover, we show that MSIP can be interpreted as preconditioned gradient descent and that it acts as a relaxation of Lloyd’s algorithm for clustering. Our unification of gradient flows, mean shift, and MMD-optimal quantization yields algorithms that are more robust than state-of-the-art methods, as demonstrated via high-dimensional and multi-modal numerical experiments.
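Since MSIP is described as extending the classical mean shift algorithm, a minimal sketch of that classical baseline may help: a single point is repeatedly moved to the kernel-weighted average of the data, which climbs toward a mode of the kernel density estimate. This is the textbook single-particle procedure with a Gaussian kernel, not MSIP itself. The function name and default parameters are assumptions.

```python
import numpy as np

def mean_shift(x, data, bandwidth=1.0, iters=100):
    """Classical mean shift with a Gaussian kernel.

    x: starting point, shape (d,); data: samples, shape (n, d).
    Each iteration replaces x by the kernel-weighted mean of the data.
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    for _ in range(iters):
        sq_dist = np.sum((data - x) ** 2, axis=1)
        weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        x = (weights[:, None] * data).sum(axis=0) / weights.sum()
    return x
```

Running this from several starting points yields independent mode seekers. By contrast, the abstract describes MSIP as a system of interacting weighted particles derived from an MMD gradient flow.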
nan
Article 1530
Title@2025-05-26 (1): Information-theoretic Generalization Analysis for VQ-VAEs: A Role of Latent Variables
Title: Information-theoretic Generalization Analysis for VQ-VAEs: A Role of Latent Variables | Informationstheoretische Generalisierungsanalyse für VQ-VAEs: Eine Rolle latenter Variablen | VQ-VAEs 信息理论概括分析:隐性变量的作用 2505.19470v1 |
Authors: Futoshi Futami, Masahiro Fujisawa
Latent variables (LVs) play a crucial role in encoder-decoder models by enabling effective data compression, prediction, and generation. Although their theoretical properties, such as generalization, have been extensively studied in supervised learning, similar analyses for unsupervised models such as variational autoencoders (VAEs) remain underexplored. In this work, we extend information-theoretic generalization analysis to vector-quantized (VQ) VAEs with discrete latent spaces, introducing a novel data-dependent prior to rigorously analyze the relationship among LVs, generalization, and data generation. We derive a novel generalization error bound on the reconstruction loss of VQ-VAEs, which depends solely on the complexity of the LVs and the encoder, independent of the decoder. Additionally, we provide an upper bound on the 2-Wasserstein distance between the distributions of the true data and the generated data, explaining how the regularization of the LVs contributes to data generation performance.
Article 1531
Title@2025-05-26 (1): Diversity-Driven Generative Dataset Distillation Based on Diffusion Model with Self-Adaptive Memory
Title: Diversity-Driven Generative Dataset Distillation Based on Diffusion Model with Self-Adaptive Memory | Diversity-getriebene Generative Datensatzdestillation basierend auf Diffusionsmodell mit selbstadaptivem Speicher | 基于带有自适应内存的传播模型的传播模型的多样化生成数据集蒸馏 2505.19469v1 |
Authors: Mingzhuo Li, Guang Li, Jiafeng Mao, Takahiro Ogawa, Miki Haseyama
Dataset distillation enables the training of deep neural networks with comparable performance in significantly reduced time by compressing large datasets into small and representative ones. Although the introduction of generative models has made great achievements in this field, the distributions of their distilled datasets are not diverse enough to represent the original ones, leading to a decrease in downstream validation accuracy. In this paper, we present a diversity-driven generative dataset distillation method based on a diffusion model to solve this problem. We introduce self-adaptive memory to align the distribution between distilled and real datasets, assessing the representativeness. The degree of alignment leads the diffusion model to generate more diverse datasets during the distillation process. Extensive experiments show that our method outperforms existing state-of-the-art methods in most situations, proving its ability to tackle dataset distillation tasks.
Article 1532
Title@2025-05-26 (1): Parrot: Multilingual Visual Instruction Tuning
Title: Parrot: Multilingual Visual Instruction Tuning | Papagei: Mehrsprachige visuelle Anleitung | Parrot: 多语言视觉教学图示 2406.02539v3 |
Authors: Hai-Long Sun, Da-Wei Zhou, Yang Li, Shiyin Lu, Chao Yi, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye
The rapid development of Multimodal Large Language Models (MLLMs), such as GPT-4o, marks a significant step toward artificial general intelligence. Existing methods typically align vision encoders with LLMs via supervised fine-tuning (SFT), but this often deteriorates their ability to handle multiple languages as training progresses. We empirically observe that imbalanced SFT datasets, largely English-centric, degrade performance on non-English languages due to the failure in multilingual token alignment. To address this, we propose PARROT, a novel approach that leverages textual guidance for visual token alignment at the language level. PARROT conditions visual tokens on diverse language inputs and uses Mixture-of-Experts (MoE) to align multilingual tokens. By computing cross-attention between initial visual features and textual embeddings, we select the most relevant experts, converting visual tokens into language-specific representations. Additionally, we introduce the Massive Multilingual Multimodal Benchmark (MMMB), a new benchmark comprising 6 languages, 15 categories, and 12,000 questions, to assess multilingual capabilities. PARROT achieves state-of-the-art performance on both the multilingual benchmarks and a wide range of multimodal tasks. Code and dataset are available at: https://github.com/AIDC-AI/Parrot
Article 1533
Title@2025-05-26 (1): Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin
Title: Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin | Auf dem Weg zum Ende der Ausbildung zur automatischen Spracherkennung für nigerianische Pidgin | 走向尼日利亚皮吉纳自动语音识别的端至端培训 2010.11123v2 |
Authors: Amina Mardiyyah Rufai, Afolabi Abeeb, Esther Oduntan, Tayo Arulogun, Oluwabukola Adegboro, Daniel Ajisafe
The prevalence of automatic speech recognition (ASR) systems in spoken language applications has increased significantly in recent years. Notably, many African languages lack sufficient linguistic resources to support the robustness of these systems. This paper focuses on the development of an end-to-end speech recognition system customized for Nigerian Pidgin English. We investigated and evaluated different pretrained state-of-the-art architectures on a new dataset. Our empirical results show that the Wav2Vec2 XLSR-53 variant performs notably well on our dataset, achieving a word error rate (WER) of 29.6% on the test set and surpassing other architectures such as NEMO QUARTZNET and Wav2Vec2.0 BASE-100H in quantitative assessments. Additionally, we demonstrate that pretrained state-of-the-art architectures do not work well out-of-the-box. We performed zero-shot evaluation using XLSR-English as the baseline, chosen for its similarity to Nigerian Pidgin. This yielded a higher WER of 73.7%. By adapting this architecture to nuances represented in our dataset, we reduce the WER by 59.84% in relative terms. Our dataset comprises 4,288 recorded utterances from 10 native speakers, partitioned into training, validation, and test sets. This study underscores the potential for improving ASR systems for under-resourced languages like Nigerian Pidgin English, contributing to greater inclusion in speech technology applications. We publicly release our unique parallel dataset (speech-to-text) on Nigerian Pidgin, as well as the model weights on Hugging Face. Our code will be made available to foster future research from the community.
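The 59.84% figure follows from the two WER values quoted in the abstract if it is read as a relative reduction rather than a difference in percentage points. A short check of that arithmetic, with variable names chosen here for illustration:

```python
def relative_reduction(baseline, improved):
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


zero_shot_wer = 73.7   # XLSR-English zero-shot WER (%), from the abstract
fine_tuned_wer = 29.6  # fine-tuned XLSR-53 WER (%), from the abstract

absolute_drop = zero_shot_wer - fine_tuned_wer  # percentage points
relative_drop = 100 * relative_reduction(zero_shot_wer, fine_tuned_wer)
print(f"{absolute_drop:.1f} points absolute, {relative_drop:.2f}% relative")
```

The absolute drop is about 44.1 percentage points, which corresponds to the reported 59.84% of the zero-shot error.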
Article 1534
Title@2025-05-26 (1): Decision Flow Policy Optimization
Title: Decision Flow Policy Optimization | Optimierung der Entscheidungsflusspolitik | 优化决策流程政策 2505.20350v1 |
Authors: Jifeng Hu, Sili Huang, Siyuan Guo, Zhaogeng Liu, Li Shen, Lichao Sun, Hechang Chen, Yi Chang, Dacheng Tao
In recent years, generative models have shown remarkable capabilities across diverse fields, including images, videos, language, and decision-making. By applying powerful generative models such as flow-based models to reinforcement learning, we can effectively model complex multi-modal action distributions and achieve superior robotic control in continuous action spaces, surpassing the limitations of single-modal action distributions with traditional Gaussian-based policies. Previous methods usually adopt the generative models as behavior models to fit state-conditioned action distributions from datasets, with policy optimization conducted separately through additional policies using value-based sample weighting or gradient-based updates. However, this separation prevents the simultaneous optimization of multi-modal distribution fitting and policy improvement, ultimately hindering the training of models and degrading the performance. To address this issue, we propose Decision Flow, a unified framework that integrates multi-modal action distribution modeling and policy optimization. Specifically, our method formulates the action generation procedure of flow-based models as a flow decision-making process, where each action generation step corresponds to one flow decision. Consequently, our method seamlessly optimizes the flow policy while capturing multi-modal action distributions. We provide rigorous proofs of Decision Flow and validate the effectiveness through extensive experiments across dozens of offline RL environments. Compared with established offline RL baselines, the results demonstrate that our method achieves or matches the SOTA performance.
Article 1535
Title@2025-05-26 (1): Origin Tracer: A Method for Detecting LoRA Fine-Tuning Origins in LLMs
Title: Origin Tracer: A Method for Detecting LoRA Fine-Tuning Origins in LLMs | Herkunfts-Tracer: Eine Methode zur Erkennung von LoRA-Feinabstimmungs-Ursprungen in LLMs | 来源追踪器:用LLMM探测LORA精导来源的方法 2505.19466v1 |
Authors: Hongyu Liang, Yuting Zheng, Yihan Li, Yiran Zhang, Shiyu Liang
As large language models (LLMs) continue to advance, their deployment often involves fine-tuning to enhance performance on specific downstream tasks. However, this customization is sometimes accompanied by misleading claims about a model's origins, raising significant concerns about transparency and trust within the open-source community. Existing model verification techniques typically assess functional, representational, and weight similarities. However, these approaches often struggle against obfuscation techniques, such as permutations and scaling transformations. To address this limitation, we propose a novel detection method, Origin-Tracer, that rigorously determines whether a model has been fine-tuned from a specified base model. This method includes the ability to extract the LoRA rank utilized during the fine-tuning process, providing a more robust verification framework. This framework is the first to provide a formalized approach specifically aimed at pinpointing the sources of model fine-tuning. We empirically validate our method on thirty-one diverse open-source models under conditions that simulate real-world obfuscation scenarios, analyze the effectiveness of our framework, and discuss its limitations. The results demonstrate the effectiveness of our approach and indicate its potential to establish new benchmarks for model verification.
Article 1536
Title@2025-05-26 (1): Residual Cross-Attention Transformer-Based Multi-User CSI Feedback with Deep Joint Source-Channel Coding
Title: Residual Cross-Attention Transformer-Based Multi-User CSI Feedback with Deep Joint Source-Channel Coding | Residual Cross-Attention Transformer-basierte Multi-User CSI Feedback mit Deep Joint Source-Channel Coding | CSI 与深源-源-汇联合编码的反馈 2505.19465v1 |
Authors: Hengwei Zhang, Minghui Wu, Li Qiao, Ling Liu, Ziqi Han, Zhen Gao
This letter proposes a deep-learning (DL)-based multi-user channel state information (CSI) feedback framework for massive multiple-input multiple-output systems, where deep joint source-channel coding (DJSCC) is utilized to improve the CSI reconstruction accuracy. Specifically, we design a multi-user joint CSI feedback framework, whereby the CSI correlation of nearby users is utilized to reduce the feedback overhead. Under the framework, we propose a new residual cross-attention transformer architecture, which is deployed at the base station to further improve the CSI feedback performance. Moreover, to tackle the “cliff-effect” of conventional bit-level CSI feedback approaches, we integrate DJSCC into the multi-user CSI feedback and use a two-stage training scheme to adapt to varying uplink noise levels. Experimental results demonstrate the superiority of our methods in CSI feedback performance, with low network complexity and better scalability.
Article 1537
Title@2025-05-26 (1): Your Classifier Can Do More: Towards Bridging the Gaps in Classification, Robustness, and Generation
Title: Your Classifier Can Do More: Towards Bridging the Gaps in Classification, Robustness, and Generation | Ihr Klassifikator kann mehr: Auf dem Weg zur Überbrückung der Lücken in Klassifizierung, Robustheit und Generation | 您的分类员可以做更多的事情: 缩小分类、强健和代际差距 2505.19459v1 |
Authors: Kaichao Jiang, He Wang, Xiaoshuai Hao, Xiulong Yang, Ajian Liu, Qi Chu, Yunfeng Diao
Joint Energy-based Models (JEMs), a class of hybrid generative-discriminative models, are well known for their ability to achieve both high classification accuracy and generative capability within a single model. However, their robustness still lags significantly behind that of classifiers based on adversarial training (AT). Conversely, while AT is currently the most effective approach to improving the classifier’s robustness, it typically sacrifices accuracy on clean data and lacks generative capability. The triple trade-off between classification accuracy, generative capability, and robustness raises a natural question: Can a single model simultaneously achieve high classification accuracy, adversarial robustness, and generative performance? This goal has rarely been explored. To address this question, we systematically analyze the energy distribution differences of clean, adversarial, and generated samples across various JEM variants and adversarially trained models. We observe that AT tends to reduce the energy gap between clean and adversarial samples, while JEMs reduce the gap between clean and synthetic ones. This observation suggests a key insight: if the energy distributions of all three data types can be aligned, we might unify the strengths of AT and JEMs, resolving their inherent trade-offs. Building on this idea, we propose Energy-based Joint Distribution Adversarial Training (EB-JDAT) to jointly model the clean data distribution, the adversarial distribution, and the classifier by maximizing their joint probability. EB-JDAT is a general and flexible optimization method, compatible with various JEM variants. Extensive experimental results demonstrate that EB-JDAT not only maintains the near-original accuracy and generative capability of JEMs but also significantly enhances robustness, even surpassing state-of-the-art AT methods.
Article 1538
Title@2025-05-26 (1): Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians
Title: Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians | Recurrent Self-Attention Dynamics: Eine energie-agnostische Perspektive von Jacobians | 《自我注意动态:雅各布人对能源不可知的视角》 2505.19458v1 |
Authors: Akiyoshi Tomihari, Ryo Karakida
The theoretical understanding of self-attention (SA) has been steadily progressing. A prominent line of work studies a class of SA layers that admit an energy function decreased by state updates. While it provides valuable insights into inherent biases in signal propagation, it often relies on idealized assumptions or additional constraints not necessarily present in standard SA. Thus, to broaden our understanding, this work aims to relax these energy constraints and provide an energy-agnostic characterization of inference dynamics by dynamical systems analysis. In more detail, we first consider relaxing the symmetry and single-head constraints traditionally required in energy-based formulations. Next, to investigate more general SA architectures capable of oscillatory dynamics without necessarily admitting an energy function, we analyze the Jacobian matrix of the state. We reveal that normalization layers effectively normalize the Jacobian’s complex eigenvalues, forcing the dynamics close to a critical state. This significantly enhances inference performance. Furthermore, we utilize the Jacobian perspective to develop regularization methods for training and a pseudo-energy for monitoring inference dynamics.
Article 1539
Title@2025-05-26 (1): MM-Prompt: Cross-Modal Prompt Tuning for Continual Visual Question Answering
Title: MM-Prompt: Cross-Modal Prompt Tuning for Continual Visual Question Answering | MM-Prompt: Cross-Modal Prompt Tuning zur kontinuierlichen visuellen Fragestellung | MM-Prompt: 用于持续视觉问答的跨模式快速测试 2505.19455v1 |
Authors: Xu Li, Fan Lyu
Continual Visual Question Answering (CVQA) based on pre-trained models (PTMs) has achieved promising progress by leveraging prompt tuning to enable continual multi-modal learning. However, most existing methods adopt cross-modal prompt isolation, constructing visual and textual prompts separately, which exacerbates modality imbalance and leads to degraded performance over time. To tackle this issue, we propose MM-Prompt, a novel framework incorporating cross-modal prompt query and cross-modal prompt recovery. The former enables balanced prompt selection by incorporating cross-modal signals during query formation, while the latter promotes joint prompt reconstruction through iterative cross-modal interactions, guided by an alignment loss to prevent representational drift. Extensive experiments show that MM-Prompt surpasses prior approaches in accuracy and knowledge retention, while maintaining balanced modality engagement throughout continual learning.
Article 1540
Title@2025-05-26 (1): MetaGMT: Improving Actionable Interpretability of Graph Multilinear Networks via Meta-Learning Filtration
Title: MetaGMT: Improving Actionable Interpretability of Graph Multilinear Networks via Meta-Learning Filtration | MetaGMT: Durch Meta-Learning Filtration die Durchführbarkeit von Graphen-Multilinearen Netzwerken verbessern | MetGMT:通过Met-Learn Filtation改进图形多线网络可操作的解释性 2505.19445v1 |
Authors: Rishabh Bhattacharya, Hari Shankar, Vaishnavi Shivkumar, Ponnurangam Kumaraguru
The growing adoption of Graph Neural Networks (GNNs) in high-stakes domains like healthcare and finance demands reliable explanations of their decision-making processes. While inherently interpretable GNN architectures like Graph Multi-linear Networks (GMT) have emerged, they remain vulnerable to generating explanations based on spurious correlations, potentially undermining trust in critical applications. We present MetaGMT, a meta-learning framework that enhances explanation fidelity through a novel bi-level optimization approach. We demonstrate that MetaGMT significantly improves both explanation quality (AUC-ROC, Precision@K) and robustness to spurious patterns across the BA-2Motifs, MUTAG, and SP-Motif benchmarks. Our approach maintains competitive classification accuracy while producing more faithful explanations than baseline methods (with an increase of up to 8% in Explanation ROC on SP-Motif 0.5). These advancements in interpretability could enable safer deployment of GNNs in sensitive domains by (1) facilitating model debugging through more reliable explanations, (2) supporting targeted retraining when biases are identified, and (3) enabling meaningful human oversight. By addressing the critical challenge of explanation reliability, our work contributes to building more trustworthy and actionable GNN systems for real-world applications.
Article 1541
Title@2025-05-26 (1): Discovering Forbidden Topics in Language Models
Title: Discovering Forbidden Topics in Language Models | Verbotene Themen in Sprachmodellen entdecken | 发现语言模型中的禁止专题 2505.17441v2 |
Authors: Can Rager, Chris Wendler, Rohit Gandikota, David Bau
Refusal discovery is the task of identifying the full set of topics that a language model refuses to discuss. We introduce this new problem setting and develop a refusal discovery method, LLM-crawler, that uses token prefilling to find forbidden topics. We benchmark the LLM-crawler on Tulu-3-8B, an open-source model with public safety tuning data. Our crawler manages to retrieve 31 out of 36 topics within a budget of 1000 prompts. Next, we scale the crawl to a frontier model using the prefilling option of Claude-Haiku. Finally, we crawl three widely used open-weight models: Llama-3.3-70B and two of its variants finetuned for reasoning, DeepSeek-R1-70B and Perplexity-R1-1776-70B. DeepSeek-R1-70B reveals patterns consistent with censorship tuning: the model exhibits “thought suppression” behavior that indicates memorization of CCP-aligned responses. Although Perplexity-R1-1776-70B is robust to censorship, LLM-crawler elicits CCP-aligned refusal answers in the quantized model. Our findings highlight the critical need for refusal discovery methods to detect biases, boundaries, and alignment failures of AI systems.
Article 1542
Title@2025-05-26 (1): MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding
Title: MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding | MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding | MORE-Brain:可解释和可通用跨主题FMRI视觉解码专家有条不紊混合 2505.15946v2 |
Authors: Yuxiang Wei, Yanteng Zhang, Xi Xiao, Tianyang Wang, Xiao Wang, Vince D. Calhoun
Decoding visual experiences from fMRI offers a powerful avenue to understand human perception and develop advanced brain-computer interfaces. However, current progress often prioritizes maximizing reconstruction fidelity while overlooking interpretability, an essential aspect for deriving neuroscientific insight. To address this gap, we propose MoRE-Brain, a neuro-inspired framework designed for high-fidelity, adaptable, and interpretable visual reconstruction. MoRE-Brain uniquely employs a hierarchical Mixture-of-Experts architecture where distinct experts process fMRI signals from functionally related voxel groups, mimicking specialized brain networks. The experts are first trained to encode fMRI into the frozen CLIP space. A finetuned diffusion model then synthesizes images, guided by expert outputs through a novel dual-stage routing mechanism that dynamically weighs expert contributions across the diffusion process. MoRE-Brain offers three main advancements: First, it introduces a novel Mixture-of-Experts architecture grounded in brain network principles for neuro-decoding. Second, it achieves efficient cross-subject generalization by sharing core expert networks while adapting only subject-specific routers. Third, it provides enhanced mechanistic insight, as the explicit routing reveals precisely how different modeled brain regions shape the semantic and spatial attributes of the reconstructed image. Extensive experiments validate MoRE-Brain’s high reconstruction fidelity, with bottleneck analyses further demonstrating its effective utilization of fMRI signals, distinguishing genuine neural decoding from over-reliance on generative priors. Consequently, MoRE-Brain marks a substantial advance towards more generalizable and interpretable fMRI-based visual decoding. Code will be publicly available soon: https://github.com/yuxiangwei0808/MoRE-Brain.
Article 1543
Title@2025-05-26 (1): RDI: An adversarial robustness evaluation metric for deep neural networks based on model statistical features
Title: RDI: An adversarial robustness evaluation metric for deep neural networks based on model statistical features | RDI: Eine gegnerische Robustheitsbewertungsmetrik für tiefe neuronale Netzwerke basierend auf modellstatistischen Merkmalen | RDI:基于示范统计特征的深神经网络对抗性强力评价标准 2504.18556v2 |
Authors: Jialei Song, Xingquan Zuo, Feiyang Wang, Hai Huang, Tianle Zhang
Deep neural networks (DNNs) are highly susceptible to adversarial samples, raising concerns about their reliability in safety-critical tasks. Currently, methods of evaluating adversarial robustness are primarily categorized into attack-based and certified robustness evaluation approaches. The former not only relies on specific attack algorithms but is also highly time-consuming, while the latter, due to its analytical nature, is typically difficult to implement for large and complex models. A few studies evaluate model robustness based on the model’s decision boundary, but they suffer from low evaluation accuracy. To address these issues, we propose a novel adversarial robustness evaluation metric, the Robustness Difference Index (RDI), which is based on model statistical features. RDI draws inspiration from clustering evaluation by analyzing the intra-class and inter-class distances of feature vectors separated by the decision boundary to quantify model robustness. It is attack-independent and has high computational efficiency. Experiments show that RDI demonstrates a stronger correlation with the gold-standard adversarial robustness metric of attack success rate (ASR). The average computation time of RDI is only 1/30 of that of the evaluation method based on the PGD attack. Our open-source code is available at: https://github.com/BUPTAIOC/RDI.
Article 1544
Title@2025-05-26 (1): Fairness Practices in Industry: A Case Study in Machine Learning Teams Building Recommender Systems
Title: Fairness Practices in Industry: A Case Study in Machine Learning Teams Building Recommender Systems | Fairness Practices in der Industrie: Eine Fallstudie in Machine Learning Teams Bau von Recommender Systemen | 工业公平做法:机械学习小组建立建议系统个案研究 2505.19441v1 |
Authors: Jing Nathan Yan, Junxiong Wang, Jeffrey M. Rzeszotarski, Allison Koenecke
The rapid proliferation of recommender systems necessitates robust fairness practices to address inherent biases. Assessing fairness, though, is challenging due to constantly evolving metrics and best practices. This paper analyzes how industry practitioners perceive and incorporate these changing fairness standards in their workflows. Through semi-structured interviews with 11 practitioners from technical teams across a range of large technology companies, we investigate industry implementations of fairness in recommendation system products. We focus on current debiasing practices, applied metrics, collaborative strategies, and integrating academic research into practice. Findings show a preference for multi-dimensional debiasing over traditional demographic methods, and a reliance on intuitive rather than academic metrics. This study also highlights the difficulties in balancing fairness with both the practitioner’s individual (bottom-up) roles and organizational (top-down) workplace constraints, including the interplay with legal and compliance experts. Finally, we offer actionable recommendations for the recommender system community and algorithmic fairness practitioners, underlining the need to refine fairness practices continually.
Article 1545
Title@2025-05-26 (1): The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models
Title: The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models | Die Geburt des Wissens: Emergente Funktionen über Zeit, Raum und Maßstab in großen Sprachmodellen | 知识的诞生:跨越时间、空间和大语言模型规模的新兴特征 2505.19440v1 |
Authors: Shashata Sawmya, Micah Adler, Nir Shavit
This paper studies the emergence of interpretable categorical features within large language models (LLMs), analyzing their behavior across training checkpoints (time), transformer layers (space), and varying model sizes (scale). Using sparse autoencoders for mechanistic interpretability, we identify when and where specific semantic concepts emerge within neural activations. Results indicate clear temporal and scale-specific thresholds for feature emergence across multiple domains. Notably, spatial analysis reveals unexpected semantic reactivation, with early-layer features re-emerging at later layers, challenging standard assumptions about representational dynamics in transformer models.
Article 1546
Title@2025-05-26 (1): Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression
Title: Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression | Kann komprimierte LLMs wirklich handeln? Eine empirische Bewertung der Agentischen Fähigkeiten in der LLM-Kompression | 能否压缩LLM Really Act? 对LLM Actrables in LLM Corpression的代理能力进行经验评估。 2505.19433v1 |
Authors: Peijie Dong, Zhenheng Tang, Xiang Liu, Lujun Li, Xiaowen Chu, Bo Li
Post-training compression reduces the computational and memory costs of large language models (LLMs), enabling resource-efficient deployment. However, existing compression benchmarks focus only on language modeling (e.g., perplexity) and natural language understanding tasks (e.g., GLUE accuracy), ignoring agentic capabilities: workflow, tool use/function call, long-context understanding, and real-world application. We introduce the Agent Compression Benchmark (ACBench), the first comprehensive benchmark for evaluating how compression impacts LLMs’ agentic abilities. ACBench spans (1) 12 tasks across 4 capabilities (e.g., WorfBench for workflow generation, Needle-in-Haystack for long-context retrieval), (2) quantization (GPTQ, AWQ) and pruning (Wanda, SparseGPT), and (3) 15 models, including small (Gemma-2B), standard (Qwen2.5 7B-32B), and distilled reasoning LLMs (DeepSeek-R1-Distill). Our experiments reveal compression tradeoffs: 4-bit quantization preserves workflow generation and tool use (1%-3% drop) but degrades real-world application accuracy by 10%-15%. We introduce ERank, Top-k Ranking Correlation, and Energy to systematize analysis. ACBench provides actionable insights for optimizing LLM compression in agentic scenarios. The code can be found at https://github.com/pprp/ACBench.
Article 1547
Title@2025-05-26 (1): Advanced long-term earth system forecasting by learning the small-scale nature
Title: Advanced long-term earth system forecasting by learning the small-scale nature | Fortschrittliche Langzeitprognosen des Erdsystems durch Erlernen der kleinmaßstäblichen Natur | 学习小规模性质,进行高级长期地球系统预测 2505.19432v1 |
Authors: Hao Wu, Yuan Gao, Ruiqi Shu, Kun Wang, Ruijian Gou, Chuhan Wu, Xinliang Liu, Juncai He, Shuhao Cao, Junfeng Fang, Xingjian Shi, Feng Tao, Qi Song, Shengxuan Ji, Yanfei Xiang, Yuze Sun, Jiahao Li, Fan Xu, Huanshuo Dong, Haixin Wang, Fan Zhang, Penghao Zhao, Xian Wu, Qingsong Wen, Deliang Chen, Xiaomeng Huang
Reliable long-term forecasting of Earth system dynamics is heavily hampered by instabilities in current AI models during extended autoregressive simulations. These failures often originate from inherent spectral bias, leading to inadequate representation of critical high-frequency, small-scale processes and subsequent uncontrolled error amplification. We present Triton, an AI framework designed to address this fundamental challenge. Inspired by the practice of refining grids to explicitly resolve small scales in numerical models, Triton employs a hierarchical architecture that processes information across multiple resolutions to mitigate spectral bias and explicitly model cross-scale dynamics. We demonstrate Triton’s superior performance on challenging forecast tasks, achieving stable year-long global temperature forecasts, skillful Kuroshio eddy predictions up to 120 days, and high-fidelity turbulence simulations that preserve fine-scale structures, all without external forcing, and significantly surpassing baseline AI models in long-term stability and accuracy. By effectively suppressing high-frequency error accumulation, Triton offers a promising pathway towards trustworthy AI-driven simulation for climate and earth system science.
Article 1548
Title@2025-05-26 (1): Importance Weighted Score Matching for Diffusion Samplers with Enhanced Mode Coverage
Title: Importance Weighted Score Matching for Diffusion Samplers with Enhanced Mode Coverage | Bedeutung Gewichteter Score passend für Diffusion Sampler mit erweiterten Modus Abdeckung | 具有强化模式覆盖率的传播采样器比对重要加权分数 2505.19431v1 |
Authors: Chenguang Wang, Xiaoyu Zhang, Kaiyuan Cui, Weichen Zhao, Yongtao Guan, Tianshu Yu
Training neural samplers directly from unnormalized densities without access to target distribution samples presents a significant challenge. A critical desideratum in these settings is achieving comprehensive mode coverage, ensuring the sampler captures the full diversity of the target distribution. However, prevailing methods often circumvent the lack of target data by optimizing reverse KL-based objectives. Such objectives inherently exhibit mode-seeking behavior, potentially leading to incomplete representation of the underlying distribution. While alternative approaches strive for better mode coverage, they typically rely on implicit mechanisms like heuristics or iterative refinement. In this work, we propose a principled approach for training diffusion-based samplers by directly targeting an objective analogous to the forward KL divergence, which is conceptually known to encourage mode coverage. We introduce Importance Weighted Score Matching, a method that optimizes this desired mode-covering objective by re-weighting the score matching loss using tractable importance sampling estimates, thereby overcoming the absence of target distribution data. We also provide theoretical analysis of the bias and variance for our proposed Monte Carlo estimator and the practical loss function used in our method. Experiments on increasingly complex multi-modal distributions, including 2D Gaussian Mixture Models with up to 120 modes and challenging particle systems with inherent symmetries, demonstrate that our approach consistently outperforms existing neural samplers across all distributional distance metrics, achieving state-of-the-art results on all benchmarks.
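The abstract relies on importance sampling to stand in for samples from a target known only up to a normalizing constant. The sketch below shows generic self-normalized importance weights for such a target. It is not the paper's estimator or loss; the two-mode target, the Gaussian proposal, and all names are assumptions made for illustration.

```python
import math
import random


def log_unnormalized_target(x):
    """Hypothetical target: equal-weight unit Gaussians at -3 and +3, unnormalized."""
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # stable log-sum-exp


def log_proposal(x, scale):
    """Log density of a zero-mean Gaussian proposal with standard deviation `scale`."""
    return -0.5 * (x / scale) ** 2 - math.log(scale * math.sqrt(2.0 * math.pi))


def self_normalized_weights(xs, scale):
    """Weights proportional to target/proposal, normalized to sum to one."""
    log_w = [log_unnormalized_target(x) - log_proposal(x, scale) for x in xs]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


random.seed(0)
scale = 4.0  # wide proposal so that both modes receive samples
xs = [random.gauss(0.0, scale) for _ in range(20000)]
weights = self_normalized_weights(xs, scale)

# Weighted estimate of the target mass on the positive half-line.
mass_right = sum(w for x, w in zip(xs, weights) if x > 0)
print(round(mass_right, 3))
```

Because the assumed target is symmetric, the weighted mass on each side should be close to 0.5. Normalizing the weights removes the need to know the target's normalizing constant.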
Article 1549
Title@2025-05-26 (1): MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision
Title: MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision | MAS-ZERO: Konzipieren von Multi-Agenten-Systemen mit Zero Supervision | MAS-ZERO: 设计无监督的多机构系统 2505.14996v2 |
Authors: Zixuan Ke, Austin Xu, Yifei Ming, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty
Multi-agent systems (MAS) leveraging the impressive capabilities of Large Language Models (LLMs) hold significant potential for tackling complex tasks. However, most current MAS depend on manually designed agent roles and communication protocols. These manual designs often fail to align with the underlying LLMs’ strengths and struggle to adapt to novel tasks. Recent automatic MAS approaches attempt to mitigate these limitations but typically necessitate a validation set for tuning and yield static MAS designs lacking adaptability during inference. We introduce MAS-ZERO, the first self-evolved, inference-time framework for automatic MAS design. MAS-ZERO employs meta-level design to iteratively generate, evaluate, and refine MAS configurations tailored to each problem instance, without requiring a validation set. Critically, it enables dynamic agent composition and problem decomposition through meta-feedback on solvability and completeness. Experiments across math, graduate-level QA, and software engineering benchmarks, using both closed-source and open-source LLM backbones of varying sizes, demonstrate that MAS-ZERO outperforms both manual and automatic MAS baselines, achieving a 7.44% average accuracy improvement over the next strongest baseline while maintaining cost-efficiency. These findings underscore the promise of meta-level self-evolved design for creating effective and adaptive MAS.
Article 1550
Title@2025-05-26 (1): WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference
Title: WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference | WINA: Gewichtsinformierte Neuronen-Aktivierung zur Beschleunigung der Large Language Model Inferenz | WINA: 用于加速大语言模型推断的权重知情神经元激活 2505.19427v1 |
Authors: Sihan Chen, Dan Zhao, Jongwoo Ko, Colby Banbury, Huiping Zhuang, Luming Liang, Tianyi Chen
The growing computational demands of large language models (LLMs) make efficient inference and activation strategies increasingly critical. While recent approaches, such as Mixture-of-Experts (MoE), leverage selective activation but require specialized training, training-free sparse activation methods offer broader applicability and superior resource efficiency through their plug-and-play design. However, many existing methods rely solely on hidden state magnitudes to determine activation, resulting in high approximation errors and suboptimal inference accuracy. To address these limitations, we propose WINA (Weight Informed Neuron Activation), a novel, simple, and training-free sparse activation framework that jointly considers hidden state magnitudes and the column-wise $\ell_2$-norms of weight matrices. We show that this leads to a sparsification strategy that obtains optimal approximation error bounds with theoretical guarantees tighter than existing techniques. Empirically, WINA also outperforms state-of-the-art methods (e.g., TEAL) by up to $2.94\%$ in average performance at the same sparsity levels, across a diverse set of LLM architectures and datasets. These results position WINA as a new performance frontier for training-free sparse activation in LLM inference, advancing training-free sparse activation methods and setting a robust baseline for efficient inference. The source code is available at https://github.com/microsoft/wina.
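A minimal sketch of the selection rule described above, assuming the per-input score is the product of the hidden-state magnitude and the ℓ2-norm of the matching weight column, and that the top-k scores are kept. The function names, the top-k rule, and the toy matrix are illustrative assumptions, not the released implementation at the URL above.

```python
import math

def column_l2_norms(weight):
    """l2-norm of each column of a matrix stored as a list of rows."""
    n_cols = len(weight[0])
    return [math.sqrt(sum(row[j] ** 2 for row in weight)) for j in range(n_cols)]

def weight_informed_mask(hidden, weight, k):
    """Keep the k inputs with the largest |hidden[j]| * ||weight[:, j]||_2 (assumed score)."""
    norms = column_l2_norms(weight)
    scores = [abs(h) * c for h, c in zip(hidden, norms)]
    keep = sorted(range(len(hidden)), key=lambda j: scores[j], reverse=True)[:k]
    mask = [0.0] * len(hidden)
    for j in keep:
        mask[j] = 1.0
    return mask

hidden = [0.9, -0.2, 0.5]
weight = [[0.01, 3.0, 1.0],
          [0.02, 4.0, 0.0]]
mask = weight_informed_mask(hidden, weight, k=2)

# Baseline for comparison: rank by hidden-state magnitude alone.
magnitude_only = sorted(range(len(hidden)), key=lambda j: abs(hidden[j]), reverse=True)[:2]
```

In this toy case the magnitude-only rule keeps input 0, whose weight column is nearly zero, while the weight-informed score drops it and keeps input 1, which contributes far more to the output.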
Article 1551
Title@2025-05-26 (1): The Role of Diversity in In-Context Learning for Large Language Models
Title: The Role of Diversity in In-Context Learning for Large Language Models | Die Rolle der Vielfalt im In-Context-Lernen für große Sprachmodelle | 多样性在为大语言模式进行内文学习方面的作用 2505.19426v1 |
Authors: Wenyang Xiao, Haoyu Zhao, Lingxiao Huang
In-context learning (ICL) is a crucial capability of current large language models (LLMs), where the selection of examples plays a key role in performance. While most existing approaches focus on selecting the most similar examples to the query, the impact of diversity in example selection remains underexplored. We systematically investigate the role of diversity in in-context example selection through experiments across a range of tasks, from sentiment classification to more challenging math and code problems. Experiments on Llama-3.1, Gemma-2, and Mistral-v0.3 families of models show that diversity-aware selection methods improve performance, particularly on complex tasks like math and code, and enhance robustness to out-of-distribution queries. To support these findings, we introduce a theoretical framework that explains the benefits of incorporating diversity in in-context example selection.
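The abstract does not name a specific diversity-aware selector. As one generic illustration, the sketch below uses maximal marginal relevance (MMR), a standard retrieval heuristic swapped in here as an assumption, to trade similarity to the query against redundancy among the examples already chosen. The embeddings, the function name `mmr_select`, and the `trade_off` parameter are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mmr_select(query, candidates, k, trade_off=0.5):
    """Greedy MMR: trade_off=1.0 is pure similarity, lower values favour diversity."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return trade_off * relevance - (1.0 - trade_off) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

query = [1.0, 0.2]
candidates = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]
similarity_only = mmr_select(query, candidates, k=2, trade_off=1.0)
diverse = mmr_select(query, candidates, k=2, trade_off=0.5)
```

With pure similarity the two near-duplicate candidates are chosen; with the diversity term the second pick moves to the candidate pointing in a different direction.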
Article 1552
Title@2025-05-26 (1): Structure Disruption: Subverting Malicious Diffusion-Based Inpainting via Self-Attention Query Perturbation
Title: Structure Disruption: Subverting Malicious Diffusion-Based Inpainting via Self-Attention Query Perturbation | Strukturstörung: Verringern von bösartiger Diffusions-basierter Inpainting durch Selbstaufmerksamkeit Abfrage Störung | 结构混乱:通过自控查询干扰来改变恶意扩散的涂漆 2505.19425v1 |
Authors: Yuhao He, Jinyu Tian, Haiwei Wu, Jianqing Li
The rapid advancement of diffusion models has enhanced their image inpainting and editing capabilities but also introduced significant societal risks. Adversaries can exploit user images from social media to generate misleading or harmful content. While adversarial perturbations can disrupt inpainting, global perturbation-based methods fail in mask-guided editing tasks due to spatial constraints. To address these challenges, we propose Structure Disruption Attack (SDA), a powerful protection framework for safeguarding sensitive image regions against inpainting-based editing. Building upon the contour-focused nature of self-attention mechanisms of diffusion models, SDA optimizes perturbations by disrupting queries in self-attention during the initial denoising step to destroy the contour generation process. This targeted interference directly disrupts the structural generation capability of diffusion models, effectively preventing them from producing coherent images. We validate our motivation through visualization techniques and extensive experiments on public datasets, demonstrating that SDA achieves state-of-the-art (SOTA) protection performance while maintaining strong robustness.
Article 1553
Title@2025-05-26 (1): Each Graph is a New Language: Graph Learning with LLMs
Title: Each Graph is a New Language: Graph Learning with LLMs | Jeder Graph ist eine neue Sprache: Graph Learning mit LLMs | 每图都是一种新语言:用LLMM学习图表 2501.11478v3 |
Authors: Huachi Zhou, Jiahe Du, Chuang Zhou, Chang Yang, Yilin Xiao, Yuxuan Xie, Xiao Huang
Recent efforts leverage Large Language Models (LLMs) for modeling text-attributed graph structures in node classification tasks. These approaches describe graph structures for LLMs to understand or aggregate LLM-generated textual attribute embeddings through graph structure. However, these approaches face two main limitations in modeling graph structures with LLMs. (i) Graph descriptions become verbose in describing high-order graph structure. (ii) Textual attributes alone do not contain adequate graph structure information. It is challenging to model graph structure concisely and adequately with LLMs. LLMs lack built-in mechanisms to model graph structures directly. They also struggle with complex long-range dependencies between high-order nodes and target nodes. Inspired by the observation that LLMs pre-trained on one language can achieve exceptional performance on another with minimal additional training, we propose Graph-Defined Language for Large Language Model (GDL4LLM). This novel framework enables LLMs to transfer their powerful language understanding capabilities to graph-structured data. GDL4LLM translates graphs into a graph language corpus instead of graph descriptions and pre-trains LLMs on this corpus to adequately understand graph structures. During fine-tuning, this corpus describes the structural information of target nodes concisely with only a few tokens. By treating graphs as a new language, GDL4LLM enables LLMs to model graph structures adequately and concisely for node classification tasks. Extensive experiments on three real-world datasets demonstrate that GDL4LLM outperforms description-based and textual attribute embeddings-based baselines by efficiently modeling different orders of graph structure with LLMs.
Article 1554
Title@2025-05-26 (1): Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift
Title: Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift | Im Moment falsch dann: Nicht-Stationäre Direktpräferenz-Optimierung unter Preference Drift | 右,右,错误 然后: 非标准直接首选优化 在偏好驱动器下 2407.18676v2 |
Authors: Seongho Son, William Bankes, Sayak Ray Chowdhury, Brooks Paige, Ilija Bogunovic
Reinforcement learning from human feedback (RLHF) aligns Large Language Models (LLMs) with human preferences. However, these preferences can often change over time due to external factors (e.g. environment change and societal influence). Consequently, what was wrong then might be right now. Current preference optimization algorithms do not account for temporal preference drift in their modeling, which can lead to severe misalignment. To address this limitation, we use a Dynamic Bradley-Terry model that models preferences via time-dependent reward functions, and propose Non-Stationary Direct Preference Optimisation (NS-DPO). By introducing a discount parameter in the loss function, NS-DPO applies exponential weighting, which proportionally focuses learning on more time-relevant datapoints. We theoretically analyse the convergence of NS-DPO in the offline setting, providing upper bounds on the estimation error caused by non-stationary preferences. Finally, we demonstrate the effectiveness of NS-DPO for fine-tuning LLMs in scenarios with drifting preferences. By simulating preference drift using renowned reward models and modifying popular LLM datasets accordingly, we show that NS-DPO fine-tuned LLMs remain robust under non-stationarity, significantly outperforming baseline algorithms that ignore temporal preference changes, without sacrificing performance in stationary cases.
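The discounting described above can be sketched as a time-weighted version of the standard DPO loss. This is a minimal illustration under stated assumptions: the weight is taken to be `gamma ** (current_time - t)`, the batch is averaged uniformly, and the log-probabilities are made-up numbers. The exact exponent and normalization in NS-DPO may differ.

```python
import math

def neg_log_sigmoid(x):
    """-log(sigmoid(x)) = log(1 + exp(-x)), computed in a numerically stable way."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(ex, beta):
    """Standard DPO loss for one preference pair (w = preferred, l = rejected)."""
    margin = beta * ((ex["logp_w"] - ex["ref_logp_w"]) - (ex["logp_l"] - ex["ref_logp_l"]))
    return neg_log_sigmoid(margin)

def ns_dpo_loss(batch, current_time, gamma=0.9, beta=0.1):
    """Exponentially down-weight older preference pairs (assumed weighting scheme)."""
    total = 0.0
    for ex in batch:
        weight = gamma ** (current_time - ex["time"])
        total += weight * dpo_loss(ex, beta)
    return total / len(batch)

batch = [
    {"time": 0, "logp_w": -1.0, "logp_l": -2.0, "ref_logp_w": -1.5, "ref_logp_l": -1.5},
    {"time": 9, "logp_w": -2.0, "logp_l": -1.0, "ref_logp_w": -1.5, "ref_logp_l": -1.5},
]
stationary = ns_dpo_loss(batch, current_time=9, gamma=1.0)
discounted = ns_dpo_loss(batch, current_time=9, gamma=0.9)
```

Setting `gamma=1.0` recovers the ordinary (stationary) DPO average, which matches the abstract's claim that the method should not hurt in stationary cases.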
Article 1555
Title@2025-05-26 (1): SaVe-TAG: Semantic-aware Vicinal Risk Minimization for Long-Tailed Text-Attributed Graphs
Title: SaVe-TAG: Semantic-aware Vicinal Risk Minimization for Long-Tailed Text-Attributed Graphs | SaVe-TAG: Semantisch-bewusst Vicinal Risk Minimierung für langgestreckte Text-Attribute Graphen | SaVe-TAG: 长途脱轨文本可归图解析相邻风险最小化 2410.16882v3 |
Authors: Leyao Wang, Yu Wang, Bo Ni, Yuying Zhao, Hanyu Wang, Yao Ma, Tyler Derr
Real-world graph data often follows long-tailed distributions, making it difficult for Graph Neural Networks (GNNs) to generalize well across both head and tail classes. Recent advances in Vicinal Risk Minimization (VRM) have shown promise in mitigating class imbalance with numeric interpolation; however, existing approaches largely rely on embedding-space arithmetic, which fails to capture the rich semantics inherent in text-attributed graphs. In this work, we propose our method, SaVe-TAG (Semantic-aware Vicinal Risk Minimization for Long-Tailed Text-Attributed Graphs), a novel VRM framework that leverages Large Language Models (LLMs) to perform text-level interpolation, generating on-manifold, boundary-enriching synthetic samples for minority classes. To mitigate the risk of noisy generation, we introduce a confidence-based edge assignment mechanism that uses graph topology as a natural filter to ensure structural consistency. We provide theoretical justification for our method and conduct extensive experiments on benchmark datasets, showing that our approach consistently outperforms both numeric interpolation and prior long-tailed node classification baselines. Our results highlight the importance of integrating semantic and structural signals for balanced and effective learning on text-attributed graphs.
Article 1556
Title@2025-05-26 (1): Strictly Constrained Generative Modeling via Split Augmented Langevin Sampling
Title: Strictly Constrained Generative Modeling via Split Augmented Langevin Sampling | Streng eingeschränkte generative Modellierung über Split Augmented Langevin Sampling | 通过分分扩大Langevin抽样进行严格约束的生成模型模拟 2505.18017v2 |
Authors: Matthieu Blanke, Yongquan Qu, Sara Shamekh, Pierre Gentine
Deep generative models hold great promise for representing complex physical systems, but their deployment is currently limited by the lack of guarantees on the physical plausibility of the generated outputs. Ensuring that known physical constraints are enforced is therefore critical when applying generative models to scientific and engineering problems. We address this limitation by developing a principled framework for sampling from a target distribution while rigorously satisfying physical constraints. Leveraging the variational formulation of Langevin dynamics, we propose Split Augmented Langevin (SAL), a novel primal-dual sampling algorithm that enforces constraints progressively through variable splitting, with convergence guarantees. While the method is developed theoretically for Langevin dynamics, we demonstrate its effective applicability to diffusion models. In particular, we use constrained diffusion models to generate physical fields satisfying energy and mass conservation laws. We apply our method to diffusion-based data assimilation on a complex physical system, where enforcing physical constraints substantially improves both forecast accuracy and the preservation of critical conserved quantities. We also demonstrate the potential of SAL for challenging feasibility problems in optimal control.
Article 1557
Title@2025-05-26 (1): Toward Physics-Informed Machine Learning for Data Center Operations: A Tropical Case Study
Title: Toward Physics-Informed Machine Learning for Data Center Operations: A Tropical Case Study | Auf dem Weg zum physikinformierten maschinellen Lernen für Rechenzentrumsoperationen: Eine Tropische Fallstudie | 争取为数据中心业务进行物理一体化机械学习:热带案例研究 2505.19414v1 |
Authors: Ruihang Wang, Zhiwei Cao, Qingang Zhang, Rui Tan, Yonggang Wen, Tommy Leung, Stuart Kennedy, Justin Teoh
Data centers are the backbone of computing capacity. Operating data centers in the tropical regions faces unique challenges due to consistently high ambient temperature and elevated relative humidity throughout the year. These conditions result in increased cooling costs to maintain the reliability of the computing systems. While existing machine learning-based approaches have demonstrated potential to elevate operations to a more proactive and intelligent level, their deployment remains dubious due to concerns about model extrapolation capabilities and associated system safety issues. To address these concerns, this article proposes incorporating the physical characteristics of data centers into traditional data-driven machine learning solutions. We begin by introducing the data center system, including the relevant multiphysics processes and the data-physics availability. Next, we outline the associated modeling and optimization problems and propose an integrated, physics-informed machine learning system to address them. Using the proposed system, we present relevant applications across varying levels of operational intelligence. A case study on an industry-grade tropical data center is provided to demonstrate the effectiveness of our approach. Finally, we discuss key challenges and highlight potential future directions.
Article 1558
Title@2025-05-26 (1): Future Link Prediction Without Memory or Aggregation
Title: Future Link Prediction Without Memory or Aggregation | Zukünftige Link-Vorhersage ohne Gedächtnis oder Aggregation | 没有记忆或聚合的未来联系预测 2505.19408v1 |
Authors: Lu Yi, Runlin Lei, Fengran Mo, Yanping Zheng, Zhewei Wei, Yuhang Ye
Future link prediction on temporal graphs is a fundamental task with wide applicability in real-world dynamic systems. These scenarios often involve both recurring (seen) and novel (unseen) interactions, requiring models to generalize effectively across both types of edges. However, existing methods typically rely on complex memory and aggregation modules, yet struggle to handle unseen edges. In this paper, we revisit the architecture of existing temporal graph models and identify two essential but overlooked modeling requirements for future link prediction: representing nodes with unique identifiers and performing target-aware matching between source and destination nodes. To this end, we propose Cross-Attention based Future Link Predictor on Temporal Graphs (CRAFT), a simple yet effective architecture that discards memory and aggregation modules and instead builds on two components: learnable node embeddings and cross-attention between the destination and the source’s recent interactions. This design provides strong expressive power and enables target-aware modeling of the compatibility between candidate destinations and the source’s interaction patterns. Extensive experiments on diverse datasets demonstrate that CRAFT consistently achieves superior performance with high efficiency, making it well-suited for large-scale real-world applications.
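The two components named above, node embeddings and cross-attention between a candidate destination and the source's recent interactions, can be sketched as follows. This is a simplified illustration, not the CRAFT architecture itself: the embeddings are fixed toy vectors standing in for learned ones, there are no learned projections, and scoring by the dot product of the attended context with the destination embedding is an assumption.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def link_score(dest, recent_neighbors, emb):
    """Destination embedding is the query; the source's recent neighbours are keys and values."""
    q = emb[dest]
    keys = [emb[n] for n in recent_neighbors]
    scale = math.sqrt(len(q))
    attn = softmax([dot(q, key) / scale for key in keys])
    context = [sum(a * key[d] for a, key in zip(attn, keys)) for d in range(len(q))]
    return dot(context, q)  # assumed compatibility score

# Toy embedding table keyed by node identifier (hypothetical values).
emb = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [-1.0, 0.0]}
recent = ["a", "a", "b"]  # the source's most recent interaction partners
score_seen = link_score("a", recent, emb)
score_unrelated = link_score("c", recent, emb)
```

Because the score depends on the candidate destination, each candidate is matched against the source's history separately, which is the target-aware behavior the abstract describes.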
Article 1559
Title@2025-05-26 (1): FedHERO: A Federated Learning Approach for Node Classification Task on Heterophilic Graphs
Title: FedHERO: A Federated Learning Approach for Node Classification Task on Heterophilic Graphs | FedHERO: Ein Federated Learning Approach für Knotenklassifikation Aufgaben auf heterophilen Graphen | FEFHERO: 异生物图节点分类任务联邦学习方法 2504.21206v2 |
Authors: Zihan Chen, Xingbo Fu, Yushun Dong, Jundong Li, Cong Shen
Federated Graph Learning (FGL) empowers clients to collaboratively train Graph Neural Networks (GNNs) in a distributed manner while preserving data privacy. However, FGL methods usually require that the graph data owned by all clients is homophilic to ensure similar neighbor distribution patterns of nodes. Such an assumption ensures that the learned knowledge is consistent across the local models from all clients. Therefore, these local models can be properly aggregated as a global model without undermining the overall performance. Nevertheless, when the neighbor distribution patterns of nodes vary across different clients (e.g., when clients hold graphs with different levels of heterophily), their local models may gain different and even conflicting knowledge from their node-level predictive tasks. Consequently, aggregating these local models usually leads to catastrophic performance deterioration on the global model. To address this challenge, we propose FedHERO, an FGL framework designed to harness and share insights from heterophilic graphs effectively. At the heart of FedHERO is a dual-channel GNN equipped with a structure learner, engineered to discern the structural knowledge encoded in the local graphs. With this specialized component, FedHERO enables the local model for each client to identify and learn patterns that are universally applicable across graphs with different patterns of node neighbor distributions. FedHERO not only enhances the performance of individual client models by leveraging both local and shared structural insights but also sets a new precedent in this field to effectively handle graph data with various node neighbor distribution patterns. We conduct extensive experiments to validate the superior performance of FedHERO against existing alternatives.
Article 1560
Title@2025-05-26 (1): Exploring the Possibility of TypiClust for Low-Budget Federated Active Learning
Title: Exploring the Possibility of TypiClust for Low-Budget Federated Active Learning | Erforschung der Möglichkeit des TypiClusts für budgetarmes, föderiertes aktives Lernen | 探讨低预算联邦积极学习的TypiClust 2505.19404v1 |
Authors: Yuta Ono, Hiroshi Nakamura, Hideki Takase
Federated Active Learning (FAL) seeks to reduce the burden of annotation under the realistic constraints of federated learning by leveraging Active Learning (AL). As FAL settings make it more expensive to obtain ground truth labels, FAL strategies that work well in low-budget regimes, where the amount of annotation is very limited, are needed. In this work, we investigate the effectiveness of TypiClust, a successful low-budget AL strategy, in low-budget FAL settings. Our empirical results show that TypiClust works well even in low-budget FAL settings, where other methods perform relatively poorly, although these settings present additional challenges, such as data heterogeneity, compared to AL. In addition, we show that FAL settings cause distribution shifts in terms of typicality, but TypiClust is not very vulnerable to these shifts. We also analyze the sensitivity of TypiClust to feature extraction methods, and this analysis suggests a way to perform FAL even in limited data situations.
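For readers unfamiliar with typicality-based selection, the sketch below illustrates the core idea: a point's typicality is taken as the inverse of its mean distance to its k nearest neighbours, and the most typical point of each cluster is queried for a label. It is a simplification under stated assumptions: cluster labels are given rather than computed, features are toy 2D points, and the federated aspects studied in the paper are not modeled. It needs Python 3.8 or later for `math.dist`.

```python
import math

def typicality(index, points, k):
    """Inverse of the mean distance from points[index] to its k nearest neighbours."""
    dists = sorted(math.dist(points[index], p) for j, p in enumerate(points) if j != index)
    nearest = dists[:k]
    return 1.0 / (sum(nearest) / len(nearest) + 1e-12)  # epsilon guards duplicates

def most_typical_per_cluster(points, labels, k=2):
    """For each cluster label, return the index of its most typical point."""
    chosen = {}
    for i, label in enumerate(labels):
        score = typicality(i, points, k)
        if label not in chosen or score > chosen[label][1]:
            chosen[label] = (i, score)
    return {label: i for label, (i, _) in chosen.items()}

points = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.9, 0.0],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [6.0, 6.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical cluster assignments
queries = most_typical_per_cluster(points, labels)
```

Points in dense regions are selected and outliers such as the last point of each cluster are avoided, which is the behavior that makes this family of strategies attractive when the labeling budget is very small.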
Article 1561
Title@2025-05-26 (1): KHRONOS: a Kernel-Based Neural Architecture for Rapid, Resource-Efficient Scientific Computation
Title: KHRONOS: a Kernel-Based Neural Architecture for Rapid, Resource-Efficient Scientific Computation | KHRONOS: Eine Kernel-basierte Neuralarchitektur für schnelle, ressourceneffiziente wissenschaftliche Berechnung | KHRONOS:一个以核心为基础的神经结构,用于快速、资源高效科学计算 2505.13315v2 |
Authors: Reza T. Batley, Sourav Saha
Contemporary models of high dimensional physical systems are constrained by the curse of dimensionality and a reliance on dense data. We introduce KHRONOS (Kernel Expansion Hierarchy for Reduced Order, Neural Optimized Surrogates), an AI framework for model based, model free and model inversion tasks. KHRONOS constructs continuously differentiable target fields with a hierarchical composition of per-dimension kernel expansions, which are tensorized into modes and then superposed. We evaluate KHRONOS on a canonical 2D Poisson equation benchmark: across 16 to 512 degrees of freedom (DoFs), it obtained squared $L_2$ errors of 5e-4 down to 6e-11. This represents a greater than 100-fold gain over Kolmogorov Arnold Networks (which themselves report a 100 times improvement on MLPs/PINNs with 100 times fewer parameters) when controlling for the number of parameters. This also represents a 1e6-fold improvement in squared $L_2$ error compared to standard linear FEM at comparable DoFs. Inference complexity is dominated by inner products, yielding sub-millisecond full-field predictions that scale to an arbitrary resolution. For inverse problems, KHRONOS facilitates rapid, iterative level set recovery in only a few forward evaluations, with sub-microsecond per sample latency. KHRONOS’s scalability, expressivity, and interpretability open new avenues in constrained edge computing, online control, computer vision, and beyond.
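One possible reading of "per-dimension kernel expansions, tensorized into modes and then superposed" is a sum over modes of products of one-dimensional kernel expansions. The sketch below implements that reading only; the Gaussian kernel, the shared centers, the `width` parameter, and all function names are assumptions, and the actual KHRONOS construction may differ.

```python
import math

def kernel_expansion(x, centers, weights, width=0.5):
    """One-dimensional expansion: sum_k weights[k] * exp(-((x - centers[k]) / width)^2)."""
    return sum(w * math.exp(-((x - c) / width) ** 2) for w, c in zip(weights, centers))

def superposed_kernel_field(x, centers, mode_weights):
    """Sum over modes of the product over dimensions of 1D kernel expansions.

    mode_weights[m][d] is the weight vector of the expansion for mode m, dimension d.
    """
    total = 0.0
    for per_dim_weights in mode_weights:
        mode = 1.0
        for x_d, w_d in zip(x, per_dim_weights):
            mode *= kernel_expansion(x_d, centers, w_d)  # tensorize across dimensions
        total += mode  # superpose the modes
    return total

centers = [0.0, 1.0]
one_mode = [[[1.0, 0.0], [0.0, 1.0]]]  # a single mode peaked at (0.0, 1.0)
value = superposed_kernel_field([0.0, 1.0], centers, one_mode)
two_modes = superposed_kernel_field([0.0, 1.0], centers, one_mode * 2)
```

In this form an evaluation costs only a handful of inner products per dimension and mode, which is consistent with the inference-cost remark in the abstract.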
Article 1562
Title@2025-05-26 (1): Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMs
Title: Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMs | Können LLMs helfen, Erkenntnisse über LLMs zu enthüllen? Eine groß angelegte, sich entwickelnde Literaturanalyse von Frontier LLMs | LLMs 帮助发现关于LLM的见识? 大型、不断发展的前沿LMS文学分析 2502.18791v3 |
Authors: Jungsoo Park, Junmo Kang, Gabriel Stanovsky, Alan Ritter
The surge of LLM studies makes synthesizing their findings challenging. Analysis of experimental results from literature can uncover important trends across studies, but the time-consuming nature of manual data extraction limits its use. Our study presents a semi-automated approach for literature analysis that accelerates data extraction using LLMs. It automatically identifies relevant arXiv papers, extracts experimental results and related attributes, and organizes them into a structured dataset, LLMEvalDB. We then conduct an automated literature analysis of frontier LLMs, reducing the effort of paper surveying and data extraction by more than 93% compared to manual approaches. We validate LLMEvalDB by showing that it reproduces key findings from a recent manual analysis of Chain-of-Thought (CoT) reasoning and also uncovers new insights that go beyond it, showing, for example, that in-context examples benefit coding & multimodal tasks but offer limited gains in math reasoning tasks compared to zero-shot CoT. Our automatically updatable dataset enables continuous tracking of target models by extracting evaluation studies as new data becomes available. Through LLMEvalDB and empirical analysis, we provide insights into LLMs while facilitating ongoing literature analyses of their behavior.
Article 1563
Title@2025-05-26 (1): Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent
Title: Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent | Auf dem Weg zum Verständnis der Verallgemeinerbarkeit des verzögerten stochastischen Absinkens | 了解拖延的拖延的逐步后世后代的普遍适用性 2308.09430v4 |
Authors: Xiaoge Deng, Li Shen, Shengwei Li, Tao Sun, Dongsheng Li, Dacheng Tao
Stochastic gradient descent (SGD) performed in an asynchronous manner plays a crucial role in training large-scale machine learning models. However, the generalization performance of asynchronous delayed SGD, which is an essential metric for assessing machine learning algorithms, has rarely been explored. Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization. In this paper, we investigate a sharper generalization error bound for SGD with asynchronous delay $\tau$. Leveraging the generating function analysis tool, we first establish the average stability of the delayed gradient algorithm. Based on this algorithmic stability, we provide upper bounds on the generalization error of $\tilde{\mathcal{O}}(\frac{T-\tau}{n\tau})$ and $\tilde{\mathcal{O}}(\frac{1}{n})$ for quadratic convex and strongly convex problems, respectively, where $T$ refers to the iteration number and $n$ is the amount of training data. Our theoretical results indicate that asynchronous delays reduce the generalization error of the delayed SGD algorithm. Analogous analysis can be generalized to the random delay setting, and the experimental results validate our theoretical findings.
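The delayed update studied here applies a gradient evaluated at an iterate that is $\tau$ steps old. A minimal sketch of that recursion follows, with two simplifications that are assumptions of this example: the gradient is deterministic rather than stochastic, and the objective is a one-dimensional quadratic. The function name `delayed_gradient_descent` is hypothetical.

```python
from collections import deque

def delayed_gradient_descent(grad, w0, lr, delay, steps):
    """Run w_{t+1} = w_t - lr * grad(w_{t - delay}); iterates before step 0 equal w0."""
    history = deque([w0] * (delay + 1), maxlen=delay + 1)
    w = w0
    for _ in range(steps):
        stale = history[0]      # the iterate from `delay` steps ago
        w = w - lr * grad(stale)
        history.append(w)       # the oldest entry is dropped automatically
    return w

# Toy objective f(w) = 0.5 * (w - 3)^2 with gradient w - 3.
def grad(w):
    return w - 3.0

w_delayed = delayed_gradient_descent(grad, w0=0.0, lr=0.1, delay=2, steps=200)
w_fresh = delayed_gradient_descent(grad, w0=0.0, lr=0.1, delay=0, steps=200)
```

Both runs converge to the minimizer at 3 for this step size. The example illustrates only the update rule, not the generalization bounds.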
Article 1564
Title@2025-05-26 (1): Are Time-Series Foundation Models Deployment-Ready? A Systematic Study of Adversarial Robustness Across Domains
Title: Are Time-Series Foundation Models Deployment-Ready? A Systematic Study of Adversarial Robustness Across Domains | Sind Time-Series-Stiftungsmodelle bereit? Eine systematische Studie über die widerrechtliche Robustheit über Domains hinweg | 时间-系列基金会的模型是部署-准备模型吗? 2505.19397v1 |
Authors: Jiawen Zhang, Zhenwei Zhang, Shun Zheng, Xumeng Wen, Jia Li, Jiang Bian
Time Series Foundation Models (TSFMs), which are pretrained on large-scale, cross-domain data and capable of zero-shot forecasting in new scenarios without further training, are increasingly adopted in real-world applications. However, as the zero-shot forecasting paradigm gains popularity, a critical yet overlooked question emerges: Are TSFMs robust to adversarial input perturbations? Such perturbations could be exploited in man-in-the-middle attacks or data poisoning. To address this gap, we conduct a systematic investigation into the adversarial robustness of TSFMs. Our results show that even minimal perturbations can induce significant and controllable changes in forecast behaviors, including trend reversal, temporal drift, and amplitude shift, posing serious risks to TSFM-based services. Through experiments on representative TSFMs and multiple datasets, we reveal their consistent vulnerabilities and identify potential architectural designs, such as structural sparsity and multi-task pretraining, that may improve robustness. Our findings offer actionable guidance for designing more resilient forecasting systems and provide a critical assessment of the adversarial robustness of TSFMs.
Article 1565
Title@2025-05-26 (1): Uniform convergence of the smooth calibration error and its relationship with functional gradient
Title: Uniform convergence of the smooth calibration error and its relationship with functional gradient | Einheitliche Konvergenz des glatten Kalibrierfehlers und seines Verhältnisses mit dem funktionellen Gradienten | 平稳校准误差及其与功能梯度的关系统一汇合 2505.19396v1 |
Authors: Futoshi Futami, Atsushi Nitanda
Calibration is a critical requirement for reliable probabilistic prediction, especially in high-risk applications. However, the theoretical understanding of which learning algorithms can simultaneously achieve high accuracy and good calibration remains limited, and many existing studies provide empirical validation or a theoretical guarantee in restrictive settings. To address this issue, in this work, we focus on the smooth calibration error (CE) and provide a uniform convergence bound, showing that the smooth CE is bounded by the sum of the smooth CE over the training dataset and a generalization gap. We further prove that the functional gradient of the loss function can effectively control the training smooth CE. Based on this framework, we analyze three representative algorithms: gradient boosting trees, kernel boosting, and two-layer neural networks. For each, we derive conditions under which both classification and calibration performances are simultaneously guaranteed. Our results offer new theoretical insights and practical guidance for designing reliable probabilistic models with provable calibration guarantees.
Article 1566
Title@2025-05-26 (1): Towards the Causal Complete Cause of Multi-Modal Representation Learning
Title: Towards the Causal Complete Cause of Multi-Modal Representation Learning | Auf dem Weg zur kausalen vollständigen Ursache des multi-Modalen Repräsentationslernens | 走向多模式代表制学习的事业完全原因 2407.14058v6 |
Authors: Jingyao Wang, Siyu Zhao, Wenwen Qiang, Jiangmeng Li, Changwen Zheng, Fuchun Sun, Hui Xiong
Multi-Modal Learning (MML) aims to learn effective representations across modalities for accurate predictions. Existing methods typically focus on modality consistency and specificity to learn effective representations. However, from a causal perspective, they may lead to representations that contain insufficient and unnecessary information. To address this, we propose that effective MML representations should be causally sufficient and necessary. Considering practical issues like spurious correlations and modality conflicts, we relax the exogeneity and monotonicity assumptions prevalent in prior works and explore the concepts specific to MML, i.e., Causal Complete Cause $C^3$. We begin by defining $C^3$, which quantifies the probability of representations being causally sufficient and necessary. We then discuss the identifiability of $C^3$ and introduce an instrumental variable to support identifying $C^3$ with non-exogeneity and non-monotonicity. Building on this, we conduct the $C^3$ measurement, i.e., $C^3$ risk. We propose a twin network to estimate it through (i) the real-world branch: utilizing the instrumental variable for sufficiency, and (ii) the hypothetical-world branch: applying gradient-based counterfactual modeling for necessity. Theoretical analyses confirm its reliability. Based on these results, we propose $C^3$ Regularization, a plug-and-play method that enforces the causal completeness of the learned representations by minimizing $C^3$ risk. Extensive experiments demonstrate its effectiveness.
Article 1567
Title@2025-05-26 (1): Alignment of large language models with constrained learning
Title: Alignment of large language models with constrained learning | Ausrichtung großer Sprachmodelle mit eingeschränktem Lernen | 大型语言模式与限制学习的结合 2505.19387v1 |
Authors: Botong Zhang, Shuo Li, Ignacio Hounie, Osbert Bastani, Dongsheng Ding, Alejandro Ribeiro
We study the problem of computing an optimal large language model (LLM) policy for a constrained alignment problem, where the goal is to maximize a primary reward objective while satisfying constraints on secondary utilities. Despite the popularity of Lagrangian-based LLM policy search in constrained alignment, iterative primal-dual methods often fail to converge, and non-iterative dual-based methods do not achieve optimality in the LLM parameter space. To address these challenges, we employ Lagrangian duality to develop an iterative dual-based alignment method that alternates between updating the LLM policy via Lagrangian maximization and updating the dual variable via dual descent. In theory, we characterize the primal-dual gap between the primal value in the distribution space and the dual value in the LLM parameter space. We further quantify the optimality gap of the learned LLM policies at near-optimal dual variables with respect to both the objective and the constraint functions. These results prove that dual-based alignment methods can find an optimal constrained LLM policy, up to an LLM parametrization gap. We demonstrate the effectiveness and merits of our approach through extensive experiments conducted on the PKU-SafeRLHF dataset.
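The alternation described above (maximize the Lagrangian over the policy, then take a descent step on the dual variable) can be illustrated on a toy problem. This is a minimal sketch, not the paper's method: it assumes a KL-regularized objective over a small finite set of responses, for which the Lagrangian maximizer has the closed form $\pi_\lambda \propto \pi_{\text{ref}} \exp((r + \lambda g)/\beta)$. The names `lagrangian_policy`, `dual_descent`, `threshold`, `beta`, and `step` are hypothetical.

```python
import numpy as np

def lagrangian_policy(pi_ref, reward, utility, lam, beta):
    """Closed-form maximizer of the KL-regularized Lagrangian (toy assumption)."""
    logits = np.log(pi_ref) + (reward + lam * utility) / beta
    logits -= logits.max()                      # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def dual_descent(pi_ref, reward, utility, threshold, beta=1.0, step=0.5, iters=500):
    """Alternate policy maximization with projected descent on the dual variable."""
    lam = 0.0
    for _ in range(iters):
        pi = lagrangian_policy(pi_ref, reward, utility, lam, beta)
        slack = pi @ utility - threshold        # positive when the constraint holds
        lam = max(0.0, lam - step * slack)      # dual gradient step, kept non-negative
    return lagrangian_policy(pi_ref, reward, utility, lam, beta), lam

# Toy data: three candidate responses, constraint E[utility] >= 0.6.
pi_ref = np.array([1 / 3, 1 / 3, 1 / 3])
reward = np.array([1.0, 0.5, 0.0])
utility = np.array([0.0, 0.5, 1.0])
pi, lam = dual_descent(pi_ref, reward, utility, threshold=0.6)
print(pi, lam, pi @ utility)
```

In this toy setting the unconstrained policy violates the constraint, so the dual variable grows until the expected utility meets the threshold. An actual LLM policy lives in parameter space, which is where the parametrization gap discussed in the abstract enters.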
Article 1568
Title@2025-05-26 (1): JingFang: An Expert-Level Large Language Model for Traditional Chinese Medicine Clinical Consultation and Syndrome Differentiation-Based Treatment
Title: JingFang: An Expert-Level Large Language Model for Traditional Chinese Medicine Clinical Consultation and Syndrome Differentiation-Based Treatment | JingFang: Ein sachverständiges Sprachmodell für die traditionelle chinesische Medizin Klinische Beratung und Syndromdifferenzierungsbasierte Behandlung | JingFang:中国传统医学临床咨询和综合症差别治疗专家级大语言模式 2502.04345v2 |
Authors: Yehan Yang, Tianhao Ma, Ruotai Li, Xinhan Zheng, Guodong Shan, Chisheng Li
The effective application of traditional Chinese medicine (TCM) requires extensive knowledge of TCM and clinical experience. The emergence of Large Language Models (LLMs) offers a way to address this, but existing LLMs for TCM suffer from incomplete clinical consultation and diagnoses, as well as inaccurate syndrome differentiation. To address these issues, we establish JingFang (JF), a novel TCM LLM that demonstrates expert-level proficiency in clinical consultation and syndrome differentiation. We propose a Multi-Agent Collaborative Chain-of-Thought Mechanism (MACCTM) for comprehensive and targeted clinical consultation, giving JF effective and accurate diagnostic ability. In addition, a Syndrome Agent and a Dual-Stage Recovery Scheme (DSRS) are developed to improve the accuracy of syndrome differentiation and the subsequent corresponding treatment. JingFang not only facilitates the application of LLMs but also promotes the effective application of TCM for healthcare.
Article 1569
Title@2025-05-26 (1): Unsupervised Anomaly Detection Using Diffusion Trend Analysis for Display Inspection
Title: Unsupervised Anomaly Detection Using Diffusion Trend Analysis for Display Inspection | Unüberwachte Anomalieerkennung mit Diffusion Trendanalyse für Display-Inspektion | 用于显示检查的利用扩散趋势分析进行无监督异常探测 2407.09578v2 |
Authors: Eunwoo Kim, Un Yang, Cheol Lae Roh, Stefano Ermon
Reconstruction-based anomaly detection via denoising diffusion models is limited by the difficulty of determining appropriate noise parameters that degrade anomalies while preserving normal characteristics. In addition, normal regions can fluctuate considerably during reconstruction, resulting in false detections. In this paper, we propose a method that detects anomalies by analyzing the reconstruction trend as a function of the degree of degradation, effectively solving both problems that impede practical application in display inspection.
Article 1570
Title@2025-05-25 (7): SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning
Title: SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning | SALSA-RL: Stabilitätsanalyse im Latent Space of Actions zur Stärkung des Lernens | SALSA-RL:加强学习行动空间的稳定分析 2502.15512v2 |
Authors: Xuyang Li, Romit Maulik
Modern deep reinforcement learning (DRL) methods have made significant advances in handling continuous action spaces. However, real-world control systems–especially those requiring precise and reliable performance–often demand interpretability in the sense of a-priori assessments of agent behavior to identify safe or failure-prone interactions with environments. To address this limitation, we propose SALSA-RL (Stability Analysis in the Latent Space of Actions), a novel RL framework that models control actions as dynamic, time-dependent variables evolving within a latent space. By employing a pre-trained encoder-decoder and a state-dependent linear system, our approach enables interpretability through local stability analysis, where instantaneous growth in action-norms can be predicted before their execution. We demonstrate that SALSA-RL can be deployed in a non-invasive manner for assessing the local stability of actions from pretrained RL agents without compromising on performance across diverse benchmark environments. By enabling a more interpretable analysis of action generation, SALSA-RL provides a powerful tool for advancing the design, analysis, and theoretical understanding of RL systems.
Article 1571
Title@2025-05-25 (7): Foundations of Top-$k$ Decoding For Language Models
Title: Foundations of Top-$k$ Decoding For Language Models | Grundlagen von Top-$k$ Dekodierung für Sprachmodelle | 语言模式最高价基数 2505.19371v1 |
Authors: Georgy Noarov, Soham Mallick, Tao Wang, Sunay Joshi, Yan Sun, Yangxinyu Xie, Mengxin Yu, Edgar Dobriban
Top-$k$ decoding is a widely used method for sampling from LLMs: at each token, only the largest $k$ next-token probabilities are kept, and the next token is sampled after re-normalizing them to sum to unity. Top-$k$ and other sampling methods are motivated by the intuition that true next-token distributions are sparse, and the noisy LLM probabilities need to be truncated. However, to our knowledge, a precise theoretical motivation for the use of top-$k$ decoding is missing. In this work, we develop a theoretical framework that both explains and generalizes top-$k$ decoding. We view decoding at a fixed token as the recovery of a sparse probability distribution. We consider Bregman decoders obtained by minimizing a separable Bregman divergence (for both the primal and dual cases) with a sparsity-inducing $\ell_0$ regularization. Despite the combinatorial nature of the objective, we show how to optimize it efficiently for a large class of divergences. We show that the optimal decoding strategies are greedy, and further that the loss function is discretely convex in $k$, so that binary search provably and efficiently finds the optimal $k$. We show that top-$k$ decoding arises as a special case for the KL divergence, and identify new decoding strategies that have distinct behaviors (e.g., non-linearly up-weighting larger probabilities after re-normalization).
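The basic top-$k$ step described at the start of this abstract (keep the $k$ largest next-token probabilities, re-normalize, sample) can be sketched as follows. This is a minimal illustration of plain top-$k$ only, not of the Bregman decoders the paper introduces; the function names and the example distribution are hypothetical.

```python
import numpy as np

def top_k_renormalize(probs, k):
    """Zero out all but the k largest probabilities and rescale to sum to 1."""
    probs = np.asarray(probs, dtype=float)
    if not 1 <= k <= probs.size:
        raise ValueError("k must be between 1 and the vocabulary size")
    keep = np.argsort(probs)[-k:]               # indices of the k largest entries
    truncated = np.zeros_like(probs)
    truncated[keep] = probs[keep]
    return truncated / truncated.sum()

def sample_top_k(probs, k, rng):
    """Draw one token index from the truncated, re-normalized distribution."""
    return int(rng.choice(len(probs), p=top_k_renormalize(probs, k)))

probs = [0.5, 0.3, 0.1, 0.06, 0.04]             # toy next-token distribution
rng = np.random.default_rng(0)
print(top_k_renormalize(probs, 2))              # mass 0.8 is rescaled to 1
print(sample_top_k(probs, 2, rng))
```

With $k = 2$ the two surviving probabilities 0.5 and 0.3 become 0.625 and 0.375, so their ratio is preserved. The abstract contrasts this with new decoders that up-weight larger probabilities non-linearly.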
Article 1572
Title@2025-05-25 (7): SETransformer: A Hybrid Attention-Based Architecture for Robust Human Activity Recognition
Title: SETransformer: A Hybrid Attention-Based Architecture for Robust Human Activity Recognition | SETransformer: Eine hybride, auf Aufmerksamkeit basierende Architektur für robuste menschliche Aktivitätserkennung | 转型:以关注为基础的混合结构,以确认强有力的人类活动 2505.19369v1 |
Authors: Yunbo Liu, Xukui Qin, Yifan Gao, Xiang Li, Chengwei Feng
Human Activity Recognition (HAR) using wearable sensor data has become a central task in mobile computing, healthcare, and human-computer interaction. Despite the success of traditional deep learning models such as CNNs and RNNs, they often struggle to capture long-range temporal dependencies and contextual relevance across multiple sensor channels. To address these limitations, we propose SETransformer, a hybrid deep neural architecture that combines Transformer-based temporal modeling with channel-wise squeeze-and-excitation (SE) attention and a learnable temporal attention pooling mechanism. The model takes raw triaxial accelerometer data as input and leverages global self-attention to capture activity-specific motion dynamics over extended time windows, while adaptively emphasizing informative sensor channels and critical time steps. We evaluate SETransformer on the WISDM dataset and demonstrate that it significantly outperforms conventional models including LSTM, GRU, BiLSTM, and CNN baselines. The proposed model achieves a validation accuracy of 84.68% and a macro F1-score of 84.64%, surpassing all baseline architectures by a notable margin. Our results show that SETransformer is a competitive and interpretable solution for real-world HAR tasks, with strong potential for deployment in mobile and ubiquitous sensing applications.
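For readers unfamiliar with channel-wise squeeze-and-excitation, the generic SE block works roughly as sketched below: average each channel over time, pass the result through a small bottleneck, and use the sigmoid output to rescale the channels. This is a sketch of the standard SE pattern under assumed shapes and random weights; the abstract does not specify SETransformer's layer sizes or where the block sits, so `squeeze_excite`, `w1`, `w2`, and the hidden width are hypothetical.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Generic SE gating. x: (time, channels); w1: (channels, hidden); w2: (hidden, channels)."""
    squeezed = x.mean(axis=0)                       # squeeze: one summary value per channel
    hidden = np.maximum(squeezed @ w1, 0.0)         # bottleneck with ReLU
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))    # excitation: sigmoid gates in (0, 1)
    return x * gates, gates                         # rescale every time step channel-wise

rng = np.random.default_rng(0)
window = rng.normal(size=(128, 3))                  # toy window of triaxial accelerometer data
w1 = rng.normal(size=(3, 2))                        # hypothetical, untrained weights
w2 = rng.normal(size=(2, 3))
scaled, gates = squeeze_excite(window, w1, w2)
print(gates)
```

In a trained model the gates would be learned so that informative sensor channels receive weights closer to 1.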
Article 1573
Title@2025-05-25 (7): One Step Diffusion via Shortcut Models
Title: One Step Diffusion via Shortcut Models | Ein Schritt Diffusion über Shortcut-Modelle | 通过快捷键模型进行单步扩散 2410.12557v2 |
Authors: Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel
Diffusion models and flow-matching models have enabled generating diverse and realistic images by learning to transfer noise to data. However, sampling from these models involves iterative denoising over many neural network passes, making generation slow and expensive. Previous approaches for speeding up sampling require complex training regimes, such as multiple training phases, multiple networks, or fragile scheduling. We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples in one or more sampling steps. Shortcut models condition the network not only on the current noise level but also on the desired step size, allowing the model to skip ahead in the generation process. Across a wide range of sampling step budgets, shortcut models consistently produce higher quality samples than previous approaches, such as consistency models and reflow. Compared to distillation, shortcut models reduce complexity to a single network and training phase and additionally allow varying step budgets at inference time.
Article 1574
Title@2025-05-25 (7): Adaptive Diffusion Guidance via Stochastic Optimal Control
Title: Adaptive Diffusion Guidance via Stochastic Optimal Control | Adaptive Diffusionsführung über stochastische Optimale Kontrolle | 通过斯托卡优化控制进行适应性扩散指导 2505.19367v1 |
Authors: Iskander Azangulov, Peter Potaptchik, Qinyu Li, Eddie Aamari, George Deligiannidis, Judith Rousseau
Guidance is a cornerstone of modern diffusion models, playing a pivotal role in conditional generation and enhancing the quality of unconditional samples. However, current approaches to guidance scheduling–determining the appropriate guidance weight–are largely heuristic and lack a solid theoretical foundation. This work addresses these limitations on two fronts. First, we provide a theoretical formalization that precisely characterizes the relationship between guidance strength and classifier confidence. Second, building on this insight, we introduce a stochastic optimal control framework that casts guidance scheduling as an adaptive optimization problem. In this formulation, guidance strength is not fixed but dynamically selected based on time, the current sample, and the conditioning class, either independently or in combination. By solving the resulting control problem, we establish a principled foundation for more effective guidance in diffusion models.
Article 1575
Title@2025-05-25 (7): FD-Bench: A Modular and Fair Benchmark for Data-driven Fluid Simulation
Title: FD-Bench: A Modular and Fair Benchmark for Data-driven Fluid Simulation | FD-Bench: Modularer und fairer Benchmark für datengetriebene Fluidsimulation | FD-时区:数据驱动流流模拟模块化公平基准 2505.20349v1 |
Authors: Haixin Wang, Ruoyan Li, Fred Xu, Fang Sun, Kaiqiao Han, Zijie Huang, Guancheng Wan, Ching Chang, Xiao Luo, Wei Wang, Yizhou Sun
Data-driven modeling of fluid dynamics has advanced rapidly with neural PDE solvers, yet a fair and strong benchmark remains fragmented due to the absence of unified PDE datasets and standardized evaluation protocols. Although architectural innovations are abundant, fair assessment is further impeded by the lack of clear disentanglement between spatial, temporal and loss modules. In this paper, we introduce FD-Bench, the first fair, modular, comprehensive and reproducible benchmark for data-driven fluid simulation. FD-Bench systematically evaluates 85 baseline models across 10 representative flow scenarios under a unified experimental setup. It provides four key contributions: (1) a modular design enabling fair comparisons across spatial, temporal, and loss function modules; (2) the first systematic framework for direct comparison with traditional numerical solvers; (3) fine-grained generalization analysis across resolutions, initial conditions, and temporal windows; and (4) a user-friendly, extensible codebase to support future research. Through rigorous empirical studies, FD-Bench establishes the most comprehensive leaderboard to date, resolving long-standing issues in reproducibility and comparability, and laying a foundation for robust evaluation of future data-driven fluid models. The code is open-sourced at https://anonymous.4open.science/r/FD-Bench-15BC.
Article 1576
Title@2025-05-25 (7): Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments
Title: Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments | Konsistenzbasierte abduktive Begründung über Wahrnehmungsfehler mehrerer vortrainierter Modelle in neuartigen Umgebungen | 创新环境中多个未受过培训的多种模式的认知错误的基于一致性的直截力理由 2505.19361v1 |
Authors: Mario Leiva, Noel Ngu, Joshua Shay Kricheli, Aditya Taparia, Ransalu Senanayake, Paulo Shakarian, Nathaniel Bastian, John Corcoran, Gerardo Simari
The deployment of pre-trained perception models in novel environments often leads to performance degradation due to distributional shifts. Although recent artificial intelligence approaches for metacognition use logical rules to characterize and filter model errors, improving precision often comes at the cost of reduced recall. This paper addresses the hypothesis that leveraging multiple pre-trained models can mitigate this recall reduction. We formulate the challenge of identifying and managing conflicting predictions from various models as a consistency-based abduction problem. The input predictions and the learned error detection rules derived from each model are encoded in a logic program. We then seek an abductive explanation–a subset of model predictions–that maximizes prediction coverage while ensuring the rate of logical inconsistencies (derived from domain constraints) remains below a specified threshold. We propose two algorithms for this knowledge representation task: an exact method based on Integer Programming (IP) and an efficient Heuristic Search (HS). Through extensive experiments on a simulated aerial imagery dataset featuring controlled, complex distributional shifts, we demonstrate that our abduction-based framework outperforms individual models and standard ensemble baselines, achieving, for instance, average relative improvements of approximately 13.6% in F1-score and 16.6% in accuracy across 15 diverse test datasets when compared to the best individual model. Our results validate the use of consistency-based abduction as an effective mechanism to robustly integrate knowledge from multiple imperfect reasoners in challenging, novel scenarios.
Article 1577
Title@2025-05-25 (7): Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval
Title: Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval | Optimierte Text-Embedding-Modelle und Benchmarks für die Amharische Passage Retrieval | 阿姆光通过通过检索的最佳文本嵌入模型和基准 2505.19356v1 |
Authors: Kidist Amde Mekonnen, Yosef Worku Alemneh, Maarten de Rijke
Neural retrieval methods using transformer-based pre-trained language models have advanced multilingual and cross-lingual retrieval. However, their effectiveness for low-resource, morphologically rich languages such as Amharic remains underexplored due to data scarcity and suboptimal tokenization. We address this gap by introducing Amharic-specific dense retrieval models based on pre-trained Amharic BERT and RoBERTa backbones. Our proposed RoBERTa-Base-Amharic-Embed model (110M parameters) achieves a 17.6% relative improvement in MRR@10 and a 9.86% gain in Recall@10 over the strongest multilingual baseline, Arctic Embed 2.0 (568M parameters). More compact variants, such as RoBERTa-Medium-Amharic-Embed (42M), remain competitive while being over 13x smaller. Additionally, we train a ColBERT-based late interaction retrieval model that achieves the highest MRR@10 score (0.843) among all evaluated models. We benchmark our proposed models against both sparse and dense retrieval baselines to systematically assess retrieval effectiveness in Amharic. Our analysis highlights key challenges in low-resource settings and underscores the importance of language-specific adaptation. To foster future research in low-resource IR, we publicly release our dataset, codebase, and trained models at https://github.com/kidist-amde/amharic-ir-benchmarks.
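The two headline metrics, MRR@10 and Recall@10, follow standard definitions that can be sketched in a few lines. This is a generic illustration, not the authors' evaluation code; the function names and the toy document IDs are hypothetical.

```python
def reciprocal_rank_at_k(ranked_ids, relevant_ids, k=10):
    """1 / rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant documents that appear within the top k."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def mean_over_queries(metric, runs, k=10):
    """Average a per-query metric over (ranking, relevant set) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)

# Toy run with two queries (hypothetical IDs).
runs = [
    (["d3", "d1", "d7"], {"d1"}),        # first relevant hit at rank 2
    (["d5", "d2", "d9"], {"d5", "d8"}),  # first relevant hit at rank 1, one of two found
]
print(mean_over_queries(reciprocal_rank_at_k, runs))  # MRR@10
print(mean_over_queries(recall_at_k, runs))           # Recall@10
```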
Article 1578
Title@2025-05-25 (7): FlashMD: long-stride, universal prediction of molecular dynamics
Title: FlashMD: long-stride, universal prediction of molecular dynamics | FlashMD: Langstride, universelle Vorhersage der molekularen Dynamik | FlashMD:长途、全方位预测分子动态 2505.19350v1 |
Authors: Filippo Bigi, Sanggyu Chong, Agustinus Kristiadi, Michele Ceriotti
Molecular dynamics (MD) provides insights into atomic-scale processes by integrating over time the equations that describe the motion of atoms under the action of interatomic forces. Machine learning models have substantially accelerated MD by providing inexpensive predictions of the forces, but they remain constrained to minuscule time integration steps, which are required by the fast time scale of atomic motion. In this work, we propose FlashMD, a method to predict the evolution of positions and momenta over strides that are between one and two orders of magnitude longer than typical MD time steps. We incorporate considerations on the mathematical and physical properties of Hamiltonian dynamics in the architecture, generalize the approach to allow the simulation of any thermodynamic ensemble, and carefully assess the possible failure modes of such a long-stride MD approach. We validate FlashMD’s accuracy in reproducing equilibrium and time-dependent properties, using both system-specific and general-purpose models, extending the ability of MD simulation to reach the long time scales needed to model microscopic processes of high scientific and technological relevance.
Article 1579
Title@2025-05-25 (7): Communication-Efficient Multi-Device Inference Acceleration for Transformer Models
Title: Communication-Efficient Multi-Device Inference Acceleration for Transformer Models | Kommunikationseffiziente Multi-Device-Inferenzbeschleunigung für Transformer-Modelle | 变换模型的通信效率高多变量推推加速 2505.19342v1 |
Authors: Xiao Liu, Lijun Zhang, Deepak Ganesan, Hui Guan
Transformer models power many AI applications but suffer from high inference latency, limiting their use in real-time settings. Multi-device inference can reduce latency by parallelizing computation. Yet, existing methods require high inter-device bandwidth, making them impractical for bandwidth-constrained environments. We propose ASTRA, a communication-efficient framework that accelerates Transformer inference through a novel integration of sequence parallelism and a Mixed-Precision Attention mechanism designed to minimize inter-device communication. ASTRA compresses non-local token embeddings via vector quantization and preserves task accuracy through two optimizations, Noise-Augmented Quantization and Distributed Class Tokens. Experiments on ViT and GPT2 across vision and NLP tasks show that ASTRA achieves up to 2.64X speedups over single-device inference and up to 15.25X speedups over state-of-the-art multi-device inferences, while operating under bandwidths as low as 10 Mbps. ASTRA is open-sourced at https://github.com/xl1990/Astra.
Article 1580
Title@2025-05-25 (7): Flow Q-Learning
Title: Flow Q-Learning | Fluss Q-Lernen | 流动学习 2502.02538v2 |
Authors: Seohong Park, Qiyang Li, Sergey Levine
We present flow Q-learning (FQL), a simple and performant offline reinforcement learning (RL) method that leverages an expressive flow-matching policy to model arbitrarily complex action distributions in data. Training a flow policy with RL is a tricky problem, due to the iterative nature of the action generation process. We address this challenge by training an expressive one-step policy with RL, rather than directly guiding an iterative flow policy to maximize values. This way, we can completely avoid unstable recursive backpropagation, eliminate costly iterative action generation at test time, yet still mostly maintain expressivity. We experimentally show that FQL leads to strong performance across 73 challenging state- and pixel-based OGBench and D4RL tasks in offline RL and offline-to-online RL. Project page: https://seohong.me/projects/fql/
Article 1581
Title@2025-05-25 (7): Improving Compositional Generation with Diffusion Models Using Lift Scores
Title: Improving Compositional Generation with Diffusion Models Using Lift Scores | Verbesserung der kompositorischen Generierung mit Diffusionsmodellen mit Lift-Scores | 利用使用提升分数的传播模型改善组成型 2505.13740v2 |
Authors: Chenning Yu, Sicun Gao
We introduce a novel resampling criterion using lift scores, for improving compositional generation in diffusion models. By leveraging the lift scores, we evaluate whether generated samples align with each single condition and then compose the results to determine whether the composed prompt is satisfied. Our key insight is that lift scores can be efficiently approximated using only the original diffusion model, requiring no additional training or external modules. We develop an optimized variant that achieves relatively lower computational overhead during inference while maintaining effectiveness. Through extensive experiments, we demonstrate that lift scores significantly improved the condition alignment for compositional generation across 2D synthetic data, CLEVR position tasks, and text-to-image synthesis. Our code is available at http://rainorangelemon.github.io/complift.
Article 1582
Title@2025-05-25 (7): TRANSIT your events into a new mass: Fast background interpolation for weakly-supervised anomaly searches
Title: TRANSIT your events into a new mass: Fast background interpolation for weakly-supervised anomaly searches | Übertragen Sie Ihre Ereignisse in eine neue Masse: Schnelle Hintergrundinterpolation für schwach überwachte Anomaliensuche | 将您的事件转换成一个新的质量: 快速背景内插, 用于受微弱监督的异常搜索 2503.04342v2 |
Authors: Ivan Oleksiyuk, Svyatoslav Voloshynovskiy, Tobias Golling
We introduce a new model for conditional and continuous data morphing called TRansport Adversarial Network for Smooth InTerpolation (TRANSIT). We apply it to create a background data template for weakly-supervised searches at the LHC. The method smoothly transforms sideband events to match signal region mass distributions. We demonstrate the performance of TRANSIT using the LHC Olympics R&D dataset. The model captures non-linear mass correlations of features and produces a template that offers a competitive anomaly sensitivity compared to state-of-the-art transport-based template generators. Moreover, the computational training time required for TRANSIT is an order of magnitude lower than that of competing deep learning methods. This makes it ideal for analyses that iterate over many signal regions and signal models. Unlike generative models, which must learn a full probability density distribution, i.e., the correlations between all the variables, the proposed transport model only has to learn a smooth conditional shift of the distribution. This allows for a simpler, more efficient residual architecture, enabling mass uncorrelated features to pass the network unchanged while the mass correlated features are adjusted accordingly. Furthermore, we show that the latent space of the model provides a set of mass decorrelated features useful for anomaly detection without background sculpting.
Article 1583
Title@2025-05-25 (7): WhisperD: Dementia Speech Recognition and Filler Word Detection with Whisper
Title: WhisperD: Dementia Speech Recognition and Filler Word Detection with Whisper | WhisperD: Dementia Spracherkennung und Filler-Worterkennung mit Whisper | 耳语:痴呆症言语识别和用耳语探测填字词 2505.21551v1 |
Authors: Emmanuel Akinrintoyo, Nadine Abdelhalim, Nicole Salomons
Whisper fails to correctly transcribe dementia speech because persons with dementia (PwDs) often exhibit irregular speech patterns and disfluencies such as pauses, repetitions, and fragmented sentences. It was trained on standard speech and may have had little or no exposure to dementia-affected speech. However, correct transcription is vital for dementia speech for cost-effective diagnosis and the development of assistive technology. In this work, we fine-tune Whisper with the open-source dementia speech dataset (DementiaBank) and our in-house dataset to improve its word error rate (WER). The fine-tuning also includes filler words to ascertain the filler inclusion rate (FIR) and F1 score. The fine-tuned models significantly outperformed the off-the-shelf models. The medium-sized model achieved a WER of 0.24, outperforming previous work. Similarly, there was a notable generalisability to unseen data and speech patterns.
Article 1584
Title@2025-05-25 (7): Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales
Title: Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales | LLM Absolute Relevanz Urteile auf feinkörnigen Ordinalwaagen | 理論或非理論:LLM 关于精准奥氏比额的绝对相关性判决 2505.19334v1 |
Authors: Charles Godfrey, Ping Nie, Natalia Ostapuk, David Ken, Shang Gao, Souheil Inati
Large language models (LLMs) obtain state-of-the-art zero-shot relevance ranking performance on a variety of information retrieval tasks. The two most common prompts to elicit LLM relevance judgments are pointwise scoring (a.k.a. relevance generation), where the LLM sees a single query-document pair and outputs a single relevance score, and listwise ranking (a.k.a. permutation generation), where the LLM sees a query and a list of documents and outputs a permutation, sorting the documents in decreasing order of relevance. The current research community consensus is that listwise ranking yields superior performance, and significant research effort has been devoted to crafting LLM listwise ranking algorithms. The underlying hypothesis is that LLMs are better at making relative relevance judgments than absolute ones. In tension with this hypothesis, we find that the gap between pointwise scoring and listwise ranking shrinks when pointwise scoring is implemented using a sufficiently large ordinal relevance label space, becoming statistically insignificant for many LLM-benchmark dataset combinations (where "significant" means "95% confidence that listwise ranking improves NDCG@10"). Our evaluations span four LLMs, eight benchmark datasets from the BEIR and TREC-DL suites, and two proprietary datasets with relevance labels collected after the training cut-off of all LLMs evaluated.
Article 1585
Title@2025-05-25 (7): Bayesian Comparisons Between Representations
Title: Bayesian Comparisons Between Representations | Bayesische Vergleiche zwischen Repräsentationen | 代表之间的贝叶比较 2411.08739v3 |
Authors: Heiko H. Schütt
Which neural networks are similar is a fundamental question for both machine learning and neuroscience. Here, it is proposed to base comparisons on the predictive distributions of linear readouts from intermediate representations. In Bayesian statistics, the prior predictive distribution is a full description of the inductive bias and generalization of a model, making it a great basis for comparisons. This distribution directly gives the evidence a dataset would provide in favor of the model. If we want to compare multiple models to each other, we can use a metric for probability distributions like the Jensen-Shannon distance or the total variation distance. As these are metrics, this induces pseudo-metrics for representations, which measure how well two representations could be distinguished based on a linear read out. For a linear readout with a Gaussian prior on the read-out weights and Gaussian noise, we can analytically compute the (prior and posterior) predictive distributions without approximations. These distributions depend only on the linear kernel matrix of the representations in the model. Thus, the Bayesian metrics connect to both linear read-out based comparisons and kernel based metrics like centered kernel alignment and representational similarity analysis. The new methods are demonstrated with deep neural networks trained on ImageNet-1k comparing them to each other and a small subset of the Natural Scenes Dataset. The Bayesian comparisons are correlated to but distinct from existing metrics. Evaluations vary slightly less across random image samples and yield informative results with full uncertainty information. Thus the proposed Bayesian metrics nicely extend our toolkit for comparing representations.
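The claim that the predictive distributions depend only on the linear kernel matrix can be checked numerically for the prior predictive. For a readout $y = \Phi w + \varepsilon$ with $w \sim \mathcal{N}(0, \sigma_w^2 I)$ and $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$, the prior predictive is $\mathcal{N}(0, \sigma_w^2 \Phi\Phi^\top + \sigma^2 I)$, a standard result for Gaussian linear models. The sketch below is only an illustration of that fact, not the paper's code; `weight_var`, `noise_var`, and the random features are assumptions.

```python
import numpy as np

def prior_predictive_cov(features, weight_var=1.0, noise_var=0.1):
    """Covariance of y = features @ w + noise under zero-mean Gaussian w and noise."""
    kernel = features @ features.T                   # linear kernel matrix
    return weight_var * kernel + noise_var * np.eye(features.shape[0])

rng = np.random.default_rng(0)
phi = rng.normal(size=(5, 3))                        # 5 stimuli, 3 feature dimensions
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))         # random orthogonal matrix
rotated = phi @ q                                    # same kernel, different features

same = np.allclose(prior_predictive_cov(phi), prior_predictive_cov(rotated))
print(same)
```

Two representations that differ by a rotation of feature space share a kernel matrix and therefore a prior predictive distribution, so a metric between predictive distributions would assign them distance zero.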
Article 1586
Title@2025-05-25 (7): Paying Alignment Tax with Contrastive Learning
Title: Paying Alignment Tax with Contrastive Learning | Steuern mit kontraproduktivem Lernen ausgleichen | 与反向学习支付一致税 2505.19327v1 |
Authors: Buse Sibel Korkmaz, Rahul Nair, Elizabeth M. Daly, Antonio del Rio Chanona
Current debiasing approaches often result in a degradation of model capabilities such as factual accuracy and knowledge retention. Through systematic evaluation across multiple benchmarks, we demonstrate that existing debiasing methods face fundamental trade-offs, particularly in smaller models, leading to reduced truthfulness, knowledge loss, or unintelligible outputs. To address these limitations, we propose a contrastive learning framework that learns through carefully constructed positive and negative examples. Our approach introduces contrast computation and dynamic loss scaling to balance bias mitigation with faithfulness preservation. Experimental results across multiple model scales demonstrate that our method achieves substantial improvements in both toxicity reduction and faithfulness preservation. Most importantly, we show that our framework is the first to consistently improve both metrics simultaneously, avoiding the capability degradation characteristic of existing approaches. These results suggest that explicit modeling of both positive and negative examples through contrastive learning could be a promising direction for reducing the alignment tax in language model debiasing.
Article 1587
Title@2025-05-25 (7): An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces
Title: An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces | Eine Adversarial Analyse von Thompson Sampling für Full-Information Online-Lernen: von Finite zu Unendlichen Aktionsräumen | 对Thompson网上全面信息学习抽样分析:从有限到无限行动空间 2502.14790v4 |
Authors: Alexander Terenin, Jeffrey Negrea
We develop a form of Thompson sampling for online learning under full feedback - also known as prediction with expert advice - where the learner’s prior is defined over the space of an adversary’s future actions, rather than the space of experts. We show that regret decomposes into the regret the learner expected a priori, plus a prior-robustness-type term we call excess regret. In the classical finite-expert setting, this recovers optimal rates. As an initial step towards practical online learning in settings with a potentially uncountably infinite number of experts, we show that Thompson sampling over the $d$-dimensional unit cube, using a certain Gaussian process prior widely used in the Bayesian optimization literature, has a $\mathcal{O}\Big(\beta\sqrt{Td\log(1+\sqrt{d}\frac{\lambda}{\beta})}\Big)$ rate against a $\beta$-bounded $\lambda$-Lipschitz adversary.
Article 1588
Title@2025-05-25 (7): Regress, Don’t Guess – A Regression-like Loss on Number Tokens for Language Models
Title: Regress, Don’t Guess – A Regression-like Loss on Number Tokens for Language Models | Regress, nicht raten – Ein Rückschritt-ähnlicher Verlust an Zahlenzeichen für Sprachmodelle | Regress, don’t guess - 语言模型数字调的回归式损失 2411.02083v2 |
Authors: Jonas Zausinger, Lars Pennig, Anamarija Kozina, Sean Sdahl, Julian Sikora, Adrian Dendorfer, Timofey Kuznetsov, Mohamad Hagog, Nina Wiedemann, Kacper Chlodny, Vincent Limbach, Anna Ketteler, Thorben Prein, Vishwa Mohan Singh, Michael Morris Danziger, Jannis Born
While language models have exceptional capabilities at text generation, they lack a natural inductive bias for emitting numbers and thus struggle in tasks involving quantitative reasoning, especially arithmetic. One fundamental limitation is the nature of the Cross Entropy loss, which assumes a nominal scale and thus cannot convey proximity between generated number tokens. In response, we here present a regression-like loss that operates purely on token level. Our proposed Number Token Loss (NTL) comes in two flavors and minimizes either the Lp norm or the Wasserstein distance between the numerical values of the real and predicted number tokens. NTL can easily be added to any language model and extend the Cross Entropy objective during training without runtime overhead. We evaluate the proposed scheme on various mathematical datasets and find that it consistently improves performance in math-related tasks. In a direct comparison on a regression task, we find that NTL can match the performance of a regression head, despite operating on token level. Finally, we scale NTL up to 3B parameter models and observe improved performance, demonstrating its potential for seamless integration into LLMs. We hope that this work can inspire LLM developers to improve their pretraining objectives. The code is available via: https://tum-ai.github.io/number-token-loss/
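To make the contrast with cross entropy concrete, here is a minimal sketch of one plausible reading of the two flavors, restricted to single-digit tokens. The squared-error variant (the p = 2 case) compares the target value with the expected value under the predicted token distribution; the Wasserstein variant is the 1-Wasserstein distance between the predicted distribution and a point mass at the target, which reduces to the expected absolute difference. The function names and toy distributions are assumptions for illustration and are not taken from the authors' code.

```python
import math


def cross_entropy(probs, values, target):
    """Standard cross entropy: only the probability of the exact target token matters."""
    return -math.log(probs[values.index(target)])


def ntl_mse(probs, values, target):
    """Squared error between the target and the expected numeric value of the prediction."""
    expected = sum(p * v for p, v in zip(probs, values))
    return (target - expected) ** 2


def ntl_was(probs, values, target):
    """1-Wasserstein distance between the predicted distribution and a point mass at target."""
    return sum(p * abs(v - target) for p, v in zip(probs, values))


digits = list(range(10))


def dist(mass):
    """Build a probability vector over the digits 0..9 from a sparse dict."""
    return [mass.get(d, 0.0) for d in digits]


# Both predictions put 0.1 on the correct digit 4; they differ in where the rest goes.
near = dist({4: 0.1, 5: 0.9})  # wrong, but numerically close
far = dist({4: 0.1, 9: 0.9})   # wrong, and numerically far

for name, p in (("near", near), ("far", far)):
    print(name, cross_entropy(p, digits, 4), ntl_mse(p, digits, 4), ntl_was(p, digits, 4))
```

Cross entropy is identical for the two predictions, while both number-aware terms penalize the "far" prediction more heavily, which is the proximity signal the abstract says the nominal-scale loss cannot convey.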
Article 1589
Title@2025-05-25 (7): PIGPVAE: Physics-Informed Gaussian Process Variational Autoencoders
Title: PIGPVAE: Physics-Informed Gaussian Process Variational Autoencoders | PIGPVAE: Physik-informierte Gauß-Prozessvariationelle Autoencoder | PIGPVAE: 物理化高斯进程变异自动编码器 2505.19320v1 |
Authors: Michail Spitieris, Massimiliano Ruocco, Abdulmajid Murad, Alessandro Nocente
Recent advances in generative AI offer promising solutions for synthetic data generation but often rely on large datasets for effective training. To address this limitation, we propose a novel generative model that learns from limited data by incorporating physical constraints to enhance performance. Specifically, we extend the VAE architecture by incorporating physical models in the generative process, enabling it to capture underlying dynamics more effectively. While physical models provide valuable insights, they struggle to capture complex temporal dependencies present in real-world data. To bridge this gap, we introduce a discrepancy term to account for unmodeled dynamics, represented within a latent Gaussian Process VAE (GPVAE). Furthermore, we apply regularization to ensure the generated data aligns closely with observed data, enhancing both the diversity and accuracy of the synthetic samples. The proposed method is applied to indoor temperature data, achieving state-of-the-art performance. Additionally, we demonstrate that PIGPVAE can produce realistic samples beyond the observed distribution, highlighting its robustness and usefulness under distribution shifts.
Article 1590
Title@2025-05-25 (7): Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
Title: Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data? | Sind Transformer durch die Verbindung getrennter Kenntnisse in Trainingsdaten in der Lage, Vernunft zu erreichen? | 将培训数据方面的单独知识连接起来的变换者是否具有理性? 2501.15857v6 |
Authors: Yutong Yin, Zhaoran Wang
Humans exhibit remarkable compositional reasoning by integrating knowledge from various sources. For example, if someone learns $B = f(A)$ from one source and $C = g(B)$ from another, they can deduce $C = g(B) = g(f(A))$ even without encountering $ABC$ together, showcasing the generalization ability of human intelligence. In this paper, we introduce a synthetic learning task, “FTCT” (Fragmented at Training, Chained at Testing), to validate the potential of Transformers in replicating this skill and interpret its inner mechanism. In the training phase, data consist of separated knowledge fragments from an overall causal graph. During testing, Transformers must infer complete causal graph traces by integrating these fragments. Our findings demonstrate that few-shot Chain-of-Thought prompting enables Transformers to perform compositional reasoning on FTCT by revealing correct combinations of fragments, even if such combinations were absent in the training data. Furthermore, the emergence of compositional reasoning ability is strongly correlated with the model complexity and training-testing data similarity. We propose, both theoretically and empirically, that Transformers learn an underlying generalizable program from training, enabling effective compositional reasoning during testing.
Article 1591
Title@2025-05-25 (7): Effort-aware Fairness: Incorporating a Philosophy-informed, Human-centered Notion of Effort into Algorithmic Fairness Metrics
Title: Effort-aware Fairness: Incorporating a Philosophy-informed, Human-centered Notion of Effort into Algorithmic Fairness Metrics | Effort-aware Fairness: Aufnahme einer philosophisch-informierten, menschlich-zentrierten Nennung von Effort in algorithmische Fairness-Metriken | 努力做到公平:将了解哲学、以人为中心的努力理念纳入到算法公平度量中 2505.19317v1 |
Authors: Tin Nguyen, Jiannan Xu, Zora Che, Phuong-Anh Nguyen-Le, Rushil Dandamudi, Donald Braman, Furong Huang, Hal Daumé III, Zubin Jelveh
Although popularized AI fairness metrics, e.g., demographic parity, have uncovered bias in AI-assisted decision-making outcomes, they do not consider how much effort one has spent to get to where one is today in the input feature space. However, the notion of effort is important in how Philosophy and humans understand fairness. We propose a philosophy-informed way to conceptualize and evaluate Effort-aware Fairness (EaF) based on the concept of Force, or temporal trajectory of predictive features coupled with inertia. In addition to our theoretical formulation of EaF metrics, our empirical contributions include: 1/ a pre-registered human subjects experiment, which demonstrates that for both stages of the (individual) fairness evaluation process, people consider the temporal trajectory of a predictive feature more than its aggregate value; 2/ pipelines to compute Effort-aware Individual/Group Fairness in the criminal justice and personal finance contexts. Our work may enable AI model auditors to uncover and potentially correct unfair decisions against individuals who spent significant efforts to improve but are still stuck with systemic/early-life disadvantages outside their control.
Article 1592
Title@2025-05-25 (7): Demand Selection for VRP with Emission Quota
Title: Demand Selection for VRP with Emission Quota | Auswahl der Nachfrage nach VRP mit Emissionsquoten | 具有排放配额的VRP需求选择 2505.19315v1 |
Authors: Farid Najar, Dominique Barth, Yann Strozecki
Combinatorial optimization (CO) problems are traditionally addressed using Operations Research (OR) methods, including metaheuristics. In this study, we introduce a demand selection problem for the Vehicle Routing Problem (VRP) with an emission quota, referred to as QVRP. The objective is to minimize the number of omitted deliveries while respecting the pollution quota. We focus on the demand selection part, called Maximum Feasible Vehicle Assignment (MFVA), while the construction of a routing for the VRP instance is solved using classical OR methods. We propose several methods for selecting the packages to omit, both from machine learning (ML) and OR. Our results show that, in this static problem setting, classical OR-based methods consistently outperform ML-based approaches.
Article 1593
Title@2025-05-25 (7): Concept Reachability in Diffusion Models: Beyond Dataset Constraints
Title: Concept Reachability in Diffusion Models: Beyond Dataset Constraints | Konzept-Erreichbarkeit in Diffusions-Modellen: Jenseits von Datensatzbeschränkungen | 传播模型中可达到的概念:超越数据集的制约 2505.19313v1 |
Authors: Marta Aparicio Rodriguez, Xenia Miscouridou, Anastasia Borovykh
Despite significant advances in the quality and complexity of the generations in text-to-image models, prompting does not always lead to the desired outputs. Controlling model behaviour by directly steering intermediate model activations has emerged as a viable alternative, making it possible to reach concepts in latent space that may otherwise remain inaccessible through prompting. In this work, we introduce a set of experiments to deepen our understanding of concept reachability. We design a training data setup with three key obstacles: scarcity of concepts, underspecification of concepts in the captions, and data biases with tied concepts. Our results show: (i) concept reachability in latent space exhibits a distinct phase transition, with only a small number of samples being sufficient to enable reachability, (ii) where in the latent space the intervention is performed critically impacts reachability, showing that certain concepts are reachable only at certain stages of transformation, and (iii) while prompting ability rapidly diminishes with a decrease in quality of the dataset, concepts often remain reliably reachable through steering. Model providers can leverage this to bypass costly retraining and dataset curation and instead innovate with user-facing control mechanisms.
Article 1594
Title@2025-05-25 (7): Stochastic Hessian Fittings with Lie Groups
Title: Stochastic Hessian Fittings with Lie Groups | Stochastische hessische Beschläge mit Lie Groups | 配有谎言组的假体装配机 2402.11858v5 |
Authors: Xi-Lin Li
This report investigates the fitting of the Hessian or its inverse for stochastic optimization using a Hessian fitting criterion derived from the preconditioned stochastic gradient descent (PSGD) method. This criterion is closely related to many widely used second-order and adaptive gradient optimization methods, including BFGS, the Gauss-Newton algorithm, natural gradient descent, and AdaGrad. Our analyses reveal the efficiency and reliability differences of a broad range of preconditioner fitting methods, ranging from closed-form to iterative approaches, using Hessian-vector products or stochastic gradients only, with Hessian fittings across various geometric settings (the Euclidean space, the manifold of symmetric positive definite (SPD) matrices and a variety of Lie groups). The most intriguing finding is that the Hessian fitting problem is strongly convex under mild conditions in certain general Lie groups. This result turns Hessian fitting into a well-behaved Lie group optimization problem and facilitates the design of highly efficient and elegant Lie group sparse preconditioner fitting methods for large-scale stochastic optimization.
Article 1595
Title@2025-05-25 (7): Fractional-Boundary-Regularized Deep Galerkin Method for Variational Inequalities in Mixed Optimal Stopping and Control
Title: Fractional-Boundary-Regularized Deep Galerkin Method for Variational Inequalities in Mixed Optimal Stopping and Control | Fraktional-Boundary-Regularized Deep Galerkin-Methode für unterschiedliche Ungleichheiten in gemischten Optimalen Stoppen und Steuern | 用于混合最佳制止和控制中差异性不平等的 分数-界分- 常规深加热法 2505.19309v1 |
Authors: Yun Zhao, Harry Zheng
Mixed optimal stopping and stochastic control problems define variational inequalities with non-linear Hamilton-Jacobi-Bellman (HJB) operators, whose numerical solution is notoriously difficult and lacks reliable benchmarks. We first use the dual approach to transform the HJB operator into a linear operator, and then introduce a Fractional-Boundary-Regularized Deep Galerkin Method (FBR-DGM) that augments the classical $L^2$ loss with Sobolev-Slobodeckij norms on the parabolic boundary, enforcing regularity and yielding consistent improvements in the network approximation and its derivatives. The improved accuracy allows the network to be converted back to the original solution using the dual transform. The self-consistency and stability of the network can be tested by checking the primal-dual relationship among optimal value, optimal wealth, and optimal control, offering innovative benchmarks in the absence of analytical solutions.
Article 1596
Title@2025-05-25 (7): From Single Images to Motion Policies via Video-Generation Environment Representations
Title: From Single Images to Motion Policies via Video-Generation Environment Representations | Von Einzelbildern zu Motion Policies über Video-Generation Umweltvertretungen | 从单一图像到通过视频环境代表从单一图像到运动政策 2505.19306v1 |
Authors: Weiming Zhi, Ziyong Ma, Tianyi Zhang, Matthew Johnson-Roberson
Autonomous robots typically need to construct representations of their surroundings and adapt their motions to the geometry of their environment. Here, we tackle the problem of constructing a policy model for collision-free motion generation, consistent with the environment, from a single input RGB image. Extracting 3D structures from a single image often involves monocular depth estimation. Developments in depth estimation have given rise to large pre-trained models such as DepthAnything. However, using outputs of these models for downstream motion generation is challenging due to frustum-shaped errors that arise. Instead, we propose a framework known as Video-Generation Environment Representation (VGER), which leverages the advances of large-scale video generation models to generate a moving camera video conditioned on the input image. Frames of this video, which form a multiview dataset, are then input into a pre-trained 3D foundation model to produce a dense point cloud. We then introduce a multi-scale noise approach to train an implicit representation of the environment structure and build a motion generation model that complies with the geometry of the representation. We extensively evaluate VGER over a diverse set of indoor and outdoor environments. We demonstrate its ability to produce smooth motions that account for the captured geometry of a scene, all from a single RGB input image.
Article 1597
Title@2025-05-25 (7): Time Series Embedding Methods for Classification Tasks: A Review
Title: Time Series Embedding Methods for Classification Tasks: A Review | Zeitreihen Einbetten von Methoden für die Klassifizierung Aufgaben: Eine Überprüfung | 分类任务所含方法:审查 2501.13392v2 |
Authors: Habib Irani, Yasamin Ghahremani, Arshia Kermani, Vangelis Metsis
Time series analysis has become crucial in various fields, from engineering and finance to healthcare and social sciences. Due to their multidimensional nature, time series often need to be embedded into a fixed-dimensional feature space to enable processing with various machine learning algorithms. In this paper, we present a comprehensive review and quantitative evaluation of time series embedding methods for effective representations in machine learning and deep learning models. We introduce a taxonomy of embedding techniques, categorizing them based on their theoretical foundations and application contexts. Our work provides a quantitative evaluation of representative methods from each category by assessing their performance on downstream classification tasks across diverse real-world datasets. Our experimental results demonstrate that the performance of embedding methods varies significantly depending on the dataset and classification algorithm used, highlighting the importance of careful model selection and extensive experimentation for specific applications. To facilitate further research and practical applications, we provide an open-source code repository implementing these embedding methods. This study contributes to the field by offering a systematic comparison of time series embedding techniques, guiding practitioners in selecting appropriate methods for their specific applications, and providing a foundation for future advancements in time series analysis.
Article 1598
Title@2025-05-25 (7): LLM-Based Emulation of the Radio Resource Control Layer: Towards AI-Native RAN Protocols
Title: LLM-Based Emulation of the Radio Resource Control Layer: Towards AI-Native RAN Protocols | LLM-basierte Emulation der Funkressourcenkontrollschicht: Auf dem Weg zu KI-Native RAN-Protokollen | 基于LLM的无线电资源控制层模拟模拟无线电资源控制层:迈向AI-NTRAN议定书 2505.16821v2 |
Authors: Ziming Liu, Bryan Liu, Alvaro Valcarce, Xiaoli Chu
Integrating large AI models (LAMs) into 6G mobile networks promises to redefine protocol design and control-plane intelligence by enabling autonomous, cognitive network operations. While industry concepts, such as ETSI’s Experiential Networked Intelligence (ENI), envision LAM-driven agents for adaptive network slicing and intent-based management, practical implementations still face challenges in protocol literacy and real-world deployment. This paper presents an end-to-end demonstration of a LAM that generates standards-compliant, ASN.1-encoded Radio Resource Control (RRC) messages as part of control-plane procedures inside a gNB. We treat RRC messaging as a domain-specific language and fine-tune a decoder-only transformer model (LLaMA class) using parameter-efficient Low-Rank Adaptation (LoRA) on RRC messages linearized to retain their ASN.1 syntactic structure before standard byte-pair encoding tokenization. This enables combinatorial generalization over RRC protocol states while minimizing training overhead. On 30k field-test request-response pairs, our 8B model achieves a median cosine similarity of 0.97 with ground-truth messages on an edge GPU – a 61% relative gain over a zero-shot LLaMA-3 8B baseline – indicating substantially improved structural and semantic RRC fidelity. Overall, our results show that LAMs, when augmented with Radio Access Network (RAN)-specific reasoning, can directly orchestrate control-plane procedures, representing a stepping stone toward the AI-native air-interface paradigm. Beyond RRC emulation, this work lays the groundwork for future AI-native wireless standards.
Article 1599
Title@2025-05-25 (7): On the status of current quantum machine learning software
Title: On the status of current quantum machine learning software | Zum Status der aktuellen Quantenmaschinen-Lernsoftware | 关于当前量子机器学习软件现状 2503.08962v2 |
Authors: Manish K. Gupta, Tomasz Rybotycki, Piotr Gawron
Recent advancements in the implementation of noisy intermediate-scale quantum (NISQ) devices allow us to study their application to real-life computational problems. However, hardware challenges are not the only ones that hinder our quantum computation capabilities. Software limitations are the other, less explored side of this coin. Using satellite image segmentation as an example task, we investigated how difficult it is to run a hybrid quantum-classical model on a real, publicly available quantum device. We also analyzed the costs of such an endeavor and the change in model quality.
Article 1600
Title@2025-05-25 (7): 100-LongBench: Are de facto Long-Context Benchmarks Literally Evaluating Long-Context Ability?
Title: 100-LongBench: Are de facto Long-Context Benchmarks Literally Evaluating Long-Context Ability? | 100-LongBench: Sind de facto Long-Context-Benchmarks wortwörtlich die Lang-Context-Fähigkeit zu bewerten? | 100-LongBench:事实上的长文本基准是否实际评价长文本能力? 2505.19293v1 |
Authors: Wang Yang, Hongye Jin, Shaochen Zhong, Song Jiang, Qifan Wang, Vipin Chaudhary, Xiaotian Han
Long-context capability is considered one of the most important abilities of LLMs, as a truly long context-capable LLM enables users to effortlessly process many originally exhausting tasks – e.g., digesting a long-form document to find answers vs. directly asking an LLM about it. However, existing real-task-based long-context evaluation benchmarks have two major shortcomings. First, benchmarks like LongBench often do not provide proper metrics to separate long-context performance from the model’s baseline ability, making cross-model comparison unclear. Second, such benchmarks are usually constructed with fixed input lengths, which limits their applicability across different models and fails to reveal when a model begins to break down. To address these issues, we introduce a length-controllable long-context benchmark and a novel metric that disentangles baseline knowledge from true long-context capabilities. Experiments demonstrate the superiority of our approach in effectively evaluating LLMs.
Article 1601
Title@2025-05-25 (7): Hypercube-RAG: Hypercube-Based Retrieval-Augmented Generation for In-domain Scientific Question-Answering
Title: Hypercube-RAG: Hypercube-Based Retrieval-Augmented Generation for In-domain Scientific Question-Answering | Hypercube-RAG: Hypercube-based Retrieval-Augmented Generation for In-domain Scientific Question-Answering | Hypercube-RAG: 内地科学问题解答的超立方体回收回溯性养代 2505.19288v1 |
Authors: Jimeng Shi, Sizhe Zhou, Bowen Jin, Wei Hu, Shaowen Wang, Giri Narasimhan, Jiawei Han
Large language models (LLMs) often need to incorporate external knowledge to solve theme-specific problems. Retrieval-augmented generation (RAG), which empowers LLMs to generate more qualified responses with retrieved external data and knowledge, has shown its high promise. However, traditional semantic similarity-based RAGs struggle to return concise yet highly relevant information for domain knowledge-intensive tasks, such as scientific question-answering (QA). Built on a multi-dimensional (cube) structure called Hypercube, which can index documents in an application-driven, human-defined, multi-dimensional space, we introduce the Hypercube-RAG, a novel RAG framework for precise and efficient retrieval. Given a query, Hypercube-RAG first decomposes it based on its entities and topics and then retrieves relevant documents from cubes by aligning these decomposed components with hypercube dimensions. Experiments on three in-domain scientific QA datasets demonstrate that our method improves accuracy by 3.7% and boosts retrieval efficiency by 81.2%, measured as relative gains over the strongest RAG baseline. More importantly, our Hypercube-RAG inherently offers explainability by revealing the underlying predefined hypercube dimensions used for retrieval. The code and data sets are available at https://github.com/JimengShi/Hypercube-RAG.
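The retrieval step described above can be sketched as an inverted index per dimension, with a query matched by intersecting the hits from each dimension it mentions. This is only an illustration of the idea: the class, the dimension names ("location", "event"), and the toy documents are hypothetical, the query is assumed to be already decomposed into per-dimension values, and the authors' implementation in the linked repository may differ substantially.

```python
from collections import defaultdict


class Hypercube:
    """Toy multi-dimensional index: one inverted index per human-defined dimension."""

    def __init__(self, dimensions):
        self.index = {dim: defaultdict(set) for dim in dimensions}

    def add(self, doc_id, labels):
        """Register a document under its values for each dimension."""
        for dim, values in labels.items():
            for value in values:
                self.index[dim][value].add(doc_id)

    def retrieve(self, query_components):
        """Return documents matching at least one value in every queried dimension."""
        hits = None
        for dim, values in query_components.items():
            matched = set()
            for value in values:
                matched |= self.index[dim].get(value, set())
            hits = matched if hits is None else hits & matched
        return sorted(hits or [])


cube = Hypercube(["location", "event"])
cube.add("d1", {"location": ["miami"], "event": ["hurricane"]})
cube.add("d2", {"location": ["miami"], "event": ["flood"]})
cube.add("d3", {"location": ["houston"], "event": ["hurricane"]})

# Hand-written decomposition of a query such as "hurricane impacts in Miami".
print(cube.retrieve({"location": ["miami"], "event": ["hurricane"]}))
print(cube.retrieve({"event": ["hurricane"]}))
```

The dimensions that produced a hit are explicit in the query components, which is one way to picture the explainability claim in the abstract: a retrieved document can be justified by the cube cells it was found in.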
Article 1602
Title@2025-05-25 (7): Provably Overwhelming Transformer Models with Designed Inputs
Title: Provably Overwhelming Transformer Models with Designed Inputs | Wahrscheinlich überwältigende Transformer-Modelle mit designten Eingängen | 具有设计投入的、可预见地压得压得压倒的变压器模型 2502.06038v2 |
Authors: Lev Stambler, Seyed Sajjad Nezhadi, Matthew Coudron
We develop an algorithm which, given a trained transformer model $\mathcal{M}$ as input, as well as a string of tokens $s$ of length $n_{fix}$ and an integer $n_{free}$, can generate a mathematical proof that $\mathcal{M}$ is “overwhelmed” by $s$, in time and space $\widetilde{O}(n_{fix}^2 + n_{free}^3)$. We say that $\mathcal{M}$ is “overwhelmed” by $s$ when the output of the model evaluated on this string plus any additional string $t$, $\mathcal{M}(s + t)$, is completely insensitive to the value of the string $t$ whenever length($t$) $\leq n_{free}$. Along the way, we prove a particularly strong worst-case form of “over-squashing”, which we use to bound the model’s behavior. Our technique uses computer-aided proofs to establish this type of operationally relevant guarantee about transformer models. We empirically test our algorithm on a single layer transformer complete with an attention head, layer-norm, MLP/ReLU layers, and RoPE positional encoding. We believe that this work is a stepping stone towards the difficult task of obtaining useful guarantees for trained transformer models.
Article 1603
Title@2025-05-25 (7): A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
Title: A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning | Eine Momentaufnahme des Einflusses: Ein lokales Daten-Attributions-Framework für Online-Verstärkungs-Lernen | 《影响概览:在线强化学习地方数据归属框架》 2505.19281v1 |
Authors: Yuzheng Hu, Fan Wu, Haotian Ye, David Forsyth, James Zou, Nan Jiang, Jiaqi W. Ma, Han Zhao
Online reinforcement learning (RL) excels in complex, safety-critical domains, yet it faces challenges such as sample inefficiency, training instability, and a lack of interpretability. Data attribution offers a principled way to trace model behavior back to individual training samples. However, in online RL, each training sample not only drives policy updates but also influences future data collection, violating the fixed dataset assumption in existing attribution methods. In this paper, we initiate the study of data attribution for online RL, focusing on the widely used Proximal Policy Optimization (PPO) algorithm. We start by establishing a local attribution framework, interpreting model checkpoints with respect to the records in the recent training buffer. We design two target functions, capturing agent action and cumulative return respectively, and measure each record’s contribution through gradient similarity between its training loss and these targets. We demonstrate the power of this framework through three concrete applications: diagnosis of learning, temporal analysis of behavior formation, and targeted intervention during training. Leveraging this framework, we further propose an algorithm, iterative influence-based filtering (IIF), for online RL training that iteratively performs experience filtering to refine policy updates. From standard RL benchmarks (classic control, navigation, locomotion) to RLHF for large language models, IIF reduces sample complexity, speeds up training, and achieves higher returns. Overall, these results advance interpretability, efficiency, and effectiveness of online RL.
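A gradient-similarity score of the kind described above can be sketched as a dot product between each record's training-loss gradient and the gradient of a target function, followed by a filtering step. Several things here are assumptions made for illustration: the use of a plain dot product as the similarity, the sign convention (the target gradient is taken for a quantity to be decreased, so a positive score marks a helpful record), the rule of keeping only positively scored records, and all names and numbers. The actual IIF procedure in the paper may differ.

```python
def dot(u, v):
    """Inner product of two equally sized gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def influence_scores(record_grads, target_grad):
    """Score each buffer record by the alignment of its loss gradient with the target gradient.

    Assumed convention: target_grad is the gradient of a quantity we want to decrease
    (e.g. negative return), so a descent step on a record with a positive score
    also decreases the target to first order.
    """
    return [dot(g, target_grad) for g in record_grads]


def filter_buffer(records, record_grads, target_grad):
    """Keep only the records whose score is positive (hypothetical filtering rule)."""
    scores = influence_scores(record_grads, target_grad)
    return [r for r, s in zip(records, scores) if s > 0]


# Toy buffer of three records with made-up two-parameter gradients.
records = ["rec0", "rec1", "rec2"]
record_grads = [[1.0, 0.0], [-1.0, 0.5], [0.5, 0.5]]
target_grad = [1.0, 1.0]

print(influence_scores(record_grads, target_grad))
print(filter_buffer(records, record_grads, target_grad))
```

In an iterative scheme, this scoring and filtering would be repeated on each fresh training buffer before the policy update, since in online RL the buffer itself changes with the policy.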
Article 1604
Title@2025-05-25 (7): Optimal Transport Barycenter via Nonconvex-Concave Minimax Optimization
Title: Optimal Transport Barycenter via Nonconvex-Concave Minimax Optimization | Optimaler Transport Barycenter über Nonconvex-Concave Minimax-Optimierung | 通过非 connconvex- concave Minimax 优化化优化运输博利中心 2501.14635v2 |
Authors: Kaheon Kim, Rentian Yao, Changbo Zhu, Xiaohui Chen
The optimal transport barycenter (a.k.a. Wasserstein barycenter) is a fundamental notion of averaging that extends from the Euclidean space to the Wasserstein space of probability distributions. Computation of the unregularized barycenter for discretized probability distributions on point clouds is a challenging task when the domain dimension $d > 1$. Most practical algorithms for approximating the barycenter problem are based on entropic regularization. In this paper, we introduce a nearly linear time $O(m \log{m})$ and linear space complexity $O(m)$ primal-dual algorithm, the Wasserstein-Descent $\dot{\mathbb{H}}^1$-Ascent (WDHA) algorithm, for computing the exact barycenter when the input probability density functions are discretized on an $m$-point grid. The key success of the WDHA algorithm hinges on alternating between two different yet closely related Wasserstein and Sobolev optimization geometries for the primal barycenter and dual Kantorovich potential subproblems. Under reasonable assumptions, we establish the convergence rate and iteration complexity of WDHA to its stationary point when the step size is appropriately chosen. Superior computational efficacy, scalability, and accuracy over the existing Sinkhorn-type algorithms are demonstrated on high-resolution (e.g., $1024 \times 1024$ images) 2D synthetic and real data.
Article 1605
Title@2025-05-25 (7): Achieving $\tilde{\mathcal{O}}(1/N)$ Optimality Gap in Restless Bandits through Gaussian Approximation
Title: Achieving $\tilde{\mathcal{O}}(1/N)$ Optimality Gap in Restless Bandits through Gaussian Approximation | Erreichen von $\tilde{\mathcal{O}}(1/N)$ Optimality Gap in ruhelosen Banditen durch Gaußsche Annäherung | 通过高斯近似度实现无休止强盗的最佳差距 $\tilde{\mathcal{O}}(1/N)$ 2410.15003v2 |
Authors: Chen Yan, Weina Wang, Lei Ying
We study the finite-horizon Restless Multi-Armed Bandit (RMAB) problem with $N$ homogeneous arms. Prior work has shown that when an RMAB satisfies a non-degeneracy condition, Linear-Programming-based (LP-based) policies derived from the fluid approximation, which captures the mean dynamics of the system, achieve an exponentially small optimality gap. However, it is common for RMABs to be degenerate, in which case LP-based policies can result in a $\Theta(1/\sqrt{N})$ optimality gap per arm. In this paper, we propose a novel Stochastic-Programming-based (SP-based) policy that, under a uniqueness assumption, achieves an $\tilde{\mathcal{O}}(1/N)$ optimality gap for degenerate RMABs. Our approach is based on the construction of a Gaussian stochastic system that captures not only the mean but also the variance of the RMAB dynamics, resulting in a more accurate approximation than the fluid approximation. We then solve a stochastic program for this system to obtain our policy. This is the first result to establish an $\tilde{\mathcal{O}}(1/N)$ optimality gap for degenerate RMABs.
Article 1606
Title@2025-05-25 (7): Cellular Traffic Prediction via Byzantine-robust Asynchronous Federated Learning
Title: Cellular Traffic Prediction via Byzantine-robust Asynchronous Federated Learning | Zelluläre Verkehrsvorhersage über byzantinisches-robustes Asynchrones Federated Learning | 通过Byzantine-Robust 亚同步联谊会学习的细胞交通预测 2505.19263v1 |
Authors: Hui Ma, Kai Yang, Yang Jiao
Network traffic prediction plays a crucial role in intelligent network operation. Traditional prediction methods often rely on centralized training, necessitating the transfer of vast amounts of traffic data to a central server. This approach can lead to latency and privacy concerns. To address these issues, federated learning integrated with differential privacy has emerged as a solution to improve data privacy and model robustness in distributed settings. Nonetheless, existing federated learning protocols are vulnerable to Byzantine attacks, which may significantly compromise model robustness. Developing a robust and privacy-preserving prediction model in the presence of Byzantine clients remains a significant challenge. To this end, we propose an asynchronous differential federated learning framework based on distributionally robust optimization. The proposed framework utilizes multiple clients to train the prediction model collaboratively with local differential privacy. In addition, regularization techniques are employed to further improve the Byzantine robustness of the models. We have conducted extensive experiments on three real-world datasets, and the results show that our proposed distributed algorithm achieves superior performance over existing methods.
nan
Article 1607
Title@2025-05-25 (7): Towards a Spatiotemporal Fusion Approach to Precipitation Nowcasting
Title: Towards a Spatiotemporal Fusion Approach to Precipitation Nowcasting | Auf dem Weg zu einem Spatiotemporalen Fusionsansatz zur Niederschlagung von Nowcasting | 迈向对降水即时播送采取相向时间融合办法 2505.19258v1 |
Authors: Felipe Curcio, Pedro Castro, Augusto Fonseca, Rafaela Castro, Raquel Franco, Eduardo Ogasawara, Victor Stepanenko, Fabio Porto, Mariza Ferro, Eduardo Bezerra
With the increasing availability of meteorological data from various sensors, numerical models, and reanalysis products, the need for efficient data integration methods has become paramount for improving weather forecasts and hydrometeorological studies. In this work, we propose a data fusion approach for precipitation nowcasting by integrating data from meteorological and rain gauge stations in the Rio de Janeiro metropolitan area with ERA5 reanalysis data and GFS numerical weather prediction. We employ the spatiotemporal deep learning architecture called STConvS2S, leveraging a structured dataset covering a 9 x 11 grid. The study spans from January 2011 to October 2024, and we evaluate the impact of integrating three surface station systems. Among the tested configurations, the fusion-based model achieves an F1-score of 0.2033 for forecasting heavy precipitation events (greater than 25 mm/h) at a one-hour lead time. Additionally, we present an ablation study to assess the contribution of each station network and propose a refined inference strategy for precipitation nowcasting, integrating the GFS numerical weather prediction (NWP) data with in-situ observations.
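For readers less familiar with the metric quoted above, the following is a minimal sketch of how an F1-score for a binary event such as "heavy precipitation" is computed from confusion counts. The function name and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return the F1-score (harmonic mean of precision and recall)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # With no predicted or no actual events, precision or recall is undefined;
    # this sketch returns 0.0 in that case (a common convention, assumed here).
    if predicted == 0 or actual == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
example_f1 = f1_score(true_positives=2, false_positives=1, false_negatives=1)
```

Because heavy-rain events are rare, F1 is often preferred over accuracy for this kind of task: a model that never predicts an event scores high accuracy but an F1 of zero.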
nan
Article 1608
Title@2025-05-25 (7): Learning-Augmented Online Bipartite Fractional Matching
Title: Learning-Augmented Online Bipartite Fractional Matching | Learning-Augmented Online Bipartite Fraktional Matching | 学习增强的在线双两派人数配对 2505.19252v1 |
Authors: Davin Choo, Billy Jin, Yongho Shin
Online bipartite matching is a fundamental problem in online optimization, extensively studied both in its integral and fractional forms due to its theoretical significance and practical applications, such as online advertising and resource allocation. Motivated by recent progress in learning-augmented algorithms, we study online bipartite fractional matching when the algorithm is given advice in the form of a suggested matching in each iteration. We develop algorithms for both the vertex-weighted and unweighted variants that provably dominate the naive “coin flip” strategy of randomly choosing between the advice-following and advice-free algorithms. Moreover, our algorithm for the vertex-weighted setting extends to the AdWords problem under the small bids assumption, yielding a significant improvement over the seminal work of Mahdian, Nazerzadeh, and Saberi (EC 2007, TALG 2012). Complementing our positive results, we establish a hardness bound on the robustness-consistency tradeoff that is attainable by any algorithm. We empirically validate our algorithms through experiments on synthetic and real-world data.
nan
Article 1609
Title@2025-05-25 (7): Empirical Privacy Variance
Title: Empirical Privacy Variance | Empirische Datenschutzvarianz | 隐私经验差异 2503.12314v2 |
Authors: Yuzheng Hu, Fan Wu, Ruicheng Xian, Yuhang Liu, Lydia Zakynthinou, Pritish Kamath, Chiyuan Zhang, David Forsyth
We propose the notion of empirical privacy variance and study it in the context of differentially private fine-tuning of language models. Specifically, we show that models calibrated to the same $(\varepsilon, \delta)$-DP guarantee using DP-SGD with different hyperparameter configurations can exhibit significant variations in empirical privacy, which we quantify through the lens of memorization. We investigate the generality of this phenomenon across multiple dimensions and discuss why it is surprising and relevant. Through regression analysis, we examine how individual and composite hyperparameters influence empirical privacy. The results reveal a no-free-lunch trade-off: existing practices of hyperparameter tuning in DP-SGD, which focus on optimizing utility under a fixed privacy budget, often come at the expense of empirical privacy. To address this, we propose refined heuristics for hyperparameter selection that explicitly account for empirical privacy, showing that they are both precise and practically useful. Finally, we take preliminary steps to understand empirical privacy variance. We propose two hypotheses, identify limitations in existing techniques like privacy auditing, and outline open questions for future research.
nan
Article 1610
Title@2025-05-25 (7): Improving Value Estimation Critically Enhances Vanilla Policy Gradient
Title: Improving Value Estimation Critically Enhances Vanilla Policy Gradient | Verbesserung der Wertschätzung Kritisch verbessert Vanilla Policy Gradient | 显著加强香草政策梯度 2505.19247v1 |
Authors: Tao Wang, Ruipeng Zhang, Sicun Gao
Modern policy gradient algorithms, such as TRPO and PPO, outperform vanilla policy gradient in many RL tasks. Questioning the common belief that enforcing approximate trust regions leads to steady policy improvement in practice, we show that the more critical factor is the enhanced value estimation accuracy from more value update steps in each iteration. To demonstrate, we show that by simply increasing the number of value update steps per iteration, vanilla policy gradient itself can achieve performance comparable to or better than PPO in all the standard continuous control benchmark environments. Importantly, this simple change to vanilla policy gradient is significantly more robust to hyperparameter choices, opening up the possibility that RL algorithms may still become more effective and easier to use.
nan
Article 1611
Title@2025-05-25 (7): To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers
Title: To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers | To CoT or To Loop? Ein formaler Vergleich zwischen Ketten-of-Thought und Schleiftransformatoren | 尝试链和循环变换器之间的正式比较 2505.19245v1 |
Authors: Kevin Xu, Issei Sato
Chain-of-Thought (CoT) and Looped Transformers have been shown to empirically improve performance on reasoning tasks and to theoretically enhance expressivity by recursively increasing the number of computational steps. However, their comparative capabilities are still not well understood. In this paper, we provide a formal analysis of their respective strengths and limitations. We show that Looped Transformers can efficiently simulate parallel computations for deterministic tasks, which we formalize as evaluation over directed acyclic graphs. In contrast, CoT with stochastic decoding excels at approximate inference for compositional structures, namely self-reducible problems. These separations suggest the tasks for which depth-driven recursion is more suitable, thereby offering practical cues for choosing between reasoning paradigms.
nan
Article 1612
Title@2025-05-25 (7): ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment
Title: ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment | ActiveDPO: Aktive Direktpräferenzoptimierung für eine stichprobeneffiziente Ausrichtung | 主动式DPO:为抽样有效对齐积极直接首选优化 2505.19241v1 |
Authors: Xiaoqiang Lin, Arun Verma, Zhongxiang Dai, Daniela Rus, See-Kiong Ng, Bryan Kian Hsiang Low
The recent success of using human preferences to align large language models (LLMs) has significantly improved their performance in various downstream tasks like question answering, mathematical reasoning, and code generation. However, achieving effective LLM alignment depends on high-quality human preference datasets. Collecting these datasets requires human preference annotation, which is costly and resource-intensive, necessitating efficient active data selection methods. Existing methods either lack a strong theoretical foundation or depend on restrictive reward function assumptions (e.g., linearity). To this end, we propose an algorithm, ActiveDPO, that uses a theoretically grounded data selection criterion for non-linear reward functions while directly leveraging the LLM itself to parameterize the reward model that is used for active data selection. As a result, ActiveDPO explicitly accounts for the influence of LLM on data selection, unlike methods that select the data without considering the LLM that is being aligned, thereby leading to more effective and efficient data collection. Extensive experiments show that ActiveDPO outperforms existing methods across various models and datasets.
nan
Article 1613
Title@2025-05-25 (7): CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling
Title: CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling | CLIP-UP: Ein einfaches und effizientes Mixture-of-Experts CLIP Training Rezept mit Sparse Upcycling | CLIP-UP:一个简单、高效的专家混合体 CLIP 与粗垃圾垃圾垃圾垃圾处理有关的培训名额 2502.00965v2 |
Authors: Xinze Wang, Chen Chen, Yinfei Yang, Hong-You Chen, Bowen Zhang, Aditya Pal, Xiangxin Zhu, Xianzhi Du
Mixture-of-Experts (MoE) models are crucial for scaling model capacity while controlling inference costs. While integrating MoE into multimodal models like CLIP improves performance, training these models is notoriously challenging and expensive. We propose CLIP-Upcycling (CLIP-UP), an efficient alternative training strategy that converts a pre-trained dense CLIP model into a sparse MoE architecture. Through extensive experimentation with various settings and auxiliary losses, we demonstrate that CLIP-UP significantly reduces training complexity and cost. Remarkably, our sparse CLIP B/16 model, trained with CLIP-UP, outperforms its dense counterpart by 7.2% and 6.6% on COCO and Flickr30k text-to-image Recall@1 benchmarks respectively. It even surpasses the larger CLIP L/14 model on this task while using only 30% of the inference FLOPs. We further demonstrate the generalizability of our training recipe across different scales, establishing sparse upcycling as a practical and scalable approach for building efficient, high-performance CLIP models.
nan
Article 1614
Title@2025-05-25 (7): LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models
Title: LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models | LLLMs: Eine datengestützte Untersuchung der sich entwickelnden Forschung über Grenzen großer Sprachmodelle | LLLMs:关于大语言模式限制的不断发展的研究数据驱动调查 2505.19240v1 |
Authors: Aida Kostikova, Zhipin Wang, Deidamea Bajri, Ole Pütz, Benjamin Paaßen, Steffen Eger
Large language model (LLM) research has grown rapidly, along with increasing concern about their limitations such as failures in reasoning, hallucinations, and limited multilingual capability. In this survey, we conduct a data-driven, semi-automated review of research on limitations of LLMs (LLLMs) from 2022 to 2024 using a bottom-up approach. From a corpus of 250,000 ACL and arXiv papers, we identify 14,648 relevant papers using keyword filtering, LLM-based classification (validated against expert labels), and topic clustering (via two approaches, HDBSCAN+BERTopic and LlooM). We find that LLM-related research increases over fivefold in ACL and fourfold in arXiv. Since 2022, LLLMs research grows even faster, reaching over 30% of LLM papers by late 2024. Reasoning remains the most studied limitation, followed by generalization, hallucination, bias, and security. The distribution of topics in the ACL dataset stays relatively stable over time, while arXiv shifts toward safety and controllability (with topics like security risks, alignment, hallucinations, knowledge editing), and multimodality between 2022 and 2024. We release a dataset of annotated abstracts and a validated methodology, and offer a quantitative view of trends in LLM limitations research.
nan
Article 1615
Title@2025-05-25 (7): Learning Transformer-based World Models with Contrastive Predictive Coding
Title: Learning Transformer-based World Models with Contrastive Predictive Coding | Transformer-basierte Weltmodelle mit kontradiktivem Predictive Coding lernen | 以学习变换器为基础的世界差异预测编码模式 2503.04416v2 |
Authors: Maxime Burchi, Radu Timofte
The DreamerV3 algorithm recently obtained remarkable performance across diverse environment domains by learning an accurate world model based on Recurrent Neural Networks (RNNs). Following the success of model-based reinforcement learning algorithms and the rapid adoption of the Transformer architecture for its superior training efficiency and favorable scaling properties, recent works such as STORM have proposed replacing RNN-based world models with Transformer-based world models using masked self-attention. However, despite the improved training efficiency of these methods, their impact on performance remains limited compared to the Dreamer algorithm, struggling to learn competitive Transformer-based world models. In this work, we show that the next state prediction objective adopted in previous approaches is insufficient to fully exploit the representation capabilities of Transformers. We propose to extend world model predictions to longer time horizons by introducing TWISTER (Transformer-based World model wIth contraSTivE Representations), a world model using action-conditioned Contrastive Predictive Coding to learn high-level temporal feature representations and improve the agent performance. TWISTER achieves a human-normalized mean score of 162% on the Atari 100k benchmark, setting a new record among state-of-the-art methods that do not employ look-ahead search.
nan
Article 1616
Title@2025-05-25 (7): Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees
Title: Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees | Effiziente Politikoptimierung in robusten, eingeschränkten MDPs mit Iterationskomplexitätsgarantien | 在强力约束下,在具有迭接复杂度保障的多用途发展方案中提高政策效率的优化 2505.19238v1 |
Authors: Sourav Ganguly, Arnob Ghosh, Kishan Panaganti, Adam Wierman
Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the cumulative reward while satisfying a constraint, even when there is a mismatch between the real model and an accessible simulator/nominal model. In particular, we consider the robust constrained Markov decision problem (RCMDP) where an agent needs to maximize the reward and satisfy the constraint against the worst possible stochastic model under the uncertainty set centered around an unknown nominal model. Primal-dual methods, effective for standard constrained MDP (CMDP), are not applicable here because of the lack of the strong duality property. Further, one cannot apply the standard robust value-iteration based approach on the composite value function either, as the worst-case models may be different for the reward value function and the constraint value function. We propose a novel technique that effectively minimizes the constraint value function to satisfy the constraints; on the other hand, when all the constraints are satisfied, it can simply maximize the robust reward value function. We prove that such an algorithm finds a feasible policy with at most $\epsilon$ sub-optimality after $O(\epsilon^{-2})$ iterations. In contrast to the state-of-the-art method, we do not need to employ a binary search; thus, we reduce the computation time by at least 4x for smaller values of the discount factor ($\gamma$) and by at least 6x for larger values of $\gamma$.
nan
Article 1617
Title@2025-05-25 (7): To See a World in a Spark of Neuron: Disentangling Multi-task Interference for Training-free Model Merging
Title: To See a World in a Spark of Neuron: Disentangling Multi-task Interference for Training-free Model Merging | Eine Welt in einem Funken Neuron zu sehen: Entwirren von Multi-Task-Interferenzen für trainingsfreies Modellverschmelzen | 《在中世纪的火花中看到世界:为无培训模式合并拆散多任务干预》 2503.05320v2 |
Authors: Zitao Fang, Guodong DU, Shuyang Yu, Yifei Guo, Yiwei Zhang, Yiyao Cao, Jing Li, Ho-Kin Tang, Sim Kuan Goh
Fine-tuning pre-trained models on targeted datasets enhances task-specific performance but often comes at the expense of generalization. Model merging techniques, which integrate multiple fine-tuned models into a single multi-task model through task arithmetic, offer a promising solution. However, task interference remains a fundamental challenge, leading to performance degradation and suboptimal merged models. Existing approaches largely overlook the fundamental roles of neurons, their connectivity, and activation, resulting in a merging process and a merged model that does not consider how neurons relay and process information. In this work, we present the first study that relies on neuronal mechanisms for model merging. We decompose task-specific representations into two complementary neuronal subspaces that regulate neuron sensitivity and input adaptability. Leveraging this decomposition, we introduce NeuroMerging, a novel merging framework developed to mitigate task interference within neuronal subspaces, enabling training-free model fusion across diverse tasks. Through extensive experiments, we demonstrate that NeuroMerging achieves superior performance compared to existing methods on multi-task benchmarks across both natural language and vision domains. Our findings highlight the importance of aligning neuronal mechanisms in model merging, offering new insights into mitigating task interference and improving knowledge fusion. Code will be released upon acceptance.
nan
Article 1618
Title@2025-05-25 (7): CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models
Title: CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models | CoreMatching: Co-adaptive Sparse Inference Framework mit Token und Neuron Pruning für eine umfassende Beschleunigung von Vision-Language-Modellen | 核心配料:与Token 和Neron Prurning 共同调适的简单推断框架,以全面加速视觉语言模型 2505.19235v1 |
Authors: Qinsi Wang, Hancheng Ye, Ming-Yu Chung, Yudong Liu, Yueqian Lin, Martin Kuo, Mingyuan Ma, Jianyi Zhang, Yiran Chen
Vision-Language Models (VLMs) excel across diverse tasks but suffer from high inference costs in time and memory. Token sparsity mitigates inefficiencies in token usage, while neuron sparsity reduces high-dimensional computations, both offering promising solutions to enhance efficiency. Recently, these two sparsity paradigms have evolved largely in parallel, fostering the prevailing assumption that they function independently. However, a fundamental yet underexplored question remains: Do they truly operate in isolation, or is there a deeper underlying interplay that has yet to be uncovered? In this paper, we conduct the first comprehensive investigation into this question. By introducing and analyzing the matching mechanism between Core Neurons and Core Tokens, we found that key neurons and tokens for inference mutually influence and reinforce each other. Building on this insight, we propose CoreMatching, a co-adaptive sparse inference framework, which leverages the synergy between token and neuron sparsity to enhance inference efficiency. Through theoretical analysis and efficiency evaluations, we demonstrate that the proposed method surpasses state-of-the-art baselines on ten image understanding tasks and three hardware devices. Notably, on the NVIDIA Titan Xp, it achieved 5x FLOPs reduction and a 10x overall speedup. Code is released at https://github.com/wangqinsi1/2025-ICML-CoreMatching/tree/main.
nan
Article 1619
Title@2025-05-25 (7): Learning Flexible Forward Trajectories for Masked Molecular Diffusion
Title: Learning Flexible Forward Trajectories for Masked Molecular Diffusion | Flexible Forward-Trajektorien für maskierte molekulare Diffusion lernen | 蒙面分子扩散学习灵活前向轨迹 2505.16790v2 |
Authors: Hyunjin Seo, Taewon Kim, Sihyun Yu, SungSoo Ahn
Masked diffusion models (MDMs) have achieved notable progress in modeling discrete data, while their potential in molecular generation remains underexplored. In this work, we explore their potential and introduce the surprising result that naively applying standard MDMs severely degrades the performance. We identify the critical cause of this issue as a state-clashing problem, where the forward diffusion of distinct molecules collapses into a common state, resulting in a mixture of reconstruction targets that cannot be learned using a typical reverse diffusion process with unimodal predictions. To mitigate this, we propose Masked Element-wise Learnable Diffusion (MELD) that orchestrates per-element corruption trajectories to avoid collision between distinct molecular graphs. This is achieved through a parameterized noise scheduling network that assigns distinct corruption rates to individual graph elements, i.e., atoms and bonds. Extensive experiments on diverse molecular benchmarks reveal that MELD markedly enhances overall generation quality compared to element-agnostic noise scheduling, increasing the chemical validity of vanilla MDMs on ZINC250K from 15% to 93%. Furthermore, it achieves state-of-the-art property alignment in conditional generation tasks.
nan
Article 1620
Title@2025-05-25 (7): Statistical Collusion by Collectives on Learning Platforms
Title: Statistical Collusion by Collectives on Learning Platforms | Statistische Kollusion von Kollektiven über Lernplattformen | 学习平台集体统计协作 2502.04879v3 |
Authors: Etienne Gauthier, Francis Bach, Michael I. Jordan
As platforms increasingly rely on learning algorithms, collectives may form and seek ways to influence these platforms to align with their own interests. This can be achieved by coordinated submission of altered data. To evaluate the potential impact of such behavior, it is essential to understand the computations that collectives must perform to impact platforms in this way. In particular, collectives need to make a priori assessments of the effect of the collective before taking action, as they may face potential risks when modifying their data. Moreover, they need to develop implementable coordination algorithms based on quantities that can be inferred from observed data. We develop a framework that provides a theoretical and algorithmic treatment of these issues and present experimental results in a product evaluation domain.
nan
Article 1621
Title@2025-05-25 (7): Imitation Learning via Focused Satisficing
Title: Imitation Learning via Focused Satisficing | Imitation Learning via Focused Satisficing | 通过有重点的满意度学习模拟学习 2505.14820v2 |
Authors: Rushit N. Shah, Nikolaos Agadakos, Synthia Sasulski, Ali Farajzadeh, Sanjiban Choudhury, Brian Ziebart
Imitation learning often assumes that demonstrations are close to optimal according to some fixed, but unknown, cost function. However, according to satisficing theory, humans often choose acceptable behavior based on their personal (and potentially dynamic) levels of aspiration, rather than achieving (near-) optimality. For example, a lunar lander demonstration that successfully lands without crashing might be acceptable to a novice despite being slow or jerky. Using a margin-based objective to guide deep reinforcement learning, our focused satisficing approach to imitation learning seeks a policy that surpasses the demonstrator’s aspiration levels – defined over trajectories or portions of trajectories – on unseen demonstrations without explicitly learning those aspirations. We show experimentally that this focuses the policy to imitate the highest quality (portions of) demonstrations better than existing imitation learning methods, providing much higher rates of guaranteed acceptability to the demonstrator, and competitive true returns on a range of environments.
nan
Article 1622
Title@2025-05-25 (7): CLEVER: A Curated Benchmark for Formally Verified Code Generation
Title: CLEVER: A Curated Benchmark for Formally Verified Code Generation | CLEVER: Ein kuratierter Benchmark für die formal verifizierte Codegenerierung | 正式核实的代码生成基准 2505.13938v3 |
Authors: Amitayush Thakur, Jasper Lee, George Tsoukalas, Meghana Sistla, Matthew Zhao, Stefan Zetzsche, Greg Durrett, Yisong Yue, Swarat Chaudhuri
We introduce ${\rm C{\small LEVER}}$, a high-quality, curated benchmark of 161 problems for end-to-end verified code generation in Lean. Each problem consists of (1) the task of generating a specification that matches a held-out ground-truth specification, and (2) the task of generating a Lean implementation that provably satisfies this specification. Unlike prior benchmarks, ${\rm C{\small LEVER}}$ avoids test-case supervision, LLM-generated annotations, and specifications that leak implementation logic or allow vacuous solutions. All outputs are verified post-hoc using Lean’s type checker to ensure machine-checkable correctness. We use ${\rm C{\small LEVER}}$ to evaluate several few-shot and agentic approaches based on state-of-the-art language models. These methods all struggle to achieve full verification, establishing it as a challenging frontier benchmark for program synthesis and formal reasoning. Our benchmark can be found on GitHub (https://github.com/trishullab/clever) as well as HuggingFace (https://huggingface.co/datasets/amitayusht/clever). All our evaluation code is also available online (https://github.com/trishullab/clever-prover).
nan
Article 1623
Title@2025-05-25 (7): Scalarisation-based risk concepts for robust multi-objective optimisation
Title: Scalarisation-based risk concepts for robust multi-objective optimisation | Scalarisierungsbasierte Risikokonzepte für eine robuste multiobjektive Optimierung | 实现稳健的多目标优化的以尺度化为基础的风险风险概念 2405.10221v4 |
Authors: Ben Tu, Nikolas Kantas, Robert M. Lee, Behrang Shafei
Robust optimisation is a well-established framework for optimising functions in the presence of uncertainty. The inherent goal of this problem is to identify a collection of inputs whose outputs are both desirable for the decision maker and robust to the underlying uncertainties in the problem. In this work, we study the multi-objective case of this problem. We identify that the majority of robust multi-objective algorithms rely on two key operations: robustification and scalarisation. Robustification refers to the strategy that is used to account for the uncertainty in the problem. Scalarisation refers to the procedure that is used to encode the relative importance of each objective to a scalar-valued reward. As these operations are not necessarily commutative, the order in which they are performed has an impact on the resulting solutions that are identified and the final decisions that are made. The purpose of this work is to give a thorough exposition on the effects of these different orderings and in particular highlight when one should opt for one ordering over the other. As part of our analysis, we showcase how many existing risk concepts can be integrated into the specification and solution of a robust multi-objective optimisation problem. Besides this, we also demonstrate how one can define, in a principled way, the notion of a robust Pareto front and a robust performance metric based on our “robustify and scalarise” methodology. To illustrate the efficacy of these new ideas, we present two insightful case studies which are based on real-world data sets.
nan
Article 1624
Title@2025-05-25 (7): Dynamic Angle Selection in X-Ray CT: A Reinforcement Learning Approach to Optimal Stopping
Title: Dynamic Angle Selection in X-Ray CT: A Reinforcement Learning Approach to Optimal Stopping | Dynamische Winkelauswahl in X-Ray CT: Ein verstärkten Lernansatz zum optimalen Stoppen | X- Ray CT: 优化停止的强化学习方法 2503.12688v2 |
Authors: Tianyuan Wang, Felix Lucka, Daniël M. Pelt, K. Joost Batenburg, Tristan van Leeuwen
In industrial X-ray Computed Tomography (CT), the need for rapid in-line inspection is critical. Sparse-angle tomography plays a significant role in this by reducing the required number of projections, thereby accelerating processing and conserving resources. Most existing methods aim to balance reconstruction quality and scanning time, typically relying on fixed scan durations. Adaptive adjustment of the number of angles is essential; for instance, more angles may be required for objects with complex geometries or noisier projections. The concept of optimal stopping, which dynamically adjusts this balance according to varying industrial needs, remains overlooked. Building on our previous work, we integrate optimal stopping into sequential Optimal Experimental Design (sOED) and Reinforcement Learning (RL). We propose a novel method for computing the policy gradient within the Actor-Critic framework, enabling the development of adaptive policies for informative angle selection and scan termination. Additionally, we investigate the gap between simulation and real-world applications in the context of the developed learning-based method. Our trained model, developed using synthetic data, demonstrates reliable performance when applied to experimental X-ray CT data. This approach enhances the flexibility of CT operations and expands the applicability of sparse-angle tomography in industrial settings.
nan
Article 1625
Title@2025-05-25 (7): Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Title: Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More | Sprachmodelle, Graph Searching und Überwachung Ehebruch: Wenn mehr Aufsicht weniger ist und wie man mehr macht | 语言模式、图图搜索和监督通配:越少越少监督,如何做越多 2503.10542v3 |
Authors: Arvid Frydenlund
This work concerns the path-star task, a minimal example of searching over a graph. The graph, $G$, is star-shaped with $D$ arms radiating from a start node, $s$. A language model (LM) is given $G$, $s$, and a target node $t$, which ends one of the arms, and is tasked with generating the arm containing $t$. The minimal nature of this task means only a single choice needs to be made: which of the $D$ arms contains $t$? Decoder-only LMs fail to solve this elementary task above $1/D$ chance due to a learned shortcut that absorbs training supervision. We show how this pathology is caused by excess supervision and we present a series of solutions demonstrating that the task is solvable via decoder-only LMs. We find that the task’s minimal nature causes its difficulty, as it prevents task decomposition. Our solutions provide insight into the pathology and its implications for LMs trained via next-token prediction.
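To make the task description concrete, here is a minimal sketch of how one path-star instance could be generated. The function name, the shuffled node labels, and the edge-list encoding are assumptions made for illustration; the abstract does not specify how the paper serialises the graph for the language model.

```python
import random


def make_path_star(num_arms, arm_length, seed=0):
    """Build one path-star instance: `num_arms` paths radiating from a start node."""
    rng = random.Random(seed)
    start = 0
    # Shuffle the remaining labels so that a node id does not reveal its arm.
    labels = list(range(1, num_arms * arm_length + 1))
    rng.shuffle(labels)
    arms = [
        [start] + labels[i * arm_length:(i + 1) * arm_length]
        for i in range(num_arms)
    ]
    # The graph is given as an unordered edge list.
    edges = [(arm[j], arm[j + 1]) for arm in arms for j in range(len(arm) - 1)]
    rng.shuffle(edges)
    # The target is the leaf that ends one randomly chosen arm.
    target_arm = arms[rng.randrange(num_arms)]
    target = target_arm[-1]
    return start, edges, target, target_arm


start, edges, target, answer = make_path_star(num_arms=5, arm_length=4)
```

Here `answer` is the arm the model is expected to generate. Every arm leaves the start node through a different edge, so the only real decision is the first step; guessing that step uniformly succeeds with probability $1/D$, which is the chance level mentioned in the abstract.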
nan
Article 1626
Title@2025-05-25 (7): Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf’s Law
Title: Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf’s Law | Skalierungsgesetze für gradienten Abstieg und Zeichenabstieg für lineare Bigram-Modelle unter Zipf’s Gesetz | 齐普夫法下线形大梁模型的渐渐后裔和信号后裔法律扩大法 2505.19227v1 |
Authors: Frederik Kunstner, Francis Bach
Recent works have highlighted optimization difficulties faced by gradient descent in training the first and last layers of transformer-based language models, which are overcome by optimizers such as Adam. These works suggest that the difficulty is linked to the heavy-tailed distribution of words in text data, where the frequency of the $k$th most frequent word $\pi_k$ is proportional to $1/k$, following Zipf’s law. To better understand the impact of the data distribution on training performance, we study a linear bigram model for next-token prediction when the tokens follow a power law $\pi_k \propto 1/k^\alpha$ parameterized by the exponent $\alpha > 0$. We derive optimization scaling laws for deterministic gradient descent and sign descent as a proxy for Adam as a function of the exponent $\alpha$. Existing theoretical investigations in scaling laws assume that the eigenvalues of the data decay as a power law with exponent $\alpha > 1$. This assumption effectively makes the problem “finite dimensional” as most of the loss comes from a few of the largest eigencomponents. In comparison, we show that the problem is more difficult when the data have heavier tails. The case $\alpha = 1$ as found in text data is “worst-case” for gradient descent, in that the number of iterations required to reach a small relative error scales almost linearly with dimension. While the performance of sign descent also depends on the dimension, for Zipf-distributed data the number of iterations scales only with the square-root of the dimension, leading to a large improvement for large vocabularies.
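To illustrate why $\alpha > 1$ behaves as effectively finite dimensional while $\alpha = 1$ does not, the following sketch computes the normalized frequencies $\pi_k \propto 1/k^\alpha$ and the share of probability mass held by the most frequent tokens. The helper names and the vocabulary size of 10,000 are assumptions chosen for illustration; the code does not reproduce the paper's scaling-law analysis.

```python
def zipf_frequencies(dim, alpha):
    """Normalized power-law frequencies pi_k proportional to 1 / k**alpha, k = 1..dim."""
    weights = [1.0 / (k ** alpha) for k in range(1, dim + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def head_mass(freqs, top):
    """Probability mass carried by the `top` most frequent tokens."""
    return sum(freqs[:top])


for alpha in (1.0, 2.0):
    pi = zipf_frequencies(10_000, alpha)
    print(f"alpha={alpha}: mass of top 10 tokens = {head_mass(pi, 10):.3f}")
```

With $\alpha = 2$ almost all of the mass sits in the first few tokens, whereas with $\alpha = 1$ the tail keeps a large share, which is the heavy-tailed regime the abstract identifies as hard for gradient descent.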
Article 1627
Title@2025-05-25 (7): LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
Title: LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | LLaDA 1.5: Varianzreduzierte Preference-Optimierung für große Sprachdiffusionsmodelle | LLADA 1.5:大语言传播模式差异-减少优惠 2505.19223v1 |
Authors: Fengqi Zhu, Rongzhen Wang, Shen Nie, Xiaolu Zhang, Chunwei Wu, Jun Hu, Jun Zhou, Jianfei Chen, Yankai Lin, Ji-Rong Wen, Chongxuan Li
While Masked Diffusion Models (MDMs), such as LLaDA, present a promising paradigm for language modeling, there has been relatively little effort in aligning these models with human preferences via reinforcement learning. The challenge primarily arises from the high variance in Evidence Lower Bound (ELBO)-based likelihood estimates required for preference optimization. To address this issue, we propose Variance-Reduced Preference Optimization (VRPO), a framework that formally analyzes the variance of ELBO estimators and derives bounds on both the bias and variance of preference optimization gradients. Building on this theoretical foundation, we introduce unbiased variance reduction strategies, including optimal Monte Carlo budget allocation and antithetic sampling, that significantly improve the performance of MDM alignment. We demonstrate the effectiveness of VRPO by applying it to LLaDA, and the resulting model, LLaDA 1.5, outperforms its SFT-only predecessor consistently and significantly across mathematical (GSM8K +4.7), code (HumanEval +3.0, MBPP +1.8), and alignment benchmarks (IFEval +4.0, Arena-Hard +4.3). Furthermore, LLaDA 1.5 demonstrates a highly competitive mathematical performance compared to strong language MDMs and ARMs. Project page: https://ml-gsai.github.io/LLaDA-1.5-Demo/.
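Antithetic sampling, one of the variance reduction strategies named above, can be illustrated on a toy Monte Carlo problem. The sketch below is the textbook version (pairing each uniform draw $u$ with $1 - u$) applied to estimating $\mathbb{E}[e^{U}]$; how VRPO applies the idea to ELBO estimates is not spelled out in the abstract, so nothing here should be read as the authors' implementation. All function names are hypothetical.

```python
import math
import random
import statistics


def plain_estimate(f, n_pairs, rng):
    """Monte Carlo mean of f(U) using 2 * n_pairs independent uniform draws."""
    return sum(f(rng.random()) for _ in range(2 * n_pairs)) / (2 * n_pairs)


def antithetic_estimate(f, n_pairs, rng):
    """Monte Carlo mean of f(U) using n_pairs antithetic pairs (u, 1 - u)."""
    total = 0.0
    for _ in range(n_pairs):
        u = rng.random()
        total += f(u) + f(1.0 - u)
    return total / (2 * n_pairs)


def estimator_variance(estimator, f, n_pairs, trials, seed):
    """Empirical variance of an estimator across repeated independent trials."""
    rng = random.Random(seed)
    return statistics.variance([estimator(f, n_pairs, rng) for _ in range(trials)])


var_plain = estimator_variance(plain_estimate, math.exp, n_pairs=50, trials=200, seed=0)
var_anti = estimator_variance(antithetic_estimate, math.exp, n_pairs=50, trials=200, seed=0)
print(f"plain: {var_plain:.2e}  antithetic: {var_anti:.2e}")
```

Both estimators evaluate `f` the same number of times and both are unbiased; the antithetic one has lower variance here because `f` is monotone, so the two halves of each pair are negatively correlated.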
Article 1628
Title@2025-05-25 (7): A Novel Transformer-Based Self-Supervised Learning Method to Enhance Photoplethysmogram Signal Artifact Detection
Title: A Novel Transformer-Based Self-Supervised Learning Method to Enhance Photoplethysmogram Signal Artifact Detection | Eine neuartige, auf Transformer basierende, selbstüberwachte Lernmethode zur Verbesserung der Photoplethysmogramm-Signal-Artefakt-Erkennung | 一种基于新颖变形器的以自我监督为基础的学习方法,用以加强光膜成像信号异形探测 2401.01013v2 |
Authors: Thanh-Dung Le, Clara Macabiau, Kévin Albert, Philippe Jouvet, Rita Noumeir
Recent research at CHU Sainte Justine’s Pediatric Critical Care Unit (PICU) has revealed that traditional machine learning methods, such as semi-supervised label propagation and K-nearest neighbors, outperform Transformer-based models in artifact detection from PPG signals, mainly when data is limited. This study addresses the underutilization of abundant unlabeled data by employing self-supervised learning (SSL) to extract latent features from these data, followed by fine-tuning on labeled data. Our experiments demonstrate that SSL significantly enhances the Transformer model’s ability to learn representations, improving its robustness in artifact classification tasks. Among various SSL techniques, including masking, contrastive learning, and DINO (self-distillation with no labels), contrastive learning exhibited the most stable and superior performance in small PPG datasets. Further, we delve into optimizing contrastive loss functions, which are crucial for contrastive SSL. Inspired by InfoNCE, we introduce a novel contrastive loss function that facilitates smoother training and better convergence, thereby enhancing performance in artifact classification. In summary, this study establishes the efficacy of SSL in leveraging unlabeled data, particularly in enhancing the capabilities of the Transformer model. This approach holds promise for broader applications in PICU environments, where annotated data is often limited.
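For reference, the standard InfoNCE objective that the abstract cites as inspiration can be written as follows. The paper's own modified loss is not given in the abstract and is not reproduced here; the symbols $z_i$ (anchor embedding), $z_i^{+}$ (its positive view), $\tau$ (temperature), $N$ (batch size), and $\mathrm{sim}$ (a similarity such as cosine) are notation assumed for this sketch.

```latex
\mathcal{L}_{\mathrm{InfoNCE}}
  = -\frac{1}{N} \sum_{i=1}^{N}
    \log \frac{\exp\!\big(\mathrm{sim}(z_i, z_i^{+}) / \tau\big)}
              {\sum_{j=1}^{N} \exp\!\big(\mathrm{sim}(z_i, z_j^{+}) / \tau\big)}
```

Each anchor is scored against its own positive in the numerator and against all positives in the batch in the denominator, so the other samples act as negatives.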
Article 1629
Title@2025-05-25 (7): Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding
Title: Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding | Where Paths Collide: Eine umfassende Untersuchung der klassischen und lernbasierten multi-agenten Pathfinding | 路径相撞之处:对经典和以学习为基础的多方代理调查的全面调查 2505.19219v1 |
Authors: Shiyue Wang, Haozheng Xu, Yuhan Zhang, Jingran Lin, Changhong Lu, Xiangfeng Wang, Wenhao Li
Multi-Agent Path Finding (MAPF) is a fundamental problem in artificial intelligence and robotics, requiring the computation of collision-free paths for multiple agents navigating from their start locations to designated goals. As autonomous systems become increasingly prevalent in warehouses, urban transportation, and other complex environments, MAPF has evolved from a theoretical challenge to a critical enabler of real-world multi-robot coordination. This comprehensive survey bridges the long-standing divide between classical algorithmic approaches and emerging learning-based methods in MAPF research. We present a unified framework that encompasses search-based methods (including Conflict-Based Search, Priority-Based Search, and Large Neighborhood Search), compilation-based approaches (SAT, SMT, CSP, ASP, and MIP formulations), and data-driven techniques (reinforcement learning, supervised learning, and hybrid strategies). Through systematic analysis of experimental practices across 200+ papers, we uncover significant disparities in evaluation methodologies, with classical methods typically tested on larger-scale instances (up to 200 by 200 grids with 1000+ agents) compared to learning-based approaches (predominantly 10-100 agents). We provide a comprehensive taxonomy of evaluation metrics, environment types, and baseline selections, highlighting the need for standardized benchmarking protocols. Finally, we outline promising future directions including mixed-motive MAPF with game-theoretic considerations, language-grounded planning with large language models, and neural solver architectures that combine the rigor of classical methods with the flexibility of deep learning. This survey serves as both a comprehensive reference for researchers and a practical guide for deploying MAPF solutions in increasingly complex real-world applications.
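The notion of collision-free paths can be made concrete with a small conflict checker of the kind that search-based solvers such as Conflict-Based Search rely on. This is a minimal sketch under common MAPF conventions (discrete timesteps, agents wait at their goal once their path ends, vertex and swap conflicts); the function name and path representation are assumptions for illustration and are not taken from the survey.

```python
from itertools import combinations


def find_conflicts(paths):
    """Return vertex and edge (swap) conflicts between every pair of agents.

    paths: dict mapping an agent name to a list of cells, one per timestep.
    An agent is assumed to stay on its last cell after its path ends.
    """
    horizon = max(len(p) for p in paths.values())

    def at(agent, t):
        p = paths[agent]
        return p[min(t, len(p) - 1)]

    conflicts = []
    for a, b in combinations(sorted(paths), 2):
        for t in range(horizon):
            if at(a, t) == at(b, t):
                # Both agents occupy the same cell at time t.
                conflicts.append(("vertex", a, b, t))
            elif (t + 1 < horizon
                  and at(a, t) == at(b, t + 1)
                  and at(a, t + 1) == at(b, t)):
                # The agents swap cells between t and t + 1.
                conflicts.append(("edge", a, b, t))
    return conflicts


head_on = {"a1": [(0, 0), (0, 1), (0, 2)], "a2": [(0, 2), (0, 1), (0, 0)]}
print(find_conflicts(head_on))
```

A set of paths is a valid MAPF solution under these conventions when this function returns an empty list and every path ends at its agent's goal.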
Article 1630
Title@2025-05-25 (7): Clustering by Nonparametric Smoothing
Title: Clustering by Nonparametric Smoothing | Clustering durch nichtparametrisches Glätten | 以非参数平滑为群集 2503.09134v2 |
Authors: David P. Hofmeyr
A novel formulation of the clustering problem is introduced in which the task is expressed as an estimation problem, where the object to be estimated is a function which maps a point to its distribution of cluster membership. Unlike existing approaches which implicitly estimate such a function, like Gaussian Mixture Models (GMMs), the proposed approach bypasses any explicit modelling assumptions and exploits the flexible estimation potential of nonparametric smoothing. An intuitive approach for selecting the tuning parameters governing estimation is provided, which allows the proposed method to automatically determine both an appropriate level of flexibility and also the number of clusters to extract from a given data set. Experiments on a large collection of publicly available data sets are used to document the strong performance of the proposed approach, in comparison with relevant benchmarks from the literature. R code to implement the proposed approach is available from https://github.com/DavidHofmeyr/CNS.
Article 1631
Title@2025-05-25 (7): Symmetries in Overparametrized Neural Networks: A Mean-Field View
Title: Symmetries in Overparametrized Neural Networks: A Mean-Field View | Symmetrien in überparametrisierten Neuralen Netzwerken: Eine Mittelfeldansicht | 过度对称的神经神经网络的对称性:平均实地观点 2405.19995v3 |
Authors: Javier Maass, Joaquin Fontbona
We develop a Mean-Field (MF) view of the learning dynamics of overparametrized Artificial Neural Networks (NN) under data symmetric in law with respect to the action of a general compact group $G$. We consider for this a class of generalized shallow NNs given by an ensemble of $N$ multi-layer units, jointly trained using stochastic gradient descent (SGD) and possibly symmetry-leveraging (SL) techniques, such as Data Augmentation (DA), Feature Averaging (FA) or Equivariant Architectures (EA). We introduce the notions of weakly and strongly invariant laws (WI and SI) on the parameter space of each single unit, corresponding, respectively, to $G$-invariant distributions, and to distributions supported on parameters fixed by the group action (which encode EA). This allows us to define symmetric models compatible with taking $N\to\infty$ and give an interpretation of the asymptotic dynamics of DA, FA and EA in terms of Wasserstein Gradient Flows describing their MF limits. When activations respect the group action, we show that, for symmetric data, DA, FA and freely-trained models obey the exact same MF dynamic, which stays in the space of WI laws and minimizes therein the population risk. We also give a counterexample to the general attainability of an optimum over SI laws. Despite this, quite remarkably, we show that the set of SI laws is also preserved by the MF dynamics even when freely trained. This contrasts sharply with the finite-$N$ setting, in which EAs are generally not preserved by unconstrained SGD. We illustrate the validity of our findings as $N$ gets larger in a teacher-student experimental setting, training a student NN to learn from a WI, SI or arbitrary teacher model through various SL schemes. Finally, we deduce a data-driven heuristic to discover the largest subspace of parameters supporting SI distributions for a problem, which could be used for designing EA with minimal generalization error.
Article 1632
Title@2025-05-25 (7): Adaptive Cyclic Diffusion for Inference Scaling
Title: Adaptive Cyclic Diffusion for Inference Scaling | Adaptive zyklische Diffusion zur Inferenzskalierung | 用于推断力缩放的适应性二次循环传播 2505.14036v2 |
Authors: Gyubin Lee, Truong Nhat Nguyen Bao, Jaesik Yoon, Dongwoo Lee, Minsu Kim, Yoshua Bengio, Sungjin Ahn
Diffusion models have demonstrated strong generative capabilities across domains ranging from image synthesis to complex reasoning tasks. However, most inference-time scaling methods rely on fixed denoising schedules, limiting their ability to adaptively allocate computation based on instance difficulty or task-specific demands. We introduce the challenge of adaptive inference-time scaling (dynamically adjusting computational effort during inference) and propose Adaptive Bi-directional Cyclic Diffusion (ABCD), a flexible, search-based inference framework. ABCD refines outputs through bi-directional diffusion cycles while adaptively controlling exploration depth and termination. It comprises three components: Cyclic Diffusion Search, Automatic Exploration-Exploitation Balancing, and Adaptive Thinking Time. Experiments show that ABCD improves performance across diverse tasks while maintaining computational efficiency.
Article 1633
Title@2025-05-25 (7): SpeakStream: Streaming Text-to-Speech with Interleaved Data
Title: SpeakStream: Streaming Text-to-Speech with Interleaved Data | SpeakStream: Streaming von Text-zu-Speech mit interleaved Daten | 语音Stream:用断开数据流流流文本到语音 2505.19206v1 |
Authors: Richard He Bai, Zijin Gu, Tatiana Likhomanenko, Navdeep Jaitly
The latency bottleneck of traditional text-to-speech (TTS) systems fundamentally hinders the potential of streaming large language models (LLMs) in conversational AI. These TTS systems, typically trained and inferenced on complete utterances, introduce unacceptable delays, even with optimized inference speeds, when coupled with streaming LLM outputs. This is particularly problematic for creating responsive conversational agents where low first-token latency is critical. In this paper, we present SpeakStream, a streaming TTS system that generates audio incrementally from streaming text using a decoder-only architecture. SpeakStream is trained using a next-step prediction loss on interleaved text-speech data. During inference, it generates speech incrementally while absorbing streaming input text, making it particularly suitable for cascaded conversational AI agents where an LLM streams text to a TTS system. Our experiments demonstrate that SpeakStream achieves state-of-the-art latency results in terms of first-token latency while maintaining the quality of non-streaming TTS systems.
Article 1634
Title@2025-05-25 (7): Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety
Title: Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety | Benign Proben Materie! Feinabstimmung auf Aussergewöhnliche Benign Proben stark bricht Sicherheit | 重大事件 重大事件 重大事件 安全 重大事件 重大事件 重大事件 重大事件 重大事件 2505.06843v2 |
Authors: Zihan Guan, Mengxuan Hu, Ronghang Zhu, Sheng Li, Anil Vullikanti
Recent studies have uncovered a troubling vulnerability in the fine-tuning stage of large language models (LLMs): even fine-tuning on entirely benign datasets can lead to a significant increase in the harmfulness of LLM outputs. Building on this finding, our red teaming study takes this threat one step further by developing a more effective attack. Specifically, we analyze and identify samples within benign datasets that contribute most to safety degradation, then fine-tune LLMs exclusively on these samples. We approach this problem from an outlier detection perspective and propose Self-Inf-N, to detect and extract outliers for fine-tuning. Our findings reveal that fine-tuning LLMs on 100 outlier samples selected by Self-Inf-N in the benign datasets severely compromises LLM safety alignment. Extensive experiments across seven mainstream LLMs demonstrate that our attack exhibits high transferability across different architectures and remains effective in practical scenarios. Alarmingly, our results indicate that most existing mitigation strategies fail to defend against this attack, underscoring the urgent need for more robust alignment safeguards. Codes are available at https://github.com/GuanZihan/Benign-Samples-Matter.
Article 1635
Title@2025-05-25 (7): FedGuCci: Making Local Models More Connected in Landscape for Federated Learning
Title: FedGuCci: Making Local Models More Connected in Landscape for Federated Learning | FedGuCci: Lokale Modelle in der Landschaft für das Federated Learning stärker miteinander verbunden | FedGuCci:使地方模型在全局景观中更紧密地连接起来,促进联邦学习 2402.18949v3 |
Authors: Zexi Li, Jie Lin, Zhiqi Li, Didi Zhu, Tao Shen, Tao Lin, Chao Wu, Nicholas D. Lane
Federated learning (FL) involves multiple heterogeneous clients collaboratively training a global model via iterative local updates and model fusion. The generalization of FL’s global model has a large gap compared with centralized training, which is its bottleneck for broader applications. In this paper, we study and improve FL’s generalization through a fundamental “connectivity” perspective, which means how the local models are connected in the parameter region and fused into a generalized global model. The term “connectivity” is derived from linear mode connectivity (LMC), studying the interpolated loss landscape of two different solutions (e.g., modes) of neural networks. Bridging the gap between LMC and FL, in this paper, we leverage fixed anchor models to empirically and theoretically study the transitivity property of connectivity from two models (LMC) to a group of models (model fusion in FL). Based on the findings, we propose FedGuCci(+), improving group connectivity for better generalization. It is shown that our methods can boost the generalization of FL under client heterogeneity across various tasks (4 CV datasets and 6 NLP datasets) and model architectures (e.g., ViTs and PLMs). The code is available at https://github.com/ZexiLee/fedgucci.
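Linear mode connectivity is usually quantified by the loss barrier along the straight line between two parameter vectors. The sketch below computes that barrier for toy one-dimensional losses; the function names, the barrier definition used (maximum excess over the linear interpolation of the endpoint losses), and the toy losses are assumptions for illustration and are not the FedGuCci method itself.

```python
def interpolate(theta_a, theta_b, lam):
    """Point (1 - lam) * theta_a + lam * theta_b on the segment between two models."""
    return [(1.0 - lam) * a + lam * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(loss, theta_a, theta_b, steps=21):
    """Largest excess of the interpolated loss over the interpolated endpoint losses."""
    loss_a, loss_b = loss(theta_a), loss(theta_b)
    barrier = 0.0
    for i in range(steps):
        lam = i / (steps - 1)
        mid_loss = loss(interpolate(theta_a, theta_b, lam))
        gap = mid_loss - ((1.0 - lam) * loss_a + lam * loss_b)
        barrier = max(barrier, gap)
    return barrier


def double_well(theta):
    """Toy non-convex loss with minima at x = -1 and x = +1 in each coordinate."""
    return sum((x * x - 1.0) ** 2 for x in theta)


def bowl(theta):
    """Toy convex loss with a single minimum at the origin."""
    return sum(x * x for x in theta)


print(loss_barrier(double_well, [-1.0], [1.0]))
print(loss_barrier(bowl, [-1.0], [1.0]))
```

Two solutions are called linearly mode connected when this barrier is close to zero; the double-well example shows two equally good minima separated by a barrier, while the convex bowl has none.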
Article 1636
Title@2025-05-25 (7): iTool: Reinforced Fine-Tuning with Dynamic Deficiency Calibration for Advanced Tool Use
Title: iTool: Reinforced Fine-Tuning with Dynamic Deficiency Calibration for Advanced Tool Use | iTool: Verstärkte Feinsteuerung mit dynamischer Kalibrierung bei fortgeschrittenem Werkzeugeinsatz | i Tool:加强先进工具使用动态缺乏度校准的精细测试 2501.09766v4 |
Authors: Yirong Zeng, Xiao Ding, Yuxian Wang, Weiwen Liu, Wu Ning, Yutai Hou, Xu Huang, Bing Qin, Ting Liu
Augmenting large language models (LLMs) with external tools is a promising approach to enhance their capabilities, especially for complex tasks. Synthesizing tool-use data through real-world simulations is an effective way to achieve this. However, our investigation reveals that training gains decay significantly as the amount of synthetic data increases. The model struggles to benefit from more synthetic data, which fails to equip it with advanced tool-use capabilities in complex scenarios. Moreover, we discovered that this limitation usually manifests as a fragment deficiency (i.e., parameter errors) in the response. To this end, we propose an iterative reinforced fine-tuning strategy designed to alleviate this limitation. This strategy involves: (1) enhancing the diversity of responses for synthetic data through path exploration with Monte Carlo Tree Search; and (2) iteratively pinpointing the model’s deficiency by constructing fine-grained preference pairs, and then correcting it with preference optimization algorithms. The experiments show that our method achieves 13.11% better performance than the same-size base model. It achieves an improvement of 6.5% in complex scenarios compared to the baseline, and it also outperforms larger open-source and closed-source models.
Article 1637
Title@2025-05-25 (7): Diffusion Instruction Tuning
Title: Diffusion Instruction Tuning | Diffusions-Anleitung Tuning | 传播指示图 2502.06814v2 |
Authors: Chen Jin, Ryutaro Tanno, Amrutha Saseendran, Tom Diethe, Philip Teare
We introduce Lavender, a simple supervised fine-tuning (SFT) method that boosts the performance of advanced vision-language models (VLMs) by leveraging state-of-the-art image generation models such as Stable Diffusion. Specifically, Lavender aligns the text-vision attention in the VLM transformer with the equivalent used by Stable Diffusion during SFT, instead of adapting separate encoders. This alignment enriches the model’s visual understanding and significantly boosts performance across in- and out-of-distribution tasks. Lavender requires just 0.13 million training examples, 2.5% of typical large-scale SFT datasets, and fine-tunes on standard hardware (8 GPUs) in a single day. It consistently improves state-of-the-art open-source multimodal LLMs (e.g., Llama-3.2-11B, MiniCPM-Llama3-v2.5), achieving up to 30% gains and a 68% boost on challenging out-of-distribution medical QA tasks. By efficiently transferring the visual expertise of image generators with minimal supervision, Lavender offers a scalable solution for more accurate vision-language systems. All code, training data, and models will be shared at https://astrazeneca.github.io/vlm/.
Article 1638
Title@2025-05-25 (7): Curvature Dynamic Black-box Attack: revisiting adversarial robustness via dynamic curvature estimation
Title: Curvature Dynamic Black-box Attack: revisiting adversarial robustness via dynamic curvature estimation | Krümmung Dynamischer Black-Box-Angriff: Wiederherstellung der gegnerischen Robustheit durch dynamische Krümmungsschätzung | 曲线 动态黑盒攻击: 通过动态曲线估计, 重新审视对抗性对称稳健性 2505.19194v1 |
Authors: Peiran Sun
Adversarial attacks reveal the vulnerability of deep learning models. For about a decade, countless attack and defense methods have been proposed, leading to robustified classifiers and a better understanding of models. Among these methods, curvature-based approaches have attracted attention because it is assumed that high curvature may give rise to a rough decision boundary. However, the most commonly used curvature is the curvature of the loss function, scores, or other parameters from within the model, as opposed to decision boundary curvature, since the former can be computed relatively easily using second-order derivatives. In this paper, we propose a new query-efficient method, dynamic curvature estimation (DCE), to estimate the decision boundary curvature in a black-box setting. Our approach is based on CGBA, a black-box adversarial attack. By performing DCE on a wide range of classifiers, we discovered, statistically, a connection between decision boundary curvature and adversarial robustness. We also propose a new attack method, curvature dynamic black-box attack (CDBA), with improved performance using the dynamically estimated curvature.
Article 1639
Title@2025-05-25 (7): Interpretable Graph Learning Over Sets of Temporally-Sparse Data
Title: Interpretable Graph Learning Over Sets of Temporally-Sparse Data | Interpretable Graph Learning Over Sets von temporär-Spardaten | 一组暂时分隔数据上的解释性图表学习 2505.19193v1 |
Authors: Andrea Zerio, Maya Bechler-Speicher, Maor Huri, Marie Vibeke Vestergaard, Ran Gilad-Bachrach, Tine Jess, Samir Bhatt, Aleksejs Sazonovs
Real-world medical data often includes measurements from multiple signals that are collected at irregular and asynchronous time intervals. For example, different types of blood tests can be measured at different times and frequencies, resulting in fragmented and unevenly scattered temporal data. Similar issues of irregular sampling of different attributes occur in other domains, such as monitoring of large systems using event log files or the spread of fake news on social networks. Effectively learning from such data requires models that can handle sets of temporally sparse and heterogeneous signals. In this paper, we propose Graph Mixing Additive Networks (GMAN), a novel and interpretable-by-design model for learning over irregular sets of temporal signals. Our method achieves state-of-the-art performance in real-world medical tasks, including a 4-point increase in the AUROC score of in-hospital mortality prediction, compared to existing methods. We further showcase GMAN’s flexibility by applying it to a fake news detection task. We demonstrate how its interpretability capabilities, including node-level, graph-level, and subset-level importance, allow for detecting transition phases and gaining medical insights with real-world high-stakes implications. Finally, we provide theoretical insights on GMAN’s expressive power.
Article 1640
Title@2025-05-25 (7): I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts
Title: I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | I2MoE:可解释的多式多式互动意识混合企业专家 2505.19190v1 |
Authors: Jiayi Xin, Sukwon Yun, Jie Peng, Inyoung Choi, Jenna L. Ballard, Tianlong Chen, Qi Long
Modality fusion is a cornerstone of multimodal learning, enabling information integration from diverse data sources. However, vanilla fusion methods are limited by (1) inability to account for heterogeneous interactions between modalities and (2) lack of interpretability in uncovering the multimodal interactions inherent in the data. To this end, we propose I2MoE (Interpretable Multimodal Interaction-aware Mixture of Experts), an end-to-end MoE framework designed to enhance modality fusion by explicitly modeling diverse multimodal interactions, as well as providing interpretation on a local and global level. First, I2MoE utilizes different interaction experts with weakly supervised interaction losses to learn multimodal interactions in a data-driven way. Second, I2MoE deploys a reweighting model that assigns importance scores for the output of each interaction expert, which offers sample-level and dataset-level interpretation. Extensive evaluation of medical and general multimodal datasets shows that I2MoE is flexible enough to be combined with different fusion techniques, consistently improves task performance, and provides interpretation across various real-world scenarios. Code is available at https://github.com/Raina-Xin/I2MoE.
Article 1641
Title@2025-05-25 (7): Chordless Structure: A Pathway to Simple and Expressive GNNs
Title: Chordless Structure: A Pathway to Simple and Expressive GNNs | Chordless Structure: Ein Weg zu einfachen und expressiven GNNs | 无字结构:通往简单和表达性全球NNN的路径 2505.19188v1 |
Authors: Hongxu Pan, Shuxian Hu, Mo Zhou, Zhibin Wang, Rong Gu, Chen Tian, Kun Yang, Sheng Zhong
Researchers have proposed various methods of incorporating more structured information into the design of Graph Neural Networks (GNNs) to enhance their expressiveness. However, these methods are either computationally expensive or lacking in provable expressiveness. In this paper, we observe that the chords increase the complexity of the graph structure while contributing little useful information in many cases. In contrast, chordless structures are more efficient and effective for representing the graph. Therefore, when leveraging the information of cycles, we choose to omit the chords. Accordingly, we propose a Chordless Structure-based Graph Neural Network (CSGNN) and prove that its expressiveness is strictly more powerful than the k-hop GNN (KPGNN) with polynomial complexity. Experimental results on real-world datasets demonstrate that CSGNN outperforms existing GNNs across various graph tasks while incurring lower computational costs and achieving better performance than the GNNs of 3-WL expressiveness.
Article 1642
Title@2025-05-25 (7): Heterogeneous networks in drug-target interaction prediction
Title: Heterogeneous networks in drug-target interaction prediction | Heterogene Netzwerke in der Vorhersage von Wechselwirkungen mit Drogenzielen | 药物目标相互作用预测中的不同类型网络 2504.16152v2 |
Authors: Mohammad Molaee, Nasrollah Moghadam Charkari, Foad Ghaderi
Drug discovery requires a tremendous amount of time and cost. Computational drug-target interaction prediction, a significant part of this process, can reduce these requirements by narrowing the search space for wet lab experiments. In this survey, we provide comprehensive details of graph machine learning-based methods in predicting drug-target interaction, as they have shown promising results in this field. These details include the overall framework, main contribution, datasets, and their source codes. The selected papers were mainly published from 2020 to 2024. Prior to discussing papers, we briefly introduce the datasets commonly used with these methods and measurements to assess their performance. Finally, future challenges and some crucial areas that need to be explored are discussed.
Article 1643
Title@2025-05-25 (7): A Physics-preserved Transfer Learning Method for Differential Equations
Title: A Physics-preserved Transfer Learning Method for Differential Equations | Eine physikkonservierte Transfer-Lernmethode für Differentialgleichungen | 不同等分法的受物理保留转移学习方法 2505.01281v2 |
Authors: Hao-Ran Yang, Chuan-Xian Ren
While data-driven methods such as neural operators have achieved great success in solving differential equations (DEs), they suffer from domain shift problems caused by different learning environments (with data bias or equation changes), which can be alleviated by transfer learning (TL). However, existing TL methods adopted for DE problems lack either generalizability to general DE problems or physics preservation during training. In this work, we focus on a general transfer learning method that adaptively corrects the domain shift and preserves physical information. Mathematically, we characterize the data domain as a product distribution and the essential problems as distribution bias and operator bias. A Physics-preserved Optimal Tensor Transport (POTT) method that simultaneously admits generalizability to common DEs and physics preservation for specific problems is proposed to adapt the data-driven model to the target domain, utilizing the push-forward distribution induced by the POTT map. Extensive experiments demonstrate the superior performance, generalizability and physics preservation of the proposed POTT method.
Article 1644
Title@2025-05-25 (7): CAGES: Cost-Aware Gradient Entropy Search for Efficient Local Multi-Fidelity Bayesian Optimization
Title: CAGES: Cost-Aware Gradient Entropy Search for Efficient Local Multi-Fidelity Bayesian Optimization | CAGES: Kostenbewusste Gradienten-Entropie Suche nach effizienter lokaler Multi-Fidelity Bayesian-Optimierung | CAGES: 成本-软件软件渐进式 Entropy 搜索以高效的本地多纤维贝叶斯优化 2405.07760v2 |
Authors: Wei-Ting Tang, Joel A. Paulson
Bayesian optimization (BO) is a popular approach for optimizing expensive-to-evaluate black-box objective functions. An important challenge in BO is its application to high-dimensional search spaces due in large part to the curse of dimensionality. One way to overcome this challenge is to focus on local BO methods that aim to efficiently learn gradients, which have shown strong empirical performance on high-dimensional problems including policy search in reinforcement learning (RL). Current local BO methods assume access to only a single high-fidelity information source whereas, in many problems, one has access to multiple cheaper approximations of the objective. We propose a novel algorithm, Cost-Aware Gradient Entropy Search (CAGES), for local BO of multi-fidelity black-box functions. CAGES makes no assumption about the relationship between different information sources, making it more flexible than other multi-fidelity methods. It also employs a new information-theoretic acquisition function, which enables systematic identification of samples that maximize the information gain about the unknown gradient per evaluation cost. We demonstrate CAGES can achieve significant performance improvements compared to other state-of-the-art methods on synthetic and benchmark RL problems.
Article 1645
Title@2025-05-25 (7): Federated Learning: From Theory to Practice
Title: Federated Learning: From Theory to Practice | Föderiertes Lernen: Von der Theorie zur Praxis | 联邦学习:从理论到实践 2505.19183v1 |
Authors: A. Jung
This book offers a hands-on introduction to building and understanding federated learning (FL) systems. FL enables multiple devices – such as smartphones, sensors, or local computers – to collaboratively train machine learning (ML) models, while keeping their data private and local. It is a powerful solution when data cannot or should not be centralized due to privacy, regulatory, or technical reasons. The book is designed for students, engineers, and researchers who want to learn how to design scalable, privacy-preserving FL systems. Our main focus is on personalization: enabling each device to train its own model while still benefiting from collaboration with relevant devices. This is achieved by leveraging similarities between (the learning tasks associated with) devices that are encoded by the weighted edges (or links) of a federated learning network (FL network). The key idea is to represent real-world FL systems as networks of devices, where nodes correspond to devices and edges represent communication links and data similarities between them. The training of personalized models for these devices can be naturally framed as a distributed optimization problem. This optimization problem is referred to as generalized total variation minimization (GTVMin) and ensures that devices with similar learning tasks learn similar model parameters. Our approach is both mathematically principled and practically motivated. While we introduce some advanced ideas from optimization theory and graph-based learning, we aim to keep the book accessible. Readers are guided through the core ideas step by step, with intuitive explanations.
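One common way to write an optimization problem of the kind described above is sketched below. The abstract does not state the formula, so the notation is assumed for illustration: $\mathcal{V}$ and $\mathcal{E}$ are the nodes and edges of the FL network, $w^{(i)}$ and $L_i$ are the model parameters and local loss of device $i$, $A_{i,i'}$ is the edge weight, and $\alpha > 0$ trades off local fit against agreement between neighbors. The squared Euclidean penalty is one possible choice of variation measure.

```latex
\min_{w^{(1)}, \dots, w^{(n)}}
  \sum_{i \in \mathcal{V}} L_i\big(w^{(i)}\big)
  + \alpha \sum_{\{i, i'\} \in \mathcal{E}} A_{i,i'} \,
    \big\| w^{(i)} - w^{(i')} \big\|_2^2
```

The first term fits each device's model to its own data; the second penalizes differences between the parameters of devices joined by a heavily weighted edge, which is how similar learning tasks end up with similar models.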
nan
Article 1646
Title@2025-05-25 (7): DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation
Title: DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation | DiTAR: Diffusion Transformer Autoregressive Modellierung für Sprachgenerierung | DITAR: 发声的传播变异器自动递减模型 2502.03930v3 |
Authors: Dongya Jia, Zhuo Chen, Jiawei Chen, Chenpeng Du, Jian Wu, Jian Cong, Xiaobin Zhuang, Chumin Li, Zhen Wei, Yuping Wang, Yuxuan Wang
Several recent studies have attempted to autoregressively generate continuous speech representations without discrete speech tokens by combining diffusion and autoregressive models, yet they often face challenges with excessive computational loads or suboptimal outcomes. In this work, we propose Diffusion Transformer Autoregressive Modeling (DiTAR), a patch-based autoregressive framework combining a language model with a diffusion transformer. This approach significantly enhances the efficacy of autoregressive models for continuous tokens and reduces computational demands. DiTAR utilizes a divide-and-conquer strategy for patch generation, where the language model processes aggregated patch embeddings and the diffusion transformer subsequently generates the next patch based on the output of the language model. For inference, we propose defining temperature as the time point of introducing noise during the reverse diffusion ODE to balance diversity and determinism. We also show in the extensive scaling analysis that DiTAR has superb scalability. In zero-shot speech generation, DiTAR achieves state-of-the-art performance in robustness, speaker similarity, and naturalness.
nan
Article 1647
Title@2025-05-25 (7): Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees
Title: Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees | Auf dem Weg zu Graph Foundation Models: Allgemeines Lernen über Graphen über Task-Trees | 走向图图基础模型:通过TLT-Trees对图的学习概观 2412.16441v3 |
Authors: Zehong Wang, Zheyuan Zhang, Tianyi Ma, Nitesh V Chawla, Chuxu Zhang, Yanfang Ye
Foundation models are pretrained on large-scale corpora to learn generalizable patterns across domains and tasks – such as contours, textures, and edges in images, or tokens and sentences in text. In contrast, discovering such generalities in graph-structured data, especially across heterogeneous graph tasks, remains an open challenge. To address this, we propose a novel approach to cross-task generalization in graphs via task-trees, which serve as unified learning instances aligning node-, edge-, and graph-level tasks. We theoretically analyze the stability, transferability, and generalization properties of task-trees, showing that pretraining a graph neural network (GNN) on diverse task-trees with a reconstruction objective induces transferable knowledge. This enables efficient adaptation to downstream tasks with minimal fine-tuning. To validate our framework, we introduce Graph Generality Identifier on Task-Trees (GIT), a graph foundation model that demonstrates strong performance on over 30 graphs across five domains via fine-tuning, in-context learning, and zero-shot generalization. Code and data are available at https://github.com/Zehong-Wang/GIT.
nan
Article 1648
Title@2025-05-25 (7): Nteasee: Understanding Needs in AI for Health in Africa – A Mixed-Methods Study of Expert and General Population Perspectives
Title: Nteasee: Understanding Needs in AI for Health in Africa – A Mixed-Methods Study of Expert and General Population Perspectives | Nteasee: Die Bedürfnisse von KI für die Gesundheit in Afrika verstehen – Eine gemischte Studie von Experten und allgemeinen Bevölkerungsperspektiven | Nteasee:了解大赦国际关于非洲保健的需要 – – 专家和一般人口观点混合方法研究 2409.12197v4 |
Authors: Mercy Nyamewaa Asiedu, Iskandar Haykel, Awa Dieng, Kerrie Kauer, Tousif Ahmed, Florence Ofori, Charisma Chan, Stephen Pfohl, Negar Rostamzadeh, Katherine Heller
Artificial Intelligence (AI) for health has the potential to significantly change and improve healthcare. However, in most African countries, identifying culturally and contextually attuned approaches for deploying these solutions is not well understood. To bridge this gap, we conduct a qualitative study to investigate the best practices, fairness indicators, and potential biases to mitigate when deploying AI for health in African countries, as well as explore opportunities where artificial intelligence could make a positive impact in health. We use a mixed-methods approach combining in-depth interviews (IDIs) and surveys. We conduct 1.5-2 hour long IDIs with 50 experts in health, policy, and AI across 17 countries, and through an inductive approach we conduct a qualitative thematic analysis on expert IDI responses. We administer a blinded 30-minute survey with case studies to 672 general population participants across 5 countries in Africa and analyze responses on quantitative scales, statistically comparing responses by country, age, gender, and level of familiarity with AI. We thematically summarize open-ended responses from surveys. Our results show generally positive attitudes and high levels of trust, accompanied by moderate levels of concern, among general population participants for AI usage for health in Africa. This contrasts with expert responses, where major themes revolved around trust/mistrust, ethical concerns, and systemic barriers to integration, among others. This work presents the first-of-its-kind qualitative research study of the potential of AI for health in Africa from an algorithmic fairness angle, with perspectives from both experts and the general population. We hope that this work guides policymakers and drives home the need for further research and the inclusion of general population perspectives in decision-making around AI usage.
nan
Article 1649
Title@2025-05-25 (7): Beyond Message Passing: Neural Graph Pattern Machine
Title: Beyond Message Passing: Neural Graph Pattern Machine | Beyond Message Passing: Neural Graph Pattern Machine | 超过消息传递: 神经图样机 2501.18739v2 |
Authors: Zehong Wang, Zheyuan Zhang, Tianyi Ma, Nitesh V Chawla, Chuxu Zhang, Yanfang Ye
Graph learning tasks often hinge on identifying key substructure patterns – such as triadic closures in social networks or benzene rings in molecular graphs – that underpin downstream performance. However, most existing graph neural networks (GNNs) rely on message passing, which aggregates local neighborhood information iteratively and struggles to explicitly capture such fundamental motifs, like triangles, k-cliques, and rings. This limitation hinders both expressiveness and long-range dependency modeling. In this paper, we introduce the Neural Graph Pattern Machine (GPM), a novel framework that bypasses message passing by learning directly from graph substructures. GPM efficiently extracts, encodes, and prioritizes task-relevant graph patterns, offering greater expressivity and improved ability to capture long-range dependencies. Empirical evaluations across four standard tasks – node classification, link prediction, graph classification, and graph regression – demonstrate that GPM outperforms state-of-the-art baselines. Further analysis reveals that GPM exhibits strong out-of-distribution generalization, desirable scalability, and enhanced interpretability. Code and datasets are available at: https://github.com/Zehong-Wang/GPM.
nan
Article 1650
Title@2025-05-25 (7): Saliency-guided Emotion Modeling: Predicting Viewer Reactions from Video Stimuli
Title: Saliency-guided Emotion Modeling: Predicting Viewer Reactions from Video Stimuli | Saliency-guided Emotion Modeling: Vorhersage von Zuschauerreaktionen aus Video-Stimuli | 以色素为指导的情感建模:视频刺激的预测查看器反应 2505.19178v1 |
Authors: Akhila Yaragoppa, Siddharth
Understanding the emotional impact of videos is crucial for applications in content creation, advertising, and Human-Computer Interaction (HCI). Traditional affective computing methods rely on self-reported emotions, facial expression analysis, and biosensing data, yet they often overlook the role of visual saliency – the naturally attention-grabbing regions within a video. In this study, we utilize deep learning to introduce a novel saliency-based approach to emotion prediction by extracting two key features: saliency area and number of salient regions. Using the HD2S saliency model and OpenFace facial action unit analysis, we examine the relationship between video saliency and viewer emotions. Our findings reveal three key insights: (1) Videos with multiple salient regions tend to elicit high-valence, low-arousal emotions, (2) Videos with a single dominant salient region are more likely to induce low-valence, high-arousal responses, and (3) Self-reported emotions often misalign with facial expression-based emotion detection, suggesting limitations in subjective reporting. By leveraging saliency-driven insights, this work provides a computationally efficient and interpretable alternative for emotion modeling, with implications for content creation, personalized media experiences, and affective computing research.
nan
Article 1651
Title@2025-05-25 (7): Mixture of Lookup Experts
Title: Mixture of Lookup Experts | Mischung von Lookup-Experten | 查找专家混合 2503.15798v2 |
Authors: Shibo Jie, Yehui Tang, Kai Han, Yitong Li, Duyu Tang, Zhi-Hong Deng, Yunhe Wang
Mixture-of-Experts (MoE) activates only a subset of experts during inference, allowing the model to maintain low inference FLOPs and latency even as the parameter count scales up. However, since MoE dynamically selects the experts, all the experts need to be loaded into VRAM. Their large parameter size still limits deployment, and offloading, which loads experts into VRAM only when needed, significantly increases inference latency. To address this, we propose Mixture of Lookup Experts (MoLE), a new MoE architecture that is efficient in both communication and VRAM usage. In MoLE, the experts are Feed-Forward Networks (FFNs) during training, taking the output of the embedding layer as input. Before inference, these experts can be re-parameterized as lookup tables (LUTs) that retrieve expert outputs based on input ids, and offloaded to storage devices. Therefore, we do not need to perform expert computations during inference. Instead, we directly retrieve the experts' computation results based on input ids and load them into VRAM, and thus the resulting communication overhead is negligible. Experiments show that, with the same FLOPs and VRAM usage, MoLE achieves inference speeds comparable to dense models and significantly faster than MoE with expert offloading, while maintaining performance on par with MoE.
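The re-parameterization step can be illustrated with a toy sketch. Because an expert only ever sees rows of the embedding table, its output for every token id can be computed once and stored. Everything below (sizes, the two-layer ReLU expert, the variable names) is a hypothetical stand-in, not the architecture from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, d_hidden = 50, 8, 16

# Hypothetical toy parameters: an embedding table and one expert FFN.
embedding = rng.normal(size=(vocab_size, d_model))
w1 = rng.normal(size=(d_model, d_hidden))
w2 = rng.normal(size=(d_hidden, d_model))


def expert_ffn(x):
    """Two-layer ReLU feed-forward expert applied to embedding vectors."""
    return np.maximum(x @ w1, 0.0) @ w2


# Re-parameterize: evaluate the expert on every embedding row once.
# Row i of the table is the expert output for token id i.
lut = expert_ffn(embedding)  # shape (vocab_size, d_model)

input_ids = np.array([3, 17, 3, 42])
from_lut = lut[input_ids]                    # inference: pure table lookup
from_ffn = expert_ffn(embedding[input_ids])  # training-time computation
```

In this toy setting the lookup and the FFN agree up to floating-point error, so the expert weights are no longer needed at inference time; only the rows for the current input ids have to be fetched.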
nan
Article 1652
Title@2025-05-25 (7): Computational Inertia as a Conserved Quantity in Frictionless and Damped Learning Dynamics
Title: Computational Inertia as a Conserved Quantity in Frictionless and Damped Learning Dynamics | Computational Inertia als konservierte Menge in friktionsloser und gedämpfter Lerndynamik | 计算无损和断裂学习动力学的计算因电量 2505.19171v1 |
Authors: Atahan Karagoz
We identify a conserved quantity in continuous-time optimization dynamics, termed computational inertia. Defined as the sum of kinetic energy (parameter velocity) and potential energy (loss), this scalar remains invariant under idealized, frictionless training. We formalize this conservation law, derive its analytic decay under damping and stochastic perturbations, and demonstrate its behavior in a synthetic system. The invariant offers a compact lens for interpreting learning trajectories, and may inform theoretical tools for analyzing convergence, stability, and training geometry.
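To make the conservation claim concrete, a short worked derivation follows. It assumes unit mass, second-order dynamics $\ddot{\theta} = -\nabla L(\theta)$ in the frictionless case, and linear damping with coefficient $\gamma \ge 0$; these specific forms and symbols are assumptions for this sketch and may differ from the paper's formulation.

```latex
E(t) = \tfrac{1}{2}\,\lVert \dot{\theta}(t) \rVert^2 + L\bigl(\theta(t)\bigr)

% Frictionless dynamics: \ddot{\theta} = -\nabla L(\theta)
\frac{dE}{dt}
  = \dot{\theta}^{\top}\ddot{\theta} + \nabla L(\theta)^{\top}\dot{\theta}
  = -\dot{\theta}^{\top}\nabla L(\theta) + \nabla L(\theta)^{\top}\dot{\theta}
  = 0

% Damped dynamics: \ddot{\theta} = -\nabla L(\theta) - \gamma\,\dot{\theta}
\frac{dE}{dt}
  = \dot{\theta}^{\top}\bigl(-\nabla L(\theta) - \gamma\,\dot{\theta}\bigr)
    + \nabla L(\theta)^{\top}\dot{\theta}
  = -\gamma\,\lVert \dot{\theta} \rVert^2 \;\le\; 0
```

Under these assumptions the scalar $E$ is constant without friction and decreases monotonically once damping is present.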
nan
Article 1653
Title@2025-05-25 (7): JEDI: The Force of Jensen-Shannon Divergence in Disentangling Diffusion Models
Title: JEDI: The Force of Jensen-Shannon Divergence in Disentangling Diffusion Models | JEDI: Die Macht der Jensen-Shannon-Divergenz bei entwirrenden Diffusionsmodellen | JEDI: 詹森-夏农分解扩散模型的分解力量 2505.19166v1 |
Authors: Eric Tillmann Bill, Enis Simsar, Thomas Hofmann
We introduce JEDI, a test-time adaptation method that enhances subject separation and compositional alignment in diffusion models without requiring retraining or external supervision. JEDI operates by minimizing semantic entanglement in attention maps using a novel Jensen-Shannon divergence based objective. To improve efficiency, we leverage adversarial optimization, reducing the number of updating steps required. JEDI is model-agnostic and applicable to architectures such as Stable Diffusion 1.5 and 3.5, consistently improving prompt alignment and disentanglement in complex scenes. Additionally, JEDI provides a lightweight, CLIP-free disentanglement score derived from internal attention distributions, offering a principled benchmark for compositional alignment under test-time conditions. We will publicly release the implementation of our method.
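For readers unfamiliar with the divergence named above, here is a minimal sketch of the Jensen-Shannon divergence between two attention maps treated as discrete distributions. The function name and the normalization step are assumptions for this sketch; how JEDI builds its objective from this quantity is not specified in the abstract and is not reproduced here.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two non-negative maps."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    p = p / p.sum()  # normalize each map to a probability distribution
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute zero by convention
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Two hypothetical 2x2 attention maps for different subject tokens.
overlapping = js_divergence([[1, 1], [0, 0]], [[1, 1], [0, 0]])
disjoint = js_divergence([[1, 1], [0, 0]], [[0, 0], [1, 1]])
```

The divergence is symmetric, equals zero for identical maps, and reaches its maximum of $\ln 2$ when the two maps have disjoint support.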
nan
Article 1654
Title@2025-05-25 (7): CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter
Title: CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter | CORAL: Lerne konsistente Repräsentationen über mehrstufiges Training mit leichterem spekulativen Entwurfer | CORAL: 利用轻型投机性起草者在多阶段培训中学习一致的代表性 2502.16880v3 |
Authors: Yepeng Weng, Dianwen Mei, Huishi Qiu, Xujie Chen, Li Liu, Jiang Tian, Zhongchao Shi
Speculative decoding is a powerful technique that accelerates Large Language Model (LLM) inference by leveraging a lightweight speculative draft model. However, existing designs suffer in performance due to misalignment between training and inference. Recent methods have tried to solve this issue by adopting a multi-step training strategy, but the complex inputs of different training steps make it harder for the draft model to converge. To address this, we propose CORAL, a novel framework that improves both accuracy and efficiency in speculative drafting. CORAL introduces Cross-Step Representation Alignment, a method that enhances consistency across multiple training steps, significantly improving speculative drafting performance. Additionally, we identify the LM head as a major bottleneck in the inference speed of the draft model. We introduce a weight-grouping mechanism that selectively activates a subset of LM head parameters during inference, substantially reducing the latency of the draft model. We evaluate CORAL on three LLM families and three benchmark datasets, achieving speedup ratios of 2.50x-4.07x, outperforming state-of-the-art methods such as EAGLE-2 and HASS. Our results demonstrate that CORAL effectively mitigates training-inference misalignment and delivers significant speedup for modern LLMs with large vocabularies.
nan
Article 1655
Title@2025-05-25 (7): Efficient Training of Multi-task Neural Solver for Combinatorial Optimization
Title: Efficient Training of Multi-task Neural Solver for Combinatorial Optimization | Effiziente Schulung von Multi-Task-Neural Solver zur kombinatorischen Optimierung | 综合优化多任务神经溶剂高效培训 2305.06361v5 |
Authors: Chenguang Wang, Zhang-Hua Fu, Pinyan Lu, Tianshu Yu
Efficiently training a multi-task neural solver for various combinatorial optimization problems (COPs) has been less studied so far. Naive application of conventional multi-task learning approaches often falls short in delivering a high-quality, unified neural solver. This deficiency primarily stems from the significant computational demands and a lack of adequate consideration for the complexities inherent in COPs. In this paper, we propose a general and efficient training paradigm to deliver a unified combinatorial multi-task neural solver. To this end, we resort to the theoretical loss decomposition for multiple tasks under an encoder-decoder framework, which enables more efficient training via proper bandit task-sampling algorithms through an intra-task influence matrix. By employing theoretically grounded approximations, our method significantly enhances overall performance, regardless of whether it is within constrained training budgets, across equivalent training epochs, or in terms of generalization capabilities, when compared to conventional training schedules. On the real-world datasets of TSPLib and CVRPLib, our method also achieved the best results compared to single task learning and multi-task learning approaches. Additionally, the influence matrix provides empirical evidence supporting common practices in the field of learning to optimize, further substantiating the effectiveness of our approach. Our code is open-sourced and available at https://github.com/LOGO-CUHKSZ/MTL-COP.
nan
Article 1656
Title@2025-05-25 (7): Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation
Title: Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation | Divide-Then-Aggregat: Eine effiziente Tool-Learning-Methode über parallele Tool-Invokation | 分离后生成工具:通过平行工具使用使用效率高的工具学习方法 2501.12432v2 |
Authors: Dongsheng Zhu, Weixian Shi, Zhengliang Shi, Zhaochun Ren, Shuaiqiang Wang, Lingyong Yan, Dawei Yin
Although current Large Language Models (LLMs) exhibit impressive capabilities, performing complex real-world tasks still requires tool learning. Mainstream methods, such as CoT/ReAct, rely on step-by-step tool invocation to interact with external environments, but they are limited in perceptual scope and lack adequate task-planning capability. To address these limitations, other studies introduce the Depth-First Search-based Decision Tree (DFSDT), which still suffers from high computational cost. In this paper, we introduce a novel parallel tool invocation paradigm, DTA-Llama (Divide-Then-Aggregate Llama). First, we transform traditional tree-based tool search paths into a Directed Acyclic Graph (DAG) structure, generating a high-quality parallel tool invocation dataset. DTA-Llama is then trained on the dataset to learn to iteratively divide the current task into several parallel tool invocation sub-tasks and aggregate the invocation results to decide the next actions. Furthermore, we introduce an efficient inference framework inspired by the Process/Threads mechanism when applying DTA-Llama to practical tasks. Experimental results show that our approach substantially enhances task performance while reducing token consumption and inference time. Llama2-7B, using our method, is comparable to the official parallel function calling method of GPT-3.5. The relevant code, dataset, and model weights are available at https://corn0205.github.io/
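The idea of running independent tool calls in parallel and aggregating before the next step can be sketched as grouping the nodes of a dependency DAG into levels. This is a generic illustration only: the function, the task names, and the level-by-level grouping are hypothetical and are not taken from the DTA-Llama implementation.

```python
def parallel_levels(deps):
    """Group tasks into levels; tasks within one level can run in parallel.

    `deps` maps each task name to the set of task names it depends on.
    Every dependency must itself be a key of `deps`.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    levels = []
    while remaining:
        # A task is ready once all of its dependencies have been scheduled.
        ready = sorted(task for task, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cyclic or unknown dependency")
        levels.append(ready)
        for task in ready:
            del remaining[task]
        for d in remaining.values():
            d.difference_update(ready)
    return levels


# Hypothetical tool calls: three independent lookups, then one aggregation.
plan = parallel_levels({
    "search_flights": set(),
    "search_hotels": set(),
    "check_weather": set(),
    "build_itinerary": {"search_flights", "search_hotels", "check_weather"},
})
```

Here the three lookups land in the first level (the "divide" step) and the itinerary task in the second (the "aggregate" step), whereas a strictly sequential tool loop would need four rounds.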
nan
Article 1657
Title@2025-05-25 (7): Mean-Shift Distillation for Diffusion Mode Seeking
Title: Mean-Shift Distillation for Diffusion Mode Seeking | Mean-Shift-Destillation für den Diffusionsmodus | 用于扩散模式搜索的中质蒸馏 2502.15989v2 |
Authors: Vikas Thamizharasan, Nikitas Chatzis, Iliyan Georgiev, Matthew Fisher, Evangelos Kalogerakis, Difan Liu, Nanxuan Zhao, Michal Lukac
We present mean-shift distillation, a novel diffusion distillation technique that provides a provably good proxy for the gradient of the diffusion output distribution. This is derived directly from mean-shift mode seeking on the distribution, and we show that its extrema are aligned with the modes. We further derive an efficient product distribution sampling procedure to evaluate the gradient. Our method is formulated as a drop-in replacement for score distillation sampling (SDS), requiring neither model retraining nor extensive modification of the sampling procedure. We show that it exhibits superior mode alignment as well as improved convergence in both synthetic and practical setups, yielding higher-fidelity results when applied to both text-to-image and text-to-3D applications with Stable Diffusion.
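For context, classical mean-shift mode seeking, which the abstract cites as the starting point, can be sketched in one dimension as follows. This is the textbook procedure on a fixed sample set with a Gaussian kernel, not the distillation method itself; the function name, bandwidth, and toy data are assumptions for this sketch.

```python
import numpy as np


def mean_shift(x, samples, bandwidth=0.5, steps=100, tol=1e-8):
    """Move x to a nearby density mode by repeated kernel-weighted averaging."""
    for _ in range(steps):
        # Gaussian kernel weights of all samples relative to the current point.
        w = np.exp(-0.5 * ((samples - x) / bandwidth) ** 2)
        x_new = float(np.sum(w * samples) / np.sum(w))
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Toy bimodal sample set with clusters centered at -2 and 3.
samples = np.concatenate([
    np.linspace(-2.2, -1.8, 21),
    np.linspace(2.8, 3.2, 21),
])
left_mode = mean_shift(-1.0, samples)
right_mode = mean_shift(2.0, samples)
```

Each update moves the point toward the locally weighted mean, so different starting points settle on different modes rather than on the global average of the samples.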
nan
Article 1658
Title@2025-05-25 (7): Do Large Language Models (Really) Need Statistical Foundations?
Title: Do Large Language Models (Really) Need Statistical Foundations? | Brauchen große Sprachmodelle (wirklich) statistische Grundlagen? | 大语言模式(真正)是否需要统计基础? 2505.19145v1 |
Authors: Weijie Su
Large language models (LLMs) represent a new paradigm for processing unstructured data, with applications across an unprecedented range of domains. In this paper, we address, through two arguments, whether the development and application of LLMs would genuinely benefit from foundational contributions from the statistics discipline. First, we argue affirmatively, beginning with the observation that LLMs are inherently statistical models due to their profound data dependency and stochastic generation processes, where statistical insights are naturally essential for handling variability and uncertainty. Second, we argue that the persistent black-box nature of LLMs – stemming from their immense scale, architectural complexity, and development practices often prioritizing empirical performance over theoretical interpretability – renders closed-form or purely mechanistic analyses generally intractable, thereby necessitating statistical approaches due to their flexibility and often demonstrated effectiveness. To substantiate these arguments, the paper outlines several research areas – including alignment, watermarking, uncertainty quantification, evaluation, and data mixture optimization – where statistical methodologies are critically needed and are already beginning to make valuable contributions. We conclude with a discussion suggesting that statistical research concerning LLMs will likely form a diverse “mosaic” of specialized topics rather than deriving from a single unifying theory, and highlighting the importance of timely engagement by our statistics community in LLM research.
nan
Article 1659
Title@2025-05-25 (7): ADGSyn: Dual-Stream Learning for Efficient Anticancer Drug Synergy Prediction
Title: ADGSyn: Dual-Stream Learning for Efficient Anticancer Drug Synergy Prediction | ADGSyn: Dual-Stream-Lernen für effiziente Anti-Krebs-Arzneimittel-Synergie-Vorhersage | ADGSyn:双层学习促进高效抗癌药物协同效应预测 2505.19144v1 |
Authors: Yuxuan Nie, Yutong Song, Hong Peng
Drug combinations play a critical role in cancer therapy by significantly enhancing treatment efficacy and overcoming drug resistance. However, the combinatorial space of possible drug pairs grows exponentially, making experimental screening highly impractical. Therefore, developing efficient computational methods to predict promising drug combinations and guide experimental validation is of paramount importance. In this work, we propose ADGSyn, an innovative method for predicting drug synergy. The key components of our approach include: (1) shared projection matrices combined with attention mechanisms to enable cross-drug feature alignment; (2) automatic mixed precision (AMP)-optimized graph operations that reduce memory consumption by 40% while accelerating training speed threefold; and (3) residual pathways stabilized by LayerNorm to ensure stable gradient propagation during training. Evaluated on the O’Neil dataset containing 13,243 drug–cell line combinations, ADGSyn demonstrates superior performance over eight baseline methods. Moreover, the framework supports full-batch processing of up to 256 molecular graphs on a single GPU, setting a new standard for efficiency in drug synergy prediction within the field of computational oncology.
nan
Article 1660
Title@2025-05-25 (7): AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
Title: AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning | AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering durch Verstärkungslernen | AdaCot:通过强化学习开拓探索的探索链 2505.11896v2 |
Authors: Chenwei Lou, Zewei Sun, Xinnian Liang, Meng Qu, Wei Shen, Wenqi Wang, Yuntao Li, Qingping Yang, Shuangzhi Wu
Large Language Models (LLMs) have demonstrated remarkable capabilities but often face challenges with tasks requiring sophisticated reasoning. While Chain-of-Thought (CoT) prompting significantly enhances reasoning, it indiscriminately generates lengthy reasoning steps for all queries, leading to substantial computational costs and inefficiency, especially for simpler inputs. To address this critical issue, we introduce AdaCoT (Adaptive Chain-of-Thought), a novel framework enabling LLMs to adaptively decide when to invoke CoT. AdaCoT frames adaptive reasoning as a Pareto optimization problem that seeks to balance model performance with the costs associated with CoT invocation (both frequency and computational overhead). We propose a reinforcement learning (RL) based method, specifically utilizing Proximal Policy Optimization (PPO), to dynamically control the CoT triggering decision boundary by adjusting penalty coefficients, thereby allowing the model to determine CoT necessity based on implicit query complexity. A key technical contribution is Selective Loss Masking (SLM), designed to counteract decision boundary collapse during multi-stage RL training, ensuring robust and stable adaptive triggering. Experimental results demonstrate that AdaCoT successfully navigates the Pareto frontier, achieving substantial reductions in CoT usage for queries not requiring elaborate reasoning. For instance, on our production traffic test set, AdaCoT reduced CoT triggering rates to as low as 3.18% and decreased average response tokens by 69.06%, while maintaining high performance on complex tasks.
nan
Article 1661
Title@2025-05-25 (7): CER: Confidence Enhanced Reasoning in LLMs
Title: CER: Confidence Enhanced Reasoning in LLMs | CER: Vertrauen in LLMs gestärkte Vernunft | CER: LLM 中增强信任的理由 2502.14634v2 |
Authors: Ali Razghandi, Seyed Mohammad Hadi Hosseini, Mahdieh Soleymani Baghshah
Ensuring the reliability of Large Language Models (LLMs) in complex reasoning tasks remains a formidable challenge, particularly in scenarios that demand precise mathematical calculations and knowledge-intensive open-domain generation. In this work, we introduce an uncertainty-aware framework designed to enhance the accuracy of LLM responses by systematically incorporating model confidence at critical decision points. We propose an approach that encourages multi-step reasoning in LLMs and quantifies the confidence of intermediate answers such as numerical results in mathematical reasoning and proper nouns in open-domain generation. Then, the overall confidence of each reasoning chain is evaluated based on the confidence of these critical intermediate steps. Finally, we aggregate the answers of the generated response paths in a way that reflects the reliability of each generated chain (as opposed to self-consistency, in which each generated chain contributes equally to majority voting). We conducted extensive experiments on five datasets, three mathematical datasets and two open-domain datasets, using four LLMs. The results consistently validate the effectiveness of our novel confidence aggregation method, leading to an accuracy improvement of up to 7.4% and 5.8% over baseline approaches in math and open-domain generation tasks, respectively. Code is publicly available at https://github.com/Aquasar11/CER.
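The contrast between equal-weight majority voting and confidence-aware aggregation can be sketched as below. This is an illustration under stated assumptions: taking a chain's confidence as the product of its step confidences, summing chain confidences per answer, and all numbers shown are hypothetical choices, not the scoring used in the paper.

```python
import math
from collections import Counter, defaultdict


def chain_confidence(step_confidences):
    """Assumed rule: a chain's confidence is the product of its step confidences."""
    return math.prod(step_confidences)


def majority_vote(chains):
    """Self-consistency style: every chain counts equally."""
    return Counter(answer for answer, _ in chains).most_common(1)[0][0]


def confidence_weighted_vote(chains):
    """Each chain contributes its confidence to the answer it produced."""
    scores = defaultdict(float)
    for answer, steps in chains:
        scores[answer] += chain_confidence(steps)
    return max(scores, key=scores.get)


# Hypothetical sampled chains: (final answer, confidences of intermediate steps).
chains = [
    ("42", [0.95, 0.90]),
    ("42", [0.90, 0.85]),
    ("41", [0.40, 0.50]),
    ("41", [0.30, 0.60]),
    ("41", [0.50, 0.40]),
]
by_majority = majority_vote(chains)
by_confidence = confidence_weighted_vote(chains)
```

In this toy case three low-confidence chains outvote two high-confidence ones under majority voting, while the weighted rule prefers the answer backed by the more confident chains.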
nan
Article 1662
Title@2025-05-25 (7): Uncertainty Quantification for Physics-Informed Neural Networks with Extended Fiducial Inference
Title: Uncertainty Quantification for Physics-Informed Neural Networks with Extended Fiducial Inference | Ungewissheitsquantifizierung für physikinformierte Neuronale Netzwerke mit erweiterter fiduzieller Schlussfolgerung | 具有扩展影响推断力的物理成形神经网络的不确定性量化 2505.19136v1 |
Authors: Frank Shih, Zhenghao Jiang, Faming Liang
Uncertainty quantification (UQ) in scientific machine learning is increasingly critical as neural networks are widely adopted to tackle complex problems across diverse scientific disciplines. For physics-informed neural networks (PINNs), a prominent model in scientific machine learning, uncertainty is typically quantified using Bayesian or dropout methods. However, both approaches suffer from a fundamental limitation: the prior distribution or dropout rate required to construct honest confidence sets cannot be determined without additional information. In this paper, we propose a novel method within the framework of extended fiducial inference (EFI) to provide rigorous uncertainty quantification for PINNs. The proposed method leverages a narrow-neck hyper-network to learn the parameters of the PINN and quantify their uncertainty based on imputed random errors in the observations. This approach overcomes the limitations of Bayesian and dropout methods, enabling the construction of honest confidence sets based solely on observed data. This advancement represents a significant breakthrough for PINNs, greatly enhancing their reliability, interpretability, and applicability to real-world scientific and engineering challenges. Moreover, it establishes a new theoretical framework for EFI, extending its application to large-scale models, eliminating the need for sparse hyper-networks, and significantly improving the automaticity and robustness of statistical inference.
nan
Article 1663
Title@2025-05-25 (7): Incentivizing High-Quality Human Annotations with Golden Questions
Title: Incentivizing High-Quality Human Annotations with Golden Questions | Anreize für hochwertige menschliche Anmerkungen mit goldenen Fragen | 以金质问题激励高品质人文说明 2505.19134v1 |
Authors: Shang Liu, Zhongze Cai, Hanzhao Wang, Zhongyao Ma, Xiaocheng Li
Human-annotated data plays a vital role in training large language models (LLMs), such as supervised fine-tuning and human preference alignment. However, it is not guaranteed that paid human annotators produce high-quality data. In this paper, we study how to incentivize human annotators to do so. We start from a principal-agent model to model the dynamics between the company (the principal) and the annotator (the agent), where the principal can only monitor the annotation quality by examining $n$ samples. We investigate the maximum likelihood estimators (MLE) and the corresponding hypothesis testing to incentivize annotators: the agent is given a bonus if the MLE passes the test. By analyzing the variance of the outcome, we show that the strategic behavior of the agent makes the hypothesis testing very different from traditional ones: unlike the exponential rate proved by large deviation theory, the principal-agent model’s hypothesis testing rate is $\Theta(1/\sqrt{n \log n})$. Our theory implies two criteria for the golden questions used to monitor the performance of the annotators: they should be of (1) high certainty and (2) similar format to normal ones. In that light, we select a set of golden questions in human preference data. By doing incentive-compatible experiments, we find that the annotators’ behavior is better revealed by those golden questions, compared to traditional survey techniques such as instructed manipulation checks.
nan
Article 1664
Title@2025-05-25 (7): Fast and Accurate Power Load Data Completion via Regularization-optimized Low-Rank Factorization
Title: Fast and Accurate Power Load Data Completion via Regularization-optimized Low-Rank Factorization | Schnelle und präzise Leistungslastdatenvervollständigung über Regularisierungsoptimierte Low-Rank-Fabrikisierung | 通过正规化、优化低射速电荷因子化完成快速和准确电源负载数据 2505.19133v1 |
Authors: Yan Xia, Hao Feng, Hongwei Sun, Junjie Wang, Qicong Hu
Low-rank representation learning has emerged as a powerful tool for recovering missing values in power load data due to its ability to exploit the inherent low-dimensional structures of spatiotemporal measurements. Among various techniques, low-rank factorization models are favoured for their efficiency and interpretability. However, their performance is highly sensitive to the choice of regularization parameters, which are typically fixed or manually tuned, resulting in limited generalization capability or slow convergence in practical scenarios. In this paper, we propose a Regularization-optimized Low-Rank Factorization, which introduces a Proportional-Integral-Derivative controller to adaptively adjust the regularization coefficient. Furthermore, we provide a detailed algorithmic complexity analysis, showing that our method preserves the computational efficiency of stochastic gradient descent while improving adaptivity. Experimental results on real-world power load datasets validate the superiority of our method in both imputation accuracy and training efficiency compared to existing baselines.
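To show what a controller-driven regularization schedule might look like, here is a generic discrete PID update. The class, the gains, the choice of error signal, and the clipping at zero are all assumptions for this sketch; the abstract does not specify how the paper defines the error or applies the controller output.

```python
class PIDController:
    """Textbook discrete PID: output = kp*e + ki*sum(e) + kd*(e - e_prev)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def update_regularization(lam, error, controller, lam_min=0.0):
    """Hypothetical rule: shift the coefficient by the PID output, keep it >= lam_min."""
    return max(lam_min, lam + controller.step(error))


# Hypothetical usage: the error could be, e.g., a gap between validation and
# training loss measured after each epoch (an assumption, not from the paper).
controller = PIDController(kp=0.5, ki=0.1, kd=0.2)
lam = 0.01
for error in [0.04, 0.02, -0.01]:
    lam = update_regularization(lam, error, controller)
```

The proportional term reacts to the current error, the integral term to its accumulated history, and the derivative term to its trend, which is what lets the coefficient adapt during training instead of staying fixed.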
nan
Article 1665
Title@2025-05-25 (7): Rank-One Modified Value Iteration
Title: Rank-One Modified Value Iteration | Rang eins geänderte Wert Iteration | Ran- One 修改值迭代 2505.01828v2 |
Authors: Arman Sharifi Kolarijani, Tolga Ok, Peyman Mohajerin Esfahani, Mohamad Amin Sharif Kolarijani
In this paper, we provide a novel algorithm for solving planning and learning problems of Markov decision processes. The proposed algorithm follows a policy iteration-type update by using a rank-one approximation of the transition probability matrix in the policy evaluation step. This rank-one approximation is closely related to the stationary distribution of the corresponding transition probability matrix, which is approximated using the power method. We provide theoretical guarantees for the convergence of the proposed algorithm to the optimal (action-)value function with the same rate and computational complexity as the value iteration algorithm in the planning problem and as the Q-learning algorithm in the learning problem. Through our extensive numerical simulations, however, we show that the proposed algorithm consistently outperforms first-order algorithms and their accelerated versions for both planning and learning problems.
nan
Article 1666
Title@2025-05-25 (7): Natural Language Generation from Visual Events: Challenges and Future Directions
Title: Natural Language Generation from Visual Events: Challenges and Future Directions | Natürliche Sprachgenerierung aus visuellen Veranstaltungen: Herausforderungen und Zukunftsrichtungen | 从视觉活动中产生自然语言:挑战和未来方向 2502.13034v2 |
Authors: Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle
The ability to use natural language to talk about visual events is at the core of human intelligence and a crucial feature of any artificial intelligence system. In recent years, a substantial body of work in visually grounded NLP has focused on describing content depicted in single images. By contrast, comparatively less attention has been devoted to exhaustively modeling scenarios in which natural language is employed to interpret and talk about events presented through videos or sequences of images. In this position paper, we argue that any NLG task dealing with sequences of images or frames is an instance of the broader, more general problem of modeling the intricate relationships between visual events unfolding over time and the features of the language used to interpret, describe, or narrate them. Therefore, solving these tasks requires models to be capable of identifying and managing such intricacies. We consider five seemingly different tasks, which we argue are compelling instances of this broader multimodal problem. Consistently, we claim that these tasks pose a common set of challenges and share similarities in terms of modeling and evaluation approaches. Building on this perspective, we identify key open questions and propose several research directions for future investigation. We claim that improving language-and-vision models’ understanding of visual events is both timely and essential, given their growing applications. Additionally, this challenge offers significant scientific insight, advancing model development through principles of human cognition and language use.
nan
Article 1667
Title@2025-05-25 (7): Interacting Large Language Model Agents. Interpretable Models and Social Learning
Title: Interacting Large Language Model Agents. Interpretable Models and Social Learning | Interagieren von Large Language Model Agents. Interpretierbare Modelle und soziales Lernen | 跨大语言示范工具、可解释模型和社会学习 2411.01271v2 |
Authors: Adit Jain, Vikram Krishnamurthy
This paper discusses the theory and algorithms for interacting large language model agents (LLMAs) using methods from statistical signal processing and microeconomics. While both fields are mature, their application to decision-making involving interacting LLMAs remains unexplored. Motivated by Bayesian sentiment analysis on online platforms, we construct interpretable models and algorithms that enable LLMAs to interact and perform Bayesian inference. Because interacting LLMAs learn from both prior decisions and external inputs, they can exhibit bias and herding behavior. Thus, developing interpretable models and stochastic control algorithms is essential to understand and mitigate these behaviors. This paper has three main results. First, we show using Bayesian revealed preferences from microeconomics that an individual LLMA satisfies the necessary and sufficient conditions for rationally inattentive (bounded rationality) Bayesian utility maximization and, given an observation, the LLMA chooses an action that maximizes a regularized utility. Second, we utilize Bayesian social learning to construct interpretable models for LLMAs that interact sequentially with each other and the environment while performing Bayesian inference. Our proposed models capture the herding behavior exhibited by interacting LLMAs. Third, we propose a stochastic control framework to delay herding and improve state estimation accuracy under two settings: (a) centrally controlled LLMAs and (b) autonomous LLMAs with incentives. We demonstrate the effectiveness of our methods on real datasets for hate speech classification and product quality assessment, using open-source models like LLaMA and closed-source models like ChatGPT. The main takeaway of this paper, based on empirical analysis and mathematical formalism, is that LLMAs act as rationally bounded Bayesian agents that exhibit social learning when interacting.
nan
Article 1668
Title@2025-05-25 (7): Adaptive Sensor Steering Strategy Using Deep Reinforcement Learning for Dynamic Data Acquisition in Digital Twins
Title: Adaptive Sensor Steering Strategy Using Deep Reinforcement Learning for Dynamic Data Acquisition in Digital Twins | Adaptive Sensorlenkungsstrategie mit tief greifendem Verstärkungslernen für die dynamische Datenerfassung in digitalen Zwillingen | 利用深强化学习促进数字双对动态数据采集的适应感感感感指导战略 2504.10248v2 |
Authors: Collins O. Ogbodo, Timothy J. Rogers, Mattia Dal Borgo, David J. Wagg
This paper introduces a sensor steering methodology based on deep reinforcement learning to enhance the predictive accuracy and decision support capabilities of digital twins by optimising the data acquisition process. Traditional sensor placement techniques are often constrained by one-off optimisation strategies, which limit their applicability for online applications requiring continuous informative data assimilation. The proposed approach addresses this limitation by offering an adaptive framework for sensor placement within the digital twin paradigm. The sensor placement problem is formulated as a Markov decision process, enabling the training and deployment of an agent capable of dynamically repositioning sensors in response to the evolving conditions of the physical structure as represented by the digital twin. This ensures that the digital twin maintains a highly representative and reliable connection to its physical counterpart. The proposed framework is validated through a series of comprehensive case studies involving a cantilever plate structure subjected to diverse conditions, including healthy and damaged conditions. The results demonstrate the capability of the deep reinforcement learning agent to adaptively reposition sensors, improving the quality of data acquisition and hence enhancing the overall accuracy of digital twins.
nan
Article 1669
Title@2025-05-25 (7): Birch SGD: A Tree Graph Framework for Local and Asynchronous SGD Methods
Title: Birch SGD: A Tree Graph Framework for Local and Asynchronous SGD Methods | Birke SGD: Ein Baumdiagramm-Framework für lokale und asynchrone SGD-Methoden | Birch SGD: 当地和非同步 SGD 方法树图框架 2505.09218v2 |
Authors: Alexander Tyurin, Danil Sivtsov
We propose a new unifying framework, Birch SGD, for analyzing and designing distributed SGD methods. The central idea is to represent each method as a weighted directed tree, referred to as a computation tree. Leveraging this representation, we introduce a general theoretical result that reduces convergence analysis to studying the geometry of these trees. This perspective yields a purely graph-based interpretation of optimization dynamics, offering a new and intuitive foundation for method development. Using Birch SGD, we design eight new methods and analyze them alongside previously known ones, with at least six of the new methods shown to have optimal computational time complexity. Our research leads to two key insights: (i) all methods share the same “iteration rate” of $O\left(\frac{(R + 1) L \Delta}{\varepsilon} + \frac{\sigma^2 L \Delta}{\varepsilon^2}\right)$, where $R$ is the maximum “tree distance” along the main branch of a tree; and (ii) different methods exhibit different trade-offs; for example, some update iterates more frequently, improving practical performance, while others are more communication-efficient or focus on other aspects. Birch SGD serves as a unifying framework for navigating these trade-offs. We believe these results provide a unified foundation for understanding, analyzing, and designing efficient asynchronous and parallel optimization methods.
nan
Article 1670
Title@2025-05-25 (7): Deep Active Speech Cancellation with Mamba-Masking Network
Title: Deep Active Speech Cancellation with Mamba-Masking Network | Deep Active Speech Stornierung mit Mamba-Masking Network | 使用 Mamba- Masking 网络的深活动语音取消 2502.01185v2 |
Authors: Yehuda Mishaly, Lior Wolf, Eliya Nachmani
We present a novel deep learning network for Active Speech Cancellation (ASC), advancing beyond Active Noise Cancellation (ANC) methods by effectively canceling both noise and speech signals. The proposed Mamba-Masking architecture introduces a masking mechanism that directly interacts with the encoded reference signal, enabling adaptive and precisely aligned anti-signal generation, even under rapidly changing, high-frequency conditions, as commonly found in speech. Complementing this, a multi-band segmentation strategy further improves phase alignment across frequency bands. Additionally, we introduce an optimization-driven loss function that provides near-optimal supervisory signals for anti-signal generation. Experimental results demonstrate substantial performance gains, achieving up to 7.2dB improvement in ANC scenarios and 6.2dB in ASC, significantly outperforming existing methods.
nan
Article 1671
Title@2025-05-25 (7): Exploring Magnitude Preservation and Rotation Modulation in Diffusion Transformers
Title: Exploring Magnitude Preservation and Rotation Modulation in Diffusion Transformers | Erforschung der Magnitudenerhaltung und Rotationsmodulation in Diffusionstransformatoren | 在扩散变异器中探索磁力保护与旋转调节 2505.19122v1 |
Authors: Eric Tillman Bill, Cristian Perez Jensen, Sotiris Anagnostidis, Dimitri von Rütte
Denoising diffusion models exhibit remarkable generative capabilities, but remain challenging to train due to their inherent stochasticity, where high-variance gradient estimates lead to slow convergence. Previous works have shown that magnitude preservation helps with stabilizing training in the U-net architecture. This work explores whether this effect extends to the Diffusion Transformer (DiT) architecture. As such, we propose a magnitude-preserving design that stabilizes training without normalization layers. Motivated by the goal of maintaining activation magnitudes, we additionally introduce rotation modulation, which is a novel conditioning method using learned rotations instead of traditional scaling or shifting. Through empirical evaluations and ablation studies on small-scale models, we show that magnitude-preserving strategies significantly improve performance, notably reducing FID scores by $\sim$12.8%. Further, we show that rotation modulation combined with scaling is competitive with AdaLN, while requiring $\sim$5.4% fewer parameters. This work provides insights into conditioning strategies and magnitude control. We will publicly release the implementation of our method.
nan
Article 1672
Title@2025-05-25 (7): FP4 All the Way: Fully Quantized Training of LLMs
Title: FP4 All the Way: Fully Quantized Training of LLMs | FP4: Vollständig quantifizierte Ausbildung von LLMs | FP4 全程:充分量化的LLM培训 2505.19115v1 |
Authors: Brian Chmiel, Maxim Fishman, Ron Banner, Daniel Soudry
We demonstrate, for the first time, fully quantized training (FQT) of large language models (LLMs) using predominantly 4-bit floating-point (FP4) precision for weights, activations, and gradients on datasets up to 200 billion tokens. We extensively investigate key design choices for FP4, including block sizes, scaling formats, and rounding methods. Our analysis shows that the NVFP4 format, where each block of 16 FP4 values (E2M1) shares a scale represented in E4M3, provides optimal results. We use stochastic rounding for backward and update passes and round-to-nearest for the forward pass to enhance stability. Additionally, we identify a theoretical and empirical threshold for effective quantized training: when the gradient norm falls below approximately $\sqrt{3}$ times the quantization noise, quantized training becomes less effective. Leveraging these insights, we successfully train a 7-billion-parameter model on 256 Intel Gaudi2 accelerators. The resulting FP4-trained model achieves downstream task performance comparable to a standard BF16 baseline, confirming that FP4 training is a practical and highly efficient approach for large-scale LLM training. A reference implementation is supplied in https://github.com/Anonymous1252022/fp4-all-the-way .
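The block-scaled format and the two rounding modes described above can be sketched in a few lines. This is a simplified illustration rather than the authors' kernel: the shared scale is kept as an ordinary Python float instead of being encoded in E4M3, ties in round-to-nearest are resolved toward the smaller magnitude, and the function names `quantize_block` and `quantize_fp4` are hypothetical.

```python
import math
import random

# Non-negative magnitudes representable in FP4 E2M1 (the sign is a separate bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(block, stochastic=False, rng=random):
    """Return (scale, codes) with block[i] ~= scale * codes[i], codes on the E2M1 grid."""
    scale = max(abs(x) for x in block) / E2M1_GRID[-1]  # float scale (E4M3 encoding omitted)
    if scale == 0.0:
        return 1.0, [0.0] * len(block)
    codes = []
    for x in block:
        m = min(abs(x) / scale, E2M1_GRID[-1])
        hi_idx = next(i for i, g in enumerate(E2M1_GRID) if g >= m)
        hi = E2M1_GRID[hi_idx]
        lo = E2M1_GRID[hi_idx - 1] if hi_idx > 0 and hi > m else hi
        if hi == lo:
            q = hi  # m is exactly representable
        elif stochastic:
            # Round up with probability proportional to the distance from lo,
            # so the expected value of q equals m (unbiased).
            q = hi if rng.random() < (m - lo) / (hi - lo) else lo
        else:
            q = hi if (hi - m) < (m - lo) else lo  # round to nearest
        codes.append(math.copysign(q, x))
    return scale, codes


def quantize_fp4(values, block_size=16, stochastic=False, rng=random):
    """Quantize and dequantize a flat list in blocks that each share one scale."""
    out = []
    for start in range(0, len(values), block_size):
        scale, codes = quantize_block(values[start:start + block_size], stochastic, rng)
        out.extend(scale * c for c in codes)
    return out
```

Following the recipe in the abstract, one would call `quantize_fp4(x)` on forward-pass tensors and `quantize_fp4(g, stochastic=True)` on backward and update tensors.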
nan
Article 1673
Title@2025-05-25 (7): Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
Title: Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling | Verwandeln von Müll in Schatz: Beschleunigen von Inferenzen von großen Sprachmodellen mit Token-Recycling | 将垃圾垃圾变成宝库:加快使用 Tok 回收利用大语言模型的推论 2408.08696v3 |
Authors: Xianzhen Luo, Yixuan Wang, Qingfu Zhu, Zhiming Zhang, Xuanyu Zhang, Qing Yang, Dongliang Xu
Massive parameters of LLMs have made inference latency a fundamental bottleneck. Speculative decoding represents a lossless approach to accelerate inference through a guess-and-verify paradigm. Some methods rely on additional architectures to guess draft tokens, which need extra training before use. Alternatively, retrieval-based training-free techniques build libraries from pre-existing corpora or by n-gram generation. However, they face challenges like large storage requirements, time-consuming retrieval, and limited adaptability. Observing that candidate tokens generated during the decoding process are likely to reoccur in future sequences, we propose Token Recycling. It stores candidate tokens in an adjacency matrix and employs a breadth-first-search (BFS)-like algorithm to construct a draft tree, which is then validated through tree attention. New candidate tokens from the decoding process are then used to update the matrix. Token Recycling requires less than 2MB of additional storage and achieves approximately 2x speedup across all sizes of LLMs. It significantly outperforms existing training-free methods by 30% and even a widely recognized training method by 25%.
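The storage and draft-construction steps can be sketched as follows. This is a toy version under stated assumptions: the adjacency matrix is modelled as a dict from a token id to its top-k candidate successors, the tree is grown by plain BFS with depth and size caps (the paper's exact tree shape is not given in the abstract), and verification by tree attention is left out. The names `update_matrix` and `build_draft_tree` are hypothetical.

```python
from collections import deque


def update_matrix(adj, token, candidates, k=2):
    """Overwrite the row for `token` with its k most recent candidate successors."""
    adj[token] = list(candidates)[:k]


def build_draft_tree(adj, root, max_depth=3, max_nodes=16):
    """Breadth-first expansion of candidate successors starting from `root`.

    Returns a list of (node_id, parent_id, token); the root has parent_id -1.
    """
    tree = [(0, -1, root)]
    queue = deque([(0, root, 0)])  # (node_id, token, depth)
    while queue and len(tree) < max_nodes:
        node_id, token, depth = queue.popleft()
        if depth == max_depth:
            continue
        for cand in adj.get(token, []):
            if len(tree) >= max_nodes:
                break
            new_id = len(tree)
            tree.append((new_id, node_id, cand))
            queue.append((new_id, cand, depth + 1))
    return tree
```

In a full decoder, each root-to-leaf path of the returned tree would be checked by the target model in one pass, and the candidates produced during that pass would be fed back through `update_matrix`.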
nan
Article 1674
Title@2025-05-25 (7): Stochastic Compositional Optimization with Compositional Constraints
Title: Stochastic Compositional Optimization with Compositional Constraints | Stochastische kompositorische Optimierung mit kompositorischen Einschränkungen | 具有组成限制的斯托具组成优化 2209.04086v2 |
Authors: Shuoguang Yang, Wei You, Zhe Zhang, Ethan X. Fang
Stochastic compositional optimization (SCO) has attracted considerable attention because of its broad applicability to important real-world problems. However, existing works on SCO assume that the projection within a solution update is simple, which fails to hold for problem instances where the constraints are in the form of expectations, such as empirical conditional value-at-risk constraints. We study a novel model that incorporates single-level expected value and two-level compositional constraints into the current SCO framework. Our model can be applied widely to data-driven optimization and risk management, including risk-averse optimization and high-moment portfolio selection, and can handle multiple constraints. We further propose a class of primal-dual algorithms that generates sequences converging to the optimal solution at the rate of $\mathcal{O}(\frac{1}{\sqrt{N}})$ under both single-level expected value and two-level compositional constraints, where $N$ is the iteration counter, establishing the benchmarks in expected value constrained SCO.
nan
Article 1675
Title@2025-05-25 (7): An Interpretable Representation Learning Approach for Diffusion Tensor Imaging
Title: An Interpretable Representation Learning Approach for Diffusion Tensor Imaging | Ein interpretierbarer Representations-Lernansatz für Diffusion Tensor Imaging | 传播显像成像的可解释代表性学习方法 2505.19110v1 |
Authors: Vishwa Mohan Singh, Alberto Gaston Villagran Asiares, Luisa Sophie Schuhmacher, Kate Rendall, Simon Weißbrod, David Rügamer, Inga Körte
Diffusion Tensor Imaging (DTI) tractography offers detailed insights into the structural connectivity of the brain, but presents challenges in effective representation and interpretation in deep learning models. In this work, we propose a novel 2D representation of DTI tractography that encodes tract-level fractional anisotropy (FA) values into a 9x9 grayscale image. This representation is processed through a Beta-Total Correlation Variational Autoencoder with a Spatial Broadcast Decoder to learn a disentangled and interpretable latent embedding. We evaluate the quality of this embedding using supervised and unsupervised representation learning strategies, including auxiliary classification, triplet loss, and SimCLR-based contrastive learning. Compared to the 1D Group deep neural network (DNN) baselines, our approach improves the F1 score in a downstream sex classification task by 15.74% and shows a better disentanglement than the 3D representation.
nan
Article 1676
Title@2025-05-25 (7): Optimization-Inspired Few-Shot Adaptation for Large Language Models
Title: Optimization-Inspired Few-Shot Adaptation for Large Language Models | Optimization-Inspired Wenig-Shot-Anpassung für große Sprachmodelle | 优化- 激发了对大语言模型的微热适应 2505.19107v1 |
Authors: Boyan Gao, Xin Wang, Yibo Yang, David Clifton
Large Language Models (LLMs) have demonstrated remarkable performance in real-world applications. However, adapting LLMs to novel tasks via fine-tuning often requires substantial training data and computational resources that are impractical in few-shot scenarios. Existing approaches, such as in-context learning and Parameter-Efficient Fine-Tuning (PEFT), face key limitations: in-context learning introduces additional inference computational overhead with limited performance gains, while PEFT models are prone to overfitting on the few demonstration examples. In this work, we reinterpret the forward pass of LLMs as an optimization process, a sequence of preconditioned gradient descent steps refining internal representations. Based on this connection, we propose Optimization-Inspired Few-Shot Adaptation (OFA), integrating a parameterization that learns preconditioners without introducing additional trainable parameters, and an objective that improves optimization efficiency by learning preconditioners based on a convergence bound, while simultaneously steering the optimization path toward the flat local minimum. Our method overcomes both issues of ICL-based and PEFT-based methods, and demonstrates superior performance over the existing methods on a variety of few-shot adaptation tasks in experiments.
nan
Article 1677
Title@2025-05-25 (7): Statistical inference for Linear Stochastic Approximation with Markovian Noise
Title: Statistical inference for Linear Stochastic Approximation with Markovian Noise | Statistische Schlussfolgerung zur linearen stochastischen Annäherung an Markovsche Geräusche | 与Markovian噪音的线性斯托口接近的统计推推 2505.19102v1 |
Authors: Sergey Samsonov, Marina Sheshukova, Eric Moulines, Alexey Naumov
In this paper, we derive non-asymptotic Berry-Esseen bounds for Polyak-Ruppert averaged iterates of the Linear Stochastic Approximation (LSA) algorithm driven by Markovian noise. Our analysis yields $\mathcal{O}(n^{-1/4})$ convergence rates to the Gaussian limit in the Kolmogorov distance. We further establish the non-asymptotic validity of a multiplier block bootstrap procedure for constructing the confidence intervals, guaranteeing consistent inference under Markovian sampling. Our work provides the first non-asymptotic guarantees on the rate of convergence of bootstrap-based confidence intervals for stochastic approximation with Markov noise. Moreover, we recover the classical rate of order $\mathcal{O}(n^{-1/8})$ up to logarithmic factors for estimating the asymptotic variance of the iterates of the LSA algorithm.
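For readers unfamiliar with the object being analyzed, the sketch below runs a scalar LSA recursion driven by a two-state Markov chain and returns its Polyak-Ruppert average. Everything concrete in it is an assumption chosen for illustration: the values of `A` and `b`, the chain's stay probability, and the step-size schedule `0.5 / (k + 1) ** 0.75`. It does not implement the multiplier block bootstrap.

```python
import random


def averaged_lsa(n_steps, seed=0):
    """Polyak-Ruppert average of theta_{k+1} = theta_k - a_k * (A(Z_k) * theta_k - b(Z_k)).

    Z_k is a symmetric two-state Markov chain, so its stationary law is uniform
    and the target is theta* = mean(b) / mean(A) = 2.0 / 1.5.
    """
    rng = random.Random(seed)
    A = [1.0, 2.0]   # state-dependent "matrix" (scalar here)
    b = [1.0, 3.0]   # state-dependent right-hand side
    stay = 0.9       # probability of remaining in the current state
    state, theta, running_sum = 0, 0.0, 0.0
    for k in range(n_steps):
        step = 0.5 / (k + 1) ** 0.75
        theta -= step * (A[state] * theta - b[state])
        running_sum += theta
        if rng.random() > stay:
            state = 1 - state
    return running_sum / n_steps
```

Calling `averaged_lsa(20000)` gives a value near 4/3; the Berry-Esseen bounds in the abstract quantify how fast the distribution of the rescaled error of such an average approaches a Gaussian.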
nan
Article 1678
Title@2025-05-25 (7): Towards Robust Influence Functions with Flat Validation Minima
Title: Towards Robust Influence Functions with Flat Validation Minima | Auf dem Weg zu robusten Einflussfunktionen mit Flat Validation Minima | 以平滑校准微型方式向强力影响函数方向 2505.19097v1 |
Authors: Xichen Ye, Yifan Wu, Weizhong Zhang, Cheng Jin, Yifan Chen
The Influence Function (IF) is a widely used technique for assessing the impact of individual training samples on model predictions. However, existing IF methods often fail to provide reliable influence estimates in deep neural networks, particularly when applied to noisy training data. This issue does not stem from inaccuracies in parameter change estimation, which has been the primary focus of prior research, but rather from deficiencies in loss change estimation, specifically due to the sharpness of validation risk. In this work, we establish a theoretical connection between influence estimation error, validation set risk, and its sharpness, underscoring the importance of flat validation minima for accurate influence estimation. Furthermore, we introduce a novel estimation form of Influence Function specifically designed for flat validation minima. Experimental results across various tasks validate the superiority of our approach.
nan
Article 1679
Title@2025-05-25 (7): A Unified Framework for Variable Selection in Model-Based Clustering with Missing Not at Random
Title: A Unified Framework for Variable Selection in Model-Based Clustering with Missing Not at Random | Ein einheitliches Framework zur variablen Auswahl im modellbasierten Clustering mit Fehlen nicht zufällig | 以模型为基础的集束模式中变量选择的统一框架, 随机不失踪 2505.19093v1 |
Authors: Binh H. Ho, Long Nguyen Chi, TrungTin Nguyen, Binh T. Nguyen, Van Ha Hoang, Christopher Drovandi
Model-based clustering integrated with variable selection is a powerful tool for uncovering latent structures within complex data. However, its effectiveness is often hindered by challenges such as identifying relevant variables that define heterogeneous subgroups and handling data that are missing not at random, a prevalent issue in fields like transcriptomics. While several notable methods have been proposed to address these problems, they typically tackle each issue in isolation, thereby limiting their flexibility and adaptability. This paper introduces a unified framework designed to address these challenges simultaneously. Our approach incorporates a data-driven penalty matrix into penalized clustering to enable more flexible variable selection, along with a mechanism that explicitly models the relationship between missingness and latent class membership. We demonstrate that, under certain regularity conditions, the proposed framework achieves both asymptotic consistency and selection consistency, even in the presence of missing data. This unified strategy significantly enhances the capability and efficiency of model-based clustering, advancing methodologies for identifying informative variables that define homogeneous subgroups in the presence of complex missing data patterns. The performance of the framework, including its computational efficiency, is evaluated through simulations and demonstrated using both synthetic and real-world transcriptomic datasets.
nan
Article 1680
Title@2025-05-25 (7): ReadBench: Measuring the Dense Text Visual Reading Ability of Vision-Language Models
Title: ReadBench: Measuring the Dense Text Visual Reading Ability of Vision-Language Models | ReadBench: Vermessen der Dichte an Text Visuelle Lesefähigkeit von Vision-Sprachen-Modellen | ” 阅读 “ :衡量视觉-语言模型的阅读能力 2505.19091v1 |
Authors: Benjamin Clavié, Florian Brand
Recent advancements in Large Vision-Language Models (VLMs) have greatly enhanced their capability to jointly process text and images. However, despite extensive benchmarks evaluating visual comprehension (e.g., diagrams, color schemes, OCR tasks…), there is limited assessment of VLMs’ ability to read and reason about text-rich images effectively. To fill this gap, we introduce ReadBench, a multimodal benchmark specifically designed to evaluate the reading comprehension capabilities of VLMs. ReadBench transposes contexts from established text-only benchmarks into images of text while keeping textual prompts and questions intact. Evaluating leading VLMs with ReadBench, we find minimal-but-present performance degradation on short text-image inputs, while performance sharply declines for longer, multi-page contexts. Our experiments further reveal that text resolution has negligible effects on multimodal performance. These findings highlight needed improvements in VLMs, particularly their reasoning over visually presented extensive textual content, a capability critical for practical applications. ReadBench is available at https://github.com/answerdotai/ReadBench .
nan
Article 1681
Title@2025-05-25 (7): CMoS: Rethinking Time Series Prediction Through the Lens of Chunk-wise Spatial Correlations
Title: CMoS: Rethinking Time Series Prediction Through the Lens of Chunk-wise Spatial Correlations | CMoS: Die Vorhersage der Zeitreihen durch die Linse der spaltweisen räumlichen Korrelationen neu denken | CMoS: 重新思考时间序列,通过整节空间交汇的镜头预测 2505.19090v1 |
Authors: Haotian Si, Changhua Pei, Jianhui Li, Dan Pei, Gaogang Xie
Recent advances in lightweight time series forecasting models suggest the inherent simplicity of time series forecasting tasks. In this paper, we present CMoS, a super-lightweight time series forecasting model. Instead of learning the embedding of the shapes, CMoS directly models the spatial correlations between different time series chunks. Additionally, we introduce a Correlation Mixing technique that enables the model to capture diverse spatial correlations with minimal parameters, and an optional Periodicity Injection technique to ensure faster convergence. Despite using as little as 1% of the parameter count of the lightweight model DLinear, experimental results demonstrate that CMoS outperforms existing state-of-the-art models across multiple datasets. Furthermore, the learned weights of CMoS exhibit great interpretability, providing practitioners with valuable insights into temporal structures within specific application scenarios.
nan
Article 1682
Title@2025-05-25 (7): Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes
Title: Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes | Temperatur ist alles, was Sie für die Generalisierung in Langevin Dynamics und anderen Markov-Prozessen benötigen | Langevin Dynamics 和其他Markov 进程需要的温度是全部您需要的普遍化 2505.19087v1 |
Authors: Itamar Harel, Yonathan Wolanowsky, Gal Vardi, Nathan Srebro, Daniel Soudry
We analyze the generalization gap (gap between the training and test errors) when training a potentially over-parametrized model using a Markovian stochastic training algorithm, initialized from some distribution $\theta_0 \sim p_0$. We focus on Langevin dynamics with a positive temperature $\beta^{-1}$, i.e. gradient descent on a training loss $L$ with infinitesimal step size, perturbed with $\beta^{-1}$-variance Gaussian noise, and lightly regularized or bounded. There, we bound the generalization gap, at any time during training, by $\sqrt{(\beta\mathbb{E} L (\theta_0) + \log(1/\delta))/N}$ with probability $1-\delta$ over the dataset, where $N$ is the sample size, and $\mathbb{E} L (\theta_0) =O(1)$ with standard initialization scaling. In contrast to previous guarantees, our bound has no dependence on training time, no reliance on mixing, and no dependence on dimensionality, gradient norms, or any other properties of the loss or model. This guarantee follows from a general analysis of any Markov process-based training that has a Gibbs-style stationary distribution. The proof is surprisingly simple, once we observe that the marginal distribution divergence from initialization remains bounded, as implied by a generalized second law of thermodynamics.
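To make the quantities in the bound concrete, the sketch below evaluates the stated expression and shows one discretized Langevin step. Two assumptions are worth flagging: the abstract treats the continuous-time (infinitesimal step size) process, whereas `langevin_step` uses the common Euler-Maruyama discretization with noise scaled so that the stationary density is proportional to $\exp(-\beta L)$; and `generalization_gap_bound` ignores any constant factors, which the abstract does not state. Both function names are hypothetical.

```python
import math
import random


def langevin_step(theta, grad, lr, beta, rng=random):
    """One Euler-Maruyama step: gradient descent plus Gaussian noise of std sqrt(2 * lr / beta)."""
    noise_std = math.sqrt(2.0 * lr / beta)
    return [t - lr * g + noise_std * rng.gauss(0.0, 1.0) for t, g in zip(theta, grad)]


def generalization_gap_bound(beta, init_loss, n_samples, delta):
    """Evaluate sqrt((beta * E[L(theta_0)] + log(1 / delta)) / N), constants omitted."""
    return math.sqrt((beta * init_loss + math.log(1.0 / delta)) / n_samples)
```

The expression depends only on the inverse temperature, the expected loss at initialization, the confidence level, and the sample size, so quadrupling `n_samples` halves the value returned by `generalization_gap_bound`.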
nan
Article 1683
Title@2025-05-25 (7): Jodi: Unification of Visual Generation and Understanding via Joint Modeling
Title: Jodi: Unification of Visual Generation and Understanding via Joint Modeling | Jodi: Vereinheitlichung der visuellen Erzeugung und des Verständnisses durch gemeinsame Modellierung | Jodi:通过联合建模统一视觉生成和理解 2505.19084v1 |
Authors: Yifeng Xu, Zhenliang He, Meina Kan, Shiguang Shan, Xilin Chen
Visual generation and understanding are two deeply interconnected aspects of human intelligence, yet they have been traditionally treated as separate tasks in machine learning. In this paper, we propose Jodi, a diffusion framework that unifies visual generation and understanding by jointly modeling the image domain and multiple label domains. Specifically, Jodi is built upon a linear diffusion transformer along with a role switch mechanism, which enables it to perform three particular types of tasks: (1) joint generation, where the model simultaneously generates images and multiple labels; (2) controllable generation, where images are generated conditioned on any combination of labels; and (3) image perception, where multiple labels can be predicted at once from a given image. Furthermore, we present the Joint-1.6M dataset, which contains 200,000 high-quality images collected from public sources, automatic labels for 7 visual domains, and LLM-generated captions. Extensive experiments demonstrate that Jodi excels in both generation and understanding tasks and exhibits strong extensibility to a wider range of visual domains. Code is available at https://github.com/VIPL-GENUN/Jodi.
nan
Article 1684
Title@2025-05-25 (7): Geometric Determinations Of Characteristic Redshifts From DESI-DR2 BAO and DES-SN5YR Observations: Hints For New Expansion Rate Anomalies
Title: Geometric Determinations Of Characteristic Redshifts From DESI-DR2 BAO and DES-SN5YR Observations: Hints For New Expansion Rate Anomalies | Geometrische Bestimmung charakteristischer Rotverschiebungen aus DESI-DR2 BAO und DES-SN5YR Beobachtungen: Hinweise für neue Erweiterungsraten Anomalien | DESI-DR2 BAO和DES-SN5YR观测的典型变迁的几何测定:新扩张率异常现象的提示 2505.19083v1 |
Authors: Purba Mukherjee, Anjan A Sen
In this work, we perform a model-agnostic reconstruction of the cosmic expansion history by combining DESI-DR2 BAO and DES-SN5YR data, with a focus on geometric determination of characteristic redshifts where notable tensions in the expansion rate are found to emerge. Employing Gaussian process regression alongside knot-based spline techniques, we reconstruct cosmic distances and their derivatives to pinpoint these characteristic redshifts and infer $E(z)$. Our analysis reveals significant deviations of approximately 4 to 5$\sigma$ from the Planck 2018 $\Lambda$CDM predictions, particularly pronounced in the redshift range $z \sim 0.35-0.55$. These anomalies are consistently observed across both reconstruction methods and combined datasets, indicating robust late-time departures that could signal new physics beyond the standard cosmological framework. The joint use of BAO and SN probes enhances the precision of our constraints, allowing us to isolate these deviations without reliance on specific cosmological assumptions. Our findings underscore the role of characteristic redshifts as sensitive indicators of expansion rate anomalies and motivate further scrutiny with forthcoming datasets from DESI-5YR BAO, Euclid, and LSST. These future surveys will tighten constraints and help distinguish whether these late-time anomalies arise from new fundamental physics or unresolved systematics in the data.
Article 1685
Title@2025-05-25 (7): On Continuity of Robust and Accurate Classifiers
Title: On Continuity of Robust and Accurate Classifiers | Über die Kontinuität von robusten und präzisen Klassifikatoren | 关于强力和准确性分类的连续性 2309.17048v2 |
Authors: Ramin Barati, Reza Safabakhsh, Mohammad Rahmati
The reliability of a learning model is key to the successful deployment of machine learning in various applications. However, it is difficult to describe the phenomenon due to the complicated nature of the problems in machine learning. It has been shown that adversarial training can improve the robustness of the hypothesis. However, this improvement usually comes at the cost of decreased performance on natural samples. Hence, it has been suggested that robustness and accuracy of a hypothesis are at odds with each other. In this paper, we put forth the alternative proposal that it is the continuity of a hypothesis that is incompatible with its robustness and accuracy in many of these scenarios. In other words, a continuous function cannot effectively learn the optimal robust hypothesis. We introduce a framework for a rigorous study of harmonic and holomorphic hypotheses in learning theory terms and provide empirical evidence that continuous hypotheses do not perform as well as discontinuous hypotheses in some common machine learning tasks. From a practical point of view, our results suggest that a robust and accurate learning rule would train different continuous hypotheses for different regions of the domain. From a theoretical perspective, our analysis explains the adversarial examples phenomenon in these situations as a conflict between the continuity of a sequence of functions and its uniform convergence to a discontinuous function. Given that many contemporary machine learning models are continuous functions, it is important to theoretically study the continuity of robust and accurate classifiers, as it is consequential in their construction, analysis and evaluation.
Article 1686
Title@2025-05-25 (7): Flow Annealed Importance Sampling Bootstrap meets Differentiable Particle Physics
Title: Flow Annealed Importance Sampling Bootstrap meets Differentiable Particle Physics | Flow Annealed Bedeutung Sampling Bootstrap trifft differenzierbare Teilchenphysik | 流动的隐形重要性取样器装置符合可区分的粒子物理 2411.16234v2 |
Authors: Annalena Kofler, Vincent Stimper, Mikhail Mikhasenko, Michael Kagan, Lukas Heinrich
High-energy physics requires the generation of large numbers of simulated data samples from complex but analytically tractable distributions called matrix elements. Surrogate models, such as normalizing flows, are gaining popularity for this task due to their computational efficiency. We adopt an approach based on Flow Annealed importance sampling Bootstrap (FAB) that evaluates the differentiable target density during training and helps avoid the costly generation of training data in advance. We show that FAB reaches higher sampling efficiency with fewer target evaluations in high dimensions in comparison to other methods.
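As a rough illustration of what "sampling efficiency" can mean for a surrogate sampler, the sketch below computes the normalized effective sample size of a set of importance weights. This is one common definition and an assumption here; the abstract does not state which metric the authors use, and the function name is hypothetical.

```python
import math

def sampling_efficiency(log_weights):
    """Return ESS / n for unnormalized importance weights given in log space.

    ESS = (sum w)^2 / sum(w^2). Dividing by n gives a value in (0, 1],
    where 1 means the proposal matches the target exactly.
    """
    # Subtract the maximum log-weight before exponentiating to avoid overflow.
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    total_sq = sum(x * x for x in w)
    return (total * total) / (len(w) * total_sq)

# log w_i = log p(x_i) - log q(x_i) for samples x_i drawn from the flow q.
print(sampling_efficiency([0.0, 0.0, 0.0, 0.0]))
```

Equal weights give an efficiency of 1, while a single dominant weight drives the value toward 1/n.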
Article 1687
Title@2025-05-25 (7): Cluster-Aware Multi-Round Update for Wireless Federated Learning in Heterogeneous Environments
Title: Cluster-Aware Multi-Round Update for Wireless Federated Learning in Heterogeneous Environments | Cluster-Aware Multi-Round Update für drahtloses Federated Learning in heterogenen Umgebungen | 为不同不同环境无线联邦学习提供多功能集群软件多功能更新 2505.06268v2 |
Authors: Pengcheng Sun, Erwu Liu, Wei Ni, Kanglei Yu, Rui Wang, Abbas Jamalipour
The aggregation efficiency and accuracy of wireless Federated Learning (FL) are significantly affected by resource constraints, especially in heterogeneous environments where devices exhibit distinct data distributions and communication capabilities. This paper proposes a clustering strategy that leverages prior knowledge similarity to group devices with similar data and communication characteristics, mitigating performance degradation from heterogeneity. On this basis, a novel Cluster-Aware Multi-round Update (CAMU) strategy is proposed, which treats clusters as the basic units and adjusts the local update frequency based on the clustered contribution threshold, effectively reducing update bias and enhancing aggregation accuracy. The theoretical convergence of the CAMU strategy is rigorously validated. Meanwhile, based on the convergence upper bound, the local update frequency and transmission power of each cluster are jointly optimized to achieve an optimal balance between computation and communication resources under constrained conditions, significantly improving the convergence efficiency of FL. Experimental results demonstrate that the proposed method effectively improves the model performance of FL in heterogeneous environments and achieves a better balance between communication cost and computational load under limited resources.
Article 1688
Title@2025-05-25 (7): Recalibrating binary probabilistic classifiers
Title: Recalibrating binary probabilistic classifiers | Rekalibrierung von binären probabilistischen Klassifikatoren | 重新计算二进制概率分解器 2505.19068v1 |
Authors: Dirk Tasche
Recalibration of binary probabilistic classifiers to a target prior probability is an important task in areas like credit risk management. We analyse methods for recalibration from a distribution shift perspective. Distribution shift assumptions linked to the area under the curve (AUC) of a probabilistic classifier are found to be useful for the design of meaningful recalibration methods. Two new methods called parametric covariate shift with posterior drift (CSPD) and ROC-based quasi moment matching (QMM) are proposed and tested together with some other methods in an example setting. The outcomes of the test suggest that the QMM methods discussed in the paper can provide appropriately conservative results in evaluations with concave functionals such as risk-weight functions for credit risk.
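To make "recalibration to a target prior" concrete, here is a minimal sketch of the classic prior-probability-shift (label shift) correction, which rescales posterior odds by the ratio of new to old class priors. It assumes the class-conditional feature distributions are unchanged. It is a textbook baseline, not the CSPD or QMM methods proposed in the paper, and the function name is hypothetical.

```python
def recalibrate_to_prior(p, train_prior, target_prior):
    """Adjust a posterior probability p of the positive class.

    p            : posterior from a classifier calibrated under train_prior
    train_prior  : positive-class prior in the training data (0 < value < 1)
    target_prior : positive-class prior to recalibrate to (0 < value < 1)
    """
    # Bayes' rule: each class likelihood is reweighted by new_prior / old_prior.
    pos = p * target_prior / train_prior
    neg = (1.0 - p) * (1.0 - target_prior) / (1.0 - train_prior)
    return pos / (pos + neg)

# A score of 0.5 under a balanced training prior, moved to a 10% target prior.
print(recalibrate_to_prior(0.5, train_prior=0.5, target_prior=0.1))
```

When the target prior equals the training prior the function returns `p` unchanged, which is a quick sanity check for any recalibration routine.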
Article 1689
Title@2025-05-25 (7): Adversarial Bandit over Bandits: Hierarchical Bandits for Online Configuration Management
Title: Adversarial Bandit over Bandits: Hierarchical Bandits for Online Configuration Management | Adversarial Bandit über Bandits: Hierarchische Bandits für Online-Konfigurationsmanagement | 反强盗强盗: 用于在线配置管理的等级强盗 2505.19061v1 |
Authors: Chen Avin, Zvi Lotker, Shie Mannor, Gil Shabat, Hanan Shteingart, Roey Yadgar
Motivated by dynamic parameter optimization in finite but large action (configuration) spaces, this work studies the nonstochastic multi-armed bandit (MAB) problem in metric action spaces with oblivious Lipschitz adversaries. We propose ABoB, a hierarchical Adversarial Bandit over Bandits algorithm that can use state-of-the-art existing “flat” algorithms, but additionally clusters similar configurations to exploit local structures and adapt to changing environments. We prove that in the worst-case scenario, such a clustering approach cannot hurt too much and ABoB guarantees a standard worst-case regret bound of $O\left(k^{\frac{1}{2}}T^{\frac{1}{2}}\right)$, where $T$ is the number of rounds and $k$ is the number of arms, matching the traditional flat approach. However, under favorable conditions related to the algorithm properties, cluster properties, and certain Lipschitz conditions, the regret bound can be improved to $O\left(k^{\frac{1}{4}}T^{\frac{1}{2}}\right)$. Simulations and experiments on a real storage system demonstrate that ABoB, using standard algorithms like EXP3 and Tsallis-INF, achieves lower regret and faster convergence than the flat method, with up to 50% improvement in previously known setups, both nonstochastic and stochastic, as well as in our settings.
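For readers unfamiliar with the "flat" baseline named above, the following is a minimal sketch of the standard EXP3 algorithm for adversarial bandits with rewards in [0, 1]. It is not ABoB: the hierarchical clustering layer is omitted, and the function name, parameter names, and toy reward function are illustrative assumptions.

```python
import math
import random

def exp3(reward_fn, num_arms, num_rounds, gamma=0.1, seed=0):
    """Run EXP3 and return the arm probabilities used in the final round.

    reward_fn(t, arm) must return a reward in [0, 1].
    gamma is the exploration rate mixed into the sampling distribution.
    """
    rng = random.Random(seed)
    log_w = [0.0] * num_arms            # log-weights, kept in log space for stability
    probs = [1.0 / num_arms] * num_arms
    for t in range(num_rounds):
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * wi / total + gamma / num_arms for wi in w]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        # Importance-weighted reward estimate; only the played arm is updated.
        log_w[arm] += gamma * (reward / probs[arm]) / num_arms
    return probs

# Toy environment (assumption): arm 1 always pays 0.9, arm 0 always pays 0.1.
final_probs = exp3(lambda t, a: 0.9 if a == 1 else 0.1, num_arms=2, num_rounds=2000)
print(final_probs)
```

A hierarchical scheme in the spirit of the abstract would run one such learner over clusters and another over the arms inside the chosen cluster; that composition is not shown here.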
Article 1690
Title@2025-05-25 (7): An Initial Exploration of Fine-tuning Small Language Models for Smart Contract Reentrancy Vulnerability Detection
Title: An Initial Exploration of Fine-tuning Small Language Models for Smart Contract Reentrancy Vulnerability Detection | Eine erste Erkundung von Feinsteuerungs-Kleinsprachenmodellen für intelligente Vertragsrepentrancy Sicherheitserkennung | 初步探索智能合同留置率易变性探测智能合同微调小型语言模型 2505.19059v1 |
Authors: Ignacio Mariano Andreozzi Pofcher, Joshua Ellul
Large Language Models (LLMs) are increasingly used for coding tasks, including helping developers identify bugs, and are a promising avenue for supporting vulnerability detection, particularly given the flexibility of such generative AI models and tools. Yet for many tasks LLMs may not be suitable, and it may be preferable to use smaller language models that fit on, and can easily be run and trained on, a developer’s computer. In this paper we explore and evaluate whether smaller language models can be fine-tuned to achieve reasonable results in a niche area: vulnerability detection, specifically detecting the reentrancy bug in Solidity smart contracts.
Article 1691
Title@2025-05-25 (7): Policy Gradient with Tree Expansion
Title: Policy Gradient with Tree Expansion | Politischer Gradient mit Baumerweiterung | 随着树树扩张的政策渐变 2301.13236v2 |
Authors: Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik
Policy gradient methods are notorious for having a large variance and high sample complexity. To mitigate this, we introduce SoftTreeMax – a generalization of softmax that employs planning. In SoftTreeMax, we extend the traditional logits with the multi-step discounted cumulative reward, topped with the logits of future states. We analyze SoftTreeMax and explain how tree expansion helps to reduce its gradient variance. We prove that the variance depends on the chosen tree-expansion policy. Specifically, we show that the closer the induced transitions are to being state-independent, the stronger the variance decay. With approximate forward models, we prove that the resulting gradient bias diminishes with the approximation error while retaining the same variance reduction. Ours is the first result to bound the gradient bias for an approximate model. In a practical implementation of SoftTreeMax, we utilize a parallel GPU-based simulator for fast and efficient tree expansion. Using this implementation in Atari, we show that SoftTreeMax reduces the gradient variance by three orders of magnitude. This leads to better sample complexity and improved performance compared to distributed PPO.
Article 1692
Title@2025-05-25 (7): Distributionally Robust Deep Q-Learning
Title: Distributionally Robust Deep Q-Learning | Verteilungsstarkes tiefes Q-Lernen | 分布强力深学习 Q- 学习 2505.19058v1 |
Authors: Chung I Lu, Julian Sester, Aijia Zhang
We propose a novel distributionally robust $Q$-learning algorithm for the non-tabular case accounting for continuous state spaces where the state transition of the underlying Markov decision process is subject to model uncertainty. The uncertainty is taken into account by considering the worst-case transition from a ball around a reference probability measure. To determine the optimal policy under the worst-case state transition, we solve the associated non-linear Bellman equation by dualising and regularising the Bellman operator with the Sinkhorn distance, which is then parameterized with deep neural networks. This approach allows us to modify the Deep Q-Network algorithm to optimise for the worst-case state transition. We illustrate the tractability and effectiveness of our approach through several applications, including a portfolio optimisation task based on S&P 500 data.
Article 1693
Title@2025-05-25 (7): An Embarrassingly Simple Defense Against LLM Abliteration Attacks
Title: An Embarrassingly Simple Defense Against LLM Abliteration Attacks | Eine erschreckend einfache Verteidigung gegen LLM-Abliterationsangriffe | 一种令人尴尬的简单防御 对付LLM 缩写攻击 2505.19056v1 |
Authors: Harethah Abu Shairah, Hasan Abed Al Kader Hammoud, Bernard Ghanem, George Turkiyyah
Large language models (LLMs) are typically aligned to comply with safety guidelines by refusing harmful instructions. A recent attack, termed abliteration, isolates and suppresses the single latent direction most responsible for refusal behavior, enabling the model to generate unethical content. We propose a defense that modifies how models generate refusals. We construct an extended-refusal dataset that contains harmful prompts with a full response that justifies the reason for refusal. We then fine-tune Llama-2-7B-Chat and Qwen2.5-Instruct (1.5B and 3B parameters) on our extended-refusal dataset, and evaluate the resulting systems on a set of harmful prompts. In our experiments, extended-refusal models maintain high refusal rates, dropping at most by 10%, whereas baseline models’ refusal rates drop by 70-80% after abliteration. A broad evaluation of safety and utility shows that extended-refusal fine-tuning neutralizes the abliteration attack while preserving general performance.
Article 1694
Title@2025-05-25 (7): Reduce Computational Cost In Deep Reinforcement Learning Via Randomized Policy Learning
Title: Reduce Computational Cost In Deep Reinforcement Learning Via Randomized Policy Learning | Computerische Kosten im Deep-Verstärkung-Lernen durch Randomized Policy Learning reduzieren | 降低深强化学习的计算成本 2505.19054v1 |
Authors: Zhuochen Liu, Rahul Jain, Quan Nguyen
Recent advancements in reinforcement learning (RL) have leveraged neural networks to achieve state-of-the-art performance across various control tasks. However, these successes often come at the cost of significant computational resources, as training deep neural networks requires substantial time and data. In this paper, we introduce an actor-critic algorithm that utilizes randomized neural networks to drastically reduce computational costs while maintaining strong performance. Despite its simple architecture, our method effectively solves a range of control problems, including the locomotion control of a highly dynamic 12-motor quadruped robot, and achieves results comparable to leading algorithms such as Proximal Policy Optimization (PPO). Notably, our approach does not outperform other algorithms in terms of sample efficiency but rather in terms of wall-clock training time. That is, although our algorithm requires more timesteps to converge to an optimal policy, the actual time required for training turns out to be lower.
Article 1695
Title@2025-05-25 (7): Structured Reinforcement Learning for Combinatorial Decision-Making
Title: Structured Reinforcement Learning for Combinatorial Decision-Making | Strukturiertes Stärkungslernen für kombinatorische Entscheidungsfindung | 结构强化学习促进综合决策决策 2505.19053v1 |
Authors: Heiko Hoppe, Léo Baty, Louis Bouvier, Axel Parmentier, Maximilian Schiffer
Reinforcement learning (RL) is increasingly applied to real-world problems involving complex and structured decisions, such as routing, scheduling, and assortment planning. These settings challenge standard RL algorithms, which struggle to scale, generalize, and exploit structure in the presence of combinatorial action spaces. We propose Structured Reinforcement Learning (SRL), a novel actor-critic framework that embeds combinatorial optimization layers into the actor neural network. We enable end-to-end learning of the actor via Fenchel-Young losses and provide a geometric interpretation of SRL as a primal-dual algorithm in the dual of the moment polytope. Across six environments with exogenous and endogenous uncertainty, SRL matches or surpasses the performance of unstructured RL and imitation learning on static tasks and improves over these baselines by up to 92% on dynamic problems, with improved stability and convergence speed.
Article 1696
Title@2025-05-25 (7): Efficient Data Selection at Scale via Influence Distillation
Title: Efficient Data Selection at Scale via Influence Distillation | Effiziente Datenauswahl auf Scale durch Einflussdestillation | 通过影响蒸馏在规模上高效数据选择 2505.19051v1 |
Authors: Mahdi Nikdan, Vincent Cohen-Addad, Dan Alistarh, Vahab Mirrokni
Effective data selection is critical for efficient training of modern Large Language Models (LLMs). This paper introduces Influence Distillation, a novel, mathematically justified framework for data selection that employs second-order information to optimally weight training samples. By distilling each sample’s influence on a target distribution, our method assigns model-specific weights that are used to select training data for LLM fine-tuning, guiding it toward strong performance on the target domain. We derive these optimal weights for both Gradient Descent and Adam optimizers. To ensure scalability and reduce computational cost, we propose a landmark-based approximation: influence is precisely computed for a small subset of “landmark” samples and then efficiently propagated to all other samples to determine their weights. We validate Influence Distillation by applying it to instruction tuning on the Tulu V2 dataset, targeting a range of tasks including GSM8k, SQuAD, and MMLU, across several models from the Llama and Qwen families. Experiments show that Influence Distillation matches or outperforms state-of-the-art performance while achieving up to $3.5\times$ faster selection.
Article 1697
Title@2025-05-25 (7): SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Title: SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models | SliM-LLM: Salience-getriebene Mixed-Precision-Quantisierung für große Sprachmodelle | SliM-LLM:大语言模型的盐度驱动混合精度量 2405.14917v2 |
Authors: Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Qinshuo Liu, Xianglong Liu, Luca Benini, Michele Magno, Shiming Zhang, Xiaojuan Qi
Post-training quantization (PTQ) is an effective technique for compressing large language models (LLMs). However, while uniform-precision quantization is computationally efficient, it often compromises model performance. To address this, we propose SliM-LLM, a salience-driven mixed-precision quantization framework that allocates bit-widths at the group level. Our approach leverages the observation that important weights follow a structured distribution and introduces two key components: (1) Salience-Determined Bit Allocation adaptively assigns bit-widths to groups within each layer based on their salience; and (2) Salience-Weighted Quantizer Calibration optimizes quantizer parameters by incorporating element-level salience. With its structured partitioning, SliM-LLM provides a hardware-friendly solution that matches the efficiency of uniform quantization methods while improving accuracy. Experiments show that SliM-LLM achieves superior performance across various LLMs at low bit-widths. For example, a 2-bit quantized LLaMA-7B model reduces memory usage by nearly 6x compared to the floating-point baseline, decreases perplexity by 48% compared to state-of-the-art gradient-free PTQ methods, and maintains GPU inference speed. Additionally, the extended version, SliM-LLM$^+$, which incorporates gradient-based quantization, further reduces perplexity by 35.1%. Our code is available at https://github.com/Aaronhuang-778/SliM-LLM
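The idea of group-wise mixed precision can be sketched in a few lines: split weights into groups, give more bits to groups judged more salient, and quantize each group uniformly at its assigned width. The allocation rule below (top third of groups gets one extra bit, bottom third one fewer) and both function names are hypothetical stand-ins chosen for illustration; they are not SliM-LLM's actual allocation or calibration procedures.

```python
def allocate_bits(saliences, base_bits=2):
    """Hypothetical rule: raise bits for the most salient third of groups and
    lower them for the least salient third, keeping the average at base_bits."""
    n = len(saliences)
    order = sorted(range(n), key=lambda i: saliences[i])  # ascending salience
    k = n // 3
    bits = [base_bits] * n
    for i in order[:k]:
        bits[i] = base_bits - 1
    for i in order[n - k:]:
        bits[i] = base_bits + 1
    return bits

def quantize_group(values, bits):
    """Uniform min-max quantization of one weight group to 2**bits levels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    scale = (hi - lo) / ((1 << bits) - 1)
    # Snap each value to the nearest grid point, then map back to floats.
    return [lo + round((v - lo) / scale) * scale for v in values]

groups = [[0.0, 0.3, 0.7, 1.0], [-0.2, 0.1, 0.4, 0.5], [0.05, 0.06, 0.07, 0.08]]
saliences = [5.0, 1.0, 0.1]   # assumed per-group salience scores
bits = allocate_bits(saliences)
dequantized = [quantize_group(g, b) for g, b in zip(groups, bits)]
print(bits)
```

Keeping the average bit-width fixed means the memory footprint matches a uniform scheme at `base_bits` while precision is shifted toward the groups that matter most.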
Article 1698
Title@2025-05-25 (7): PII-Scope: A Comprehensive Study on Training Data PII Extraction Attacks in LLMs
Title: PII-Scope: A Comprehensive Study on Training Data PII Extraction Attacks in LLMs | PII-Scope: Eine umfassende Studie über Trainingsdaten PII-Extraktionsangriffe in LLMs | PII-范围:关于培训数据的综合研究 2410.06704v2 |
Authors: Krishna Kanth Nakka, Ahmed Frikha, Ricardo Mendes, Xue Jiang, Xuebing Zhou
In this work, we introduce PII-Scope, a comprehensive benchmark designed to evaluate state-of-the-art methodologies for PII extraction attacks targeting LLMs across diverse threat settings. Our study provides a deeper understanding of these attacks by uncovering several hyperparameters (e.g., demonstration selection) crucial to their effectiveness. Building on this understanding, we extend our study to more realistic attack scenarios, exploring PII attacks that employ advanced adversarial strategies, including repeated and diverse querying, and leveraging iterative learning for continual PII extraction. Through extensive experimentation, our results reveal a notable underestimation of PII leakage in existing single-query attacks. In fact, we show that with sophisticated adversarial capabilities and a limited query budget, PII extraction rates can increase by up to fivefold when targeting the pretrained model. Moreover, we evaluate PII leakage on finetuned models, showing that they are more vulnerable to leakage than pretrained models. Overall, our work establishes a rigorous empirical benchmark for PII extraction attacks in realistic threat scenarios and provides a strong foundation for developing effective mitigation strategies.
Article 1699
Title@2025-05-25 (7): When Models Don’t Collapse: On the Consistency of Iterative MLE
Title: When Models Don’t Collapse: On the Consistency of Iterative MLE | Wenn Modelle nicht zusammenbrechen: Über die Konsistenz iterativer MLE | 当模型不折叠时: 在迭代 MLE 一致性上 2505.19046v1 |
Authors: Daniel Barzilai, Ohad Shamir
The widespread use of generative models has created a feedback loop, in which each generation of models is trained on data partially produced by its predecessors. This process has raised concerns about model collapse: a critical degradation in performance caused by repeated training on synthetic data. However, different analyses in the literature have reached different conclusions as to the severity of model collapse. As such, it remains unclear how concerning this phenomenon is, and under which assumptions it can be avoided. To address this, we theoretically study model collapse for maximum likelihood estimation (MLE), in a natural setting where synthetic data is gradually added to the original data set. Under standard assumptions (similar to those long used for proving asymptotic consistency and normality of MLE), we establish non-asymptotic bounds showing that collapse can be avoided even as the fraction of real data vanishes. On the other hand, we prove that some assumptions (beyond MLE consistency) are indeed necessary: without them, model collapse can occur arbitrarily quickly, even when the original data is still present in the training set. To the best of our knowledge, these are the first rigorous examples of iterative generative modeling with accumulating data that rapidly leads to model collapse.
Article 1700
Title@2025-05-25 (7): Offline Clustering of Linear Bandits: Unlocking the Power of Clusters in Data-Limited Environments
Title: Offline Clustering of Linear Bandits: Unlocking the Power of Clusters in Data-Limited Environments | Offline-Clustering von linearen Banditen: Entriegelung der Macht von Clustern in datenbeschränkten Umgebungen | 线性强盗离线集群:解锁数据限制环境中的群集力量 2505.19043v1 |
Authors: Jingyuan Liu, Zeyu Zhang, Xuchuang Wang, Xutong Liu, John C. S. Lui, Mohammad Hajiesmaili, Carlee Joe-Wong
Contextual linear multi-armed bandits are a learning framework for making a sequence of decisions, e.g., advertising recommendations for a sequence of arriving users. Recent works have shown that clustering these users based on the similarity of their learned preferences can significantly accelerate the learning. However, prior work has primarily focused on the online setting, which requires continually collecting user data, ignoring the offline data widely available in many applications. To tackle these limitations, we study the offline clustering of bandits (Off-ClusBand) problem, which studies how to use the offline dataset to learn cluster properties and improve decision-making across multiple users. The key challenge in Off-ClusBand arises from data insufficiency for users: unlike the online case, in the offline case, we have a fixed, limited dataset to work from and thus must determine whether we have enough data to confidently cluster users together. To address this challenge, we propose two algorithms: Off-C$^2$LUB, which we analytically show performs well for arbitrary amounts of user data, and Off-CLUB, which is prone to bias when data is limited but, given sufficient data, matches a theoretical lower bound that we derive for the offline clustered MAB problem. We experimentally validate these results on both real and synthetic datasets.
Article 1701
Title@2025-05-25 (7): Turb-L1: Achieving Long-term Turbulence Tracing By Tackling Spectral Bias
Title: Turb-L1: Achieving Long-term Turbulence Tracing By Tackling Spectral Bias | Turb-L1: Langfristige Turbulenzen erreichen, die durch das Greifen spektraler Bias verfolgt werden | Turb-L1:通过处理光辉双鱼,实现长期动荡追踪 2505.19038v1 |
Authors: Hao Wu, Yuan Gao, Ruiqi Shu, Zean Han, Fan Xu, Zhihong Zhu, Qingsong Wen, Xian Wu, Kun Wang, Xiaomeng Huang
Accurately predicting the long-term evolution of turbulence is crucial for advancing scientific understanding and optimizing engineering applications. However, existing deep learning methods face significant bottlenecks in long-term autoregressive prediction, which exhibit excessive smoothing and fail to accurately track complex fluid dynamics. Our extensive experimental and spectral analysis of prevailing methods provides an interpretable explanation for this shortcoming, identifying Spectral Bias as the core obstacle. Concretely, spectral bias is the inherent tendency of models to favor low-frequency, smooth features while overlooking critical high-frequency details during training, thus reducing fidelity and causing physical distortions in long-term predictions. Building on this insight, we propose Turb-L1, an innovative turbulence prediction method, which utilizes a Hierarchical Dynamics Synthesis mechanism within a multi-grid architecture to explicitly overcome spectral bias. It accurately captures cross-scale interactions and preserves the fidelity of high-frequency dynamics, enabling reliable long-term tracking of turbulence evolution. Extensive experiments on the 2D turbulence benchmark show that Turb-L1 demonstrates excellent performance: (I) In long-term predictions, it reduces Mean Squared Error (MSE) by $80.3\%$ and increases Structural Similarity (SSIM) by over $9\times$ compared to the SOTA baseline, significantly improving prediction fidelity. (II) It effectively overcomes spectral bias, accurately reproducing the full enstrophy spectrum and maintaining physical realism in high-wavenumber regions, thus avoiding the spectral distortions or spurious energy accumulation seen in other methods.
Article 1702
Title@2025-05-25 (7): Optimal Conformal Prediction under Epistemic Uncertainty
Title: Optimal Conformal Prediction under Epistemic Uncertainty | Optimale konforme Vorhersage unter epistemischer Unsicherheit | 在不确定性下最优化的共变预测 2505.19033v1 |
Authors: Alireza Javanmardi, Soroush H. Zargarbashi, Santo M. A. R. Thies, Willem Waegeman, Aleksandar Bojchevski, Eyke Hüllermeier
Conformal prediction (CP) is a popular frequentist framework for representing uncertainty by providing prediction sets that guarantee coverage of the true label with a user-adjustable probability. In most applications, CP operates on confidence scores coming from a standard (first-order) probabilistic predictor (e.g., softmax outputs). Second-order predictors, such as credal set predictors or Bayesian models, are also widely used for uncertainty quantification and are known for their ability to represent both aleatoric and epistemic uncertainty. Despite their popularity, there is still an open question of how they can be incorporated into CP. In this paper, we discuss the desiderata for CP when valid second-order predictions are available. We then introduce Bernoulli prediction sets (BPS), which produce the smallest prediction sets that ensure conditional coverage in this setting. When given first-order predictions, BPS reduces to the well-known adaptive prediction sets (APS). Furthermore, when the validity assumption on the second-order predictions is compromised, we apply conformal risk control to obtain a marginal coverage guarantee while still accounting for epistemic uncertainty.
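Since the abstract refers to adaptive prediction sets (APS) as the first-order special case, a minimal sketch of split-conformal APS may help. This is the deterministic variant without the usual randomized tie-breaking, written under the assumption that class probabilities come from a fixed first-order predictor. It does not implement BPS or conformal risk control, and all function names are illustrative.

```python
import math

def aps_score(probs, label):
    """Probability mass of all classes ranked at or above the true label."""
    order = sorted(range(len(probs)), key=lambda k: -probs[k])
    total = 0.0
    for k in order:
        total += probs[k]
        if k == label:
            return total
    raise ValueError("label is not a valid class index")

def calibrate_threshold(cal_probs, cal_labels, alpha):
    """Conformal quantile of calibration scores for target coverage 1 - alpha."""
    scores = sorted(aps_score(p, y) for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    # Too few calibration points for this alpha: fall back to the full label set.
    return scores[rank - 1] if rank <= n else float("inf")

def prediction_set(probs, threshold):
    """Add classes in order of decreasing probability until mass >= threshold."""
    order = sorted(range(len(probs)), key=lambda k: -probs[k])
    chosen, total = [], 0.0
    for k in order:
        chosen.append(k)
        total += probs[k]
        if total >= threshold:
            break
    return chosen

print(prediction_set([0.6, 0.3, 0.1], threshold=0.8))
```

The set grows when the predictor is unsure and shrinks when it is confident, which is the adaptivity the name refers to.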
Article 1703
Title@2025-05-25 (7): SoK: Dataset Copyright Auditing in Machine Learning Systems
Title: SoK: Dataset Copyright Auditing in Machine Learning Systems | SoK: Datensatz Copyright Auditing in Machine Learning Systemen | SoK:机器学习系统中的数据集版权审计 2410.16618v2 |
Authors: Linkang Du, Xuanru Zhou, Min Chen, Chusong Zhang, Zhou Su, Peng Cheng, Jiming Chen, Zhikun Zhang
As the implementation of machine learning (ML) systems becomes more widespread, especially with the introduction of larger ML models, we perceive a surging demand for massive data. However, this demand inevitably causes infringement and misuse problems with the data, such as using unauthorized online artworks or face images to train ML models. To address this problem, many efforts have been made to audit the copyright of the model training dataset. However, existing solutions vary in auditing assumptions and capabilities, making it difficult to compare their strengths and weaknesses. In addition, robustness evaluations usually consider only part of the ML pipeline and hardly reflect the performance of algorithms in real-world ML applications. Thus, it is essential to take a practical deployment perspective on the current dataset copyright auditing tools, examining their effectiveness and limitations. Concretely, we categorize dataset copyright auditing research into two prominent strands: intrusive methods and non-intrusive methods, depending on whether they require modifications to the original dataset. Then, we break down the intrusive methods into different watermark injection options and examine the non-intrusive methods using various fingerprints. To summarize our results, we offer detailed reference tables, highlight key points, and pinpoint unresolved issues in the current literature. By combining the pipeline in ML systems and analyzing previous studies, we highlight several future directions to make auditing tools more suitable for real-world copyright protection requirements.
Article 1704
Title@2025-05-25 (7): Learn Beneficial Noise as Graph Augmentation
Title: Learn Beneficial Noise as Graph Augmentation | Benefitial Noise als Graph Augmentation lernen | 学习以图增益为受益噪音 2505.19024v1 |
Authors: Siqi Huang, Yanchen Xu, Hongyuan Zhang, Xuelong Li
Although graph contrastive learning (GCL) has been widely investigated, it is still a challenge to generate effective and stable graph augmentations. Existing methods often apply heuristic augmentations like random edge dropping, which may disrupt important graph structures and result in unstable GCL performance. In this paper, we propose Positive-incentive Noise driven Graph Data Augmentation (PiNGDA), where the positive-incentive noise (pi-noise) framework analyzes the beneficial effect of noise through information theory. To bridge the standard GCL and pi-noise framework, we design a Gaussian auxiliary variable to convert the loss function to information entropy. We prove that the standard GCL with pre-defined augmentations is equivalent to estimating the beneficial noise via point estimation. Following our analysis, PiNGDA is derived from learning the beneficial noise on both topology and attributes through a trainable noise generator for graph augmentations, instead of the simple estimation. Since the generator learns how to produce beneficial perturbations on graph topology and node attributes, PiNGDA is more reliable than the existing methods. Extensive experimental results validate the effectiveness and stability of PiNGDA.
Article 1705
Title@2025-05-25 (7): A Smart Healthcare System for Monkeypox Skin Lesion Detection and Tracking
Title: A Smart Healthcare System for Monkeypox Skin Lesion Detection and Tracking | Ein intelligentes Gesundheitssystem für Monkeypox-Hautläsionserkennung und -verfolgung | 用于探测和跟踪猴子天花皮肤皮层的智能保健系统 2505.19023v1 |
Authors: Huda Alghoraibi, Nuha Alqurashi, Sarah Alotaibi, Renad Alkhudaydi, Bdoor Aldajani, Lubna Alqurashi, Jood Batweel, Maha A. Thafar
Monkeypox is a viral disease characterized by distinctive skin lesions and has been reported in many countries. The recent global outbreak has emphasized the urgent need for scalable, accessible, and accurate diagnostic solutions to support public health responses. In this study, we developed ITMAINN, an intelligent, AI-driven healthcare system specifically designed to detect Monkeypox from skin lesion images using advanced deep learning techniques. Our system consists of three main components. First, we trained and evaluated several pretrained models using transfer learning on publicly available skin lesion datasets to identify the most effective models. For binary classification (Monkeypox vs. non-Monkeypox), the Vision Transformer, MobileViT, Transformer-in-Transformer, and VGG16 achieved the highest performance, each with an accuracy and F1-score of 97.8%. For multiclass classification, which contains images of patients with Monkeypox and five other classes (chickenpox, measles, hand-foot-mouth disease, cowpox, and healthy), ResNetViT and ViT Hybrid models achieved 92% accuracy, with F1 scores of 92.24% and 92.19%, respectively. The best-performing and most lightweight model, MobileViT, was deployed within the mobile application. The second component is a cross-platform smartphone application that enables users to detect Monkeypox through image analysis, track symptoms, and receive recommendations for nearby healthcare centers based on their location. The third component is a real-time monitoring dashboard designed for health authorities to support them in tracking cases, analyzing symptom trends, guiding public health interventions, and taking proactive measures. This system is fundamental in developing responsive healthcare infrastructure within smart cities. Our solution, ITMAINN, is part of revolutionizing public health management.
nan
Article 1706
Title@2025-05-25 (7): Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs
Title: Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs | Unbestimmte Quantifizierung auf Funktionsebene für die Kalibrierung von Feinabstimmungen auf LLMs | 对LLMML进行校准微调的不确定性定量 2410.06431v3 |
Authors: Ruijia Niu, Dongxia Wu, Rose Yu, Yi-An Ma
Accurate uncertainty quantification in large language models (LLMs) is essential for providing credible confidence estimates over their outputs. However, fine-tuned LLMs often exhibit overconfidence in uncertain predictions, which stems from their limited ability to generalize with sparse data. Existing parameter-efficient fine-tuning (PEFT) uncertainty quantification methods for LLMs focus on the post-fine-tuning stage, and thus fail to address the core issue: the limited specialization of PEFT adapters to accurately capture task-specific input-output relationships. To address these limitations, we propose Functional-Level Uncertainty Quantification for Calibrated Fine-Tuning (UQ4CT), which captures and calibrates uncertainty over the space of functions that map input prompts to outputs. We implement UQ4CT during the fine-tuning stage via a mixture-of-experts framework that hierarchically decomposes the functional space. Empirically, UQ4CT achieves over $25\%$ reduction in Expected Calibration Error (ECE) while preserving high accuracy across five benchmarks. Even under distribution shift, UQ4CT maintains superior ECE performance with high accuracy, showcasing improved generalizability.
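The abstract reports results in terms of Expected Calibration Error (ECE) without spelling out the metric. A minimal sketch of the commonly used equal-width binned estimator is given below; the number of bins, the function name, and the toy inputs are assumptions, and the paper may use a different binning scheme.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin size / n) * |accuracy - mean confidence|."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece

# Hypothetical overconfident model: 90% confidence, but only half the answers are right.
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
```

A perfectly calibrated model scores 0; the overconfident toy case above scores roughly 0.4, the gap between stated confidence and observed accuracy.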
nan
Article 1707
Title@2025-05-25 (7): AnchorFormer: Differentiable Anchor Attention for Efficient Vision Transformer
Title: AnchorFormer: Differentiable Anchor Attention for Efficient Vision Transformer | AnchorFormer: Differentielle Anker-Achtung für effizienten Vision Transformer | Anchor Former: 高效愿景变异器的可区别的锁定器注意 2505.16463v2 |
Authors: Jiquan Shan, Junxiao Wang, Lifeng Zhao, Liang Cai, Hongyuan Zhang, Ioannis Liritzis
Recently, vision transformers (ViTs) have achieved excellent performance on vision tasks by computing global self-attention among image patches. Given $n$ patches, self-attention has quadratic complexity $\mathcal{O}(n^2)$, and the time cost is high when the input image is split at a fine granularity. Meanwhile, the pivotal information is often randomly gathered in a few regions of an input image, so some tokens may not be helpful for downstream tasks. To handle this problem, we introduce an anchor-based efficient vision transformer (AnchorFormer), which employs anchor tokens to learn the pivotal information and accelerate inference. Firstly, by estimating the bipartite attention between the anchors and tokens, the complexity is reduced from $\mathcal{O}(n^2)$ to $\mathcal{O}(mn)$, where $m$ is the number of anchors and $m < n$. Notably, by representing the anchors with the neurons in a neural layer, we can differentiably learn these distributions and approximate global self-attention through a Markov process. Moreover, we extend the proposed model to three downstream tasks including classification, detection, and segmentation. Extensive experiments show the effectiveness of our AnchorFormer, e.g., achieving up to a 9.0% higher accuracy or 46.7% FLOPs reduction on ImageNet classification, 81.3% higher mAP on COCO detection under comparable FLOPs, as compared to the current baselines.
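To make the complexity argument concrete, the sketch below routes attention through $m$ anchors instead of forming the full $n \times n$ attention matrix. It is an illustrative reading of the abstract, not the authors' architecture: the dot-product scoring, the two softmax-normalized bipartite maps, and all function names are assumptions. Composing the two row-stochastic maps behaves like a two-step Markov transition from tokens to anchors and back to tokens.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def anchor_attention(tokens, anchors):
    """tokens: n x d, anchors: m x d. Every product below costs O(m * n * d)."""
    token_to_anchor = softmax_rows(matmul(tokens, transpose(anchors)))  # n x m
    anchor_to_token = softmax_rows(matmul(anchors, transpose(tokens)))  # m x n
    anchor_summary = matmul(anchor_to_token, tokens)                    # m x d
    # The n x n matrix token_to_anchor @ anchor_to_token is never materialized.
    return matmul(token_to_anchor, anchor_summary)                      # n x d

# Hypothetical toy input: n = 4 tokens, m = 2 anchors, d = 2 features.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
anchors = [[1.0, 0.0], [0.0, 1.0]]
output = anchor_attention(tokens, anchors)
```

In a trainable model the anchors would be learned parameters and the tokens would pass through query, key, and value projections; both are omitted here to keep the cost structure visible.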
nan
Article 1708
Title@2025-05-25 (7): When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Title: When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers | Wann ist Task Vector für die Modellbearbeitung wahrscheinlich wirksam? Eine Generalisierungsanalyse von nichtlinearen Transformern | 任务矢量何时对模式编辑有效? 非线性变换器的概括分析 2504.10957v3 |
Authors: Hongkang Li, Yihua Zhang, Shuai Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen
Task arithmetic refers to editing the pre-trained model by adding a weighted sum of task vectors, each of which is the weight update from the pre-trained model to a model fine-tuned for a certain task. This approach recently gained attention as a computationally efficient inference method for model editing, e.g., multi-task learning, forgetting, and out-of-domain generalization capabilities. However, the theoretical understanding of why task vectors can execute various conceptual operations remains limited, due to the high non-convexity of training Transformer-based models. To the best of our knowledge, this paper provides the first theoretical characterization of the generalization guarantees of task vector methods on nonlinear Transformers. We consider a conceptual learning setting, where each task is a binary classification problem based on a discriminative pattern. We theoretically prove the effectiveness of task addition in simultaneously learning a set of irrelevant or aligned tasks, as well as the success of task negation in unlearning one task from irrelevant or contradictory tasks. Moreover, we prove that a proper selection of linear coefficients for task arithmetic achieves guaranteed generalization to out-of-domain tasks. All of our theoretical results hold for both dense-weight parameters and their low-rank approximations. Although established in a conceptual setting, our theoretical findings were validated on a practical machine unlearning task using the large language model Phi-1.5 (1.3B).
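The definition in the first sentence translates directly into code. The sketch below represents a model as a dictionary of flat weight lists; the parameter name `w`, the toy weights, and the coefficients are hypothetical and chosen only to show addition (positive coefficient) and negation (negative coefficient).

```python
def task_vector(pretrained, finetuned):
    """Task vector: fine-tuned weights minus pre-trained weights, per parameter."""
    return {name: [f - p for p, f in zip(pretrained[name], finetuned[name])]
            for name in pretrained}

def apply_task_vectors(pretrained, vectors, coeffs):
    """Edited model: pre-trained weights plus a weighted sum of task vectors."""
    edited = {name: list(weights) for name, weights in pretrained.items()}
    for tau, lam in zip(vectors, coeffs):
        for name in edited:
            edited[name] = [w + lam * t for w, t in zip(edited[name], tau[name])]
    return edited

# Hypothetical one-parameter "models".
pretrained = {"w": [1.0, 2.0, 3.0]}
finetuned_a = {"w": [2.0, 2.0, 3.0]}
finetuned_b = {"w": [1.0, 2.0, 5.0]}
tau_a = task_vector(pretrained, finetuned_a)
tau_b = task_vector(pretrained, finetuned_b)

multi_task = apply_task_vectors(pretrained, [tau_a, tau_b], [1.0, 1.0])   # task addition
forget_b = apply_task_vectors(pretrained, [tau_a, tau_b], [1.0, -1.0])    # negate task B
```

The paper's results concern when such coefficients provably work; the snippet only shows the arithmetic being analyzed.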
nan
Article 1709
Title@2025-05-25 (7): Fractured Chain-of-Thought Reasoning
Title: Fractured Chain-of-Thought Reasoning | Zersplitterte Kette von nachdenklichen Gründen | 断断断断断断断断断断断断的探讨链原因 2505.12992v2 |
Authors: Baohao Liao, Hanze Dong, Yuhui Xu, Doyen Sahoo, Christof Monz, Junnan Li, Caiming Xiong
Inference-time scaling techniques have significantly bolstered the reasoning capabilities of large language models (LLMs) by harnessing additional computational effort at inference without retraining. Similarly, Chain-of-Thought (CoT) prompting and its extension, Long CoT, improve accuracy by generating rich intermediate reasoning trajectories, but these approaches incur substantial token costs that impede their deployment in latency-sensitive settings. In this work, we first show that truncated CoT, which stops reasoning before completion and directly generates the final answer, often matches full CoT sampling while using dramatically fewer tokens. Building on this insight, we introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling along three orthogonal axes: (1) the number of reasoning trajectories, (2) the number of final solutions per trajectory, and (3) the depth at which reasoning traces are truncated. Through extensive experiments on five diverse reasoning benchmarks and several model scales, we demonstrate that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget. Our analysis reveals how to allocate computation across these dimensions to maximize performance, paving the way for more efficient and scalable LLM reasoning. Code is available at https://github.com/BaohaoLiao/frac-cot.
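The abstract reports scaling in Pass@k but does not say how it is estimated. A widely used unbiased estimator, given $n$ sampled answers of which $c$ are correct, is $1 - \binom{n-c}{k} / \binom{n}{k}$; the sketch below assumes that estimator, and the function name and sample counts are illustrative only.

```python
from math import comb

def pass_at_k(n, c, k):
    """Probability that at least one of k answers drawn without replacement from n is correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k incorrect answers exist, giving pass@k = 1.
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical budget: 10 sampled answers, 3 of them correct.
single_try = pass_at_k(n=10, c=3, k=1)
five_tries = pass_at_k(n=10, c=3, k=5)
```

Under Fractured Sampling the $n$ answers could come from any mix of trajectories, solutions per trajectory, and truncation depths, so the same estimator applies to each allocation being compared.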
nan
Article 1710
Title@2025-05-25 (7): Lorentzian Graph Isomorphic Network
Title: Lorentzian Graph Isomorphic Network | Lorentzian Graph Isomorphic Network | Lorentzian 图形异形网络 2504.00142v4 |
Authors: Srinitish Srinivasan, Omkumar CU
While hyperbolic GNNs show promise for hierarchical data, they often have limited discriminative power compared to Euclidean counterparts or the WL test, due to non-injective aggregation. To address this expressivity gap, we propose the Lorentzian Graph Isomorphic Network (LGIN), a novel HGNN designed for enhanced discrimination within the Lorentzian model. LGIN introduces a new update rule that preserves the Lorentzian metric while effectively capturing richer structural information. This marks a significant step towards more expressive GNNs on Riemannian manifolds. Extensive evaluations across nine benchmark datasets demonstrate LGIN’s superior performance, consistently outperforming or matching state-of-the-art hyperbolic and Euclidean baselines, showcasing its ability to capture complex graph structures. LGIN is the first to adapt principles of powerful, highly discriminative GNN architectures to a Riemannian manifold. The code for our paper can be found at https://github.com/Deceptrax123/LGIN
nan
Article 1711
Title@2025-05-25 (7): Querying Kernel Methods Suffices for Reconstructing their Training Data
Title: Querying Kernel Methods Suffices for Reconstructing their Training Data | Abfrage von Kernel-Methoden Möglichkeiten zur Wiederherstellung ihrer Trainingsdaten | 查询重新构建其培训数据所需的核心内核方法 2505.19019v1 |
Authors: Daniel Barzilai, Yuval Margalit, Eitan Gronich, Gilad Yehudai, Meirav Galun, Ronen Basri
Over-parameterized models have raised concerns about their potential to memorize training data, even when achieving strong generalization. The privacy implications of such memorization are generally unclear, particularly in scenarios where only model outputs are accessible. We study this question in the context of kernel methods, and demonstrate both empirically and theoretically that querying kernel models at various points suffices to reconstruct their training data, even without access to model parameters. Our results hold for a range of kernel methods, including kernel regression, support vector machines, and kernel density estimation. Our hope is that this work can illuminate potential privacy concerns for such models.
nan
Article 1712
Title@2025-05-25 (7): Accurate and Efficient Multivariate Time Series Forecasting via Offline Clustering
Title: Accurate and Efficient Multivariate Time Series Forecasting via Offline Clustering | Genaue und effiziente Multivariate Zeitreihenprognose über Offline-Clustering | 通过离线群集预测准确而高效的多变量时间序列 2505.05738v2 |
Authors: Yiming Niu, Jinliang Deng, Lulu Zhang, Zimu Zhou, Yongxin Tong
Accurate and efficient multivariate time series (MTS) forecasting is essential for applications such as traffic management and weather prediction, which depend on capturing long-range temporal dependencies and interactions between entities. Existing methods, particularly those based on Transformer architectures, compute pairwise dependencies across all time steps, leading to a computational complexity that scales quadratically with the length of the input. To overcome these challenges, we introduce the Forecaster with Offline Clustering Using Segments (FOCUS), a novel approach to MTS forecasting that simplifies long-range dependency modeling through the use of prototypes extracted via offline clustering. These prototypes encapsulate high-level events in the real-world system underlying the data, summarizing the key characteristics of similar time segments. In the online phase, FOCUS dynamically adapts these patterns to the current input and captures dependencies between the input segment and high-level events, enabling both accurate and efficient forecasting. By identifying prototypes during the offline clustering phase, FOCUS reduces the computational complexity of modeling long-range dependencies in the online phase to linear scaling. Extensive experiments across diverse benchmarks demonstrate that FOCUS achieves state-of-the-art accuracy while significantly reducing computational costs.
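The offline/online split described above can be illustrated with a small sketch. The abstract does not name the clustering algorithm, so plain k-means (Lloyd iterations) over fixed-length segments is assumed here; the segment length, the deterministic initialization, and all function names are likewise assumptions rather than details of FOCUS.

```python
def segment(series, length):
    """Split a series into consecutive, non-overlapping segments of equal length."""
    return [series[i:i + length] for i in range(0, len(series) - length + 1, length)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Offline phase: cluster segments into k prototypes with Lloyd iterations."""
    if len(points) < k:
        raise ValueError("need at least k points")
    centroids = [list(p) for p in points[:k]]  # deterministic initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def nearest_prototype(seg, prototypes):
    """Online phase: one comparison per prototype, independent of history length."""
    return min(range(len(prototypes)), key=lambda i: sq_dist(seg, prototypes[i]))

# Hypothetical series alternating between a "low" and a "high" regime.
series = [0.0] * 4 + [5.0] * 4 + [0.0] * 4 + [5.0] * 4
prototypes = kmeans(segment(series, 4), k=2)
matched = nearest_prototype([4.5, 4.5, 4.5, 4.5], prototypes)
```

Since each incoming segment is compared against a fixed number of prototypes, the online cost grows linearly with the number of input segments, which matches the linear scaling the abstract claims.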
nan
Article 1713
Title@2025-05-25 (7): Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Title: Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis | Ausbildung nichtlinearer Transformer für den Schlussfolgerungsketten-of-Thought: Eine theoretische Generalisierungsanalyse | 培训非线性非线性变换器,用于研究链推论:理论一般分析 2410.02167v3 |
Authors: Hongkang Li, Songtao Lu, Pin-Yu Chen, Xiaodong Cui, Meng Wang
Chain-of-Thought (CoT) is an efficient prompting method that enables the reasoning ability of large language models by augmenting the query using multiple examples with multiple intermediate steps. Despite the empirical success, the theoretical understanding of how to train a Transformer to achieve the CoT ability remains less explored. This is primarily due to the technical challenges involved in analyzing the nonconvex optimization on nonlinear attention models. To the best of our knowledge, this work provides the first theoretical study of training Transformers with nonlinear attention to obtain the CoT generalization capability, so that the resulting model can perform inference on unseen tasks when the input is augmented by examples of the new task. We first quantify the training samples and iterations required to train a Transformer model towards CoT ability. We then prove the success of its CoT generalization on unseen tasks with distribution-shifted testing data. Moreover, we theoretically characterize the conditions for an accurate reasoning output by CoT even when the provided reasoning examples contain noise and are not always accurate. In contrast, in-context learning (ICL), which can be viewed as one-step CoT without intermediate steps, may fail to provide an accurate output when CoT does. These theoretical findings are justified through experiments.
nan
Article 1714
Title@2025-05-25 (7): Understanding the Robustness of Graph Neural Networks against Adversarial Attacks
Title: Understanding the Robustness of Graph Neural Networks against Adversarial Attacks | Verständnis der Robustheit von Graphen-Neuralen Netzwerken gegen feindliche Angriffe | 理解反对反向攻击的平面神经网络的强大力 2406.13920v2 |
Authors: Tao Wu, Canyixing Cui, Xingping Xian, Shaojie Qiao, Chao Wang, Lin Yuan, Shui Yu
Recent studies have shown that graph neural networks (GNNs) are vulnerable to adversarial attacks, posing significant challenges to their deployment in safety-critical scenarios. This vulnerability has spurred a growing focus on designing robust GNNs. Despite this interest, current advancements have predominantly relied on empirical trial and error, resulting in a limited understanding of the robustness of GNNs against adversarial attacks. To address this issue, we conduct the first large-scale systematic study on the adversarial robustness of GNNs by considering the patterns of input graphs, the architecture of GNNs, and their model capacity, along with discussions on sensitive neurons and adversarial transferability. This work proposes a comprehensive empirical framework for analyzing the adversarial robustness of GNNs. To support the analysis of adversarial robustness in GNNs, we introduce two evaluation metrics: the confidence-based decision surface and the accuracy-based adversarial transferability rate. Through experimental analysis, we derive 11 actionable guidelines for designing robust GNNs, enabling model developers to gain deeper insights. The code of this study is available at https://github.com/star4455/GraphRE.
nan
Article 1715
Title@2025-05-25 (7): WorldEval: World Model as Real-World Robot Policies Evaluator
Title: WorldEval: World Model as Real-World Robot Policies Evaluator | WorldEval: Weltmodell als Real-World-Roboterpolitik Evaluator | WorldEval:世界作为真实世界机器人政策评价人的世界模式 2505.19017v1 |
Authors: Yaxuan Li, Yichen Zhu, Junjie Wen, Chaomin Shen, Yi Xu
The field of robotics has made significant strides toward developing generalist robot manipulation policies. However, evaluating these policies in real-world scenarios remains time-consuming and challenging, particularly as the number of tasks scales and environmental conditions change. In this work, we demonstrate that world models can serve as a scalable, reproducible, and reliable proxy for real-world robot policy evaluation. A key challenge is generating accurate policy videos from world models that faithfully reflect the robot actions. We observe that directly inputting robot actions or using high-dimensional encoding methods often fails to generate action-following videos. To address this, we propose Policy2Vec, a simple yet effective approach to turn a video generation model into a world simulator that follows latent actions to generate the robot video. We then introduce WorldEval, an automated pipeline designed to evaluate real-world robot policies entirely online. WorldEval effectively ranks various robot policies and individual checkpoints within a single policy, and functions as a safety detector to prevent dangerous actions by newly developed robot models. Through comprehensive paired evaluations of manipulation policies in real-world environments, we demonstrate a strong correlation between policy performance in WorldEval and real-world scenarios. Furthermore, our method significantly outperforms popular methods such as the real-to-sim approach.
nan
Article 1716
Title@2025-05-25 (7): Tokenizing Electron Cloud in Protein-Ligand Interaction Learning
Title: Tokenizing Electron Cloud in Protein-Ligand Interaction Learning | Tokenizing Electron Cloud in Protein-Ligand Interaktion Lernen | 将电云投入蛋白碱的相互作用学习 2505.19014v1 |
Authors: Haitao Lin, Odin Zhang, Jia Xu, Yunfan Liu, Zheng Cheng, Lirong Wu, Yufei Huang, Zhifeng Gao, Stan Z. Li
The affinity and specificity of protein-molecule binding directly impact functional outcomes, uncovering the mechanisms underlying biological regulation and signal transduction. Most deep-learning-based prediction approaches focus on structures of atoms or fragments. However, quantum chemical properties, such as electronic structures, are the key to unveiling interaction patterns but remain largely underexplored. To bridge this gap, we propose ECBind, a method for tokenizing electron cloud signals into quantized embeddings, enabling their integration into downstream tasks such as binding affinity prediction. By incorporating electron densities, ECBind helps uncover binding modes that cannot be fully represented by atom-level models. Specifically, to remove the redundancy inherent in electron cloud signals, a structure-aware transformer and hierarchical codebooks encode 3D binding sites enriched with electron structures into tokens. These tokenized codes are then used for specific tasks with labels. To extend its applicability to a wider range of scenarios, we utilize knowledge distillation to develop an electron-cloud-agnostic prediction model. Experimentally, ECBind demonstrates state-of-the-art performance across multiple tasks, achieving improvements of 6.42% and 15.58% in per-structure Pearson and Spearman correlation coefficients, respectively.
nan
Article 1717
Title@2025-05-25 (7): Faithful Group Shapley Value
Title: Faithful Group Shapley Value | Treue Gruppe Shapley Wert | 忠实的群群形状值 2505.19013v1 |
Authors: Kiljae Lee, Ziqi Liu, Weijing Tang, Yuan Zhang
Data Shapley is an important tool for data valuation, which quantifies the contribution of individual data points to machine learning models. In practice, group-level data valuation is desirable when data providers contribute data in batches. However, we identify that existing group-level extensions of Data Shapley are vulnerable to shell company attacks, where strategic group splitting can unfairly inflate valuations. We propose the Faithful Group Shapley Value (FGSV), which uniquely defends against such attacks. Building on original mathematical insights, we develop a provably fast and accurate approximation algorithm for computing FGSV. Empirical experiments demonstrate that our algorithm significantly outperforms state-of-the-art methods in computational efficiency and approximation accuracy, while ensuring faithful group-level valuation.
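A toy calculation shows why group splitting can inflate a valuation. The sketch below assumes the simplest group-level extension, in which each group is treated as one player in an exact Shapley computation, and a deliberately redundant utility where any non-empty coalition achieves full utility. The utility function, the provider names, and the split are all hypothetical, and this is not the FGSV algorithm.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, utility):
    """Exact Shapley values: average marginal contribution over all player orderings."""
    values = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        previous = utility(frozenset(coalition))
        for p in order:
            coalition.add(p)
            current = utility(frozenset(coalition))
            values[p] += current - previous
            previous = current
    n_orders = factorial(len(players))
    return {p: v / n_orders for p, v in values.items()}

def utility(coalition):
    """Hypothetical redundant data: any single group already yields full utility."""
    return 1.0 if coalition else 0.0

# Provider A and provider B each register as one group ...
before = shapley_values(["A", "B"], utility)
# ... then A splits the same data across two shell groups, A1 and A2.
after = shapley_values(["A1", "A2", "B"], utility)
share_of_a_after = after["A1"] + after["A2"]
```

Provider A's total rises from 1/2 to 2/3 without contributing any new data, which is the kind of manipulation the abstract calls a shell company attack.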
nan
Article 1718
Title@2025-05-25 (7): Alberta Wells Dataset: Pinpointing Oil and Gas Wells from Satellite Imagery
Title: Alberta Wells Dataset: Pinpointing Oil and Gas Wells from Satellite Imagery | Alberta Wells Datensatz: Pinpointing Öl- und Gasquellen aus Satellitenbildern | 艾伯塔·韦尔斯数据集:从卫星图象中点出石油和天然气井 2410.09032v3 |
Authors: Pratinav Seth, Michelle Lin, Brefo Dwamena Yaw, Jade Boutot, Mary Kang, David Rolnick
Millions of abandoned oil and gas wells are scattered across the world, leaching methane into the atmosphere and toxic compounds into the groundwater. Many of these locations are unknown, preventing the wells from being plugged and their polluting effects averted. Remote sensing is a relatively unexplored tool for pinpointing abandoned wells at scale. We introduce the first large-scale benchmark dataset for this problem, leveraging medium-resolution multi-spectral satellite imagery from Planet Labs. Our curated dataset comprises over 213,000 wells (abandoned, suspended, and active) from Alberta, a region with especially high well density, sourced from the Alberta Energy Regulator and verified by domain experts. We evaluate baseline algorithms for well detection and segmentation, showing the promise of computer vision approaches but also significant room for improvement.
nan
Article 1719
Title@2025-05-25 (7): FERGI: Automatic Scoring of User Preferences for Text-to-Image Generation from Spontaneous Facial Expression Reaction
Title: FERGI: Automatic Scoring of User Preferences for Text-to-Image Generation from Spontaneous Facial Expression Reaction | FERGI: Automatische Bewertung von Benutzereinstellungen für die Text-zu-Bild-Erzeugung aus spontaner Gesichtsausdrucksreaktion | FERGI: 自动自发面性表达反应生成文本到图像的用户首选项自动排序 2312.03187v4 |
Authors: Shuangquan Feng, Junhua Ma, Virginia R. de Sa
Researchers have proposed using human preference feedback data to fine-tune text-to-image generative models. However, the scalability of human feedback collection has been limited by its reliance on manual annotation. Therefore, we develop and test a method to automatically score user preferences from their spontaneous facial expression reactions to the generated images. We collect a dataset of Facial Expression Reaction to Generated Images (FERGI) and show that the activations of multiple facial action units (AUs) are highly correlated with user evaluations of the generated images. We develop an FAU-Net (Facial Action Units Neural Network), which receives inputs from an AU estimation model, to automatically score user preferences for text-to-image generation based on their facial expression reactions. This score is complementary to the pre-trained scoring models based on the input text prompts and generated images. Integrating our FAU-Net valence score with the pre-trained scoring models improves their consistency with human preferences. This method of automatic annotation with facial expression analysis can potentially be generalized to other generation tasks. The code is available at https://github.com/ShuangquanFeng/FERGI, and the dataset is also available at the same link for research purposes.
nan
Article 1720
Title@2025-05-25 (7): Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization
Title: Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization | Handhabung von Etikettengeräuschen über Instance-Level-Schwierigkeitsmodellierung und dynamische Optimierung | 通过实度难度建模和动态优化处理标签噪音 2505.00812v2 |
Authors: Kuan Zhang, Chengliang Chai, Jingzhe Xu, Chi Zhang, Ye Yuan, Guoren Wang, Lei Cao
Recent studies indicate that deep neural networks degrade in generalization performance under noisy supervision. Existing methods focus on isolating clean subsets or correcting noisy labels, and face limitations such as high computational costs, heavy hyperparameter tuning, and coarse-grained optimization. To address these challenges, we propose a novel two-stage noisy learning framework that enables instance-level optimization through a dynamically weighted loss function, avoiding hyperparameter tuning. To obtain stable and accurate information about noise modeling, we introduce a simple yet effective metric, termed wrong event, which dynamically models the cleanliness and difficulty of individual samples while maintaining computational costs. Our framework first collects wrong event information and builds a strong base model. Then we perform noise-robust training on the base model, using a probabilistic model to handle the wrong event information of samples. Experiments on five synthetic and real-world LNL benchmarks demonstrate that our method surpasses state-of-the-art methods in performance, achieves a nearly 75% reduction in computational time, and improves model scalability.
nan
Article 1721
Title@2025-05-25 (7): Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding
Title: Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding | Galaxy Walker: Geometry-aware VLMs für Galaxy-Skala Verständnis | Galaxy Walker: 用于银河系统系统理解的几何觉测甚低LMS 2503.18578v3 |
Authors: Tianyu Chen, Xingcheng Fu, Yisen Gao, Haodong Qian, Yuecen Wei, Kun Yan, Haoyi Zhou, Jianxin Li
Modern vision-language models (VLMs) build their patch embeddings and convolutional backbones in vector spaces, typically Euclidean ones, from the ground up. When expanding VLMs to a galaxy scale for understanding astronomical phenomena, the integration of spherical space for planetary orbits and hyperbolic spaces for black holes raises two formidable challenges: a) the current pre-training model is confined to Euclidean space rather than a comprehensive geometric embedding, and b) the predominant architecture lacks suitable backbones for anisotropic physical geometries. In this paper, we introduce Galaxy-Walker, a geometry-aware VLM for universe-level vision understanding tasks. We propose a geometry prompt that generates geometry tokens by random walks across diverse spaces on a multi-scale physical graph, along with a geometry adapter that compresses and reshapes the space anisotropy in a mixture-of-experts manner. Extensive experiments demonstrate the effectiveness of our approach, with Galaxy-Walker achieving state-of-the-art performance in both galaxy property estimation ($R^2$ scores up to $0.91$) and morphology classification tasks (up to $+0.17$ F1 improvement in challenging features), significantly outperforming both domain-specific models and general-purpose VLMs.
nan
Article 1722
Title@2025-05-25 (7): Inductive Gradient Adjustment For Spectral Bias In Implicit Neural Representations
Title: Inductive Gradient Adjustment For Spectral Bias In Implicit Neural Representations | Induktive Gradientenanpassung für Spektralbien in impliziten Neuraldarstellungen | 隐含神经表层旁观生物的感应梯度调整 2410.13271v2 |
Authors: Kexuan Shi, Hai Chen, Leheng Zhang, Shuhang Gu
Implicit Neural Representations (INRs), as a versatile representation paradigm, have achieved success in various computer vision tasks. Due to the spectral bias of vanilla multi-layer perceptrons (MLPs), existing methods focus on designing MLPs with sophisticated architectures or repurposing training techniques for highly accurate INRs. In this paper, we delve into the linear dynamics model of MLPs and theoretically identify the empirical Neural Tangent Kernel (eNTK) matrix as a reliable link between spectral bias and training dynamics. Based on this insight, we propose a practical Inductive Gradient Adjustment (IGA) method, which can purposefully improve the spectral bias via inductive generalization of an eNTK-based gradient transformation matrix. Theoretical and empirical analyses validate the impact of IGA on spectral bias. Further, we evaluate our method on different INR tasks with various INR architectures and compare it to existing training techniques. The superior and consistent improvements clearly validate the advantage of our IGA. Armed with our gradient adjustment method, better INRs with more enhanced texture details and sharpened edges can be learned from data through tailored impacts on spectral bias.
nan
Article 1723
Title@2025-05-25 (7): Semi-pessimistic Reinforcement Learning
Title: Semi-pessimistic Reinforcement Learning | Halbpessimistisches Erlernen der Verstärkung | 半悲观强化学习 2505.19002v1 |
Authors: Jin Zhu, Xin Zhou, Jiaang Yao, Gholamali Aminian, Omar Rivasplata, Simon Little, Lexin Li, Chengchun Shi
Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected data. However, it faces challenges of distributional shift, where the learned policy may encounter unseen scenarios not covered in the offline data. Additionally, numerous applications suffer from a scarcity of labeled reward data. Relying on labeled data alone often leads to a narrow state-action distribution, further amplifying the distributional shift and resulting in suboptimal policy learning. To address these issues, we first recognize that the volume of unlabeled data is typically substantially larger than that of labeled data. We then propose a semi-pessimistic RL method to effectively leverage abundant unlabeled data. Our approach offers several advantages. It considerably simplifies the learning process, as it seeks a lower bound of the reward function, rather than that of the Q-function or state transition function. It is highly flexible, and can be integrated with a range of model-free and model-based RL algorithms. It enjoys guaranteed improvement when utilizing vast unlabeled data, while requiring much less restrictive conditions. We compare our method with a number of alternative solutions, both analytically and numerically, and demonstrate its clear competitiveness. We further illustrate it with an application to adaptive deep brain stimulation for Parkinson’s disease.
nan
Article 1724
Title@2025-05-25 (7): Automatic and Structure-Aware Sparsification of Hybrid Neural ODEs
Title: Automatic and Structure-Aware Sparsification of Hybrid Neural ODEs | Automatische und strukturschonende Sparsifikation von Hybrid-Neural-ODEs | 混合神经代码的自动和结构软件分离 2505.18996v1 |
Authors: Bob Junyi Zou, Lu Tian
Hybrid neural ordinary differential equations (neural ODEs) integrate mechanistic models with neural ODEs, offering strong inductive bias and flexibility, and are particularly advantageous in data-scarce healthcare settings. However, excessive latent states and interactions from mechanistic models can lead to training inefficiency and over-fitting, limiting practical effectiveness of hybrid neural ODEs. In response, we propose a new hybrid pipeline for automatic state selection and structure optimization in mechanistic neural ODEs, combining domain-informed graph modifications with data-driven regularization to sparsify the model for improving predictive performance and stability while retaining mechanistic plausibility. Experiments on synthetic and real-world data show improved predictive performance and robustness with desired sparsity, establishing an effective solution for hybrid model reduction in healthcare applications.
Article 1725
Title@2025-05-25 (7): Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Title: Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Verstärktes Lernen zur Vernunft in großen Sprachmodellen mit einem Trainingsbeispiel | 采用 “ 一个培训实例 “ 采用大语言模式强化学习 2504.20571v2 |
Authors: Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Liyuan Liu, Baolin Peng, Hao Cheng, Xuehai He, Kuan Wang, Jianfeng Gao, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen
We show that reinforcement learning with verifiable reward using one training example (1-shot RLVR) is effective in incentivizing the mathematical reasoning capabilities of large language models (LLMs). Applying RLVR to the base model Qwen2.5-Math-1.5B, we identify a single example that elevates model performance on MATH500 from 36.0% to 73.6%, and improves the average performance across six common mathematical reasoning benchmarks from 17.6% to 35.7%. This result matches the performance obtained using the 1.2k DeepScaleR subset (MATH500: 73.6%, average: 35.9%), which includes the aforementioned example. Furthermore, RLVR with only two examples even slightly exceeds these results (MATH500: 74.8%, average: 36.6%). Similar substantial improvements are observed across various models (Qwen2.5-Math-7B, Llama3.2-3B-Instruct, DeepSeek-R1-Distill-Qwen-1.5B), RL algorithms (GRPO and PPO), and different math examples (when employed as a single training example). In addition, we identify some interesting phenomena during 1-shot RLVR, including cross-domain generalization, increased frequency of self-reflection, and sustained test performance improvement even after the training accuracy has saturated, a phenomenon we term post-saturation generalization. Moreover, we verify that the effectiveness of 1-shot RLVR primarily arises from the policy gradient loss, distinguishing it from the “grokking” phenomenon. We also show the critical role of promoting exploration (e.g., by incorporating entropy loss with an appropriate coefficient) in 1-shot RLVR training. We further discuss related observations about format correction, label robustness, and prompt modification. These findings can inspire future work on RLVR efficiency and encourage a re-examination of recent progress and the underlying mechanisms in RLVR. Our code, model, and data are open source at https://github.com/ypwang61/One-Shot-RLVR.
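As a rough illustration of the ingredients named in this abstract, the sketch below computes a binary verifiable reward and GRPO-style group-normalized advantages for several responses sampled from a single training prompt. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the exact-match reward rule, and the sample strings are hypothetical, and the policy gradient and entropy terms are omitted.

```python
import statistics

def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """Hypothetical reward rule: 1.0 on exact match of the final answer, else 0.0."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled responses (GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# One training prompt, several sampled responses (made-up strings).
reference = "12.8"
samples = ["12.8", "13", " 12.8 ", "12"]
rewards = [verifiable_reward(s, reference) for s in samples]
advantages = group_advantages(rewards)
```

In this sketch, a group whose responses all receive the same reward yields all-zero advantages and therefore no learning signal, which is one plausible reading of why the abstract stresses exploration once training accuracy saturates.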
Article 1726
Title@2025-05-25 (7): PDFBench: A Benchmark for De novo Protein Design from Function
Title: PDFBench: A Benchmark for De novo Protein Design from Function | PDFBench: Ein Benchmark für De novo Protein Design von der Funktion | PDFBench:从函数调出新蛋白设计基准 2505.20346v1 |
Authors: Jiahao Kuang, Nuowei Liu, Changzhi Sun, Tao Ji, Yuanbin Wu
In recent years, while natural language processing and multimodal learning have seen rapid advancements, the field of de novo protein design has also experienced significant growth. However, most current methods rely on proprietary datasets and evaluation rubrics, making fair comparisons between different approaches challenging. Moreover, these methods often employ evaluation metrics that capture only a subset of the desired properties of designed proteins, lacking a comprehensive assessment framework. To address these issues, we introduce PDFBench, the first comprehensive benchmark for evaluating de novo protein design from function. PDFBench supports two tasks: description-guided design and keyword-guided design. To ensure fair and multifaceted evaluation, we compile 22 metrics covering sequence plausibility, structural fidelity, and language-protein alignment, along with measures of novelty and diversity. We evaluate five state-of-the-art baselines, revealing their respective strengths and weaknesses across tasks. Finally, we analyze inter-metric correlations, exploring the relationships between four categories of metrics, and offering guidelines for metric selection. PDFBench establishes a unified framework to drive future advances in function-driven de novo protein design.
Article 1727
Title@2025-05-25 (7): STRICT: Stress Test of Rendering Images Containing Text
Title: STRICT: Stress Test of Rendering Images Containing Text | STRICT: Stresstest von Rendering-Bildern mit Text | STRICT: 含有文字的图像的显示压力测试 2505.18985v1 |
Authors: Tianyu Zhang, Xinyu Wang, Zhenghan Tai, Lu Li, Jijun Chi, Jingrui Tian, Hailin He, Suyuchen Wang
While diffusion models have revolutionized text-to-image generation with their ability to synthesize realistic and diverse scenes, they continue to struggle to generate consistent and legible text within images. This shortcoming is commonly attributed to the locality bias inherent in diffusion-based generation, which limits their ability to model long-range spatial dependencies. In this paper, we introduce $\textbf{STRICT}$, a benchmark designed to systematically stress-test the ability of diffusion models to render coherent and instruction-aligned text in images. Our benchmark evaluates models across multiple dimensions: (1) the maximum length of readable text that can be generated; (2) the correctness and legibility of the generated text; and (3) the rate at which models fail to follow text-generation instructions. We evaluate several state-of-the-art models, including proprietary and open-source variants, and reveal persistent limitations in long-range consistency and instruction-following capabilities. Our findings provide insights into architectural bottlenecks and motivate future research directions in multimodal generative modeling. We release our entire evaluation pipeline at https://github.com/tianyu-z/STRICT-Bench.
Article 1728
Title@2025-05-25 (7): AmorLIP: Efficient Language-Image Pretraining via Amortization
Title: AmorLIP: Efficient Language-Image Pretraining via Amortization | AmorLIP: Effizientes Sprach-Bild-Vortraining über Amortisation | AmorLIP:通过摊销进行高效的语文图像预培训 2505.18983v1 |
Authors: Haotian Sun, Yitong Li, Yuchen Zhuang, Niao He, Hanjun Dai, Bo Dai
Contrastive Language-Image Pretraining (CLIP) has demonstrated strong zero-shot performance across diverse downstream text-image tasks. Existing CLIP methods typically optimize a contrastive objective using negative samples drawn from each minibatch. To achieve robust representation learning, these methods require extremely large batch sizes and escalate computational demands to hundreds or even thousands of GPUs. Prior approaches to mitigate this issue often compromise downstream performance, prolong training duration, or face scalability challenges with very large datasets. To overcome these limitations, we propose AmorLIP, an efficient CLIP pretraining framework that amortizes expensive computations involved in contrastive learning through lightweight neural networks, which substantially improves training efficiency and performance. Leveraging insights from a spectral factorization of energy-based models, we introduce novel amortization objectives along with practical techniques to improve training stability. Extensive experiments across 38 downstream tasks demonstrate the superior zero-shot classification and retrieval capabilities of AmorLIP, consistently outperforming standard CLIP baselines with substantial relative improvements of up to 12.24%.
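For context, the standard CLIP objective that this abstract starts from can be sketched as a symmetric contrastive loss over in-batch negatives. The code below is a minimal pure-Python sketch of that baseline loss only; it is not AmorLIP's amortization objective, and the function names and the temperature value 0.07 are assumptions chosen for illustration. The `log_sum_exp` term is the per-sample normalizer over the whole minibatch, which is the quantity that makes large batches expensive.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def log_sum_exp(xs):
    # Numerically stable log of the sum of exponentials.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; pair k of images/texts is the positive."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    logits = [[dot(i, t) / temperature for t in txts] for i in imgs]
    # Image-to-text: each row is normalized over all texts in the batch.
    i2t = sum(log_sum_exp(logits[k]) - logits[k][k] for k in range(n)) / n
    # Text-to-image: each column is normalized over all images in the batch.
    t2i = sum(
        log_sum_exp([logits[r][k] for r in range(n)]) - logits[k][k]
        for k in range(n)
    ) / n
    return 0.5 * (i2t + t2i)

aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With correctly paired embeddings the loss is close to zero, while swapping the pairs makes it large, since every positive then scores below its in-batch negative.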
Article 1729
Title@2025-05-25 (7): Learning Mamba as a Continual Learner: Meta-learning Selective State Space Models for Efficient Continual Learning
Title: Learning Mamba as a Continual Learner: Meta-learning Selective State Space Models for Efficient Continual Learning | Mamba als Continual Learner lernen: Meta-Learning Selective State Space Models für effizientes Continual Learning | Mamba作为不断学习者学习Mamba:高效持续学习的元学习选择性国家空间模型 2412.00776v4 |
Authors: Chongyang Zhao, Dong Gong
Continual learning (CL) aims to efficiently learn from a non-stationary data stream, without storing or recomputing all seen samples. CL enables prediction on new tasks by incorporating sequential training samples. Building on this connection between CL and sequential modeling, meta-continual learning (MCL) aims to meta-learn an efficient continual learner as a sequence prediction model, with advanced sequence models like Transformers being natural choices. However, despite decent performance, Transformers rely on a linearly growing cache to store all past representations, conflicting with CL’s objective of not storing all seen samples and limiting efficiency. In this paper, we focus on meta-learning sequence-prediction-based continual learners without retaining all past representations. While attention-free models with fixed-size hidden states (e.g., Linear Transformers) align with CL’s essential goal and efficiency needs, they have shown limited effectiveness in MCL in previous literature. Given Mamba’s strong sequence modeling performance and attention-free nature, we explore a key question: Can attention-free models like Mamba perform well on MCL? By formulating Mamba and the SSM for MCL tasks, we propose MambaCL, a meta-learned continual learner. To enhance MambaCL’s training, we introduce selectivity regularization, leveraging the connection between Mamba and Transformers to guide its behavior over sequences. Furthermore, we study how Mamba and other models perform across various MCL scenarios through extensive and well-designed experiments. Our results highlight the promising performance and strong generalization of Mamba and attention-free models in MCL, demonstrating their potential for efficient continual learning and adaptation.
Article 1730
Title@2025-05-25 (7): LLMScan: Causal Scan for LLM Misbehavior Detection
Title: LLMScan: Causal Scan for LLM Misbehavior Detection | LLMScan: Kausalscan zur Erkennung von LLM-Missverhalten | LLMScan:用于LLM Misbehavavor探测的成因扫描 2410.16638v4 |
Authors: Mengdi Zhang, Kai Kiat Goh, Peixin Zhang, Jun Sun, Rose Lin Xin, Hongyu Zhang
Despite the success of Large Language Models (LLMs) across various fields, their potential to generate untruthful, biased and harmful responses poses significant risks, particularly in critical applications. This highlights the urgent need for systematic methods to detect and prevent such misbehavior. While existing approaches target specific issues such as harmful responses, this work introduces LLMScan, an innovative LLM monitoring technique based on causality analysis, offering a comprehensive solution. LLMScan systematically monitors the inner workings of an LLM through the lens of causal inference, operating on the premise that the LLM’s `brain’ behaves differently when misbehaving. By analyzing the causal contributions of the LLM’s input tokens and transformer layers, LLMScan effectively detects misbehavior. Extensive experiments across various tasks and models reveal clear distinctions in the causal distributions between normal behavior and misbehavior, enabling the development of accurate, lightweight detectors for a variety of misbehavior detection tasks.
Article 1731
Title@2025-05-25 (7): FedSKC: Federated Learning with Non-IID Data via Structural Knowledge Collaboration
Title: FedSKC: Federated Learning with Non-IID Data via Structural Knowledge Collaboration | FedSKC: Föderiertes Lernen mit Nicht-IID-Daten über strukturelle Wissenskooperation | FedSKC:通过结构性知识协作,采用非IID数据的联邦学习 2505.18981v1 |
Authors: Huan Wang, Haoran Li, Huaming Chen, Jun Yan, Lijuan Wang, Jiahua Shi, Shiping Chen, Jun Shen
With the advancement of edge computing, federated learning (FL) shows great promise as a privacy-preserving collaborative learning paradigm. However, one major challenge for FL is data heterogeneity, which refers to the biased labeling preferences among multiple clients and negatively impacts convergence and model performance. Most previous FL methods attempt to tackle data heterogeneity locally or globally, neglecting the underlying class-wise structure information contained in each client. In this paper, we first study how data heterogeneity affects the divergence of the model and decompose it into local, global, and sampling drift sub-problems. To explore the potential of using intra-client class-wise structural knowledge in handling these drifts, we propose Federated Learning with Structural Knowledge Collaboration (FedSKC). The key idea of FedSKC is to extract and transfer domain preferences from inter-client data distributions, offering diverse class-relevant knowledge and a fair convergent signal. FedSKC comprises three components: i) local contrastive learning, to prevent weight divergence resulting from local training; ii) global discrepancy aggregation, which addresses the parameter deviation between the server and clients; iii) global period review, which corrects for the sampling drift introduced by the server randomly selecting devices. We have theoretically analyzed FedSKC under non-convex objectives and empirically validated its superiority through extensive experimental results.
Article 1732
Title@2025-05-25 (7): GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization
Title: GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization | GhostPrompt: Jailbreaking Text-to-image Generative Modelle basierend auf dynamischer Optimierung | GhostPrompt:基于动态最佳化的破狱用文字到图像生成模型 2505.18979v1 |
Authors: Zixuan Chen, Hao Lin, Ke Xu, Xinghao Jiang, Tanfeng Sun
Text-to-image (T2I) generation models can inadvertently produce not-safe-for-work (NSFW) content, prompting the integration of text and image safety filters. Recent advances employ large language models (LLMs) for semantic-level detection, rendering traditional token-level perturbation attacks largely ineffective. Our evaluation confirms that existing jailbreak methods are ineffective against these modern filters. We introduce GhostPrompt, the first automated jailbreak framework that combines dynamic prompt optimization with multimodal feedback. It consists of two key components: (i) Dynamic Optimization, an iterative process that guides a large language model (LLM) using feedback from text safety filters and CLIP similarity scores to generate semantically aligned adversarial prompts; and (ii) Adaptive Safety Indicator Injection, which formulates the injection of benign visual cues as a reinforcement learning problem to bypass image-level filters. GhostPrompt achieves state-of-the-art performance, increasing the ShieldLM-7B bypass rate from 12.5% (SneakyPrompt) to 99.0%, improving CLIP score from 0.2637 to 0.2762, and reducing the time cost by $4.2\times$. Moreover, it generalizes to unseen filters including GPT-4.1 and successfully jailbreaks DALLE 3 to generate NSFW images in our evaluation, revealing systemic vulnerabilities in current multimodal defenses. To support further research on AI safety and red-teaming, we will release code and adversarial prompts under a controlled-access protocol.
Article 1733
Title@2025-05-25 (7): ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Title: ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting | ScaleBiO: Skalierbare Bilevel-Optimierung für LLM-Datenumgewichtung | 缩放 BIO: LLM 数据重新加权的可缩放双级优化 2406.19976v2 |
Authors: Rui Pan, Dylan Zhang, Hanning Zhang, Xingyuan Pan, Minrui Xu, Jipeng Zhang, Renjie Pi, Xiaoyu Wang, Tong Zhang
Bilevel optimization has shown its utility across various machine learning settings, yet most algorithms in practice require second-order information, making it challenging to scale them up. Only recently, a paradigm of first-order algorithms has emerged in the theoretical literature, capable of effectively addressing bilevel optimization problems. Nevertheless, the practical efficiency of this paradigm remains unverified, particularly in the context of large language models (LLMs). This paper introduces the first scalable instantiation of this paradigm called ScaleBiO, focusing on bilevel optimization for large-scale LLM data reweighting. By combining with a recently proposed memory-efficient training technique called LISA, our novel algorithm allows the paradigm to scale to $\sim$30B-sized LLMs on $8\times$H100 GPUs, marking the first successful application of bilevel optimization under practical scenarios for large-sized LLMs. Empirically, extensive experiments on data reweighting verify the effectiveness of ScaleBiO for different-scaled models, including Llama-3-8B, Gemma-2-9B, Qwen-2-7B, and Qwen-2.5-32B, where bilevel optimization succeeds in instruction-following and math reasoning tasks, outperforming several popular baselines, including uniform sampling, influence-aware data filtering, and reference-model-based sampling methods. Theoretically, ScaleBiO ensures the optimality of the learned data weights, along with a convergence guarantee matching the conventional first-order bilevel optimization paradigm on smooth and strongly convex objectives.
Article 1734
Title@2025-05-25 (7): GraSS: Scalable Influence Function with Sparse Gradient Compression
Title: GraSS: Scalable Influence Function with Sparse Gradient Compression | GraSS: Skalierbare Einflussfunktion mit Sparse Gradient Compression | GraSS: 带有微缩梯度压缩的可缩放影响函数 2505.18976v1 |
Authors: Pingbang Hu, Joseph Melkonian, Weijing Tang, Han Zhao, Jiaqi W. Ma
Gradient-based data attribution methods, such as influence functions, are critical for understanding the impact of individual training samples without requiring repeated model retraining. However, their scalability is often limited by the high computational and memory costs associated with per-sample gradient computation. In this work, we propose GraSS, a novel gradient compression algorithm, and its variant FactGraSS, designed specifically for linear layers. Both explicitly leverage the inherent sparsity of per-sample gradients to achieve sub-linear space and time complexity. Extensive experiments demonstrate the effectiveness of our approach, achieving substantial speedups while preserving data influence fidelity. In particular, FactGraSS achieves up to 165% faster throughput on billion-scale models compared to the previous state-of-the-art baselines. Our code is publicly available at https://github.com/TRAIS-Lab/GraSS.
Article 1735
Title@2025-05-25 (7): The Final Layer Holds the Key: A Unified and Efficient GNN Calibration Framework
Title: The Final Layer Holds the Key: A Unified and Efficient GNN Calibration Framework | Die letzte Ebene hält den Schlüssel: Ein einheitliches und effizientes GNN-Kalibrierungssystem | 最后层掌握着关键:统一有效的全球NNN校准框架 2505.11335v2 |
Authors: Jincheng Huang, Jie Xu, Xiaoshuang Shi, Ping Hu, Lei Feng, Xiaofeng Zhu
Graph Neural Networks (GNNs) have demonstrated remarkable effectiveness on graph-based tasks. However, their predictive confidence is often miscalibrated, typically exhibiting under-confidence, which harms the reliability of their decisions. Existing calibration methods for GNNs normally introduce additional calibration components, which fail to capture the intrinsic relationship between the model and the prediction confidence, resulting in limited theoretical guarantees and increased computational overhead. To address this issue, we propose a simple yet efficient graph calibration method. We establish a unified theoretical framework revealing that model confidence is jointly governed by class-centroid-level and node-level calibration at the final layer. Based on this insight, we theoretically show that reducing the weight decay of the final-layer parameters alleviates GNN under-confidence by acting on the class-centroid level, while node-level calibration acts as a finer-grained complement to class-centroid level calibration, which encourages each test node to be closer to its predicted class centroid at the final-layer representations. Extensive experiments validate the superiority of our method.
Article 1736
Title@2025-05-25 (7): MoLAE: Mixture of Latent Experts for Parameter-Efficient Language Models
Title: MoLAE: Mixture of Latent Experts for Parameter-Efficient Language Models | MoLAE: Mischung aus latenten Experten für Parameter-Effiziente Sprachmodelle | MoLAE:参数有效语言模型原始专家混合 2503.23100v2 |
Authors: Zehua Liu, Han Wu, Ruifeng She, Xiaojin Fu, Xiongwei Han, Tao Zhong, Mingxuan Yuan
Mixture of Experts (MoE) has become a key architectural paradigm for efficiently scaling Large Language Models (LLMs) by selectively activating a subset of parameters for each input token. However, standard MoE architectures face significant challenges, including high memory consumption and communication overhead during distributed training. In this paper, we introduce Mixture of Latent Experts (MoLAE), a novel parameterization that addresses these limitations by reformulating expert operations through a shared projection into a lower-dimensional latent space, followed by expert-specific transformations. This factorized approach substantially reduces parameter count and computational requirements, particularly in existing LLMs where hidden dimensions significantly exceed MoE intermediate dimensions. We provide a rigorous mathematical framework for transforming pre-trained MoE models into MoLAE architecture, characterizing conditions for optimal factorization, and developing a systematic two-step algorithm for this conversion. Our comprehensive theoretical analysis demonstrates that MoLAE significantly improves efficiency across multiple dimensions while preserving model capabilities. Experimental results confirm that MoLAE achieves comparable performance to standard MoE with substantially reduced resource requirements.
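To make the factorization idea concrete, the sketch below compares parameter counts for one projection in a standard MoE layer against a latent-expert variant in which a single shared matrix maps the hidden state into a latent space and each expert applies its own smaller matrix. This is a minimal sketch under stated assumptions: the latent dimension, the layer sizes, and all function names are hypothetical, only a single linear projection per expert is modeled, and the abstract does not specify the exact layout used by MoLAE.

```python
def standard_moe_params(num_experts, hidden_dim, inter_dim):
    """Each expert owns a full (inter_dim x hidden_dim) projection."""
    return num_experts * inter_dim * hidden_dim

def latent_moe_params(num_experts, hidden_dim, inter_dim, latent_dim):
    """One shared (latent_dim x hidden_dim) projection plus a small
    (inter_dim x latent_dim) matrix per expert."""
    shared = latent_dim * hidden_dim
    per_expert = inter_dim * latent_dim
    return shared + num_experts * per_expert

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def latent_expert_forward(shared_proj, expert_matrix, x):
    """Shared projection into the latent space, then the expert-specific map."""
    return matvec(expert_matrix, matvec(shared_proj, x))

# Illustrative sizes only (not taken from the paper).
dense = standard_moe_params(num_experts=8, hidden_dim=4096, inter_dim=1024)
latent = latent_moe_params(num_experts=8, hidden_dim=4096, inter_dim=1024,
                           latent_dim=512)
ratio = latent / dense

# Tiny forward pass: hidden_dim=3, latent_dim=2, inter_dim=1.
y = latent_expert_forward([[1, 0, 0], [0, 1, 0]], [[1, 1]], [2, 3, 5])
```

With these made-up sizes the factorized layer stores fewer than a fifth of the parameters of the dense one; the saving grows as the hidden dimension increases relative to the latent dimension, which matches the regime the abstract highlights.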
Article 1737
Title@2025-05-25 (7): Multi-Step Consistency Models: Fast Generation with Theoretical Guarantees
Title: Multi-Step Consistency Models: Fast Generation with Theoretical Guarantees | Multi-Step-Konsistenzmodelle: Schnelle Generation mit theoretischen Garantien | 多层次一致性模式:有理论保障的快速一代 2505.01049v2 |
Authors: Nishant Jain, Xunpeng Huang, Yian Ma, Tong Zhang
Consistency models have recently emerged as a compelling alternative to traditional SDE-based diffusion models. They offer a significant acceleration in generation by producing high-quality samples in very few steps. Despite their empirical success, a proper theoretical justification for their speed-up is still lacking. In this work, we address the gap by providing a theoretical analysis of consistency models capable of mapping inputs at a given time to arbitrary points along the reverse trajectory. We show that one can achieve a KL divergence of order $O(\varepsilon^2)$ using only $O\left(\log\left(\frac{d}{\varepsilon}\right)\right)$ iterations with a constant step size. Additionally, under minimal assumptions on the data distribution (the non-smooth case, an increasingly common setting in recent diffusion model analyses), we show that a similar KL convergence guarantee can be obtained, with the number of steps scaling as $O\left(d \log\left(\frac{d}{\varepsilon}\right)\right)$. Going further, we also provide a theoretical analysis for estimation of such consistency models, concluding that accurate learning is feasible using small discretization steps, in both smooth and non-smooth settings. Notably, our results for the non-smooth case yield best-in-class convergence rates compared to existing SDE- or ODE-based analyses under minimal assumptions.
Article 1738
Title@2025-05-25 (7): Genetic Influences on Brain Aging: Analyzing Sex Differences in the UK Biobank using Structural MRI
Title: Genetic Influences on Brain Aging: Analyzing Sex Differences in the UK Biobank using Structural MRI | Genetische Einflüsse auf das Altern des Gehirns: Analyse von Geschlechtsunterschieden in der britischen Biobank mittels struktureller MRT | 对大脑老龄化的遗传基因影响:利用结构MRI分析联合王国生物库中的性别差异 2505.20344v1 |
Authors: Karen Ardila, Aashka Mohite, Abdoljalil Addeh, Amanda V. Tyndall, Cindy K. Barha, Quan Long, M. Ethan MacDonald
Brain aging trajectories differ between males and females, yet the genetic factors underlying these differences remain underexplored. Using structural MRI and genotyping data from 40,940 UK Biobank participants (aged 45-83), we computed Brain Age Gap Estimates (BrainAGE) for total brain, hippocampal, and ventricular volumes. We conducted sex-stratified genome-wide association studies (GWAS) and post-GWAS analyses to identify genetic variants associated with accelerated brain aging. Distinct gene sets emerged by sex: in females, neurotransmitter transport and mitochondrial stress response genes were implicated; in males, immune and inflammation-related genes dominated. Shared genes, including GMNC and OSTN, were consistently linked to brain volumes across sexes, suggesting core roles in neurostructural maintenance. Tissue expression analyses revealed sex-specific enrichment in pathways tied to neurodegeneration. These findings highlight the importance of sex-stratified approaches in aging research and suggest genetic targets for personalized interventions against age-related cognitive decline.
Article 1739
Title@2025-05-25 (7): Protein Design with Dynamic Protein Vocabulary
Title: Protein Design with Dynamic Protein Vocabulary | Protein Design mit dynamischem Protein Vokabular | 配有动态蛋白质词汇词典的蛋白因设计 2505.18966v1 |
Authors: Nuowei Liu, Jiahao Kuang, Yanting Liu, Changzhi Sun, Tao Ji, Yuanbin Wu, Man Lan
Protein design is a fundamental challenge in biotechnology, aiming to design novel sequences with specific functions within the vast space of possible proteins. Recent advances in deep generative models have enabled function-based protein design from textual descriptions, yet struggle with structural plausibility. Inspired by classical protein design methods that leverage natural protein structures, we explore whether incorporating fragments from natural proteins can enhance foldability in generative models. Our empirical results show that even random incorporation of fragments improves foldability. Building on this insight, we introduce ProDVa, a novel protein design approach that integrates a text encoder for functional descriptions, a protein language model for designing proteins, and a fragment encoder to dynamically retrieve protein fragments based on textual functional descriptions. Experimental results demonstrate that our approach effectively designs protein sequences that are both functionally aligned and structurally plausible. Compared to state-of-the-art models, ProDVa achieves comparable function alignment using less than 0.04% of the training data, while designing significantly more well-folded proteins, with the proportion of proteins having pLDDT above 70 increasing by 7.38% and those with PAE below 10 increasing by 9.6%.
Article 1740
Title@2025-05-25 (7): Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models
Title: Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models | Expansion Span: Kombinieren von Fading Memory und Retrieval in Hybrid State Space Models | 扩展空间:在混合国家空间模型中将平缓内存和检索合并 2412.13328v2 |
Authors: Elvis Nunez, Luca Zancato, Benjamin Bowman, Aditya Golatkar, Wei Xia, Stefano Soatto
The “state” of State Space Models (SSMs) represents their memory, which fades exponentially over an unbounded span. By contrast, Attention-based models have “eidetic” (i.e., verbatim, or photographic) memory over a finite span (context size). Hybrid architectures combine State Space layers with Attention, but still cannot recall the distant past and can access only the most recent tokens eidetically. Unlike current methods of combining SSM and Attention layers, we allow the state to be allocated based on relevancy rather than recency. In this way, for every new set of query tokens, our models can “eidetically” access tokens from beyond the Attention span of current Hybrid SSMs without requiring extra hardware resources. We introduce a method to expand the memory span of the hybrid state by “reserving” a fraction of the Attention context for tokens retrieved from arbitrarily far in the past, thus expanding the eidetic memory span of the overall state. We call this reserved fraction of tokens the “expansion span,” and the mechanism to retrieve and aggregate it “Span-Expanded Attention” (SE-Attn). To adapt Hybrid models to using SE-Attn, we propose a novel fine-tuning method that extends LoRA to Hybrid models (HyLoRA) and allows efficient adaptation on long spans of tokens. We show that SE-Attn enables us to efficiently adapt pre-trained Hybrid models on sequences of tokens up to 8 times longer than the ones used for pre-training. We show that HyLoRA with SE-Attn is cheaper and more performant than alternatives like LongLoRA when applied to Hybrid models on natural language benchmarks with long-range dependencies, such as PG-19, RULER, and other common natural language downstream tasks.
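The idea of reserving part of the Attention context for relevant rather than recent tokens can be sketched as follows. This is a minimal sketch under stated assumptions, not SE-Attn itself: the token representation (a dict with `pos` and `emb`), the dot-product relevance score, the default reserved fraction, and the function names are all hypothetical, and the aggregation step mentioned in the abstract is not modeled.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def build_context(past, recent, query, context_size, reserve_frac=0.25):
    """Fill a fixed-size context: a reserved fraction holds the past tokens
    most relevant to the query, the remainder holds the most recent tokens."""
    n_reserved = int(context_size * reserve_frac)
    n_recent = context_size - n_reserved
    # Guard against recent[-0:], which would return the whole list.
    recent_part = recent[-n_recent:] if n_recent > 0 else []
    # Rank distant-past tokens by relevance to the query, not by recency.
    ranked = sorted(past, key=lambda tok: dot(tok["emb"], query), reverse=True)
    # Restore positional order among the retrieved tokens.
    retrieved = sorted(ranked[:n_reserved], key=lambda tok: tok["pos"])
    return retrieved + recent_part

past = [
    {"pos": 0, "emb": [1.0, 0.0]},
    {"pos": 1, "emb": [0.0, 1.0]},
    {"pos": 2, "emb": [0.9, 0.1]},
    {"pos": 3, "emb": [0.0, 0.5]},
]
recent = [
    {"pos": 4, "emb": [0.2, 0.2]},
    {"pos": 5, "emb": [0.1, 0.3]},
    {"pos": 6, "emb": [0.3, 0.1]},
]
context = build_context(past, recent, query=[1.0, 0.0],
                        context_size=4, reserve_frac=0.5)
positions = [tok["pos"] for tok in context]
```

The context size stays fixed, so the attention cost does not grow; only the choice of which tokens occupy the reserved slots changes.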
Article 1741
Title@2025-05-25 (7): How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation
Title: How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation | Wie richten und ergänzen Bilder LiDAR? Auf dem Weg zu einer harmonisierten multimodalen 3D-Panoptischen Segmentierung | 图像如何对齐和补充 LiDAR ? 2505.18956v1 |
Authors: Yining Pan, Qiongjie Cui, Xulei Yang, Na Zhao
LiDAR-based 3D panoptic segmentation often struggles with the inherent sparsity of data from LiDAR sensors, which makes it challenging to accurately recognize distant or small objects. Recently, a few studies have sought to overcome this challenge by integrating LiDAR inputs with camera images, leveraging the rich and dense texture information provided by the latter. While these approaches have shown promising results, they still face challenges, such as misalignment during data augmentation and the reliance on post-processing steps. To address these issues, we propose Image-Assists-LiDAR (IAL), a novel multi-modal 3D panoptic segmentation framework. In IAL, we first introduce a modality-synchronized data augmentation strategy, PieAug, to ensure alignment between LiDAR and image inputs from the start. Next, we adopt a transformer decoder to directly predict panoptic segmentation results. To effectively fuse LiDAR and image features into tokens for the decoder, we design a Geometric-guided Token Fusion (GTF) module. Additionally, we leverage the complementary strengths of each modality as priors for query initialization through a Prior-based Query Generation (PQG) module, enhancing the decoder’s ability to generate accurate instance masks. Our IAL framework achieves state-of-the-art performance compared to previous multi-modal 3D panoptic segmentation methods on two widely used benchmarks. Code and models are publicly available at https://github.com/IMPL-Lab/IAL.git.
Article 1742
Title@2025-05-25 (7): Online Knowledge Distillation with Reward Guidance
Title: Online Knowledge Distillation with Reward Guidance | Online-Wissensdestillation mit lohnender Anleitung | 网上知识蒸馏与奖励指导 2505.18952v1 |
Authors: Chen Jia
This work studies knowledge distillation (KD) for large language models (LLMs) through preference optimization. We propose a reward-guided imitation learning framework for sequential KD, formulating a min-max optimization problem between the policy and reward model (RM) to minimize the performance gap between the student and teacher policies. Specifically, the reward optimization is constrained to achieve near-optimality within a confidence set for preference alignment. For preference data construction, we explore both offline and online preference-based KD. Additionally, we reformulate the RM using the $Q$-value function and extend the framework to white-box KD, where the teacher policy’s predicted probabilities are accessible. Theoretical analysis and empirical results demonstrate the effectiveness of the proposed framework.
Article 1743
Title@2025-05-25 (7): The Price of Format: Diversity Collapse in LLMs
Title: The Price of Format: Diversity Collapse in LLMs | Der Preis des Formats: Diversity Collapse in LLMs | 格式价格:多样化在LLMM中崩溃 2505.18949v1 |
Authors: Longfei Yun, Chenyang An, Zilong Wang, Letian Peng, Jingbo Shang
Instruction-tuned large language models (LLMs) employ structured templates, such as role markers and special tokens, to enforce format consistency during inference. However, we identify a critical limitation of such formatting: it induces a phenomenon we term diversity collapse, where the model generates semantically similar outputs for open-ended inputs, undermining creativity and variability. We systematically evaluate this effect across tasks like story completion and free-form generation, finding that (1) diversity collapse persists even under high-temperature sampling, and (2) structural tokens in templates significantly constrain the model’s output space. To contextualize these findings, we fine-tune the same model using a range of structured prompts and then evaluate them across three axes: downstream task performance, alignment behavior, and output diversity. Our analysis shows that format consistency between fine-tuning and inference is crucial for structure-sensitive tasks (e.g., GSM8K, IFEval), but has marginal influence on knowledge-heavy tasks (e.g., MMLU, WebQuestions). In contrast, output diversity is primarily governed by the presence or absence of structural tokens, with minimal formatting yielding the most diverse outputs. These findings reveal that current prompting conventions, while beneficial for alignment, may inadvertently suppress output diversity, underscoring the need for diversity-aware prompt design and instruction tuning.
nan
Article 1744
Title@2025-05-25 (7): Exact Expressive Power of Transformers with Padding
Title: Exact Expressive Power of Transformers with Padding | Exakte Expressive Kraft von Transformatoren mit Padding | 带有斜面的变形器的精确表达力 2505.18948v1 |
Authors: William Merrill, Ashish Sabharwal
Chain of thought is a natural inference-time method for increasing the computational power of transformer-based large language models (LLMs), but comes at the cost of sequential decoding. Are there more efficient alternatives to expand a transformer’s expressive power without adding parameters? We consider transformers with padding tokens as a form of parallelizable test-time compute. We show that averaging-hard-attention, masked-pre-norm transformers with polynomial padding converge to precisely the class $\mathsf{TC}^0$ of extremely parallelizable problems. While the $\mathsf{TC}^0$ upper bound was known, proving a matching lower bound had been elusive. Further, our novel analysis reveals the precise expanded power of padded transformers when coupled with another form of inference-time compute, namely dynamically increasing depth via looping. Our core technical contribution is to show how padding helps bring the notions of complete problems and reductions, which have been a cornerstone of classical complexity theory, to the formal study of transformers. Armed with this new tool, we prove that padded transformers with $O(\log^d n)$ looping on inputs of length $n$ recognize exactly the class $\mathsf{TC}^d$ of moderately parallelizable problems. Thus, padding and looping together systematically expand transformers’ expressive power: with polylogarithmic looping, padded transformers converge to the class $\mathsf{NC}$, the best that could be expected without losing parallelism (unless $\mathsf{NC} = \mathsf{P}$). Our results thus motivate further exploration of padding and looping as parallelizable alternatives to chain of thought.
nan
Article 1745
Title@2025-05-25 (7): Minimax Optimal Reinforcement Learning with Quasi-Optimism
Title: Minimax Optimal Reinforcement Learning with Quasi-Optimism | Minimax Optimales Stärkungslernen mit Quasi-Optimismus | 以准适应主义进行最优化强化学习 2503.00810v2 |
Authors: Harin Lee, Min-hwan Oh
In our quest for a reinforcement learning (RL) algorithm that is both practical and provably optimal, we introduce EQO (Exploration via Quasi-Optimism). Unlike existing minimax optimal approaches, EQO avoids reliance on empirical variances and employs a simple bonus term proportional to the inverse of the state-action visit count. Central to EQO is the concept of quasi-optimism, where estimated values need not be fully optimistic, allowing for a simpler yet effective exploration strategy. The algorithm achieves the sharpest known regret bound for tabular RL under the mildest assumptions, proving that fast convergence can be attained with a practical and computationally efficient approach. Empirical evaluations demonstrate that EQO consistently outperforms existing algorithms in both regret performance and computational efficiency, providing the best of both theoretical soundness and practical effectiveness.
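The count-based bonus mentioned in the abstract can be illustrated with a minimal sketch. It assumes a tabular setting; the function name, the constant `c`, and the treatment of unvisited pairs are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def inverse_count_bonus(visit_counts, c=1.0):
    """Exploration bonus proportional to 1 / N(s, a).

    Assumption: unvisited pairs are treated as if N = 1 to avoid division by zero.
    """
    counts = np.maximum(np.asarray(visit_counts, dtype=float), 1.0)
    return c / counts

# Rows index states, columns index actions (toy numbers).
counts = np.array([[0, 1], [4, 10]])
bonus = inverse_count_bonus(counts, c=2.0)
```

The bonus shrinks as a state-action pair is visited more often, and it needs only the visit counts rather than an empirical variance estimate.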
nan
Article 1746
Title@2025-05-25 (7): Efficient Pauli channel estimation with logarithmic quantum memory
Title: Efficient Pauli channel estimation with logarithmic quantum memory | Effiziente Pauli-Kanalschätzung mit logarithmischem Quantenspeicher | 具有对数量内存的高效保利频道估计 2309.14326v4 |
Authors: Sitan Chen, Weiyuan Gong
Here we revisit one of the prototypical tasks for characterizing the structure of noise in quantum devices: estimating every eigenvalue of an $n$-qubit Pauli noise channel to error $\epsilon$. Prior work [14] proved no-go theorems for this task in the practical regime where one has a limited amount of quantum memory, e.g. any protocol with $\le 0.99n$ ancilla qubits of quantum memory must make exponentially many measurements, provided it is non-concatenating. Such protocols can only interact with the channel by repeatedly preparing a state, passing it through the channel, and measuring immediately afterward. This left open a natural question: does the lower bound hold even for general protocols, i.e. ones which chain together many queries to the channel, interleaved with arbitrary data-processing channels, before measuring? Surprisingly, in this work we show the opposite: there is a protocol that can estimate the eigenvalues of a Pauli channel to error $\epsilon$ using only $O(\log n/\epsilon^2)$ ancilla and $\tilde{O}(n^2/\epsilon^2)$ measurements. In contrast, we show that any protocol with zero ancilla, even a concatenating one, must make $\Omega(2^n/\epsilon^2)$ measurements, which is tight. Our results imply, to our knowledge, the first quantum learning task where logarithmically many qubits of quantum memory suffice for an exponential statistical advantage. Our protocol can be naturally extended to a protocol that learns the eigenvalues of Pauli terms within any subset $A$ of a Pauli channel with $O(\log\log(|A|)/\epsilon^2)$ ancilla and $\tilde{O}(n^2/\epsilon^2)$ measurements.
nan
Article 1747
Title@2025-05-25 (7): Structural Alignment Improves Graph Test-Time Adaptation
Title: Structural Alignment Improves Graph Test-Time Adaptation | Struktural Alignment verbessert Graph Test-Time Anpassung | 结构调整改进图示测试时间适应 2502.18334v2 |
Authors: Hans Hao-Hsun Hsu, Shikun Liu, Han Zhao, Pan Li
Graph-based learning excels at capturing interaction patterns in diverse domains like recommendation, fraud detection, and particle physics. However, its performance often degrades under distribution shifts, especially those altering network connectivity. Current methods to address these shifts typically require retraining with the source dataset, which is often infeasible due to computational or privacy limitations. We introduce Test-Time Structural Alignment (TSA), a novel algorithm for Graph Test-Time Adaptation (GTTA) that aligns graph structures during inference without accessing the source data. Grounded in a theoretical understanding of graph data distribution shifts, TSA employs three synergistic strategies: uncertainty-aware neighborhood weighting to accommodate neighbor label distribution shifts, adaptive balancing of self-node and aggregated neighborhood representations based on their signal-to-noise ratio, and decision boundary refinement to correct residual label and feature shifts. Extensive experiments on synthetic and real-world datasets demonstrate TSA’s consistent outperformance of both non-graph TTA methods and state-of-the-art GTTA baselines.
nan
Article 1748
Title@2025-05-25 (7): Chi-Square Wavelet Graph Neural Networks for Heterogeneous Graph Anomaly Detection
Title: Chi-Square Wavelet Graph Neural Networks for Heterogeneous Graph Anomaly Detection | Chi-Square Wavelet Graph Neural Networks für Heterogene Graph Anomalie Detection | 用于异源图异常异常图探测的千平方波浪图神经网络 2505.18934v1 |
Authors: Xiping Li, Xiangyu Dong, Xingyi Zhang, Kun Xie, Yuanhao Feng, Bo Wang, Guilin Li, Wuxiong Zeng, Xiujun Shu, Sibo Wang
Graph Anomaly Detection (GAD) in heterogeneous networks presents unique challenges due to node and edge heterogeneity. Existing Graph Neural Network (GNN) methods primarily focus on homogeneous GAD and thus fail to address three key issues: (C1) Capturing abnormal signal and rich semantics across diverse meta-paths; (C2) Retaining high-frequency content in HIN dimension alignment; and (C3) Learning effectively from difficult anomaly samples with class imbalance. To overcome these, we propose ChiGAD, a spectral GNN framework based on a novel Chi-Square filter, inspired by the wavelet effectiveness in diverse domains. Specifically, ChiGAD consists of: (1) Multi-Graph Chi-Square Filter, which captures anomalous information via applying dedicated Chi-Square filters to each meta-path graph; (2) Interactive Meta-Graph Convolution, which aligns features while preserving high-frequency information and incorporates heterogeneous messages by a unified Chi-Square Filter; and (3) Contribution-Informed Cross-Entropy Loss, which prioritizes difficult anomalies to address class imbalance. Extensive experiments on public and industrial datasets show that ChiGAD outperforms state-of-the-art models on multiple metrics. Additionally, its homogeneous variant, ChiGNN, excels on seven GAD datasets, validating the effectiveness of Chi-Square filters. Our code is available at https://github.com/HsipingLi/ChiGAD.
nan
Article 1749
Title@2025-05-25 (7): Can Large Language Models Infer Causal Relationships from Real-World Text?
Title: Can Large Language Models Infer Causal Relationships from Real-World Text? | Können große Sprachmodelle Kausalbeziehungen aus Real-World Text ableiten? | 大语言模型能否从真实世界文本中推断出因果关系? 2505.18931v1 |
Authors: Ryan Saklad, Aman Chadha, Oleg Pavlov, Raha Moraffah
Understanding and inferring causal relationships from texts is a core aspect of human cognition and is essential for advancing large language models (LLMs) towards artificial general intelligence. Existing work primarily focuses on synthetically generated texts which involve simple causal relationships explicitly mentioned in the text. This fails to reflect the complexities of real-world tasks. In this paper, we investigate whether LLMs are capable of inferring causal relationships from real-world texts. We develop a benchmark drawn from real-world academic literature which includes diverse texts with respect to length, complexity of relationships (different levels of explicitness, number of events, and causal relationships), and domains and sub-domains. To the best of our knowledge, our benchmark is the first-ever real-world dataset for this task. Our experiments on state-of-the-art LLMs evaluated on our proposed benchmark demonstrate significant challenges, with the best-performing model achieving an average F1 score of only 0.477. Analysis reveals common pitfalls: difficulty with implicitly stated information, with distinguishing relevant causal factors from surrounding contextual details, and with connecting causally relevant information spread across lengthy textual passages. By systematically characterizing these deficiencies, our benchmark offers targeted insights for further research into advancing LLM causal reasoning.
nan
Article 1750
Title@2025-05-25 (7): Hybrid Neural-MPM for Interactive Fluid Simulations in Real-Time
Title: Hybrid Neural-MPM for Interactive Fluid Simulations in Real-Time | Hybrid-Neural-MPM für interaktive Fluidsimulationen in Echtzeit | 用于实时交互流力模拟的神经-MPM混合神经-MPM 2505.18926v1 |
Authors: Jingxuan Xu, Hong Huang, Chuhang Zou, Manolis Savva, Yunchao Wei, Wuyang Chen
We propose a neural physics system for real-time, interactive fluid simulations. Traditional physics-based methods, while accurate, are computationally intensive and suffer from latency issues. Recent machine-learning methods reduce computational costs while preserving fidelity; yet most still fail to satisfy the latency constraints for real-time use and lack support for interactive applications. To bridge this gap, we introduce a novel hybrid method that integrates numerical simulation, neural physics, and generative control. Our neural physics jointly pursues low-latency simulation and high physical fidelity by employing a fallback safeguard to classical numerical solvers. Furthermore, we develop a diffusion-based controller that is trained using a reverse modeling strategy to generate external dynamic force fields for fluid manipulation. Our system demonstrates robust performance across diverse 2D/3D scenarios, material types, and obstacle interactions, achieving real-time simulations at high frame rates (11~29% latency) while enabling fluid control guided by user-friendly freehand sketches. We present a significant step towards practical, controllable, and physically plausible fluid simulations for real-time interactive applications. We promise to release both models and data upon acceptance.
nan
Article 1751
Title@2025-05-25 (7): Graph-Based Operator Learning from Limited Data on Irregular Domains
Title: Graph-Based Operator Learning from Limited Data on Irregular Domains | Graph-based Operator Lernen von begrenzten Daten über irreguläre Domains | 以图图为基础的操作员 学习关于非常规域域的有限数据 2505.18923v1 |
Authors: Yile Li, Shandian Zhe
Operator learning seeks to approximate mappings from input functions to output solutions, particularly in the context of partial differential equations (PDEs). While recent advances such as DeepONet and Fourier Neural Operator (FNO) have demonstrated strong performance, they often rely on regular grid discretizations, limiting their applicability to complex or irregular domains. In this work, we propose a Graph-based Operator Learning with Attention (GOLA) framework that addresses this limitation by constructing graphs from irregularly sampled spatial points and leveraging attention-enhanced Graph Neural Networks (GNNs) to model spatial dependencies with global information. To improve the expressive capacity, we introduce a Fourier-based encoder that projects input functions into a frequency space using learnable complex coefficients, allowing for flexible embeddings even with sparse or nonuniform samples. We evaluated our approach across a range of 2D PDEs, including Darcy Flow, Advection, Eikonal, and Nonlinear Diffusion, under varying sampling densities. Our method consistently outperforms baselines, particularly in data-scarce regimes, demonstrating strong generalization and efficiency on irregular domains.
nan
Article 1752
Title@2025-05-25 (7): ALPCAHUS: Subspace Clustering for Heteroscedastic Data
Title: ALPCAHUS: Subspace Clustering for Heteroscedastic Data | ALPCAHUS: Subraum-Clustering für heteroskedastische Daten | ALPCAHUS: 用于异方差数据的子空间聚类 2505.18918v1 |
Authors: Javier Salazar Cavazos, Jeffrey A Fessler, Laura Balzano
Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction. Various methods have been proposed to extend PCA to the union of subspace (UoS) setting for clustering data that come from multiple subspaces like K-Subspaces (KSS). However, some applications involve heterogeneous data that vary in quality due to noise characteristics associated with each data sample. Heteroscedastic methods aim to deal with such mixed data quality. This paper develops a heteroscedastic-focused subspace clustering method, named ALPCAHUS, that can estimate the sample-wise noise variances and use this information to improve the estimate of the subspace bases associated with the low-rank structure of the data. This clustering algorithm builds on K-Subspaces (KSS) principles by extending the recently proposed heteroscedastic PCA method, named LR-ALPCAH, for clusters with heteroscedastic noise in the UoS setting. Simulations and real-data experiments show the effectiveness of accounting for data heteroscedasticity compared to existing clustering algorithms. Code available at https://github.com/javiersc1/ALPCAHUS.
nan
Article 1753
Title@2025-05-25 (7): Behavior Injection: Preparing Language Models for Reinforcement Learning
Title: Behavior Injection: Preparing Language Models for Reinforcement Learning | Verhaltensinjektion: Vorbereitung von Sprachmodellen für verstärktes Lernen | 行为注射:为强化学习准备语言模式 2505.18917v1 |
Authors: Zhepeng Cen, Yihang Yao, William Han, Zuxin Liu, Ding Zhao
Reinforcement fine-tuning (RFT) has emerged as a powerful post-training technique to incentivize the reasoning ability of large language models (LLMs). However, LLMs can respond very inconsistently to RFT: some show substantial performance gains, while others plateau or even degrade. To understand this divergence, we analyze the per-step influence of the RL objective and identify two key conditions for effective post-training: (1) RL-informative rollout accuracy, and (2) strong data co-influence, which quantifies how much the training data affects performance on other samples. Guided by these insights, we propose behavior injection, a task-agnostic data-augmentation scheme applied prior to RL. Behavior injection enriches the supervised finetuning (SFT) data by seeding exploratory and exploitative behaviors, effectively making the model more RL-ready. We evaluate our method across two reasoning benchmarks with multiple base models. The results demonstrate that our theoretically motivated augmentation can significantly increase the performance gain from RFT over the pre-RL model.
nan
Article 1754
Title@2025-05-25 (7): PySAD: A Streaming Anomaly Detection Framework in Python
Title: PySAD: A Streaming Anomaly Detection Framework in Python | PySAD: Ein Streaming-Anomaly Detection-Framework in Python | PySAD: Python 流动异常检测框架 2009.02572v2 |
Authors: Selim F. Yilmaz, Suleyman S. Kozat
Streaming anomaly detection requires algorithms that operate under strict constraints: bounded memory, single-pass processing, and constant-time complexity. We present PySAD, a comprehensive Python framework addressing these challenges through a unified architecture. The framework implements 17+ streaming algorithms (LODA, Half-Space Trees, xStream) with specialized components including projectors, probability calibrators, and postprocessors. Unlike existing batch-focused frameworks, PySAD enables efficient real-time processing with bounded memory while maintaining compatibility with PyOD and scikit-learn. Supporting all learning paradigms for univariate and multivariate streams, PySAD provides the most comprehensive streaming anomaly detection toolkit in Python. The source code is publicly available at github.com/selimfirat/pysad.
nan
Article 1755
Title@2025-05-25 (7): Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach
Title: Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach | Multimodale LLMs unter Verteilungsverschiebungen verstehen: Ein informationstheoretischer Ansatz | 在分销变更下理解多式LLMs:信息理论方法 2502.00577v2 |
Authors: Changdae Oh, Zhen Fang, Shawn Im, Xuefeng Du, Yixuan Li
Multimodal large language models (MLLMs) have shown promising capabilities but struggle under distribution shifts, where evaluation data differ from instruction tuning distributions. Although previous works have provided empirical evaluations, we argue that establishing a formal framework that can characterize and quantify the risk of MLLMs is necessary to ensure the safe and reliable application of MLLMs in the real world. By taking an information-theoretic perspective, we propose the first theoretical framework that enables the quantification of the maximum risk of MLLMs under distribution shifts. Central to our framework is the introduction of Effective Mutual Information (EMI), a principled metric that quantifies the relevance between input queries and model responses. We derive an upper bound for the EMI difference between in-distribution (ID) and out-of-distribution (OOD) data, connecting it to visual and textual distributional discrepancies. Extensive experiments on real benchmark datasets, spanning 61 shift scenarios, empirically validate our theoretical insights.
nan
Article 1756
Title@2025-05-25 (7): On the Role of Label Noise in the Feature Learning Process
Title: On the Role of Label Noise in the Feature Learning Process | Über die Rolle von Etikettengeräuschen im Feature-Learning-Prozess | 关于标签噪音在专题学习过程中的作用 2505.18909v1 |
Authors: Andi Han, Wei Huang, Zhanpeng Zhou, Gang Niu, Wuyang Chen, Junchi Yan, Akiko Takeda, Taiji Suzuki
Deep learning with noisy labels presents significant challenges. In this work, we theoretically characterize the role of label noise from a feature learning perspective. Specifically, we consider a signal-noise data distribution, where each sample comprises a label-dependent signal and label-independent noise, and rigorously analyze the training dynamics of a two-layer convolutional neural network under this data setup, along with the presence of label noise. Our analysis identifies two key stages. In Stage I, the model perfectly fits all the clean samples (i.e., samples without label noise) while ignoring the noisy ones (i.e., samples with noisy labels). During this stage, the model learns the signal from the clean samples, which generalizes well on unseen data. In Stage II, as the training loss converges, the gradient in the direction of noise surpasses that of the signal, leading to overfitting on noisy samples. Eventually, the model memorizes the noise present in the noisy samples and degrades its generalization ability. Furthermore, our analysis provides a theoretical basis for two widely used techniques for tackling label noise: early stopping and sample selection. Experiments on both synthetic and real-world setups validate our theory.
nan
Article 1757
Title@2025-05-25 (7): Stronger Enforcement of Instruction Hierarchy via Augmented Intermediate Representations
Title: Stronger Enforcement of Instruction Hierarchy via Augmented Intermediate Representations | Stärkere Durchsetzung der Instruktionshierarchie durch Augmented Intermediate Representations | 通过扩大中级代表,加强执行指示分级制度 2505.18907v1 |
Authors: Sanjay Kariyappa, G. Edward Suh
Prompt injection attacks are a critical security vulnerability in large language models (LLMs), allowing attackers to hijack model behavior by injecting malicious instructions within the input context. Recent defense mechanisms have leveraged an Instruction Hierarchy (IH) Signal, often implemented through special delimiter tokens or additive embeddings to denote the privilege level of input tokens. However, these prior works typically inject the IH signal exclusively at the initial input layer, which we hypothesize limits its ability to effectively distinguish the privilege levels of tokens as it propagates through the different layers of the model. To overcome this limitation, we introduce a novel approach that injects the IH signal into the intermediate token representations within the network. Our method augments these representations with layer-specific trainable embeddings that encode the privilege information. Our evaluations across multiple models and training methods reveal that our proposal yields between $1.6\times$ and $9.2\times$ reduction in attack success rate on gradient-based prompt injection attacks compared to state-of-the-art methods, without significantly degrading the model’s utility.
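The idea of adding layer-specific privilege embeddings to intermediate token representations can be sketched as follows. This is only an illustration in NumPy under assumed shapes; the names `privilege_embeddings` and `inject_privilege`, the two privilege levels, and the random initialization are hypothetical, and in a real model the embeddings would be trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, num_levels, d_model = 3, 2, 4

# One embedding table per layer: shape (layers, privilege levels, hidden size).
privilege_embeddings = rng.normal(scale=0.02, size=(num_layers, num_levels, d_model))

def inject_privilege(hidden, privilege_ids, layer):
    """Add the layer's privilege embedding to each token's hidden state.

    hidden: (seq_len, d_model) activations entering the given layer.
    privilege_ids: (seq_len,) integer privilege level of each token.
    """
    return hidden + privilege_embeddings[layer][privilege_ids]

hidden = np.zeros((5, d_model))
privilege_ids = np.array([1, 1, 0, 0, 0])  # e.g. 1 = system tokens, 0 = user/data tokens
out = inject_privilege(hidden, privilege_ids, layer=2)
```

Because the offset is re-applied at every layer with its own table, the privilege signal does not have to survive propagation from the input layer alone.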
nan
Article 1758
Title@2025-05-24 (6): Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services
Title: Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services | Pre-trained Encoder-Schlussfolgerung: Enthüllen Upstream-Encoder in Downstream Machine Learning Services | 培训前编码器推断:在下游机器学习服务中向上游编码器 2408.02814v2 |
Authors: Shaopeng Fu, Xuexue Sun, Ke Qing, Tianhang Zheng, Di Wang
Pre-trained encoders available online have been widely adopted to build downstream machine learning (ML) services, but various attacks against these encoders also pose security and privacy threats toward such a downstream ML service paradigm. We unveil a new vulnerability: the Pre-trained Encoder Inference (PEI) attack, which can extract sensitive encoder information from a targeted downstream ML service that can then be used to promote other ML attacks against the targeted service. Given only API access to a targeted downstream service and a set of candidate encoders, the PEI attack can successfully infer which of the candidate encoders is secretly used by the targeted service. Compared with existing encoder attacks, which mainly target encoders on the upstream side, the PEI attack can compromise encoders even after they have been deployed and hidden in downstream ML services, which makes it a more realistic threat. We empirically verify the effectiveness of the PEI attack on vision encoders. We first conduct PEI attacks against two downstream services (i.e., image classification and multimodal generation), and then show how PEI attacks can facilitate other ML attacks (i.e., model stealing attacks vs. image classification models and adversarial attacks vs. multimodal generative models). Our results call for new security and privacy considerations when deploying encoders in downstream services. The code is available at https://github.com/fshp971/encoder-inference.
nan
Article 1759
Title@2025-05-24 (6): PromptWise: Online Learning for Cost-Aware Prompt Assignment in Generative Models
Title: PromptWise: Online Learning for Cost-Aware Prompt Assignment in Generative Models | PromptWise: Online-Lernen für kostenbewusste Prompt-Zuweisung in generativen Modellen | 快速Wise:在创用模型中进行成本-软件快速指派在线学习 2505.18901v1 |
Authors: Xiaoyan Hu, Lauren Pick, Ho-fung Leung, Farzan Farnia
The rapid advancement of generative AI models has provided users with numerous options to address their prompts. When selecting a generative AI model for a given prompt, users should consider not only the performance of the chosen model but also its associated service cost. The principle guiding such consideration is to select the least expensive model among the available satisfactory options. However, existing model-selection approaches typically prioritize performance, overlooking pricing differences between models. In this paper, we introduce PromptWise, an online learning framework designed to assign a sequence of prompts to a group of large language models (LLMs) in a cost-effective manner. PromptWise strategically queries cheaper models first, progressing to more expensive options only if the lower-cost models fail to adequately address a given prompt. Through numerical experiments, we demonstrate PromptWise’s effectiveness across various tasks, including puzzles of varying complexity and code generation/translation tasks. The results highlight that PromptWise consistently outperforms cost-unaware baseline methods, emphasizing that directly assigning prompts to the most expensive models can lead to higher costs and potentially lower average performance.
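The cheapest-first escalation rule described in the abstract can be sketched in a few lines. This covers only the escalation step, not the online learning component of PromptWise; the model tuples, the costs, and the `is_satisfactory` check are hypothetical stand-ins.

```python
from typing import Callable, List, Optional, Tuple

# (name, cost per query, generate function); all illustrative.
Model = Tuple[str, float, Callable[[str], str]]

def cheapest_first(
    prompt: str,
    models: List[Model],
    is_satisfactory: Callable[[str, str], bool],
) -> Tuple[Optional[str], Optional[str], float]:
    """Query models in order of increasing cost until one answer is accepted."""
    total_cost = 0.0
    for name, cost, generate in sorted(models, key=lambda m: m[1]):
        answer = generate(prompt)
        total_cost += cost
        if is_satisfactory(prompt, answer):
            return name, answer, total_cost
    return None, None, total_cost  # no model produced a satisfactory answer

models = [
    ("large", 10.0, lambda p: "4"),
    ("small", 1.0, lambda p: "?"),
]
result = cheapest_first("2 + 2 = ?", models, lambda p, a: a == "4")
```

Here the small model is tried first and fails, so the total cost includes both queries; when the cheap model succeeds, the expensive one is never called.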
nan
Article 1760
Title@2025-05-24 (6): Beyond Domain Randomization: Event-Inspired Perception for Visually Robust Adversarial Imitation from Videos
Title: Beyond Domain Randomization: Event-Inspired Perception for Visually Robust Adversarial Imitation from Videos | Beyond Domain Randomization: Event-inspirierte Wahrnehmung für visuell robuste Adversarial Imitation aus Videos | 超出域随机化: 视频中视觉强力反逆模仿受事件启发的感知 2505.18899v1 |
Authors: Andrea Ramazzina, Vittorio Giammarino, Matteo El-Hariry, Mario Bijelic
Imitation from videos often fails when expert demonstrations and learner environments exhibit domain shifts, such as discrepancies in lighting, color, or texture. While visual randomization partially addresses this problem by augmenting training data, it remains computationally intensive and inherently reactive, struggling with unseen scenarios. We propose a different approach: instead of randomizing appearances, we eliminate their influence entirely by rethinking the sensory representation itself. Inspired by biological vision systems that prioritize temporal transients (e.g., retinal ganglion cells) and by recent sensor advancements, we introduce event-inspired perception for visually robust imitation. Our method converts standard RGB videos into a sparse, event-based representation that encodes temporal intensity gradients, discarding static appearance features. This biologically grounded approach disentangles motion dynamics from visual style, enabling robust visual imitation from observations even in the presence of visual mismatches between expert and agent environments. By training policies on event streams, we achieve invariance to appearance-based distractors without requiring computationally expensive and environment-specific data augmentation techniques. Experiments across the DeepMind Control Suite and the Adroit platform for dynamic dexterous manipulation show the efficacy of our method. Our code is publicly available at Eb-LAIfO.
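One common way to obtain an event-like representation from RGB frames is to threshold the change in log intensity between consecutive frames. The sketch below follows that recipe as an assumption; the threshold, the grayscale conversion by channel mean, and the function name are illustrative and may differ from the authors' pipeline.

```python
import numpy as np

def frames_to_events(video, threshold=0.1, eps=1e-3):
    """Convert a video of shape (T, H, W, 3) with values in [0, 1]
    into events of shape (T-1, H, W) with values in {-1, 0, +1}."""
    gray = np.asarray(video, dtype=float).mean(axis=-1)   # (T, H, W)
    log_intensity = np.log(gray + eps)
    diff = np.diff(log_intensity, axis=0)                 # temporal gradient
    events = np.zeros_like(diff, dtype=np.int8)
    events[diff > threshold] = 1                          # brightness increased
    events[diff < -threshold] = -1                        # brightness decreased
    return events

# Toy clip: one pixel brightens between frame 0 and frame 1, then stays constant.
video = np.full((3, 2, 2, 3), 0.5)
video[1:, 0, 0, :] = 0.9
events = frames_to_events(video)
```

Pixels whose intensity does not change produce no events, so static appearance such as background color or texture drops out of the representation.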
nan
Article 1761
Title@2025-05-24 (6): Marginal Fairness: Fair Decision-Making under Risk Measures
Title: Marginal Fairness: Fair Decision-Making under Risk Measures | Marginal Fairness: Faire Entscheidungsfindung im Rahmen von Risikomaßnahmen | 边际公平:风险措施下的公平决策 2505.18895v1 |
Authors: Fei Huang, Silvana M. Pesenti
This paper introduces marginal fairness, a new individual fairness notion for equitable decision-making in the presence of protected attributes such as gender, race, and religion. This criterion ensures that decisions based on generalized distortion risk measures are insensitive to distributional perturbations in protected attributes, regardless of whether these attributes are continuous, discrete, categorical, univariate, or multivariate. To operationalize this notion and reflect real-world regulatory environments (such as the EU gender-neutral pricing regulation), we model business decision-making in highly regulated industries (such as insurance and finance) as a two-step process: (i) a predictive modeling stage, in which a prediction function for the target variable (e.g., insurance losses) is estimated based on both protected and non-protected covariates; and (ii) a decision-making stage, in which a generalized distortion risk measure is applied to the target variable, conditional only on non-protected covariates, to determine the decision. In this second step, we modify the risk measure such that the decision becomes insensitive to the protected attribute, thus enforcing fairness to ensure equitable outcomes under risk-sensitive, regulatory constraints. Furthermore, by utilizing the concept of cascade sensitivity, we extend the marginal fairness framework to capture how dependencies between covariates propagate the influence of protected attributes through the modeling pipeline. A numerical study and an empirical implementation using an auto insurance dataset demonstrate how the framework can be applied in practice.
nan
Article 1762
Title@2025-05-24 (6): Conformal Prediction for Uncertainty Estimation in Drug-Target Interaction Prediction
Title: Conformal Prediction for Uncertainty Estimation in Drug-Target Interaction Prediction | Konforme Vorhersage für Unsicherheitsschätzungen in der Drogen-Ziel-Interaktionsvorhersage | 药物-目标相互作用预测中不确定性估计的 非正式预测 2505.18890v1 |
Authors: Morteza Rakhshaninejad, Mira Jurgens, Nicolas Dewolf, Willem Waegeman
Accurate drug-target interaction (DTI) prediction with machine learning models is essential for drug discovery. Such models should also provide a credible representation of their uncertainty, but applying classical marginal conformal prediction (CP) in DTI prediction often overlooks variability across drug and protein subgroups. In this work, we analyze three cluster-conditioned CP methods for DTI prediction, and compare them with marginal and group-conditioned CP. Clusterings are obtained via nonconformity scores, feature similarity, and nearest neighbors, respectively. Experiments on the KIBA dataset using four data-splitting strategies show that nonconformity-based clustering yields the tightest intervals and most reliable subgroup coverage, especially in random and fully unseen drug-protein splits. Group-conditioned CP works well when one entity is familiar, but residual-driven clustering provides robust uncertainty estimates even in sparse or novel scenarios. These results highlight the potential of cluster-based CP for improving DTI prediction under uncertainty.
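The difference between marginal and cluster-conditioned conformal prediction can be shown with a minimal split-conformal sketch for regression. It assumes absolute residuals as nonconformity scores and takes the cluster labels as given; how the clusters are formed (the paper's actual subject) is not reproduced here, and all names are hypothetical.

```python
import numpy as np

def conformal_radius(cal_true, cal_pred, alpha=0.1):
    """Split-conformal interval half-width from calibration residuals."""
    scores = np.sort(np.abs(np.asarray(cal_true, dtype=float) - np.asarray(cal_pred, dtype=float)))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the conformal quantile
    if k > n:
        return np.inf  # too few calibration points for this coverage level
    return scores[k - 1]

def clustered_radii(cal_true, cal_pred, clusters, alpha=0.1):
    """One radius per cluster, calibrated only on that cluster's points."""
    cal_true, cal_pred, clusters = map(np.asarray, (cal_true, cal_pred, clusters))
    return {
        int(c): conformal_radius(cal_true[clusters == c], cal_pred[clusters == c], alpha)
        for c in np.unique(clusters)
    }

cal_true = np.arange(1, 11)            # toy targets; predictions are all zero
cal_pred = np.zeros(10)
clusters = np.array([0] * 5 + [1] * 5)
marginal = conformal_radius(cal_true, cal_pred, alpha=0.2)
per_cluster = clustered_radii(cal_true, cal_pred, clusters, alpha=0.2)
```

A prediction interval for a new point is then its prediction plus or minus the radius of its cluster. In this toy data the low-error cluster gets a tighter interval than the single marginal radius would give it.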
Article 1763
Title@2025-05-24 (6): Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators
Title: Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators | Ermöglichung unstrukturierter Spars-Beschleunigung bei strukturierten Spars-Beschleunigern | 启用结构散开加速器, 启用无结构的分散加速器 2403.07953v3 |
Authors: Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna
Exploiting sparsity in deep neural networks (DNNs) has been a promising area for meeting the growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparsity support, but it provides limited flexibility and requires extra model fine-tuning. Moreover, any sparse model fine-tuned for certain structured sparse HW cannot be accelerated by other structured hardware. To enable acceleration using unstructured sparsity of DNNs on structured sparse hardware, we propose an approximation method leveraging the distributive property in linear algebra to turn any sparse tensor into a series of structured sparse tensors. We also develop a software framework, TASDER, to apply high-quality structured approximation on weights and activations of DNNs. Our method accelerates dense and sparse DNNs without fine-tuning and improves energy-delay-product (EDP) by up to 83% and 74%. It achieves up to 39% speed-up on a real system.
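The distributive-property idea can be illustrated on a single weight row. The sketch below greedily peels off N:M structured-sparse pieces by magnitude; this selection rule, the helper names, and the toy vectors are assumptions for illustration and are not taken from TASDER itself.

```python
def nm_project(row, n, m):
    """Keep the n largest-magnitude entries in each block of m; zero the rest."""
    out = [0.0] * len(row)
    for start in range(0, len(row), m):
        block = range(start, min(start + m, len(row)))
        keep = sorted(block, key=lambda i: abs(row[i]), reverse=True)[:n]
        for i in keep:
            out[i] = row[i]
    return out

def structured_series(row, n, m, terms):
    """Split `row` into `terms` n:m-sparse pieces plus a leftover residual."""
    pieces, residual = [], list(row)
    for _ in range(terms):
        piece = nm_project(residual, n, m)
        pieces.append(piece)
        residual = [r - p for r, p in zip(residual, piece)]
    return pieces, residual

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical unstructured-sparse weights and a dense activation vector.
w = [0.0, 3.0, 0.0, -1.0, 2.0, 0.0, 0.0, 0.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

pieces, residual = structured_series(w, n=1, m=4, terms=2)
# Distributive property: (w1 + w2) . x == w1 . x + w2 . x
approx = sum(dot(p, x) for p in pieces)
print(approx, dot(w, x), residual)
```

Here two 1:4 pieces reproduce the row exactly, so the sum of structured products equals the original dot product. With fewer terms than needed, the residual is nonzero and the result is an approximation.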
Article 1764
Title@2025-05-24 (6): Neural Encoding and Decoding at Scale
Title: Neural Encoding and Decoding at Scale | Neurale Enkodierung und Dekodierung auf Scale | 缩放时神经编码和解码 2504.08201v4 |
Authors: Yizi Zhang, Yanchen Wang, Mehdi Azabou, Alexandre Andre, Zixuan Wang, Hanrui Lyu, The International Brain Laboratory, Eva Dyer, Liam Paninski, Cole Hurwitz
Recent work has demonstrated that large-scale, multi-animal models are powerful tools for characterizing the relationship between neural activity and behavior. Current large-scale approaches, however, focus exclusively on either predicting neural activity from behavior (encoding) or predicting behavior from neural activity (decoding), limiting their ability to capture the bidirectional relationship between neural activity and behavior. To bridge this gap, we introduce a multimodal, multi-task model that enables simultaneous Neural Encoding and Decoding at Scale (NEDS). Central to our approach is a novel multi-task-masking strategy, which alternates between neural, behavioral, within-modality, and cross-modality masking. We pretrain our method on the International Brain Laboratory (IBL) repeated site dataset, which includes recordings from 83 animals performing the same visual decision-making task. In comparison to other large-scale models, we demonstrate that NEDS achieves state-of-the-art performance for both encoding and decoding when pretrained on multi-animal data and then fine-tuned on new animals. Surprisingly, NEDS’s learned embeddings exhibit emergent properties: even without explicit training, they are highly predictive of the brain regions in each recording. Altogether, our approach is a step towards a foundation model of the brain that enables seamless translation between neural activity and behavior.
Article 1765
Title@2025-05-24 (6): Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey
Title: Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey | Datenvergrößerung für die Zeitreihenklassifikation: Eine umfangreiche empirische Studie und umfassende Umfrage | 时间-系列分类数据扩充:广泛经验研究和全面调查 2310.10060v6 |
Authors: Zijun Gao, Haibao Liu, Lingbo Li
Data Augmentation (DA) has become a critical approach in Time Series Classification (TSC), primarily for its capacity to expand training datasets, enhance model robustness, introduce diversity, and reduce overfitting. However, the current landscape of DA in TSC is plagued with fragmented literature reviews, nebulous methodological taxonomies, inadequate evaluative measures, and a dearth of accessible and user-oriented tools. This study addresses these challenges through a comprehensive examination of DA methodologies within the TSC domain. Our research began with an extensive literature review spanning a decade, revealing significant gaps in existing surveys and necessitating a detailed analysis of over 100 scholarly articles to identify more than 60 distinct DA techniques. This rigorous review led to the development of a novel taxonomy tailored to the specific needs of DA in TSC, categorizing techniques into five primary categories: Transformation-Based, Pattern-Based, Generative, Decomposition-Based, and Automated Data Augmentation. This taxonomy is intended to guide researchers in selecting appropriate methods with greater clarity. In response to the lack of comprehensive evaluations of foundational DA techniques, we conducted a thorough empirical study, testing nearly 20 DA strategies across 15 diverse datasets representing all types within the UCR time-series repository. Using ResNet and LSTM architectures, we employed a multifaceted evaluation approach, including metrics such as Accuracy, Method Ranking, and Residual Analysis, resulting in a benchmark accuracy of 84.98 ± 16.41% for ResNet and 82.41 ± 18.71% for LSTM. Our investigation underscored the inconsistent efficacy of DA techniques: methods like RGWs and Random Permutation significantly improved model performance, whereas others, like EMD, were less effective.
Article 1766
Title@2025-05-24 (6): KerZOO: Kernel Function Informed Zeroth-Order Optimization for Accurate and Accelerated LLM Fine-Tuning
Title: KerZOO: Kernel Function Informed Zeroth-Order Optimization for Accurate and Accelerated LLM Fine-Tuning | KerZOO: Kernel-Funktion informierte Zeroth-Order-Optimierung für präzise und beschleunigte LLM-Feinsteuerung | KerZOO:为准确和加速 LLM 精密推荐而优化使用核心(KerZOO): 2505.18886v1 |
Authors: Zhendong Mi, Qitao Tan, Xiaodong Yu, Zining Zhu, Geng Yuan, Shaoyi Huang
Large language models (LLMs) have demonstrated impressive capabilities across numerous NLP tasks. Nevertheless, conventional first-order fine-tuning techniques impose heavy memory demands, creating practical obstacles to real-world applications. Zeroth-order (ZO) optimization has recently emerged as a promising memory-efficient alternative, as it circumvents the need for backpropagation by estimating gradients solely through forward passes, making it particularly suitable for resource-limited environments. Despite its efficiency, ZO optimization suffers from gradient estimation bias, which significantly hinders convergence speed. To address this, we analytically identify and characterize the lower-order bias introduced during ZO-based gradient estimation in LLM fine-tuning. Motivated by tools in mathematical physics, we introduce a kernel-function-based ZO framework aimed at mitigating this bias and improving optimization stability. KerZOO achieves comparable or superior performance to existing ZO baselines in both full-parameter and parameter-efficient fine-tuning settings of LLMs, while significantly reducing the number of iterations required to reach convergence. For example, when fine-tuning the OPT-2.7B model, KerZOO reduces total GPU training hours by as much as 74% and 44% on the WSC and MultiRC datasets and can exceed the MeZO baseline by 2.9% and 2.6% in accuracy. We show that the kernel function is an effective avenue for reducing estimation bias in ZO methods.
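For orientation, the forward-pass-only gradient estimate that ZO methods rely on can be sketched as follows. This is the plain two-point estimator with Gaussian perturbations, as used by baselines of the MeZO kind; it is not the kernel-function-based estimator of KerZOO, and the toy quadratic loss, step size, and function names are assumptions.

```python
import random

def zo_gradient(loss, theta, eps=1e-3, rng=random):
    """Two-point estimate: ((L(theta + eps*z) - L(theta - eps*z)) / (2*eps)) * z."""
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = loss([t + eps * zi for t, zi in zip(theta, z)])
    minus = loss([t - eps * zi for t, zi in zip(theta, z)])
    scale = (plus - minus) / (2 * eps)
    return [scale * zi for zi in z]

def zo_sgd(loss, theta, lr=0.05, steps=500, seed=0):
    """Gradient descent that only ever calls `loss` (no backpropagation)."""
    rng = random.Random(seed)
    for _ in range(steps):
        g = zo_gradient(loss, theta, rng=rng)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

def quadratic(theta):
    """Hypothetical toy loss with its minimum at theta = (1, 1, 1)."""
    return sum((t - 1.0) ** 2 for t in theta)

theta = zo_sgd(quadratic, [0.0, 0.0, 0.0])
print(quadratic(theta))
```

Each step costs two loss evaluations and stores no activations, which is where the memory saving comes from. The price is a noisy estimate whose quality degrades with dimension, and that noise and bias are what motivate estimator improvements such as the one proposed here.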
Article 1767
Title@2025-05-24 (6): LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders
Title: LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders | LORE: Lagrangian-optimierte robuste Einbettungen für visuelle Encoder | Lagrangian- 优化的视觉编码器强力嵌入器 2505.18884v1 |
Authors: Borna Khodabandeh, Amirabbas Afzali, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, Sanjay Lall, Sajjad Amini, Seyed-Mohsen Moosavi-Dezfooli
Visual encoders have become fundamental components in modern computer vision pipelines. However, ensuring robustness against adversarial perturbations remains a critical challenge. Recent efforts have explored both supervised and unsupervised adversarial fine-tuning strategies. We identify two key limitations in these approaches: (i) they often suffer from instability, especially during the early stages of fine-tuning, resulting in suboptimal convergence and degraded performance on clean data, and (ii) they exhibit a suboptimal trade-off between robustness and clean data accuracy, hindering the simultaneous optimization of both objectives. To overcome these challenges, we propose Lagrangian-Optimized Robust Embeddings (LORE), a novel unsupervised adversarial fine-tuning framework. LORE utilizes constrained optimization, which offers a principled approach to balancing competing goals, such as improving robustness while preserving nominal performance. By enforcing embedding-space proximity constraints, LORE effectively maintains clean data performance throughout adversarial fine-tuning. Extensive experiments show that LORE significantly improves zero-shot adversarial robustness with minimal degradation in clean data accuracy. Furthermore, we demonstrate the effectiveness of the adversarially fine-tuned CLIP image encoder in out-of-distribution generalization and enhancing the interpretability of image embeddings.
Article 1768
Title@2025-05-24 (6): LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Title: LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity | LinGen: Auf dem Weg zur High-Resolution Minute-Length Text-to-Video-Generation mit linearer Computational Complexity | LinGen:迈向具有线性比较复杂度的高分辨率分钟-语言文本到视频的生成 2412.09856v2 |
Authors: Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu, Ji Hou, Tao Xu, Jialiang Wang, Felix Juefei-Xu, Yaqiao Luo, Peizhao Zhang, Tingbo Hou, Peter Vajda, Niraj K. Jha, Xiaoliang Dai
Text-to-video generation enhances content creation but is highly computationally intensive: the computational cost of Diffusion Transformers (DiTs) scales quadratically in the number of pixels. This makes minute-length video generation extremely expensive, limiting most existing models to generating videos of only 10-20 seconds in length. We propose a Linear-complexity text-to-video Generation (LinGen) framework whose cost scales linearly in the number of pixels. For the first time, LinGen enables high-resolution minute-length video generation on a single GPU without compromising quality. It replaces the computationally dominant, quadratic-complexity block, self-attention, with a linear-complexity block called MATE, which consists of an MA-branch and a TE-branch. The MA-branch targets short-to-long-range correlations, combining a bidirectional Mamba2 block with our token rearrangement method, Rotary Major Scan, and our review tokens developed for long video generation. The TE-branch is a novel TEmporal Swin Attention block that focuses on temporal correlations between adjacent tokens and medium-range tokens. The MATE block addresses the adjacency preservation issue of Mamba and improves the consistency of generated videos significantly. Experimental results show that LinGen outperforms DiT (with a 75.6% win rate) in video quality with up to 15$\times$ (11.5$\times$) FLOPs (latency) reduction. Furthermore, both automatic metrics and human evaluation demonstrate that our LinGen-4B yields video quality comparable to state-of-the-art models (with a 50.5%, 52.1%, 49.1% win rate with respect to Gen-3, LumaLabs, and Kling, respectively). This paves the way to hour-length movie generation and real-time interactive video generation. We provide 68s video generation results and more examples on our project website: https://lineargen.github.io/.
Article 1769
Title@2025-05-24 (6): Partition Generative Modeling: Masked Modeling Without Masks
Title: Partition Generative Modeling: Masked Modeling Without Masks | Partition Generative Modellierung: Maskenmodellierung ohne Masken | 生成建模:没有遮罩的蒙面建模 2505.18883v1 |
Authors: Justin Deschenaux, Lan Tran, Caglar Gulcehre
We introduce "Partition Generative Models" (PGMs), a novel approach to masked generative modeling (MGMs), particularly effective for masked diffusion language modeling (MDLMs). PGM divides tokens into two distinct groups and employs sparse attention patterns to prevent cross-group information exchange. Hence, the model is trained to predict tokens in one group based solely on information from the other group. This partitioning strategy eliminates the need for MASK tokens entirely. While traditional MGMs inefficiently process MASK tokens during generation, PGMs achieve greater computational efficiency by operating exclusively on unmasked tokens. Our experiments on OpenWebText with a context length of 1024 tokens demonstrate that PGMs deliver at least 5x improvements in both latency and throughput compared to MDLM when using the same number of sampling steps, while generating samples with better generative perplexity than MDLM. Finally, we show that PGMs can be distilled with Self-Distillation Through Time (SDTT), a method originally devised for MDLM, in order to achieve further inference gains.
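One way to picture "sparse attention patterns that prevent cross-group information exchange" is a block-structured attention mask. The sketch below is only a guess at the general shape under that reading of the abstract; the function name and the group assignment are hypothetical, and the actual PGM architecture may differ.

```python
def partition_mask(groups):
    """mask[i][j] is True when position i may attend to position j.

    Assumption: attention is allowed only within a group, so no
    information flows between the two partitions.
    """
    n = len(groups)
    return [[groups[i] == groups[j] for j in range(n)] for i in range(n)]

# Hypothetical assignment of six token positions to groups 0 and 1.
groups = [0, 1, 0, 0, 1, 1]
mask = partition_mask(groups)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

With such a mask, the hidden states computed for one group never see the other group's tokens, so they can serve as context for predicting the other group without any MASK placeholder occupying those positions.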
Article 1770
Title@2025-05-24 (6): RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models
Title: RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models | RefLoRA: Refactored Low-Rank-Anpassung für effizientes Feintuning großer Modelle | RefLORA:为对大型模型进行高效微调而进行重构的低Rank适应 2505.18877v1 |
Authors: Yilang Zhang, Bingcong Li, Georgios B. Giannakis
Low-Rank Adaptation (LoRA) lowers the computational and memory overhead of fine-tuning large models by updating a low-dimensional subspace of the pre-trained weight matrix. Albeit efficient, LoRA exhibits suboptimal convergence and noticeable performance degradation, due to inconsistent and imbalanced weight updates induced by its nonunique low-rank factorizations. To overcome these limitations, this article identifies the optimal low-rank factorization per step that minimizes an upper bound on the loss. The resultant refactored low-rank adaptation (RefLoRA) method promotes a flatter loss landscape, along with consistent and balanced weight updates, thus speeding up stable convergence. Extensive experiments evaluate RefLoRA on natural language understanding and commonsense reasoning tasks with popular large language models including DeBERTaV3, LLaMA-7B, LLaMA2-7B and LLaMA3-8B. The numerical tests corroborate that RefLoRA converges faster, outperforms various benchmarks, and incurs negligible computational overhead compared to state-of-the-art LoRA variants.
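The nonuniqueness mentioned above can be made concrete with a short derivation. The notation is the usual LoRA convention and is an assumption here (the abstract does not fix symbols): the adapted weight is $W_0 + BA$ with $B \in \mathbb{R}^{m \times r}$, $A \in \mathbb{R}^{r \times n}$, $G = \nabla_W \mathcal{L}$, and step size $\eta$.

```latex
% Any invertible S gives the same adapted weight:
W_0 + BA \;=\; W_0 + (BS)(S^{-1}A), \qquad S \in \mathbb{R}^{r \times r} \text{ invertible}.

% Gradients with respect to the factors:
\nabla_B \mathcal{L} = G A^{\top}, \qquad \nabla_A \mathcal{L} = B^{\top} G .

% First-order change of the product after one gradient step on (B, A):
\Delta W \;\approx\; -\eta \left( G A^{\top} A + B B^{\top} G \right).

% The same step taken from the equivalent factorization (BS, S^{-1}A):
\Delta W_S \;\approx\; -\eta \left( G A^{\top} S^{-\top} S^{-1} A + B S S^{\top} B^{\top} G \right).
```

The two factorizations represent the same weights, yet $\Delta W_S \neq \Delta W$ in general, so the effective weight update depends on an arbitrary choice of $S$. Selecting the factorization per step, as the abstract describes, removes that arbitrariness.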
Article 1771
Title@2025-05-24 (6): Non-Stationary Lipschitz Bandits
Title: Non-Stationary Lipschitz Bandits | Nicht-stationäre Lipschitz Banditen | 非固定的利普施奇茨猛匪 2505.18871v1 |
Authors: Nicolas Nguyen, Solenne Gaucher, Claire Vernade
We study the problem of non-stationary Lipschitz bandits, where the number of actions is infinite and the reward function, satisfying a Lipschitz assumption, can change arbitrarily over time. We design an algorithm that adaptively tracks the recently introduced notion of significant shifts, defined by large deviations of the cumulative reward function. To detect such reward changes, our algorithm leverages a hierarchical discretization of the action space. Without requiring any prior knowledge of the non-stationarity, our algorithm achieves a minimax-optimal dynamic regret bound of $\widetilde{\mathcal{O}}(\tilde{L}^{1/3}T^{2/3})$, where $\tilde{L}$ is the number of significant shifts and $T$ the horizon. This result provides the first optimal guarantee in this setting.
Article 1772
Title@2025-05-24 (6): Sci-LoRA: Mixture of Scientific LoRAs for Cross-Domain Lay Paraphrasing
Title: Sci-LoRA: Mixture of Scientific LoRAs for Cross-Domain Lay Paraphrasing | Sci-LoRA: Mischung aus wissenschaftlichen LoRAs für Cross-Domain Lay Paraphrasing | Sci-LORA:将科学LORA混合起来,用于跨域地谱图谱绘制 2505.18867v1 |
Authors: Ming Cheng, Jiaying Gong, Hoda Eldardiry
Lay paraphrasing aims to make scientific information accessible to audiences without technical backgrounds. However, most existing studies focus on a single domain, such as biomedicine. With the rise of interdisciplinary research, it is increasingly necessary to comprehend knowledge spanning multiple technical fields. To address this, we propose Sci-LoRA, a model that leverages a mixture of LoRAs fine-tuned on multiple scientific domains. In particular, Sci-LoRA dynamically generates and applies weights for each LoRA, enabling it to adjust the impact of different domains based on the input text, without requiring explicit domain labels. To balance domain-specific knowledge and generalization across various domains, Sci-LoRA integrates information at both the data and model levels. This dynamic fusion enhances the adaptability and performance across various domains. Experimental results across twelve domains on five public datasets show that Sci-LoRA significantly outperforms state-of-the-art large language models and demonstrates flexible generalization and adaptability in cross-domain lay paraphrasing.
Article 1773
Title@2025-05-24 (6): Distribution-Aware Mobility-Assisted Decentralized Federated Learning
Title: Distribution-Aware Mobility-Assisted Decentralized Federated Learning | Distribution-Aware Mobility-Assisted Dezentrales Federated Learning | 分发通知 – – 流动协助 – – 分权力下放的联邦学习 2505.18866v1 |
Authors: Md Farhamdur Reza, Reza Jahani, Richeng Jin, Huaiyu Dai
Decentralized federated learning (DFL) has attracted significant attention due to its scalability and independence from a central server. In practice, some participating clients can be mobile, yet the impact of user mobility on DFL performance remains largely unexplored, despite its potential to facilitate communication and model convergence. In this work, we demonstrate that introducing a small fraction of mobile clients, even with random movement, can significantly improve the accuracy of DFL by facilitating information flow. To further enhance performance, we propose novel distribution-aware mobility patterns, where mobile clients strategically navigate the network, leveraging knowledge of data distributions and static client locations. The proposed moving strategies mitigate the impact of data heterogeneity and boost learning convergence. Extensive experiments validate the effectiveness of induced mobility in DFL and demonstrate the superiority of our proposed mobility patterns over random movement.
Article 1774
Title@2025-05-24 (6): Guided by Guardrails: Control Barrier Functions as Safety Instructors for Robotic Learning
Title: Guided by Guardrails: Control Barrier Functions as Safety Instructors for Robotic Learning | Geführt von Guardrails: Steuerungsbarrierenfunktionen als Sicherheitsinstruktoren für das Roboterlernen | 由警卫队指导:作为机器人学习安全教官的控制障碍功能 2505.18858v1 |
Authors: Maeva Guerrier, Karthik Soma, Hassan Fouad, Giovanni Beltrame
Safety stands as the primary obstacle preventing the widespread adoption of learning-based robotic systems in our daily lives. While reinforcement learning (RL) shows promise as an effective robot learning paradigm, conventional RL frameworks often model safety by using single scalar negative rewards with immediate episode termination, failing to capture the temporal consequences of unsafe actions (e.g., sustained collision damage). In this work, we introduce a novel approach that simulates these temporal effects by applying continuous negative rewards without episode termination. Our experiments reveal that standard RL methods struggle with this model, as the accumulated negative values in unsafe zones create learning barriers. To address this challenge, we demonstrate how Control Barrier Functions (CBFs), with their proven safety guarantees, effectively help robots avoid catastrophic regions while enhancing learning outcomes. We present three CBF-based approaches, each integrating traditional RL methods with Control Barrier Functions, guiding the agent to learn safe behavior. Our empirical analysis, conducted in both simulated environments and real-world settings using a four-wheel differential drive robot, explores the possibilities of employing these approaches for safe robotic learning.
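To show what a Control Barrier Function does in the simplest possible case, here is a generic one-dimensional safety filter. It is a textbook-style sketch and not one of the three approaches from the paper; the dynamics x' = u, the barrier h(x) = x_max - x, and all parameter values are assumptions.

```python
def cbf_filter(x, u_nominal, x_max=1.0, alpha=2.0):
    """Clip the nominal velocity so that h(x) = x_max - x stays non-negative.

    For the dynamics x' = u, the CBF condition dh/dt >= -alpha * h(x)
    reduces to u <= alpha * (x_max - x).
    """
    return min(u_nominal, alpha * (x_max - x))

def rollout(x0=0.0, u_nominal=1.0, dt=0.01, steps=1000, use_filter=True):
    """Euler-integrate x' = u, optionally passing u through the CBF filter."""
    x = x0
    for _ in range(steps):
        u = cbf_filter(x, u_nominal) if use_filter else u_nominal
        x += dt * u
    return x

print(rollout(use_filter=True))   # approaches the boundary x_max = 1 from below
print(rollout(use_filter=False))  # the unfiltered policy drives far past it
```

The nominal action stands in for whatever an RL policy proposes. The filter leaves it untouched far from the boundary and only overrides it near the unsafe region, which is the sense in which a CBF can act as a "guardrail" during learning.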
Article 1775
Title@2025-05-24 (6): USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations
Title: USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations | USDC: Ein Datensatz von $\underline{U}$ser $\underline{S}$tance und $\underline{D}$ogmatism in langen $\underline{C}$onversations | USCC: 以 $\ underline{U}$ser $\ underline{S}$tance 和 $\ underline{D}$ogmatism 的数据集, 以 Long $\ underline{C} 美元对数值 2406.16833v2 |
Authors: Mounika Marreddy, Subba Reddy Oota, Venkata Charan Chinni, Manish Gupta, Lucie Flek
Analyzing user opinion changes in long conversation threads is extremely critical for applications like enhanced personalization, market research, political campaigns, customer service, targeted advertising, and content moderation. Unfortunately, previous studies on stance and dogmatism in user conversations have focused on training models using datasets annotated at the post level, treating each post as independent and randomly sampling posts from conversation threads. Hence, first, we build a dataset for studying user opinion fluctuations in 764 long multi-user Reddit conversation threads, called USDC. USDC contains annotations for 2 tasks: i) User Stance classification, which involves labeling a user’s stance in a post within a conversation on a five-point scale; ii) User Dogmatism classification, which involves labeling a user’s overall opinion in the conversation on a four-point scale. Besides being time-consuming and costly, manual annotations for USDC are challenging because: 1) Conversation threads could be very long, increasing the chances of noisy annotations; and 2) Interpreting instances where a user changes their opinion within a conversation is difficult because often such transitions are subtle and not expressed explicitly. Hence, we leverage majority voting on zero-shot, one-shot, and few-shot annotations from Mistral Large and GPT-4 to automate the annotation process. Human annotations on 200 test conversations achieved inter-annotator agreement scores of 0.49 for stance and 0.50 for dogmatism with these LLM annotations, indicating a reasonable level of consistency between human and LLM annotations. USDC is then used to finetune and instruction-tune multiple deployable small language models like LLaMA, Falcon and Vicuna for the stance and dogmatism classification tasks. We make the code and dataset publicly available [https://github.com/mounikamarreddy/USDC].
Article 1776
Title@2025-05-24 (6): Toward Malicious Clients Detection in Federated Learning
Title: Toward Malicious Clients Detection in Federated Learning | Auf dem Weg zu bösartigen Kunden Erkennung im Föderierten Lernen | 争取在联邦学习中发现恶意客户 2505.09110v2 |
Authors: Zhihao Dou, Jiaqi Wang, Wei Sun, Zhuqing Liu, Minghong Fang
Federated learning (FL) enables multiple clients to collaboratively train a global machine learning model without sharing their raw data. However, the decentralized nature of FL introduces vulnerabilities, particularly to poisoning attacks, where malicious clients manipulate their local models to disrupt the training process. While Byzantine-robust aggregation rules have been developed to mitigate such attacks, they remain inadequate against more advanced threats. In response, recent advancements have focused on FL detection techniques to identify potentially malicious participants. Unfortunately, these methods often misclassify numerous benign clients as threats or rely on unrealistic assumptions about the server’s capabilities. In this paper, we propose a novel algorithm, SafeFL, specifically designed to accurately identify malicious clients in FL. The SafeFL approach involves the server collecting a series of global models to generate a synthetic dataset, which is then used to distinguish between malicious and benign models based on their behavior. Extensive testing demonstrates that SafeFL outperforms existing methods, offering superior efficiency and accuracy in detecting malicious clients.
Article 1777
Title@2025-05-24 (6): Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation
Title: Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation | Korruption-Bewusst Training von latenten Video-Diffusions-Modellen für robuste Text-zu-Video-Generation | 原始视频视频传播模型的反腐败知识培训 2505.21545v1 |
Authors: Chika Maduabuchi, Hao Chen, Yujin Han, Jindong Wang
Latent Video Diffusion Models (LVDMs) achieve high-quality generation but are sensitive to imperfect conditioning, which causes semantic drift and temporal incoherence on noisy, web-scale video-text datasets. We introduce CAT-LVDM, the first corruption-aware training framework for LVDMs that improves robustness through structured, data-aligned noise injection. Our method includes Batch-Centered Noise Injection (BCNI), which perturbs embeddings along intra-batch semantic directions to preserve temporal consistency. BCNI is especially effective on caption-rich datasets like WebVid-2M, MSR-VTT, and MSVD. We also propose Spectrum-Aware Contextual Noise (SACN), which injects noise along dominant spectral directions to improve low-frequency smoothness, showing strong results on UCF-101. On average, BCNI reduces FVD by 31.9% across WebVid-2M, MSR-VTT, and MSVD, while SACN yields a 12.3% improvement on UCF-101. Ablation studies confirm the benefit of low-rank, data-aligned noise. Our theoretical analysis further explains how such perturbations tighten entropy, Wasserstein, score-drift, mixing-time, and generalization bounds. CAT-LVDM establishes a principled, scalable training approach for robust video diffusion under multimodal noise. Code and models: https://github.com/chikap421/catlvdm
Article 1778
Title@2025-05-24 (6): On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization
Title: On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization | Auf die Wirkung des negativen Gradienten in der Gruppe Relative Tiefenverstärkung Optimierung | 对群体相对深强化优化中的负梯度效应的影响 2505.18830v1 |
Authors: Wenlong Deng, Yi Ren, Muchen Li, Danica J. Sutherland, Xiaoxiao Li, Christos Thrampoulidis
Reinforcement learning (RL) has become popular in enhancing the reasoning capabilities of large language models (LLMs), with Group Relative Policy Optimization (GRPO) emerging as a widely used algorithm in recent systems. Despite GRPO’s widespread adoption, we identify a previously unrecognized phenomenon we term Lazy Likelihood Displacement (LLD), wherein the likelihood of correct responses marginally increases or even decreases during training. This behavior mirrors a recently discovered misalignment issue in Direct Preference Optimization (DPO), attributed to the influence of negative gradients. We provide a theoretical analysis of GRPO’s learning dynamic, identifying the source of LLD as the naive penalization of all tokens in incorrect responses with the same strength. To address this, we develop a method called NTHR, which downweights penalties on tokens contributing to the LLD. Unlike prior DPO-based approaches, NTHR takes advantage of GRPO’s group-based structure, using correct responses as anchors to identify influential tokens. Experiments on math reasoning benchmarks demonstrate that NTHR effectively mitigates LLD, yielding consistent performance gains across models ranging from 0.5B to 3B parameters.
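The "same strength" penalization can be seen directly in how group-relative advantages are commonly computed. The sketch below uses the usual mean/standard-deviation normalization within a group of sampled responses; the exact normalization and the reward values are assumptions, and this is not the NTHR method.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each response's reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical group of four sampled answers: 1.0 = correct, 0.0 = incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
adv = group_relative_advantages(rewards)

# Every token of a response inherits that response's single advantage value.
response_lengths = [5, 3, 7, 4]
token_advantages = [[a] * n for a, n in zip(adv, response_lengths)]
print(adv)
```

Because one scalar is broadcast to all tokens, every token in an incorrect response is pushed down equally, including tokens that also appear in correct responses. Downweighting the penalty on such tokens is the direction the abstract describes.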
Article 1779
Title@2025-05-24 (6): Multi-Agent Best Arm Identification in Stochastic Linear Bandits
Title: Multi-Agent Best Arm Identification in Stochastic Linear Bandits | Multi-Agent Best Arm Identification in stochastische Linear Banditen | 斯托切斯定线强盗中多代理最佳武器识别 2411.13690v2 |
Authors: Sanjana Agrawal, Saúl A. Blanco
We study the problem of collaborative best-arm identification in stochastic linear bandits under a fixed-budget scenario. In our learning model, we first consider multiple agents connected through a star network, interacting with a linear bandit instance in parallel. We then extend our analysis to arbitrary network topologies. The objective of the agents is to collaboratively identify the best arm of the given bandit instance with the help of a central server while minimizing the probability of error in best-arm estimation. To this end, we propose two algorithms, MaLinBAI-Star and MaLinBAI-Gen, for star networks and networks with arbitrary structure, respectively. Both algorithms combine G-optimal design with a successive-elimination-based strategy in which agents share their knowledge through a central server at each communication round. We demonstrate, both theoretically and empirically, that our algorithms achieve an error probability that decays exponentially in the allocated time budget. Furthermore, experimental results on both synthetic and real-world data validate the effectiveness of our algorithms over existing state-of-the-art multi-agent algorithms.
Article 1780
Title@2025-05-24 (6): Improved Regret and Contextual Linear Extension for Pandora’s Box and Prophet Inequality
Title: Improved Regret and Contextual Linear Extension for Pandora’s Box and Prophet Inequality | Verbesserte regret und kontextuelle lineare Erweiterung für Pandora’s Box und Prophet Inequality | 改进潘多拉盒子和先知不平等的遗憾和背景扩展线性扩展 2505.18828v1 |
Authors: Junyan Liu, Ziyun Chen, Kun Wang, Haipeng Luo, Lillian J. Ratliff
We study the Pandora’s Box problem in an online learning setting with semi-bandit feedback. In each round, the learner sequentially pays to open up to $n$ boxes with unknown reward distributions, observes rewards upon opening, and decides when to stop. The utility of the learner is the maximum observed reward minus the cumulative cost of opened boxes, and the goal is to minimize regret defined as the gap between the cumulative expected utility and that of the optimal policy. We propose a new algorithm that achieves $\widetilde{O}(\sqrt{nT})$ regret after $T$ rounds, which improves the $\widetilde{O}(n\sqrt{T})$ bound of Agarwal et al. [2024] and matches the known lower bound up to logarithmic factors. To better capture real-life applications, we then extend our results to a natural but challenging contextual linear setting, where each box’s expected reward is linear in some known but time-varying $d$-dimensional context and the noise distribution is fixed over time. We design an algorithm that learns both the linear function and the noise distributions, achieving $\widetilde{O}(nd\sqrt{T})$ regret. Finally, we show that our techniques also apply to the online Prophet Inequality problem, where the learner must decide immediately whether or not to accept a revealed reward. In both non-contextual and contextual settings, our approach achieves similar improvements and regret bounds.
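The per-round utility defined above (maximum observed reward minus cumulative opening cost) is easy to state in code. The sketch below evaluates it for a simple fixed-order threshold rule; the rule, the convention of zero utility when no box is opened, and the numbers are assumptions for illustration and are not the paper's algorithm.

```python
def pandora_utility(rewards, costs, opened):
    """Max observed reward minus total cost of the opened boxes.

    Assumption: opening nothing yields utility 0.
    """
    if not opened:
        return 0.0
    best = max(rewards[i] for i in opened)
    paid = sum(costs[i] for i in opened)
    return best - paid

def threshold_policy(rewards, threshold):
    """Open boxes in index order; stop once an observed reward reaches the threshold."""
    opened = []
    for i, r in enumerate(rewards):
        opened.append(i)
        if r >= threshold:
            break
    return opened

# Hypothetical single round with three boxes.
rewards = [0.25, 1.0, 0.5]
costs = [0.125, 0.125, 0.125]

opened = threshold_policy(rewards, threshold=0.75)
print(opened, pandora_utility(rewards, costs, opened))
```

In this round the rule opens the first two boxes and stops, for a utility of 1.0 - 0.25 = 0.75. Regret in the online setting accumulates the expected gap between such per-round utilities and those of the optimal policy over $T$ rounds.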
Article 1781
Title@2025-05-24 (6): A Real-World Energy Management Dataset from a Smart Company Building for Optimization and Machine Learning
Title: A Real-World Energy Management Dataset from a Smart Company Building for Optimization and Machine Learning | Ein Echtzeit-Energiemanagement-Datensatz aus einem Smart Company Building für Optimierung und maschinelles Lernen | 最佳优化和机器学习智能公司大楼的 “ 现实世界能源管理数据集 “ 2503.11469v2 |
Authors: Jens Engel, Andrea Castellani, Patricia Wollstadt, Felix Lanfermann, Thomas Schmitt, Sebastian Schmitt, Lydia Fischer, Steffen Limmer, David Luttropp, Florian Jomrich, René Unger, Tobias Rodemann
We present a large real-world dataset obtained from monitoring a smart company facility over the course of six years, from 2018 to 2023. The dataset includes energy consumption data from various facility areas and components, energy production data from a photovoltaic system and a combined heat and power plant, operational data from heating and cooling systems, and weather data from an on-site weather station. The measurement sensors installed throughout the facility are organized in a hierarchical metering structure with multiple sub-metering levels, which is reflected in the dataset. The dataset contains measurement data from 72 energy meters, 9 heat meters and a weather station. Both raw and processed data at different processing levels, including labeled issues, are available. In this paper, we describe the data acquisition and post-processing employed to create the dataset. The dataset enables the application of a wide range of methods in the domain of energy management, including optimization, modeling, and machine learning, to optimize building operations and reduce costs and carbon emissions.
Article 1782
Title@2025-05-24 (6): How to build a consistency model: Learning flow maps via self-distillation
Title: How to build a consistency model: Learning flow maps via self-distillation | Wie man ein Konsistenzmodell baut: Flusskarten über Selbstdestillation lernen | 如何建立一致性模式:通过自我蒸馏学习流程图 2505.18825v1 |
Authors: Nicholas M. Boffi, Michael S. Albergo, Eric Vanden-Eijnden
Building on the framework proposed in Boffi et al. (2024), we present a systematic approach for learning flow maps associated with flow and diffusion models. Flow map-based models, commonly known as consistency models, encompass recent efforts to improve the efficiency of generative models based on solutions to differential equations. By exploiting a relationship between the velocity field underlying a continuous-time flow and the instantaneous rate of change of the flow map, we show how to convert existing distillation schemes into direct training algorithms via self-distillation, eliminating the need for pre-trained models. We empirically evaluate several instantiations of our framework, finding that high-dimensional tasks like image synthesis benefit from objective functions that avoid temporal and spatial derivatives of the flow map, while lower-dimensional tasks can benefit from objectives incorporating higher-order derivatives to capture sharp features.
Article 1783
Title@2025-05-24 (6): Robust multi-coil MRI reconstruction via self-supervised denoising
Title: Robust multi-coil MRI reconstruction via self-supervised denoising | Robuste Multi-Coil-MRT-Rekonstruktion durch selbstüberwachte Denoisierung | 通过自我监督的自监管的去注水进行强有力的多石油MRI重建 2411.12919v4 |
Authors: Asad Aali, Marius Arvinte, Sidharth Kumar, Yamin I. Arefeen, Jonathan I. Tamir
We study the effect of incorporating self-supervised denoising as a pre-processing step for training deep learning (DL) based reconstruction methods on data corrupted by Gaussian noise. K-space data employed for training are typically multi-coil and inherently noisy. Although DL-based reconstruction methods trained on fully sampled data can enable high reconstruction quality, obtaining large, noise-free datasets is impractical. We leverage Generalized Stein’s Unbiased Risk Estimate (GSURE) for denoising. We evaluate two DL-based reconstruction methods: Diffusion Probabilistic Models (DPMs) and Model-Based Deep Learning (MoDL). We evaluate the impact of denoising on the performance of these DL-based methods in solving accelerated multi-coil magnetic resonance imaging (MRI) reconstruction. The experiments were carried out on T2-weighted brain and fat-suppressed proton-density knee scans. We observed that self-supervised denoising enhances the quality and efficiency of MRI reconstructions across various scenarios. Specifically, employing denoised images rather than noisy counterparts when training DL networks results in lower normalized root mean squared error (NRMSE), higher structural similarity index measure (SSIM) and peak signal-to-noise ratio (PSNR) across different SNR levels, including 32dB, 22dB, and 12dB for T2-weighted brain data, and 24dB, 14dB, and 4dB for fat-suppressed knee data. Overall, we showed that denoising is an essential pre-processing technique capable of improving the efficacy of DL-based MRI reconstruction methods under diverse conditions. By refining the quality of input data, denoising enables training more effective DL networks, potentially bypassing the need for noise-free reference MRI scans.
Article 1784
Title@2025-05-24 (6): Fully tensorial approach to hypercomplex neural networks
Title: Fully tensorial approach to hypercomplex neural networks | Voller Tensoransatz für hyperkomplexe neuronale Netzwerke | 对超复合性神经神经网络采取完全强制的全方位方法 2407.00449v3 |
Authors: Agnieszka Niemczynowicz, Radosław Antoni Kycia
A fully tensorial theory of hypercomplex neural networks is given. It allows neural networks to use arithmetic based on arbitrary algebras. The key point is to observe that algebra multiplication can be represented as a rank-three tensor and to use this tensor in every algebraic operation. This approach is attractive for neural network libraries that support efficient tensorial operations. It agrees with previous implementations for four-dimensional algebras.
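To make the rank-three-tensor idea concrete, the sketch below encodes quaternion multiplication as a tensor of structure constants and evaluates products with a single `einsum`. The quaternions are chosen here only as a familiar four-dimensional example; the function names and the index convention `T[i, j, k]` (coefficient of basis element `k` in the product of basis elements `i` and `j`) are assumptions of this illustration, not taken from the paper.

```python
import numpy as np


def quaternion_structure_tensor() -> np.ndarray:
    """Rank-three tensor T with e_i * e_j = sum_k T[i, j, k] * e_k (basis 1, i, j, k)."""
    T = np.zeros((4, 4, 4))
    for m in range(4):
        T[0, m, m] = 1.0  # 1 * e_m = e_m
        T[m, 0, m] = 1.0  # e_m * 1 = e_m
    for m in (1, 2, 3):
        T[m, m, 0] = -1.0  # i^2 = j^2 = k^2 = -1
    for a, b, c in ((1, 2, 3), (2, 3, 1), (3, 1, 2)):
        T[a, b, c] = 1.0   # cyclic products, e.g. i * j = k
        T[b, a, c] = -1.0  # reversed order flips the sign, e.g. j * i = -k
    return T


def algebra_product(a: np.ndarray, b: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Multiply two algebra elements given by their coefficient vectors."""
    return np.einsum("i,j,ijk->k", a, b, T)


T = quaternion_structure_tensor()
# (1 + 2i) * (3 + 4j) = 3 + 6i + 4j + 8k
product = algebra_product(np.array([1.0, 2.0, 0.0, 0.0]),
                          np.array([3.0, 0.0, 4.0, 0.0]), T)
```

Swapping in a different structure tensor changes the algebra without touching `algebra_product`, which is the practical appeal of the tensorial formulation.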
Article 1785
Title@2025-05-24 (6): Stealing Training Graphs from Graph Neural Networks
Title: Stealing Training Graphs from Graph Neural Networks | Stealing Training Graphen aus Graph Neural Networks | 图表神经网络中的偷窃培训图 2411.11197v2 |
Authors: Minhua Lin, Enyan Dai, Junjie Xu, Jinyuan Jia, Xiang Zhang, Suhang Wang
Graph Neural Networks (GNNs) have shown promising results in modeling graphs in various tasks. The training of GNNs, especially on specialized tasks such as bioinformatics, demands extensive expert annotations, which are expensive and usually contain sensitive information of data providers. The trained GNN models are often shared for deployment in the real world. As neural networks can memorize the training samples, the model parameters of GNNs have a high risk of leaking private training data. Our theoretical analysis shows the strong connections between trained GNN parameters and the training graphs used, confirming the training graph leakage issue. However, explorations into training data leakage from trained GNNs are rather limited. Therefore, we investigate a novel problem of stealing graphs from trained GNNs. To obtain high-quality graphs that resemble the target training set, a graph diffusion model with diffusion noise optimization is deployed as a graph generator. Furthermore, we propose a selection method that effectively leverages GNN model parameters to identify training graphs from samples generated by the graph diffusion model. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed framework in stealing training graphs from the trained GNN.
Article 1786
Title@2025-05-24 (6): GRoQ-LoCO: Generalist and Robot-agnostic Quadruped Locomotion Control using Offline Datasets
Title: GRoQ-LoCO: Generalist and Robot-agnostic Quadruped Locomotion Control using Offline Datasets | GRoQ-LoCO: Generalist und Roboter-agnostische Quadruped Locomotion Control mit Offline-Datensätzen | GROQ-LoCO:使用离线数据集的通用和机器人-不可知性四分流移动控制 2505.10973v3 |
Authors: Narayanan PP, Sarvesh Prasanth Venkatesan, Srinivas Kantha Reddy, Shishir Kolathaya
Recent advancements in large-scale offline training have demonstrated the potential of generalist policy learning for complex robotic tasks. However, applying these principles to legged locomotion remains a challenge due to continuous dynamics and the need for real-time adaptation across diverse terrains and robot morphologies. In this work, we propose GRoQ-LoCO, a scalable, attention-based framework that learns a single generalist locomotion policy across multiple quadruped robots and terrains, relying solely on offline datasets. Our approach leverages expert demonstrations from two distinct locomotion behaviors - stair traversal (non-periodic gaits) and flat terrain traversal (periodic gaits) - collected across multiple quadruped robots, to train a generalist model that enables behavior fusion. Crucially, our framework operates solely on proprioceptive data from all robots without incorporating any robot-specific encodings. The policy is directly deployable on an Intel i7 NUC, producing low-latency control outputs without any test-time optimization. Our extensive experiments demonstrate zero-shot transfer across highly diverse quadruped robots and terrains, including hardware deployment on the Unitree Go1, a commercially available 12kg robot. Notably, we evaluate challenging cross-robot training setups where different locomotion skills are unevenly distributed across robots, yet observe successful transfer of both flat walking and stair traversal behaviors to all robots at test time. We also show preliminary walking on Stoch 5, a 70kg quadruped, on flat and outdoor terrains without requiring any fine-tuning. These results demonstrate the potential of offline, data-driven learning to generalize locomotion across diverse quadruped morphologies and behaviors.
Article 1787
Title@2025-05-24 (6): Preference Leakage: A Contamination Problem in LLM-as-a-judge
Title: Preference Leakage: A Contamination Problem in LLM-as-a-judge | Bevorzugte Leckage: Ein Kontaminierungsproblem im LLM-as-a-Richter | 优先渗漏:LLM-作为法官的LLM中的污染问题 2502.01534v2 |
Authors: Dawei Li, Renliang Sun, Yue Huang, Ming Zhong, Bohan Jiang, Jiawei Han, Xiangliang Zhang, Wei Wang, Huan Liu
Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods in model development. While their combination significantly enhances the efficiency of model training and evaluation, little attention has been given to the potential contamination brought by this new model development paradigm. In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and LLM-based evaluators. To study this issue, we first define three common types of relatedness between the data generator LLM and the judge LLM: being the same model, having an inheritance relationship, and belonging to the same model family. Through extensive experiments, we empirically confirm the bias of judges towards their related student models caused by preference leakage across multiple LLM baselines and benchmarks. Further analysis suggests that preference leakage is a pervasive and real-world problem that is harder to detect compared to previously identified biases in LLM-as-a-judge scenarios. All of these findings imply that preference leakage is a widespread and challenging problem in the area of LLM-as-a-judge. We release all code and data at: https://github.com/David-Li0406/Preference-Leakage.
Article 1788
Title@2025-05-24 (6): Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis
Title: Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis | Erforschung der QUIC-Dynamik: Ein großformatiger Datensatz für verschlüsselte Verkehrsanalyse | 探索 QUIC 动态动态:加密流量分析的大型数据集 2410.03728v6 |
Authors: Barak Gahtan, Robert J. Shahla, Alex M. Bronstein, Reuven Cohen
The increasing adoption of the QUIC transport protocol has transformed encrypted web traffic, necessitating new methodologies for network analysis. However, existing datasets lack the scope, metadata, and decryption capabilities required for robust benchmarking in encrypted traffic research. We introduce VisQUIC, a large-scale dataset of 100,000 labeled QUIC traces from over 44,000 websites, collected over four months. Unlike prior datasets, VisQUIC provides SSL keys for controlled decryption, supports multiple QUIC implementations (Chromium QUIC, Facebook’s mvfst, Cloudflare’s quiche), and introduces a novel image-based representation that enables machine learning-driven encrypted traffic analysis. The dataset includes standardized benchmarking tools, ensuring reproducibility. To demonstrate VisQUIC’s utility, we present a benchmarking task for estimating HTTP/3 responses in encrypted QUIC traffic, achieving 97% accuracy using only observable packet features. By publicly releasing VisQUIC, we provide an open foundation for advancing encrypted traffic analysis, QUIC security research, and network monitoring.
Article 1789
Title@2025-05-24 (6): DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services
Title: DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services | DiSCo: Geräte-Server Kollaborative LLM-basierte Text-Streaming-Dienste | DisCo: 设备-服务器协作协作LLM基于LLM的文本流服务 2502.11417v2 |
Authors: Ting Sun, Penghan Wang, Fan Lai
The rapid rise of large language models (LLMs) in text streaming services has introduced significant cost and Quality of Experience (QoE) challenges in serving millions of daily requests, especially in meeting Time-To-First-Token (TTFT) and Time-Between-Token (TBT) requirements for real-time interactions. Our real-world measurements show that both server-based and on-device deployments struggle to meet diverse QoE demands: server deployments face high costs and last-hop issues (e.g., Internet latency and dynamics), while on-device LLM inference is constrained by resources. We introduce DiSCo, a device-server cooperative scheduler designed to optimize users’ QoE by adaptively routing requests and migrating response generation between endpoints while maintaining cost constraints. DiSCo employs cost-aware scheduling, leveraging the predictable speed of on-device LLM inference with the flexible capacity of server-based inference to dispatch requests on the fly, while introducing a token-level migration mechanism to ensure consistent token delivery during migration. Evaluations on real-world workloads – including commercial services like OpenAI GPT and DeepSeek, and open-source deployments such as LLaMA3 – show that DiSCo can improve users’ QoE by reducing tail TTFT (11-52%) and mean TTFT (6-78%) across different model-device configurations, and that its migration mechanism dramatically reduces serving costs by up to 84% while maintaining comparable QoE levels.
Article 1790
Title@2025-05-24 (6): Operator-Informed Score Matching for Markov Diffusion Models
Title: Operator-Informed Score Matching for Markov Diffusion Models | Operator-Informed Score Matching für Markov Diffusion Modelle | Markov 扩散模型的操作员不完善的评分匹配 2406.09084v2 |
Authors: Zheyang Shen, Huihui Wang, Marina Riabiz, Chris J. Oates
Diffusion models are typically trained using score matching, a learning objective agnostic to the underlying noising process that guides the model. This paper argues that Markov noising processes enjoy an advantage over alternatives, as the Markov operators that govern the noising process are well-understood. Specifically, by leveraging the spectral decomposition of the infinitesimal generator of the Markov noising process, we obtain parametric estimates of the score functions simultaneously for all marginal distributions, using only sample averages with respect to the data distribution. The resulting operator-informed score matching provides both a standalone approach to sample generation for low-dimensional distributions and a recipe for better-informed neural score estimators in high-dimensional settings.
Article 1791
Title@2025-05-24 (6): Expert-Agnostic Learning to Defer
Title: Expert-Agnostic Learning to Defer | Experten-Agnostisches Lernen zur Abwehr | 专家 – – 无法无天学习 2502.10533v2 |
Authors: Joshua Strong, Pramit Saha, Yasin Ibrahim, Cheng Ouyang, Alison Noble
Learning to Defer (L2D) trains autonomous systems to handle straightforward cases while deferring uncertain ones to human experts. Recent advancements in this field have introduced methods that offer flexibility to unseen experts at test time. However, we find these approaches struggle to generalise to experts with behaviours not seen during training, require extensive human annotation, and lack mechanisms for incorporating prior knowledge of expert capabilities. To address these challenges, we introduce Expert-Agnostic Learning to Defer (EA-L2D), a novel L2D framework that employs a Bayesian approach to model expert behaviour in an expert-agnostic fashion. Across benchmark medical imaging datasets (HAM10000, Blood Cells, Retinal OCT, and Liver Tumours), EA-L2D significantly outperforms prior methods on unseen experts, achieving up to a 28% relative improvement, while also matching or exceeding state-of-the-art performance on seen experts.
Article 1792
Title@2025-05-24 (6): Partial Distribution Matching via Partial Wasserstein Adversarial Networks
Title: Partial Distribution Matching via Partial Wasserstein Adversarial Networks | Teilverteilung Passend über Teilwasserstein Adversarial Networks | 通过部分瓦森斯坦对冲网络进行部分配配 2409.10499v2 |
Authors: Zi-Ming Wang, Nan Xue, Ling Lei, Rebecka Jörnsten, Gui-Song Xia
This paper studies the problem of distribution matching (DM), which is a fundamental machine learning problem seeking to robustly align two probability distributions. Our approach is established on a relaxed formulation, called partial distribution matching (PDM), which seeks to match a fraction of the distributions instead of matching them completely. We theoretically derive the Kantorovich-Rubinstein duality for the partial Wasserstein-1 (PW) discrepancy, and develop a partial Wasserstein adversarial network (PWAN) that efficiently approximates the PW discrepancy based on this dual form. Partial matching can then be achieved by optimizing the network using gradient descent. Two practical tasks, point set registration and partial domain adaptation, are investigated, where the goals are to partially match distributions in 3D space and high-dimensional feature space respectively. The experimental results confirm that the proposed PWAN effectively produces highly robust matching results, performing better or on par with the state-of-the-art methods.
Article 1793
Title@2025-05-24 (6): MAPLE: Enhancing Review Generation with Multi-Aspect Prompt LEarning in Explainable Recommendation
Title: MAPLE: Enhancing Review Generation with Multi-Aspect Prompt LEarning in Explainable Recommendation | MAPLE: Verbesserung der Review Generation mit Multi-Aspect Prompt Learning in erklärbarer Empfehlung | MMALE: 在可解释建议中以多角度迅速和迅速的分解方式加强审查的产生 2408.09865v2 |
Authors: Ching-Wen Yang, Zhi-Quan Feng, Ying-Jia Lin, Che-Wei Chen, Kun-da Wu, Hao Xu, Jui-Feng Yao, Hung-Yu Kao
The Explainable Recommendation task is designed to receive a pair of user and item and output explanations to justify why an item is recommended to a user. Many models approach review generation as a proxy for explainable recommendations. While these models can produce fluent and grammatically correct sentences, they often lack precision and fail to provide personalized, informative recommendations. To address this issue, we propose a personalized, aspect-controlled model called Multi-Aspect Prompt LEarner (MAPLE), which integrates aspect category as another input dimension to facilitate memorizing fine-grained aspect terms. Experiments conducted on two real-world review datasets in the restaurant domain demonstrate that MAPLE significantly outperforms baseline review-generation models. MAPLE excels in both text and feature diversity, ensuring that the generated content covers a wide range of aspects. Additionally, MAPLE delivers good generation quality while maintaining strong coherence and factual relevance. The code and dataset used in this paper can be found at https://github.com/Nana2929/MAPLE.git.
Article 1794
Title@2025-05-24 (6): Governing Equation Discovery from Data Based on Differential Invariants
Title: Governing Equation Discovery from Data Based on Differential Invariants | Regulierende Gleichungs-Entdeckung aus Daten basierend auf unterschiedlichen Invarianten | 从基于差异内在变量的数据中分离出来的数据 2505.18798v1 |
Authors: Lexiang Hu, Yikang Li, Zhouchen Lin
The explicit governing equation is one of the simplest and most intuitive forms for characterizing physical laws. However, directly discovering partial differential equations (PDEs) from data poses significant challenges, primarily in determining relevant terms from a vast search space. Symmetry, as a crucial prior knowledge in scientific fields, has been widely applied in tasks such as designing equivariant networks and guiding neural PDE solvers. In this paper, we propose a pipeline for governing equation discovery based on differential invariants, which can losslessly reduce the search space of existing equation discovery methods while strictly adhering to symmetry. Specifically, we compute the set of differential invariants corresponding to the infinitesimal generators of the symmetry group and select them as the relevant terms for equation discovery. Taking DI-SINDy (SINDy based on Differential Invariants) as an example, we demonstrate that its success rate and accuracy in PDE discovery surpass those of other symmetry-informed governing equation discovery methods across a series of PDEs.
Article 1795
Title@2025-05-24 (6): Guarding Graph Neural Networks for Unsupervised Graph Anomaly Detection
Title: Guarding Graph Neural Networks for Unsupervised Graph Anomaly Detection | Überwachung von Graphen-Neuralnetzwerken für unbeaufsichtigte Graphenanomalienerkennung | 用于不受监督的异常图图探测的 保护图形神经网络 2404.16366v2 |
Authors: Yuanchen Bei, Sheng Zhou, Jinke Shi, Yao Ma, Haishuai Wang, Jiajun Bu
Unsupervised graph anomaly detection aims at identifying rare patterns that deviate from the majority in a graph without the aid of labels, which is important for a variety of real-world applications. Recent advances have utilized Graph Neural Networks (GNNs) to learn effective node representations by aggregating information from neighborhoods. This is motivated by the hypothesis that nodes in the graph tend to exhibit consistent behaviors with their neighborhoods. However, such consistency can be disrupted by graph anomalies in multiple ways. Most existing methods directly employ GNNs to learn representations, disregarding the negative impact of graph anomalies on GNNs, resulting in sub-optimal node representations and anomaly detection performance. While a few recent approaches have redesigned GNNs for graph anomaly detection under semi-supervised label guidance, how to address the adverse effects of graph anomalies on GNNs in unsupervised scenarios and learn effective representations for anomaly detection are still under-explored. To bridge this gap, in this paper, we propose a simple yet effective framework for Guarding Graph Neural Networks for Unsupervised Graph Anomaly Detection (G3AD). Specifically, G3AD first introduces two auxiliary networks along with correlation constraints to guard the GNNs against inconsistent information encoding. Furthermore, G3AD introduces an adaptive caching module to guard the GNNs from directly reconstructing the observed graph data that contains anomalies. Extensive experiments demonstrate that our G3AD can outperform twenty state-of-the-art methods on both synthetic and real-world graph anomaly datasets, with flexible generalization ability in different GNN backbones.
Article 1796
Title@2025-05-24 (6): Leveraging Per-Instance Privacy for Machine Unlearning
Title: Leveraging Per-Instance Privacy for Machine Unlearning | Per-Instance-Leveraging-Privatsphäre für das maschinelle Lernen | 利用个人隐私促进机器脱学 2505.18786v1 |
Authors: Nazanin Mohammadi Sepahvand, Anvith Thudi, Berivan Isik, Ashmita Bhattacharyya, Nicolas Papernot, Eleni Triantafillou, Daniel M. Roy, Gintare Karolina Dziugaite
We present a principled, per-instance approach to quantifying the difficulty of unlearning via fine-tuning. We begin by sharpening an analysis of noisy gradient descent for unlearning (Chien et al., 2024), obtaining a better utility-unlearning tradeoff by replacing worst-case privacy loss bounds with per-instance privacy losses (Thudi et al., 2024), each of which bounds the (Renyi) divergence to retraining without an individual data point. To demonstrate the practical applicability of our theory, we present empirical results showing that our theoretical predictions are borne out both for Stochastic Gradient Langevin Dynamics (SGLD) and for standard fine-tuning without explicit noise. We further demonstrate that per-instance privacy losses correlate well with several existing data difficulty metrics, while also identifying harder groups of data points, and introduce novel evaluation methods based on loss barriers. Altogether, our findings provide a foundation for more efficient and adaptive unlearning strategies tailored to the unique properties of individual data points.
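As a rough picture of "unlearning via fine-tuning with noisy gradient descent", the sketch below continues training a small least-squares model on the retained data only, adding Gaussian noise to each step. It is a generic illustration rather than the procedure analysed in the paper: the function name `noisy_finetune`, the quadratic loss, the toy data, and the hyperparameters are all assumptions made here.

```python
import numpy as np


def noisy_finetune(theta, X_retain, y_retain, lr, noise_std, steps, rng):
    """Noisy gradient descent on the retain set (the forget set is simply left out).

    Loss: 0.5 * mean((X @ theta - y) ** 2). Setting noise_std = 0 recovers
    plain fine-tuning without explicit noise.
    """
    theta = np.array(theta, dtype=float)
    n = len(y_retain)
    for _ in range(steps):
        grad = X_retain.T @ (X_retain @ theta - y_retain) / n
        noise = noise_std * rng.standard_normal(theta.shape)
        theta = theta - lr * grad + noise
    return theta


rng = np.random.default_rng(0)
X_retain = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_retain = X_retain @ np.array([2.0, -1.0])  # retained data is fit exactly by (2, -1)

theta_start = np.array([5.0, 5.0])  # stands in for a model trained on all data
theta_plain = noisy_finetune(theta_start, X_retain, y_retain, 0.5, 0.0, 200, rng)
theta_noisy = noisy_finetune(theta_start, X_retain, y_retain, 0.5, 0.01, 200, rng)
```

Without noise the iterate converges to the retain-only solution; with noise it fluctuates around it, and the noise scale is what privacy-style unlearning guarantees are usually stated in terms of.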
Article 1797
Title@2025-05-24 (6): A physics-guided smoothing method for material modeling with digital image correlation (DIC) measurements
Title: A physics-guided smoothing method for material modeling with digital image correlation (DIC) measurements | Ein physikgeführtes Glättverfahren für die Materialmodellierung mit Messungen der digitalen Bildkorrelation (DIC) | 采用物理制导平滑法进行数字图像相关测量材料建模 2505.18784v1 |
Authors: Jihong Wang, Chung-Hao Lee, William Richardson, Yue Yu
In this work, we present a novel approach to process the DIC measurements of multiple biaxial stretching protocols. In particular, we develop an optimization-based approach, which calculates the smoothed nodal displacements using a moving least-squares algorithm subject to positive strain constraints. As such, physically consistent displacement and strain fields are obtained. Then, we further deploy a data-driven workflow for heterogeneous material modeling from these physically consistent DIC measurements, by estimating a nonlocal constitutive law together with the material microstructure. To demonstrate the applicability of our approach, we apply it in learning a material model and fiber orientation field from DIC measurements of a porcine tricuspid valve anterior leaflet. Our results demonstrate that the proposed DIC data processing approach can significantly improve the accuracy of modeling biological materials.
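The moving least-squares ingredient can be illustrated in one dimension: at every node, a local linear model is fitted with distance-based weights and then evaluated at that node. The sketch below deliberately omits the positive strain constraints that the abstract describes, and the Gaussian weight function, the `bandwidth` parameter, and the function name `mls_smooth` are assumptions of this example.

```python
import numpy as np


def mls_smooth(x, y, bandwidth):
    """Unconstrained 1D moving least-squares smoothing with a local linear basis."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    basis = np.column_stack([np.ones_like(x), x])  # local model: a + b * x
    smoothed = np.empty_like(y)
    for idx, center in enumerate(x):
        weights = np.exp(-(((x - center) / bandwidth) ** 2))
        root_w = np.sqrt(weights)
        # Weighted least squares via row scaling by sqrt(weights).
        coef, *_ = np.linalg.lstsq(basis * root_w[:, None], y * root_w, rcond=None)
        smoothed[idx] = coef[0] + coef[1] * center
    return smoothed


nodes = np.linspace(0.0, 1.0, 11)
linear_field = 2.0 * nodes + 1.0
smoothed_linear = mls_smooth(nodes, linear_field, bandwidth=0.5)

rng = np.random.default_rng(0)
noisy_field = linear_field + 0.05 * rng.standard_normal(nodes.shape)
smoothed_noisy = mls_smooth(nodes, noisy_field, bandwidth=0.5)
```

A linear basis reproduces linear displacement fields exactly, so smoothing removes noise without biasing a uniform strain state. Enforcing positive strain would turn each local fit into a constrained optimization problem.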
Article 1798
Title@2025-05-24 (6): Soft Weighted Machine Unlearning
Title: Soft Weighted Machine Unlearning | Weichgewichtete Maschine nicht lernen | 软加权机器脱学 2505.18783v1 |
Authors: Xinbao Qiao, Ningning Ding, Yushi Cheng, Meng Zhang
Machine unlearning, as a post-hoc processing technique, has gained widespread adoption in addressing challenges like bias mitigation and robustness enhancement (colloquially, machine unlearning for fairness and robustness). However, existing non-privacy unlearning-based solutions persist in using a binary data removal framework designed for privacy-driven motivation, leading to significant information loss, a phenomenon known as over-unlearning. While over-unlearning has been largely described in many studies as primarily causing utility degradation, we investigate its fundamental causes and provide deeper insights in this work through counterfactual leave-one-out analysis. In this paper, we introduce a weighted influence function that assigns tailored weights to each sample by solving a convex quadratic programming problem analytically. Building on this, we propose a soft-weighted framework enabling fine-grained model adjustments to address the over-unlearning challenge. We demonstrate that the proposed soft-weighted scheme is versatile and can be seamlessly integrated into most existing unlearning algorithms. Extensive experiments show that in fairness- and robustness-driven tasks, the soft-weighted scheme significantly outperforms hard-weighted schemes in fairness/robustness metrics and alleviates the decline in the utility metric, thereby enhancing machine unlearning as an effective correction solution.
Article 1799
Title@2025-05-24 (6): One Policy but Many Worlds: A Scalable Unified Policy for Versatile Humanoid Locomotion
Title: One Policy but Many Worlds: A Scalable Unified Policy for Versatile Humanoid Locomotion | Eine Politik, aber viele Welten: Eine skalierbare, einheitliche Politik für vielseitige humanoide Lokomotion | 一个政策,但许多世界:一个可扩展的统一政策,促进有生命力的人类活动 2505.18780v1 |
Authors: Yahao Fan, Tianxiang Gui, Kaiyang Ji, Shutong Ding, Chixuan Zhang, Jiayuan Gu, Jingyi Yu, Jingya Wang, Ye Shi
Humanoid locomotion faces a critical scalability challenge: traditional reinforcement learning (RL) methods require task-specific rewards and struggle to leverage growing datasets, even as more training terrains are introduced. We propose DreamPolicy, a unified framework that enables a single policy to master diverse terrains and generalize zero-shot to unseen scenarios by systematically integrating offline data and diffusion-driven motion synthesis. At its core, DreamPolicy introduces Humanoid Motion Imagery (HMI) - future state predictions synthesized through an autoregressive terrain-aware diffusion planner curated by aggregating rollouts from specialized policies across various distinct terrains. Unlike human motion datasets requiring laborious retargeting, our data directly captures humanoid kinematics, enabling the diffusion planner to synthesize “dreamed” trajectories that encode terrain-specific physical constraints. These trajectories act as dynamic objectives for our HMI-conditioned policy, bypassing manual reward engineering and enabling cross-terrain generalization. DreamPolicy addresses the scalability limitations of prior methods: while traditional RL fails to exploit growing datasets, our framework scales seamlessly with more offline data. As the dataset expands, the diffusion prior learns richer locomotion skills, which the policy leverages to master new terrains without retraining. Experiments demonstrate that DreamPolicy achieves an average success rate of 90% in training environments and an average of 20% higher success on unseen terrains than the prevalent method. It also generalizes to perturbed and composite scenarios where prior approaches collapse. By unifying offline data, diffusion-based trajectory synthesis, and policy optimization, DreamPolicy overcomes the “one task, one policy” bottleneck, establishing a paradigm for scalable, data-driven humanoid control.
Article 1800
Title@2025-05-24 (6): HD-PiSSA: High-Rank Distributed Orthogonal Adaptation
Title: HD-PiSSA: High-Rank Distributed Orthogonal Adaptation | HD-PiSSA: High-Rank verteilte Orthogonalanpassung | HD-PiSSA: 高射分散的正心调整适应 2505.18777v1 |
Authors: Yiding Wang, Fauxu Meng, Xuefeng Zhang, Fan Jiang, Pingzhi Tang, Muhan Zhang
Existing parameter-efficient fine-tuning (PEFT) methods for large language models (LLMs), such as LoRA and PiSSA, constrain model updates to low-rank subspaces, limiting their expressiveness and leading to suboptimal performance on complex tasks. To address this, we introduce High-rank Distributed PiSSA (HD-PiSSA), a distributed PEFT approach that initializes orthogonal adapters across different devices and aggregates their delta updates collectively on W for fine-tuning. Unlike Data Parallel LoRA or PiSSA, which maintain identical adapters across all devices, HD-PiSSA assigns different principal components of the pre-trained weights to each GPU, significantly expanding the range of update directions. This results in over 16x higher effective updated ranks than data-parallel LoRA or PiSSA when fine-tuning on 8 GPUs with the same per-device adapter rank. Empirically, we evaluate HD-PiSSA across various challenging downstream tasks, including mathematics, code generation, and multi-task learning. In the multi-task setting, HD-PiSSA achieves average gains of 10.0 absolute points (14.63%) over LoRA and 4.98 points (6.60%) over PiSSA across 12 benchmarks, demonstrating its benefits from the extra optimization flexibility.
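The rank argument can be checked numerically on a toy matrix. The sketch below assumes a PiSSA-style initialization from the SVD of the weight matrix, gives each simulated device a different block of principal components, and uses random perturbations as stand-ins for real gradient updates. The helper names, matrix size, rank, and device count are all illustrative choices, not the paper's implementation.

```python
import numpy as np


def pissa_slices(W, rank, num_devices):
    """Assign device k the k-th block of `rank` principal components of W."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    adapters = []
    for k in range(num_devices):
        block = slice(k * rank, (k + 1) * rank)
        root = np.sqrt(S[block])
        A = U[:, block] * root          # shape (d_out, rank)
        B = root[:, None] * Vt[block]   # shape (rank, d_in)
        adapters.append((A, B))
    return adapters


def aggregated_delta(adapters, rng, scale=0.1):
    """Sum the weight changes (A + dA)(B + dB) - AB over all given adapters.

    Random dA, dB stand in for one optimizer step on each device.
    """
    total = 0.0
    for A, B in adapters:
        dA = scale * rng.standard_normal(A.shape)
        dB = scale * rng.standard_normal(B.shape)
        total = total + (A + dA) @ (B + dB) - A @ B
    return total


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
rank, num_devices = 2, 4
distinct = pissa_slices(W, rank, num_devices)

# Distinct adapters per device: the summed update can reach rank 2 * rank * num_devices.
rank_distinct = np.linalg.matrix_rank(aggregated_delta(distinct, rng))
# Data-parallel baseline: all devices hold the same adapter and, after gradient
# averaging, apply the same update, so a single copy determines the rank.
rank_shared = np.linalg.matrix_rank(aggregated_delta(distinct[:1], rng))
```

In this toy setting the aggregated update has rank 16 with distinct adapters versus 4 with a shared one, which mirrors the abstract's point that spreading principal components across devices widens the set of update directions.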
Article 1801
Title@2025-05-24 (6): Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models
Title: Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models | Starke Mitgliedschafts-Inferenzangriffe auf massive Datensätze und (Moderate) große Sprachmodelle | 对大规模数据集和(口头)大语言模型的强烈成员推论攻击 2505.18773v1 |
Authors: Jamie Hayes, Ilia Shumailov, Christopher A. Choquette-Choo, Matthew Jagielski, George Kaissis, Katherine Lee, Milad Nasr, Sahra Ghalebikesabi, Niloofar Mireshghallah, Meenatchi Sundaram Mutu Selva Annamalai, Igor Shilov, Matthieu Meeus, Yves-Alexandre de Montjoye, Franziska Boenisch, Adam Dziedzic, A. Feder Cooper
State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). As a result, prior research has either relied on weaker attacks that avoid training reference models (e.g., fine-tuning attacks), or on stronger attacks applied to small-scale models and datasets. However, weaker attacks have been shown to be brittle - achieving close-to-arbitrary success - and insights from strong attacks in simplified settings do not translate to today’s LLMs. These challenges have prompted an important question: are the limitations observed in prior work due to attack design choices, or are MIAs fundamentally ineffective on LLMs? We address this question by scaling LiRA - one of the strongest MIAs - to GPT-2 architectures ranging from 10M to 1B parameters, training reference models on over 20B tokens from the C4 dataset. Our results advance the understanding of MIAs on LLMs in three key ways: (1) strong MIAs can succeed on pre-trained LLMs; (2) their effectiveness, however, remains limited (e.g., AUC<0.7) in practical settings; and, (3) the relationship between MIA success and related privacy metrics is not as straightforward as prior work has suggested.
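To show why reference models matter for attacks of this kind, here is a simplified likelihood-ratio score in the spirit of LiRA. It assumes that, for one candidate example, a scalar statistic (for instance a loss-derived value) has already been collected from reference models trained with the example ("in") and without it ("out"), and that each set is modelled by a Gaussian. The function names and the toy numbers are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log-density of a normal distribution with mean mu and std sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)


def likelihood_ratio_score(
    target_stat: float,
    in_stats: Sequence[float],
    out_stats: Sequence[float],
) -> float:
    """Log-likelihood ratio: positive values favour 'member of the training set'."""
    mu_in, sd_in = mean(in_stats), stdev(in_stats)
    mu_out, sd_out = mean(out_stats), stdev(out_stats)
    return gaussian_logpdf(target_stat, mu_in, sd_in) - gaussian_logpdf(
        target_stat, mu_out, sd_out
    )


# Toy statistics from three "in" and three "out" reference models.
in_stats = [4.0, 5.0, 6.0]
out_stats = [0.0, 1.0, 2.0]
score_member_like = likelihood_ratio_score(5.0, in_stats, out_stats)
score_nonmember_like = likelihood_ratio_score(1.0, in_stats, out_stats)
```

Each example needs its own pair of fitted distributions, which is why the number of reference models drives the cost of scaling such attacks to large language models.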
Article 1802
Title@2025-05-24 (6): CageNet: A Meta-Framework for Learning on Wild Meshes
Title: CageNet: A Meta-Framework for Learning on Wild Meshes | CageNet: Ein Meta-Rahmen für das Lernen auf Wild Meshes | CageNet:野生动物类学习的元框架 2505.18772v1 |
Authors: Michal Edelstein, Hsueh-Ti Derek Liu, Mirela Ben-Chen
Learning on triangle meshes has recently proven to be instrumental to a myriad of tasks, from shape classification, to segmentation, to deformation and animation, to mention just a few. While some of these applications are tackled through neural network architectures which are tailored to the application at hand, many others use generic frameworks for triangle meshes where the only customization required is the modification of the input features and the loss function. Our goal in this paper is to broaden the applicability of these generic frameworks to “wild” meshes, i.e., meshes in the wild, which often have multiple components, non-manifold elements, disrupted connectivity, or a combination of these. We propose a configurable meta-framework based on the concept of caged geometry: given a mesh, a cage is a single-component manifold triangle mesh that envelops it closely. Generalized barycentric coordinates map between functions on the cage and functions on the mesh, allowing us to learn and test on a variety of data, in different applications. We demonstrate this concept by learning segmentation and skinning weights on difficult data, achieving better performance than state-of-the-art techniques on wild meshes.
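The mapping from cage functions to mesh functions is a weighted sum per mesh vertex. The toy sketch below uses ordinary barycentric coordinates in a single 2D triangle as the simplest stand-in for generalized barycentric coordinates on a 3D cage; the function names are hypothetical and nothing here is taken from the CageNet code.

```python
def triangle_barycentric(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return [w_a, w_b, 1.0 - w_a - w_b]


def cage_to_mesh(weights, cage_values):
    """Map a function on cage vertices to mesh vertices.

    weights[i][j] is the coordinate of mesh vertex i with respect to cage
    vertex j; each row is assumed to sum to 1.
    """
    return [sum(w * f for w, f in zip(row, cage_values)) for row in weights]
```

For example, with the cage triangle `(0, 0), (1, 0), (0, 1)` and a mesh point at `(0.25, 0.25)`, the weights are `[0.5, 0.25, 0.25]`, and any function that is linear on the cage is reproduced exactly at the mesh point.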
Article 1803
Title@2025-05-24 (6): Dual-Path Stable Soft Prompt Generation for Domain Generalization
Title: Dual-Path Stable Soft Prompt Generation for Domain Generalization | Dual-Path stabile Soft Prompt Generation für Domain-Verallgemeinerung | 两平面稳定软软生成域通用化快速生成 2505.18770v1 |
Authors: Yuedi Zhang, Shuanghao Bai, Wanqi Zhou, Zhirong Luan, Badong Chen
Domain generalization (DG) aims to learn a model using data from one or multiple related but distinct source domains that can generalize well to unseen out-of-distribution target domains. Inspired by the success of large pre-trained vision-language models (VLMs), prompt tuning has emerged as an effective generalization strategy. However, it often struggles to capture domain-specific features due to its reliance on manually designed or fixed prompt inputs. Recently, some prompt generation methods have addressed this limitation by dynamically generating instance-specific and domain-specific prompts for each input, enriching domain information and demonstrating potential for enhanced generalization. Through further investigation, we identify a notable issue in existing prompt generation methods: the same input often yields significantly different and suboptimal prompts across different random seeds, a phenomenon we term Prompt Variability. To address this, we introduce negative learning into the prompt generation process and propose Dual-Path Stable Soft Prompt Generation (DPSPG), a transformer-based framework designed to improve both the stability and generalization of prompts. Specifically, DPSPG incorporates a complementary prompt generator to produce negative prompts, thereby reducing the risk of introducing misleading information. Both theoretical and empirical analyses demonstrate that negative learning leads to more robust and effective prompts by increasing the effective margin and reducing the upper bound of the gradient norm. Extensive experiments on five DG benchmark datasets show that DPSPG consistently outperforms state-of-the-art methods while maintaining prompt stability.
Article 1804
Title@2025-05-24 (6): Multiple Wasserstein Gradient Descent Algorithm for Multi-Objective Distributional Optimization
Title: Multiple Wasserstein Gradient Descent Algorithm for Multi-Objective Distributional Optimization | Vielfacher Wasserstein Gradient Descent Algorithmus für Multi-Objective Distributional Optimization | 多目标分布优化多瓦森斯坦梯度底源值 2505.18765v1 |
Authors: Dai Hai Nguyen, Hiroshi Mamitsuka, Atsuyoshi Nakamura
We address the optimization problem of simultaneously minimizing multiple objective functionals over a family of probability distributions. This type of Multi-Objective Distributional Optimization commonly arises in machine learning and statistics, with applications in areas such as multiple target sampling, multi-task learning, and multi-objective generative modeling. To solve this problem, we propose an iterative particle-based algorithm, which we call Multiple Wasserstein Gradient Descent (MWGraD). It constructs a flow of intermediate empirical distributions, each represented by a set of particles, that gradually minimizes the multiple objective functionals simultaneously. Specifically, MWGraD consists of two key steps at each iteration. First, it estimates the Wasserstein gradient for each objective functional based on the current particles. Then, it aggregates these gradients into a single Wasserstein gradient using dynamically adjusted weights and updates the particles accordingly. In addition, we provide theoretical analysis and present experimental results on both synthetic and real-world datasets, demonstrating the effectiveness of MWGraD.
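The two steps per iteration (estimate one gradient per objective, then aggregate with adaptive weights) can be illustrated on a toy case. The sketch below is an assumption-laden stand-in, not MWGraD itself: it handles two objectives only, takes each objective to be a potential energy so that the Wasserstein gradient at a particle is simply the gradient of the potential there, and picks the weight by a min-norm rule in the style of multiple-gradient descent. The abstract does not state how the paper chooses its weights, and all function names are hypothetical.

```python
def min_norm_weight(g1, g2):
    """Weight alpha in [0, 1] minimizing ||alpha * g1 + (1 - alpha) * g2||^2."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(g1, g2))
    if diff_sq == 0.0:
        return 0.5  # gradients coincide, so any weight gives the same direction
    alpha = sum((b - a) * b for a, b in zip(g1, g2)) / diff_sq
    return min(1.0, max(0.0, alpha))


def aggregated_particle_step(particles, grad_v1, grad_v2, lr=0.1):
    """One update of 1D particles along the aggregated gradient field."""
    g1 = [grad_v1(x) for x in particles]  # gradient field of objective 1
    g2 = [grad_v2(x) for x in particles]  # gradient field of objective 2
    alpha = min_norm_weight(g1, g2)       # one weight shared by all particles
    return [x - lr * (alpha * a + (1.0 - alpha) * b)
            for x, a, b in zip(particles, g1, g2)]
```

With potentials centred at +1 and -1, a single particle at 0 receives opposing gradients, the min-norm weight is 0.5, and the particle does not move, which is the expected behaviour at a Pareto-stationary point.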
Article 1805
Title@2025-05-24 (6): Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model
Title: Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model | Textgeführte Multi-Property-Molekularoptimierung mit einem Diffusions-Sprachenmodell | 带有传播语言模型的文本引导多财产分子优化 2410.13597v2 |
Authors: Yida Xiong, Kun Li, Jiameng Chen, Hongzhi Zhang, Di Lin, Yan Che, Wenbin Hu
Molecular optimization (MO) is a crucial stage in drug discovery in which task-oriented generated molecules are optimized to meet practical industrial requirements. Existing mainstream MO approaches primarily utilize external property predictors to guide iterative property optimization. However, learning all molecular samples in the vast chemical space is unrealistic for predictors. As a result, errors and noise are inevitably introduced during property prediction due to the nature of approximation. This leads to discrepancy accumulation, reduced generalization, and suboptimal molecular candidates. In this paper, we propose a text-guided multi-property molecular optimization method utilizing a transformer-based diffusion language model (TransDLM). TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions, thereby mitigating error propagation during the diffusion process. By fusing physically and chemically detailed textual semantics with specialized molecular representations, TransDLM effectively integrates diverse information sources to guide precise optimization, which enhances the model’s ability to balance structural retention and property enhancement. Additionally, the success of a case study further demonstrates TransDLM’s ability to solve practical problems. Experimentally, our approach surpasses state-of-the-art methods in maintaining molecular structural similarity and enhancing chemical properties on the benchmark dataset. The code is available at: https://github.com/Cello2195/TransDLM.
Article 1806
Title@2025-05-24 (6): How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark
Title: How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark | Wie wird LLM-Reasoning vom irrelevanten Kontext abgelenkt? Eine Analyse mit einem kontrollierten Benchmark | LLM 为何被不相关背景所忽略? 2505.18761v1 |
Authors: Minglai Yang, Ethan Huang, Liang Zhang, Mihai Surdeanu, William Wang, Liangming Pan
We introduce Grade School Math with Distracting Context (GSM-DC), a synthetic benchmark to evaluate Large Language Models’ (LLMs) reasoning robustness against systematically controlled irrelevant context (IC). GSM-DC constructs symbolic reasoning graphs with precise distractor injections, enabling rigorous, reproducible evaluation. Our experiments demonstrate that LLMs are significantly sensitive to IC, affecting both reasoning path selection and arithmetic accuracy. Additionally, training models with strong distractors improves performance in both in-distribution and out-of-distribution scenarios. We further propose a stepwise tree search guided by a process reward model, which notably enhances robustness in out-of-distribution conditions.
Article 1807
Title@2025-05-24 (6): The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation
Title: The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation | Die Suche nach einer effizienten Begründung: Ein datenzentrischer Benchmark zur CoT-Destillation | 有效合理理由的查询:COT蒸馏的数据中心基准 2505.18759v1 |
Authors: Ruichen Zhang, Rana Muhammad Shahroz Khan, Zhen Tan, Dawei Li, Song Wang, Tianlong Chen
Data-centric distillation, including data augmentation, selection, and mixing, offers a promising path to creating smaller, more efficient student Large Language Models (LLMs) that retain strong reasoning abilities. However, a comprehensive benchmark that systematically assesses the effect of each distillation approach is still lacking. This paper introduces DC-CoT, the first data-centric benchmark that investigates data manipulation in chain-of-thought (CoT) distillation from method, model and data perspectives. Utilizing various teacher models (e.g., o4-mini, Gemini-Pro, Claude-3.5) and student architectures (e.g., 3B, 7B parameters), we rigorously evaluate the impact of these data manipulations on student model performance across multiple reasoning datasets, with a focus on in-distribution (IID) and out-of-distribution (OOD) generalization, and cross-domain transfer. Our findings aim to provide actionable insights and establish best practices for optimizing CoT distillation through data-centric techniques, ultimately facilitating the development of more accessible and capable reasoning models. The dataset can be found at https://huggingface.co/datasets/rana-shahroz/DC-COT, while our code is shared at https://anonymous.4open.science/r/DC-COT-FF4C/.
Article 1808
Title@2025-05-24 (6): Lean and Mean Adaptive Optimization via Subset-Norm and Subspace-Momentum with Convergence Guarantees
Title: Lean and Mean Adaptive Optimization via Subset-Norm and Subspace-Momentum with Convergence Guarantees | Lean and Mean Adaptive Optimization via Subset-Norm und Subspace-Momentum mit Konvergenzgarantien | 通过具有聚合担保的子元和子空间动力及子空间动力进行皮和平均适应性优化 2411.07120v2 |
Authors: Thien Hang Nguyen, Huy Le Nguyen
We introduce two complementary techniques for efficient optimization that reduce memory requirements while accelerating training of large-scale neural networks. The first technique, Subset-Norm step size, generalizes AdaGrad-Norm and AdaGrad(-Coordinate) through step-size sharing. Subset-Norm (SN) reduces AdaGrad’s memory footprint from $O(d)$ to $O(\sqrt{d})$, where $d$ is the model size. For non-convex smooth objectives under coordinate-wise sub-gaussian noise, we show a noise-adapted high-probability convergence guarantee with improved dimensional dependence of SN over existing methods. Our second technique, Subspace-Momentum, reduces the momentum state’s memory footprint by restricting momentum to a low-dimensional subspace while performing SGD in the orthogonal complement. We prove a high-probability convergence result for Subspace-Momentum under standard assumptions. Empirical evaluation on pre-training and fine-tuning LLMs demonstrates the effectiveness of our methods. For instance, combining Subset-Norm with Subspace-Momentum achieves Adam’s validation perplexity for LLaMA 1B in approximately half the training tokens (6.8B vs 13.1B) while reducing Adam’s optimizer-states memory footprint by more than 80% with minimal additional hyperparameter tuning.
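Step-size sharing can be illustrated with a small AdaGrad-style update in which every subset of coordinates shares one accumulator. This is a sketch under stated assumptions rather than the paper's optimizer: it uses contiguous, equally sized subsets, plain Python lists, and hypothetical names (`init_accumulators`, `subset_norm_step`). One subset covering all coordinates recovers AdaGrad-Norm, subsets of size 1 recover coordinate-wise AdaGrad, and subsets of size about $\sqrt{d}$ leave about $\sqrt{d}$ accumulators.

```python
import math


def init_accumulators(d, subset_size):
    """One squared-gradient accumulator per subset of coordinates."""
    return [0.0] * math.ceil(d / subset_size)


def subset_norm_step(params, grads, accum, subset_size, lr=0.1, eps=1e-8):
    """AdaGrad-style step where each contiguous subset shares one step size.

    accum is updated in place; the new parameter list is returned.
    """
    new_params = list(params)
    for s in range(len(accum)):
        lo = s * subset_size
        hi = min(lo + subset_size, len(params))
        # Accumulate the squared norm of this subset's gradient slice.
        accum[s] += sum(g * g for g in grads[lo:hi])
        step = lr / (math.sqrt(accum[s]) + eps)
        for i in range(lo, hi):
            new_params[i] = params[i] - step * grads[i]
    return new_params
```

For a model with 16 parameters and `subset_size=4`, the optimizer state holds 4 numbers instead of 16, which is the memory saving the abstract describes at small scale.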
Article 1809
Title@2025-05-24 (6): Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding
Title: Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding | Reduzierung der Speicherung vortrainierter neuraler Netzwerke durch ratenkontrainierte Quantisierung und Entropiecodierung | 通过受费率限制的量化和元件编码减少储存预培训神经网络 2505.18758v1 |
Authors: Alexander Conzelmann, Robert Bamler
The ever-growing size of neural networks poses serious challenges on resource-constrained devices, such as embedded sensors. Compression algorithms that reduce their size can mitigate these problems, provided that model performance stays close to the original. We propose a novel post-training compression framework that combines rate-aware quantization with entropy coding by (1) extending the well-known layer-wise loss by a quadratic rate estimation, and (2) providing locally exact solutions to this modified objective following the Optimal Brain Surgeon (OBS) method. Our method allows for very fast decoding and is compatible with arbitrary quantization grids. We verify our results empirically by testing on various computer-vision networks, achieving a 20-40% decrease in bit rate at the same performance as the popular compression algorithm NNCodec. Our code is available at https://github.com/Conzel/cerwu.
Article 1810
Title@2025-05-24 (6): Smart Energy Guardian: A Hybrid Deep Learning Model for Detecting Fraudulent PV Generation
Title: Smart Energy Guardian: A Hybrid Deep Learning Model for Detecting Fraudulent PV Generation | Smart Energy Guardian: Ein hybrides Deep-Learning-Modell zur Erkennung betrügerischer PV-Generation | 智能能源守护者:发现欺诈性光电池发电的混合深学习模式 2505.18755v1 |
Authors: Xiaolu Chen, Chenghao Huang, Yanru Zhang, Hao Wang
With the proliferation of smart grids, smart cities face growing challenges due to cyber-attacks and sophisticated electricity theft behaviors, particularly in residential photovoltaic (PV) generation systems. Traditional Electricity Theft Detection (ETD) methods often struggle to capture complex temporal dependencies and to integrate multi-source data, limiting their effectiveness. In this work, we propose an efficient ETD method that accurately identifies fraudulent behaviors in residential PV generation, thus ensuring the supply-demand balance in smart cities. Our hybrid deep learning model, combining multi-scale Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Transformer, excels in capturing both short-term and long-term temporal dependencies. Additionally, we introduce a data embedding technique that seamlessly integrates time-series data with discrete temperature variables, enhancing detection robustness. Extensive simulation experiments using real-world data validate the effectiveness of our approach, demonstrating significant improvements in the accuracy of detecting sophisticated energy theft activities, thereby contributing to the stability and fairness of energy systems in smart cities.
Article 1811
Title@2025-05-24 (6): HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting
Title: HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting | HiMoE: Heterogenitäts-informierte Mixture-of-Experts für faire räumlich-zeitliche Vorhersagen | HimMoE:公平空间-时空预报专家的异异质性异构混合 2412.00316v3 |
Authors: Shaohan Yu, Pan Deng, Yu Zhao, Junting Liu, Zi’ang Wang
Achieving both accurate and consistent predictive performance across spatial nodes is crucial for ensuring the validity and reliability of outcomes in fair spatial-temporal forecasting tasks. However, existing training methods treat heterogeneous nodes with a fully averaged perspective, resulting in inherently biased prediction targets. Balancing accuracy and consistency is particularly challenging due to the multi-objective nature of spatial-temporal forecasting. To address this issue, we propose a novel Heterogeneity-Informed Mixture-of-Experts (HiMoE) framework that delivers both uniform and precise spatial-temporal predictions. From a model architecture perspective, we design the Heterogeneity-Informed Graph Convolutional Network (HiGCN) to address trend heterogeneity, and we introduce the Node-wise Mixture-of-Experts (NMoE) module to handle cardinality heterogeneity across nodes. From an evaluation perspective, we propose STFairBench, a benchmark that handles fairness in spatial-temporal prediction from both training and evaluation stages. Extensive experiments on four real-world datasets demonstrate that HiMoE achieves state-of-the-art performance, outperforming the best baseline by at least 9.22% across all evaluation metrics.
Article 1812
Title@2025-05-24 (6): Season-Independent PV Disaggregation Using Multi-Scale Net Load Temporal Feature Extraction and Weather Factor Fusion
Title: Season-Independent PV Disaggregation Using Multi-Scale Net Load Temporal Feature Extraction and Weather Factor Fusion | Saisonunabhängige PV-Disaggregation mittels Multi-Scale Net Load Temporal Feature Extraktion und Wetterfaktor Fusion | 使用多种规模净负荷时间特征抽取和天气因素融合的季节独立光电池拆分 2505.18747v1 |
Authors: Xiaolu Chen, Chenghao Huang, Yanru Zhang, Hao Wang
With the advancement of the energy Internet and energy system integration, the increasing adoption of distributed photovoltaic (PV) systems presents new challenges for smart monitoring and measurement by utility companies, particularly in separating PV generation from net electricity load. Existing methods struggle with extracting features from net load and capturing the relevance between weather factors. This paper proposes a PV disaggregation method that integrates Hierarchical Interpolation (HI) and multi-head self-attention mechanisms. By using HI to extract net load features and multi-head self-attention to capture the complex dependencies between weather factors, the method achieves precise PV generation predictions. Simulation experiments demonstrate the effectiveness of the proposed method on real-world data, supporting improved monitoring and management of distributed energy systems.
Article 1813
Title@2025-05-24 (6): C3R: Channel Conditioned Cell Representations for unified evaluation in microscopy imaging
Title: C3R: Channel Conditioned Cell Representations for unified evaluation in microscopy imaging | C3R: Kanalkonditionierte Zelldarstellungen zur einheitlichen Auswertung in der Mikroskopie-Bildgebung | C3R:用于对显微镜成像进行统一评价的有条件细胞代表的频道 2505.18745v1 |
Authors: Umar Marikkar, Syed Sameed Husain, Muhammad Awais, Sara Atito
Immunohistochemical (IHC) images reveal detailed information about structures and functions at the subcellular level. However, unlike natural images, IHC datasets pose challenges for deep learning models due to their inconsistencies in channel count and configuration, stemming from varying staining protocols across laboratories and studies. Existing approaches build channel-adaptive models, which unfortunately fail to support out-of-distribution (OOD) evaluation across IHC datasets and cannot be applied in a true zero-shot setting with mismatched channel counts. To address this, we introduce a structured view of cellular image channels by grouping them into either context or concept, where we treat the context channels as a reference to the concept channels in the image. We leverage this context-concept principle to develop Channel Conditioned Cell Representations (C3R), a framework designed for unified evaluation on in-distribution (ID) and OOD datasets. C3R is a two-fold framework comprising a channel-adaptive encoder architecture and a masked knowledge distillation training strategy, both built around the context-concept principle. We find that C3R outperforms existing benchmarks on both ID and OOD tasks, while a trivial implementation of our core idea also outperforms the channel-adaptive methods reported on the CHAMMI benchmark. Our method opens a new pathway for cross-dataset generalization between IHC datasets, without requiring dataset-specific adaptation or retraining.
Article 1814
Title@2025-05-24 (6): Interpretable Company Similarity with Sparse Autoencoders
Title: Interpretable Company Similarity with Sparse Autoencoders | Interpretierbare Firmenähnlichkeit mit Sparse Autoencodern | 与Sparse Autoencolders 相似 2412.02605v3 |
Authors: Marco Molinari, Victor Shao, Luca Imeneo, Mateusz Mikolajczak, Vladimir Tregubiak, Abhimanyu Pandey, Sebastian Kuznetsov Ryder Torres Pereira
Determining company similarity is a vital task in finance, underpinning risk management, hedging, and portfolio diversification. Practitioners often rely on sector and industry classifications such as SIC and GICS codes to gauge similarity, the former being used by the U.S. Securities and Exchange Commission (SEC), and the latter widely used by the investment community. Since these classifications lack granularity and need regular updating, using clusters of embeddings of company descriptions has been proposed as a potential alternative, but the lack of interpretability in token embeddings poses a significant barrier to adoption in high-stakes contexts. Sparse Autoencoders (SAEs) have shown promise in enhancing the interpretability of Large Language Models (LLMs) by decomposing Large Language Model (LLM) activations into interpretable features. Moreover, SAEs capture an LLM’s internal representation of a company description, as opposed to semantic similarity alone, as is the case with embeddings. We apply SAEs to company descriptions, and obtain meaningful clusters of equities. We benchmark SAE features against SIC-codes, Industry codes, and Embeddings. Our results demonstrate that SAE features surpass sector classifications and embeddings in capturing fundamental company characteristics. This is evidenced by their superior performance in correlating logged monthly returns - a proxy for similarity - and generating higher Sharpe ratios in co-integration trading strategies, which underscores deeper fundamental similarities among companies. Finally, we verify the interpretability of our clusters, and demonstrate that sparse features form simple and interpretable explanations for our clusters.
Article 1815
Title@2025-05-24 (6): Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models
Title: Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models | Feature-Extraktion und -Lenkung für eine verbesserte Kettenbildung in Sprachmodellen | 语言模型中强化研究链理由的特征采掘和指南 2505.15634v2 |
Authors: Zihao Li, Xu Wang, Yuzhe Yang, Ziyu Yao, Haoyi Xiong, Mengnan Du
Large Language Models (LLMs) demonstrate the ability to solve reasoning and mathematical problems using the Chain-of-Thought (CoT) technique. Expanding CoT length, as seen in models such as DeepSeek-R1, significantly enhances this reasoning for complex problems, but requires costly and high-quality long CoT data and fine-tuning. This work, inspired by the deep thinking paradigm of DeepSeek-R1, utilizes a steering technique to enhance the reasoning ability of an LLM without external datasets. Our method first employs Sparse Autoencoders (SAEs) to extract interpretable features from vanilla CoT. These features are then used to steer the LLM’s internal states during generation. Recognizing that many LLMs do not have corresponding pre-trained SAEs, we further introduce a novel SAE-free steering algorithm, which directly computes steering directions from the residual activations of an LLM, obviating the need for an explicit SAE. Experimental results demonstrate that both our SAE-based and subsequent SAE-free steering algorithms significantly enhance the reasoning capabilities of LLMs.
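Activation steering of the kind described here amounts to adding a direction vector to a hidden state during generation. The sketch below is illustrative only: the abstract does not say how the SAE-free direction is computed, so this uses a difference of mean activations between two sets of examples, a common choice that may differ from the paper's algorithm. Activations are plain lists of floats and all names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_direction(target_acts, baseline_acts):
    """Unit vector from the baseline mean activation to the target mean.

    Assumption: difference of means, not necessarily the paper's method.
    """
    diff = [t - b for t, b in zip(mean_vector(target_acts), mean_vector(baseline_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff] if norm > 0 else diff


def steer(hidden, direction, strength):
    """Shift one residual-stream vector along the steering direction."""
    return [h + strength * d for h, d in zip(hidden, direction)]
```

In practice `steer` would be applied inside a forward hook at a chosen layer; the layer and the value of `strength` are hyperparameters that this sketch leaves open.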
Article 1816
Title@2025-05-24 (6): An Interpretable Deep-Learning Framework for Predicting Hospital Readmissions From Electronic Health Records
Title: An Interpretable Deep-Learning Framework for Predicting Hospital Readmissions From Electronic Health Records | Ein interpretierbarer Deep-Learning-Rahmen für die Vorhersage von Krankenhausrückübernahmen aus elektronischen Gesundheitsakten | 预测医院从电子健康记录中读取的医院可解释的深学习框架 2310.10187v2 |
Authors: Fabio Azzalini, Tommaso Dolci, Marco Vagaggini
With the increasing availability of patient data, modern medicine is shifting towards prospective healthcare. Electronic health records offer a variety of information useful for clinical patient characterization and the development of predictive models, given that similar medical histories often lead to analogous health progressions. One application is the prediction of unplanned hospital readmissions, an essential task for reducing healthcare costs and improving patient outcomes. While predictive models demonstrate strong performances especially with deep learning approaches, they are often criticized for their lack of interpretability, a critical requirement in the medical domain where incorrect predictions may have severe consequences for patient safety. In this paper, we propose a novel and interpretable deep learning framework for predicting unplanned hospital readmissions, supported by NLP findings on word embeddings and by ConvLSTM neural networks for better handling temporal data. We validate the framework on two predictive tasks for hospital readmission within 30 and 180 days, using real-world data. Additionally, we introduce and evaluate a model-dependent technique designed to enhance result interpretability for medical professionals. Our solution outperforms traditional machine learning models in prediction accuracy while simultaneously providing more interpretable results.
Article 1817
Title@2025-05-24 (6): AuroRA: Breaking Low-Rank Bottleneck of LoRA with Nonlinear Mapping
Title: AuroRA: Breaking Low-Rank Bottleneck of LoRA with Nonlinear Mapping | AuroRA: Breaking Low-Rank Engpass von LoRA mit nichtlinearer Kartierung | AuroRA:用非线性绘图法打破LORA的低兰克瓶尾裂 2505.18738v1 |
Authors: Haonan Dong, Wenhao Zhu, Guojie Song, Liang Wang
Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method validated across NLP and CV domains. However, LoRA faces an inherent low-rank bottleneck: narrowing its performance gap with full finetuning requires increasing the rank of its parameter matrix, resulting in significant parameter overhead. Recent linear LoRA variants have attempted to enhance expressiveness by introducing additional linear mappings; however, their composition remains inherently linear and fails to fundamentally improve LoRA’s representational capacity. To address this limitation, we propose AuroRA, which incorporates an Adaptive Nonlinear Layer (ANL) between two linear projectors to capture fixed and learnable nonlinearities. This combination forms an MLP-like structure with a compressed rank, enabling flexible and precise approximation of diverse target functions while theoretically guaranteeing lower approximation errors and bounded gradients. Extensive experiments on 22 datasets and 6 pretrained models demonstrate that AuroRA: (I) not only matches or surpasses full fine-tuning performance with only 6.18% ~ 25% of LoRA’s parameters but also (II) outperforms state-of-the-art PEFT methods by up to 10.88% in both NLP and CV tasks, and (III) exhibits robust performance across various rank configurations.
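The structural difference from LoRA can be shown with a small forward pass. In the sketch below, LoRA's update `B (A x)` becomes `B tanh(A x)`: a nonlinearity sits between the down-projection `A` and the up-projection `B`. It is a simplified illustration, not the authors' layer: only a fixed `tanh` is used and the learnable part of the Adaptive Nonlinear Layer is omitted, matrices are nested Python lists, and the function names are hypothetical.

```python
import math


def matvec(matrix, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def nonlinear_adapter_forward(x, w_frozen, a_down, b_up):
    """Frozen layer output plus a low-rank update with a nonlinearity inside.

    w_frozen: (out x in) pretrained weight, kept fixed.
    a_down:   (r x in) down-projection; b_up: (out x r) up-projection.
    """
    base = matvec(w_frozen, x)
    hidden = [math.tanh(h) for h in matvec(a_down, x)]  # fixed nonlinearity only
    delta = matvec(b_up, hidden)
    return [b + d for b, d in zip(base, delta)]
```

As with LoRA, initializing `b_up` to zeros makes the adapted layer reproduce the frozen layer exactly at the start of fine-tuning.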
Article 1818
Title@2025-05-24 (6): Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings
Title: Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings | Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings | 知识强化画画视觉表现神经网络 2105.08190v2 |
Authors: Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Marcel Worring, Nachoem Wijnberg
We propose ArtSAGENet, a novel multimodal architecture that integrates Graph Neural Networks (GNNs) and Convolutional Neural Networks (CNNs), to jointly learn visual and semantic-based artistic representations. First, we illustrate the significant advantages of multi-task learning for fine art analysis and argue that it is conceptually a much more appropriate setting in the fine art domain than the single-task alternatives. We further demonstrate that several GNN architectures can outperform strong CNN baselines in a range of fine art analysis tasks, such as style classification, artist attribution, creation period estimation, and tag prediction, while training them requires an order of magnitude less computational time and only a small amount of labeled data. Finally, through extensive experimentation we show that our proposed ArtSAGENet captures and encodes valuable relational dependencies between the artists and the artworks, surpassing the performance of traditional methods that rely solely on the analysis of visual content. Our findings underline a great potential of integrating visual content and semantics for fine art analysis and curation.
Article 1819
Title@2025-05-24 (6): MADCAT: Combating Malware Detection Under Concept Drift with Test-Time Adaptation
Title: MADCAT: Combating Malware Detection Under Concept Drift with Test-Time Adaptation | MADCAT: Bekämpfung der Malware-Erkennung unter Konzept Drift mit Test-Zeit-Anpassung | MADCAT: 在 “ 漂流 “ 概念下,通过测试-时间适应来打击 “ 恶意探测 “ 2505.18734v1 |
Authors: Eunjin Roh, Yigitcan Kaya, Christopher Kruegel, Giovanni Vigna, Sanghyun Hong
We present MADCAT, a self-supervised approach designed to address the concept drift problem in malware detection. MADCAT employs an encoder-decoder architecture and works by test-time training of the encoder on a small, balanced subset of the test-time data using a self-supervised objective. During test-time training, the model learns features that are useful for detecting both previously seen (old) data and newly arriving samples. We demonstrate the effectiveness of MADCAT in continuous Android malware detection settings. MADCAT consistently outperforms baseline methods in detection performance at test time. We also show the synergy between MADCAT and prior approaches in addressing concept drift in malware detection.
Article 1820
Title@2025-05-24 (6): ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search
Title: ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search | ReGUIDE: Dateneffizientes GUI Grounding über räumliche Vernunft und Suche | 数据高效界面:通过空间理性和搜索进行数据高效界面定位 2505.15259v2 |
Authors: Hyunseok Lee, Jeonghoon Kim, Beomjun Kim, Jihoon Tack, Chansong Jo, Jaehong Lee, Cheonbok Park, Sookyo In, Jinwoo Shin, Kang Min Yoo
Recent advances in Multimodal Large Language Models (MLLMs) have enabled autonomous agents to interact with computers via Graphical User Interfaces (GUIs), where accurately localizing the coordinates of interface elements (e.g., buttons) is often required for fine-grained actions. However, this remains significantly challenging, leading prior works to rely on large-scale web datasets to improve the grounding accuracy. In this work, we propose Reasoning Graphical User Interface Grounding for Data Efficiency (ReGUIDE), a novel and effective framework for web grounding that enables MLLMs to learn data efficiently through self-generated reasoning and spatial-aware criticism. More specifically, ReGUIDE learns to (i) self-generate a language reasoning process for the localization via online reinforcement learning, and (ii) criticize the prediction using spatial priors that enforce equivariance under input transformations. At inference time, ReGUIDE further boosts performance through a test-time scaling strategy, which combines spatial search with coordinate aggregation. Our experiments demonstrate that ReGUIDE significantly advances web grounding performance across multiple benchmarks, outperforming baselines with substantially fewer training data points (e.g., only 0.2% of the samples used by the best open-sourced baselines).
Article 1821
Title@2025-05-24 (6): Reward-Driven Interaction: Enhancing Proactive Dialogue Agents through User Satisfaction Prediction
Title: Reward-Driven Interaction: Enhancing Proactive Dialogue Agents through User Satisfaction Prediction | Reward-Driven Interaction: Verbesserung proaktiver Dialog-Agenten durch Nutzerzufriedenheitsvorhersage | 回报率互动:通过用户满意度预测加强积极主动的对话机构 2505.18731v1 |
Authors: Wei Shen, Xiaonan He, Chuheng Zhang, Xuyun Zhang, Xiaolong Xu, Wanchun Dou
Reward-driven proactive dialogue agents require precise estimation of user satisfaction as an intrinsic reward signal to determine optimal interaction strategies. Specifically, this framework triggers clarification questions when it detects potential user dissatisfaction during interactions in an industrial dialogue system. Prior work typically trains a neural network on weak labels generated by a simple model trained on user actions after the current turn. However, existing methods suffer from two critical limitations in real-world scenarios: (1) Noisy Reward Supervision: dependence on weak labels derived from post-hoc user actions introduces bias, and in particular fails to capture satisfaction signals in ASR-error-induced utterances; (2) Long-Tail Feedback Sparsity: the power-law distribution of user queries causes reward prediction accuracy to drop in low-frequency domains. Together, the noise in the weak labels and the power-law distribution of user utterances make it hard for the model to learn good representations of user utterances and sessions. To address these limitations, we propose two auxiliary tasks to improve the representation learning of user utterances and sessions that enhance user satisfaction prediction. The first one is a contrastive self-supervised learning task, which helps the model learn the representation of rare user utterances and identify ASR errors. The second one is a domain-intent classification task, which aids the model in learning the representation of user sessions from long-tailed domains and improving the model’s performance on such domains. The proposed method is evaluated on DuerOS, demonstrating significant improvements in the accuracy of error recognition on rare user utterances and long-tailed domains.
nan
Article 1822
Title@2025-05-24 (6): Influence Functions for Scalable Data Attribution in Diffusion Models
Title: Influence Functions for Scalable Data Attribution in Diffusion Models | Einflussfunktionen für skalierbare Datenzuweisungen in Diffusionsmodellen | 扩散模型中可缩放数据归属的影响函数 2410.13850v5 |
Authors: Bruno Mlodozeniec, Runa Eschenhagen, Juhan Bae, Alexander Immer, David Krueger, Richard Turner
Diffusion models have led to significant advancements in generative modelling. Yet their widespread adoption poses challenges regarding data attribution and interpretability. In this paper, we aim to help address such challenges in diffusion models by developing an influence functions framework. Influence function-based data attribution methods approximate how a model’s output would have changed if some training data were removed. In supervised learning, this is usually used for predicting how the loss on a particular example would change. For diffusion models, we focus on predicting the change in the probability of generating a particular example via several proxy measurements. We show how to formulate influence functions for such quantities and how previously proposed methods can be interpreted as particular design choices in our framework. To ensure scalability of the Hessian computations in influence functions, we systematically develop K-FAC approximations based on generalised Gauss-Newton matrices specifically tailored to diffusion models. We recast previously proposed methods as specific design choices in our framework and show that our recommended method outperforms previous data attribution approaches on common evaluations, such as the Linear Data-modelling Score (LDS) or retraining without top influences, without the need for method-specific hyperparameter tuning.
nan
Article 1823
Title@2025-05-24 (6): Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling
Title: Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling | Message-Passing State-Space-Modelle: Verbesserung des Graphen-Lernens mit moderner Sequenzmodellierung | 传递信息的国家空间模型:利用现代序列模型改进图表学习 2505.18728v1 |
Authors: Andrea Ceni, Alessio Gravina, Claudio Gallicchio, Davide Bacciu, Carola-Bibiane Schonlieb, Moshe Eliasof
The recent success of State-Space Models (SSMs) in sequence modeling has motivated their adaptation to graph learning, giving rise to Graph State-Space Models (GSSMs). However, existing GSSMs operate by applying SSM modules to sequences extracted from graphs, often compromising core properties such as permutation equivariance, message-passing compatibility, and computational efficiency. In this paper, we introduce a new perspective by embedding the key principles of modern SSM computation directly into the Message-Passing Neural Network framework, resulting in a unified methodology for both static and temporal graphs. Our approach, MP-SSM, enables efficient, permutation-equivariant, and long-range information propagation while preserving the architectural simplicity of message passing. Crucially, MP-SSM enables an exact sensitivity analysis, which we use to theoretically characterize information flow and evaluate issues like vanishing gradients and over-squashing in the deep regime. Furthermore, our design choices allow for a highly optimized parallel implementation akin to modern SSMs. We validate MP-SSM across a wide range of tasks, including node classification, graph property prediction, long-range benchmarks, and spatiotemporal forecasting, demonstrating both its versatility and strong empirical performance.
nan
Article 1824
Title@2025-05-24 (6): Length independent generalization bounds for deep SSM architectures via Rademacher contraction and stability constraints
Title: Length independent generalization bounds for deep SSM architectures via Rademacher contraction and stability constraints | Längenunabhängige Verallgemeinerungsgrenzen für tiefe SSM-Architekturen über Rademacher Kontraktion und Stabilitätsbeschränkungen | 通过雷德马赫公司收缩和稳定制约因素对深层的SMS结构进行长度独立概括的界限 2405.20278v3 |
Authors: Dániel Rácz, Mihály Petreczky, Bálint Daróczy
Many state-of-the-art models trained on long-range sequences, for example S4, S5 or LRU, are made of sequential blocks combining State-Space Models (SSMs) with neural networks. In this paper we provide a PAC bound that holds for this kind of architecture with stable SSM blocks and does not depend on the length of the input sequence. Imposing stability of the SSM blocks is a standard practice in the literature, and it is known to help performance. Our results provide a theoretical justification for the use of stable SSM blocks, as the proposed PAC bound decreases as the degree of stability of the SSM blocks increases.
nan
Article 1825
Title@2025-05-24 (6): Audio Geolocation: A Natural Sounds Benchmark
Title: Audio Geolocation: A Natural Sounds Benchmark | Audio Geolocation: Ein natürlicher Klang Benchmark | 音频地理定位:自然声音基准 2505.18726v1 |
Authors: Mustafa Chasmai, Wuao Liu, Subhransu Maji, Grant Van Horn
Can we determine someone’s geographic location purely from the sounds they hear? Are acoustic signals enough to localize within a country, state, or even city? We tackle the challenge of global-scale audio geolocation, formalize the problem, and conduct an in-depth analysis with wildlife audio from the iNatSounds dataset. Adopting a vision-inspired approach, we convert audio recordings to spectrograms and benchmark existing image geolocation techniques. We hypothesize that species vocalizations offer strong geolocation cues due to their defined geographic ranges and propose an approach that integrates species range prediction with retrieval-based geolocation. We further evaluate whether geolocation improves when analyzing species-rich recordings or when aggregating across spatiotemporal neighborhoods. Finally, we introduce case studies from movies to explore multimodal geolocation using both audio and visual content. Our work highlights the advantages of integrating audio and visual cues, and sets the stage for future research in audio geolocation.
nan
Article 1826
Title@2025-05-24 (6): LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning
Title: LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning | LoTA-QAF: Lossless Ternary Adaptation für Quantization-Aware Fine-Tuning | LoTA-QAF:量化软件微调的无损失田间适应 2505.18724v1 |
Authors: Junyu Chen, Junzhuo Li, Zhen Peng, Wenjie Wang, Yuxiang Ren, Long Shi, Xuming Hu
Quantization and fine-tuning are crucial for deploying large language models (LLMs) on resource-constrained edge devices. However, fine-tuning quantized models presents significant challenges. First, the data types of the low-precision quantized weights (e.g., 4-bit) and the high-precision adaptation weights (e.g., 16-bit) are mismatched, which limits the computational efficiency advantage offered by quantized weights during inference. Second, merging these high-precision adaptation weights into the low-precision quantized weights can degrade accuracy, as the adaptation weights often necessitate approximation or truncation. Third, as far as we know, no existing methods support the lossless merging of adaptation while adjusting all quantized weights. To address these challenges, we introduce lossless ternary adaptation for quantization-aware fine-tuning (LoTA-QAF), a novel fine-tuning method specifically designed for quantized LLMs that enables the lossless merging of ternary adaptation weights into quantized weights and the adjustment of all quantized weights. LoTA-QAF combines: (i) a custom-designed ternary adaptation (TA) that aligns ternary weights with the quantization grid and uses these ternary weights to adjust quantized weights; (ii) a TA-based mechanism that enables the lossless merging of adaptation weights; and (iii) ternary signed gradient descent (t-SignSGD) for updating the TA weights. We apply LoTA-QAF to the Llama-3.1/3.3 and Qwen-2.5 model families and validate its effectiveness on several downstream tasks. On the MMLU benchmark, our method effectively recovers performance for quantized models, surpassing 16-bit LoRA by up to 5.14%. For task-specific fine-tuning, 16-bit LoRA achieves superior results, but LoTA-QAF still outperforms other methods.
nan
Article 1827
Title@2025-05-24 (6): Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization
Title: Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization | Optimales Transport-basiertes Token-Gewichtungssystem für verbesserte Preference-Optimierung | 增强优惠优化的优化运输托肯加权计划 2505.18720v1 |
Authors: Meng Li, Guangda Huzhang, Haibo Zhang, Xiting Wang, Anxiang Zeng
Direct Preference Optimization (DPO) has emerged as a promising framework for aligning Large Language Models (LLMs) with human preferences by directly optimizing the log-likelihood difference between chosen and rejected responses. However, existing methods assign equal importance to all tokens in the response, while humans focus on more meaningful parts. This leads to suboptimal preference optimization, as irrelevant or noisy tokens disproportionately influence DPO loss. To address this limitation, we propose an Optimal Transport-based token weighting scheme for enhancing direct Preference Optimization (OTPO). By emphasizing semantically meaningful token pairs and de-emphasizing less relevant ones, our method introduces a context-aware token weighting scheme that yields a more contrastive reward difference estimate. This adaptive weighting enhances reward stability, improves interpretability, and ensures that preference optimization focuses on meaningful differences between responses. Extensive experiments have validated OTPO’s effectiveness in improving instruction-following ability across various settings. Code is available at https://github.com/Mimasss2/OTPO.
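To make the idea of token weighting concrete, the following is a minimal sketch of a token-weighted DPO loss for a single preference pair. It is not the OTPO implementation: the function name `weighted_dpo_loss` and the per-token weights passed in are hypothetical, and the optimal-transport step that would produce those weights is not shown. With all weights equal to 1 the expression reduces to the standard DPO loss.

```python
import math


def weighted_dpo_loss(chosen_logratios, rejected_logratios,
                      chosen_weights, rejected_weights, beta=0.1):
    """Token-weighted DPO loss for one (chosen, rejected) pair.

    Each log-ratio is log pi_theta(token) - log pi_ref(token).
    The weights are hypothetical inputs; OTPO derives its weights from
    optimal transport, which this sketch does not implement.
    """
    reward_chosen = sum(w * r for w, r in zip(chosen_weights, chosen_logratios))
    reward_rejected = sum(w * r for w, r in zip(rejected_weights, rejected_logratios))
    margin = beta * (reward_chosen - reward_rejected)
    # -log(sigmoid(margin)) rewritten as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))


# Uniform weights: every token counts equally, as in standard DPO.
uniform = weighted_dpo_loss([0.2, -0.1, 0.4], [0.1, 0.3], [1, 1, 1], [1, 1])
# Down-weighting the second chosen token (treated here as noise) changes the margin.
reweighted = weighted_dpo_loss([0.2, -0.1, 0.4], [0.1, 0.3], [1, 0, 1], [1, 1])
```

In this toy example the down-weighted token had a negative log-ratio, so removing its influence enlarges the reward margin and lowers the loss.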
nan
Article 1828
Title@2025-05-24 (6): Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer
Title: Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer | Neurale Parameter Suche nach schlankeren Modellen und besserer Übertragung | 搜索细微精制模型和更好传输的神经参数 2505.18713v1 |
Authors: Guodong Du, Zitao Fang, Jing Li, Junlin Li, Runhua Jiang, Shuyang Yu, Yifei Guo, Yangneng Chen, Sim Kuan Goh, Ho-Kin Tang, Daojing He, Honghai Liu, Min Zhang
Foundation models and their checkpoints have significantly advanced deep learning, boosting performance across various applications. However, fine-tuned models often struggle outside their specific domains and exhibit considerable redundancy. Recent studies suggest that combining a pruned fine-tuned model with the original pre-trained model can mitigate forgetting, reduce interference when merging model parameters across tasks, and improve compression efficiency. In this context, developing an effective pruning strategy for fine-tuned models is crucial. Leveraging the advantages of the task vector mechanism, we preprocess fine-tuned models by calculating the differences between them and the original model. Recognizing that different task vector subspaces contribute variably to model performance, we introduce a novel method called Neural Parameter Search (NPS-Pruning) for slimming down fine-tuned models. This method enhances pruning efficiency by searching through neural parameters of task vectors within low-rank subspaces. Our method has three key applications: enhancing knowledge transfer through pairwise model interpolation, facilitating effective knowledge fusion via model merging, and enabling the deployment of compressed models that retain near-original performance while significantly reducing storage costs. Extensive experiments across vision, NLP, and multi-modal benchmarks demonstrate the effectiveness and robustness of our approach, resulting in substantial performance gains. The code is publicly available at: https://github.com/duguodong7/NPS-Pruning.
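The task-vector preprocessing described above can be sketched as follows. This is an illustration under stated assumptions, not the NPS-Pruning method: plain global magnitude pruning stands in for the paper's search over low-rank subspaces, and the helper names (`task_vector`, `magnitude_prune`, `apply_task_vector`) and the flat dict-of-lists parameter layout are hypothetical.

```python
def task_vector(finetuned, pretrained):
    """Difference between fine-tuned and pre-trained parameters, per tensor name."""
    return {name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
            for name in pretrained}


def magnitude_prune(tau, keep_ratio):
    """Keep the largest-magnitude entries of the task vector and zero the rest.

    Stand-in for the paper's search procedure. Ties at the threshold are kept,
    so slightly more than keep_ratio of the entries may survive.
    """
    magnitudes = sorted((abs(v) for vals in tau.values() for v in vals), reverse=True)
    k = max(1, int(len(magnitudes) * keep_ratio))
    threshold = magnitudes[k - 1]
    return {name: [v if abs(v) >= threshold else 0.0 for v in vals]
            for name, vals in tau.items()}


def apply_task_vector(pretrained, tau, scale=1.0):
    """Rebuild a model as pre-trained weights plus a (pruned, scaled) task vector."""
    return {name: [p + scale * t for p, t in zip(pretrained[name], tau[name])]
            for name in pretrained}


pretrained = {"w": [1.0, 2.0, 3.0, 4.0]}
finetuned = {"w": [1.5, 2.0, 2.0, 4.25]}
pruned = magnitude_prune(task_vector(finetuned, pretrained), keep_ratio=0.5)
slim_model = apply_task_vector(pretrained, pruned)
```

Only the pruned task vector needs to be stored alongside the shared pre-trained checkpoint, which is where the storage saving comes from.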
nan
Article 1829
Title@2025-05-24 (6): Learning on LLM Output Signatures for gray-box Behavior Analysis
Title: Learning on LLM Output Signatures for gray-box Behavior Analysis | Lernen auf LLM-Ausgangssignaturen für graue Verhaltensanalyse | 学习用于灰箱行为分析的 LLM 输出签名 2503.14043v2 |
Authors: Guy Bar-Shalom, Fabrizio Frasca, Derek Lim, Yoav Gelberg, Yftah Ziser, Ran El-Yaniv, Gal Chechik, Haggai Maron
Large Language Models (LLMs) have achieved widespread adoption, yet our understanding of their behavior remains limited, particularly in detecting data contamination and hallucinations. While recently proposed probing techniques provide insights through activation analysis, they require "white-box" access to model internals, often unavailable. Current "gray-box" approaches typically analyze only the probability of the actual tokens in the sequence with simple task-specific heuristics. Importantly, these methods overlook the rich information contained in the full token distribution at each processing step. To address these limitations, we propose that gray-box analysis should leverage the complete observable output of LLMs, consisting of both the previously used token probabilities as well as the complete token distribution sequences - a unified data type we term LOS (LLM Output Signature). To this end, we develop a transformer-based approach to process LOS that theoretically guarantees approximation of existing techniques while enabling more nuanced analysis. Our approach achieves superior performance on hallucination and data contamination detection in gray-box settings, significantly outperforming existing baselines. Furthermore, it demonstrates strong transfer capabilities across datasets and LLMs, suggesting that LOS captures fundamental patterns in LLM behavior. Our code is available at: https://github.com/BarSGuy/LLM-Output-Signatures-Network.
nan
Article 1830
Title@2025-05-24 (6): Steering LLM Reasoning Through Bias-Only Adaptation
Title: Steering LLM Reasoning Through Bias-Only Adaptation | Steuerung der LLM-Vernunft durch Bias-Only-Anpassung | 仅有的偏差调整导致的偏差调整 2505.18706v1 |
Authors: Viacheslav Sinii, Alexey Gorbatovski, Artem Cherepanov, Boris Shaposhnikov, Nikita Balagansky, Daniil Gavrilov
Recent work on reasoning-oriented language models, exemplified by o1-like systems, suggests that reinforcement-learning (RL) finetuning does not create new capabilities but instead strengthens reasoning patterns already latent in the pretrained network. We test this claim by training steering vectors: layer-wise biases that additively amplify selected hidden features while leaving all original weights unchanged. Experiments on four base models across the GSM8K and MATH benchmarks show that steering vectors recover, and in several cases exceed, the accuracy of fully-tuned counterparts. This result supports the view that the required reasoning skills pre-exist in the base model. Further, logit-lens analysis reveals that the trained vectors consistently boost token groups linked to structured languages and logical connectors, providing an interpretable account that aligns with the demands of quantitative reasoning tasks.
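As a toy illustration of bias-only adaptation, the sketch below trains an additive vector on the output of a frozen linear layer while leaving the weight matrix untouched. Everything here is an assumption made for illustration: the paper trains layer-wise steering vectors in language models, whereas this example uses a single layer, a squared-error objective, and hypothetical helper names (`forward`, `train_bias_only`).

```python
def forward(x, W, b):
    """Frozen linear layer W followed by an additive steering vector b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def train_bias_only(data, W, steps=200, lr=0.1):
    """Fit only the bias b by gradient descent on mean squared error.

    W is read but never updated, mirroring the 'original weights unchanged' setup.
    """
    b = [0.0] * len(W)
    for _ in range(steps):
        grad = [0.0] * len(b)
        for x, target in data:
            out = forward(x, W, b)
            for i in range(len(b)):
                # d/db_i of (out_i - target_i)^2, averaged over the data
                grad[i] += 2.0 * (out[i] - target[i]) / len(data)
        b = [b_i - lr * g_i for b_i, g_i in zip(b, grad)]
    return b


W = [[1.0, 0.0], [0.0, 1.0]]                      # frozen weights
data = [([1.0, 2.0], [1.5, 1.75]),                # targets are inputs shifted by (0.5, -0.25)
        ([3.0, -1.0], [3.5, -1.25])]
steering = train_bias_only(data, W)
```

Because the targets differ from the frozen layer's output by a constant shift, the learned vector recovers that shift; the trainable parameter count equals the hidden width rather than the size of `W`.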
nan
Article 1831
Title@2025-05-24 (6): (Implicit) Ensembles of Ensembles: Epistemic Uncertainty Collapse in Large Models
Title: (Implicit) Ensembles of Ensembles: Epistemic Uncertainty Collapse in Large Models | (Implizit) Ensembles von Ensembles: Epistemische Ungewissheit bricht in großen Modellen zusammen | 群集集合:大型模型中的不确定性粒子折叠 2409.02628v2 |
Authors: Andreas Kirsch
Epistemic uncertainty is crucial for safety-critical applications and data acquisition tasks. Yet, we find an important phenomenon in deep learning models: an epistemic uncertainty collapse as model complexity increases, challenging the assumption that larger models invariably offer better uncertainty quantification. We introduce implicit ensembling as a possible explanation for this phenomenon. To investigate this hypothesis, we provide theoretical analysis and experiments that demonstrate uncertainty collapse in explicit ensembles of ensembles and show experimental evidence of similar collapse in wider models across various architectures, from simple MLPs to state-of-the-art vision models including ResNets and Vision Transformers. We further develop implicit ensemble extraction techniques to decompose larger models into diverse sub-models, showing we can thus recover epistemic uncertainty. We explore the implications of these findings for uncertainty estimation.
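One common way to quantify epistemic uncertainty from an ensemble is the mutual information between the prediction and the ensemble member, i.e. the entropy of the averaged prediction minus the average entropy of the members. The sketch below computes that quantity; whether the paper uses exactly this estimator is an assumption, and the function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0.0)


def epistemic_uncertainty(member_probs):
    """Entropy of the mean prediction minus the mean entropy of the members."""
    n = len(member_probs)
    mean = [sum(col) / n for col in zip(*member_probs)]
    return entropy(mean) - sum(entropy(p) for p in member_probs) / n


# Members that agree carry no epistemic uncertainty under this measure.
agree = epistemic_uncertainty([[0.7, 0.3], [0.7, 0.3]])
# Confident members that disagree carry the maximum for two classes, log 2.
disagree = epistemic_uncertainty([[1.0, 0.0], [0.0, 1.0]])
```

Under this measure, anything that makes the members' predictions more alike, such as each member itself being an average over sub-models, pushes the score toward zero.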
nan
Article 1832
Title@2025-05-24 (6): Data Overvaluation Attack and Truthful Data Valuation in Federated Learning
Title: Data Overvaluation Attack and Truthful Data Valuation in Federated Learning | Datenüberbewertung Angriff und Truthful Data Bewertung im Föderierten Lernen | 联邦学习联盟的数据评价高估攻击和真实数据估值 2502.00494v3 |
Authors: Shuyuan Zheng, Sudong Cai, Chuan Xiao, Yang Cao, Jianbin Qin, Masatoshi Yoshikawa, Makoto Onizuka
In collaborative machine learning (CML), data valuation, i.e., evaluating the contribution of each client’s data to the machine learning model, has become a critical task for incentivizing and selecting positive data contributions. However, existing studies often assume that clients engage in data valuation truthfully, overlooking the practical motivation for clients to exaggerate their contributions. To expose this threat, this paper introduces the data overvaluation attack, which enables strategic clients to have their data significantly overvalued in federated learning, a widely adopted paradigm for decentralized CML. Furthermore, we propose a Bayesian truthful data valuation metric, named Truth-Shapley. Truth-Shapley is the unique metric that guarantees some promising axioms for data valuation while ensuring that clients’ optimal strategy is to perform truthful data valuation under certain conditions. Our experiments demonstrate the vulnerability of existing data valuation metrics to the proposed attack and validate the robustness and effectiveness of Truth-Shapley.
nan
Article 1833
Title@2025-05-24 (6): MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention
Title: MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention | MonarchAchtung: Null-Schuss-Umwandlung zu schneller, Hardware-Bewusst strukturierter Aufmerksamkeit | MonarchAttention: 零热转换为快速硬件软件 2505.18698v1 |
Authors: Can Yaras, Alec S. Xu, Pierre Abillama, Changwoo Lee, Laura Balzano
Transformers have achieved state-of-the-art performance across various tasks, but suffer from a notable quadratic complexity in sequence length due to the attention mechanism. In this work, we propose MonarchAttention – a novel approach to sub-quadratic attention approximation via Monarch matrices, an expressive class of structured matrices. Based on the variational form of softmax, we describe an efficient optimization-based algorithm to compute an approximate projection of softmax attention onto the class of Monarch matrices with $\Theta(N\sqrt{N} d)$ computational complexity and $\Theta(Nd)$ memory/IO complexity. Unlike previous approaches, MonarchAttention is both (1) transferable, yielding minimal performance loss with no additional training, even when replacing every attention layer of the transformer, and (2) hardware-efficient, utilizing the highest-throughput tensor core units on modern GPUs. With optimized kernels, MonarchAttention achieves substantial speed-ups in wall-time over FlashAttention-2: $1.4\times$ for shorter sequences $(N=256)$, $4.5\times$ for medium-length sequences $(N=4K)$, and $8.2\times$ for longer sequences $(N=16K)$. We demonstrate the quality of MonarchAttention on diverse tasks and architectures in vision and language problems, showing that it flexibly and accurately approximates softmax attention in a variety of contexts. Our code is available at https://github.com/cjyaras/monarch-attention.
nan
Article 1834
Title@2025-05-24 (6): Can LLMs Alleviate Catastrophic Forgetting in Graph Continual Learning? A Systematic Study
Title: Can LLMs Alleviate Catastrophic Forgetting in Graph Continual Learning? A Systematic Study | Kann LLMs in Graph Continual Learning Katastrophisches Vergessen lindern? Eine systematische Studie | LLMs LLM 能够减轻图持续学习中的灾难性遗忘吗?系统研究 2505.18697v1 |
Authors: Ziyang Cheng, Zhixun Li, Yuhan Li, Yixin Song, Kangyi Zhao, Dawei Cheng, Jia Li, Jeffrey Xu Yu
Nowadays, real-world data, including graph-structure data, often arrives in a streaming manner, which means that learning systems need to continuously acquire new knowledge without forgetting previously learned information. Although substantial existing works attempt to address catastrophic forgetting in graph machine learning, they are all based on training from scratch with streaming data. With the rise of pretrained models, an increasing number of studies have leveraged their strong generalization ability for continual learning. Therefore, in this work, we attempt to answer whether large language models (LLMs) can mitigate catastrophic forgetting in Graph Continual Learning (GCL). We first point out that current experimental setups for GCL have significant flaws, as the evaluation stage may lead to task ID leakage. Then, we evaluate the performance of LLMs in more realistic scenarios and find that even minor modifications can lead to outstanding results. Finally, based on extensive experiments, we propose a simple-yet-effective method, Simple Graph Continual Learning (SimGCL), that surpasses the previous state-of-the-art GNN-based baseline by around 20% under the rehearsal-free constraint. To facilitate reproducibility, we have developed an easy-to-use benchmark LLM4GCL for training and evaluating existing GCL methods. The code is available at: https://github.com/ZhixunLEE/LLM4GCL.
nan
Article 1835
Title@2025-05-24 (6): Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment
Title: Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment | Revisiting Model Inversion Evaluation: Von irreführenden Standards zur zuverlässigen Datenschutzbewertung | 重新审视示范反向评价:从错误领导标准到可靠隐私评估 2505.03519v3 |
Authors: Sy-Tuyen Ho, Koh Jun Hao, Ngoc-Bao Nguyen, Alexander Binder, Ngai-Man Cheung
Model Inversion (MI) attacks aim to reconstruct information from private training data by exploiting access to a machine learning model T. The standard evaluation framework for such attacks relies on an evaluation model E, trained under the same task design as T. This framework has become the de facto standard for assessing progress in MI research, used across nearly all recent MI attacks and defenses without question. In this paper, we present the first in-depth study of this MI evaluation framework. In particular, we identify a critical issue of this standard MI evaluation framework: Type-I adversarial examples. These are reconstructions that do not capture the visual features of private training data, yet are still deemed successful by the target model T and ultimately transferable to E. Such false positives undermine the reliability of the standard MI evaluation framework. To address this issue, we introduce a new MI evaluation framework that replaces the evaluation model E with advanced Multimodal Large Language Models (MLLMs). By leveraging their general-purpose visual understanding, our MLLM-based framework does not depend on training under the same task design as T, thus reducing Type-I transferability and providing more faithful assessments of reconstruction success. Using our MLLM-based evaluation framework, we reevaluate 26 diverse MI attack setups and empirically reveal consistently high false positive rates under the standard evaluation framework. Importantly, we demonstrate that many state-of-the-art (SOTA) MI methods report inflated attack accuracy, indicating that actual privacy leakage is significantly lower than previously believed. By uncovering this critical issue and proposing a robust solution, our work enables a reassessment of progress in MI research and sets a new standard for reliable and robust evaluation.
nan
Article 1836
Title@2025-05-24 (6): Simultaneous Optimization of Efficiency and Degradation in Tunable HTL-Free Perovskite Solar Cells with MWCNT-Integrated Back Contact Using a Machine Learning-Derived Polynomial Regressor
Title: Simultaneous Optimization of Efficiency and Degradation in Tunable HTL-Free Perovskite Solar Cells with MWCNT-Integrated Back Contact Using a Machine Learning-Derived Polynomial Regressor | Gleichzeitige Optimierung von Effizienz und Degradation in Tunablen HTL-freien Perovskite-Solarzellen mit MWCNT-Integriert Zurück Kontakt mit einem maschinenlernenden Polynom-Regressor | 利用机械学习多面制反转器,与MWCNT综合后退联系,同时优化金枪鱼可HTL-无 Perovskite的无Perovskite太阳能电池的效率和退化 2505.18693v1 |
Authors: Ihtesham Ibn Malek, Hafiz Imtiaz, Samia Subrina
Perovskite solar cells (PSCs) without a hole transport layer (HTL) offer a cost-effective and stable alternative to conventional architectures, utilizing only an absorber layer and an electron transport layer (ETL). This study presents a machine learning (ML)-driven framework to optimize the efficiency and stability of HTL-free PSCs by integrating experimental validation with numerical simulations. Excellent agreement is achieved between a fabricated device and its simulated counterpart at a molar fraction $x = 68.7\%$ in $\mathrm{MAPb}_{1-x}\mathrm{Sb}_{2x/3}\mathrm{I}_3$, where MA is methylammonium. A dataset of 1650 samples is generated by varying molar fraction, absorber defect density, thickness, and ETL doping, with corresponding efficiency and 50-hour degradation as targets. A fourth-degree polynomial regressor (PR-4) shows the best performance, achieving RMSEs of 0.0179 and 0.0117, and $R^2$ scores of 1 and 0.999 for efficiency and degradation, respectively. The derived model generalizes beyond the training range and is used in an L-BFGS-B optimization algorithm with a weighted objective function to maximize efficiency and minimize degradation. This improves device efficiency from 13.7% to 16.84% and reduces degradation from 6.61% to 2.39% over 1000 hours. Finally, the dataset is labeled into superior and inferior classes, and a multilayer perceptron (MLP) classifier achieves 100% accuracy, successfully identifying optimal configurations.
nan
Article 1837
Title@2025-05-24 (6): Variational Schrödinger Diffusion Models
Title: Variational Schrödinger Diffusion Models | Variationelle Schrödinger-Diffusionsmodelle | 挥发模型 2405.04795v5 |
Authors: Wei Deng, Weijian Luo, Yixin Tan, Marin Biloš, Yu Chen, Yuriy Nevmyvaka, Ricky T. Q. Chen
Schrödinger bridge (SB) has emerged as the go-to method for optimizing transportation plans in diffusion models. However, SB requires estimating the intractable forward score functions, inevitably resulting in the costly implicit training loss based on simulated trajectories. To improve the scalability while preserving efficient transportation plans, we leverage variational inference to linearize the forward score functions (variational scores) of SB and restore simulation-free properties in training backward scores. We propose the variational Schrödinger diffusion model (VSDM), where the forward process is a multivariate diffusion and the variational scores are adaptively optimized for efficient transport. Theoretically, we use stochastic approximation to prove the convergence of the variational scores and show the convergence of the adaptively generated samples based on the optimal variational scores. Empirically, we test the algorithm in simulated examples and observe that VSDM is efficient in generations of anisotropic shapes and yields straighter sample trajectories compared to the single-variate diffusion. We also verify the scalability of the algorithm in real-world data and achieve competitive unconditional generation performance in CIFAR10 and conditional generation in time series modeling. Notably, VSDM no longer depends on warm-up initializations and has become tuning-friendly in training large-scale experiments.
nan
Article 1838
Title@2025-05-24 (6): Large Language Models in the Task of Automatic Validation of Text Classifier Predictions
Title: Large Language Models in the Task of Automatic Validation of Text Classifier Predictions | Große Sprachmodelle in der Aufgabe der automatischen Validierung von Textklassifikatoren Vorhersagen | 文本分类自动验证任务中的大语言模型 2505.18688v1 |
Authors: Aleksandr Tsymbalov
Machine learning models for text classification are trained to predict a class for a given text. To do this, training and validation samples must be prepared: a set of texts is collected, and each text is assigned a class. These classes are usually assigned by human annotators with different expertise levels, depending on the specific classification task. Collecting such samples from scratch is labor-intensive because it requires finding specialists and compensating them for their work; moreover, the number of available specialists is limited, and their productivity is constrained by human factors. While it may not be too resource-intensive to collect samples once, the ongoing need to retrain models (especially in incremental learning pipelines) to address data drift (also called model drift) makes the data collection process crucial and costly over the model’s entire lifecycle. This paper proposes several approaches to replace human annotators with Large Language Models (LLMs) to test classifier predictions for correctness, helping ensure model quality and support high-quality incremental learning.
nan
Article 1839
Title@2025-05-24 (6): Predictive Performance of Deep Quantum Data Re-uploading Models
Title: Predictive Performance of Deep Quantum Data Re-uploading Models | Predictive Performance von Deep Quantum Data Re-Uploading-Modellen | 深量量数据数据重新加载模型的预测性性能 2505.20337v1 |
Authors: Xin Wang, Han-Xiao Tao, Re-Bing Wu
Quantum machine learning models incorporating data re-uploading circuits have garnered significant attention due to their exceptional expressivity and trainability. However, their ability to generate accurate predictions on unseen data, referred to as the predictive performance, remains insufficiently investigated. This study reveals a fundamental limitation in predictive performance when deep encoding layers are employed within the data re-uploading model. Concretely, we theoretically demonstrate that when processing high-dimensional data with limited-qubit data re-uploading models, their predictive performance progressively degenerates to near random-guessing levels as the number of encoding layers increases. In this context, the repeated data uploading cannot mitigate the performance degradation. These findings are validated through experiments on both synthetic linearly separable datasets and real-world datasets. Our results demonstrate that when processing high-dimensional data, the quantum data re-uploading models should be designed with wider circuit architectures rather than deeper and narrower ones.
Article 1840
Title@2025-05-24 (6): A fast algorithm to minimize prediction loss of the optimal solution in inverse optimization problem of MILP
Title: A fast algorithm to minimize prediction loss of the optimal solution in inverse optimization problem of MILP | Ein schneller Algorithmus zur Minimierung des Vorhersageverlusts der optimalen Lösung im inversen Optimierungsproblem von MILP | 快速算法,以尽量减少MILP反优化问题最佳解决办法的预测损失 2405.14273v3 |
Authors: Akira Kitaoka
We consider the inverse optimization problem of estimating the weights of the objective function such that the given solution is an optimal solution for a mixed integer linear program (MILP). In this inverse optimization problem, the known methods exhibit inefficient convergence. Specifically, if $d$ denotes the dimension of the weights and $k$ the number of iterations, then the error of the weights is bounded by $O(k^{-1/(d-1)})$, leading to slow convergence as $d$ increases. We propose a projected subgradient method with a step size of $k^{-1/2}$ based on suboptimality loss. We theoretically show and demonstrate that the proposed method efficiently learns the weights. In particular, we show that there exists a constant $\gamma > 0$ such that the distance between the learned and true weights is bounded by $ O\left(k^{-1/(1+\gamma)} \exp\left(-\frac{\gamma k^{1/2}}{2+\gamma}\right)\right), $ or the optimal solution is exactly recovered. Furthermore, experiments demonstrate that the proposed method solves the inverse optimization problems of MILP using fewer than $1/7$ the number of MILP calls required by known methods, and converges within a finite number of iterations.
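The projected subgradient idea in the abstract above can be illustrated on a toy problem. The sketch below is a minimal illustration, not the paper's implementation: it assumes a maximization convention, a suboptimality loss of the form max_x w·x − w·x*, weights constrained to the probability simplex, and a brute-force enumeration standing in for a MILP solver. The feasible set, the function names, and the stopping rule are all hypothetical.

```python
import itertools


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        t = (running - 1.0) / j
        if uj - t > 0:
            theta = t  # keep the threshold of the last index that passes
    return [max(vi - theta, 0.0) for vi in v]


def solve_forward(w, feasible):
    """Stand-in for a MILP solver (assumption): brute-force argmax of w.x."""
    return max(feasible, key=lambda x: dot(w, x))


def learn_weights(x_star, feasible, dim, max_iters=200):
    """Projected subgradient descent on the suboptimality loss."""
    w = [1.0 / dim] * dim
    for k in range(1, max_iters + 1):
        x_hat = solve_forward(w, feasible)
        loss = dot(w, x_hat) - dot(w, x_star)  # suboptimality loss
        if loss <= 1e-12:  # x_star is optimal for w: stop
            break
        subgrad = [a - b for a, b in zip(x_hat, x_star)]
        step = k ** -0.5  # step size k^{-1/2}
        w = project_to_simplex([wi - step * gi for wi, gi in zip(w, subgrad)])
    return w


# Toy integer program (hypothetical): x in {0,...,3}^3 with 2*x1 + x2 + x3 <= 6.
feasible = [x for x in itertools.product(range(4), repeat=3)
            if 2 * x[0] + x[1] + x[2] <= 6]
w_true = [0.7, 0.2, 0.1]
x_star = solve_forward(w_true, feasible)  # observed optimal solution
w_learned = learn_weights(x_star, feasible, dim=3)
```

In this toy run the learned weights need not equal `w_true`; the loop stops as soon as the observed solution `x_star` is optimal for the current weights, which is the notion of recovery this sketch uses.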
Article 1841
Title@2025-05-24 (6): Thinking like a CHEMIST: Combined Heterogeneous Embedding Model Integrating Structure and Tokens
Title: Thinking like a CHEMIST: Combined Heterogeneous Embedding Model Integrating Structure and Tokens | Wie ein CHEMIST denken: Kombiniertes Heterogenes Einbetten von Modellintegrationsstrukturen und Tokens | 思考像CHEMIST: 混合异基因嵌入模型集成结构和调子 2502.17986v2 |
Authors: Nikolai Rekut, Alexey Orlov, Klea Ziu, Elizaveta Starykh, Martin Takac, Aleksandr Beznosikov
Representing molecular structures effectively in chemistry remains a challenging task. Language models and graph-based models are extensively utilized within this domain, consistently achieving state-of-the-art results across an array of tasks. However, the prevailing practice of representing chemical compounds in the SMILES format - used by most data sets and many language models - presents notable limitations as a training data format. In this study, we present a novel approach that decomposes molecules into substructures and computes descriptor-based representations for these fragments, providing more detailed and chemically relevant input for model training. We use this substructure and descriptor data as input for a language model and also propose a bimodal architecture that integrates this language model with graph-based models. We use RoBERTa as the language model, and Graph Isomorphism Networks (GIN), Graph Convolutional Networks (GCN), and Graphormer as the graph-based models. Our framework shows notable improvements over traditional methods in various tasks such as Quantitative Structure-Activity Relationship (QSAR) prediction.
Article 1842
Title@2025-05-24 (6): Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi
Title: Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi | Erweiterung des Aktionsraums mit Konventionen zur Verbesserung der Multi-Agenten-Kooperation in Hanabi | 与公约扩大行动空间,以改进哈纳比多剂合作 2412.06333v3 |
Authors: F. Bredell, H. A. Engelbrecht, J. C. Schoeman
The card game Hanabi is considered a strong medium for the testing and development of multi-agent reinforcement learning (MARL) algorithms, due to its cooperative nature, partial observability, limited communication and remarkable complexity. Previous research efforts have explored the capabilities of MARL algorithms within Hanabi, focusing largely on advanced architecture design and algorithmic manipulations to achieve state-of-the-art performance for various numbers of cooperators. However, this often leads to complex solution strategies that have high computational cost and require large amounts of training data. For humans to solve the Hanabi game effectively, they require the use of conventions, which provide a means to implicitly convey ideas or knowledge based on a predefined, and mutually agreed upon, set of “rules” or principles. Multi-agent problems containing partial observability, especially when limited communication is present, can benefit greatly from the use of implicit knowledge sharing. In this paper, we propose a novel approach to augmenting an agent’s action space using conventions, which act as sequences of special cooperative actions that span multiple time steps and multiple agents, requiring agents to actively opt in for them to reach fruition. These conventions are based on existing human conventions, and result in a significant improvement in the performance of existing techniques for self-play and cross-play for various numbers of cooperators within Hanabi.
Article 1843
Title@2025-05-24 (6): COPA: Comparing the incomparable in multi-objective model evaluation
Title: COPA: Comparing the incomparable in multi-objective model evaluation | COPA: Vergleich des Unvergleichbaren in der multiobjektiven Modellauswertung | COPA: 比较在多目标模式评价中无法比较的模型评价 2503.14321v2 |
Authors: Adrián Javaloy, Antonio Vergari, Isabel Valera
As machine learning (ML) practitioners, we often have hundreds of (trained) ML models at hand from which we need to choose one, based on various objectives such as accuracy, robustness, fairness, scalability, etc. However, how to compare, aggregate and, ultimately, trade off these objectives is usually a time-consuming task that requires expert knowledge, as they may be measured in different units or scales. In this work, we investigate how objectives can be automatically normalized and aggregated to systematically navigate their Pareto front. To do so, we make incomparable objectives comparable using their CDFs, approximated by their relative rankings. As a result, we can aggregate them while matching user-specific preferences, allowing practitioners to meaningfully navigate and search for models in the Pareto front. We demonstrate the potential impact of our approach, COPA, in both model selection and benchmarking tasks across diverse ML areas such as fair ML, domain generalization, AutoML and foundation models, where classical ways to normalize and aggregate objectives fall short.
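To make the rank-based normalization concrete, here is a minimal sketch under stated assumptions. Each objective is mapped to an empirical CDF value computed from relative rankings (the fraction of models that are at least as good), which puts accuracy and latency on the same unitless scale. The weighted sum used for aggregation, the model names, and the numbers are hypothetical; the abstract does not specify COPA's actual aggregation function.

```python
def rank_cdf(values, higher_is_better=False):
    """Empirical CDF of each value, treating every objective as a cost.

    Returns, per model, the fraction of models whose cost is <= its own,
    so smaller outputs are better regardless of the original unit.
    """
    costs = [-v for v in values] if higher_is_better else list(values)
    n = len(costs)
    return [sum(1 for other in costs if other <= c) / n for c in costs]


def aggregate(normalized, weights):
    """Weighted average of normalized objectives (an assumed aggregator)."""
    total = sum(weights)
    n_models = len(normalized[0])
    return [sum(w * obj[i] for w, obj in zip(weights, normalized)) / total
            for i in range(n_models)]


def pick_best(names, scores):
    return names[min(range(len(names)), key=scores.__getitem__)]


models = ["A", "B", "C", "D"]          # hypothetical candidates
accuracy = [0.91, 0.88, 0.95, 0.80]    # higher is better
latency_ms = [120.0, 35.0, 900.0, 20.0]  # lower is better

u_acc = rank_cdf(accuracy, higher_is_better=True)
u_lat = rank_cdf(latency_ms)

# Two users with different preferences over the same normalized objectives.
best_for_accuracy = pick_best(models, aggregate([u_acc, u_lat], [0.75, 0.25]))
best_for_latency = pick_best(models, aggregate([u_acc, u_lat], [0.25, 0.75]))
```

Because both objectives live in [0, 1] after normalization, changing the weights moves the selection along the trade-off between accuracy and latency without any manual rescaling of milliseconds against accuracy points.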
Article 1844
Title@2025-05-24 (6): End-to-End Framework for Predicting the Remaining Useful Life of Lithium-Ion Batteries
Title: End-to-End Framework for Predicting the Remaining Useful Life of Lithium-Ion Batteries | End-to-End-Framework zur Vorhersage der verbleibenden Nutzungsdauer von Lithium-Ionen-Batterien | 预测锂-碘电池剩余使用寿命的端至端框架 2505.16664v2 |
Authors: Khoa Tran, Tri Le, Bao Huynh, Hung-Cuong Trinh, Vy-Rin Nguyen
Accurate prediction of the Remaining Useful Life (RUL) is essential for enabling timely maintenance of lithium-ion batteries, impacting the operational efficiency of electric applications that rely on them. This paper proposes a RUL prediction approach that leverages data from recent charge-discharge cycles to estimate the number of remaining usable cycles. The approach introduces both a novel signal processing pipeline and a deep learning prediction model. In the signal preprocessing pipeline, a derived capacity feature $\dot{Q}(I, Q)$ is computed based on current and capacity signals. Alongside original capacity, voltage and current, these features are denoised and enhanced using statistical metrics and a delta-based method to capture differences between the current and previous cycles. In the prediction model, the processed features are then fed into a hybrid deep learning architecture composed of 1D Convolutional Neural Networks (CNN), Attentional Long Short-Term Memory (A-LSTM), and Ordinary Differential Equation-based LSTM (ODE-LSTM) blocks. This architecture is designed to capture both local signal characteristics and long-range temporal dependencies while modeling the continuous-time dynamics of battery degradation. The model is further evaluated using transfer learning across different learning strategies and target data partitioning scenarios. Results indicate that the model maintains robust performance, even when fine-tuned on limited target data. Experimental results on two publicly available large-scale datasets demonstrate that the proposed method outperforms a baseline deep learning approach and machine learning techniques, achieving an RMSE of 101.59, highlighting its strong potential for real-world RUL prediction applications.
Article 1845
Title@2025-05-24 (6): A Quantum Approximation Scheme for k-Means
Title: A Quantum Approximation Scheme for k-Means | Ein Quantenannäherungsprogramm für k-Means | k- Means 的量接近量计划 2308.08167v3 |
Authors: Ragesh Jaiswal
We give a quantum approximation scheme (i.e., a $(1 + \varepsilon)$-approximation for every $\varepsilon > 0$) for the classical $k$-means clustering problem in the QRAM model with a running time that has only polylogarithmic dependence on the number of data points. More specifically, given a dataset $V$ with $N$ points in $\mathbb{R}^d$ stored in a QRAM data structure, our quantum algorithm runs in time $\tilde{O} \left( 2^{\tilde{O}(\frac{k}{\varepsilon})} \eta^2 d\right)$ and with high probability outputs a set $C$ of $k$ centers such that $cost(V, C) \leq (1+\varepsilon) \cdot cost(V, C_{OPT})$. Here $C_{OPT}$ denotes the optimal $k$ centers, $cost(.)$ denotes the standard $k$-means cost function (i.e., the sum of the squared distances of points to their closest centers), and $\eta$ is the aspect ratio (i.e., the ratio of maximum distance to minimum distance). This is the first quantum algorithm with a polylogarithmic running time that gives a provable approximation guarantee of $(1+\varepsilon)$ for the $k$-means problem. Also, unlike previous works on unsupervised learning, our quantum algorithm does not require quantum linear algebra subroutines and has a running time independent of parameters (e.g., condition number) that appear in such procedures.
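The quantities in the guarantee above can be checked classically on a small example. The following sketch computes the $k$-means cost and the aspect ratio for a toy dataset and tests the $(1+\varepsilon)$ condition for a candidate set of centers. It is purely classical and does not model the quantum algorithm; the dataset, the candidate centers, and the reading of the aspect ratio as the ratio of the largest to the smallest pairwise distance between distinct data points are assumptions made for illustration.

```python
import itertools
import math


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers):
    """Sum over points of the squared distance to the closest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def aspect_ratio(points):
    """Max over min pairwise distance (assumes all points are distinct)."""
    dists = [math.sqrt(sq_dist(p, q))
             for p, q in itertools.combinations(points, 2)]
    return max(dists) / min(dists)


# Toy dataset: two well-separated pairs, so the optimal 2 centers are the
# pair midpoints (clear by inspection for this example).
V = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
C_opt = [(0.0, 0.5), (4.0, 0.5)]
C_approx = [(0.0, 0.4), (4.0, 0.5)]  # hypothetical slightly-off output

eps = 0.1
opt_cost = kmeans_cost(V, C_opt)
approx_cost = kmeans_cost(V, C_approx)
within_guarantee = approx_cost <= (1 + eps) * opt_cost
eta = aspect_ratio(V)
```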
Article 1846
Title@2025-05-24 (6): Generating Full-field Evolution of Physical Dynamics from Irregular Sparse Observations
Title: Generating Full-field Evolution of Physical Dynamics from Irregular Sparse Observations | Erzeugen der Vollfeld-Evolution der physikalischen Dynamik aus irregulären Sparse-Beobachtungen | 从不定期的偏差观测中生成物理动态全场演变 2505.09284v2 |
Authors: Panqi Chen, Yifan Sun, Lei Cheng, Yang Yang, Weichang Li, Yang Liu, Weiqing Liu, Jiang Bian, Shikai Fang
Modeling and reconstructing multidimensional physical dynamics from sparse and off-grid observations presents a fundamental challenge in scientific research. Recently, diffusion-based generative modeling has shown promising potential for physical simulation. However, current approaches typically operate on on-grid data with preset spatiotemporal resolution, and struggle with the sparsely observed and continuous nature of real-world physical dynamics. To fill this gap, we present SDIFT, Sequential DIffusion in Functional Tucker space, a novel framework that generates full-field evolution of physical dynamics from irregular sparse observations. SDIFT leverages the functional Tucker model as the latent space representer with proven universal approximation property, and represents observations as latent functions and Tucker core sequences. We then construct a sequential diffusion model with temporally augmented UNet in the functional Tucker space, denoising noise drawn from a Gaussian process to generate the sequence of core tensors. At the posterior sampling stage, we propose a Message-Passing Posterior Sampling mechanism, enabling conditional generation of the entire sequence guided by observations at limited time steps. We validate SDIFT on three physical systems spanning astronomical (supernova explosions, light-year scale), environmental (ocean sound speed fields, kilometer scale), and molecular (organic liquid, millimeter scale) domains, demonstrating significant improvements in both reconstruction accuracy and computational efficiency compared to state-of-the-art approaches.
Article 1847
Title@2025-05-24 (6): Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment?
Title: Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment? | Findet Repräsentationsintervention wirklich Wunschvorstellungen und Ausgeglichenheit wieder? | 代表权干预是否真正确定了理想概念和目的一致? 2505.18672v1 |
Authors: Hongzheng Yang, Yongqiang Chen, Zeyu Qin, Tongliang Liu, Chaowei Xiao, Kun Zhang, Bo Han
Representation intervention aims to locate and modify the representations that encode the underlying concepts in Large Language Models (LLMs) to elicit the aligned and expected behaviors. Despite the empirical success, it has never been examined whether one could locate the faithful concepts for intervention. In this work, we explore the question in safety alignment. If the interventions are faithful, the intervened LLMs should erase the harmful concepts and be robust to both in-distribution adversarial prompts and the out-of-distribution (OOD) jailbreaks. While it is feasible to erase harmful concepts without degrading the benign functionalities of LLMs in linear settings, we show that it is infeasible in the general non-linear setting. To tackle the issue, we propose Concept Concentration (COCA). Instead of identifying the faithful locations to intervene, COCA refactors the training data with an explicit reasoning process, which first identifies the potentially unsafe concepts and then decides the responses. Essentially, COCA simplifies the decision boundary between harmful and benign representations, enabling more effective linear erasure. Extensive experiments with multiple representation intervention methods and model architectures demonstrate that COCA significantly reduces both in-distribution and OOD jailbreak success rates, while maintaining strong performance on regular tasks such as math and code generation.
Article 1848
Title@2025-05-24 (6): Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape
Title: Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape | Flat-LoRA: Low-Rank Anpassung über eine flache verlorene Landschaft | Flat-LORA: 适应平坦损失地貌的低Rank适应 2409.14396v2 |
Authors: Tao Li, Zhengbao He, Yujun Li, Yasheng Wang, Lifeng Shang, Xiaolin Huang
Fine-tuning large-scale pre-trained models is prohibitively expensive in terms of computation and memory costs. Low-Rank Adaptation (LoRA), a popular Parameter-Efficient Fine-Tuning (PEFT) method, offers an efficient solution by optimizing only low-rank matrices. Despite recent progress in improving LoRA’s performance, the relationship between the LoRA optimization space and the full parameter space is often overlooked. A solution that appears flat in the loss landscape of the LoRA space may still exhibit sharp directions in the full parameter space, potentially compromising generalization. We introduce Flat-LoRA, which aims to identify a low-rank adaptation situated in a flat region of the full parameter space. Instead of adopting the well-established sharpness-aware minimization approach, which incurs significant computation and memory overheads, we employ a Bayesian expectation loss objective to preserve training efficiency. Further, we design a refined random perturbation generation strategy for improved performance and carefully manage memory overhead using random seeds. Experiments across diverse tasks, including mathematical reasoning, coding abilities, dialogue generation, instruction following, and text-to-image generation, demonstrate that Flat-LoRA improves both in-domain and out-of-domain generalization. Code is available at https://github.com/nblt/Flat-LoRA.
Article 1849
Title@2025-05-24 (6): DeCaFlow: A Deconfounding Causal Generative Model
Title: DeCaFlow: A Deconfounding Causal Generative Model | DeCaFlow: Ein entkonfoundierendes Kausalgeneratives Modell | DeCaFlow:一个破碎的因果创造模型 2503.15114v2 |
Authors: Alejandro Almodóvar, Adrián Javaloy, Juan Parras, Santiago Zazo, Isabel Valera
We introduce DeCaFlow, a deconfounding causal generative model. Training once per dataset using just observational data and the underlying causal graph, DeCaFlow enables accurate causal inference on continuous variables in the presence of hidden confounders. Specifically, we extend previous results on causal estimation under hidden confounding to show that a single instance of DeCaFlow provides correct estimates for all causal queries identifiable with do-calculus, leveraging proxy variables to adjust for the causal effects when do-calculus alone is insufficient. Moreover, we show that counterfactual queries are identifiable as long as their interventional counterparts are identifiable, and thus are also correctly estimated by DeCaFlow. Our empirical results on diverse settings (including the Ecoli70 dataset, with 3 independent hidden confounders, tens of observed variables and hundreds of causal queries) show that DeCaFlow outperforms existing approaches, while demonstrating its out-of-the-box applicability to any given causal graph. An implementation can be found at https://github.com/aalmodovares/DeCaFlow
Article 1850
Title@2025-05-24 (6): Self-Supervised Evolution Operator Learning for High-Dimensional Dynamical Systems
Title: Self-Supervised Evolution Operator Learning for High-Dimensional Dynamical Systems | Selbstüberwachtes Evolutionsoperator-Lernen für hochdimensionelle dynamische Systeme | 高多元动态系统学习 2505.18671v1 |
Authors: Giacomo Turri, Luigi Bonati, Kai Zhu, Massimiliano Pontil, Pietro Novelli
We introduce an encoder-only approach to learn the evolution operators of large-scale non-linear dynamical systems, such as those describing complex natural phenomena. Evolution operators are particularly well-suited for analyzing systems that exhibit complex spatio-temporal patterns and have become a key analytical tool across various scientific communities. As terabyte-scale weather datasets and simulation tools capable of running millions of molecular dynamics steps per day are becoming commodities, our approach provides an effective tool to make sense of them from a data-driven perspective. The core of it lies in a remarkable connection between self-supervised representation learning methods and the recently established learning theory of evolution operators. To show the usefulness of the proposed method, we test it across multiple scientific domains: explaining the folding dynamics of small proteins, the binding process of drug-like molecules in host sites, and autonomously finding patterns in climate data. Code and data to reproduce the experiments are made available open source.
Article 1851
Title@2025-05-24 (6): Memory-Efficient Super-Resolution of 3D Micro-CT Images Using Octree-Based GANs: Enhancing Resolution and Segmentation Accuracy
Title: Memory-Efficient Super-Resolution of 3D Micro-CT Images Using Octree-Based GANs: Enhancing Resolution and Segmentation Accuracy | Speichereffiziente Super-Resolution von 3D-Mikro-CT-Bildern mit oktree-basierten GANs: Verbesserung der Auflösung und Segmentierung Genauigkeit | 使用以屋底为主的GANs:加强分辨率和分解准确度 2505.18664v1 |
Authors: Evgeny Ugolkov, Xupeng He, Hyung Kwak, Hussein Hoteit
We present a memory-efficient algorithm for significantly enhancing the quality of segmented 3D micro-Computed Tomography (micro-CT) images of rocks using a generative model. The proposed model achieves a 16x increase in resolution and corrects inaccuracies in segmentation caused by the overlapping X-ray attenuation in micro-CT measurements across different minerals. The generative model employed is a 3D Octree-based convolutional Wasserstein generative adversarial network with gradient penalty. To address the challenge of high memory consumption inherent in standard 3D convolutional layers, we implemented an Octree structure within the 3D progressive growing generator model. This enabled the use of memory-efficient 3D Octree-based convolutional layers. The approach is pivotal in overcoming the long-standing memory bottleneck in volumetric deep learning, making it possible to reach 16x super-resolution in 3D, a scale that is challenging to attain due to cubic memory scaling. For training, we utilized segmented 3D low-resolution micro-CT images along with unpaired segmented complementary 2D high-resolution laser scanning microscope images. Post-training, resolution improved from 7 to 0.44 µm/voxel with accurate segmentation of constituent minerals. Validated on Berea sandstone, this framework demonstrates substantial improvements in pore characterization and mineral differentiation, offering a robust solution to one of the primary computational limitations in modern geoscientific imaging.
Article 1852
Title@2025-05-24 (6): Adaptive Prediction-Powered AutoEval with Reliability and Efficiency Guarantees
Title: Adaptive Prediction-Powered AutoEval with Reliability and Efficiency Guarantees | Adaptive Vorhersage-Powered AutoEval mit Zuverlässigkeit und Effizienzgarantien | 具有可靠性和效率保障的适应性预测力自动评估 2505.18659v1 |
Authors: Sangwoo Park, Matteo Zecchin, Osvaldo Simeone
Selecting artificial intelligence (AI) models, such as large language models (LLMs), from multiple candidates requires accurate performance estimation. This is ideally achieved through empirical evaluations involving abundant real-world data. However, such evaluations are costly and impractical at scale. To address this challenge, autoevaluation methods leverage synthetic data produced by automated evaluators, such as LLMs-as-judges, reducing variance but potentially introducing bias. Recent approaches have employed semi-supervised prediction-powered inference (PPI) to correct for the bias of autoevaluators. However, the use of autoevaluators may lead in practice to a degradation in sample efficiency compared to conventional methods using only real-world data. In this paper, we propose R-AutoEval+, a novel framework that provides finite-sample reliability guarantees on the model evaluation, while also ensuring an enhanced (or at least no worse) sample efficiency compared to conventional methods. The key innovation of R-AutoEval+ is an adaptive construction of the model evaluation variable, which dynamically tunes its reliance on synthetic data, reverting to conventional methods when the autoevaluator is insufficiently accurate. Experiments on the use of LLMs-as-judges for the optimization of quantization settings for the weights of an LLM, and for prompt design in LLMs confirm the reliability and efficiency of R-AutoEval+.
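For readers unfamiliar with prediction-powered inference, the following minimal sketch shows the basic PPI point estimate of a mean: the judge's average score on a large unlabeled pool, corrected by the average judge error measured on a small human-labeled set. This is plain PPI only, not R-AutoEval+ (whose adaptive construction and finite-sample guarantees are not reproduced here); the data values and function names are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def ppi_mean(judge_unlabeled, judge_labeled, human_labeled):
    """Prediction-powered estimate of the mean human score.

    judge_unlabeled: judge scores on examples without human labels
    judge_labeled:   judge scores on the human-labeled examples
    human_labeled:   human scores on those same examples
    """
    # Rectifier: average (human - judge) gap, estimated on labeled data.
    rectifier = mean([y - f for y, f in zip(human_labeled, judge_labeled)])
    return mean(judge_unlabeled) + rectifier


# Hypothetical binary pass/fail scores (1 = pass).
judge_unlabeled = [1, 1, 0, 1, 1, 0, 1, 1]
judge_labeled = [1, 1, 1, 0]
human_labeled = [1, 0, 1, 0]

judge_only_estimate = mean(judge_unlabeled)   # ignores judge bias
human_only_estimate = mean(human_labeled)     # uses only the 4 labels
ppi_estimate = ppi_mean(judge_unlabeled, judge_labeled, human_labeled)
```

In this toy case the judge is too lenient on one labeled example, so the rectifier is negative and pulls the judge-only estimate down. When the judge is inaccurate, the rectifier term can add more variance than the unlabeled pool removes, which is the sample-efficiency concern the abstract raises.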
Article 1853
Title@2025-05-24 (6): Robustness in Large Language Models: A Survey of Mitigation Strategies and Evaluation Metrics
Title: Robustness in Large Language Models: A Survey of Mitigation Strategies and Evaluation Metrics | Robustheit in großen Sprachmodellen: Eine Umfrage zu Mitigationsstrategien und Evaluationsmetrics | 大语言模式的强强力:减轻战略调查和评价 2505.18658v1 |
Authors: Pankaj Kumar, Subhankar Mishra
Large Language Models (LLMs) have emerged as a promising cornerstone for the development of natural language processing (NLP) and artificial intelligence (AI). However, ensuring the robustness of LLMs remains a critical challenge. To address these challenges and advance the field, this survey provides a comprehensive overview of current studies in this area. First, we systematically examine the nature of robustness in LLMs, including its conceptual foundations, the importance of consistent performance across diverse inputs, and the implications of failure modes in real-world applications. Next, we analyze the sources of non-robustness, categorizing intrinsic model limitations, data-driven vulnerabilities, and external adversarial factors that compromise reliability. Following this, we review state-of-the-art mitigation strategies, and then we discuss widely adopted benchmarks, emerging metrics, and persistent gaps in assessing real-world reliability. Finally, we synthesize findings from existing surveys and interdisciplinary studies to highlight trends, unresolved issues, and pathways for future research.
Article 1854
Title@2025-05-24 (6): LLM-QFL: Distilling Large Language Model for Quantum Federated Learning
Title: LLM-QFL: Distilling Large Language Model for Quantum Federated Learning | LLM-QFL: Destillieren eines großen Sprachmodells für Quantum-Federated Learning | LLM-QFL:为量子联邦学习保留大语言模式 2505.18656v1 |
Authors: Dev Gurung, Shiva Raj Pokhrel
Inspired by the power of large language models (LLMs), our research adapts them to quantum federated learning (QFL) to boost efficiency and performance. We propose a federated fine-tuning method that distills an LLM within QFL, allowing each client to locally adapt the model to its own data while preserving privacy and reducing unnecessary global updates. The fine-tuned LLM also acts as a reinforcement agent, optimizing QFL by adjusting optimizer steps, cutting down communication rounds, and intelligently selecting clients. Experiments show significant efficiency gains. We pioneer a synergy between LLM and QFL, offering: (i) practical efficiency: reduced communication costs and faster convergence; (ii) theoretical rigor: provable guarantees for adaptive federated optimization; (iii) scalability: PEFT methods (LoRA, QLoRA) enable deployment on resource-constrained quantum devices. Code implementation is available.
Article 1855
Title@2025-05-24 (6): On the Emergence of Linear Analogies in Word Embeddings
Title: On the Emergence of Linear Analogies in Word Embeddings | Zur Entstehung linearer Analogien in Word-Embeddings | 单线模拟在文字嵌入中的出现 2505.18651v1 |
Authors: Daniel J. Korchinski, Dhruva Karkada, Yasaman Bahri, Matthieu Wyart
Models such as Word2Vec and GloVe construct word embeddings based on the co-occurrence probability $P(i,j)$ of words $i$ and $j$ in text corpora. The resulting vectors $W_i$ not only group semantically similar words but also exhibit a striking linear analogy structure – for example, $W_{\text{king}} - W_{\text{man}} + W_{\text{woman}} \approx W_{\text{queen}}$ – whose theoretical origin remains unclear. Previous observations indicate that this analogy structure: (i) already emerges in the top eigenvectors of the matrix $M(i,j) = P(i,j)/(P(i)P(j))$, (ii) strengthens and then saturates as more eigenvectors of $M(i,j)$ are included, where the number of eigenvectors controls the dimension of the embeddings, (iii) is enhanced when using $\log M(i,j)$ rather than $M(i,j)$, and (iv) persists even when all word pairs involved in a specific analogy relation (e.g., king-queen, man-woman) are removed from the corpus. To explain these phenomena, we introduce a theoretical generative model in which words are defined by binary semantic attributes, and co-occurrence probabilities are derived from attribute-based interactions. This model analytically reproduces the emergence of linear analogy structure and naturally accounts for properties (i)-(iv). It can be viewed as giving fine-grained resolution into the role of each additional embedding dimension. It is robust to various forms of noise and agrees well with co-occurrence statistics measured on Wikipedia and the analogy benchmark introduced by Mikolov et al.
Article 1856
Title@2025-05-24 (6): Flow Matching for Geometric Trajectory Simulation
Title: Flow Matching for Geometric Trajectory Simulation | Flow Matching für geometrische Trajektoriensimulation | 几何轨迹模拟流程匹配 2505.18647v1 |
Authors: Kiet Bennema ten Brinke, Koen Minartz, Vlado Menkovski
The simulation of N-body systems is a fundamental problem with applications in a wide range of fields, such as molecular dynamics, biochemistry, and pedestrian dynamics. Machine learning has become an invaluable tool for scaling physics-based simulators and developing models directly from experimental data. In particular, recent advances based on deep generative modeling and geometric deep learning have enabled probabilistic simulation by modeling complex distributions over trajectories while respecting the permutation symmetry that is fundamental to N-body systems. However, to generate realistic trajectories, existing methods must learn complex transformations starting from uninformed noise and do not allow for the exploitation of domain-informed priors. In this work, we propose STFlow to address this limitation. By leveraging flow matching and data-dependent couplings, STFlow facilitates physics-informed simulation of geometric trajectories without sacrificing model expressivity or scalability. Our evaluation on N-body dynamical systems, molecular dynamics, and pedestrian dynamics benchmarks shows that STFlow produces significantly lower prediction errors while enabling more efficient inference, highlighting the benefits of employing physics-informed prior distributions in probabilistic geometric trajectory modeling.
Article 1857
Title@2025-05-24 (6): Randomized Midpoint Method for Log-Concave Sampling under Constraints
Title: Randomized Midpoint Method for Log-Concave Sampling under Constraints | Randomisierte Midpoint-Methode für Log-Concave-Sampling unter Einschränkungen | 制约下对日志集点取样的随机中点方法 2405.15379v2 |
Authors: Yifeng Yu, Lu Yu
In this paper, we study the problem of sampling from log-concave distributions supported on convex, compact sets, with a particular focus on the randomized midpoint discretization of both vanilla and kinetic Langevin diffusions in this constrained setting. We propose a unified proximal framework for handling constraints via a broad class of projection operators, including Euclidean, Bregman, and Gauge projections. Within this framework, we establish non-asymptotic bounds in both $\mathcal{W}_1$ and $\mathcal{W}_2$ distances, providing precise complexity guarantees and performance comparisons. In addition, our analysis leads to sharper convergence guarantees for both vanilla and kinetic Langevin Monte Carlo under constraints, improving upon existing theoretical results.
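A rough sketch of a randomized midpoint step for vanilla (overdamped) Langevin dynamics with a constraint is given below. It follows the commonly described randomized midpoint scheme: draw a random time fraction, take an Euler step to that intermediate time, then take the full step using the gradient at the intermediate point, with both steps driven by the same Brownian path. Applying a Euclidean projection onto a ball after each sub-step is an assumption made here for illustration; the paper's proximal framework and its Bregman or Gauge projections are not reproduced. The Gaussian potential, step size, and all names are hypothetical.

```python
import math
import random


def project_to_ball(x, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def grad_potential(x):
    """Gradient of f(x) = ||x||^2 / 2, a simple log-concave example."""
    return list(x)


def randomized_midpoint_step(x, h, radius, rng):
    alpha = rng.random()  # random fraction of the step, uniform in [0, 1)
    z1 = [rng.gauss(0.0, 1.0) for _ in x]
    z2 = [rng.gauss(0.0, 1.0) for _ in x]
    # Brownian increments over [0, alpha*h] and [0, h] on the same path.
    w_mid = [math.sqrt(alpha * h) * z for z in z1]
    w_full = [wm + math.sqrt((1.0 - alpha) * h) * z
              for wm, z in zip(w_mid, z2)]
    g = grad_potential(x)
    x_mid = project_to_ball(
        [v - alpha * h * gv + math.sqrt(2.0) * wv
         for v, gv, wv in zip(x, g, w_mid)], radius)
    g_mid = grad_potential(x_mid)  # gradient at the random midpoint
    return project_to_ball(
        [v - h * gv + math.sqrt(2.0) * wv
         for v, gv, wv in zip(x, g_mid, w_full)], radius)


rng = random.Random(0)
radius, h = 2.0, 0.05
x = [0.0, 0.0]
samples = []
for _ in range(2000):
    x = randomized_midpoint_step(x, h, radius, rng)
    samples.append(x)
```

The samples approximate a standard Gaussian restricted to the ball of radius 2; every iterate stays feasible because of the projection.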
Article 1858
Title@2025-05-24 (6): STaRFormer: Semi-Supervised Task-Informed Representation Learning via Dynamic Attention-Based Regional Masking for Sequential Data
Title: STaRFormer: Semi-Supervised Task-Informed Representation Learning via Dynamic Attention-Based Regional Masking for Sequential Data | StaRFormer: Halbüberwachtes Task-Informiertes Representation-Lernen über dynamisches, aufmerksamkeitsbasiertes regionales Masking für sequentielle Daten | STARFormer:通过动态关注-基于关注的区域按顺序数据区域掩码,进行半超常任务化代表性学习 2504.10097v2 |
Authors: Maximilian Forstenhäusler, Daniel Külzer, Christos Anagnostopoulos, Shameem Puthiya Parambath, Natascha Weber
Accurate predictions using sequential spatiotemporal data are crucial for various applications. Utilizing real-world data, we aim to learn the intent of a smart device user within confined areas of a vehicle’s surroundings. However, in real-world scenarios, environmental factors and sensor limitations result in non-stationary and irregularly sampled data, posing significant challenges. To address these issues, we developed a Transformer-based approach, STaRFormer, which serves as a universal framework for sequential modeling. STaRFormer employs a novel, dynamic attention-based regional masking scheme combined with semi-supervised contrastive learning to enhance task-specific latent representations. Comprehensive experiments on 15 datasets varying in types (including non-stationary and irregularly sampled), domains, sequence lengths, training samples, and applications, demonstrate the efficacy and practicality of STaRFormer. We achieve notable improvements over state-of-the-art approaches. Code and data will be made available.
nan
Article 1859
Title@2025-05-24 (6): ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation
Title: ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation | ThanoRA: Aufgabe Heterogenität bewusst Multi-Task Low-Rank-Anpassung | 塔诺拉:任务差异性-软件多功能、多任务、低风险适应 2505.18640v1 |
Authors: Jian Liang, Wenke Huang, Xianda Guo, Guancheng Wan, Bo Du, Mang Ye
Low-Rank Adaptation (LoRA) is widely adopted for downstream fine-tuning of foundation models due to its efficiency and zero additional inference cost. Many real-world applications require foundation models to specialize in multiple tasks simultaneously, motivating the need for efficient multi-task adaptation. While recent approaches integrate LoRA with mixture-of-experts (MoE) to address this, the use of routers prevents parameter mergeability, which increases inference overhead and hinders unified multi-task adaptation, thereby limiting deployment practicality. In this work, we propose ThanoRA, a Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation framework that enables multi-task adaptation while preserving the inference efficiency of LoRA. ThanoRA jointly models task heterogeneity and mitigates subspace interference throughout training. Specifically, motivated by inherent differences in complexity and heterogeneity across tasks, ThanoRA constructs task-specific LoRA subspaces at initialization, enabling fine-grained knowledge injection aligned with task heterogeneity. Furthermore, to prevent task interference and subspace collapse during multi-task training, ThanoRA introduces a subspace-preserving regularization that maintains the independence of task-specific representations. With the synergy of both components, ThanoRA enables efficient and unified multi-task adaptation. Extensive experiments across multimodal and text-only benchmarks under varying multi-task mixtures demonstrate that ThanoRA consistently achieves robust and superior performance over strong baselines without introducing additional inference overhead. Our code is publicly available at: https://github.com/LiangJian24/ThanoRA.
nan
Article 1860
Title@2025-05-24 (6): Graph-Supported Dynamic Algorithm Configuration for Multi-Objective Combinatorial Optimization
Title: Graph-Supported Dynamic Algorithm Configuration for Multi-Objective Combinatorial Optimization | Graphunterstützte dynamische Algorithmenkonfiguration für multi-objektive Kombinator-Optimierung | 多目标组合优化多目标组合优化支持的图形支持动态算法配置 2505.16471v2 |
Authors: Robbert Reijnen, Yaoxin Wu, Zaharah Bukhsh, Yingqian Zhang
Deep reinforcement learning (DRL) has been widely used for dynamic algorithm configuration, particularly in evolutionary computation, which benefits from the adaptive update of parameters during the algorithmic execution. However, applying DRL to algorithm configuration for multi-objective combinatorial optimization (MOCO) problems remains relatively unexplored. This paper presents a novel graph neural network (GNN) based DRL to configure multi-objective evolutionary algorithms. We model the dynamic algorithm configuration as a Markov decision process, representing the convergence of solutions in the objective space by a graph, with their embeddings learned by a GNN to enhance the state representation. Experiments on diverse MOCO challenges indicate that our method outperforms traditional and DRL-based algorithm configuration methods in terms of efficacy and adaptability. It also exhibits advantageous generalizability across objective types and problem sizes, and applicability to different evolutionary computation methods.
nan
Article 1861
Title@2025-05-24 (6): DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection
Title: DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection | DitHub: Modulares Framework zur inkrementellen Open-Vocabulary-Objekterkennung | DitHub: 递增开放词汇物体探测模块框架 2503.09271v2 |
Authors: Chiara Cappellino, Gianluca Mancusi, Matteo Mosconi, Angelo Porrello, Simone Calderara, Rita Cucchiara
Open-Vocabulary object detectors can generalize to an unrestricted set of categories through simple textual prompting. However, adapting these models to rare classes or reinforcing their abilities on multiple specialized domains remains essential. While recent methods rely on monolithic adaptation strategies with a single set of weights, we embrace modular deep learning. We introduce DitHub, a framework designed to build and maintain a library of efficient adaptation modules. Inspired by Version Control Systems, DitHub manages expert modules as branches that can be fetched and merged as needed. This modular approach allows us to conduct an in-depth exploration of the compositional properties of adaptation modules, marking the first such study in Object Detection. Our method achieves state-of-the-art performance on the ODinW-13 benchmark and ODinW-O, a newly introduced benchmark designed to assess class reappearance. For more details, visit our project page: https://aimagelab.github.io/DitHub/
nan
Article 1862
Title@2025-05-24 (6): Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees
Title: Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees | Multi-Step Alignment als Markov Games: Ein optimaler Online-Gradient-Abstieg mit Konvergenzgarantien | 作为Markov运动会的多步对齐:带有一致保障的乐观的在线逐渐递增人种方法 2502.12678v2 |
Authors: Yongtao Wu, Luca Viano, Yihang Chen, Zhenyu Zhu, Kimon Antonakopoulos, Quanquan Gu, Volkan Cevher
Reinforcement Learning from Human Feedback (RLHF) has been highly successful in aligning large language models with human preferences. While prevalent methods like DPO have demonstrated strong performance, they frame interactions with the language model as a bandit problem, which limits their applicability in real-world scenarios where multi-turn conversations are common. Additionally, DPO relies on the Bradley-Terry model assumption, which does not adequately capture the non-transitive nature of human preferences. In this paper, we address these challenges by modeling the alignment problem as a two-player constant-sum Markov game, where each player seeks to maximize their winning rate against the other across all steps of the conversation. Our approach, Optimistic Multi-step Preference Optimization (OMPO), is built upon the optimistic online mirror descent algorithm \citep{rakhlin2013online,joulani17a}. Theoretically, we provide a rigorous analysis of the convergence of OMPO and show that OMPO requires $\mathcal{O}(\epsilon^{-1})$ policy updates to converge to an $\epsilon$-approximate Nash equilibrium. We also validate the effectiveness of our method on a multi-turn conversation dataset and a math reasoning dataset.
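To make the "optimistic" update concrete, here is a minimal Python sketch of optimistic multiplicative weights, which is optimistic online mirror descent with an entropy regularizer, on a single-step constant-sum matrix game. The rock-paper-scissors payoff, step size, starting strategies, and function names are assumptions. The sketch does not model multi-turn conversations, preference oracles, or language model policies, so it should not be read as OMPO itself.

```python
import math

# Payoff to the row player in rock-paper-scissors; the column player
# receives the negative, so the game is constant-sum.
PAYOFF = [
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
]


def row_gains(y):
    """Expected payoff of each row action against column strategy y."""
    return [sum(PAYOFF[i][j] * y[j] for j in range(3)) for i in range(3)]


def col_gains(x):
    """Expected payoff of each column action against row strategy x."""
    return [-sum(PAYOFF[i][j] * x[i] for i in range(3)) for j in range(3)]


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def optimistic_mwu(x, y, eta=0.2, n_steps=5000):
    """Both players use the optimistic gain estimate 2*g_t - g_{t-1}."""
    prev_gx, prev_gy = row_gains(y), col_gains(x)
    for _ in range(n_steps):
        gx, gy = row_gains(y), col_gains(x)
        x_next = normalize(
            [xi * math.exp(eta * (2.0 * g - pg))
             for xi, g, pg in zip(x, gx, prev_gx)]
        )
        y_next = normalize(
            [yi * math.exp(eta * (2.0 * g - pg))
             for yi, g, pg in zip(y, gy, prev_gy)]
        )
        x, y = x_next, y_next
        prev_gx, prev_gy = gx, gy
    return x, y


x_final, y_final = optimistic_mwu([0.6, 0.3, 0.1], [0.2, 0.5, 0.3])
```

The only difference from plain multiplicative weights is the correction term `2*g - pg`, which uses the previous gain vector as a prediction of the next one. In this game the unique Nash equilibrium is the uniform strategy for both players.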
nan
Article 1863
Title@2025-05-24 (6): Leveraging Structural Knowledge in Diffusion Models for Source Localization in Data-Limited Graph Scenarios
Title: Leveraging Structural Knowledge in Diffusion Models for Source Localization in Data-Limited Graph Scenarios | Nutzung struktureller Kenntnisse in Diffusionsmodellen für die Quellenlokalisierung in datenbeschränkten Graphenszenarien | 利用传播模型中的结构性知识,在数据限制的图表假设情景中实现源本地化 2502.17928v2 |
Authors: Hongyi Chen, Jingtao Ding, Xiaojun Liang, Yong Li, Xiao-Ping Zhang
The source localization problem in graph information propagation is crucial for managing various network disruptions, from misinformation spread to infrastructure failures. While recent deep generative approaches have shown promise in this domain, their effectiveness is limited by the scarcity of real-world propagation data. This paper introduces SIDSL (Structure-prior Informed Diffusion model for Source Localization), a novel framework that addresses three key challenges in limited-data scenarios: unknown propagation patterns, complex topology-propagation relationships, and class imbalance between source and non-source nodes. SIDSL incorporates topology-aware priors through graph label propagation and employs a propagation-enhanced conditional denoiser with a GNN-parameterized label propagation module (GNN-LP). Additionally, we propose a structure-prior biased denoising scheme that initializes from structure-based source estimations rather than random noise, effectively countering class imbalance issues. Experimental results across four real-world datasets demonstrate SIDSL’s superior performance, achieving 7.5-13.3% improvements in F1 scores compared to state-of-the-art methods. Notably, when pretrained with simulation data of synthetic patterns, SIDSL maintains robust performance with only 10% of training data, surpassing baselines by more than 18.8%. These results highlight SIDSL’s effectiveness in real-world applications where labeled data is scarce.
nan
Article 1864
Title@2025-05-24 (6): Asymmetric Duos: Sidekicks Improve Uncertainty
Title: Asymmetric Duos: Sidekicks Improve Uncertainty | Asymmetrische Duos: Sidekicks verbessern Unsicherheit | 非对称 Duos: 侧边icks 改善不确定性 2505.18636v1 |
Authors: Tim G. Zhou, Evan Shelhamer, Geoff Pleiss
The go-to strategy to apply deep networks in settings where uncertainty informs decisions (ensembling multiple training runs with random initializations) is ill-suited for the extremely large-scale models and practical fine-tuning workflows of today. We introduce a new cost-effective strategy for improving the uncertainty quantification and downstream decisions of a large model (e.g. a fine-tuned ViT-B): coupling it with a less accurate but much smaller “sidekick” (e.g. a fine-tuned ResNet-34) with a fraction of the computational cost. We propose aggregating the predictions of this Asymmetric Duo by simple learned weighted averaging. Surprisingly, despite their inherent asymmetry, the sidekick model almost never harms the performance of the larger model. In fact, across five image classification benchmarks and a variety of model architectures and training schemes (including soups), Asymmetric Duos significantly improve accuracy, uncertainty quantification, and selective classification metrics with only ${\sim}10-20\%$ more computation.
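The aggregation step mentioned in this abstract can be sketched as a weighted combination of the two models' outputs. The abstract does not say whether logits or probabilities are averaged, so the version below assumes a weighted sum of logits followed by a softmax. The weights are plain arguments here (in practice they would be fitted on held-out data), and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def duo_predict(logits_large, logits_sidekick, w_large, w_sidekick):
    """Combine a large model and a small sidekick by weighted logit averaging.

    Assumption: the weights are learned elsewhere and passed in as scalars.
    """
    combined = [
        w_large * a + w_sidekick * b
        for a, b in zip(logits_large, logits_sidekick)
    ]
    return softmax(combined)


large_only = duo_predict([2.0, 0.0, 1.0], [0.0, 1.0, 0.0], 1.0, 0.0)
duo = duo_predict([2.0, 0.0, 1.0], [0.0, 1.0, 0.0], 1.0, 1.0)
```

With `w_sidekick = 0` the duo reduces to the large model alone, so a learned weighting can fall back to ignoring the sidekick when it does not help.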
nan
Article 1865
Title@2025-05-24 (6): You Can Wash Hands Better: Accurate Daily Handwashing Assessment with a Smartwatch
Title: You Can Wash Hands Better: Accurate Daily Handwashing Assessment with a Smartwatch | Sie können Hände besser waschen: Genaue tägliche Handwäsche Bewertung mit einer Smartwatch | 你可以更好地洗手:用智能观察准确进行每日洗手评估 2112.06657v5 |
Authors: Fei Wang, Tingting Zhang, Xilei Wu, Pengcheng Wang, Xin Wang, Han Ding, Jingang Shi, Jinsong Han, Dong Huang
Hand hygiene is among the most effective daily practices for preventing infectious diseases such as influenza, malaria, and skin infections. While professional guidelines emphasize proper handwashing to reduce the risk of viral infections, surveys reveal that adherence to these recommendations remains low. To address this gap, we propose UWash, a wearable solution leveraging smartwatches to evaluate handwashing procedures, aiming to raise awareness and cultivate high-quality handwashing habits. We frame the task of handwashing assessment as an action segmentation problem, similar to those in computer vision, and introduce a simple yet efficient two-stream UNet-like network to achieve this goal. Experiments involving 51 subjects demonstrate that UWash achieves 92.27% accuracy in handwashing gesture recognition, an error of <0.5 seconds in onset/offset detection, and an error of <5 points in gesture scoring under user-dependent settings. The system also performs robustly in user-independent and user-independent-location-independent evaluations. Remarkably, UWash maintains high performance in real-world tests, including evaluations with 10 random passersby at a hospital 9 months later and 10 passersby in an in-the-wild test conducted 2 years later. UWash is the first system to score handwashing quality based on gesture sequences, offering actionable guidance for improving daily hand hygiene. The code and dataset are publicly available at https://github.com/aiotgroup/UWash
nan
Article 1866
Title@2025-05-24 (6): Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding
Title: Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding | Denken Sie, bevor Sie akzeptieren: Semantische Reflektierende Verifizierung für schnellere spekulative Dekodierung | 在你接受之前先想想: 快速投机代号的语义反省校验 2505.18629v1 |
Authors: Yixuan Wang, Yijun Liu, Shiyu ji, Yuzhuang Xu, Yang Xu, Qingfu Zhu, Wanxiang Che
Large language models (LLMs) suffer from high inference latency due to the auto-regressive decoding process. Speculative decoding accelerates inference by generating multiple draft tokens using a lightweight model and verifying them in parallel. However, existing verification methods rely heavily on distributional consistency while overlooking semantic correctness, thereby limiting the potential speedup of speculative decoding. While some methods employ additional models for relaxed verification of draft tokens, they often fail to generalize effectively to more diverse or open-domain settings. In this work, we propose Reflective Verification, a training-free and semantics-aware approach that achieves a better trade-off between correctness and efficiency. Specifically, we leverage the inherent reflective capacity of LLMs to semantically assess the correctness of draft tokens in parallel during verification. Using prompt-based probing, we obtain both the original and reflective distributions of draft tokens in a single forward pass. The fusion of these distributions enables semantic-level verification of draft tokens that incorporates both consistency and correctness. Experiments across multiple domain benchmarks and model scales demonstrate that our method significantly increases the acceptance length of draft tokens without compromising model performance. Furthermore, we find that the proposed Reflective Verification is orthogonal to existing statistical verification methods, and their combination yields an additional 5$\sim$15\% improvement in decoding speed.
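For context, the distribution-based verification that this abstract contrasts itself with can be sketched as the standard speculative sampling acceptance rule: a draft token is accepted with probability min(1, p/q), where p and q are the target and draft probabilities of that token. The Python sketch below shows only this baseline. The dictionary representation and function names are assumptions, the residual resampling step after a rejection is omitted, and the reflective distribution fusion proposed in the paper is not implemented.

```python
import random


def accept_probability(p_target, q_draft):
    """Acceptance probability min(1, p/q) for one draft token."""
    if q_draft <= 0.0:
        raise ValueError("the draft model must give its own token positive probability")
    return min(1.0, p_target / q_draft)


def verify_draft(draft_tokens, target_probs, draft_probs, rng):
    """Count how many leading draft tokens are accepted.

    target_probs and draft_probs hold one dict (token -> probability) per position.
    """
    accepted = 0
    for token, p, q in zip(draft_tokens, target_probs, draft_probs):
        if rng.random() < accept_probability(p[token], q[token]):
            accepted += 1
        else:
            break  # the first rejection discards the rest of the draft
    return accepted


rng = random.Random(0)
draft = [{"a": 0.5, "b": 0.5}, {"a": 0.9, "b": 0.1}]
n_accepted = verify_draft(["a", "b"], draft, draft, rng)
```

When the target and draft distributions agree, every draft token is accepted. The acceptance length shrinks as the two distributions diverge, which is the behavior a semantics-aware rule would try to relax.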
nan
Article 1867
Title@2025-05-24 (6): HARP: Hesitation-Aware Reframing in Transformer Inference Pass
Title: HARP: Hesitation-Aware Reframing in Transformer Inference Pass | HARP: Hezitation-Aware Reframing in Transformer Inferenz Pass | HARP: 变压器推断通过中的偏移-软件重新配置 2412.07282v2 |
Authors: Romain Storaï, Seung-won Hwang
This paper aims to improve the performance of large language models by addressing the variable computational demands in inference steps, where some tokens require more computational resources than others. We present HARP, a simple modification to the “off-the-shelf” Transformer forward pass. Drawing from hesitation and the framing effect in decision-making, HARP selectively applies additional computation when the model encounters uncertainty during token generation. Our method mimics human cognitive processes by pausing at difficult decision points and reframing inputs for a different perspective. Unlike other approaches, HARP is model-agnostic, training-free, and easy to implement. We evaluate our method across various downstream tasks and model sizes, demonstrating performance improvements up to +5.16%. Notably, HARP achieves these gains while maintaining inference times twice as fast as beam search. Simple and yet with significant gains, HARP provides insights into the potential of adaptive computation for enhancing the performance of Transformer-based language models.
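One simple way to decide when a model is "hesitating" is to threshold the entropy of its next-token distribution. The Python sketch below shows such a gate. Using Shannon entropy as the uncertainty signal, the threshold value, and the function names are all assumptions, and the reframing of inputs and the way the extra pass is combined with the original one are left out because the abstract does not specify them.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def needs_extra_pass(probs, threshold):
    """Return True when the next-token distribution is uncertain enough
    to justify spending additional computation on this step."""
    return shannon_entropy(probs) > threshold


uncertain = needs_extra_pass([0.25, 0.25, 0.25, 0.25], threshold=1.0)
confident = needs_extra_pass([0.97, 0.01, 0.01, 0.01], threshold=1.0)
```

A gate like this spends the extra forward pass only on a subset of tokens, which is how an adaptive scheme can stay cheaper than running a wider search at every step.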
nan
Article 1868
Title@2025-05-24 (6): QUCE: The Minimisation and Quantification of Path-Based Uncertainty for Generative Counterfactual Explanations
Title: QUCE: The Minimisation and Quantification of Path-Based Uncertainty for Generative Counterfactual Explanations | QUCE: Die Minimierung und Quantifizierung pfadbasierter Unsicherheiten für generative gegenfaktische Erklärungen | QUCE: 产生反事实解释的路径不确定性的最小化和量化 2402.17516v5 |
Authors: Jamie Duell, Monika Seisenberger, Hsuan Fu, Xiuyi Fan
Deep Neural Networks (DNNs) stand out as one of the most prominent approaches within the Machine Learning (ML) domain. The efficacy of DNNs has surged alongside recent increases in computational capacity, allowing these approaches to scale to significant complexities for addressing predictive challenges in big data. However, as the complexity of DNN models rises, interpretability diminishes. In response to this challenge, explainable models such as Adversarial Gradient Integration (AGI) leverage path-based gradients provided by DNNs to elucidate their decisions. Yet the performance of path-based explainers can be compromised when gradients exhibit irregularities during out-of-distribution path traversal. In this context, we introduce Quantified Uncertainty Counterfactual Explanations (QUCE), a method designed to mitigate out-of-distribution traversal by minimizing path uncertainty. QUCE not only quantifies uncertainty when presenting explanations but also generates more certain counterfactual examples. We showcase the performance of the QUCE method by comparing it with competing methods for both path-based explanations and generative counterfactual examples.
nan
Article 1869
Title@2025-05-24 (6): Mind The Gap: Deep Learning Doesn’t Learn Deeply
Title: Mind The Gap: Deep Learning Doesn’t Learn Deeply | Mind The Gap: Deep Learning lernt nicht tief | 思想差距:深学习不深入学习 2505.18623v1 |
Authors: Lucas Saldyt, Subbarao Kambhampati
This paper aims to understand how neural networks learn algorithmic reasoning by addressing two questions: How faithful are learned algorithms when they are effective, and why do neural networks fail to learn effective algorithms otherwise? To answer these questions, we use neural compilation, a technique that directly encodes a source algorithm into neural network parameters, enabling the network to compute the algorithm exactly. This enables comparison between compiled and conventionally learned parameters, intermediate vectors, and behaviors. This investigation is crucial for developing neural networks that robustly learn complex algorithms from data. Our analysis focuses on graph neural networks (GNNs), which are naturally aligned with algorithmic reasoning tasks, specifically our choices of BFS, DFS, and Bellman-Ford, which cover the spectrum of effective, faithful, and ineffective learned algorithms. Commonly, learning algorithmic reasoning is framed as induction over synthetic data, where a parameterized model is trained on inputs, traces, and outputs produced by an underlying ground truth algorithm. In contrast, we introduce a neural compilation method for GNNs, which sets network parameters analytically, bypassing training. Focusing on GNNs leverages their alignment with algorithmic reasoning, extensive algorithmic induction literature, and the novel application of neural compilation to GNNs. Overall, this paper aims to characterize expressability-trainability gaps, a fundamental shortcoming in learning algorithmic reasoning. We hypothesize that inductive learning is most effective for parallel algorithms contained within the computational class NC.
nan
Article 1870
Title@2025-05-24 (6): Trust, or Don’t Predict: Introducing the CWSA Family for Confidence-Aware Model Evaluation
Title: Trust, or Don’t Predict: Introducing the CWSA Family for Confidence-Aware Model Evaluation | Vertrauen oder nicht voraussagen: Einführung der CWSA-Familie für vertrauensbewusste Modellbewertung | 信任或不要预测:介绍CWSA家庭促进信任-了解模型评价 2505.18622v1 |
Authors: Kourosh Shahnazari, Seyed Moein Ayyoubzadeh, Mohammadali Keshtparvar, Pegah Ghaffari
In recent machine learning systems, confidence scores are being utilized more and more to manage selective prediction, whereby a model can abstain from making a prediction when it is unconfident. Yet, conventional metrics like accuracy, expected calibration error (ECE), and area under the risk-coverage curve (AURC) do not capture the actual reliability of predictions. These metrics either disregard confidence entirely, dilute valuable localized information through averaging, or neglect to suitably penalize overconfident misclassifications, which can be particularly detrimental in real-world systems. We introduce two new metrics, Confidence-Weighted Selective Accuracy (CWSA) and its normalized variant CWSA+, that offer a principled and interpretable way to evaluate predictive models under confidence thresholds. Unlike existing methods, our metrics explicitly reward confident accuracy and penalize overconfident mistakes. They are threshold-local, decomposable, and usable in both evaluation and deployment settings where trust and risk must be quantified. Through exhaustive experiments on both real-world data sets (MNIST, CIFAR-10) and artificial model variants (calibrated, overconfident, underconfident, random, perfect), we show that CWSA and CWSA+ both effectively detect nuanced failure modes and outperform classical metrics in trust-sensitive tests. Our results confirm that CWSA is a sound basis for developing and assessing selective prediction systems for safety-critical domains.
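The selective prediction setting described here can be illustrated with the two standard quantities a confidence threshold induces: coverage (the fraction of inputs the model answers) and selective accuracy (accuracy on the answered inputs). The Python sketch below computes only these standard quantities. It is not the CWSA or CWSA+ definition, which the abstract does not state, and the function name and example data are made up.

```python
def selective_metrics(confidences, correct, threshold):
    """Coverage and accuracy when predictions below `threshold` are withheld.

    confidences: confidence score per prediction
    correct: whether each prediction was right
    Returns (coverage, selective_accuracy); accuracy is None if nothing is kept.
    """
    kept = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(kept) / len(confidences)
    accuracy = sum(kept) / len(kept) if kept else None
    return coverage, accuracy


confidences = [0.9, 0.8, 0.6, 0.95]
correct = [True, False, True, True]
coverage, accuracy = selective_metrics(confidences, correct, threshold=0.7)
```

In this toy data the wrong prediction made with confidence 0.8 lowers selective accuracy exactly as much as a barely-confident error would. That is the kind of insensitivity to overconfidence a confidence-weighted metric is meant to address.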
nan
Article 1871
Title@2025-05-24 (6): Neural Solver Selection for Combinatorial Optimization
Title: Neural Solver Selection for Combinatorial Optimization | Neural Solver Selection zur kombinatorischen Optimierung | 组合优化的神经溶剂选择 2410.09693v2 |
Authors: Chengrui Gao, Haopu Shang, Ke Xue, Chao Qian
Machine learning has increasingly been employed to solve NP-hard combinatorial optimization problems, resulting in the emergence of neural solvers that demonstrate remarkable performance, even with minimal domain-specific knowledge. To date, the community has created numerous open-source neural solvers with distinct motivations and inductive biases. While considerable efforts are devoted to designing powerful single solvers, our findings reveal that existing solvers typically demonstrate complementary performance across different problem instances. This suggests that significant improvements could be achieved through effective coordination of neural solvers at the instance level. In this work, we propose the first general framework to coordinate the neural solvers, which involves feature extraction, selection model, and selection strategy, aiming to allocate each instance to the most suitable solvers. To instantiate, we collect several typical neural solvers with state-of-the-art performance as alternatives, and explore various methods for each component of the framework. We evaluated our framework on two extensively studied combinatorial optimization problems, Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP). Experimental results show that the proposed framework can effectively distribute instances and the resulting composite solver can achieve significantly better performance (e.g., reduce the optimality gap by 0.88\% on TSPLIB and 0.71\% on CVRPLIB) than the best individual neural solver with little extra time cost.
nan
Article 1872
Title@2025-05-24 (6): Federated Class-Incremental Learning with Hierarchical Generative Prototypes
Title: Federated Class-Incremental Learning with Hierarchical Generative Prototypes | Föderiertes Klassen-Inkrementelles Lernen mit Hierarchischen Generativen Prototypen | 具有等级制起源原型的联邦高级高等程度学习 2406.02447v4 |
Authors: Riccardo Salami, Pietro Buzzega, Matteo Mosconi, Mattia Verasani, Simone Calderara
Federated Learning (FL) aims at unburdening the training of deep models by distributing computation across multiple devices (clients) while safeguarding data privacy. On top of that, Federated Continual Learning (FCL) also accounts for data distribution evolving over time, mirroring the dynamic nature of real-world environments. While previous studies have identified Catastrophic Forgetting and Client Drift as primary causes of performance degradation in FCL, we shed light on the importance of Incremental Bias and Federated Bias, which cause models to prioritize classes that are recently introduced or locally predominant, respectively. Our proposal constrains both biases in the last layer by efficiently finetuning a pre-trained backbone using learnable prompts, resulting in clients that produce less biased representations and more biased classifiers. Therefore, instead of solely relying on parameter aggregation, we leverage generative prototypes to effectively balance the predictions of the global model. Our method significantly improves the current State Of The Art, providing an average increase of +7.8% in accuracy.
nan
Article 1873
Title@2025-05-24 (6): MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation
Title: MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation | MAVL: Ein mehrsprachiger Audio-Video-Text Datensatz für animierte Song-Übersetzung | MAVL: 动动歌曲翻译多语种视听歌词数据集 2505.18614v1 |
Authors: Woohyun Cho, Youngmin Kim, Sunghyun Lee, Youngjae Yu
Lyrics translation requires both accurate semantic transfer and preservation of musical rhythm, syllabic structure, and poetic style. In animated musicals, the challenge intensifies due to alignment with visual and auditory cues. We introduce Multilingual Audio-Video Lyrics Benchmark for Animated Song Translation (MAVL), the first multilingual, multimodal benchmark for singable lyrics translation. By integrating text, audio, and video, MAVL enables richer and more expressive translations than text-only approaches. Building on this, we propose Syllable-Constrained Audio-Video LLM with Chain-of-Thought (SylAVL-CoT), which leverages audio-video cues and enforces syllabic constraints to produce natural-sounding lyrics. Experimental results demonstrate that SylAVL-CoT significantly outperforms text-based models in singability and contextual accuracy, emphasizing the value of multimodal, multilingual approaches for lyrics translation.
nan
Article 1874
Title@2025-05-24 (6): MLRan: A Behavioural Dataset for Ransomware Analysis and Detection
Title: MLRan: A Behavioural Dataset for Ransomware Analysis and Detection | MLRan: Ein Verhaltensdatensatz für Ransomware Analyse und Erkennung | MLran:用于分析和探测Ransomware 分析和探测的行为数据集 2505.18613v1 |
Authors: Faithful Chiagoziem Onwuegbuche, Adelodun Olaoluwa, Anca Delia Jurcut, Liliana Pasquale
Ransomware remains a critical threat to cybersecurity, yet publicly available datasets for training machine learning-based ransomware detection models are scarce and often have limited sample size, diversity, and reproducibility. In this paper, we introduce MLRan, a behavioural ransomware dataset, comprising over 4,800 samples across 64 ransomware families and a balanced set of goodware samples. The samples span from 2006 to 2024 and encompass the four major types of ransomware: locker, crypto, ransomware-as-a-service, and modern variants. We also propose guidelines (GUIDE-MLRan), inspired by previous work, for constructing high-quality behavioural ransomware datasets, which informed the curation of our dataset. We evaluated the ransomware detection performance of several machine learning (ML) models using MLRan. For this purpose, we performed feature selection by conducting mutual information filtering to reduce the initial 6.4 million features to 24,162, followed by recursive feature elimination, yielding 483 highly informative features. The ML models achieved an accuracy, precision and recall of up to 98.7%, 98.9%, 98.5%, respectively. Using SHAP and LIME, we identified critical indicators of malicious behaviour, including registry tampering, strings, and API misuse. The dataset and source code for feature extraction, selection, ML training, and evaluation are available publicly to support replicability and encourage future research, which can be found at https://github.com/faithfulco/mlran.
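The first stage of the feature selection described here, mutual information filtering, can be sketched in plain Python for discrete features. The feature names and toy data below are hypothetical, and the later recursive feature elimination stage is not shown. In practice a library implementation would likely be used for both stages.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between a discrete feature and labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    count_x = Counter(feature)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written in terms of raw counts
        mi += p_xy * math.log(p_xy * n * n / (count_x[x] * count_y[y]))
    return mi


def top_k_by_mi(columns, labels, k):
    """Keep the k feature names with the highest mutual information."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


labels = [1, 1, 0, 0, 1, 0]  # 1 = ransomware, 0 = goodware (toy data)
columns = {
    "registry_write": [1, 1, 0, 0, 1, 0],  # hypothetical, perfectly informative
    "noisy_api_call": [1, 0, 0, 0, 1, 1],  # hypothetical, weakly informative
    "always_present": [1, 1, 1, 1, 1, 1],  # hypothetical, uninformative
}
selected = top_k_by_mi(columns, labels, k=2)
```

A filter like this scores each feature independently, which keeps it cheap enough for millions of candidate features. A wrapper method such as recursive feature elimination can then refine the much smaller surviving set.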
nan
Article 1875
Title@2025-05-24 (6): An Artificial Intelligence Model for Early Stage Breast Cancer Detection from Biopsy Images
Title: An Artificial Intelligence Model for Early Stage Breast Cancer Detection from Biopsy Images | Ein Modell der Künstlichen Intelligenz zur Früherkennung von Brustkrebs aus Biopsiebildern | 早期从生物心理图像中检测乳腺癌的人工智能模型 2505.20332v1 |
Authors: Neil Chaudhary, Zaynah Dhunny
Accurate identification of breast cancer types plays a critical role in guiding treatment decisions and improving patient outcomes. This paper presents an artificial intelligence enabled tool designed to aid in the identification of breast cancer types using histopathological biopsy images. Traditionally, patients diagnosed with breast cancer must undergo additional tests to determine the type of cancer before the appropriate treatment can be chosen. Those tests are not only invasive but also delay the initiation of treatment and increase patient burden. The proposed model utilizes a convolutional neural network (CNN) architecture to distinguish between benign and malignant tissues and to accurately subclassify breast cancer types. By preprocessing the images to reduce noise and enhance features, the model achieves reliable levels of classification performance. Experimental results on histopathological image datasets demonstrate the model’s effectiveness, outperforming several existing solutions in terms of accuracy, precision, recall, and F1-score. The study emphasizes the potential of deep learning techniques in clinical diagnostics and offers a promising tool to assist pathologists in breast cancer classification.
nan
Article 1876
Title@2025-05-24 (6): Exemplar-Free Continual Learning for State Space Models
Title: Exemplar-Free Continual Learning for State Space Models | Beispielfreies kontinuierliches Lernen für Staatsraummodelle | 国家空间模型免税免费持续学习 2505.18604v1 |
Authors: Isaac Ning Lee, Leila Mahmoodi, Trung Le, Mehrtash Harandi
State-Space Models (SSMs) excel at capturing long-range dependencies with structured recurrence, making them well-suited for sequence modeling. However, their evolving internal states pose challenges in adapting them under Continual Learning (CL). This is particularly difficult in exemplar-free settings, where the absence of prior data leaves updates to the dynamic SSM states unconstrained, resulting in catastrophic forgetting. To address this, we propose Inf-SSM, a novel and simple geometry-aware regularization method that utilizes the geometry of the infinite-dimensional Grassmannian to constrain state evolution during CL. Unlike classical continual learning methods that constrain weight updates, Inf-SSM regularizes the infinite-horizon evolution of SSMs encoded in their extended observability subspace. We show that enforcing this regularization requires solving a matrix equation known as the Sylvester equation, which typically incurs $\mathcal{O}(n^3)$ complexity. We develop a $\mathcal{O}(n^2)$ solution by exploiting the structure and properties of SSMs. This leads to an efficient regularization mechanism that can be seamlessly integrated into existing CL methods. Comprehensive experiments on challenging benchmarks, including ImageNet-R and Caltech-256, demonstrate a significant reduction in forgetting while improving accuracy across sequential tasks.
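To see why structure can lower the cost of a Sylvester solve, consider the following worked case. It assumes the continuous-time form of the equation and diagonal coefficient matrices (diagonal state matrices are common in SSMs). The symbols $A$, $B$, $C$, $X$, $a_i$, $b_j$ are introduced here for illustration, and the paper's own equation and derivation may differ.

```latex
% General Sylvester equation for the unknown X in R^{n x n}
% (dense solvers cost O(n^3)):
\[
  A X + X B = C .
\]
% Assumed special case: A = diag(a_1, ..., a_n), B = diag(b_1, ..., b_n).
% Entry (i, j) of the left-hand side is then a_i X_{ij} + X_{ij} b_j, so
\[
  (a_i + b_j)\, X_{ij} = C_{ij}
  \quad\Longrightarrow\quad
  X_{ij} = \frac{C_{ij}}{a_i + b_j},
  \qquad a_i + b_j \neq 0 .
\]
% Each of the n^2 entries costs O(1), for O(n^2) in total.
```

Under these assumptions the equations decouple entry by entry, so no matrix factorization is needed.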
nan
Article 1877
Title@2025-05-24 (6): LLM-Meta-SR: Learning to Evolve Selection Operators for Symbolic Regression
Title: LLM-Meta-SR: Learning to Evolve Selection Operators for Symbolic Regression | LLM-Meta-SR: Lernen, Auswahloperatoren für symbolische Regression zu entwickeln | LLM-Meta-SR:学习如何向演进中的反射反射选择操作员学习 2505.18602v1 |
Authors: Hengzhe Zhang, Qi Chen, Bing Xue, Mengjie Zhang
Large language models (LLMs) have revolutionized algorithm development, yet their application in symbolic regression, where algorithms automatically discover symbolic expressions from data, remains constrained and is typically designed manually by human experts. In this paper, we propose a learning-to-evolve framework that enables LLMs to automatically design selection operators for evolutionary symbolic regression algorithms. We first identify two key limitations in existing LLM-based algorithm evolution techniques: code bloat and a lack of semantic guidance. Bloat results in unnecessarily complex components, and the absence of semantic awareness can lead to ineffective exchange of useful code components, both of which can reduce the interpretability of the designed algorithm or hinder evolutionary learning progress. To address these issues, we enhance the LLM-based evolution framework for meta symbolic regression with two key innovations: bloat control and a complementary, semantics-aware selection operator. Additionally, we embed domain knowledge into the prompt, enabling the LLM to generate more effective and contextually relevant selection operators. Our experimental results on symbolic regression benchmarks show that LLMs can devise selection operators that outperform nine expert-designed baselines, achieving state-of-the-art performance. This demonstrates that LLMs can exceed expert-level algorithm design for symbolic regression.
Article 1878
Title@2025-05-24 (6): Learning to Program Quantum Measurements for Machine Learning
Title: Learning to Program Quantum Measurements for Machine Learning | Lernen, Quantenmessungen für maschinelles Lernen zu programmieren | 学习机器学习量度方案 2505.13525v2 |
Authors: Samuel Yen-Chi Chen, Huan-Hsin Tseng, Hsin-Yi Lin, Shinjae Yoo
The rapid advancements in quantum computing (QC) and machine learning (ML) have sparked significant interest, driving extensive exploration of quantum machine learning (QML) algorithms to address a wide range of complex challenges. The development of high-performance QML models requires expert-level knowledge, presenting a key challenge to the widespread adoption of QML. Critical obstacles include the design of effective data encoding strategies and parameterized quantum circuits, both of which are vital for the performance of QML models. Furthermore, the measurement process is often neglected: most existing QML models employ predefined measurement schemes that may not align with the specific requirements of the targeted problem. We propose an innovative framework that renders the observable of a quantum system (specifically, the Hermitian matrix) trainable. This approach employs an end-to-end differentiable learning framework, enabling simultaneous optimization of the neural network used to program the parameterized observables and the standard quantum circuit parameters. Notably, the quantum observable parameters are dynamically programmed by the neural network, allowing the observables to adapt in real time based on the input data stream. Through numerical simulations, we demonstrate that the proposed method effectively programs observables dynamically within variational quantum circuits, achieving superior results compared to existing approaches. Notably, it delivers enhanced performance metrics, such as higher classification accuracy, thereby significantly improving the overall effectiveness of QML models.
Article 1879
Title@2025-05-24 (6): Sum of Squares Circuits
Title: Sum of Squares Circuits | Summe der Quadrate Schaltungen | 平方电路总和 2408.11778v3 |
Authors: Lorenzo Loconte, Stefan Mengel, Antonio Vergari
Designing expressive generative models that support exact and efficient inference is a core question in probabilistic ML. Probabilistic circuits (PCs) offer a framework where this tractability-vs-expressiveness trade-off can be analyzed theoretically. Recently, squared PCs encoding subtractive mixtures via negative parameters have emerged as tractable models that can be exponentially more expressive than monotonic PCs, i.e., PCs with positive parameters only. In this paper, we provide a more precise theoretical characterization of the expressiveness relationships among these models. First, we prove that squared PCs can be less expressive than monotonic ones. Second, we formalize a novel class of PCs – sum of squares PCs – that can be exponentially more expressive than both squared and monotonic PCs. Around sum of squares PCs, we build an expressiveness hierarchy that allows us to precisely unify and separate different tractable model classes such as Born Machines and PSD models, and other recently introduced tractable probabilistic models by using complex parameters. Finally, we empirically show the effectiveness of sum of squares circuits in performing distribution estimation.
Article 1880
Title@2025-05-24 (6): LLMs for Supply Chain Management
Title: LLMs for Supply Chain Management | LLMs für Supply Chain Management | 供应链管理LLMs 2505.18597v1 |
Authors: Haojie Wang, Jiuyun Jiang, L. Jeff Hong, Guangxin Jiang
The development of large language models (LLMs) has provided new tools for research in supply chain management (SCM). In this paper, we introduce a retrieval-augmented generation (RAG) framework that dynamically integrates external knowledge into the inference process, and develop a domain-specialized SCM LLM, which demonstrates expert-level competence by passing standardized SCM examinations and beer game tests. We further use LLMs to conduct horizontal and vertical supply chain games in order to analyze competition and cooperation within supply chains. Our experiments show that RAG significantly improves performance on SCM tasks. Moreover, game-theoretic analysis reveals that the LLM can reproduce insights from the classical SCM literature, while also uncovering novel behaviors and offering fresh perspectives on phenomena such as the bullwhip effect. This paper opens the door to exploring cooperation and competition in complex supply chain networks through the lens of LLMs.
Article 1881
Title@2025-05-24 (6): MisoDICE: Multi-Agent Imitation from Unlabeled Mixed-Quality Demonstrations
Title: MisoDICE: Multi-Agent Imitation from Unlabeled Mixed-Quality Demonstrations | MisoDICE: Multi-Agent-Imitation aus nicht gekennzeichneten Mixed-Quality-Demonstrationen | MisoDICE:从未贴标签的混合质量示范中多机构吸收 2505.18595v1 |
Authors: The Viet Bui, Tien Mai, Hong Thanh Nguyen
We study offline imitation learning (IL) in cooperative multi-agent settings, where demonstrations have unlabeled mixed quality, containing both expert and suboptimal trajectories. Our proposed solution is structured in two stages: trajectory labeling and multi-agent imitation learning, designed jointly to enable effective learning from heterogeneous, unlabeled data. In the first stage, we combine advances in large language models and preference-based reinforcement learning to construct a progressive labeling pipeline that distinguishes expert-quality trajectories. In the second stage, we introduce MisoDICE, a novel multi-agent IL algorithm that leverages these labels to learn robust policies while addressing the computational complexity of large joint state-action spaces. By extending the popular single-agent DICE framework to multi-agent settings with a new value decomposition and mixing architecture, our method yields a convex policy optimization objective and ensures consistency between global and local policies. We evaluate MisoDICE on multiple standard multi-agent RL benchmarks and demonstrate superior performance, especially when expert data is scarce.
Article 1882
Title@2025-05-24 (6): Bayesian Meta-Reinforcement Learning with Laplace Variational Recurrent Networks
Title: Bayesian Meta-Reinforcement Learning with Laplace Variational Recurrent Networks | Bayesian Meta-Reinforcement Learning mit Laplace Variational Recurrent Networks | 采用拉位变换经常网络加强Bayesian Met-加强学习 2505.18591v1 |
Authors: Joery A. de Vries, Jinke He, Mathijs M. de Weerdt, Matthijs T. J. Spaan
Meta-reinforcement learning trains a single reinforcement learning agent on a distribution of tasks to quickly generalize to new tasks outside of the training set at test time. From a Bayesian perspective, one can interpret this as performing amortized variational inference on the posterior distribution over training tasks. Among the various meta-reinforcement learning approaches, a common method is to represent this distribution with a point estimate using a recurrent neural network. We show how one can augment this point estimate to give full distributions through the Laplace approximation, either at the start of, during, or after learning, without modifying the base model architecture. With our approximation, we are able to estimate distribution statistics (e.g., the entropy) of non-Bayesian agents and observe that point-estimate based methods produce overconfident estimators while not satisfying consistency. Furthermore, when comparing our approach to full-distribution based learning of the task posterior, our method performs on par with variational baselines while having far fewer parameters.
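As background for the abstract above, the standard Laplace approximation turns a point estimate into a Gaussian by using local curvature. The notation below is generic ($z$ for the latent task variable, $\mathcal{D}$ for observed data) and is an assumption for illustration; the paper's exact parameterization is not given in the abstract.

$$
p(z \mid \mathcal{D}) \approx \mathcal{N}\!\left(z^{*},\, H^{-1}\right),
\qquad
z^{*} = \arg\max_{z} \log p(z \mid \mathcal{D}),
\qquad
H = -\nabla_{z}^{2} \log p(z \mid \mathcal{D}) \Big|_{z = z^{*}}
$$

Once the Gaussian is available, statistics such as the entropy follow in closed form: for $z \in \mathbb{R}^{d}$, $\mathbb{H} = \tfrac{1}{2} \log \det\!\left(2 \pi e\, H^{-1}\right) = \tfrac{d}{2} \log(2 \pi e) - \tfrac{1}{2} \log \det H$.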
Article 1883
Title@2025-05-24 (6): CiRL: Open-Source Environments for Reinforcement Learning in Circular Economy and Net Zero
Title: CiRL: Open-Source Environments for Reinforcement Learning in Circular Economy and Net Zero | CiRL: Open-Source-Umgebungen für verstärktes Lernen in der Kreislaufwirtschaft und Net Zero | CIRL: 在循环经济和净零中加强学习的开放源环境 2505.21536v1 |
Authors: Federico Zocco, Andrea Corti, Monica Malvezzi
The demand for finite raw materials will keep increasing as they fuel modern society. Simultaneously, solutions for stopping carbon emissions in the short term are not available, thus making the net zero target extremely challenging to achieve at scale. The circular economy (CE) paradigm is gaining attention as a solution to address climate change and the uncertainties of supplies of critical materials. Hence, in this paper, we introduce CiRL, a deep reinforcement learning (DRL) library of environments focused on the circularity of both solid and fluid materials. The integration of DRL into the design of material circularity is possible thanks to the formalism of thermodynamical material networks, which is underpinned by compartmental dynamical thermodynamics. Along with the focus on circularity, this library has three more features: the new CE-oriented environments are in state-space form, which is typically used in dynamical systems analysis and control design; it is based on a state-of-the-art Python library of DRL algorithms, namely Stable-Baselines3; and it is developed in Google Colaboratory to be accessible to researchers from different disciplines and backgrounds, as is often the case for circular economy researchers and engineers. CiRL is publicly available.
Article 1884
Title@2025-05-24 (6): Model Extrapolation Expedites Alignment
Title: Model Extrapolation Expedites Alignment | Modell Extrapolation Expeditionen Ausrichtung | 模型外推快速调整 2404.16792v4 |
Authors: Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, Nanyun Peng
Given the high computational cost of preference alignment training of large language models (LLMs), exploring efficient methods to reduce the training overhead remains an important and compelling research problem. Motivated by the observation that alignment training typically involves only small parameter changes without injecting new knowledge into models, we propose a straightforward method called ExPO (model extrapolation) to expedite LLMs’ alignment with human preferences. Given a partially-trained model and its initial SFT checkpoint, ExPO improves the implicit optimization objective of alignment training by simply amplifying the parameter change based on a first-order approximation, without any additional training overhead. Through controlled experiments, we demonstrate that ExPO boosts a DPO model trained with only 20% of the steps to outperform the fully-trained one. Moreover, we show that ExPO notably improves existing open-source LLMs (ranging from 1.8B to 70B parameters) on the leading AlpacaEval 2.0 and MT-Bench benchmarks, which highlights ExPO’s broader utility in efficiently enhancing LLM alignment.
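The idea of amplifying the parameter change between the SFT checkpoint and the partially-trained model can be sketched in a few lines. This is a minimal illustration, assuming checkpoints are plain dictionaries of parameter lists; the function name `extrapolate`, the toy weights, and the coefficient `alpha` are hypothetical and are not taken from the paper.

```python
def extrapolate(sft_weights, aligned_weights, alpha):
    """Return aligned + alpha * (aligned - sft) for every parameter.

    `alpha` (an assumed hyperparameter) controls how far the weights are
    pushed along the direction of the change made by alignment training.
    """
    if sft_weights.keys() != aligned_weights.keys():
        raise ValueError("checkpoints must share parameter names")
    return {
        name: [
            a + alpha * (a - s)
            for a, s in zip(aligned_weights[name], sft_weights[name])
        ]
        for name in aligned_weights
    }


# Toy checkpoints (hypothetical values, for illustration only).
sft = {"layer.weight": [0.10, -0.20], "layer.bias": [0.00]}
aligned = {"layer.weight": [0.12, -0.25], "layer.bias": [0.01]}

extrapolated = extrapolate(sft, aligned, alpha=0.5)
```

With `alpha=0` the function returns the partially-trained weights unchanged, so the coefficient can be read as "how much extra movement to add" along the direction already taken by training.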
Article 1885
Title@2025-05-24 (6): Continuous Multi-Task Pre-training for Malicious URL Detection and Webpage Classification
Title: Continuous Multi-Task Pre-training for Malicious URL Detection and Webpage Classification | Kontinuierliches Multi-Task-Vortraining für bösartige URL-Erkennung und Webpage-Klassifikation | 恶意URL探测和网页分类连续多任务连续培训 2402.11495v2 |
Authors: Yujie Li, Yiwei Liu, Peiyue Li, Yifan Jia, Yanbin Wang
Malicious URL detection and webpage classification are critical tasks in cybersecurity and information management. In recent years, extensive research has explored using BERT or similar language models to replace traditional machine learning methods for detecting malicious URLs and classifying webpages. While previous studies show promising results, they often apply existing language models to these tasks without accounting for the inherent differences in domain data (e.g., URLs being loosely structured and semantically sparse compared to text), leaving room for performance improvement. Furthermore, current approaches focus on single tasks and have not been tested in multi-task scenarios. To address these challenges, we propose urlBERT, a pre-trained URL encoder leveraging Transformer to encode foundational knowledge from billions of unlabeled URLs. To achieve this, we use 5 unsupervised pretraining tasks to capture multi-level lexical, syntactic, and semantic information of URLs, and to generate contrastive and adversarial representations. Furthermore, to avoid competition and interference between pre-training tasks, we propose a grouped sequential learning method to ensure effective training across multiple tasks. Finally, we leverage a two-stage fine-tuning approach to improve the training stability and efficiency of the task model. To assess the multitasking potential of urlBERT, we fine-tune the task model in both single-task and multi-task modes. The former creates a classification model for a single task, while the latter builds a classification model capable of handling multiple tasks. We evaluate urlBERT on three downstream tasks: phishing URL detection, advertising URL detection, and webpage classification. The results demonstrate that urlBERT outperforms standard pre-trained models, and its multi-task mode is capable of addressing the real-world demands of multitasking.
Article 1886
Title@2025-05-24 (6): REAL: Representation Enhanced Analytic Learning for Exemplar-free Class-incremental Learning
Title: REAL: Representation Enhanced Analytic Learning for Exemplar-free Class-incremental Learning | REAL: Darstellungsverstärktes analytisches Lernen für exemplarisch-freies Klassen-inkrementelles Lernen | 实际:为免世禁初级入门学习加强代表性分析学习 2403.13522v2 |
Authors: Run He, Di Fang, Yizhu Chen, Kai Tong, Cen Chen, Yi Wang, Lap-pui Chau, Huiping Zhuang
Exemplar-free class-incremental learning (EFCIL) aims to mitigate catastrophic forgetting in class-incremental learning (CIL) without available historical training samples as exemplars. Compared with its exemplar-based CIL counterpart that stores exemplars, EFCIL suffers more from forgetting issues. Recently, a new EFCIL branch named Analytic Continual Learning (ACL) introduced a gradient-free paradigm via Recursive Least-Square, achieving forgetting-resistant classifier training with a frozen backbone during CIL. However, existing ACL suffers from ineffective representations and insufficient utilization of backbone knowledge. In this paper, we propose representation-enhanced analytic learning (REAL) to address these problems. To enhance the representation, REAL constructs a dual-stream base pretraining followed by a representation-enhancing distillation process. The dual-stream base pretraining combines self-supervised contrastive learning for general features and supervised learning for class-specific knowledge, followed by the representation-enhancing distillation to merge both streams, enhancing representations for the subsequent CIL paradigm. To utilize more knowledge from the backbone, REAL presents a feature fusion buffer for multi-layer backbone features, providing informative features for the subsequent classifier training. Our method can be incorporated into existing ACL techniques and provides more competitive performance. Empirical results demonstrate that REAL achieves state-of-the-art performance on CIFAR-100, ImageNet-100 and ImageNet-1k benchmarks, outperforming exemplar-free methods and rivaling exemplar-based approaches.
Article 1887
Title@2025-05-24 (6): AFL: A Single-Round Analytic Approach for Federated Learning with Pre-trained Models
Title: AFL: A Single-Round Analytic Approach for Federated Learning with Pre-trained Models | AFL: Ein eingleisiger analytischer Ansatz für das Federated Learning mit vortrainierten Modellen | AFL: 采用培训前模式的联邦学习单一分析方法 2405.16240v2 |
Authors: Run He, Kai Tong, Di Fang, Han Sun, Haoran Li, Tianyi Chen, Ziqian Zeng, Huiping Zhuang
In this paper, we introduce analytic federated learning (AFL), a new training paradigm that brings analytical (i.e., closed-form) solutions to federated learning (FL) with pre-trained models. AFL draws inspiration from analytic learning, a gradient-free technique that trains neural networks with analytical solutions in one epoch. In the local client training stage, AFL enables one-epoch training, eliminating the need for multi-epoch updates. In the aggregation stage, we derive an absolute aggregation (AA) law. This AA law allows single-round aggregation, reducing heavy communication overhead and achieving fast convergence by removing the need for multiple aggregation rounds. More importantly, AFL exhibits the property of $\textit{invariance to data partitioning}$, meaning that regardless of how the full dataset is distributed among clients, the aggregated result remains identical. This opens up various possibilities, such as data heterogeneity invariance and client-number invariance. We conduct experiments across various FL settings, including extremely non-IID ones and scenarios with a large number of clients (e.g., $\ge 1000$). In all these settings, AFL consistently performs competitively while existing FL techniques encounter various obstacles. Our code is available at https://github.com/ZHUANGHP/Analytic-federated-learning.
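The partition-invariance property can be illustrated with a much simpler stand-in than the paper's AA law: one-dimensional ridge regression, where each client shares only its local sufficient statistics and the server adds them up once. Everything below (function names, the `ridge` value, the toy data) is an assumption made for illustration and is not the AFL implementation.

```python
def local_stats(xs, ys):
    """Statistics a client would share: sum(x*x) and sum(x*y)."""
    return sum(x * x for x in xs), sum(x * y for x, y in zip(xs, ys))


def aggregate(stats, ridge=1e-3):
    """Single-round aggregation: add the statistics, then solve in closed form."""
    sxx = sum(s[0] for s in stats)
    sxy = sum(s[1] for s in stats)
    return sxy / (sxx + ridge)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# The same data held centrally, split across two clients, or one sample per client.
w_central = aggregate([local_stats(xs, ys)])
w_two_clients = aggregate([local_stats(xs[:1], ys[:1]), local_stats(xs[1:], ys[1:])])
w_four_clients = aggregate([local_stats([x], [y]) for x, y in zip(xs, ys)])
```

Because the closed-form solution depends on the data only through sums, how the samples are divided among clients does not change the result, which is the behaviour the abstract describes as invariance to data partitioning.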
Article 1888
Title@2025-05-24 (6): Mechanical in-sensor computing: a programmable meta-sensor for structural damage classification without external electronic power
Title: Mechanical in-sensor computing: a programmable meta-sensor for structural damage classification without external electronic power | Mechanische In-Sensor-Computing: ein programmierbarer Meta-Sensor für die Klassifizierung von Strukturschäden ohne externe elektronische Leistung | 传感器中的机械内传感器计算:可编程的元传感器,用于结构损害分类,无外部电子电源 2505.18579v1 |
Authors: Tingpeng Zhang, Xuzhang Peng, Mingyuan Zhou, Guobiao Hu, Zhilu Lai
Structural health monitoring (SHM) involves sensor deployment, data acquisition, and data interpretation, commonly implemented via a tedious wired system. In current practice, information processing depends largely on electronic computers, which, despite their universal applicability, bring challenges such as high energy consumption and low throughput due to the nature of digital units. In recent years, there has been renewed interest in shifting computation from electronic computing units to real physical systems, a concept known as physical computation. This approach makes it possible to think outside the box for SHM, seamlessly integrating sensing and computing into a purely physical entity that does not rely on external electronic power supplies and is therefore suited to resource-restricted scenarios. The latest advances in metamaterials (MM) hold great promise for this idea. In this paper, we introduce a programmable metamaterial-based sensor (termed the MM-sensor) that physically processes structural vibration information to perform specific SHM tasks, such as structural damage warning (binary classification) in this initial study, without further information processing or resource consumption; that is, data collection and analysis are completed in situ at the sensor level. We adopt the configuration of a locally resonant metamaterial plate (LRMP) to achieve the first fabrication of the MM-sensor. We take advantage of the bandgap properties of the LRMP to physically differentiate the dynamic behavior of structures before and after damage. By inversely designing the geometric parameters, our current approach allows for adjustments to the bandgap features. This is effective for engineering systems with a first natural frequency ranging from 9.54 Hz to 81.86 Hz.
Article 1889
Title@2025-05-24 (6): Trust-Region Twisted Policy Improvement
Title: Trust-Region Twisted Policy Improvement | Vertrauensregion verdrehte politische Verbesserung | 改变政策改进 2504.06048v3 |
Authors: Joery A. de Vries, Jinke He, Yaniv Oren, Matthijs T. J. Spaan
Monte-Carlo tree search (MCTS) has driven many recent breakthroughs in deep reinforcement learning (RL). However, scaling MCTS to parallel compute has proven challenging in practice, which has motivated alternative planners like sequential Monte-Carlo (SMC). Many of these SMC methods adopt particle filters for smoothing through a reformulation of RL as a policy inference problem. Yet, persisting design choices of these particle filters often conflict with the aim of online planning in RL, which is to obtain a policy improvement at the start of planning. Drawing inspiration from MCTS, we tailor SMC planners specifically for RL by improving data generation within the planner through constrained action sampling and explicit terminal state handling, as well as improving policy and value target estimation. This leads to our Trust-Region Twisted SMC (TRT-SMC), which shows improved runtime and sample-efficiency over baseline MCTS and SMC methods in both discrete and continuous domains.
Article 1890
Title@2025-05-24 (6): TabICL: A Tabular Foundation Model for In-Context Learning on Large Data
Title: TabICL: A Tabular Foundation Model for In-Context Learning on Large Data | TabICL: Ein tabellarisches Grundlagenmodell für das In-Context-Lernen mit großen Datenmengen | TabICL: 大型数据内部知识学习表示基础模型 2502.05564v2 |
Authors: Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan
The long-standing dominance of gradient-boosted decision trees on tabular data is currently challenged by tabular foundation models using In-Context Learning (ICL): setting the training data as context for the test data and predicting in a single forward pass without parameter updates. While the TabPFNv2 foundation model excels on tables with up to 10K samples, its alternating column- and row-wise attentions make handling large training sets computationally prohibitive. So, can ICL be effectively scaled and deliver a benefit for larger tables? We introduce TabICL, a tabular foundation model for classification, pretrained on synthetic datasets with up to 60K samples and capable of handling 500K samples on affordable resources. This is enabled by a novel two-stage architecture: a column-then-row attention mechanism to build fixed-dimensional embeddings of rows, followed by a transformer for efficient ICL. Across 200 classification datasets from the TALENT benchmark, TabICL is on par with TabPFNv2 while being systematically faster (up to 10 times), and significantly outperforms all other approaches. On 53 datasets with over 10K samples, TabICL surpasses both TabPFNv2 and CatBoost, demonstrating the potential of ICL for large data. Pretraining code, inference code, and pre-trained models are available at https://github.com/soda-inria/tabicl.
Article 1891
Title@2025-05-24 (6): DAL: A Practical Prior-Free Black-Box Framework for Non-Stationary Bandit Environments
Title: DAL: A Practical Prior-Free Black-Box Framework for Non-Stationary Bandit Environments | DAL: Ein praktisches Prior-Free Black-Box Framework für nicht-stationäre Bandit-Umgebungen | DAL:非高度强盗环境实际的、事先免费的黑盒框架 2501.19401v2 |
Authors: Argyrios Gerogiannis, Yu-Han Huang, Subhonmesh Bose, Venugopal V. Veeravalli
We introduce a practical, black-box framework termed Detection Augmenting Learning (DAL) for the problem of non-stationary bandits without prior knowledge of the underlying non-stationarity. DAL is modular, accepting any stationary bandit algorithm as input and augmenting it with a change detector. Our approach is applicable to all common parametric and non-parametric bandit variants. Extensive experimentation demonstrates that DAL consistently surpasses current state-of-the-art methods across diverse non-stationary scenarios, including synthetic benchmarks and real-world datasets, underscoring its versatility and scalability. We provide theoretical insights into DAL’s strong empirical performance on piecewise stationary and drift settings, complemented by thorough experimental validation.
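The modular idea in the abstract, a stationary bandit algorithm wrapped with a change detector, can be sketched as follows. This is a toy illustration under stated assumptions, not the DAL algorithm: the deterministic greedy bandit, the two-window mean-shift detector, the restart rule, and all class names and thresholds are hypothetical choices made to keep the example short and reproducible.

```python
class GreedyWithPeriodicExploration:
    """Toy stationary bandit: greedy on empirical means, with one forced
    exploratory pull every `period` steps (deterministic, for illustration)."""

    def __init__(self, n_arms, period=10):
        self.n_arms = n_arms
        self.period = period
        self.steps = 0  # kept across resets so exploration keeps cycling
        self.reset()

    def reset(self):
        self.counts = [0] * self.n_arms
        self.means = [0.0] * self.n_arms

    def select(self):
        self.steps += 1
        if self.steps % self.period == 0:
            return (self.steps // self.period) % self.n_arms
        return max(range(self.n_arms), key=lambda a: self.means[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


class WindowMeanDetector:
    """Flags a change when the means of the two most recent windows of
    rewards differ by more than `threshold` (an assumed, simple test)."""

    def __init__(self, window=10, threshold=0.5):
        self.window = window
        self.threshold = threshold
        self.reset()

    def reset(self):
        self.samples = []

    def add(self, reward):
        self.samples.append(reward)
        if len(self.samples) < 2 * self.window:
            return False
        recent = self.samples[-self.window:]
        older = self.samples[-2 * self.window:-self.window]
        gap = sum(recent) / self.window - sum(older) / self.window
        return abs(gap) > self.threshold


class DetectionAugmentedBandit:
    """Wraps any bandit exposing select/update/reset with per-arm detectors
    and restarts the bandit whenever a detector fires."""

    def __init__(self, bandit, detectors):
        self.bandit = bandit
        self.detectors = detectors
        self.restarts = 0

    def select(self):
        return self.bandit.select()

    def update(self, arm, reward):
        self.bandit.update(arm, reward)
        if self.detectors[arm].add(reward):
            self.bandit.reset()
            for detector in self.detectors:
                detector.reset()
            self.restarts += 1


def reward(t, arm):
    """Piecewise-stationary toy environment: the best arm flips at t = 200."""
    best = 0 if t < 200 else 1
    return 1.0 if arm == best else 0.0


agent = DetectionAugmentedBandit(
    GreedyWithPeriodicExploration(n_arms=2),
    [WindowMeanDetector() for _ in range(2)],
)
for t in range(400):
    arm = agent.select()
    agent.update(arm, reward(t, arm))
```

The wrapper only relies on the `select`, `update`, and `reset` methods, so the inner bandit could be swapped for another stationary algorithm without touching the detection logic, which mirrors the black-box design the abstract describes.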
Article 1892
Title@2025-05-24 (6): Convergence Analysis of Natural Gradient Descent for Over-parameterized Physics-Informed Neural Networks
Title: Convergence Analysis of Natural Gradient Descent for Over-parameterized Physics-Informed Neural Networks | Konvergenzanalyse des natürlichen Gradientenabstiegs für überparameterisierte physikinformierte neurale Netzwerke | 超参数物理内成形神经神经网络的自然梯分源相趋同分析 2408.00573v3 |
Authors: Xianliang Xu, Ting Du, Wang Kong, Bin Shan, Ye Li, Zhongyi Huang
In the context of over-parameterization, there is a line of work demonstrating that randomly initialized (stochastic) gradient descent (GD) converges to a globally optimal solution at a linear convergence rate for the quadratic loss function. However, the learning rate of GD for training two-layer neural networks exhibits poor dependence on the sample size and the Gram matrix, leading to a slow training process. In this paper, we show that for training two-layer $\text{ReLU}^3$ Physics-Informed Neural Networks (PINNs), the learning rate can be improved from $\mathcal{O}(\lambda_0)$ to $\mathcal{O}(1/\|\bm{H}^{\infty}\|_2)$, implying that GD actually enjoys a faster convergence rate. Despite such improvements, the convergence rate is still tied to the least eigenvalue of the Gram matrix, leading to slow convergence. We then develop the positive definiteness of Gram matrices with general smooth activation functions and provide the convergence analysis of natural gradient descent (NGD) in training two-layer PINNs, demonstrating that the learning rate can be $\mathcal{O}(1)$ and at this rate, the convergence rate is independent of the Gram matrix. In particular, for smooth activation functions, the convergence rate of NGD is quadratic. Numerical experiments are conducted to verify our theoretical results.
Article 1893
Title@2025-05-24 (6): Autocomp: LLM-Driven Code Optimization for Tensor Accelerators
Title: Autocomp: LLM-Driven Code Optimization for Tensor Accelerators | Autocomp: LLM-gesteuerte Code-Optimierung für Tensor-Beschleuniger | 自动comp: LLM- Driven 代码对 Tensor 加速器的优化 2505.18574v1 |
Authors: Charles Hong, Sahil Bhatia, Alvin Cheung, Yakun Sophia Shao
Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today’s computing landscape. However, even with significant efforts in building compilers, programming these tensor accelerators remains challenging, leaving much of their potential underutilized. Recently, large language models (LLMs), trained on large amounts of code, have shown significant promise in code generation and optimization tasks, but generating low-resource languages like specialized tensor accelerator code still poses a significant challenge. We tackle this challenge with Autocomp, an approach that empowers accelerator programmers to leverage domain knowledge and hardware feedback to optimize code via an automated LLM-driven search. We accomplish this by: 1) formulating each optimization pass as a structured two-phase prompt, divided into planning and code generation phases, 2) inserting domain knowledge during planning via a concise and adaptable optimization menu, and 3) integrating correctness and performance metrics from hardware as feedback at each search iteration. Across three categories of representative workloads and two different accelerators, we demonstrate that Autocomp-optimized code runs 5.6x (GEMM) and 2.7x (convolution) faster than the vendor-provided library, and outperforms expert-level hand-tuned code by 1.4x (GEMM), 1.1x (convolution), and 1.3x (fine-grained linear algebra). Additionally, we demonstrate that optimization schedules generated from Autocomp can be reused across similar tensor operations, improving speedups by up to 24% under a fixed sample budget.
Article 1894
Title@2025-05-24 (6): Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs
Title: Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs | Steigerung der Effizienz und Exploration bei der Stärkung des Lernens für LLMs | 提高LLMM 强化学习的效率和探索 2505.18573v1 |
Authors: Mengqi Liao, Xiangyu Xi, Ruinian Chen, Jia Leng, Yangen Hu, Ke Zeng, Shuai Liu, Huaiyu Wan
Reasoning large language models (LLMs) excel in complex tasks, which has drawn significant attention to reinforcement learning (RL) for LLMs. However, existing approaches allocate an equal number of rollouts to all questions during the RL process, which is inefficient. This inefficiency stems from the fact that training on simple questions yields limited gains, whereas more rollouts are needed for challenging questions to sample correct answers. Furthermore, while RL improves response precision, it limits the model’s exploration ability, potentially resulting in a performance cap below that of the base model prior to RL. To address these issues, we propose a mechanism for dynamically allocating rollout budgets based on the difficulty of the problems, enabling more efficient RL training. Additionally, we introduce an adaptive dynamic temperature adjustment strategy to maintain the entropy at a stable level, thereby encouraging sufficient exploration. This enables LLMs to improve response precision while preserving their exploratory ability to uncover potential correct pathways. The code and data are available at https://github.com/LiaoMengqi/E3-RL4LLMs
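The two mechanisms in the abstract, difficulty-based rollout budgets and temperature adjustment to hold entropy steady, can be sketched as below. This is a hypothetical illustration only: using `1 - pass_rate` as the difficulty signal, the proportional split, the fixed temperature step, and all function names and defaults are assumptions rather than the authors' method.

```python
def allocate_rollouts(pass_rates, total_budget, min_rollouts=1):
    """Split a rollout budget across questions, giving harder ones more.

    Difficulty is taken to be 1 - pass_rate (an assumed proxy).
    """
    n = len(pass_rates)
    if total_budget < n * min_rollouts:
        raise ValueError("budget too small for the per-question minimum")
    difficulty = [1.0 - p for p in pass_rates]
    spare = total_budget - n * min_rollouts
    total_difficulty = sum(difficulty)
    if total_difficulty == 0:
        shares = [spare // n] * n
    else:
        shares = [int(spare * d / total_difficulty) for d in difficulty]
    budgets = [min_rollouts + s for s in shares]
    # Hand rounding leftovers to the hardest questions first.
    leftover = total_budget - sum(budgets)
    hardest_first = sorted(range(n), key=lambda i: difficulty[i], reverse=True)
    for k in range(leftover):
        budgets[hardest_first[k % n]] += 1
    return budgets


def adjust_temperature(temperature, entropy, target_entropy,
                       step=0.05, low=0.1, high=2.0):
    """Nudge the sampling temperature up when entropy falls below the
    target and down when it rises above, clamped to [low, high]."""
    if entropy < target_entropy:
        temperature += step
    elif entropy > target_entropy:
        temperature -= step
    return min(high, max(low, temperature))


# Three questions: easy, medium, hard (hypothetical pass rates).
budgets = allocate_rollouts([0.9, 0.5, 0.1], total_budget=32, min_rollouts=2)
```

In this toy setup the hardest question receives the largest share of the 32 rollouts while every question keeps at least the minimum, and the temperature rule acts as a simple feedback controller on measured entropy.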
Article 1895
Title@2025-05-24 (6): VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis
Title: VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis | VISTA: Vision-Language-Schlussfolgerung für eine trainingsfreie Analyse der Stock-Zeitreihen | VISTA:无培训-库存无培训-时间-系列分析的远景-语言推断 2505.18570v1 |
Authors: Tina Khezresmaeilzadeh, Parsa Razmara, Seyedarmin Azizi, Mohammad Erfan Sadeghi, Erfan Baghaei Portaghloo
Stock price prediction remains a complex and high-stakes task in financial analysis, traditionally addressed using statistical models or, more recently, language models. In this work, we introduce VISTA (Vision-Language Inference for Stock Time-series Analysis), a novel, training-free framework that leverages Vision-Language Models (VLMs) for multi-modal stock forecasting. VISTA prompts a VLM with both textual representations of historical stock prices and their corresponding line charts to predict future price values. By combining numerical and visual modalities in a zero-shot setting and using carefully designed chain-of-thought prompts, VISTA captures complementary patterns that unimodal approaches often miss. We benchmark VISTA against standard baselines, including ARIMA and text-only LLM-based prompting methods. Experimental results show that VISTA outperforms these baselines by up to 89.83%, demonstrating the effectiveness of multi-modal inference for stock time-series analysis and highlighting the potential of VLMs in financial forecasting tasks without requiring task-specific training.
Article 1896
Title@2025-05-24 (6): Learning without Isolation: Pathway Protection for Continual Learning
Title: Learning without Isolation: Pathway Protection for Continual Learning | Lernen ohne Isolation: Pfadschutz für kontinuierliches Lernen | 无孤立的学习:持续学习的路径保护 2505.18568v1 |
Authors: Zhikang Chen, Abudukelimu Wuerkaixi, Sen Cui, Haoxuan Li, Ding Li, Jingfeng Zhang, Bo Han, Gang Niu, Houfang Liu, Yi Yang, Sifan Yang, Changshui Zhang, Tianling Ren
Deep networks are prone to catastrophic forgetting during sequential task learning, i.e., losing the knowledge about old tasks upon learning new tasks. To address this, continual learning (CL) has emerged, whose existing methods focus mostly on regulating or protecting the parameters associated with the previous tasks. However, parameter protection is often impractical: either the number of parameters needed to store the old-task knowledge grows linearly with the number of tasks, or it becomes hard to preserve the parameters related to the old-task knowledge. In this work, we bring a dual perspective from neuroscience and physics to CL: across the whole network, the pathways matter more than the parameters when it comes to the knowledge acquired from the old tasks. Following this perspective, we propose a novel CL framework, learning without isolation (LwI), where model fusion is formulated as graph matching and the pathways occupied by the old tasks are protected without being isolated. Thanks to the sparsity of activation channels in a deep network, LwI can adaptively allocate available pathways for a new task, realizing pathway protection and addressing catastrophic forgetting in a parameter-efficient manner. Experiments on popular benchmark datasets demonstrate the superiority of the proposed LwI.
Article 1897
Title@2025-05-24 (6): ReflectDiffu:Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework
Title: ReflectDiffu:Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework | ReflectDiffu: Reflect zwischen emotional-intent Ansteckung und Mimicry für Empathetic Response Generation über ein RL-Diffusion Framework | 反省:通过RL-扩散框架,对情感-情感内聚变和Mmimimicry之间的反射,以便产生同情性反应 2409.10289v3 |
Authors: Jiahao Yuan, Zixiang Di, Zhiqing Cui, Guisong Yang, Usman Naseem
Empathetic response generation necessitates the integration of emotional and intentional dynamics to foster meaningful interactions. Existing research either neglects the intricate interplay between emotion and intent, leading to suboptimal controllability of empathy, or resorts to large language models (LLMs), which incur significant computational overhead. In this paper, we introduce ReflectDiffu, a lightweight and comprehensive framework for empathetic response generation. This framework incorporates emotion contagion to augment emotional expressiveness and employs an emotion-reasoning mask to pinpoint critical emotional elements. Additionally, it integrates intent mimicry within reinforcement learning for refinement during diffusion. By harnessing an intent twice reflect mechanism of Exploring-Sampling-Correcting, ReflectDiffu adeptly translates emotional decision-making into precise intent actions, thereby addressing empathetic response misalignments stemming from emotional misrecognition. Through reflection, the framework maps emotional states to intents, markedly enhancing both response empathy and flexibility. Comprehensive experiments reveal that ReflectDiffu outperforms existing models regarding relevance, controllability, and informativeness, achieving state-of-the-art results in both automatic and human evaluations.
Article 1898
Title@2025-05-24 (6): Learning Fluid-Structure Interaction Dynamics with Physics-Informed Neural Networks and Immersed Boundary Methods
Title: Learning Fluid-Structure Interaction Dynamics with Physics-Informed Neural Networks and Immersed Boundary Methods | Learning Fluid-Struktur-Interaktion Dynamik mit physikinformierten Neuronalen Netzwerken und eingetauchten Grenzmethoden | 与物理内成形神经网络和混合边界方法的互动动态 2505.18565v1 |
Authors: Afrah Farea, Saiful Khan, Reza Daryani, Emre Cenk Ersan, Mustafa Serdar Celebi
We introduce neural network architectures that combine physics-informed neural networks (PINNs) with the immersed boundary method (IBM) to solve fluid-structure interaction (FSI) problems. Our approach features two distinct architectures: a Single-FSI network with a unified parameter space, and an innovative Eulerian-Lagrangian network that maintains separate parameter spaces for fluid and structure domains. We study each architecture using standard Tanh and adaptive B-spline activation functions. Empirical studies on a 2D cavity flow problem involving a moving solid structure show that the Eulerian-Lagrangian architecture performs significantly better. The adaptive B-spline activation further enhances accuracy by providing locality-aware representation near boundaries. While our methodology shows promising results in predicting the velocity field, pressure recovery remains challenging due to the absence of explicit force-coupling constraints in the current formulation. Our findings underscore the importance of domain-specific architectural design and adaptive activation functions for modeling FSI problems within the PINN framework.
Article 1899
Title@2025-05-24 (6): Joint-stochastic-approximation Random Fields with Application to Semi-supervised Learning
Title: Joint-stochastic-approximation Random Fields with Application to Semi-supervised Learning | Gelenk-Stochastische-Annäherung Random Fields mit Anwendung auf semi-überwachtes Lernen | 应用到半监督学习的混合随机场 2505.20330v1 |
Authors: Yunfu Song, Zhijian Ou
Our examination of deep generative models (DGMs) developed for semi-supervised learning (SSL), mainly GANs and VAEs, reveals two problems. First, mode-missing and mode-covering phenomena are observed in generation with GANs and VAEs. Second, there exists an awkward conflict between good classification and good generation in SSL when employing directed generative models. To address these problems, we formally present joint-stochastic-approximation random fields (JRFs), a new family of algorithms for building deep undirected generative models, with application to SSL. Synthetic experiments show that JRFs work well in balancing mode covering and mode missing, and match the empirical data distribution well. Empirically, JRFs achieve classification results comparable to state-of-the-art methods on widely adopted SSL datasets (MNIST, SVHN, and CIFAR-10) while simultaneously performing good generation.
Article 1900
Title@2025-05-24 (6): Joint-stochastic-approximation Autoencoders with Application to Semi-supervised Learning
Title: Joint-stochastic-approximation Autoencoders with Application to Semi-supervised Learning | Gelenkstochastische Approximation Autoencoder mit Anwendung auf semi-überwachtes Lernen | 应用到半监督学习的 联合研究- 接近自动校方 2505.18558v1 |
Authors: Wenbo He, Zhijian Ou
Our examination of existing deep generative models (DGMs), including VAEs and GANs, reveals two problems. First, their capability in handling discrete observations and latent codes is unsatisfactory, though there are interesting efforts. Second, both VAEs and GANs optimize criteria that are only indirectly related to the data likelihood. To address these problems, we formally present Joint-stochastic-approximation (JSA) autoencoders, a new family of algorithms for building deep directed generative models, with application to semi-supervised learning. The JSA learning algorithm directly maximizes the data log-likelihood and simultaneously minimizes the inclusive KL divergence between the posterior and the inference model. We provide theoretical results and conduct a series of experiments to show its advantages, such as robustness to structure mismatch between encoder and decoder and consistent handling of both discrete and continuous variables. In particular, we empirically show that JSA autoencoders with discrete latent space achieve comparable performance to other state-of-the-art DGMs with continuous latent space in semi-supervised tasks over the widely adopted datasets MNIST and SVHN. To the best of our knowledge, this is the first demonstration that discrete latent variable models are successfully applied in challenging semi-supervised tasks.
Article 1901
Title@2025-05-24 (6): LAMDA: A Longitudinal Android Malware Benchmark for Concept Drift Analysis
Title: LAMDA: A Longitudinal Android Malware Benchmark for Concept Drift Analysis | LAMDA: Ein Longitudinal Android Malware Benchmark für Konzept Drift Analyse | LAMDA: 关于概念漂流分析的纵向和机器人毛毛虫基准 2505.18551v1 |
Authors: Md Ahsanul Haque, Ismail Hossain, Md Mahmuduzzaman Kamol, Md Jahangir Alam, Suresh Kumar Amalapuram, Sajedul Talukder, Mohammad Saidur Rahman
Machine learning (ML)-based malware detection systems often fail to account for the dynamic nature of real-world training and test data distributions. In practice, these distributions evolve due to frequent changes in the Android ecosystem, adversarial development of new malware families, and the continuous emergence of both benign and malicious applications. Prior studies have shown that such concept drift (distributional shifts in benign and malicious samples) leads to significant degradation in detection performance over time. Despite the practical importance of this issue, existing datasets are often outdated and limited in temporal scope, diversity of malware families, and sample scale, making them insufficient for the systematic evaluation of concept drift in malware detection. To address this gap, we present LAMDA, the largest and most temporally diverse Android malware benchmark to date, designed specifically for concept drift analysis. LAMDA spans 12 years (2013-2025, excluding 2015), includes over 1 million samples (approximately 37% labeled as malware), and covers 1,380 malware families and 150,000 singleton samples, reflecting the natural distribution and evolution of real-world Android applications. We empirically demonstrate LAMDA’s utility by quantifying the performance degradation of standard ML models over time and analyzing feature stability across years. As the most comprehensive Android malware dataset to date, LAMDA enables in-depth research into temporal drift, generalization, explainability, and evolving detection challenges. The dataset and code are available at: https://iqsec-lab.github.io/LAMDA/.
Article 1902
Title@2025-05-24 (6): ReflectGAN: Modeling Vegetation Effects for Soil Carbon Estimation from Satellite Imagery
Title: ReflectGAN: Modeling Vegetation Effects for Soil Carbon Estimation from Satellite Imagery | ReflectGAN: Modellierung von Vegetationseffekten für Bodenkohlenstoffschätzungen aus Satellitenbildern | 反射GAN:从卫星图像中模拟土壤碳估计的植被效应 2505.18546v1 |
Authors: Dristi Datta, Manoranjan Paul, Manzur Murshed, Shyh Wei Teng, Leigh M. Schmidtke
Soil organic carbon (SOC) is a critical indicator of soil health, but its accurate estimation from satellite imagery is hindered in vegetated regions by spectral contamination from plant cover, which obscures soil reflectance and reduces model reliability. This study proposes the Reflectance Transformation Generative Adversarial Network (ReflectGAN), a novel paired GAN-based framework designed to reconstruct accurate bare soil reflectance from vegetated soil satellite observations. By learning the spectral transformation between vegetated and bare soil reflectance, ReflectGAN facilitates more precise SOC estimation under mixed land cover conditions. Using the LUCAS 2018 dataset and corresponding Landsat 8 imagery, we trained multiple learning-based models on both original and ReflectGAN-reconstructed reflectance inputs. Models trained on ReflectGAN outputs consistently outperformed those using existing vegetation correction methods. For example, the best-performing model (RF) achieved an $R^2$ of 0.54, RMSE of 3.95, and RPD of 2.07 when applied to the ReflectGAN-generated signals, representing a 35% increase in $R^2$, a 43% reduction in RMSE, and a 43% improvement in RPD compared to the best existing method (PMM-SU). Models using ReflectGAN also outperform their counterparts when applied to another dataset, i.e., Sentinel-2 imagery. These findings demonstrate the potential of ReflectGAN to improve SOC estimation accuracy in vegetated landscapes, supporting more reliable soil monitoring.
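The abstract reports three regression metrics ($R^2$, RMSE, RPD) without defining them. The sketch below shows one common way to compute them, assuming RPD is the sample standard deviation of the observed values divided by the RMSE (some studies use the population standard deviation instead). The function name and the toy SOC values are hypothetical and are not taken from the paper.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, RPD) for paired observed and predicted values."""
    n = len(y_true)
    mean_obs = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_obs) ** 2 for t in y_true)           # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    # Assumption: RPD uses the sample (n - 1) standard deviation of the observations.
    sd_obs = math.sqrt(ss_tot / (n - 1))
    rpd = sd_obs / rmse
    return r2, rmse, rpd

# Hypothetical toy values, for illustration only.
observed = [10.0, 12.0, 15.0, 20.0, 25.0]
predicted = [11.0, 11.0, 16.0, 19.0, 26.0]
r2, rmse, rpd = regression_metrics(observed, predicted)
```

Because RPD is inversely proportional to RMSE for a fixed set of observations, a large reduction in RMSE shows up as a correspondingly larger RPD.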
Article 1903
Title@2025-05-24 (6): B-score: Detecting biases in large language models using response history
Title: B-score: Detecting biases in large language models using response history | B-Score: Voreingenommenheit in großen Sprachmodellen anhand der Antworthistorie erkennen | B-序号:利用回应历史在大型语言模型中发现偏见 2505.18545v1 |
Authors: An Vo, Mohammad Reza Taesiri, Daeyoung Kim, Anh Totti Nguyen
Large language models (LLMs) often exhibit strong biases, e.g., against women or in favor of the number 7. We investigate whether LLMs would be able to output less biased answers when allowed to observe their prior answers to the same question in a multi-turn conversation. To understand which types of questions invite more biased answers, we test LLMs on our proposed set of questions that span 9 topics and belong to three types: (1) Subjective; (2) Random; and (3) Objective. Interestingly, LLMs are able to “de-bias” themselves in a multi-turn conversation in response to questions that seek a Random, unbiased answer. Furthermore, we propose B-score, a novel metric that is effective in detecting biases to Subjective, Random, Easy, and Hard questions. On MMLU, HLE, and CSQA, leveraging B-score substantially improves the verification accuracy of LLM answers (i.e., accepting correct LLM answers and rejecting incorrect ones) compared to using verbalized confidence scores or the frequency of single-turn answers alone. Code and data are available at: https://b-score.github.io.
Article 1904
Title@2025-05-24 (6): Benchmarking Poisoning Attacks against Retrieval-Augmented Generation
Title: Benchmarking Poisoning Attacks against Retrieval-Augmented Generation | Benchmarking von Giftangriffen gegen retrieval-angereicherte Generation | 制定基准,确定对回收一代人进行中毒袭击的基准 2505.18543v1 |
Authors: Baolei Zhang, Haoran Xin, Jiatong Li, Dongzhe Zhang, Minghong Fang, Zhuqing Liu, Lihai Nie, Zheli Liu
Retrieval-Augmented Generation (RAG) has proven effective in mitigating hallucinations in large language models by incorporating external knowledge during inference. However, this integration introduces new security vulnerabilities, particularly to poisoning attacks. Although prior work has explored various poisoning strategies, a thorough assessment of their practical threat to RAG systems remains missing. To address this gap, we propose the first comprehensive benchmark framework for evaluating poisoning attacks on RAG. Our benchmark covers 5 standard question answering (QA) datasets and 10 expanded variants, along with 13 poisoning attack methods and 7 defense mechanisms, representing a broad spectrum of existing techniques. Using this benchmark, we conduct a comprehensive evaluation of all included attacks and defenses across the full dataset spectrum. Our findings show that while existing attacks perform well on standard QA datasets, their effectiveness drops significantly on the expanded versions. Moreover, our results demonstrate that various advanced RAG architectures, such as sequential, branching, conditional, and loop RAG, as well as multi-turn conversational RAG, multimodal RAG systems, and RAG-based LLM agent systems, remain susceptible to poisoning attacks. Notably, current defense techniques fail to provide robust protection, underscoring the pressing need for more resilient and generalizable defense strategies.
Article 1905
Title@2025-05-24 (6): Mind Your Vision: Multimodal Estimation of Refractive Disorders Using Electrooculography and Eye Tracking
Title: Mind Your Vision: Multimodal Estimation of Refractive Disorders Using Electrooculography and Eye Tracking | Denken Sie an Ihre Vision: Multimodale Abschätzung refraaktiver Störungen mittels Elektrookulographie und Eye Tracking | 思考你的愿景:利用电光学和眼视跟踪对折发性失常进行多模式估计 2505.18538v1 |
Authors: Xin Wei, Huakun Liu, Yutaro Hirao, Monica Perusquia-Hernandez, Katsutoshi Masai, Hideaki Uchiyama, Kiyoshi Kiyokawa
Refractive errors are among the most common visual impairments globally, yet their diagnosis often relies on active user participation and clinical oversight. This study explores a passive method for estimating refractive power using two eye movement recording techniques: electrooculography (EOG) and video-based eye tracking. Using a publicly available dataset recorded under varying diopter conditions, we trained Long Short-Term Memory (LSTM) models to classify refractive power from unimodal (EOG or eye tracking) and multimodal configurations. We assess performance in both subject-dependent and subject-independent settings to evaluate model personalization and generalizability across individuals. Results show that the multimodal model consistently outperforms unimodal models, achieving the highest average accuracy in both settings: 96.207% in the subject-dependent scenario and 8.882% in the subject-independent scenario. However, generalization remains limited, with classification accuracy only marginally above chance in the subject-independent evaluations. Statistical comparisons in the subject-dependent setting confirmed that the multimodal model significantly outperformed the EOG and eye-tracking models. However, no statistically significant differences were found in the subject-independent setting. Our findings demonstrate both the potential and current limitations of eye movement data-based refractive error estimation, contributing to the development of continuous, non-invasive screening methods using EOG signals and eye-tracking data.
Article 1906
Title@2025-05-24 (6): Convergence, Sticking and Escape: Stochastic Dynamics Near Critical Points in SGD
Title: Convergence, Sticking and Escape: Stochastic Dynamics Near Critical Points in SGD | Konvergenz, Haft und Flucht: Stochastische Dynamik in der Nähe kritischer Punkte in SGD | 聚合、粘合和逃离:SGD中近临界点的斯托卡动态 2505.18535v1 |
Authors: Dmitry Dudukalov, Artem Logachov, Vladimir Lotov, Timofei Prasolov, Evgeny Prokopenko, Anton Tarasenko
We study the convergence properties and escape dynamics of Stochastic Gradient Descent (SGD) in one-dimensional landscapes, separately considering infinite- and finite-variance noise. Our main focus is to identify the time scales on which SGD reliably moves from an initial point to the local minimum in the same “basin”. Under suitable conditions on the noise distribution, we prove that SGD converges to the basin’s minimum unless the initial point lies too close to a local maximum. In that near-maximum scenario, we show that SGD can linger for a long time in its neighborhood. For initial points near a “sharp” maximum, we show that SGD does not remain stuck there, and we provide results to estimate the probability that it will reach each of the two neighboring minima. Overall, our findings present a nuanced view of SGD’s transitions between local maxima and minima, influenced by both noise characteristics and the underlying function geometry.
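The setting described above can be explored with a small simulation. The sketch below is only an illustration under assumptions that are not from the paper: a hypothetical double-well objective f(x) = x^4/4 - x^2/2 (minima at x = -1 and x = +1, a smooth local maximum at x = 0), additive Gaussian gradient noise (the finite-variance case only), and arbitrarily chosen step size, noise level, and step count.

```python
import random

def grad(x):
    # Gradient of the hypothetical double-well f(x) = x**4 / 4 - x**2 / 2,
    # which has minima at x = -1 and x = +1 and a local maximum at x = 0.
    return x ** 3 - x

def run_sgd(x0, lr=0.05, noise_std=0.1, steps=2000, seed=0):
    """Run 1D SGD with additive Gaussian gradient noise; return the final iterate."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        x -= lr * (grad(x) + rng.gauss(0.0, noise_std))
    return x

# Start inside the right-hand basin: the iterate should settle near x = +1.
x_inside_basin = run_sgd(0.5)

# Start exactly at the local maximum: noise decides which minimum is reached.
finals_from_max = [run_sgd(0.0, seed=s) for s in range(20)]
right_share = sum(x > 0 for x in finals_from_max) / len(finals_from_max)
```

Here `right_share` is a crude empirical estimate of the probability of ending in the right-hand minimum when starting at the maximum. Replacing `rng.gauss` with a heavy-tailed sampler would be one way to probe the infinite-variance regime, which this sketch does not cover.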
Article 1907
Title@2025-05-24 (6): CMoE: Converting Mixture-of-Experts from Dense to Accelerate LLM Inference
Title: CMoE: Converting Mixture-of-Experts from Dense to Accelerate LLM Inference | CMoE: Konvertieren von Mischungen von Experten aus Dense zu beschleunigter LLM-Inferenz | CMoE: 将混合专家从高能转换为加速LLM推理 2502.04416v2 |
Authors: Zehua Pei, Lancheng Zou, Hui-Ling Zhen, Xianzhi Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu
Scaling large language models (LLMs) improves performance but dramatically increases inference costs. The feed-forward network (FFN), consuming approximately 70% of inference compute, represents a critical bottleneck, particularly in large batch size scenarios. While mixture-of-experts (MoE) architectures leverage activation sparsity for efficiency, converting existing dense models to MoEs traditionally requires resource-intensive continual pre-training. We present CMoE, a framework that rapidly transforms dense LLMs into MoEs without training. The key innovation lies in analyzing FFN neuron activations to partition them into shared (always active) and routed experts. Routed neurons are clustered using a balanced assignment algorithm, and a differentiable router is constructed analytically from activation statistics, enabling immediate deployment or optional lightweight fine-tuning. Experiments demonstrate that, with an activation ratio of 75%, CMoE delivers lossless precision in terms of perplexity while still maintaining a 5% acceleration. Further experiments reveal that a CMoE configuration activating just 25% of parameters reduces end-to-end latency by 1.5x while preserving usable perplexity without additional training. Moreover, a brief LoRA fine-tuning process (requiring only 1 hour and 2,000 samples) successfully recovers over 76% of the dense model’s downstream accuracy. By effectively balancing performance and efficiency, CMoE offers a viable path forward for deploying LLMs in real-world scenarios where computational resources are limited. We make our code publicly available at https://github.com/JarvisPei/CMoE.
Article 1908
Title@2025-05-24 (6): Preserving AUC Fairness in Learning with Noisy Protected Groups
Title: Preserving AUC Fairness in Learning with Noisy Protected Groups | AUC Fairness beim Lernen mit geräuschgeschützten Gruppen bewahren | 维护AUC在与噪音保护群体学习中的公平公平 2505.18532v1 |
Authors: Mingyang Wu, Li Lin, Wenbin Zhang, Xin Wang, Zhenhuan Yang, Shu Hu
The Area Under the ROC Curve (AUC) is a key metric for classification, especially under class imbalance, and a growing body of research focuses on optimizing AUC rather than accuracy in applications like medical image analysis and deepfake detection. Fairness in AUC optimization therefore becomes crucial, as biases can impact protected groups. While various fairness mitigation techniques exist, fairness considerations in AUC optimization remain in their early stages, with most research focusing on improving AUC fairness under the assumption of clean protected groups. However, these studies often overlook the impact of noisy protected groups, leading to fairness violations in practice. To address this, we propose the first robust AUC fairness approach under noisy protected groups, with theoretical fairness guarantees obtained using distributionally robust optimization. Extensive experiments on tabular and image datasets show that our method outperforms state-of-the-art approaches in preserving AUC fairness. The code is available at https://github.com/Purdue-M2/AUC_Fairness_with_Noisy_Groups.
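To make "AUC fairness" concrete, the sketch below computes AUC as the probability that a random positive outscores a random negative, then evaluates it separately inside each protected group and takes the gap. This intra-group AUC gap is only one simple notion of AUC fairness and is an assumption here; the paper's fairness criterion and its distributionally robust method are not reproduced. All data are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def groupwise_auc(scores, labels, groups):
    """AUC computed separately inside each protected group."""
    result = {}
    for g in sorted(set(groups)):
        pos = [s for s, y, a in zip(scores, labels, groups) if a == g and y == 1]
        neg = [s for s, y, a in zip(scores, labels, groups) if a == g and y == 0]
        result[g] = auc(pos, neg)
    return result

# Hypothetical toy data: classifier scores, binary labels, protected-group tags.
scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.4, 0.7, 0.1]
labels = [1, 1, 0, 0, 1, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

per_group = groupwise_auc(scores, labels, groups)
gap = max(per_group.values()) - min(per_group.values())
```

If some entries of `groups` were recorded incorrectly (noisy protected groups), the per-group AUCs and the gap computed this way would no longer reflect the true groups, which is the practical problem the abstract points to.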
Article 1909
Title@2025-05-24 (6): SMART: Self-Aware Agent for Tool Overuse Mitigation
Title: SMART: Self-Aware Agent for Tool Overuse Mitigation | SMART: Self-Aware Agent für Tool Overuse Mitigation | SMART: 减少工具过度使用自智能剂 2502.11435v2 |
Authors: Cheng Qian, Emre Can Acikgoz, Hongru Wang, Xiusi Chen, Avirup Sil, Dilek Hakkani-Tür, Gokhan Tur, Heng Ji
Current Large Language Model (LLM) agents demonstrate strong reasoning and tool use capabilities, but often lack self-awareness, failing to balance these approaches effectively. This imbalance leads to Tool Overuse, where models unnecessarily rely on external tools for tasks solvable with parametric knowledge, increasing computational overhead. Inspired by human metacognition, we introduce SMART (Strategic Model-Aware Reasoning with Tools), a paradigm that enhances an agent’s self-awareness to optimize task handling and reduce tool overuse. To support this paradigm, we introduce SMART-ER, a dataset spanning three domains, where reasoning alternates between parametric knowledge and tool-dependent steps, with each step enriched by rationales explaining when tools are necessary. Through supervised training, we develop SMARTAgent, a family of models that dynamically balance parametric knowledge and tool use. Evaluations show that SMARTAgent reduces tool use by 24% while improving performance by over 37%, enabling 7B-scale models to match their 70B counterpart and GPT-4o. Additionally, SMARTAgent generalizes to out-of-distribution test data like GSM8K and MINTQA, maintaining accuracy with just one-fifth of the tool calls. These results highlight the potential of strategic tool use to enhance reasoning, mitigate overuse, and bridge the gap between model size and performance, advancing intelligent and resource-efficient agent designs.
Article 1910
Title@2025-05-24 (6): Compositional Generalization via Forced Rendering of Disentangled Latents
Title: Compositional Generalization via Forced Rendering of Disentangled Latents | Zusammensetzungelle Verallgemeinerung durch Zwangsverleumdung entwirrter Latente | 通过强迫拆散的内流流流体 2501.18797v2 |
Authors: Qiyao Liang, Daoyuan Qian, Liu Ziyin, Ila Fiete
Composition, the ability to generate myriad variations from finite means, is believed to underlie powerful generalization. However, compositional generalization remains a key challenge for deep learning. A widely held assumption is that learning disentangled (factorized) representations naturally supports this kind of extrapolation. Yet, empirical results are mixed, with many generative models failing to recognize and compose factors to generate out-of-distribution (OOD) samples. In this work, we investigate a controlled 2D Gaussian “bump” generation task with fully disentangled (x,y) inputs, demonstrating that standard generative architectures still fail in OOD regions when trained with partial data, because they re-entangle latent representations in subsequent layers. By examining the model’s learned kernels and manifold geometry, we show that this failure reflects a “memorization” strategy for generation via data superposition rather than via composition of the true factorized features. We show that when models are forced (through architectural modifications with regularization, or through curated training data) to render the disentangled latents into the full-dimensional representational (pixel) space, they can be highly data-efficient and effective at composing in OOD regions. These findings underscore that disentangled latents in an abstract representation are insufficient, and show that if models can represent disentangled factors directly in the output representational space, they can achieve robust compositional generalization.
Article 1911
Title@2025-05-24 (6): CLaDMoP: Learning Transferrable Models from Successful Clinical Trials via LLMs
Title: CLaDMoP: Learning Transferrable Models from Successful Clinical Trials via LLMs | CLaDMoP: Übertragbare Modelle aus erfolgreichen klinischen Studien über LLMs lernen | CLADMOP:通过LLMs成功临床试验学习可转让模型 2505.18527v1 |
Authors: Yiqing Zhang, Xiaozhong Liu, Fabricio Murai
Many existing models for clinical trial outcome prediction are optimized using task-specific loss functions on trial phase-specific data. While this scheme may boost prediction for common diseases and drugs, it can hinder learning of generalizable representations, leading to more false positives/negatives. To address this limitation, we introduce CLaDMoP, a new pre-training approach for clinical trial outcome prediction, alongside the Successful Clinical Trials (SCT) dataset, specifically designed for this task. CLaDMoP leverages a Large Language Model, used to encode trials’ eligibility criteria, linked to a lightweight Drug-Molecule branch through a novel multi-level fusion technique. To efficiently fuse long embeddings across levels, we incorporate a grouping block, drastically reducing computational overhead. CLaDMoP avoids reliance on task-specific objectives by pre-training on a “pair matching” proxy task. Compared to established zero-shot and few-shot baselines, our method significantly improves both PR-AUC and ROC-AUC, especially for phase I and phase II trials. We further evaluate and perform ablation on CLaDMoP after Parameter-Efficient Fine-Tuning, comparing it to state-of-the-art supervised baselines, including MEXA-CTP, on the Trial Outcome Prediction (TOP) benchmark. CLaDMoP achieves up to 10.5% improvement in PR-AUC and 3.6% in ROC-AUC, while attaining an F1 score comparable to MEXA-CTP, highlighting its potential for clinical trial outcome prediction. Code and the SCT dataset can be downloaded from https://github.com/murai-lab/CLaDMoP.
Article 1912
Title@2025-05-24 (6): Scalable Gaussian Processes with Low-Rank Deep Kernel Decomposition
Title: Scalable Gaussian Processes with Low-Rank Deep Kernel Decomposition | Skalierbare Gauß-Prozesse mit niederrassiger Tiefenkernzersetzung | 可缩放高斯进程,且低射深内核内核分解 2505.18526v1 |
Authors: Yunqin Zhu, Henry Shaowu Yuchi, Yao Xie
Kernels are key to encoding prior beliefs and data structures in Gaussian process (GP) models. The design of expressive and scalable kernels has garnered significant research attention. Deep kernel learning enhances kernel flexibility by feeding inputs through a neural network before applying a standard parametric form. However, this approach remains limited by the choice of base kernels, inherits high inference costs, and often demands sparse approximations. Drawing on Mercer’s theorem, we introduce a fully data-driven, scalable deep kernel representation where a neural network directly represents a low-rank kernel through a small set of basis functions. This construction enables highly efficient exact GP inference in linear time and memory without invoking inducing points. It also supports scalable mini-batch training based on a principled variational inference framework. We further propose a simple variance correction procedure to guard against overconfidence in uncertainty estimates. Experiments on synthetic and real-world data demonstrate the advantages of our deep kernel GP in terms of predictive accuracy, uncertainty quantification, and computational efficiency.
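The claim of exact GP inference in linear time follows from a standard identity for low-rank kernels. If k(x, x') = φ(x)ᵀφ(x') with m basis functions, the posterior mean can be obtained from an m x m linear system instead of an n x n one. The sketch below checks this numerically. It is a minimal illustration, not the authors' implementation: `features` uses three fixed, hypothetical basis functions as a stand-in for a learned neural network, `noise` is an assumed noise variance, and the variational training and variance correction from the abstract are omitted.

```python
import math

def features(x):
    # Hypothetical fixed basis functions standing in for a neural network's outputs.
    return [1.0, math.sin(x), math.cos(2.0 * x)]

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z

def predict_low_rank(xs, ys, x_star, noise=0.1):
    """Posterior mean via the m x m system (Phi^T Phi + noise * I) w = Phi^T y."""
    phi = [features(x) for x in xs]
    m = len(phi[0])
    gram = [[sum(p[i] * p[j] for p in phi) + (noise if i == j else 0.0)
             for j in range(m)] for i in range(m)]
    rhs = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(m)]
    w = solve(gram, rhs)
    return sum(wi * fi for wi, fi in zip(w, features(x_star)))

def predict_full(xs, ys, x_star, noise=0.1):
    """Reference posterior mean via the n x n system (K + noise * I) alpha = y."""
    phi = [features(x) for x in xs]
    n = len(xs)
    k = [[sum(a * b for a, b in zip(phi[i], phi[j])) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(k, ys)
    f_star = features(x_star)
    k_star = [sum(a * b for a, b in zip(p, f_star)) for p in phi]
    return sum(ks * al for ks, al in zip(k_star, alpha))

# Hypothetical toy data.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [math.sin(x) + 0.5 for x in xs]
fast = predict_low_rank(xs, ys, 1.2)
slow = predict_full(xs, ys, 1.2)
```

Building the m x m Gram matrix costs O(n m^2) and solving it costs O(m^3), so for fixed m the work grows linearly with the number of training points n, whereas the reference path solves an n x n system.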
Article 1913
Title@2025-05-24 (6): LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs
Title: LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs | LiSTEN: Soft Token-Embeddings für neurale Audio-LLMs lernen | LISTEN: 神经音频LMS学习软软制嵌入器 2505.18517v1 |
Authors: Pooneh Mousavi, Shubham Gupta, Cem Subakan, Mirco Ravanelli
Foundation models based on large language models (LLMs) have shown great success in handling various tasks and modalities. However, adapting these models for general-purpose audio-language tasks is challenging due to differences in acoustic environments and task variations. In this work, we introduce LiSTEN (Learning Soft Token Embeddings for Neural Audio LLMs), a framework for adapting LLMs to speech and audio tasks. LiSTEN uses a dynamic prompt selection strategy with learnable key-value pairs, allowing the model to balance general and task-specific knowledge while avoiding overfitting in a multitask setting. Our approach reduces dependence on large-scale ASR or captioning datasets, achieves competitive performance with fewer trainable parameters, and simplifies training by using a single-stage process. Additionally, LiSTEN enhances interpretability by analyzing the diversity and overlap of selected prompts across different tasks.
Article 1914
Title@2025-05-24 (6): Test-Time Adaptation with Binary Feedback
Title: Test-Time Adaptation with Binary Feedback | Test-Zeit-Anpassung mit Binär-Feedback | 带有二进制反馈的测试时间适应 2505.18514v1 |
Authors: Taeckyung Lee, Sorn Chottananurak, Junsu Kim, Jinwoo Shin, Taesik Gong, Sung-Ju Lee
Deep learning models perform poorly when domain shifts exist between training and test data. Test-time adaptation (TTA) is a paradigm to mitigate this issue by adapting pre-trained models using only unlabeled test samples. However, existing TTA methods can fail under severe domain shifts, while recent active TTA approaches requiring full-class labels are impractical due to high labeling costs. To address this issue, we introduce a new setting of TTA with binary feedback. This setting uses a few binary feedback inputs from annotators to indicate whether model predictions are correct, thereby significantly reducing the labeling burden on annotators. Under this setting, we propose BiTTA, a novel dual-path optimization framework that leverages reinforcement learning to balance binary feedback-guided adaptation on uncertain samples with agreement-based self-adaptation on confident predictions. Experiments show that BiTTA achieves 13.3%p accuracy improvements over state-of-the-art baselines, demonstrating its effectiveness in handling severe distribution shifts with minimal labeling effort. The source code is available at https://github.com/taeckyung/BiTTA.
Article 1915
Title@2025-05-24 (6): Enhancing Training Data Attribution with Representational Optimization
Title: Enhancing Training Data Attribution with Representational Optimization | Verbesserung der Schulungsdatenzuweisung mit repräsentativer Optimierung | 提高培训数据分配,优化代表性 2505.18513v1 |
Authors: Weiwei Sun, Haokun Liu, Nikhil Kandpal, Colin Raffel, Yiming Yang
Training data attribution (TDA) methods aim to measure how training data impacts a model’s predictions. While gradient-based attribution methods, such as influence functions, offer theoretical grounding, their computational costs make them impractical for large-scale applications. Representation-based approaches are far more scalable, but typically rely on heuristic embeddings that are not optimized for attribution, limiting their fidelity. To address these challenges, we propose AirRep, a scalable, representation-based approach that closes this gap by learning task-specific and model-aligned representations optimized explicitly for TDA. AirRep introduces two key innovations: a trainable encoder tuned for attribution quality, and an attention-based pooling mechanism that enables accurate estimation of group-wise influence. We train AirRep using a ranking objective over automatically constructed training subsets labeled by their empirical effect on target predictions. Experiments on instruction-tuned LLMs demonstrate that AirRep achieves performance on par with state-of-the-art gradient-based approaches while being nearly two orders of magnitude more efficient at inference time. Further analysis highlights its robustness and generalization across tasks and models. Our code is available at https://github.com/sunnweiwei/AirRep.
Article 1916
Title@2025-05-24 (6): AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking
Title: AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking | AcuRank: Ungewissheits-Bewusst-Adaptive-Computation für Listwise-Reranking | AcuRank: 列表排序的不确定性- 软件适应性计算 2505.18512v1 |
Authors: Soyoung Yoon, Gyuwan Kim, Gyu-Hwung Cho, Seung-won Hwang
Listwise reranking with large language models (LLMs) enhances top-ranked results in retrieval-based applications. Because of limited context size and the high inference cost of long contexts, reranking is typically performed over small subsets of fixed size, with the final ranking aggregated from these partial results. This fixed computation disregards query difficulty and document distribution, leading to inefficiencies. We propose AcuRank, an adaptive reranking framework that dynamically adjusts both the amount and target of computation based on uncertainty estimates over document relevance. Using a Bayesian TrueSkill model, we iteratively refine relevance estimates until reaching sufficient confidence levels. Our explicit modeling of ranking uncertainty enables principled control over reranking behavior and avoids unnecessary updates to confident predictions. Results on the TREC-DL and BEIR benchmarks show that our method consistently achieves a superior accuracy-efficiency trade-off and scales better with compute than fixed-computation baselines. These results highlight the effectiveness and generalizability of our method across diverse retrieval tasks and LLM-based reranking models.
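The idea of spending computation only where the ranking is still uncertain can be illustrated with a minimal sketch. It assumes each document carries a Gaussian relevance belief (mean, standard deviation) and flags documents whose top-k membership is not yet settled. The function names, the midpoint threshold, and the `eps` cutoff are illustrative assumptions, not the AcuRank implementation.

```python
import math


def prob_above(mu, sigma, threshold):
    """P(score > threshold) for score ~ Normal(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((mu - threshold) / (sigma * math.sqrt(2.0))))


def uncertain_documents(beliefs, k, eps=0.05):
    """Return ids of documents whose top-k membership is still unsettled.

    beliefs: dict mapping doc id -> (mu, sigma); must hold more than k docs.
    The threshold is the midpoint between the k-th and (k+1)-th highest
    means (an assumption made for this sketch).
    """
    means = sorted((mu for mu, _ in beliefs.values()), reverse=True)
    threshold = 0.5 * (means[k - 1] + means[k])
    return [
        doc
        for doc, (mu, sigma) in beliefs.items()
        if eps < prob_above(mu, sigma, threshold) < 1.0 - eps
    ]


beliefs = {"a": (30.0, 1.0), "b": (25.0, 4.0), "c": (24.0, 4.0), "d": (10.0, 1.0)}
to_rerank = uncertain_documents(beliefs, k=2)
```

In this toy input, only the documents near the top-2 boundary with wide beliefs would be sent back for another reranking pass; confidently placed documents are left alone.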
Article 1917
Title@2025-05-24 (6): SPDEBench: An Extensive Benchmark for Learning Regular and Singular Stochastic PDEs
Title: SPDEBench: An Extensive Benchmark for Learning Regular and Singular Stochastic PDEs | SPDEBench: Ein umfassender Benchmark für das Lernen regelmäßiger und singulärer stochastischer PDEs | SPDEBENCH: 定期学习和单声速学项目的广泛基准 2505.18511v1 |
Authors: Zheyan Li, Yuantu Zhu, Hao Ni, Siran Li, Bingguang Chen, Qi Meng
Stochastic Partial Differential Equations (SPDEs) driven by random noise play a central role in modelling physical processes whose spatio-temporal dynamics can be rough, such as turbulent flows, superconductors, and quantum dynamics. To efficiently model these processes and make predictions, machine learning (ML)-based surrogate models have been proposed, with network architectures that incorporate the spatio-temporal roughness in their design. However, the field lacks extensive and unified datasets for SPDE learning; in particular, existing datasets do not account for the computational error introduced by noise sampling or the renormalization required for handling singular SPDEs. We thus introduce SPDEBench, which is designed to solve typical SPDEs of physical significance (e.g., the $\Phi^4_d$, wave, incompressible Navier–Stokes, and KdV equations) on 1D or 2D tori driven by white noise via ML methods. New datasets for singular SPDEs based on the renormalization process have been constructed, and novel ML models achieving the best results to date have been proposed. In particular, we investigate the impact of computational error introduced by noise sampling and renormalization on the performance comparison of ML models and highlight the importance of selecting high-quality test data for accurate evaluation. Results are benchmarked with traditional numerical solvers and ML-based models such as FNO, NSPDE, and DLR-Net. It is shown that, for singular SPDEs, naively applying ML models to data without specifying the numerical schemes can lead to significant errors and misleading conclusions. Our SPDEBench provides an open-source codebase that ensures full reproducibility of benchmarking across a variety of SPDE datasets while offering the flexibility to incorporate new datasets and machine learning baselines, making it a valuable resource for the community.
Article 1918
Title@2025-05-24 (6): How Particle System Theory Enhances Hypergraph Message Passing
Title: How Particle System Theory Enhances Hypergraph Message Passing | Wie Partikelsystemtheorie die Hypergraph-Nachricht verbessert | 粒子系统理论如何增强超光速消息传递 2505.18505v1 |
Authors: Yixuan Ma, Kai Yi, Pietro Lio, Shi Jin, Yu Guang Wang
Hypergraphs effectively model higher-order relationships in natural phenomena, capturing complex interactions beyond pairwise connections. We introduce a novel hypergraph message passing framework inspired by interacting particle systems, where hyperedges act as fields inducing shared node dynamics. By incorporating attraction, repulsion, and Allen-Cahn forcing terms, particles of varying classes and features achieve class-dependent equilibrium, enabling separability through the particle-driven message passing. We investigate both first-order and second-order particle system equations for modeling these dynamics, which mitigate over-smoothing and heterophily and can thus capture complete interactions. The more stable second-order system permits deeper message passing. Furthermore, we enhance deterministic message passing with a stochastic element to account for interaction uncertainties. We prove theoretically that our approach mitigates over-smoothing by maintaining a positive lower bound on the hypergraph Dirichlet energy during propagation, thus enabling hypergraph message passing to go deep. Empirically, our models demonstrate competitive performance on diverse real-world hypergraph node classification tasks, excelling on both homophilic and heterophilic datasets.
Article 1919
Title@2025-05-24 (6): Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks
Title: Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks | Repräsentationslernen mit gegenseitigem Einfluss von Modalitäten für die Knotenklassifikation in multimodalen Heterogenen Netzwerken | 多模式不同形式网络节点分类方式相互影响,代表学习 2505.07895v2 |
Authors: Jiafan Li, Jiaqi Zhu, Liang Chang, Yilin Li, Miaomiao Li, Yang Wang, Hongan Wang
Nowadays, numerous online platforms can be described as multi-modal heterogeneous networks (MMHNs), such as Douban’s movie networks and Amazon’s product review networks. Accurately categorizing nodes within these networks is crucial for analyzing the corresponding entities, which requires effective representation learning on nodes. However, existing multi-modal fusion methods often adopt either early fusion strategies which may lose the unique characteristics of individual modalities, or late fusion approaches overlooking the cross-modal guidance in GNN-based information propagation. In this paper, we propose a novel model for node classification in MMHNs, named Heterogeneous Graph Neural Network with Inter-Modal Attention (HGNN-IMA). It learns node representations by capturing the mutual influence of multiple modalities during the information propagation process, within the framework of heterogeneous graph transformer. Specifically, a nested inter-modal attention mechanism is integrated into the inter-node attention to achieve adaptive multi-modal fusion, and modality alignment is also taken into account to encourage the propagation among nodes with consistent similarities across all modalities. Moreover, an attention loss is augmented to mitigate the impact of missing modalities. Extensive experiments validate the superiority of the model in the node classification task, providing an innovative view to handle multi-modal data, especially when accompanied with network structures.
Article 1920
Title@2025-05-24 (6): LiDAR-EDIT: LiDAR Data Generation by Editing the Object Layouts in Real-World Scenes
Title: LiDAR-EDIT: LiDAR Data Generation by Editing the Object Layouts in Real-World Scenes | LiDAR-EDIT: LiDAR-Datenerstellung durch Bearbeiten der Objektlayouts in realen Szenen | LiDAR-EDIT:通过在真实世界景点中编辑对象布局生成LIDAR数据 2412.00592v3 |
Authors: Shing-Hei Ho, Bao Thach, Minghan Zhu
We present LiDAR-EDIT, a novel paradigm for generating synthetic LiDAR data for autonomous driving. Our framework edits real-world LiDAR scans by introducing new object layouts while preserving the realism of the background environment. Compared to end-to-end frameworks that generate LiDAR point clouds from scratch, LiDAR-EDIT offers users full control over the object layout, including the number, type, and pose of objects, while keeping most of the original real-world background. Our method also provides object labels for the generated data. Compared to novel view synthesis techniques, our framework allows for the creation of counterfactual scenarios with object layouts significantly different from the original real-world scene. LiDAR-EDIT uses spherical voxelization to enforce correct LiDAR projective geometry in the generated point clouds by construction. During object removal and insertion, generative models are employed to fill the unseen background and object parts that were occluded in the original real LiDAR scans. Experimental results demonstrate that our framework produces realistic LiDAR scans with practical value for downstream tasks.
Article 1921
Title@2025-05-24 (6): EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents
Title: EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents | EscapeBench: Auf dem Weg zu mehr kreativer Intelligenz von Sprachmodell-Agenten | 逃避:努力推进语言示范代理的创意智能 2412.13549v2 |
Authors: Cheng Qian, Peixuan Han, Qinyu Luo, Bingxiang He, Xiusi Chen, Yuji Zhang, Hongyi Du, Jiarui Yao, Xiaocheng Yang, Denghui Zhang, Yunzhu Li, Heng Ji
Language model agents excel in long-session planning and reasoning, but existing benchmarks primarily focus on goal-oriented tasks with explicit objectives, neglecting creative adaptation in unfamiliar environments. To address this, we introduce EscapeBench, a benchmark suite of room escape game environments designed to challenge agents with creative reasoning, unconventional tool use, and iterative problem-solving to uncover implicit goals. Our results show that current LM models, despite employing working memory and Chain-of-Thought reasoning, achieve only 15% average progress without hints, highlighting their limitations in creativity. To bridge this gap, we propose EscapeAgent, a framework designed to enhance creative reasoning through Foresight (innovative tool use) and Reflection (identifying unsolved tasks). Experiments show that EscapeAgent can execute action chains over 1,000 steps while maintaining logical coherence. It navigates and completes games with up to 40% fewer steps and hints, performs robustly across difficulty levels, and achieves higher action success rates with more efficient and innovative puzzle-solving strategies.
Article 1922
Title@2025-05-24 (6): Perception-Informed Neural Networks: Beyond Physics-Informed Neural Networks
Title: Perception-Informed Neural Networks: Beyond Physics-Informed Neural Networks | Wahrnehmungs-informierte neurale Netzwerke: Jenseits physikinformierter neuraler Netzwerke | 感知内化神经网络:超越物理内化神经网络 2505.03806v2 |
Authors: Mehran Mazandarani, Marzieh Najariyan
This article introduces Perception-Informed Neural Networks (PrINNs), a framework designed to incorporate perception-based information into neural networks, addressing both systems with known and unknown physics laws or differential equations. Moreover, PrINNs extend the concept of Physics-Informed Neural Networks (PINNs) and their variants, offering a platform for the integration of diverse forms of perception precisiation, including singular, probability distribution, possibility distribution, interval, and fuzzy graph. In fact, PrINNs allow neural networks to model dynamical systems by integrating expert knowledge and perception-based information through loss functions, enabling the creation of modern data-driven models. Some of the key contributions include Mixture of Experts Informed Neural Networks (MOEINNs), which combine heterogeneous expert knowledge into the network, and Transformed-Knowledge Informed Neural Networks (TKINNs), which facilitate the incorporation of meta-information for enhanced model performance. Additionally, Fuzzy-Informed Neural Networks (FINNs) as a modern class of fuzzy deep neural networks leverage fuzzy logic constraints within a deep learning architecture, allowing online training without pre-training and eliminating the need for defuzzification. PrINNs represent a significant step forward in bridging the gap between traditional physics-based modeling and modern data-driven approaches, enabling neural networks to learn from both structured physics laws and flexible perception-based rules. This approach empowers neural networks to operate in uncertain environments, model complex systems, and discover new forms of differential equations, making PrINNs a powerful tool for advancing computational science and engineering.
Article 1923
Title@2025-05-24 (6): Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection
Title: Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection | Gruppenadaptive Schwellenoptimierung für robuste KI-generierte Texterkennung | 强力AI-发光的文本探测的集团-适应性阈值优化 2502.04528v4 |
Authors: Minseok Jung, Cynthia Fuertes Panizo, Liam Dugan, Yi R. Fung, Pin-Yu Chen, Paul Pu Liang
The advancement of large language models (LLMs) has made it difficult to differentiate human-written text from AI-generated text. Several AI-text detectors have been developed in response, which typically utilize a fixed global threshold (e.g., $\theta = 0.5$) to classify machine-generated text. However, one universal threshold could fail to account for distributional variations by subgroups. For example, when using a fixed threshold, detectors make more false positive errors on shorter human-written text, and more positive classifications of neurotic writing styles among long texts. These discrepancies can lead to misclassifications that disproportionately affect certain groups. We address this critical limitation by introducing FairOPT, an algorithm for group-specific threshold optimization for probabilistic AI-text detectors. We partitioned data into subgroups based on attributes (e.g., text length and writing style) and implemented FairOPT to learn decision thresholds for each group to reduce discrepancy. In experiments with nine AI text classifiers on three datasets, FairOPT decreases overall balanced error rate (BER) discrepancy by 12% while minimally sacrificing accuracy by 0.003%. Our framework paves the way for more robust classification in AI-generated content detection via post-processing.
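To make the contrast between one global threshold and per-group thresholds concrete, here is a minimal sketch. It picks, for each subgroup, the grid threshold with the lowest balanced error rate. This is a simplified stand-in, not the FairOPT algorithm itself (which optimizes discrepancy across groups); the function names, the grid, and the toy data are assumptions made for illustration.

```python
import numpy as np


def balanced_error_rate(y_true, y_pred):
    """Mean of the false-positive rate and the false-negative rate."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    neg, pos = ~y_true, y_true
    fpr = (y_pred & neg).sum() / max(neg.sum(), 1)
    fnr = (~y_pred & pos).sum() / max(pos.sum(), 1)
    return 0.5 * (fpr + fnr)


def fit_group_thresholds(scores, labels, groups, grid=None):
    """For each group, choose the grid threshold with the lowest BER."""
    scores, labels, groups = map(np.asarray, (scores, labels, groups))
    grid = np.linspace(0.05, 0.95, 19) if grid is None else np.asarray(grid)
    thresholds = {}
    for g in np.unique(groups):
        mask = groups == g
        bers = [balanced_error_rate(labels[mask], scores[mask] >= t) for t in grid]
        thresholds[str(g)] = float(grid[int(np.argmin(bers))])
    return thresholds


def predict(scores, groups, thresholds):
    """Classify each sample with the threshold of its own group."""
    return np.array([s >= thresholds[str(g)] for s, g in zip(scores, groups)])


# Toy detector scores (label 1 = AI-generated); long texts score higher overall.
scores = np.array([0.22, 0.31, 0.62, 0.71, 0.52, 0.61, 0.82, 0.91])
labels = np.array([0, 0, 1, 1, 0, 0, 1, 1])
groups = np.array(["short"] * 4 + ["long"] * 4)

thresholds = fit_group_thresholds(scores, labels, groups)
ber_global = balanced_error_rate(labels, scores >= 0.5)
ber_group = balanced_error_rate(labels, predict(scores, groups, thresholds))
```

On this toy data the fixed threshold of 0.5 misclassifies human-written long texts, while the per-group thresholds separate both groups.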
Article 1924
Title@2025-05-24 (6): Knowledge Grafting of Large Language Models
Title: Knowledge Grafting of Large Language Models | Wissen Graften von großen Sprachmodellen | 大语言模式知识转让 2505.18502v1 |
Authors: Guodong Du, Xuanning Zhou, Junlin Li, Zhuo Li, Zesheng Shi, Wanyu Lin, Ho-Kin Tang, Xiucheng Li, Fangming Liu, Wenya Wang, Min Zhang, Jing Li
Cross-capability transfer is a key challenge in large language model (LLM) research, with applications in multi-task integration, model compression, and continual learning. Recent works like FuseLLM and FuseChat have demonstrated the potential of transferring multiple model capabilities to lightweight models, enhancing adaptability and efficiency, which motivates our investigation into more efficient cross-capability transfer methods. However, existing approaches primarily focus on small, homogeneous models, limiting their applicability. For large, heterogeneous models, knowledge distillation with full-parameter fine-tuning often overlooks the student model’s intrinsic capacity and risks catastrophic forgetting, while PEFT methods struggle to effectively absorb knowledge from source LLMs. To address these issues, we introduce GraftLLM, a novel method that stores source model capabilities in a target model with SkillPack format. This approach preserves general capabilities, reduces parameter conflicts, and supports forget-free continual learning and model fusion. We employ a module-aware adaptive compression strategy to compress parameter updates, ensuring efficient storage while maintaining task-specific knowledge. The resulting SkillPack serves as a compact and transferable knowledge carrier, ideal for heterogeneous model fusion and continual learning. Experiments across various scenarios demonstrate that GraftLLM outperforms existing techniques in knowledge transfer, knowledge fusion, and forget-free learning, providing a scalable and efficient solution for cross-capability transfer. The code is publicly available at: https://github.com/duguodong7/GraftLLM.
Article 1925
Title@2025-05-24 (6): MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning
Title: MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning | MENTOR: Mixture-of-Experts-Netzwerk mit Task-Oriented Perturbation für visuelles Verstärkungslernen | INTOOR: 视力强化学习中以任务为导向的干扰干扰模拟专家网络 2410.14972v2 |
Authors: Suning Huang, Zheyu Zhang, Tianhai Liang, Yihan Xu, Zhehao Kou, Chenhao Lu, Guowei Xu, Zhengrong Xue, Huazhe Xu
Visual deep reinforcement learning (RL) enables robots to acquire skills from visual input for unstructured tasks. However, current algorithms suffer from low sample efficiency, limiting their practical applicability. In this work, we present MENTOR, a method that improves both the architecture and optimization of RL agents. Specifically, MENTOR replaces the standard multi-layer perceptron (MLP) with a mixture-of-experts (MoE) backbone and introduces a task-oriented perturbation mechanism. MENTOR outperforms state-of-the-art methods across three simulation benchmarks and achieves an average of 83% success rate on three challenging real-world robotic manipulation tasks, significantly surpassing the 32% success rate of the strongest existing model-free visual RL algorithm. These results underscore the importance of sample efficiency in advancing visual RL for real-world robotics. Experimental videos are available at https://suninghuang19.github.io/mentor_page/.
Article 1926
Title@2025-05-24 (6): G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning
Title: G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning | G1: LLMs zur Vernunft bringen bei Diagrammen mit Verstärkungslernen | G1:在加强学习的图表方面向理性者传授法学硕士 2505.18499v1 |
Authors: Xiaojun Guo, Ang Li, Yifei Wang, Stefanie Jegelka, Yisen Wang
Although Large Language Models (LLMs) have demonstrated remarkable progress, their proficiency in graph-related tasks remains notably limited, hindering the development of truly general-purpose models. Previous attempts, including pretraining graph foundation models or employing supervised fine-tuning, often face challenges such as the scarcity of large-scale, universally represented graph data. We introduce G1, a simple yet effective approach demonstrating that Reinforcement Learning (RL) on synthetic graph-theoretic tasks can significantly scale LLMs’ graph reasoning abilities. To enable RL training, we curate Erdős, the largest graph reasoning dataset to date, comprising 50 diverse graph-theoretic tasks of varying difficulty levels, 100k training data and 5k test data, all derived from real-world graphs. With RL on Erdős, G1 obtains substantial improvements in graph reasoning, where our finetuned 3B model even outperforms Qwen2.5-72B-Instruct (24x size). RL-trained models also show strong zero-shot generalization to unseen tasks, domains, and graph encoding schemes, including other graph-theoretic benchmarks as well as real-world node classification and link prediction tasks, without compromising general reasoning abilities. Our findings offer an efficient, scalable path for building strong graph reasoners by finetuning LLMs with RL on graph-theoretic tasks, which combines the strengths of pretrained LLM capabilities with abundant, automatically generated synthetic data, suggesting that LLMs possess graph understanding abilities that RL can elicit successfully.
Article 1927
Title@2025-05-24 (6): Quantum Feature Space of a Qubit Coupled to an Arbitrary Bath
Title: Quantum Feature Space of a Qubit Coupled to an Arbitrary Bath | Quanten-Feature-Raum eines Qubits in Verbindung mit einem willkürlichen Bad | 与任意浴室结合的Qubit夫妇的 量量地貌空间 2505.03397v3 |
Authors: Chris Wise, Akram Youssry, Alberto Peruzzo, Jo Plested, Matt Woolley
Qubit control protocols have traditionally leveraged a characterisation of the qubit-bath coupling via its power spectral density. Previous work proposed the inference of noise operators that characterise the influence of a classical bath using a grey-box approach that combines deep neural networks with physics-encoded layers. This overall structure is complex and poses challenges in scaling and real-time operations. Here, we show that no expensive neural networks are needed and that this noise operator description admits an efficient parameterisation. We refer to the resulting parameter space as the quantum feature space of the qubit dynamics resulting from the coupled bath. We show that the Euclidean distance defined over the quantum feature space provides an effective method for classifying noise processes in the presence of a given set of controls. Using the quantum feature space as the input space for a simple machine learning algorithm (random forest, in this case), we demonstrate that it can effectively classify the stationarity and the broad class of noise processes perturbing a qubit. Finally, we explore how control pulse parameters map to the quantum feature space.
Article 1928
Title@2025-05-24 (6): FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers
Title: FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers | FuseGPT: Lernbare Ebenen Fusion generativer vortrainierter Transformer | FuseGPT: 训练前改造器的产生型先导变异器的可学习层融合 2411.14507v2 |
Authors: Zehua Pei, Hui-Ling Zhen, Xianzhi Yu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu
Generative Pre-trained Transformers (GPTs) have demonstrated remarkable performance across diverse domains, largely due to the extensive scaling of model parameters. Recent works have observed redundancy within transformer blocks and developed compression methods by structured pruning of less important blocks. However, such direct removal often leads to irreversible performance degradation. In this paper, we propose FuseGPT, a novel methodology designed to recycle pruned transformer blocks, thereby recovering the model’s performance. Firstly, we introduce a new importance detection metric, Macro Influence (MI), which evaluates the long-term impact of each transformer block by quantifying the information loss incurred upon its removal. Next, we propose group-level layer fusion, which leverages the parameters from layers of less important blocks and integrates them into the corresponding layers of neighboring blocks. This fusion process is not a one-time operation but is refined through iterative parameter updates by lightweight group-level fine-tuning. Specifically, the injected parameters are frozen but are weighted with learnable rank decomposition matrices to reduce the computational overhead during fine-tuning. Our approach not only works well for large language models but also for large multimodal models. Experimental results indicate that, even with modest amounts of data, FuseGPT surpasses previous methods in both perplexity and zero-shot task performance.
Article 1929
Title@2025-05-24 (6): Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking
Title: Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking | Beyond Masked and Unmasked: Diskrete Diffusion Models via Partial Masking | 超越遮盖和无遮盖:通过部分遮盖分解扩散模型 2505.18495v1 |
Authors: Chen-Hao Chao, Wei-Fang Sun, Hanwen Liang, Chun-Yi Lee, Rahul G. Krishnan
Masked diffusion models (MDM) are powerful generative models for discrete data that generate samples by progressively unmasking tokens in a sequence. Each token can take one of two states: masked or unmasked. We observe that token sequences often remain unchanged between consecutive sampling steps; consequently, the model repeatedly processes identical inputs, leading to redundant computation. To address this inefficiency, we propose the Partial masking scheme (Prime), which augments MDM by allowing tokens to take intermediate states interpolated between the masked and unmasked states. This design enables the model to make predictions based on partially observed token information, and facilitates a fine-grained denoising process. We derive a variational training objective and introduce a simple architectural design to accommodate intermediate-state inputs. Our method demonstrates superior performance across a diverse set of generative modeling tasks. On text data, it achieves a perplexity of 15.36 on OpenWebText, outperforming previous MDM (21.52), autoregressive models (17.54), and their hybrid variants (17.58), without relying on an autoregressive formulation. On image data, it attains competitive FID scores of 3.26 on CIFAR-10 and 6.98 on ImageNet-32, comparable to leading continuous generative models.
Article 1930
Title@2025-05-24 (6): FedHL: Federated Learning for Heterogeneous Low-Rank Adaptation via Unbiased Aggregation
Title: FedHL: Federated Learning for Heterogeneous Low-Rank Adaptation via Unbiased Aggregation | FedHL: Föderiertes Lernen für heterogene Low-Rank-Anpassung durch unvoreingenommene Aggregation | FFHL:通过无偏见的聚合体进行异种性、低兰克低差异适应的联邦学习 2505.18494v1 |
Authors: Zihao Peng, Jiandian Zeng, Boyuan Li, Guo Li, Shengbo Chen, Tian Wang
Federated Learning (FL) facilitates the fine-tuning of Foundation Models (FMs) using distributed data sources, with Low-Rank Adaptation (LoRA) gaining popularity due to its low communication costs and strong performance. While recent work acknowledges the benefits of heterogeneous LoRA in FL and introduces flexible algorithms to support its implementation, our theoretical analysis reveals a critical gap: existing methods lack formal convergence guarantees due to parameter truncation and biased gradient updates. Specifically, adapting client-specific LoRA ranks necessitates truncating global parameters, which introduces inherent truncation errors and leads to subsequent inaccurate gradient updates that accumulate over training rounds, ultimately degrading performance. To address the above issues, we propose FedHL, a simple yet effective Federated Learning framework tailored for Heterogeneous LoRA. By leveraging the full-rank global model as a calibrated aggregation basis, FedHL eliminates the direct truncation bias from initial alignment with client-specific ranks. Furthermore, we derive the theoretically optimal aggregation weights by minimizing the gradient drift term in the convergence upper bound. Our analysis shows that FedHL guarantees $\mathcal{O}(1/\sqrt{T})$ convergence rate, and experiments on multiple real-world datasets demonstrate a 1-3% improvement over several state-of-the-art methods.
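The truncation error mentioned above can be made tangible with a small numerical sketch. It assumes the global low-rank update is cut down to a smaller client rank by truncated SVD, which is one common way to truncate and not necessarily the scheme used in the methods the abstract analyzes. The matrix sizes, the seed, and the function name are illustrative assumptions.

```python
import numpy as np


def truncate_rank(delta_w, rank):
    """Best rank-`rank` approximation of delta_w (truncated SVD).

    Also returns the Frobenius norm of the discarded part, which by the
    Eckart-Young theorem equals the l2 norm of the dropped singular values.
    """
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    dropped = float(np.sqrt((s[rank:] ** 2).sum()))
    return approx, dropped


rng = np.random.default_rng(0)
b = rng.normal(size=(16, 8))   # hypothetical global LoRA factor B
a = rng.normal(size=(8, 12))   # hypothetical global LoRA factor A
global_update = b @ a          # rank-8 global update

errors = {}
for client_rank in (2, 4, 8):
    approx, predicted = truncate_rank(global_update, client_rank)
    errors[client_rank] = (float(np.linalg.norm(global_update - approx)), predicted)
```

Lower client ranks discard more of the global update, so the error a client starts from grows as its rank shrinks; a client at the full global rank sees no truncation error.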
Article 1931
Title@2025-05-24 (6): TextArena
Title: TextArena | TextArena | TextArenna 文本 2504.11442v2 |
Authors: Leon Guertler, Bobby Cheng, Simon Yu, Bo Liu, Leshem Choshen, Cheston Tan
TextArena is an open-source collection of competitive text-based games for training and evaluation of agentic behavior in Large Language Models (LLMs). It spans 57+ unique environments (including single-player, two-player, and multi-player setups) and allows for easy evaluation of model capabilities via an online-play system (against humans and other submitted models) with real-time TrueSkill scores. Traditional benchmarks rarely assess dynamic social skills such as negotiation, theory of mind, and deception, creating a gap that TextArena addresses. Designed with research, community and extensibility in mind, TextArena emphasizes ease of adding new games, adapting the framework, testing models, playing against the models, and training models. Detailed documentation of environments, games, leaderboard, and examples are available on https://github.com/LeonGuertler/TextArena and https://www.textarena.ai/.
Article 1932
Title@2025-05-24 (6): Statistical Inference under Performativity
Title: Statistical Inference under Performativity | Statistische Schlussfolgerung unter Performativität | 性能下统计推断值 2505.18493v1 |
Authors: Xiang Li, Yunai Li, Huiying Zhong, Lihua Lei, Zhun Deng
Performativity of predictions refers to the phenomenon that prediction-informed decisions may influence the target they aim to predict, which is widely observed in policy-making in social sciences and economics. In this paper, we initiate the study of statistical inference under performativity. Our contribution is two-fold. First, we build a central limit theorem for estimation and inference under performativity, which enables inferential purposes in policy-making such as constructing confidence intervals or testing hypotheses. Second, we further leverage the derived central limit theorem to investigate prediction-powered inference (PPI) under performativity, which is based on a small labeled dataset and a much larger dataset of machine-learning predictions. This enables us to obtain more precise estimation and improved confidence regions for the model parameter (i.e., policy) of interest in performative prediction. We demonstrate the power of our framework by numerical experiments. To the best of our knowledge, this paper is the first one to establish statistical inference under performativity, which brings up new challenges and inference settings that we believe will add significant value to policy-making, statistics, and machine learning.
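For readers unfamiliar with prediction-powered inference, the following minimal sketch shows the classical (non-performative) PPI estimator of a mean: the average prediction on the large unlabeled set, corrected by the average prediction error on the small labeled set. It does not implement the performative setting studied in the paper; the function name and the normal-approximation interval are assumptions for illustration.

```python
import math

import numpy as np


def ppi_mean(y_labeled, pred_labeled, pred_unlabeled, z=1.96):
    """Classical PPI point estimate and normal-approximation CI for a mean."""
    y = np.asarray(y_labeled, dtype=float)
    f_lab = np.asarray(pred_labeled, dtype=float)
    f_unl = np.asarray(pred_unlabeled, dtype=float)

    rectifier = y - f_lab                      # prediction error on labeled data
    estimate = f_unl.mean() + rectifier.mean()

    # Variance adds up over the two independent samples.
    se = math.sqrt(f_unl.var(ddof=1) / len(f_unl) + rectifier.var(ddof=1) / len(y))
    return float(estimate), (float(estimate - z * se), float(estimate + z * se))


estimate, (lower, upper) = ppi_mean(
    y_labeled=[1.0, 2.0, 3.0],
    pred_labeled=[0.5, 1.5, 2.5],
    pred_unlabeled=[1.0, 2.0, 3.0, 4.0],
)
```

When the predictions are accurate, the rectifier has small variance and the interval is driven mostly by the large unlabeled sample, which is where the gain in precision comes from.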
Article 1933
Title@2025-05-24 (6): Synthesizing and Adapting Error Correction Data for Mobile Large Language Model Applications
Title: Synthesizing and Adapting Error Correction Data for Mobile Large Language Model Applications | Synchronisieren und Anpassen von Fehlerkorrekturdaten für mobile Großsprachen-Modellanwendungen | 合成和调整移动大语言模型应用错误校正数据 2505.18488v1 |
Authors: Yanxiang Zhang, Zheng Xu, Shanshan Wu, Yuanbo Zhang, Daniel Ramage
Error correction is an important capability when applying large language models (LLMs) to facilitate user typing on mobile devices. In this paper, we use LLMs to synthesize a high-quality dataset of error correction pairs to evaluate and improve LLMs for mobile applications. We first prompt LLMs with error correction domain knowledge to build a scalable and reliable addition to the existing data synthesis pipeline. We then adapt the synthetic data distribution to match the mobile application domain by reweighting the samples. The reweighting model is learnt by predicting (a handful of) live A/B test metrics when deploying LLMs in production, given the LLM performance on offline evaluation data and scores from a small privacy-preserving on-device language model. Finally, we present best practices for mixing our synthetic data with other data sources to improve model performance on error correction in both offline evaluation and production live A/B testing.
Article 1934
Title@2025-05-24 (6): Grounding Bodily Awareness in Visual Representations for Efficient Policy Learning
Title: Grounding Bodily Awareness in Visual Representations for Efficient Policy Learning | Bodily Bewusstsein in visuellen Darstellungen für effizientes politisches Lernen geerdet | 提高政策学习效率的视觉表现方面的共同认识 2505.18487v1 |
Authors: Junlin Wang, Zhiyun Lin
Learning effective visual representations for robotic manipulation remains a fundamental challenge due to the complex body dynamics involved in action execution. In this paper, we study how visual representations that carry body-relevant cues can enable efficient policy learning for downstream robotic manipulation tasks. We present Inter-token Contrast (ICon), a contrastive learning method applied to the token-level representations of Vision Transformers (ViTs). ICon enforces a separation in the feature space between agent-specific and environment-specific tokens, resulting in agent-centric visual representations that embed body-specific inductive biases. This framework can be seamlessly integrated into end-to-end policy learning by incorporating the contrastive loss as an auxiliary objective. Our experiments show that ICon not only improves policy performance across various manipulation tasks but also facilitates policy transfer across different robots. The project website: https://github.com/HenryWJL/icon
Article 1935
Title@2025-05-24 (6): The Prompt is Mightier than the Example
Title: The Prompt is Mightier than the Example | Die Aufforderung ist mächtiger als das Beispiel | 火急比例子更强 2505.18485v1 |
Authors: Shengzhe Xu, Nikhil Muralidhar, Naren Ramakrishnan
Numerous recent prompt optimization approaches, such as chain-of-thought, have been demonstrated to significantly improve the quality of content generated by large language models (LLMs). In-context learning (ICL), a recent paradigm where a few representative examples guide content generation, has also led to strong improvements in the quality of LLM-generated content. This idea has been applied to great effect in synthetic tabular data generation, where LLMs, through effective use of ICL and prompt optimization, can generate data that approximate samples from complex, heterogeneous distributions based on representative examples. However, ensuring high-fidelity synthetic data often requires a very large number of ICL examples, which may be unavailable or costly to obtain. At the same time, as LLMs get larger and larger, their in-built prior knowledge becomes vast and can potentially substitute for specific data examples. In this paper, we introduce Knowledge-Guided Prompting (KGP) as a new knob in prompt optimization and explore the ability of KGP-based prompt optimization to offset the cost of ICL. Specifically, we ask "how many examples can a prompt substitute for?" and study KGP, where domain knowledge, either inferred or available, is explicitly injected into the prompt, reducing dependence on ICL examples. Our experiments systematically explore the trade-off between ICL and KGP, revealing an empirical scaling law that quantifies how the quality of generated synthetic data varies with increasing domain knowledge and decreasing example count. Our results demonstrate that knowledge-guided prompting can be a scalable alternative, or addition, to in-context examples, unlocking new approaches to synthetic data generation.
Article 1936
Title@2025-05-24 (6): DiffPuter: Empowering Diffusion Models for Missing Data Imputation
Title: DiffPuter: Empowering Diffusion Models for Missing Data Imputation | DiffPuter: Empowering Diffusion Modelle für fehlende Daten-Imputation | DiffPuter:赋予缺失数据计算传播模型权力 2405.20690v2 |
Authors: Hengrui Zhang, Liancheng Fang, Qitian Wu, Philip S. Yu
Generative models play an important role in missing data imputation in that they aim to learn the joint distribution of full data. However, applying advanced deep generative models (such as Diffusion models) to missing data imputation is challenging due to 1) the inherent incompleteness of the training data and 2) the difficulty in performing conditional inference from unconditional generative models. To deal with these challenges, this paper introduces DiffPuter, a tailored diffusion model combined with the Expectation-Maximization (EM) algorithm for missing data imputation. DiffPuter iteratively trains a diffusion model to learn the joint distribution of missing and observed data and performs an accurate conditional sampling to update the missing values using a tailored reversed sampling strategy. Our theoretical analysis shows that DiffPuter’s training step corresponds to the maximum likelihood estimation of data density (M-step), and its sampling step represents the Expected A Posteriori estimation of missing values (E-step). Extensive experiments across ten diverse datasets and comparisons with 17 different imputation methods demonstrate DiffPuter’s superior performance. Notably, DiffPuter achieves an average improvement of 6.94% in MAE and 4.78% in RMSE compared to the most competitive existing method.
Article 1937
Title@2025-05-24 (6): Change Point Detection in the Frequency Domain with Statistical Reliability
Title: Change Point Detection in the Frequency Domain with Statistical Reliability | Punkterkennung im Frequenzbereich mit statistischer Zuverlässigkeit ändern | 具有统计可靠性的频率域的更改点探测 2502.03062v2 |
Authors: Akifumi Yamada, Tomohiro Shiraishi, Shuichi Nishino, Teruyuki Katsuoka, Kouichi Taji, Ichiro Takeuchi
Effective condition monitoring in complex systems requires identifying change points (CPs) in the frequency domain, as structural changes often arise across multiple frequencies. This paper extends recent advancements in statistically significant CP detection, based on Selective Inference (SI), to the frequency domain. The proposed SI method quantifies the statistical significance of detected CPs in the frequency domain using $p$-values, ensuring that the detected changes reflect genuine structural shifts in the target system. We address two major technical challenges to achieve this. First, we extend the existing SI framework to the frequency domain by appropriately utilizing the properties of the discrete Fourier transform (DFT). Second, we develop an SI method that provides valid $p$-values for CPs where changes occur across multiple frequencies. Experimental results demonstrate that the proposed method reliably identifies genuine CPs with strong statistical guarantees, enabling more accurate root-cause analysis in the frequency domain of complex systems.
Article 1938
Title@2025-05-24 (6): Sigmoid Self-Attention has Lower Sample Complexity than Softmax Self-Attention: A Mixture-of-Experts Perspective
Title: Sigmoid Self-Attention has Lower Sample Complexity than Softmax Self-Attention: A Mixture-of-Experts Perspective | Sigmoid-Selbstaufmerksamkeit hat eine geringere Probenkomplexität als Softmax-Selbstaufmerksamkeit: Eine Mischung aus Experten-Perspektive | 与 Softmax自觉:混合专家视角相比,Sigmoid自觉的样本复杂性较低。 2502.00281v2 |
Authors: Fanqi Yan, Huy Nguyen, Pedram Akbarian, Nhat Ho, Alessandro Rinaldo
At the core of the popular Transformer architecture is the self-attention mechanism, which dynamically assigns softmax weights to each input token so that the model can focus on the most salient information. However, the softmax structure slows down the attention computation due to its row-wise nature, and it inherently introduces competition among tokens: as the weight assigned to one token increases, the weights of others decrease. This competitive dynamic may narrow the focus of self-attention to a limited set of features, potentially overlooking other informative characteristics. Recent experimental studies have shown that using the element-wise sigmoid function helps eliminate token competition and reduce the computational overhead. Despite these promising empirical results, a rigorous comparison between sigmoid and softmax self-attention mechanisms remains absent in the literature. This paper closes this gap by theoretically demonstrating that sigmoid self-attention is more sample-efficient than its softmax counterpart. Toward that goal, we represent the self-attention matrix as a mixture of experts and show that "experts" in sigmoid self-attention require significantly less data to achieve the same approximation error as those in softmax self-attention.
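The token-competition effect described in this abstract can be illustrated with a small sketch. It is not taken from the paper: the scores are hypothetical attention logits for a single query, and the two helper functions are illustrative names. Raising one token's score lowers the softmax weights of the others, while element-wise sigmoid weights of the other tokens stay unchanged.

```python
import math

def softmax_weights(scores):
    """Row-wise softmax: weights in a row are coupled and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_weights(scores):
    """Element-wise sigmoid: each weight depends only on its own score."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

# Hypothetical attention scores for one query over three tokens.
before = [1.0, 0.5, -0.5]
after = [3.0, 0.5, -0.5]  # only the first token's score is raised

print("softmax:", softmax_weights(before), "->", softmax_weights(after))
print("sigmoid:", sigmoid_weights(before), "->", sigmoid_weights(after))
```

Under softmax, the second and third weights shrink once the first score grows; under sigmoid they do not move, which is the absence of competition that the abstract refers to.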
Article 1939
Title@2025-05-24 (6): Provably Robust Training of Quantum Circuit Classifiers Against Parameter Noise
Title: Provably Robust Training of Quantum Circuit Classifiers Against Parameter Noise | Wahrscheinlich robustes Training von Quantum Circuit Klassifikatoren gegen Parametergeräusche | 针对参数噪音的量子电路分级器的可证实的强力培训 2505.18478v1 |
Authors: Lucas Tecot, Di Luo, Cho-Jui Hsieh
Advancements in quantum computing have spurred significant interest in harnessing its potential for speedups over classical systems. However, noise remains a major obstacle to achieving reliable quantum algorithms. In this work, we present a provably noise-resilient training theory and algorithm to enhance the robustness of parameterized quantum circuit classifiers. Our method, with a natural connection to Evolutionary Strategies, guarantees resilience to parameter noise with minimal adjustments to commonly used optimization algorithms. Our approach is function-agnostic and adaptable to various quantum circuits, successfully demonstrated in quantum phase classification tasks. By developing provably guaranteed optimization theory with quantum circuits, our work opens new avenues for practical, robust applications of near-term quantum computers.
Article 1940
Title@2025-05-24 (6): CAPE: Covariate-Adjusted Pre-Training for Generalized Epidemic Time Series Forecasting
Title: CAPE: Covariate-Adjusted Pre-Training for Generalized Epidemic Time Series Forecasting | CAPE: Kovariat-adjustierte Vorschulung für generalisierte epidemische Zeitreihen | CAPE: 通用流行病时间序列预测共同调整前培训 2502.03393v3 |
Authors: Zewen Liu, Juntong Ni, Max S. Y. Lau, Wei Jin
Accurate forecasting of epidemic infection trajectories is crucial for safeguarding public health. However, limited data availability during emerging outbreaks and the complex interaction between environmental factors and disease dynamics present significant challenges for effective forecasting. In response, we introduce CAPE, a novel epidemic pre-training framework designed to harness extensive disease datasets from diverse regions and integrate environmental factors directly into the modeling process for more informed decision-making on downstream diseases. Based on a covariate adjustment framework, CAPE utilizes pre-training combined with hierarchical environment contrasting to identify universal patterns across diseases while estimating latent environmental influences. We have compiled a diverse collection of epidemic time series datasets and validated the effectiveness of CAPE under various evaluation scenarios, including full-shot, few-shot, zero-shot, cross-location, and cross-disease settings, where it outperforms the leading baseline by an average of 9.9% in full-shot and 14.3% in zero-shot settings.
Article 1941
Title@2025-05-24 (6): Using Large Language Models to Tackle Fundamental Challenges in Graph Learning: A Comprehensive Survey
Title: Using Large Language Models to Tackle Fundamental Challenges in Graph Learning: A Comprehensive Survey | Große Sprachmodelle nutzen, um grundlegende Herausforderungen im Graphenlernen zu bewältigen: Eine umfassende Umfrage | 使用大语言模式应对图表学习中的基本挑战:全面调查 2505.18475v1 |
Authors: Mengran Li, Pengyu Zhang, Wenbin Xing, Yijia Zheng, Klim Zaporojets, Junzhou Chen, Ronghui Zhang, Yong Zhang, Siyuan Gong, Jia Hu, Xiaolei Ma, Zhiyuan Liu, Paul Groth, Marcel Worring
Graphs are a widely used paradigm for representing non-Euclidean data, with applications ranging from social network analysis to biomolecular prediction. Conventional graph learning approaches typically rely on fixed structural assumptions or fully observed data, limiting their effectiveness in more complex, noisy, or evolving settings. Consequently, real-world graph data often violates the assumptions of traditional graph learning methods. In particular, this leads to four fundamental challenges: (1) Incompleteness: real-world graphs have missing nodes, edges, or attributes; (2) Imbalance: the distributions of node or edge labels and of their structures in real-world graphs are highly skewed; (3) Cross-domain Heterogeneity: graphs from different domains exhibit incompatible feature spaces or structural patterns; and (4) Dynamic Instability: graphs evolve over time in unpredictable ways. Recent advances in Large Language Models (LLMs) offer the potential to tackle these challenges by leveraging rich semantic reasoning and external knowledge. This survey provides a comprehensive review of how LLMs can be integrated with graph learning to address the aforementioned challenges. For each challenge, we review both traditional solutions and modern LLM-driven approaches, highlighting how LLMs contribute unique advantages. Finally, we discuss open research questions and promising future directions in this emerging interdisciplinary field. To support further exploration, we have curated a repository of recent advances on graph learning challenges: https://github.com/limengran98/Awesome-Literature-Graph-Learning-Challenges.
Article 1942
Title@2025-05-24 (6): Performance and Generalizability Impacts of Incorporating Geolocation into Deep Learning for Dynamic PM2.5 Estimation
Title: Performance and Generalizability Impacts of Incorporating Geolocation into Deep Learning for Dynamic PM2.5 Estimation | Leistung und Verallgemeinerbarkeit Auswirkungen der Einbeziehung von Geolocation in Deep Learning für dynamische PM2.5 Abschätzung | 将地理定位纳入深入学习以进行动态PM2.5估算的绩效和通用性影响 2505.18461v1 |
Authors: Morteza Karimzadeh, Zhongying Wang, James L. Crooks
Deep learning models have demonstrated success in geospatial applications, yet quantifying the role of geolocation information in enhancing model performance and geographic generalizability remains underexplored. A new generation of location encoders has emerged with the goal of capturing attributes present at any given location for downstream use in predictive modeling. Being a nascent area of research, their evaluation has remained largely limited to static tasks such as species distributions or average temperature mapping. In this paper, we discuss and quantify the impact of incorporating geolocation into deep learning for a real-world application domain that is characteristically dynamic (with fast temporal change) and spatially heterogeneous at high resolutions: estimating surface-level daily PM2.5 levels using remotely sensed and ground-level data. We build on a recently published deep learning-based PM2.5 estimation model that achieves state-of-the-art performance on data observed in the contiguous United States. We examine three approaches for incorporating geolocation: excluding geolocation as a baseline, using raw geographic coordinates, and leveraging pretrained location encoders. We evaluate each approach under within-region (WR) and out-of-region (OoR) evaluation scenarios. Aggregate performance metrics indicate that while naive incorporation of raw geographic coordinates improves within-region performance by retaining the interpolative value of geographic location, it can hinder generalizability across regions. In contrast, pretrained location encoders like GeoCLIP enhance predictive performance and geographic generalizability for both WR and OoR scenarios. However, qualitative analysis reveals artifact patterns caused by high-degree basis functions and sparse upstream samples in certain areas, and ablation results indicate varying performance among location encoders…
Article 1943
Title@2025-05-24 (6): EdgeAgentX: A Novel Framework for Agentic AI at the Edge in Military Communication Networks
Title: EdgeAgentX: A Novel Framework for Agentic AI at the Edge in Military Communication Networks | EdgeAgentX: Ein neuartiges Framework für Agentische KI am Rand in militärischen Kommunikationsnetzwerken | EdgeAgentX:军事通信网络边缘代理式AI新框架 2505.18457v1 |
Authors: Abir Ray
This paper introduces EdgeAgentX, a novel framework integrating federated learning (FL), multi-agent reinforcement learning (MARL), and adversarial defense mechanisms, tailored for military communication networks. EdgeAgentX significantly improves autonomous decision-making, reduces latency, enhances throughput, and robustly withstands adversarial disruptions, as evidenced by comprehensive simulations.
Article 1944
Title@2025-05-24 (6): On the Limitations and Possibilities of Nash Regret Minimization in Zero-Sum Matrix Games under Noisy Feedback
Title: On the Limitations and Possibilities of Nash Regret Minimization in Zero-Sum Matrix Games under Noisy Feedback | Über die Einschränkungen und Möglichkeiten der Nash Regret Minimierung in Zero-Sum Matrix Games unter Noisy Feedback | 根据噪音反馈在零-苏姆母体运动会中尽量减少纳什迟缓的限制和可能性 2306.13233v3 |
Authors: Arnab Maiti, Kevin Jamieson, Lillian J. Ratliff
This paper studies a variant of two-player zero-sum matrix games, where, at each timestep, the row player selects row $i$, the column player selects column $j$, and the row player receives a noisy reward with expected value $A_{i,j}$, along with noisy feedback on the input matrix $A$. The row player’s goal is to maximize their total reward against an adversarial column player. Nash regret, defined as the difference between the player’s total reward and the game’s Nash equilibrium value scaled by the time horizon $T$, is often used to evaluate algorithmic performance in zero-sum games. We begin by studying the limitations of existing algorithms for minimizing Nash regret. We show that standard algorithms, including Hedge, FTRL, and OMD, as well as the strategy of playing the Nash equilibrium of the empirical matrix, all incur $\Omega(\sqrt{T})$ Nash regret, even when the row player receives noisy feedback on the entire matrix $A$. Furthermore, we show that UCB for matrix games, a natural adaptation of the well-known bandit algorithm, also suffers $\Omega(\sqrt{T})$ Nash regret under bandit feedback. Notably, these lower bounds hold even in the simplest case of $2 \times 2$ matrix games, where the instance-dependent matrix parameters are constant. We next ask whether instance-dependent $\text{polylog}(T)$ Nash regret is achievable against adversarial opponents. We answer this affirmatively. In the full-information setting, we present the first algorithm for general $n \times m$ matrix games that achieves instance-dependent $\text{polylog}(T)$ Nash regret. In the bandit feedback setting, we design an algorithm with similar guarantees for the special case of $2 \times 2$ games, the same regime in which existing algorithms provably suffer $\Omega(\sqrt{T})$ regret despite the simplicity of the instance. Finally, we validate our theoretical results with empirical evidence.
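To make the Nash regret quantity concrete, here is a minimal sketch for the $2 \times 2$ case. It assumes one common sign convention (time horizon times the game value, minus the sum of expected rewards $A_{i,j}$ along the played sequence); the paper's exact definition may differ. The function names and the example game are illustrative and do not come from the paper.

```python
def value_2x2(A):
    """Nash equilibrium value of a 2x2 zero-sum game for the row (maximizing) player."""
    (a, b), (c, d) = A
    maximin = max(min(a, b), min(c, d))  # best payoff the row player can guarantee with a pure row
    minimax = min(max(a, c), max(b, d))  # lowest cap the column player can enforce with a pure column
    if maximin == minimax:               # a pure saddle point exists
        return float(maximin)
    # Otherwise both players mix; this is the standard closed form for 2x2 games.
    return (a * d - b * c) / (a + d - b - c)

def nash_regret(A, plays):
    """T * value minus the summed expected reward over the played (row, column) pairs."""
    T = len(plays)
    total = sum(A[i][j] for i, j in plays)
    return T * value_2x2(A) - total

# Matching pennies: the value is 0, so a row player who is always exploited loses 1 per round.
pennies = [[1, -1], [-1, 1]]
print(value_2x2(pennies))
print(nash_regret(pennies, [(0, 1)] * 4))
```

The sketch uses exact expected rewards; in the setting of the abstract the player only observes noisy versions of them, which is what makes low regret hard.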
Article 1945
Title@2025-05-24 (6): Reinforcement Learning for Stock Transactions
Title: Reinforcement Learning for Stock Transactions | Verstärkungslernen für Aktientransaktionen | 证券交易强化学习 2505.16099v2 |
Authors: Ziyi Zhou, Nicholas Stern, Julien Laasri
Much research has been done to analyze the stock market. After all, if one can determine a pattern in the chaotic frenzy of transactions, then they could make a hefty profit from capitalizing on these insights. As such, the goal of our project was to apply reinforcement learning (RL) to determine the best time to buy a stock within a given time frame. With only a few adjustments, our model can be extended to identify the best time to sell a stock as well. In order to use the format of free, real-world data to train the model, we define our own Markov Decision Process (MDP) problem. These two papers [5] [6] helped us in formulating the state space and the reward system of our MDP problem. We train a series of agents using Q-Learning, Q-Learning with linear function approximation, and deep Q-Learning. In addition, we try to predict the stock prices using machine learning regression and classification models. We then compare our agents to see if they converge on a policy, and if so, which one learned the best policy to maximize profit on the stock market.
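For readers unfamiliar with the tabular Q-Learning mentioned in this abstract, the standard update rule can be sketched as follows. The state and action names, the reward values, and the hyperparameters `alpha` and `gamma` are hypothetical placeholders; the abstract does not specify the authors' actual state space or reward system.

```python
def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: move Q[state][action] toward the TD target."""
    best_next = max(Q[next_state].values())   # greedy value of the next state
    td_target = reward + gamma * best_next    # bootstrapped return estimate
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]

# Hypothetical two-state, two-action table, initialised to zero.
Q = {s: {"wait": 0.0, "buy": 0.0} for s in ("price_low", "price_high")}

first = q_update(Q, "price_low", "buy", reward=1.0, next_state="price_high")
second = q_update(Q, "price_high", "wait", reward=0.0, next_state="price_low")
print(first, second)
```

Q-Learning with linear function approximation and deep Q-Learning, also named in the abstract, replace the table with a parametric estimate of the same quantity.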
Article 1946
Title@2025-05-24 (6): Anchored Diffusion Language Model
Title: Anchored Diffusion Language Model | Verankertes Diffusions-Sprachenmodell | 原成品的传播语言模式 2505.18456v1 |
Authors: Litu Rout, Constantine Caramanis, Sanjay Shakkottai
Diffusion Language Models (DLMs) promise parallel generation and bidirectional context, yet they underperform autoregressive (AR) models in both likelihood modeling and generated text quality. We identify that this performance gap arises when important tokens (e.g., key words or low-frequency words that anchor a sentence) are masked early in the forward process, limiting contextual information for accurate reconstruction. To address this, we introduce the Anchored Diffusion Language Model (ADLM), a novel two-stage framework that first predicts distributions over important tokens via an anchor network, and then predicts the likelihoods of missing tokens conditioned on the anchored predictions. ADLM significantly improves test perplexity on LM1B and OpenWebText, achieving up to 25.4% gains over prior DLMs, and narrows the gap with strong AR baselines. It also achieves state-of-the-art performance in zero-shot generalization across seven benchmarks and surpasses AR models in MAUVE score, which marks the first time a DLM generates better human-like text than an AR model. Theoretically, we derive an Anchored Negative Evidence Lower Bound (ANELBO) objective and show that anchoring improves sample complexity and likelihood modeling. Beyond diffusion, anchoring boosts performance in AR models and enhances reasoning in math and logic tasks, outperforming existing chain-of-thought approaches.
Article 1947
Title@2025-05-24 (6): On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts
Title: On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts | Zur Minimax-Abschätzung von Parametern in Softmax-kontaminierter Mischung von Experten | 关于Softmax 被污染的专家混合体参数最小估计 2505.18455v1 |
Authors: Fanqi Yan, Huy Nguyen, Dung Le, Pedram Akbarian, Nhat Ho, Alessandro Rinaldo
The softmax-contaminated mixture of experts (MoE) model is deployed when a large-scale pre-trained model, which plays the role of a fixed expert, is fine-tuned for learning downstream tasks by including a new contamination part, or prompt, functioning as a new, trainable expert. Despite its popularity and relevance, the theoretical properties of the softmax-contaminated MoE have remained unexplored in the literature. In this paper, we study the convergence rates of the maximum likelihood estimator of gating and prompt parameters in order to gain insights into the statistical properties and potential challenges of fine-tuning with a new prompt. We find that the estimability of these parameters is compromised when the prompt acquires overlapping knowledge with the pre-trained model, in a sense that we make precise by formulating a novel analytic notion of distinguishability. Under distinguishability of the pre-trained and prompt models, we derive minimax optimal estimation rates for all the gating and prompt parameters. By contrast, when the distinguishability condition is violated, these estimation rates become significantly slower due to their dependence on the rate at which the prompt converges to the pre-trained model. Finally, we empirically corroborate our theoretical findings through several numerical experiments.
Article 1948
Title@2025-05-24 (6): $μ$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts
Title: $μ$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts | $μ$-MoE: Test-Time Pruning als Mikro-Grained Mixture-of-Experts | $μ$-MoE:作为微粒混合剂专家进行试验时休整 2505.18451v1 |
Authors: Toshiaki Koike-Akino, Jing Liu, Ye Wang
To tackle the huge computational demand of large foundation models, activation-aware compression techniques without retraining have been introduced. However, since these rely on calibration data, domain shift may arise for unknown downstream tasks. With a computationally efficient calibration, activation-aware pruning can be executed adaptively for every prompt while still achieving reduced complexity at inference. We formulate it as a mixture of micro-experts, called $\mu$-MoE. Several experiments demonstrate that $\mu$-MoE can dynamically adapt to task/prompt-dependent structured sparsity on the fly.
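The idea of recomputing an activation-aware pruning mask for each prompt can be sketched as below. This is a simplified illustration, not the authors' method. It assumes a weight-magnitude-times-activation-norm importance score (a common choice in activation-aware pruning) and keeps the top `keep` weights in each output row. The function name and the toy inputs are hypothetical.

```python
import math

def prompt_aware_prune(W, X, keep):
    """Zero all but the `keep` highest-scoring weights in each row of W.

    W: weight matrix as a list of rows (out_features x in_features).
    X: activations of the current prompt (tokens x in_features).
    Score (assumed): |W[i][j]| * L2 norm of input feature j over the prompt.
    """
    n_in = len(W[0])
    norms = [math.sqrt(sum(x[j] ** 2 for x in X)) for j in range(n_in)]
    pruned = []
    for row in W:
        scores = [abs(w) * norms[j] for j, w in enumerate(row)]
        kept = set(sorted(range(n_in), key=lambda j: scores[j], reverse=True)[:keep])
        pruned.append([w if j in kept else 0.0 for j, w in enumerate(row)])
    return pruned

W = [[1.0, -2.0, 3.0]]
prompt_a = [[10.0, 1.0, 0.1]]  # activates the first input feature strongly
prompt_b = [[0.1, 1.0, 10.0]]  # activates the last input feature strongly

print(prompt_aware_prune(W, prompt_a, keep=2))
print(prompt_aware_prune(W, prompt_b, keep=2))
```

The same weights yield different sparsity patterns for the two prompts, which is the sense in which each prompt selects its own set of active "micro-experts".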
Article 1949
Title@2025-05-24 (6): Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting
Title: Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting | Breaking Silos: Adaptive Modellfusion löst bessere Zeitreihen voraus | 破碎硅:适应性模型融合解锁更好的时间序列预测 2505.18442v1 |
Authors: Zhining Liu, Ze Yang, Xiao Lin, Ruizhong Qiu, Tianxin Wei, Yada Zhu, Hendrik Hamann, Jingrui He, Hanghang Tong
Time-series forecasting plays a critical role in many real-world applications. Although increasingly powerful models have been developed and achieved superior results on benchmark datasets, through a fine-grained sample-level inspection, we find that (i) no single model consistently outperforms others across different test samples, but instead (ii) each model excels in specific cases. These findings prompt us to explore how to adaptively leverage the distinct strengths of various forecasting models for different samples. We introduce TimeFuse, a framework for collective time-series forecasting with sample-level adaptive fusion of heterogeneous models. TimeFuse utilizes meta-features to characterize input time series and trains a learnable fusor to predict optimal model fusion weights for any given input. The fusor can leverage samples from diverse datasets for joint training, allowing it to adapt to a wide variety of temporal patterns and thus generalize to new inputs, even from unseen datasets. Extensive experiments demonstrate the effectiveness of TimeFuse in various long-/short-term forecasting tasks, achieving near-universal improvement over the state-of-the-art individual models. Code is available at https://github.com/ZhiningLiu1998/TimeFuse.
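Sample-level fusion of the kind described here amounts to a per-input weighted combination of the base models' forecasts. The sketch below assumes the fusor outputs one logit per model and that the logits are normalised with a softmax; both are assumptions for illustration, and the function name and numbers are hypothetical rather than taken from the TimeFuse code.

```python
import math

def fuse_forecasts(forecasts, logits):
    """Combine per-model forecasts using softmax-normalised fusion weights.

    forecasts: one forecast per model, each a list over the horizon.
    logits: one score per model, e.g. predicted by a fusor from meta-features.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    horizon = len(forecasts[0])
    fused = [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(horizon)]
    return weights, fused

# Two hypothetical base models forecasting two steps ahead.
weights, fused = fuse_forecasts([[1.0, 2.0], [3.0, 4.0]], logits=[0.0, 0.0])
print(weights, fused)
```

With equal logits the result is the plain average; a trained fusor would instead shift weight toward whichever model suits the given input.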
Article 1950
Title@2025-05-24 (6): DB-KSVD: Scalable Alternating Optimization for Disentangling High-Dimensional Embedding Spaces
Title: DB-KSVD: Scalable Alternating Optimization for Disentangling High-Dimensional Embedding Spaces | DB-KSVD: Skalierbare alternierende Optimierung für das Entwirren hochdimensionaler Einbettungsräume | DB-KSVD: 拆分高多元嵌入空间的可缩放变换最佳优化 2505.18441v1 |
Authors: Romeo Valentin, Sydney M. Katz, Vincent Vanhoucke, Mykel J. Kochenderfer
Dictionary learning has recently emerged as a promising approach for mechanistic interpretability of large transformer models. Disentangling high-dimensional transformer embeddings, however, requires algorithms that scale to high-dimensional data with large sample sizes. Recent work has explored sparse autoencoders (SAEs) for this problem. However, SAEs use a simple linear encoder to solve the sparse encoding subproblem, which is known to be NP-hard. It is therefore interesting to understand whether this structure is sufficient to find good solutions to the dictionary learning problem or if a more sophisticated algorithm could find better solutions. In this work, we propose Double-Batch KSVD (DB-KSVD), a scalable dictionary learning algorithm that adapts the classic KSVD algorithm. DB-KSVD is informed by the rich theoretical foundations of KSVD but scales to datasets with millions of samples and thousands of dimensions. We demonstrate the efficacy of DB-KSVD by disentangling embeddings of the Gemma-2-2B model and evaluating on six metrics from the SAEBench benchmark, where we achieve competitive results when compared to established approaches based on SAEs. By matching SAE performance with an entirely different optimization approach, our results suggest that (i) SAEs do find strong solutions to the dictionary learning problem and (ii) that traditional optimization approaches can be scaled to the required problem sizes, offering a promising avenue for further research. We provide an implementation of DB-KSVD at https://github.com/RomeoV/KSVD.jl.
Article 1951
Title@2025-05-24 (6): Finite-Time Global Optimality Convergence in Deep Neural Actor-Critic Methods for Decentralized Multi-Agent Reinforcement Learning
Title: Finite-Time Global Optimality Convergence in Deep Neural Actor-Critic Methods for Decentralized Multi-Agent Reinforcement Learning | Finite-Time Global Optimality Convergence in Deep Neural Actor-Critic Methoden für dezentralisiertes Mehr-Agenten-Verstärkungs-Lernen | 分散式多机构强化学习的深神经立体-集中式多机构强化学习方法中全球最佳程度趋同 2505.18433v1 |
Authors: Zhiyao Zhang, Myeung Suk Oh, FNU Hairi, Ziyue Luo, Alvaro Velasquez, Jia Liu
Actor-critic methods for decentralized multi-agent reinforcement learning (MARL) facilitate collaborative optimal decision making without centralized coordination, thus enabling a wide range of applications in practice. To date, however, most theoretical convergence studies for existing actor-critic decentralized MARL methods are limited to the guarantee of a stationary solution under linear function approximation. This leaves a significant gap between the highly successful use of deep neural actor-critic for decentralized MARL in practice and the current theoretical understanding. To bridge this gap, in this paper, we make the first attempt to develop a deep neural actor-critic method for decentralized MARL, where both the actor and critic components are inherently non-linear. We show that our proposed method enjoys a global optimality guarantee with a finite-time convergence rate of $O(1/T)$, where $T$ is the total number of iterations. This marks the first global convergence result for deep neural actor-critic methods in the MARL literature. We also conduct extensive numerical experiments, which verify our theoretical results.